Pasky’s Log

ld.so Scopes

March 10th, 2011

Recently, I have spent quite a bit of my time debugging an evil ld.so bug involving mishandling of scopes, and I have noticed a precious lack of documentation of any internal ld.so data structures. So again, this comes for the benefit of the googlers: an intro that could have saved me quite a bit of time spent poking the code.

Of course, the dynamic linker features a wide variety of fun hacks. The most interesting mechanism is probably how lazy relocation is performed, but things like that have already been described plenty of times before. The question we shall look into is what data structures are used when a new symbol is to be searched for and the linker has already taken control. There are two important internal concepts of ld.so related to this – the link_map and the scope. You can see the data structures in include/link.h in the glibc source tree.

The struct link_map describes a single loaded object; it may be ld.so, the main program, libc, or any other shared object loaded afterwards, during startup or later. It has many members, like its name, its neighbours in the global linked list of all objects, or its state. But the most interesting attribute is its scope.

The scope describes which libraries should be searched for symbol lookups occurring within the scope owner. (By the way, given that the lookup scope may differ by caller, implementing dlsym() is not that trivial.) It is further divided into scope elements (struct r_scope_elem) – a single scope element basically describes a single search list of libraries, and the scope (link_map.l_scope is the scope used for symbol lookup) is a list of such scope elements.

To reiterate, a symbol lookup scope is a list of lists! Then, when looking up a symbol, the linker walks the lists in the order they are listed in the scope. But what really are the scope elements? There are two usual kinds:

The “global scope” – all libraries (ahem, link_maps) that have been requested to be loaded by the main program (what ldd on the binary file of the main program would print out, plus dlopen()ed stuff).
The “local scope” – DT_NEEDED library dependencies of the current link_map (what ldd on the binary file of the library would print out, plus dlopen()ed stuff).

The global scope is shared between all link_maps (in the current namespace), while the local scope is owned by a particular library. (FIXME) If a library has a local scope element in its scope, it adds itself to that scope element. E.g. assume libA dlopen()s libB (with RTLD_LOCAL) – libB will get and own a fresh local scope element, and all libraries loaded by libB will inherit and add themselves to that local scope element.
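To make the list-of-lists idea concrete, here is a toy model in Python. It is only a sketch of the lookup order described above, not glibc code: the LinkMap class, the lookup() function and all library and symbol names are made up for illustration.

```python
class LinkMap:
    """Toy stand-in for struct link_map: one loaded object."""

    def __init__(self, name, symbols):
        self.name = name
        self.symbols = symbols   # set of symbol names this object defines
        self.scope = []          # list of scope elements, each a list of LinkMap


def lookup(symbol, caller):
    """Resolve `symbol` as seen from `caller`: walk the scope elements in
    order, and each element's search list in order; the first hit wins."""
    for scope_elem in caller.scope:
        for lm in scope_elem:
            if symbol in lm.symbols:
                return lm.name
    raise LookupError(f"unresolved symbol: {symbol}")


# Hypothetical objects: a main program, libc, and a plugin that was
# dlopen()ed with RTLD_LOCAL together with its dependency libdep.
prog = LinkMap("prog", {"main"})
libc = LinkMap("libc.so.6", {"malloc", "puts"})
plugin = LinkMap("libplugin.so", {"plugin_init"})
libdep = LinkMap("libdep.so", {"dep_helper", "puts"})

global_scope = [prog, libc]      # shared by every link_map
local_scope = [plugin, libdep]   # owned by the plugin

prog.scope = [global_scope]
libc.scope = [global_scope]
plugin.scope = [global_scope, local_scope]
libdep.scope = [global_scope, local_scope]   # inherits the plugin's local scope

print(lookup("puts", plugin))        # the global scope is searched first
print(lookup("dep_helper", plugin))  # only reachable through the local scope
```

Note that lookup("dep_helper", prog) raises LookupError in this model, since the main program never sees the plugin's local scope. That is the "lookup scope may differ by caller" point in miniature.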

There are then four common situations:

The main program has only a single scope element, the global scope. (At least I would expect so, I have not verified this.)
A library has been loaded with RTLD_LOCAL (the default case). Then its link_map has two scope elements: first the global scope, then the local scope.
A library has been loaded with RTLD_LOCAL | RTLD_DEEPBIND. In that case, the link_map again has the two scope elements, but the order is switched – the local scope comes first.
A library has been loaded with RTLD_GLOBAL. The link_map lists only the global scope.
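The four cases can be written down as a small Python sketch. Again this is a toy model of the ordering described above, not how glibc builds scopes internally; build_scope(), first_definition() and the boolean parameters are hypothetical stand-ins for the dlopen() flags.

```python
def build_scope(global_scope, local_scope, rtld_global=False, deepbind=False):
    """Return the list of scope elements for one link_map, per the cases above."""
    if rtld_global:
        # Main program or RTLD_GLOBAL library: only the global scope is listed.
        return [global_scope]
    if deepbind:
        # RTLD_LOCAL | RTLD_DEEPBIND: the local scope is searched first.
        return [local_scope, global_scope]
    # Plain RTLD_LOCAL (the default): global scope first, then local scope.
    return [global_scope, local_scope]


def first_definition(symbol, scope):
    """Name of the first object in `scope` that defines `symbol`, or None."""
    for scope_elem in scope:
        for name, symbols in scope_elem:
            if symbol in symbols:
                return name
    return None


# Hypothetical objects as (name, defined symbols) pairs; the plugin ships
# its own malloc that clashes with the one in libc.
global_scope = [("prog", {"main"}), ("libc.so.6", {"malloc"})]
local_scope = [("libplugin.so", {"malloc", "plugin_init"})]

default = build_scope(global_scope, local_scope)
deep = build_scope(global_scope, local_scope, deepbind=True)

print(first_definition("malloc", default))  # global scope wins
print(first_definition("malloc", deep))     # the plugin's own copy wins
```

The only difference between the two lookups is the order of the two scope elements, which is the whole effect of RTLD_DEEPBIND in this model.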

(Another concept is namespace; each has its own id and linked list of link_maps, but usually there are just two, one for the ld.so and another for the application. Unless you are calling dlmopen() explicitly or using the LD_AUDIT interface, you can usually assume there is only a single namespace that matters.)

Just for fun – the bug I have been hunting was caused by ld.so not handling local scopes quite properly. Normally, when unloading a library opened with RTLD_LOCAL, all its local scope members would be unloaded too. However, such a member could be flagged as RTLD_NODELETE, and in that case it would stay around. The problem is, the code did not expect that: it would remove the local scope owner, and the local scope would go along with it. This means the nodelete library’s dependencies would disappear from its local scope, and the next time it got called (e.g. within its static destructor), trying to resolve such a symbol would cause an “unresolved symbol” fatal error.

Categories: linux, software Tags: glibc, ld.so, suse

pulseaudio – quick’n’dirty playback over the network

May 9th, 2010

The joyful lives of many Linux desktop users are clouded by many packages and frameworks that are well-intentioned and try to solve real and painful problems, but which are immature, not designed in the UNIX spirit, poorly documented and, most importantly, do not really have a working implementation. Oh well. I have taken the stance of patience and ascetic acceptance of the new burdens – instead of trying to purge my systems of all that is unholy and evil, I spend that time trying to debug and fix up the problems they incur (often in vain). Sometimes I even file bugs, but that can be a rather… unrewarding experience – more about that at another time.

So, I have two OpenSUSE 11.2 machines – a notebook with GNOME and a workstation with KDE 4.2 and some real speakers. My notebook uses PulseAudio semi-automagically, but after many perpetual problems with pulseaudio, phonon, java and flash, I really gave up on the workstation and turned pulseaudio off. However, it is desirable to coil up in the bed and watch a movie on the notebook while NOT listening to the notebook speakers. So I want to play sound over the network, notebook to workstation.

What to do on the notebook:

$ echo default-server = $IP_of_workstation >>/etc/pulse/client.conf
$ mplayer -ao pulse ...

That was easy. Your girlfriend watches you type along over your head.

What to do on the workstation? Surely that will also be a piece of cake!

$ echo load-module module-native-protocol-tcp auth-anonymous=1 >>/etc/pulse/default.pa
$ pulseaudio -v

Ok. Try to fire up mplayer and… it’s all silent! You stare at the log for a bit, then you see it:

I: sink-input.c: Created input 0 "audio stream" on alsa_output.pci-0000_01_00.1.hdmi-stereo with sample spec s16le 2ch 44100Hz and channel map front-left,front-right

But, that’s wrong! There is nothing hooked up on the HDMI! The speakers are analog. Why is it playing over the HDMI? You click around a bit, google around a bit, nothing comes up. Your girlfriend stirs impatiently.

You go through the log, see that pulseaudio first sees the HDMI sink, then the analog sink. Hm. You find the set-default-sink command somewhere and do

$ echo set-default-sink alsa_output.pci-0000_00_1b.0.analog-stereo >>/etc/pulse/default.pa

Pulseaudio restart. Nice red message:

E: main.c: Sink alsa_output.pci-0000_00_1b.0.analog-stereo does not exist

Aha! Pulseaudio sees HDMI right away, sets it up, _then_ finishes initializing and prints this error, and only about two seconds later it goes all “oh, look, there’s another card plugged in here!”. What the heck?

At this point, you either give up or try to google around again madly. After 10 minutes, while your girlfriend is browsing DeviantArt bored, sound finally comes from the speakers after you figure out that you need to issue

$ pacmd 'set-default-sink alsa_output.pci-0000_00_1b.0.analog-stereo'

while having a running pulseaudio.

You will probably need to have a dbus connection to your pulseaudio if you want to do this. If you are setting up the workstation remotely, you need to either create your own dbus session or hook up to a running one if you are logged in physically as well. This is a very simple, user friendly step:

$ ps axu
...
chidori   3036  0.0  0.4 151192 16564 ?        S    09:40   0:00 kdeinit4: kded4 [kdeinit]
...
$ tr '\0' '\n' </proc/3036/environ | grep DBUS
DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-wYzWEttyro,guid=28e3a1c77f077a230071a5974be666db
$ export $(tr '\0' '\n' </proc/3036/environ | grep DBUS)

Yay, what a nice, user-friendly, easy to set up piece of software we have here.
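If you would rather script the environ trick than retype it, the parsing step can be sketched in Python. This is just an illustration of the NUL-separated format of /proc/<pid>/environ: dbus_vars() is a made-up helper, the HOME entry in the sample is invented, and unlike the grep it only matches variables whose name starts with DBUS.

```python
def dbus_vars(environ_blob):
    """Pick the DBUS* variables out of a NUL-separated environ blob."""
    found = {}
    for entry in environ_blob.split(b"\0"):
        # Split on the first '=' only; the bus address itself contains '='.
        name, sep, value = entry.partition(b"=")
        if sep and name.startswith(b"DBUS"):
            found[name.decode()] = value.decode()
    return found


# Sample blob shaped like /proc/<pid>/environ; the bus address is the one
# from the transcript above, the HOME entry is invented.
sample = (
    b"HOME=/home/chidori\0"
    b"DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-wYzWEttyro,"
    b"guid=28e3a1c77f077a230071a5974be666db\0"
)

for name, value in dbus_vars(sample).items():
    print(f"export {name}='{value}'")
```

For a real process, you would feed it the file contents instead of the sample, e.g. dbus_vars(open("/proc/3036/environ", "rb").read()).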

(BTW, the movie stutters every two minutes or so anyway; another time I feel shiny and optimistic, I will try to figure out if using some compression for the network audio is possible nowadays.)

Categories: linux Tags: audio, dbus, networking, pulseaudio, streaming, suse

Benchmarking string functions

November 6th, 2009

Just in case someone ever needs to benchmark glibc string routines, I hacked together a simple framework for that, strbench.

In SUSE, we carry some ancient AMD-provided patches that replace strlen(), memcmp(), strcmp() and strncmp() on x86_64 with a different implementation. In the last glibc update to 2.11, I had hoped to finally get rid of the AMD patch, but the benchmarks have shown that glibc-2.11 in fact has a quite massive performance regression here…

Categories: linux, software Tags: c, glibc, shell, suse

Make your glibc do Blowfish

September 7th, 2009

Since long, long ago, SUSE glibc has supported a Blowfish crypt() extension – just start your crypts with $2a$ etc. and crypt() will fry them using Blowfish. We base this functionality on a rather ancient OWL patch. I wonder if anyone actually makes use of this feature. ;-)

The trouble is, the OWL patch is pretty dirty and introduces its own wrapper crypt() that proxies between glibc’s MD5/DES crypt() and its Blowfish backend. And it has a lot of extra functionality no one can use, since the appropriate symbols aren’t actually exported anymore. The patch is based on glibc-2.3.x; I assume that back then, exporting them worked differently.

However, glibc-2.7 got support for SHA256/SHA512 and with it a more flexible crypt() implementation, making it quite easy to plug in more crypt() methods. The trouble is, we didn’t upgrade our Blowfish patch, so SHA256/SHA512 was actually blocked out by the wrapper. Jan Engelhardt pointed out the problem, so I reworked the original OWL patch to take advantage of the new infrastructure (but keeping crypt_blowfish.c intact up to turning off BF_ASM).

So, if you want to teach your glibc Blowfish hashing, feel free to use http://pasky.or.cz/~pasky/dev/glibc/crypt_blowfish-1.0-suse.diff :-)

Update: Dmitry V. Levin ported the complete old patch to glibc-2.10.1. I will not make use of this for SUSE, since I think wrapper.c is a rather ugly hack which is not properly integrated into the infrastructure, and it retains potential for future maintenance problems; I don’t see why the new API shouldn’t integrate into the existing code instead of wrapping around it. The new API is required for tcb, but there is no other support for it in SUSE anyway (and no one has missed the API for many, many years).

Categories: linux, software Tags: blowfish, glibc, owl, patch, suse

glibc/pb-stable.git

May 22nd, 2009

glibc is kept in git now, which makes following it much more convenient. Also, it’s much more practical to track bugfix commits with git. glibc releases are frequently in a fairly rough state, so when putting glibc into a distribution, it’s worthwhile for me to accumulate further bugfixes committed shortly after the release.

So, with an idea that other system integrators might find this useful as well, I created a small fork of glibc last night – pb-stable. Its master branch is the same as in glibc.git, but in addition it has a glibc-2.10-branch with many cherry-picked bugfixes committed on master since the release.

I intend to maintain this branch long-term, at least for openSUSE usage (purely as cherry-picks, except when a bug needs fixing that for some reason no longer applies to master); others are welcome to use it as well. I hope it gets upstream as well, I have sent a pull request… Well, we shall see, I have no expectations.

Categories: linux, software Tags: glibc, suse
