Pasky’s Log

I use 6to4 – why are my applications still preferring IPv4?

July 4th, 2011 1 comment

I found out about this curious behavior almost a month ago, during World IPv6 Day. It surprised me, even though it really shouldn’t have, given that I had been fixing bugs in the glibc implementation of this very mechanism only a few months earlier. ;-)

If you no longer bother with tunnel brokers and use 6to4 for your IPv6 connectivity like I do, you might have noticed that, disappointingly, your applications still prefer IPv4. You can use getent ahosts www.brmlab.cz (or any other host) to see the list of addresses in the order in which your applications will most likely try them by default.

The key mechanism at play here is the RFC 3484 address selection performed by getaddrinfo(3); on a GNU/Linux system, it is described in (and configurable through) /etc/gai.conf. Its aim is to choose the most suitable pair of source and destination addresses. This is where it is decided whether to prefer IPv4 or IPv6, that localhost should be reached over loopback whenever possible, or that link-local destinations should be reached from link-local source addresses.

When choosing a destination address, each candidate is assigned a label and a precedence. First, destination addresses whose label matches the label of their “best” source address are preferred. Among these candidates, the address with the highest precedence is picked.

You can read up on the full details in the RFC. In a sense, the label differentiates between multiple transports; IPv4, normal IPv6 (2001::/16, or rather ::/0 minus a lot of exceptions), 6to4, link-local and localhost are all such separate transports. For example, this mechanism makes sure that IPv4 is preferred to normal IPv6 when we have an IPv4 address but only a link-local IPv6 address. The important point is that 6to4 is treated as a transport of its own. If a system has both normal IPv6 and 6to4 configured, normal IPv6 is used for normal IPv6 destinations while 6to4 is used for 6to4 destinations. The side effect is that if only IPv4 and 6to4 addresses are available, IPv4 destinations will be preferred to normal IPv6 destinations.

I’m not sure about the exact motivation for this, but it does make sense. It reduces the load on the relay servers that route between the 6to4 realm and the native IPv6 internet; if 6to4 addresses talk to each other, they connect over IPv4 directly, without the need for relay servers. Also, the relay servers can sometimes be topologically far away on the IPv4 internet, slowing down IPv6 communication. And while IPv6 is cool, since your traffic is going over IPv4 part of the way anyway (to the nearest 6to4 relay), it makes no sense to artificially switch to IPv6 for the rest of the trip if you can just use IPv4 all the way.

But, if you have no native IPv6 and want to prefer 6to4 to IPv4 communication – since IPv6 is cool – you can tweak your /etc/gai.conf:

#label ::/0          1
#label 2002::/16     2

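Just uncomment the 2002::/16 line and change its label from 2 to 1; it will then have the same label as the “normal” IPv6 internet. Note that, according to gai.conf(5), the built-in default table is no longer used once any label line is present in the file, so uncomment the remaining default label lines as well. The resulting behavior will be suboptimal in some cases and you shouldn’t deploy this thoughtlessly, but on your personal workstation, it is a way to get that warm “I’m using IPv6 – somewhat” feeling.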

Categories: linux Tags: glibc, ipv6

ld.so Scopes

March 10th, 2011 No comments

Recently, I have spent quite a bit of my time debugging an evil ld.so bug involving mishandling of scopes, and I noticed a distinct lack of documentation of any internal ld.so data structures. So again, for the benefit of the googlers, here comes an intro that could have saved me quite a bit of time spent poking at the code.

Of course, the dynamic linker features a wide variety of fun hacks. The most interesting mechanism is probably how lazy relocation is performed, but things like that have already been described plenty of times before. The question we shall look into is which data structures are used when a symbol needs to be looked up after the linker has taken control. There are two important internal concepts of ld.so related to this – the link_map and the scope. You can see the data structures in include/link.h.

The struct link_map describes a single loaded object; it may be ld.so, the main program, libc, or any other shared object loaded afterwards, during startup or later. It has many members, such as its name, its neighbours in the global linked list of all objects, or its state. But the most interesting attribute is its scope.

The scope describes which libraries should be searched for symbol lookups occurring within the scope owner. (By the way, given that the lookup scope may differ by caller, implementing dlsym() is not that trivial.) It is further divided into scope elements (struct r_scope_elem) – a single scope element basically describes a single search list of libraries, and the scope (link_map.l_scope is the scope used for symbol lookup) is a list of such scope elements.

To reiterate, a symbol lookup scope is a list of lists! Then, when looking up a symbol, the linker walks the lists in the order they are listed in the scope. But what really are the scope elements? There are two usual kinds:

The “global scope” – all libraries (ahem, link_maps) that have been requested to be loaded by the main program (what ldd on the binary file of the main program would print out, plus dlopen()ed stuff).
The “local scope” – DT_NEEDED library dependencies of the current link_map (what ldd on the binary file of the library would print out, plus dlopen()ed stuff).

The global scope is shared between all link_maps (in the current namespace), while the local scope is owned by a particular library. (FIXME) If a library has a local scope element in its scope, it adds itself to that scope. E.g. assume libA dlopen()s libB (with RTLD_LOCAL) – libB will get and own a fresh local scope element, and all libraries loaded by libB will inherit that local scope element and add themselves to it.

There are then four common situations:

The main program has only a single scope element, the global scope. (At least I would expect so, I have not verified this.)
A library has been loaded with RTLD_LOCAL (the default case). Its link_map then has two scope elements: first comes the global scope, then the local scope.
A library has been loaded with RTLD_LOCAL | RTLD_DEEPBIND. In that case, the link_map again has two scope elements, but the order is switched – the local scope comes first.
A library has been loaded with RTLD_GLOBAL. The link_map lists only the global scope.

(Another concept is namespace; each has its own id and linked list of link_maps, but usually there are just two, one for the ld.so and another for the application. Unless you are calling dlmopen() explicitly or using the LD_AUDIT interface, you can usually assume there is only a single namespace that matters.)

Just for fun – the bug I have been hunting was caused by ld.so not handling local scopes quite properly. Normally, when a library opened with RTLD_LOCAL is unloaded, all its local scope members are unloaded too. However, such a member could be flagged as RTLD_NODELETE, and in that case it would stay around. The problem is that the code did not expect this: it would remove the local scope owner, and the local scope would go along with it. This means the nodelete library’s dependencies would disappear from its local scope, and the next time it got called (e.g. within its static destructor), trying to resolve such a symbol would cause an “unresolved symbol” fatal error.

Categories: linux, software Tags: glibc, ld.so, suse

gethostbyname3 & gethostbyname4

April 22nd, 2010 1 comment

I was queried recently about the semi-mysterious and undocumented gethostbyname3_r() and gethostbyname4_r() functions within glibc, and noticed that there’s little googleable information available on the topic.

A quick summary: These functions are getaddrinfo() backends – they are not available as public functions for users (though you can use tricks similar to those of nscd if you really want to get at them) and they are further extensions of the gethostbyname2_r() function. If you want to use their functionality, just use the appropriate getaddrinfo() incantations.
The functions are provided by the appropriate NSS backends and not all backends might provide them.

gethostbyname3_r() additionally retrieves the TTL and the canonical hostname of the host in a single step (another hidden interface, getcanonname_r(), also retrieves the canonical name).

gethostbyname4_r() is much more interesting. First, it does not return hostents but a more extended gaih_addrtuple format (see <nss.h>) that is more akin to struct addrinfo. Even more importantly, the lookup is not limited to a particular address family but returns information for all available AFs.

This means that an AF_UNSPEC lookup does not need to call gethostbyname3_r() for AF_INET and then for AF_INET6; instead, the gethostbyname4_r() NSS backend can do the lookup for both at once – in the case of /etc/hosts the file is scanned only once, while in the case of DNS both requests are dispatched in parallel instead of in sequence. (…which can lead to the infamous strange problems with some cheap DSL routers that just ignore one of the queries, so that all lookups start to hang – this is why Debian has (had?) disabled gethostbyname4_r().)

In case you need to use the functions for some reason, glibc:nscd/ai_cache.c is a simpler usage example than the getaddrinfo() source.

Categories: linux, software Tags: dns, glibc, nss

Benchmarking string functions

November 6th, 2009 1 comment

Just in case someone ever needs to benchmark glibc string routines, I hacked together a simple framework for that, strbench.

In SUSE, we carry some ancient AMD-provided patches that replace strlen(), memcmp(), strcmp() and strncmp() on x86_64 with a different implementation. With the last glibc update to 2.11, I had hoped to finally get rid of the AMD patch, but the benchmarks have shown that glibc-2.11 in fact has a quite massive performance regression here…

Categories: linux, software Tags: c, glibc, shell, suse

Make your glibc do Blowfish

September 7th, 2009 No comments

For a long, long time, SUSE glibc has supported a Blowfish crypt() extension – just start your crypts with $2a$ etc. and crypt() will fry them using Blowfish. We base this functionality on a rather ancient OWL patch. I wonder if anyone actually makes use of this feature. ;-)

The trouble is, the OWL patch is pretty dirty and introduces its own wrapper crypt() that proxies between glibc’s MD5/DES crypt() and its Blowfish backend. And it has a lot of extra functionality no one can use, since the appropriate symbols aren’t actually exported anymore. The patch is based on glibc-2.3.x; I assume exporting them worked differently back then.

However, glibc-2.7 got support for SHA256/SHA512 and, with it, a more flexible crypt() implementation that makes it quite easy to plug in more crypt() methods. The trouble is, we didn’t upgrade our Blowfish patch, so SHA256/SHA512 was actually blocked out by the wrapper. Jan Engelhardt pointed out the problem, so I reworked the original OWL patch to take advantage of the new infrastructure (while keeping crypt_blowfish.c intact except for turning off BF_ASM).

So, if you want to teach your glibc Blowfish hashing, feel free to use http://pasky.or.cz/~pasky/dev/glibc/crypt_blowfish-1.0-suse.diff :-)

Update: Dmitry V. Levin ported the complete old patch to glibc-2.10.1. I will not make use of this for SUSE, since I think wrapper.c is a rather ugly hack that is not properly integrated into the infrastructure and retains potential for future maintenance problems; I don’t see why the new API shouldn’t integrate into the existing code instead of wrapping around it. The new API is required for tcb, but there is no other support for it in SUSE anyway (and no one has missed the API for many, many years).

Categories: linux, software Tags: blowfish, glibc, owl, patch, suse

glibc development philosophy

May 22nd, 2009 No comments

I just accidentally stumbled over an old blog post of Ulrich’s about minorities [in software project community] – not that I would concur with many of the points, but it gives good insight into some of the reasoning behind Ulrich’s decisions.

Categories: linux, software Tags: glibc

glibc/pb-stable.git

May 22nd, 2009 No comments

glibc is kept in git now, which makes following it much more convenient. Also, tracking bugfix commits is much more practical with git. glibc releases are frequently in a fairly rough state, so when putting glibc into a distribution, it is worthwhile for me to accumulate further bugfixes committed shortly after the release.

So, with an idea that other system integrators might find this useful as well, I created a small fork of glibc last night – pb-stable. Its master branch is the same as in glibc.git, but in addition it has a glibc-2.10-branch with many cherry-picked bugfixes committed on master since the release.

I intend to maintain this branch long-term, at least for openSUSE usage (purely as cherry-picks, except when a bug needs fixing that for some reason no longer applies to master); others are welcome to use it as well. I hope it gets upstream as well, I have sent a pull request… Well, we shall see, I have no expectations.

Categories: linux, software Tags: glibc, suse
