Archive

Posts Tagged ‘repo.or.cz’

memtester and Virtual->Physical Address Translation

May 13th, 2010 5 comments

repo.or.cz server started having trouble with randomly corrupted repositories a while ago; short-time memtests were showing nothing and I was reluctant to take it offline for many days. So I found out the neat memtester tool and fired it up.

Sure enough, in some time, a bitflip error popped up – several times on the same memory offset:


Block Sequential : testing 12FAILURE: 0xc0c0c0c0c0c0c0c != 0xc0c0c0c0c0c0c0e at offset 0x0739610a.

Okay! That might hint on a bad memory cell in a DIMM. But which DIMM? Weelll…

We have to figure out which virtual address does the offset correspond with. Then, we have to figure out which physical address would that be. Finally, we have to guess the appropriate DIMM. We will make a lot of assumptions along the way – generally, the mapping can change all the time, pages may be swapped out, etc. – but memtester keeps the single mmap()ed region mlock()ed all the time, the architecture is regular i7, etc. And we don’t have to be 100% sure about the result.

First, keep the memtester running, do not restart it! Let’s assume its pid is 25773. First, we need to look at its memory maps:

# cat /proc/25773/maps 
00400000-00403000 r-xp 00000000 09:00 19511804                    /usr/bin/memtester
00602000-00603000 rw-p 00002000 09:00 19511804                   /usr/bin/memtester
7fea279c9000-7fea279ca000 rw-p 7fea279c9000 00:00 0 
7fea279ca000-7feb601ca000 rw-p 7fea279ca000 00:00 0 
7feb601ca000-7feb60314000 r-xp 00000000 09:00 28713328    /lib/libc-2.7.so
...

We can see pretty much immediately that the 7fea279ca000-7feb601ca000 map is the memory region of the main testing buffer – it’s just huge!

# echo $((0x7fea279ca000-0x7feb601ca000))
-5242880000

Splendid – 5GiB, just as much as we told memtester to check. Now, what is the virtual memory address of the fault? memtester grabs the buffer, splits it in two halves, then fills both halves with identical stuff and then goes through them, comparing. The second address contained the faulty bit set, so the printed offset is within the second buffer; its start is at (0x7feb601ca000-0x7fea279ca000)/2 + 0x7fea279ca000 = 0x7feac3dca000, add up the offset 0x0739610a… but beware! The offset is in ulongs, which means we need to multiple it by 8. and we get 0x7feacb16010a.

The 0x7feacb16010a should be the faulty address!

*** EDIT *** – this turns out to be wrong! There are two reasons:

  1. The offset is actually in multiples of sizeof(unsigned long), which is 8 on 64-bit archs; multiply the number by 8.
  2. There is some other problem – some slight shift. In my experiments, 0x7f1bd4f71a40 would be the real address but the computed one came out as 0x7f1bd4f72238 – not sure what causes that.

Therefore, the best solution is to apt-source memtester and tweak tests.c to print out also the actual pointers of the fault.

*** END EDIT *** – the rest should work as described again.

Ok. How to get the physical address? New Linux kernels have a nifty invention – /proc/.../pagemap – that provides access to per-page mapping information for the whole process virtual space; see Documentation/vm/pagemap.txt for details. Unfortunately, accessing it is not so simple, but I hacked together a simple Perl script:

#!/usr/bin/perl
# (c) Petr Baudis 2010 <pasky@suse.cz>
# Public domain.
# This won't work on 32-bit systems, sorry.
 
use warnings;
use strict;
use POSIX;
 
our ($pid, $vaddr);
 
($pid, $vaddr) = @ARGV;
 
open my $pm, "/proc/$pid/pagemap" or die "pagemap: $!";
binmode $pm;
 
my $pagesize = POSIX::sysconf(&POSIX::_SC_PAGESIZE);
my $ofs = int((hex $vaddr) / $pagesize) * 8;
seek $pm, $ofs, 0 or die "seek $vaddr ($pagesize * $ofs): $!";
 
read $pm, my $b, 8 or die "read $vaddr ($pagesize * $ofs): $!";
my $n = unpack "q", $b;
 
# Bits 0-54  page frame number (PFN) if present
# Bits 0-4   swap type if swapped
# Bits 5-54  swap offset if swapped
# Bits 55-60 page shift (page size = 1<<page shift)
# Bit  61    reserved for future use
# Bit  62    page swapped
# Bit  63    page present
 
my $page_present = ! ! ($n & (1 << 63));
my $page_swapped = ! ! ($n & (1 << 62));
my $page_size = 1 << (($n & ((1 << 61) - 1)) >> 55);
 
if (!$page_present and !$page_swapped) {
        printf "[%s: %d * %d] %x: not present\n", $vaddr, $pagesize, $ofs, $n;
        exit;
}
 
if (!$page_swapped) {
        my $pfn = ($n & ((1 << 55) - 1));
        printf "[%s: %d * %d] %x: present %d, size %d, PFN %x\n", $vaddr, $pagesize, $ofs, $n, $page_present, $page_size, $pfn;
} else {
        my $swapofs = (($n & ((1 << 55) - 1)) >> 5);
        my $swaptype = ($n & ((1 << 5) - 1));
        printf "[%s: %d * %d] %x: present %d, size %d, swap type %x, swap offset %x\n", $vaddr, $pagesize, $ofs, $n, $page_present, $page_size, $swaptype, $swapofs;
}

Fire this up, and you should see something like:

# perl ~pasky/pagemaplist.pl 25773 0x7feacb16010a
Hexadecimal number > 0xffffffff non-portable at /home/pasky/pagemaplist.pl line 18.
[0x7feacb16010a: 4096 * 274700012288] 860000000002adf7: present 1, size 4096, PFN 2adf7

PFN stands for Page Frame Number. To get physical address on a PC
from this, just multiply it by page size. The physical address should be 0x2adf7000.

So, which DIMM do we have to replace? This is the most problematic stpe, it does not seem that the mapping would be available anywhere.
Let’s look at the physical mappings available in total:

# dmesg | head -n 20
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 2.6.26-2-amd64 (Debian 2.6.26-21lenny4) (dannf@debian.org) (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 SMP Tue Mar 9 22:29:32 UTC 2010
[    0.000000] Command line: root=/dev/md0 ro quiet
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
[    0.000000]  BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000bf780000 (usable)
[    0.000000]  BIOS-e820: 00000000bf78e000 - 00000000bf790000 type 9
[    0.000000]  BIOS-e820: 00000000bf790000 - 00000000bf79e000 (ACPI data)
[    0.000000]  BIOS-e820: 00000000bf79e000 - 00000000bf7d0000 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000bf7d0000 - 00000000bf7e0000 (reserved)
[    0.000000]  BIOS-e820: 00000000bf7ec000 - 00000000c0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
[    0.000000]  BIOS-e820: 00000000ffc00000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 0000000200000000 (usable)
[    0.000000] Entering add_active_range(0, 0, 159) 0 entries of 3200 used
...

There is 7GB available on the machine and that can be easily found to correspond to the 0000000000100000 - 00000000bf780000 3GiB and 0000000100000000 - 0000000200000000 4GiB ranges. Our physical address is very low in the range. I guess the best we could do is assume that the DIMMs provide mappings in the same order they are in the slot, and replace the first one…

Or, you could use dmidecode, but only if your BIOS is not broken like mine and actually reports the start/stop addresses. :(

[ANNOUNCE] Girocco hosting infrastructure v1.0

November 14th, 2009 No comments

I would like to announce Girocco-1.0, the first stable release of a universal Git project hosting infrastructure. You can get it at

http://repo.or.cz/w/girocco.git

You guessed right, Girocco is the software repo.or.cz runs at; however, compared to the past, it’s much cleaned up, cleanly packaged for easy re-deployment and fully configurable, thanks to sponsorship of Novartis and Novell. (Apologies for switching repo.or.cz to it and
releasing it one year later than it should’ve been done.)

Thus, you should be able to easily deploy a local Girocco instance at your site and configure it to do only what you want. Girocco allows both full push-based project hosting, or just mirroring and showing existing projects (either local or remote) – i.e. you can also use it just to collect scattered repositories of your developers and present them all at single place. The push-based hosting offers currently two models, chroot hosting and hooks-based permissions (for trusted environments) – it should be easy to add other models.

This way, Girocco might present an interesting alternative to Gitorious for people who prefer mirroring over full project hosting, prefer then rawer gitweb interface ;-), like the repo.or.cz simple forking model or want to make use of the GitHub-like flexible commit notifications mechanism.

(If you are actually going to deploy Girocco somewhere, it’s good to tell me so that I can take it into account to e.g. provide upgrade paths for future Girocco changes.)

If you are used to repo.or.cz, what’s new in Girocco at the user’s end?

  • Friendlier homepage.
  • Friendlier project/user management interface.
  • Friendlier mirror cloning process (you can see the progress in your browser).
  • Support for automatic Git mirroring of SVN repositories. (Only very simple, stdlayout only.)
  • Support for new-commit notifications, both for push and mirror project modes:
    • Get a notification to specified mail address(es)
    • Get a notification by receiving a POST HTTP request with GitHub-compatible JSON payload
    • Have repo.or.cz send commit notifications to the CIA service
  • Much easier to contribute a change to Girocco if you are missing any feature on repo.or.cz!

Enjoy! (Original announcement.)

Categories: linux, software Tags: , ,