Posts Tagged ‘memtester’

memtester and Virtual->Physical Address Translation

May 13th, 2010 5 comments server started having trouble with randomly corrupted repositories a while ago; short-time memtests were showing nothing and I was reluctant to take it offline for many days. So I found out the neat memtester tool and fired it up.

Sure enough, in some time, a bitflip error popped up – several times on the same memory offset:

Block Sequential : testing 12FAILURE: 0xc0c0c0c0c0c0c0c != 0xc0c0c0c0c0c0c0e at offset 0x0739610a.

Okay! That might hint on a bad memory cell in a DIMM. But which DIMM? Weelll…

We have to figure out which virtual address does the offset correspond with. Then, we have to figure out which physical address would that be. Finally, we have to guess the appropriate DIMM. We will make a lot of assumptions along the way – generally, the mapping can change all the time, pages may be swapped out, etc. – but memtester keeps the single mmap()ed region mlock()ed all the time, the architecture is regular i7, etc. And we don’t have to be 100% sure about the result.

First, keep the memtester running, do not restart it! Let’s assume its pid is 25773. First, we need to look at its memory maps:

# cat /proc/25773/maps 
00400000-00403000 r-xp 00000000 09:00 19511804                    /usr/bin/memtester
00602000-00603000 rw-p 00002000 09:00 19511804                   /usr/bin/memtester
7fea279c9000-7fea279ca000 rw-p 7fea279c9000 00:00 0 
7fea279ca000-7feb601ca000 rw-p 7fea279ca000 00:00 0 
7feb601ca000-7feb60314000 r-xp 00000000 09:00 28713328    /lib/

We can see pretty much immediately that the 7fea279ca000-7feb601ca000 map is the memory region of the main testing buffer – it’s just huge!

# echo $((0x7fea279ca000-0x7feb601ca000))

Splendid – 5GiB, just as much as we told memtester to check. Now, what is the virtual memory address of the fault? memtester grabs the buffer, splits it in two halves, then fills both halves with identical stuff and then goes through them, comparing. The second address contained the faulty bit set, so the printed offset is within the second buffer; its start is at (0x7feb601ca000-0x7fea279ca000)/2 + 0x7fea279ca000 = 0x7feac3dca000, add up the offset 0x0739610a… but beware! The offset is in ulongs, which means we need to multiple it by 8. and we get 0x7feacb16010a.

The 0x7feacb16010a should be the faulty address!

*** EDIT *** – this turns out to be wrong! There are two reasons:

  1. The offset is actually in multiples of sizeof(unsigned long), which is 8 on 64-bit archs; multiply the number by 8.
  2. There is some other problem – some slight shift. In my experiments, 0x7f1bd4f71a40 would be the real address but the computed one came out as 0x7f1bd4f72238 – not sure what causes that.

Therefore, the best solution is to apt-source memtester and tweak tests.c to print out also the actual pointers of the fault.

*** END EDIT *** – the rest should work as described again.

Ok. How to get the physical address? New Linux kernels have a nifty invention – /proc/.../pagemap – that provides access to per-page mapping information for the whole process virtual space; see Documentation/vm/pagemap.txt for details. Unfortunately, accessing it is not so simple, but I hacked together a simple Perl script:

# (c) Petr Baudis 2010 <>
# Public domain.
# This won't work on 32-bit systems, sorry.
use warnings;
use strict;
use POSIX;
our ($pid, $vaddr);
($pid, $vaddr) = @ARGV;
open my $pm, "/proc/$pid/pagemap" or die "pagemap: $!";
binmode $pm;
my $pagesize = POSIX::sysconf(&POSIX::_SC_PAGESIZE);
my $ofs = int((hex $vaddr) / $pagesize) * 8;
seek $pm, $ofs, 0 or die "seek $vaddr ($pagesize * $ofs): $!";
read $pm, my $b, 8 or die "read $vaddr ($pagesize * $ofs): $!";
my $n = unpack "q", $b;
# Bits 0-54  page frame number (PFN) if present
# Bits 0-4   swap type if swapped
# Bits 5-54  swap offset if swapped
# Bits 55-60 page shift (page size = 1<<page shift)
# Bit  61    reserved for future use
# Bit  62    page swapped
# Bit  63    page present
my $page_present = ! ! ($n & (1 << 63));
my $page_swapped = ! ! ($n & (1 << 62));
my $page_size = 1 << (($n & ((1 << 61) - 1)) >> 55);
if (!$page_present and !$page_swapped) {
        printf "[%s: %d * %d] %x: not present\n", $vaddr, $pagesize, $ofs, $n;
if (!$page_swapped) {
        my $pfn = ($n & ((1 << 55) - 1));
        printf "[%s: %d * %d] %x: present %d, size %d, PFN %x\n", $vaddr, $pagesize, $ofs, $n, $page_present, $page_size, $pfn;
} else {
        my $swapofs = (($n & ((1 << 55) - 1)) >> 5);
        my $swaptype = ($n & ((1 << 5) - 1));
        printf "[%s: %d * %d] %x: present %d, size %d, swap type %x, swap offset %x\n", $vaddr, $pagesize, $ofs, $n, $page_present, $page_size, $swaptype, $swapofs;

Fire this up, and you should see something like:

# perl ~pasky/ 25773 0x7feacb16010a
Hexadecimal number > 0xffffffff non-portable at /home/pasky/ line 18.
[0x7feacb16010a: 4096 * 274700012288] 860000000002adf7: present 1, size 4096, PFN 2adf7

PFN stands for Page Frame Number. To get physical address on a PC
from this, just multiply it by page size. The physical address should be 0x2adf7000.

So, which DIMM do we have to replace? This is the most problematic stpe, it does not seem that the mapping would be available anywhere.
Let’s look at the physical mappings available in total:

# dmesg | head -n 20
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 2.6.26-2-amd64 (Debian 2.6.26-21lenny4) ( (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 SMP Tue Mar 9 22:29:32 UTC 2010
[    0.000000] Command line: root=/dev/md0 ro quiet
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
[    0.000000]  BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
[    0.000000]  BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000bf780000 (usable)
[    0.000000]  BIOS-e820: 00000000bf78e000 - 00000000bf790000 type 9
[    0.000000]  BIOS-e820: 00000000bf790000 - 00000000bf79e000 (ACPI data)
[    0.000000]  BIOS-e820: 00000000bf79e000 - 00000000bf7d0000 (ACPI NVS)
[    0.000000]  BIOS-e820: 00000000bf7d0000 - 00000000bf7e0000 (reserved)
[    0.000000]  BIOS-e820: 00000000bf7ec000 - 00000000c0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
[    0.000000]  BIOS-e820: 00000000ffc00000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 0000000200000000 (usable)
[    0.000000] Entering add_active_range(0, 0, 159) 0 entries of 3200 used

There is 7GB available on the machine and that can be easily found to correspond to the 0000000000100000 - 00000000bf780000 3GiB and 0000000100000000 - 0000000200000000 4GiB ranges. Our physical address is very low in the range. I guess the best we could do is assume that the DIMMs provide mappings in the same order they are in the slot, and replace the first one…

Or, you could use dmidecode, but only if your BIOS is not broken like mine and actually reports the start/stop addresses. :(