Realtime Signal Analysis in Perl

September 24th, 2011 No comments

About a month ago, we were working on the Fluffy Ball project – a computer input device that can react to fondling and punching. Thanks to a nice idea on the brmlab mailing list, we use a microphone and process the noise coming from the ball’s scratchy stuffing and an embedded jingle. The sounds from the outside are almost entirely dampened by the stuffing and for a human, the noise of fondling and punching is easily distinguishable.

Frequency spectrum, for our purposes, is just an array indexed by frequency, storing the amplitude of each frequency (in some range). A common variation is the power spectrum that describes the power of each frequency, i.e. the amplitude squared. Frequency spectrum is obtained by splitting the input signal to fixed-size samples and performing Discrete-Time Fourier Transform.

It turns out that trivial spectrum-based rules can be used to achieve reasonably high detection accuracy for a computer too (especially when the user is allowed to “train” her input based on feedback); I had big plans to use ANN and all the nifty things I have learned in our AI classes, but it turned out to be simply an overkill. The input signal is transformed to a frequency spectrum (see box) using real discrete FFT.

So, we have the audio signal coming in from a regular mic device and need to process it further. I chose Perl for quick prototyping and I have assumed that I would find some pre-made scaffolding for this ready. But it turns out that noone really published a simple example of even just showing a real-time frequency spectrum. So, here you go! :-)

First, we need some reasonable way to continuously display the spectrum. Most GUI paradigms are event-driven, but events are usually user interaction pieces and while it would be possible to incorporate continuous data-based updates in this model, it feels quite backwards. So we use a trick:

use warnings;
use strict;
 
init_dsp(); init_fft();
 
use Tk;
our $mw = MainWindow->new;
$mw->after(1, \&ticks); # after 1ms, give control back
MainLoop;
 
sub ticks {
	while (1) {
		render_signal(process_signal(read_dsp()));
		$mw->idletasks();
	}
}

This circumvents the event-driven architecture of Tk and instead puts our main loop in control, processing any GUI events when it’s good time. For more complex programs, this is a bad idea and it will lead to poorly maintaineable code, but when writing simple tools, you should not succumb to grand frameworks and let your code overgrow you.

Okay, how to grab audio input signal in Perl? Unfortunately, there are not really any handy modules you could use thoughtlessly. Audio::DSP is a possibility, but using it is clumsy, especially in the current world of ALSA as you have to rely on the imperfect aoss wrapper. A simple alternative is to get the raw byte data through a pipeline from the ALSA arecord tool:

our ($devname, $fmt, $bitrate, $wps, $bps, $bufsize, $dsp);
BEGIN {
	$devname = "default"; # or e.g. hw:1,0 for an additional USB soundcard input
	$fmt = 16;            # sample format (bits per sample)
	$bitrate = 16384;     # sample rate (number of samples per second)
	$wps = 8;             # FFT windows per second (rate of FFT updates)
	$bps = ($fmt * $bitrate) / 8; # bytes per second
	$bufsize = $bps / $wps;   # window buffer size in bytes
}
 
sub init_dsp {
	open ($dsp, '-|', 'arecord', '-D', $devname, '-t', 'raw',
		'-r', $bitrate, '-f', 'S'.$fmt) or die "arecord: $!";
	use IO::Handle;
	$dsp->autoflush(1);
}
 
sub read_dsp {
	my $w;
	read $dsp, $w, $bufsize or die "read: $!";
	return $w;
}

read_dsp will return one signal window per call, the window being a binary blob consisting of one two-byte word per sample. We want to magically convert this to a spectrogram.

Audio::Analyze is again the simple way to get a signal spectrum. If you are after analyzing a pure audio signal, you probably want to use it since it can easily filter the signal based on relative human perception of frequencies etc. But for us, it is inconvenient to feed it data through a pipe and we will directly use Math::FFT. It will still handle all the gory math for our case (and we care about the actual noise, not the way people would hear it).

use Math::FFT;
use List::Util qw(sum);
 
our @freqs;
sub init_fft {
	my $dft_size = $bitrate / $wps;
	for (my $i = 0; $i < $dft_size / 2; $i++) {
		$freqs[$i] = $i / $dft_size * $bitrate;
	}
}
 
sub process_signal {
	my ($bytes) = @_;
 
	# Convert raw bytes to a list of numerical values.
	$fmt == 16 or die "unsupported $fmt bits per sample\n";
	my @samples;
	while (length($bytes) > 0) {
		my $sample = unpack('s<', substr($bytes, 0, 2, ''));
		push(@samples, $sample);
	}
 
	# Perform RDFT
	my $fft = Math::FFT->new(\@samples);
	my $coeff = $fft->rdft;
 
	# The output are complex numbers describing the exactly phased
	# sin/cos waves. By taking an abs value of the complex numbers,
	# we just measure the amplitude of a wave for each frequency.
	my @mag;
	$mag[0] = sqrt($coeff->[0]**2);
	for (my $k = 1; $k < @$coeff / 2; $k++) {
		$mag[$k] = sqrt(($coeff->[$k * 2] ** 2)
		                + ($coeff->[$k * 2 + 1] ** 2));
	}
 
	# Rescale to 0..1. Many fancy strategies are possible, this is
	# extremely silly.
	my $avgmag = sum (@mag) / @mag;
	@mag = map { $_ / $avgmag * 0.3 } @mag;
	return @mag;
}

Not much to add besides the inline comments. The input of the process_signal function is a raw byte stream, the output is a list of amplitudes; @freqs maps the list indices to the actual Hz frequencies. The normalization to [0,1] interval shown here (pitching the mean at 0.3) is extremely naive, again there are many possible strategies. Also, you certainly want to use a window function etc. in more serious applications.

Now, for the visualization. We have chosen Tk for our GUI (it looks ugly, but it is reasonably easy to use despite its Tcl antics). We will use its Canvas object where we can draw freely, and just plot a line for each frequency:

our $canvas;
sub render_signal {
	# Display parameters, tweak to taste:
	my $rows = 2;
	my $hspace = 20;
	my $height = 150;
	my $vspace = 20;
 
	my @spectrum = @_;
	my $row_freqn = @spectrum / $rows;
 
	$canvas;
	unless ($canvas) {
		$canvas = $mw->Canvas(
			-width => $row_freqn + $hspace * 2,
			-height => $height * $rows + $vspace * ($rows + 1));
		$canvas->pack;
	}
	$canvas->delete('all');
 
	for my $y (0..($rows-1)) {
		for my $x (0..($row_freqn-1)) {
			my $hb = ($height + $vspace) * ($y + 1);
			my $i = $row_freqn * $y + $x;
 
			# Draw line:
			my $ampl = $spectrum[$i];
			$ampl <= 1.0 or $ampl = 1.0;
			my $bar = $height * $ampl;
			$canvas->createLine($x + $hspace, $hb,
			                    $x + $hspace, $hb - $bar);
 
			# Draw label:
			if (!($x % ($row_freqn/4))) {
				$canvas->createLine($x + $hspace, $hb + 0,
				                    $x + $hspace, $hb + 5,
				                    -fill => 'blue');
				$canvas->createText($x + $hspace, $hb + 15,
				                    -fill => 'blue',
				                    -font => 'small',
				                    -text => $freqs[$i]);
			}
		}
	}
 
	$mw->update();
}

This suffices for a naive visualization, you can easily tweak it to do thresholding and whatever else you desire. I have found that on some of my computers, the X protocol is pushed to its limits by repeatedly drawing a large amount of lines, and sometimes the spectrum will start to lag behind the signal; either show wider bars averaging together multiple frequencies, or use something other than a Canvas object – raw pixmap transfer would likely be better than such a large amount of line drawing operations.

For a serious signal analysis work, you will also want a spectrogram – a time-based plot of amplitude of various frequencies.

To get a working script skeleton, simply piece the code snippets together (fb-simple.pl). See fb.pl for the real fluffy ball script. It is much uglier, but it maintains sample averages over longer time windows (essential for more complex signal analysis), it has simple sample recording capabilities, and an example of naive threshold-based classifier.

Categories: software Tags:

Master Thesis: Information Sharing in MCTS

August 17th, 2011 1 comment

Just a quick note – at the beginning of August, I have submitted my master thesis, titled Information Sharing in MCTS. MCTS means Monte Carlo Tree Search and it is a powerful technique for finding moves in games with large state space and difficult evaluation functions, such as Go.

The thesis presents the modern Computer Go techniques and open problems, my (not too weak) Go-playing program Pachi and some modest improvements of the MCTS algorithms. It might make a good introduction to state-of-art Computer Go.

Categories: life, software Tags: , , , ,

Bitcoin and Anonymous Transactions

August 13th, 2011 2 comments

Previous: Bitcoin Transaction Speed vs. Double-Spending

Note: I do not endorse use of Bitcoin for any illegal activities. Like many, I regret the neccessary tradeoff between freedom, privacy and legitimate legal enforcement.

One of the much touted advantages of Bitcoin is its anonymity. The full record of all transactions is by the very nature of Bitcoin completely public. However, transactions move coins between a set of incoming addresses and a set of outgoing addresses, and the addresses are not tied to any particular entity; an address may belong to anyone and Bitcoin includes no way to tell.

The problem, of course, is that there might be ways to tell outside of Bitcoin, especially if you are legal authority (or good at finding security holes; or a Google user). Let’s say someone gives a donation (from address X) to an illegal free speech organization in Burma, and you want to figure out if it was a Burmese resident – and his address, in that case. The organization uses anonymous donation-specific addresses dnum. It wants to purchase a stock of paper for leaflets delivered to a conspirational address, without tipping off the secret police.

Let’s assume the secret police captured a computer of another supporter of dissidents. They retrieved the source address used for donations (address C) and now they look at where the coins went. A complicated bead net of addresses that merge and split coins among each other unrolls. The trouble is if it is possible to identify some of the addresses in the net. By government order, all incoming payments to the paper store are transferred from order-specific address p? to a single police-registered address P. The first order was paid at address p0, the second would be paid at p1.

The police can now clearly see the coins from address C, transferred to donation address d0, have been used to pay for paper and they will inquire in the store about customer of the address p0. But they can also see that other money have been used for this purchase as well. The trail goes through addresses X, xb, xa, xx, all unknown to police, but arrives to xx from M, the main coin store address of the local Bitcoin exchange. The authorities inquire at the exchange and get bank account number associated with the outgoing payment. The bank happily provides the address and the police is on its way. Does the bank account owner also own the address xx? And what about xa, xb and X, are these her addresses (used for keeping order in the books or to muddle the tracks) or someone else’s? Whose and why did the money go there? The case is not proven yet, but a person is already questioned at the secret police office and in a totalitarian regime, the burden of proof is effectively on her. The paper paid for at p1 will never arrive, but several black cars will.

(The xx-xa-xb-X split can be improved by sending parts of the amount to different addresses, then sending coins between them, and so on, but AFAIK this does not really help anything much as the coins are still being passed within a closed subgraph. Also, the dissident organization might try to avoid mixing independent sources in payments, but this may be difficult if you live from small donations.)


So, all is lost and hopeless, back to good old cash? Well, not necessarily. There are various approaches to combat the problem.

Solve part of it outside the box. Launder coins abroad. Spend them on legitimate-looking puppet causes while staying sure they either return to you or get sent the right way. Repeat and rinse several times. Preferably include real people with googleable Bitcoin addresses early in this chain to completely dissociate yourself from the coins in the public record. Of course, you need some real-world cooperation and friends worldwide. Similar approaches like dissociation by re-selling goods may be considered, but they seem very cumbersome, error-prone and likely losing money (or needing large investments and much time). This all could become a lot easier when paying with bitcoins in person becomes commonplace.

An obvious scheme to muddle the tracks is a mixin service (an escrow service might do this as a side-effect). Many people send in coins, they are all included in a single transaction and this transaction has many outgoing addresses for each user sending in money. It will split large pile of possibly suspect money to many small piles that can be used separately, and in the transaction it will be completely unclear which of the many source addresses is the source of the coins. Repeat few times with different sets of co-mixers and you have pretty much perfectly fogged the origins of the money.

It sounds great but it is fraught with many practical issues. Most obviously, you need to trust the service with your money; you send it in, it might arrive back or maybe not (they are scammers, or maybe just have technical trouble or got hacked). It should definitely be set up in some “safe” country. And still, you need to trust it is not logging anything, nor is its ISP, and nor is your country’s Great Firewall (i.e., use tor). Your government in fact secretly running this service would be a masterstroke. And of course, you absolutely need the “many people” part as well, so that you are actually mixing the coins with others. But these “many people” should be mainly honest persons with legal coin targets, there is not that much point in mixing if you are mixing just with crooks. For the same reason, “many honest people” will be reluctant to get into the trouble needlessly and mix with the crooks and you. An escrow service with mixing as a side-effect could alleviate this issue somewhat.

But what about a more elegant way? When making a transaction, you can pay a transaction fee, mainly to ensure speedier inclusion in a block. The transaction fee goes to whoever mines the block that includes the transaction. There is no upper bound to a transaction fee. We will spill coins in the fees.

The idea is similar to a mixin service; but instead of requiring voluntary participation, we use the txfees and the whole block chain as a mixin platform. In exchange, we concede that we cannot have all the money, but only some large fraction (a third? a half?). We will produce dummy transactions that move all the coins to the txfee, then collect part of it back in block generation payouts. This is completely legitimate and automatic; blocks generated by us and by other solo miners (pools are easily identifiable) are indistinguishable. Some of the money will return as clean laundry while the rest goes to support the Bitcoin system.

The practical implementation is simple. First, one needs to choose the ratio. The smaller the ratio, the harder to track the most common destination of the spillage, but the less money you get, obviously. Then, you create many small fee-spilling transactions transactions. (More means better laundered for further use, but take more time to get.) You keep part of them for your own blocks and broadcast the rest for other miners to pick up. You can get very smart about this, scanning the p2p network to broadcast to solo miners preferentially (you will be better masked), do some probabilistic modeling on the blocks you create, and so on. The big catch is being able to produce blocks at sufficient pace. The answer is either making a huge investment and building a massive solo rig, or just renting one! There are good rigs for rent. It is again important to rent abroad and make sure you cover your tracks well when paying rent. Of course, this is more practical for the accepting organization than the individual donator.


This proposal is far from perfect, but it should work and I have not seen it mentioned yet. My main point is that there may be more tricks like this and the anonymity discussion might yet get interesting.

Categories: software Tags: ,

Bitcoin Transaction Speed vs. Double-Spending

August 12th, 2011 1 comment

In the next few days, I will be posting a couple of trivial short thoughts on some frequently debated “issues” of Bitcoins that keep coming up e.g. on #bitcoin and that I reacted to.

The basis of Bitcoin are “blocks”. Sometimes, they are regarded as just a magic device for getting bitcoins by the mysterious activity of mining. But the main purpose of blocks is to build a trustworthy distributed database of bitcoin transactions. The trustworthiness comes from the blocks, which confirm the transactions by a “proof of work” – non-trivial computational effort.

The reason we want this “trustworthiness”? We are tackling one of the major problems of distributed currencies – double spending. You have a coin tied to an address; at the “same moment”, you could send two transactions to two nodes of the network, in one sending it to Alice, in the other sending it to Bob. Which of the transactions becomes accepted by the network? Half of the network might think you send the money to Alice, half that it goes to Bob. The point of the “block chain” is that no conflicting transactions appear in it, and the transaction with the most computing power behind it wins.

Unfortunately, you might need to wait for a block with your transaction in it for anything from five minutes to 1.5 hour, and for your transactions to be trustworthy enough, some followup blocks need to be accepted (the “confirmations”). This is a nuisance if you want to make real-time with bitcoins, of course, be it buying a soda can in vending machine or hefty TV in an electronics shop. It is also sometimes used as an argument against Bitcoin.

But there are two ways to deal with the issue. The first way, even mentioned at the Bitcoin wiki, is just not caring about the problem. If the payment is just few bitcents for groceries, it is just not worth the hassle. Hassle drives away customers and the loss may be greater than an occassional double-spend cheater makes. The modern society (including credit card companies) relies on the fact that people tend to be honest when there is non-infinitesimal risk.

In case of a hefty TV, non-infinitesimal risk may be worth it. But the solution is again no rocket science: a trusted third party. Two schemes may be possible – with a flexible plan, the service would have a contract with the customer and the merchant. At the time of payment, the service will send the money to the merchant and the customer will pay to the service. If money do not arrive, the service will sue the hell of the customer; the merchant gets their money regardless. Thinking about it, this works a lot like credit cards.

With a fixed plan, there is even less hassle for the customer. They just make a deposit to the third party – this is an upper limit to their spending. When they purchase an expensive TV set, the merchant receives guaranteed payment from the third party, while the customer sends the money to the service. If money do not arrive, it gets reimbursed from the deposit. An initial deposit (and thus more trust in the third party) is required, but the customer may remain anonymous and the service can have smaller legal department.

Bitcoin Spending Insurance could be a suitable name for one service like this, but you could have countless, like you have multiple credit card companies and many banks. On the merchant side, another service might aggregate them conveniently, and on the customer side, they may compete with their bussiness plans, bonuses, advertisements and other wretched capitalistic means. ;-)

I.e., this reduces the decentralization of Bitcoin, but so do currency exchanges and other services – it is an optional addon and if the users do not mind the wait, they are free to avoid it. In short, no need to worry about the fact that transactions are not real-time.

Next: Bitcoin and Anonymous Transactions

Categories: software Tags:

I use 6to4 – why are my applications still preferring IPv4?

July 4th, 2011 1 comment

I found out about this curious behavior almost a month ago during the World IPv6 Day. I was surprised about this, even though I really shouldn’t be, given that I was fixing some bugs in the glibc implementation of this mechanism only few months earlier. ;-)

If you are not bothering with tunnel brokers anymore and are using 6to4 for your IPv6 connectivity like me, you might have noticed that your applications still prefer IPv4, disappontingly. You can use getent ahosts www.brmlab.cz (or a different host) to see the list of addresses in the order your applications will most likely try to connect by default.

The key mechanism in play here is the RFC3484 getaddrinfo(3) address selection mechanism; on GNU/Linux system, it is described (and configurable) in /etc/gai.conf. The aim of the mechanism is to choose the most suitable pair of source and destination addresses; this is the place where we can choose whether to prefer IPv4 or IPv6, that if we can talk to localhost, we should do it that way, or to talk to link-local addresses using link-local addresses too.

When choosing a destination address, each is marked by a label and preference. First, if there is a destination address with the same label as its “best” source address, such addresses are preferred. From these candidates, the address with the highest preference is picked.

You can read up on the full details in the RFC. In a sense, the label differentiates between multiple transports; IPv4, normal IPv6 (2001::/16, or rather ::/0 minus a lot of exceptions), 6to4, link-local and localhost are all such separate transports. This mechanism for example makes sure that IPv4 is preferred to normal IPv6 in case we have IPv4 address, but only link-local IPv6 available. And the important point is that 6to4 is differentiated. If a system has both normal IPv6 and 6to4 configured, normal IPv6 is used for normal IPv6 destinations while 6to4 is used for 6to4 destinations. The side effect is that if IPv4 and 6to4 addresses are available, IPv4 will be preferred to IPv6 destinations.

I’m not sure about the exact motivation for this, but it does make sense. It reduces the load on the relay servers that route between 6to4 realm and native IPv6 internet; if 6to4 addresses talk to each other, they connect over IPv4 directly, without need for relay servers. Also, sometimes the relay servers can be topologically far away on the IPv4 internet, slowing down IPv6 communication. And while IPv6 is cool, since your traffic is going over IPv4 part of the way anyway (to the nearest 6to4 relay), it makes no sense to artificially switch to IPv6 for the rest of the trip if you can just use IPv4 all the way.

But, if you have no native IPv6 and want to prefer 6to4 to IPv4 communication – since IPv6 is cool – you can tweak your /etc/gai.conf:

#label ::/0          1
#label 2002::/16     2

Just uncomment the 2002::/16 line and change its label from 2 to 1. Then it will have the same label as the “normal” IPv6 internet. Its behavior will be suboptimal in some cases and you shouldn’t deploy this thoughtlessly, but if you just do this on your personal workstation, it is a way to get the warm “I’m using IPv6 – somewhat” feeling.

Categories: linux Tags: ,

Multiple Kerberos realms on single host

June 9th, 2011 No comments

We are using Kerberos for central authentication at our university department; recently, we have set up a second realm for authentication to other services than UNIX logins (e.g. webmail) – we did not like the idea of reusing the same passwords.

Setting up a second realm has been pretty straightforward, just add the other realm to /etc/krb5.conf, write explicit port number to admin_server and kadmin_port options, add another init script for secondary admin server and… it almost works.

But suddenly, users find themselves unable to change their password within the original realm using passwd or kpasswd, with less than helpful Authentication error: Failed reading application request. And that’s the reason I’m writing up this, since you explicitly need to set the totally undocumented option kpasswd_port to a different value than 464! This is because kadmind provides the kadmin service on TCP port (749 by default), but also chpw service on UDP port (464 by default).

Categories: linux Tags: , ,

Repairing git cherry-pick authorship information

May 27th, 2011 No comments

I spent just my last night going through few months worth of patches and cherry-picking the bugfixy ones to glibc’s release/2.11/master. But I was tired and didn’t pay attention to git’s messages, so at the end of the evening, I noticed that for all conflicting patches, I have done git commit -a instead of git commit -a -c commitid. This had a definite advantage since the “(cherry picked from commit …)” notices inserted by git cherry-pick -x got preserved, but also a very definitive problem – the author name and date info for each commit was wrong.

(Note that AIUI, 1.7.5 cherry-pick might not have this problem anymore. I’m still using 1.7.4, content with Debian’s packaged version nowadays.)

Due to the -x lines, we still have mapping to original history. Therefore, some scripting should fix this quickly. And sure enough…! Maybe this recipe will come useful to someone:

git filter-branch --commit-filter '
  if [ "$GIT_AUTHOR_NAME" = "Petr Baudis" ]; then
    # Author of this commit is wrong! We could also simply correct
    # all commits containing the "cherry picked" notice.
    cat >/tmp/logm$$ # save log message
    ocommit="$(sed -n '\''s/^(cherry picked from commit \(.*\))$/\1/p'\'' </tmp/logm$$)"
    # Load original authorship information:
    IFS=: read GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE \
        <<<"$(git log -1 --pretty=format:"%an:%ae:%at" $ocommit)"
    # Redo the commit:
    export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE
    git commit-tree "$@" </tmp/logm$$
    rm /tmp/logm$$
else
    git commit-tree "$@" # preserve commit intact
fi' c55cc45ed76603b380489ee8c91ab5dce92e92f1..HEAD

Note that this requires that /bin/sh is bash (which may NOT be the case on debian!). Otherwise, you need to rewrite the <<< bit.

The c55cc45ed… commit is the first wrong cherry-pick. You may omit that altogether if you wish but the complete branch history is going to be rewritten. Also note that you should never rewrite commits that are already pushed out to a public place.

Categories: linux, software Tags: ,

brmd: A Case for POE

May 26th, 2011 2 comments

In brmlab, we want to track who is unlocking the space, whether someone is inside, have some good visual indicator that live stream is on air, and so on. In other words, we have an Arduino with some further hardware, and we want to show whatever is reported by the Arduino on IRC and web, and provide some web-based control (open/closed status override) in the opposite direction too.

What to use for a service (we call it brmd) that will bind all these interfaces together? It just needs a lot of boring frontends and simple state maintenance. It turns out that Perl’s POE framework is ideal for this – most of the code for IRC, HTTP and device read/write is already there, so you just grab the modules, slam them together and you have exactly what you need with minimal effort. Right?

It turns out that there are caveats – basically, the idea is correct, aside of getting stuck on a single stupidity of mine, I’d have the whole thing put together in something like half an hour. Unfortunately, the moment you want robustness too, things are getting a lot more complex; to handle the device disappearing, IRC disconnections, not having HTTP socket fds leak away, etc., you suddenly need to either magically know what further modules to hook up or start exeting some manual effort. Still, I like how POE is making it so easy to give a simple state machine many input/output interfaces and when you get used to the idiosyncracies, you can even make it somewhat reliable.

Example POE code

While this task seems to be ideal fit for POE, I’ve found surprisingly few examples of more complex POE component interaction on the web. Therefore, I’m going to lay out at least tersed up version of brmd below to help fellow future googlers. Nothing of this is anything ground-breaking, but it should help a lot to have a template to go by. Our real version is somewhat more verbose and includes some more interfaces: brmdoor.git:brmd/brmd.pl

I assume that you already know what “POE kernel” and “POE session” is. Beware that I’m a POE beginner myself, and I haven’t gone through much effort to clean the code up the way I would if I were to work together with someone else on this. Some things surely aren’t solved optimally and you might even pick up a bad habit or two.

In order to have some neat separation, we will divide brmd to several components where each will take care of a single interface; the main package will only spawn them up and do some most basic coordination. If we were to grow much larger, it would be worth the effort to even set up some kind of message bus (I wish POE would provide that implicitly), here we just directly signal components needing the info.

Read more…

Categories: software Tags: , , ,

Cute cuddly robot!

May 6th, 2011 No comments

In the past few months, I have been playing a bit with a great robotic platform available at the university in the Introduction to mobile robotics and Eurobot subjects.

We were provided a pre-made robot chassis with some basic electronics, an ATMega128 board, hefty battery, motors and Sabretooth motor drivers. With AxTheB, we have built a merkur-based construction on top of it to hold a camera-on-stick module that gives a picture of surroundings of the robot, and a 12″ notebook that is hooked up to the webcam and the ATMega.

The most interesting thing is the camera. It is held up on a wooden stick, facing upwards to a parabolic mirror (i.e. a laddle), giving it picture of its surroundings in about 320\deg angle (part of the view is obstructed by the stick). That’s not my original idea but it was originally suggested and built for brmbot outdoor. We had it for Robotour 2010 competition, and during the competition I have even built some basic image recognition for it, but in the end we did not have time to integrate it to the main control software so the camera served only as a holder for GPS+compass back then.

More about the tasks below. In the end it turned out that I really don’t have enough time to do this so things got quite stressful at one point, but I would feel really bad if I gave up. In the end, I managed to get things working. And as any robot builder will tell you, seeing your tiny friend roam around happily, doing whatever it’s supposed to do, is worth any stress! :-)

The source code is rather horrible. Keep in mind that it was hacked incrementally and never really cleaned up.

Brmpuk

The first task (Puck Collect from Robot Challenge 2011 in Vienna): Your tiny robot is in a ~2x2m white playground, with two corner squares painted red and blue, and with tiny red and blue pucks scattered over the playground. Its robotic opponent also sits in the playground. Each player has a color assigned, and its goal is to accumulate as many pucks of its color in its corner as possible, while avoiding putting pucks of the other color in that corner. And a deadline of two minutes.

(In addition, the robot has got also a sort of plough in the front
where it collects and pushes the pucks as few centimeters directly
in front of the robot are invisible for the camera.)

Unfortunately, I was not actually able to spend a lot of time on the project (and attend the awesome lectures) due to sickness (to play with robots, you have to come to the lab) and time scheduling problems. Nevertheless, I have been able to finish at least a basic version of the robot and it kind of did what it was supposed to do. (Though for a real competition, more work would be needed – the construction was very frail and the bot could not properly unstuck itself when it got misled e.g. by colors seen outside of the playground.)

Source code: brmpuk.git

Blackline

The second task was a lot simpler – just follow a thick black line laid on the floor. With the communications, firmware and video processing infrastructure already debugged, it was just a matter of replacing the image recognition and I managed to get everything done and debugged in just a couple of hours.

To briefly describe the workings of the robot: The camera provides a picture of the neighborhood of the robot in the YUYV format. The software periodically grabs frames from the camera and looks at three rectangles covering the area slightly in front of the robot – straight ahead, to the right and to the left. (There is a slight gap to give the robot a chance to react with sufficient head-start, deal well with sharp turns and skip over gaps.)

The contents of the three rectangles is then analyzed; YUYV is a convenient pixel format for image recognition since you already have brightness (luma) and hue (chrominance) separated. To detect dark line on light background, it is enough to look at the Y-values of pixels. Second, we are detecting a high-contrast object that is always smaller than the rectangle we look at. So we do a trivial thing – get luma difference of the darkest and brightest pixel. We do not get dark pixels from dirt since the camera image is sufficiently blury, and the only large enough dark object on white background is the line, so this works perfectly, can adjust to overall brightness of the image (to a degree), and is really simple and foolproof.

The robot driving strategy is then trivial. After each frame, the control program will give an update on new speed of both wheels. If it detects the line is being followed, it goes straight ahead. Otherwise, if one of the side rectangles is active, it stops one motor and starts turning in that direction. If no rectangle is active, it may be that a sharp turn has been encountered: at one point both the ahead rectangle and a side rectangle was active, at the next moment none was. Therefore, it looks at the last time one of the side rectangles has been active and goes in that direction. A similar handling is used for both side rectangles active (during turning to one side, the other rectangle may blink to activity e.g. due to table edge).

Source code: brmpuk.git blackline


P.S.: I just wasted three hours trying to work around bugs in WordPress, jQuery(?) and YouTube/Chromium so that I could publish this post. How are you people managing to live in this Web 2.0 world?!

Categories: hardware, software Tags: , ,

ld.so Scopes

March 10th, 2011 No comments

Recently, I have spent quite a bit of my time debugging an evil ld.so bug involving mis-handling of scopes and I have noticed precious lack of documentation of any internal ld.so data structures. So again, this comes for the benefit of the googlers, an intro that could have saved me another quite bit of time spent poking the code.

Of course, the dynamic linker features a wide variety of fun hacks. The most interesting mechanism is probably how lazy relocation is performed, but things like that have already been described plenty of times before. The question we shall look into is what data structures are used when a new symbol is to be searched for and linker has already taken control. There are two important internal concepts of ld.so related to this – the link_map and the scope. You can see the data structures in include/link.h.

The struct link_map describes a single loaded object; it may be ld.so, the main program, libc, or any other shared object loaded afterwards, during startup or later. It has many members, like its name, its mates in global linked list of all objects, or its state. But the most interesting attribute is its scope.

The scope describes which libraries should be searched for symbol lookups occuring within the scope owner. (By the way, given that lookup scope may differ by caller, implementing dlsym() is not that trivial.) It is further divided into scope elements (struct r_scope_elem) – a single scope element basically describes a single search list of libraries, and the scope (link_map.l_scope is the scope used for symbol lookup) is list of such scope elements.

To reiterate, a symbol lookup scope is a list of lists! Then, when looking up a symbol, the linker walks the lists in the order they are listed in the scope. But what really are the scope elements? There are two usual kinds:

  • The “global scope” – all libraries (ahem, link_maps) that have been requested to be loaded by the main program (what ldd on the binary file of the main program would print out, plus dlopen()ed stuff).
  • The “local scope” – DT_NEEDED library dependencies of the current link_map (what ldd on the binary file of the library would print out, plus dlopen()ed stuff).

The global scope is shared between all link_maps (in the current namespace), while the local scope is owned by a particular library. (FIXME) If a library has local scope element in its scope, it adds itself to that scope. E.g. assume libA dlopen()ing libB (with RTLD_LOCAL) – libB will get and own a fresh local scope element, and all libraries loaded by libB will inherit and add themselves to that local scope element.

There are then four common situations:

  • The main program has only single scope element, the global scope. (At least I would expect so, I have not verified this.)
  • A library has been loaded with RTLD_LOCAL (the default case). Then its link_map has two scope elements, first comes the global scope, then comes the local scope.
  • A library has been loaded with RTLD_LOCAL | RTLD_DEEPBIND. In that case, the link_map has again the two scope elements, but the order is switched – the local scope comes first.
  • A library has been loaded with RTLD_GLOBAL. The link_map lists only the global scope.

(Another concept is namespace; each has its own id and linked list of link_maps, but usually there are just two, one for the ld.so and another for the application. Unless you are calling dlmopen() explicitly or using the LD_AUDIT interface, you can usually assume there is only a single namespace that matters.)

Just for fun – the bug I have been hunting has been caused by ld.so not handling local scopes quite properly. Normally, when unloading the library opened with RTLD_LOCAL, all its local scope members would be unloaded too. However, such a member could be flagged as RTLD_NODELETE, and in that case, it would stay around. The problem is, the code did not expect that and would remove the local scope owner and the local scope would go along with it. This means the nodelete library dependencies would disappear from its local scope and the next time it got called (e.g. within its static destructor), trying to resolve such a symbol would cause a “unresolved symbol” fatal error.

Categories: linux, software Tags: , ,