Using CUPS to print text files in non-UTF8 charset encoding

May 17th, 2012 No comments

At our university department, many people still haven’t migrated to UTF8 and are still happily using ISO-8859-2 – mainly due to the amount of legacy text (TeX, …) documents.
Nowadays, support for non-UTF8 is slowly waning though, and CUPS is a prime example. Most of (shabby anyway) support for non-UTF8 encodings have been removed few years ago. It is still possible to force CUPS to print text files in non-UTF8 encoding if you extract the appropriate files from ancient version (1.2 or some-such) of CUPS to /usr/share/cups/charset/ and print using e.g. lpr -o document-format='text/plain;charset=iso-8859-2'. However, there is simply no support for lpr automatically setting the charset based on your locale.

We decided that the best way to go is to simply auto-detect the encoding using the awesome enca package and convert text files from this encoding to UTF8. This should be actually fairly fool-proof in practice, unless you are dealing with an extremely mixed set of languages. Making own CUPS filter is easy – just change texttops entries in /etc/cups/mime.conv to textautoencps and create a new /usr/lib/cups/filter/textautoencps file:

#!/bin/bash
 
if [ $# == 0 ]; then
  echo >&2 "ERROR: $0 job-id user title copies options [file]"
  exit 1
fi
 
{ if [ $# -ge 6 ]; then
    cat $6
  else
    cat
  fi; } |
    enconv -x utf-8 -L czech |
    /usr/lib/cups/filter/texttops "${@:0:6}"
Categories: linux, software Tags: , , ,

Publicly Killable Computations

March 7th, 2012 No comments

At our university department, people sometimes need to run expensive or long-term computations. We have few servers reserved for computations, but frequently it is useful to run computations on machines in the offices since some of them are fairly powerful and mostly get only very light use CPU-wise.

However, such computations must never impair any interactive or more pressing use of the machine. Therefore, we want to limit scheduling priority of the computations, limit total memory used by the computations and allow *anyone* kill *any* running computation. It turns out that this is not as trivial to achieve as I hoped.

In comes Computations under control: compctl – cgroup-based control of publicly limitable and stopable tasks. It is a tool that allows anyone to execute a command (or start screen) such that it is marked as a computation. Then, it allows anyone else to limit the total amount of memory allocated for all computations and to stop a specific computation or all computations on a machine. It uses cgroups to keep track of computations and limit the total memory usage, and a simple client-server architecture to perform priviledged tasks.

I hope it will be useful for someone else too. :-) Feel free to send in patches, and extra pairs of eyeballs checking the security would be welcome too. Top on my TODO list is simple debian package and a more verbose compctl –list output.

Categories: linux, software Tags: , , , ,

Full-text search in mutt: alternative notmuch integration

February 12th, 2012 1 comment

If some feature is too slow, you end up conciously avoiding it and losing productivity. This is one of the reasons that we emphasize so much that Git is as fast as it is – you end up using it more because of that. One thing I always found very frustrating was full-text search in mutt; it takes _minutes_ on my mailbox and I end up trying many different header-based queries instead in order to find the mail. But today, I have finally set up notmuch, a very nice and fast mail indexer.

Unfortunately, there was no satisfying way of integrating notmuch with mutt! There is a notmuch-mutt script which creates a temporary maildir with results and moves me there. This was not going to work for me – you cannot make any changes in the “search results list” like deleting mails (I wonder if status would carry if I reply to mails; I suspect not) and in order to get back to your mails, you need to switch mailbox – which implies that your previous position is not restored and that it’s quite slow (few seconds – too much!).

What I envisioned instead was something like the ‘l’imit function that I use very much, just faster. ;-) It turns out that mutt can match message-ids in the limit query and that notmuch can output a list of message-ids of matched mails. Therefore, the most hackish approach is simply to use notmuch to generate a limit specification and perform that – and it turns out that this is good enough (in my scenario)!

Just put these two bindings (or only the first one) in your .muttrc:

# 'L' performs a notmuch query, showing only the results
macro index L "<enter-command>unset wait_key<enter><shell-escape>read -p 'notmuch query: ' x; echo \$x >~/.cache/mutt_terms<enter><limit>~i \"\`notmuch search --output=messages \$(cat ~/.cache/mutt_terms) | head -n 600 | perl -le '@a=<>;chomp@a;s/\^id:// for@a;$,=\"|\";print@a'\`\"<enter>" "show only messages matching a notmuch pattern"
# 'a' shows all messages again (supersedes default <alias> binding)
macro index a "<limit>all\n" "show all messages (undo limit)"

Perhaps sometime in the future, we will get native libnotmuch support in mutt, but I think this is a pretty good substitute for now. :-)

TODO list

  • the way this snippet prompts using a temporary file is completely absurd; mutt needs to get a builtin prompt function for its macros
  • only the most recent 600 search hits are shown, since…
  • …the filtering is grossly inefficient; it is still very fast on my computer, but if mutt could just directly get a list of message ids and match them, things would be much nicer than me abusing the regex matching machinery
  • the 600 search hits limit is global over all folders, therefore if you have a lot of mails and a lot of folders, searching for a common word may hide even some recent results
  • notmuch cannot search for substrings, apparently, only whole words
  • notmuch does not deal with diacritics and other locale transliteration character classes
Categories: software Tags: , ,

Realtime Signal Analysis in Perl

September 24th, 2011 No comments

About a month ago, we were working on the Fluffy Ball project – a computer input device that can react to fondling and punching. Thanks to a nice idea on the brmlab mailing list, we use a microphone and process the noise coming from the ball’s scratchy stuffing and an embedded jingle. The sounds from the outside are almost entirely dampened by the stuffing and for a human, the noise of fondling and punching is easily distinguishable.

Frequency spectrum, for our purposes, is just an array indexed by frequency, storing the amplitude of each frequency (in some range). A common variation is the power spectrum that describes the power of each frequency, i.e. the amplitude squared. Frequency spectrum is obtained by splitting the input signal to fixed-size samples and performing Discrete-Time Fourier Transform.

It turns out that trivial spectrum-based rules can be used to achieve reasonably high detection accuracy for a computer too (especially when the user is allowed to “train” her input based on feedback); I had big plans to use ANN and all the nifty things I have learned in our AI classes, but it turned out to be simply an overkill. The input signal is transformed to a frequency spectrum (see box) using real discrete FFT.

So, we have the audio signal coming in from a regular mic device and need to process it further. I chose Perl for quick prototyping and I have assumed that I would find some pre-made scaffolding for this ready. But it turns out that noone really published a simple example of even just showing a real-time frequency spectrum. So, here you go! :-)

First, we need some reasonable way to continuously display the spectrum. Most GUI paradigms are event-driven, but events are usually user interaction pieces and while it would be possible to incorporate continuous data-based updates in this model, it feels quite backwards. So we use a trick:

use warnings;
use strict;
 
init_dsp(); init_fft();
 
use Tk;
our $mw = MainWindow->new;
$mw->after(1, \&ticks); # after 1ms, give control back
MainLoop;
 
sub ticks {
	while (1) {
		render_signal(process_signal(read_dsp()));
		$mw->idletasks();
	}
}

This circumvents the event-driven architecture of Tk and instead puts our main loop in control, processing any GUI events when it’s good time. For more complex programs, this is a bad idea and it will lead to poorly maintaineable code, but when writing simple tools, you should not succumb to grand frameworks and let your code overgrow you.

Okay, how to grab audio input signal in Perl? Unfortunately, there are not really any handy modules you could use thoughtlessly. Audio::DSP is a possibility, but using it is clumsy, especially in the current world of ALSA as you have to rely on the imperfect aoss wrapper. A simple alternative is to get the raw byte data through a pipeline from the ALSA arecord tool:

our ($devname, $fmt, $bitrate, $wps, $bps, $bufsize, $dsp);
BEGIN {
	$devname = "default"; # or e.g. hw:1,0 for an additional USB soundcard input
	$fmt = 16;            # sample format (bits per sample)
	$bitrate = 16384;     # sample rate (number of samples per second)
	$wps = 8;             # FFT windows per second (rate of FFT updates)
	$bps = ($fmt * $bitrate) / 8; # bytes per second
	$bufsize = $bps / $wps;   # window buffer size in bytes
}
 
sub init_dsp {
	open ($dsp, '-|', 'arecord', '-D', $devname, '-t', 'raw',
		'-r', $bitrate, '-f', 'S'.$fmt) or die "arecord: $!";
	use IO::Handle;
	$dsp->autoflush(1);
}
 
sub read_dsp {
	my $w;
	read $dsp, $w, $bufsize or die "read: $!";
	return $w;
}

read_dsp will return one signal window per call, the window being a binary blob consisting of one two-byte word per sample. We want to magically convert this to a spectrogram.

Audio::Analyze is again the simple way to get a signal spectrum. If you are after analyzing a pure audio signal, you probably want to use it since it can easily filter the signal based on relative human perception of frequencies etc. But for us, it is inconvenient to feed it data through a pipe and we will directly use Math::FFT. It will still handle all the gory math for our case (and we care about the actual noise, not the way people would hear it).

use Math::FFT;
use List::Util qw(sum);
 
our @freqs;
sub init_fft {
	my $dft_size = $bitrate / $wps;
	for (my $i = 0; $i < $dft_size / 2; $i++) {
		$freqs[$i] = $i / $dft_size * $bitrate;
	}
}
 
sub process_signal {
	my ($bytes) = @_;
 
	# Convert raw bytes to a list of numerical values.
	$fmt == 16 or die "unsupported $fmt bits per sample\n";
	my @samples;
	while (length($bytes) > 0) {
		my $sample = unpack('s<', substr($bytes, 0, 2, ''));
		push(@samples, $sample);
	}
 
	# Perform RDFT
	my $fft = Math::FFT->new(\@samples);
	my $coeff = $fft->rdft;
 
	# The output are complex numbers describing the exactly phased
	# sin/cos waves. By taking an abs value of the complex numbers,
	# we just measure the amplitude of a wave for each frequency.
	my @mag;
	$mag[0] = sqrt($coeff->[0]**2);
	for (my $k = 1; $k < @$coeff / 2; $k++) {
		$mag[$k] = sqrt(($coeff->[$k * 2] ** 2)
		                + ($coeff->[$k * 2 + 1] ** 2));
	}
 
	# Rescale to 0..1. Many fancy strategies are possible, this is
	# extremely silly.
	my $avgmag = sum (@mag) / @mag;
	@mag = map { $_ / $avgmag * 0.3 } @mag;
	return @mag;
}

Not much to add besides the inline comments. The input of the process_signal function is a raw byte stream, the output is a list of amplitudes; @freqs maps the list indices to the actual Hz frequencies. The normalization to [0,1] interval shown here (pitching the mean at 0.3) is extremely naive, again there are many possible strategies. Also, you certainly want to use a window function etc. in more serious applications.

Now, for the visualization. We have chosen Tk for our GUI (it looks ugly, but it is reasonably easy to use despite its Tcl antics). We will use its Canvas object where we can draw freely, and just plot a line for each frequency:

our $canvas;
sub render_signal {
	# Display parameters, tweak to taste:
	my $rows = 2;
	my $hspace = 20;
	my $height = 150;
	my $vspace = 20;
 
	my @spectrum = @_;
	my $row_freqn = @spectrum / $rows;
 
	$canvas;
	unless ($canvas) {
		$canvas = $mw->Canvas(
			-width => $row_freqn + $hspace * 2,
			-height => $height * $rows + $vspace * ($rows + 1));
		$canvas->pack;
	}
	$canvas->delete('all');
 
	for my $y (0..($rows-1)) {
		for my $x (0..($row_freqn-1)) {
			my $hb = ($height + $vspace) * ($y + 1);
			my $i = $row_freqn * $y + $x;
 
			# Draw line:
			my $ampl = $spectrum[$i];
			$ampl <= 1.0 or $ampl = 1.0;
			my $bar = $height * $ampl;
			$canvas->createLine($x + $hspace, $hb,
			                    $x + $hspace, $hb - $bar);
 
			# Draw label:
			if (!($x % ($row_freqn/4))) {
				$canvas->createLine($x + $hspace, $hb + 0,
				                    $x + $hspace, $hb + 5,
				                    -fill => 'blue');
				$canvas->createText($x + $hspace, $hb + 15,
				                    -fill => 'blue',
				                    -font => 'small',
				                    -text => $freqs[$i]);
			}
		}
	}
 
	$mw->update();
}

This suffices for a naive visualization, you can easily tweak it to do thresholding and whatever else you desire. I have found that on some of my computers, the X protocol is pushed to its limits by repeatedly drawing a large amount of lines, and sometimes the spectrum will start to lag behind the signal; either show wider bars averaging together multiple frequencies, or use something other than a Canvas object – raw pixmap transfer would likely be better than such a large amount of line drawing operations.

For a serious signal analysis work, you will also want a spectrogram – a time-based plot of amplitude of various frequencies.

To get a working script skeleton, simply piece the code snippets together (fb-simple.pl). See fb.pl for the real fluffy ball script. It is much uglier, but it maintains sample averages over longer time windows (essential for more complex signal analysis), it has simple sample recording capabilities, and an example of naive threshold-based classifier.

Categories: software Tags:

Master Thesis: Information Sharing in MCTS

August 17th, 2011 1 comment

Just a quick note – at the beginning of August, I have submitted my master thesis, titled Information Sharing in MCTS. MCTS means Monte Carlo Tree Search and it is a powerful technique for finding moves in games with large state space and difficult evaluation functions, such as Go.

The thesis presents the modern Computer Go techniques and open problems, my (not too weak) Go-playing program Pachi and some modest improvements of the MCTS algorithms. It might make a good introduction to state-of-art Computer Go.

Categories: life, software Tags: , , , ,

Bitcoin and Anonymous Transactions

August 13th, 2011 2 comments

Previous: Bitcoin Transaction Speed vs. Double-Spending

Note: I do not endorse use of Bitcoin for any illegal activities. Like many, I regret the neccessary tradeoff between freedom, privacy and legitimate legal enforcement.

One of the much touted advantages of Bitcoin is its anonymity. The full record of all transactions is by the very nature of Bitcoin completely public. However, transactions move coins between a set of incoming addresses and a set of outgoing addresses, and the addresses are not tied to any particular entity; an address may belong to anyone and Bitcoin includes no way to tell.

The problem, of course, is that there might be ways to tell outside of Bitcoin, especially if you are legal authority (or good at finding security holes; or a Google user). Let’s say someone gives a donation (from address X) to an illegal free speech organization in Burma, and you want to figure out if it was a Burmese resident – and his address, in that case. The organization uses anonymous donation-specific addresses dnum. It wants to purchase a stock of paper for leaflets delivered to a conspirational address, without tipping off the secret police.

Let’s assume the secret police captured a computer of another supporter of dissidents. They retrieved the source address used for donations (address C) and now they look at where the coins went. A complicated bead net of addresses that merge and split coins among each other unrolls. The trouble is if it is possible to identify some of the addresses in the net. By government order, all incoming payments to the paper store are transferred from order-specific address p? to a single police-registered address P. The first order was paid at address p0, the second would be paid at p1.

The police can now clearly see the coins from address C, transferred to donation address d0, have been used to pay for paper and they will inquire in the store about customer of the address p0. But they can also see that other money have been used for this purchase as well. The trail goes through addresses X, xb, xa, xx, all unknown to police, but arrives to xx from M, the main coin store address of the local Bitcoin exchange. The authorities inquire at the exchange and get bank account number associated with the outgoing payment. The bank happily provides the address and the police is on its way. Does the bank account owner also own the address xx? And what about xa, xb and X, are these her addresses (used for keeping order in the books or to muddle the tracks) or someone else’s? Whose and why did the money go there? The case is not proven yet, but a person is already questioned at the secret police office and in a totalitarian regime, the burden of proof is effectively on her. The paper paid for at p1 will never arrive, but several black cars will.

(The xx-xa-xb-X split can be improved by sending parts of the amount to different addresses, then sending coins between them, and so on, but AFAIK this does not really help anything much as the coins are still being passed within a closed subgraph. Also, the dissident organization might try to avoid mixing independent sources in payments, but this may be difficult if you live from small donations.)


So, all is lost and hopeless, back to good old cash? Well, not necessarily. There are various approaches to combat the problem.

Solve part of it outside the box. Launder coins abroad. Spend them on legitimate-looking puppet causes while staying sure they either return to you or get sent the right way. Repeat and rinse several times. Preferably include real people with googleable Bitcoin addresses early in this chain to completely dissociate yourself from the coins in the public record. Of course, you need some real-world cooperation and friends worldwide. Similar approaches like dissociation by re-selling goods may be considered, but they seem very cumbersome, error-prone and likely losing money (or needing large investments and much time). This all could become a lot easier when paying with bitcoins in person becomes commonplace.

An obvious scheme to muddle the tracks is a mixin service (an escrow service might do this as a side-effect). Many people send in coins, they are all included in a single transaction and this transaction has many outgoing addresses for each user sending in money. It will split large pile of possibly suspect money to many small piles that can be used separately, and in the transaction it will be completely unclear which of the many source addresses is the source of the coins. Repeat few times with different sets of co-mixers and you have pretty much perfectly fogged the origins of the money.

It sounds great but it is fraught with many practical issues. Most obviously, you need to trust the service with your money; you send it in, it might arrive back or maybe not (they are scammers, or maybe just have technical trouble or got hacked). It should definitely be set up in some “safe” country. And still, you need to trust it is not logging anything, nor is its ISP, and nor is your country’s Great Firewall (i.e., use tor). Your government in fact secretly running this service would be a masterstroke. And of course, you absolutely need the “many people” part as well, so that you are actually mixing the coins with others. But these “many people” should be mainly honest persons with legal coin targets, there is not that much point in mixing if you are mixing just with crooks. For the same reason, “many honest people” will be reluctant to get into the trouble needlessly and mix with the crooks and you. An escrow service with mixing as a side-effect could alleviate this issue somewhat.

But what about a more elegant way? When making a transaction, you can pay a transaction fee, mainly to ensure speedier inclusion in a block. The transaction fee goes to whoever mines the block that includes the transaction. There is no upper bound to a transaction fee. We will spill coins in the fees.

The idea is similar to a mixin service; but instead of requiring voluntary participation, we use the txfees and the whole block chain as a mixin platform. In exchange, we concede that we cannot have all the money, but only some large fraction (a third? a half?). We will produce dummy transactions that move all the coins to the txfee, then collect part of it back in block generation payouts. This is completely legitimate and automatic; blocks generated by us and by other solo miners (pools are easily identifiable) are indistinguishable. Some of the money will return as clean laundry while the rest goes to support the Bitcoin system.

The practical implementation is simple. First, one needs to choose the ratio. The smaller the ratio, the harder to track the most common destination of the spillage, but the less money you get, obviously. Then, you create many small fee-spilling transactions transactions. (More means better laundered for further use, but take more time to get.) You keep part of them for your own blocks and broadcast the rest for other miners to pick up. You can get very smart about this, scanning the p2p network to broadcast to solo miners preferentially (you will be better masked), do some probabilistic modeling on the blocks you create, and so on. The big catch is being able to produce blocks at sufficient pace. The answer is either making a huge investment and building a massive solo rig, or just renting one! There are good rigs for rent. It is again important to rent abroad and make sure you cover your tracks well when paying rent. Of course, this is more practical for the accepting organization than the individual donator.


This proposal is far from perfect, but it should work and I have not seen it mentioned yet. My main point is that there may be more tricks like this and the anonymity discussion might yet get interesting.

Categories: software Tags: ,

Bitcoin Transaction Speed vs. Double-Spending

August 12th, 2011 1 comment

In the next few days, I will be posting a couple of trivial short thoughts on some frequently debated “issues” of Bitcoins that keep coming up e.g. on #bitcoin and that I reacted to.

The basis of Bitcoin are “blocks”. Sometimes, they are regarded as just a magic device for getting bitcoins by the mysterious activity of mining. But the main purpose of blocks is to build a trustworthy distributed database of bitcoin transactions. The trustworthiness comes from the blocks, which confirm the transactions by a “proof of work” – non-trivial computational effort.

The reason we want this “trustworthiness”? We are tackling one of the major problems of distributed currencies – double spending. You have a coin tied to an address; at the “same moment”, you could send two transactions to two nodes of the network, in one sending it to Alice, in the other sending it to Bob. Which of the transactions becomes accepted by the network? Half of the network might think you send the money to Alice, half that it goes to Bob. The point of the “block chain” is that no conflicting transactions appear in it, and the transaction with the most computing power behind it wins.

Unfortunately, you might need to wait for a block with your transaction in it for anything from five minutes to 1.5 hour, and for your transactions to be trustworthy enough, some followup blocks need to be accepted (the “confirmations”). This is a nuisance if you want to make real-time with bitcoins, of course, be it buying a soda can in vending machine or hefty TV in an electronics shop. It is also sometimes used as an argument against Bitcoin.

But there are two ways to deal with the issue. The first way, even mentioned at the Bitcoin wiki, is just not caring about the problem. If the payment is just few bitcents for groceries, it is just not worth the hassle. Hassle drives away customers and the loss may be greater than an occassional double-spend cheater makes. The modern society (including credit card companies) relies on the fact that people tend to be honest when there is non-infinitesimal risk.

In case of a hefty TV, non-infinitesimal risk may be worth it. But the solution is again no rocket science: a trusted third party. Two schemes may be possible – with a flexible plan, the service would have a contract with the customer and the merchant. At the time of payment, the service will send the money to the merchant and the customer will pay to the service. If money do not arrive, the service will sue the hell of the customer; the merchant gets their money regardless. Thinking about it, this works a lot like credit cards.

With a fixed plan, there is even less hassle for the customer. They just make a deposit to the third party – this is an upper limit to their spending. When they purchase an expensive TV set, the merchant receives guaranteed payment from the third party, while the customer sends the money to the service. If money do not arrive, it gets reimbursed from the deposit. An initial deposit (and thus more trust in the third party) is required, but the customer may remain anonymous and the service can have smaller legal department.

Bitcoin Spending Insurance could be a suitable name for one service like this, but you could have countless, like you have multiple credit card companies and many banks. On the merchant side, another service might aggregate them conveniently, and on the customer side, they may compete with their bussiness plans, bonuses, advertisements and other wretched capitalistic means. ;-)

I.e., this reduces the decentralization of Bitcoin, but so do currency exchanges and other services – it is an optional addon and if the users do not mind the wait, they are free to avoid it. In short, no need to worry about the fact that transactions are not real-time.

Next: Bitcoin and Anonymous Transactions

Categories: software Tags:

I use 6to4 – why are my applications still preferring IPv4?

July 4th, 2011 1 comment

I found out about this curious behavior almost a month ago during the World IPv6 Day. I was surprised about this, even though I really shouldn’t be, given that I was fixing some bugs in the glibc implementation of this mechanism only few months earlier. ;-)

If you are not bothering with tunnel brokers anymore and are using 6to4 for your IPv6 connectivity like me, you might have noticed that your applications still prefer IPv4, disappontingly. You can use getent ahosts www.brmlab.cz (or a different host) to see the list of addresses in the order your applications will most likely try to connect by default.

The key mechanism in play here is the RFC3484 getaddrinfo(3) address selection mechanism; on GNU/Linux system, it is described (and configurable) in /etc/gai.conf. The aim of the mechanism is to choose the most suitable pair of source and destination addresses; this is the place where we can choose whether to prefer IPv4 or IPv6, that if we can talk to localhost, we should do it that way, or to talk to link-local addresses using link-local addresses too.

When choosing a destination address, each is marked by a label and preference. First, if there is a destination address with the same label as its “best” source address, such addresses are preferred. From these candidates, the address with the highest preference is picked.

You can read up on the full details in the RFC. In a sense, the label differentiates between multiple transports; IPv4, normal IPv6 (2001::/16, or rather ::/0 minus a lot of exceptions), 6to4, link-local and localhost are all such separate transports. This mechanism for example makes sure that IPv4 is preferred to normal IPv6 in case we have IPv4 address, but only link-local IPv6 available. And the important point is that 6to4 is differentiated. If a system has both normal IPv6 and 6to4 configured, normal IPv6 is used for normal IPv6 destinations while 6to4 is used for 6to4 destinations. The side effect is that if IPv4 and 6to4 addresses are available, IPv4 will be preferred to IPv6 destinations.

I’m not sure about the exact motivation for this, but it does make sense. It reduces the load on the relay servers that route between 6to4 realm and native IPv6 internet; if 6to4 addresses talk to each other, they connect over IPv4 directly, without need for relay servers. Also, sometimes the relay servers can be topologically far away on the IPv4 internet, slowing down IPv6 communication. And while IPv6 is cool, since your traffic is going over IPv4 part of the way anyway (to the nearest 6to4 relay), it makes no sense to artificially switch to IPv6 for the rest of the trip if you can just use IPv4 all the way.

But, if you have no native IPv6 and want to prefer 6to4 to IPv4 communication – since IPv6 is cool – you can tweak your /etc/gai.conf:

#label ::/0          1
#label 2002::/16     2

Just uncomment the 2002::/16 line and change its label from 2 to 1. Then it will have the same label as the “normal” IPv6 internet. Its behavior will be suboptimal in some cases and you shouldn’t deploy this thoughtlessly, but if you just do this on your personal workstation, it is a way to get the warm “I’m using IPv6 – somewhat” feeling.

Categories: linux Tags: ,

Multiple Kerberos realms on single host

June 9th, 2011 No comments

We are using Kerberos for central authentication at our university department; recently, we have set up a second realm for authentication to other services than UNIX logins (e.g. webmail) – we did not like the idea of reusing the same passwords.

Setting up a second realm has been pretty straightforward, just add the other realm to /etc/krb5.conf, write explicit port number to admin_server and kadmin_port options, add another init script for secondary admin server and… it almost works.

But suddenly, users find themselves unable to change their password within the original realm using passwd or kpasswd, with less than helpful Authentication error: Failed reading application request. And that’s the reason I’m writing up this, since you explicitly need to set the totally undocumented option kpasswd_port to a different value than 464! This is because kadmind provides the kadmin service on TCP port (749 by default), but also chpw service on UDP port (464 by default).

Categories: linux Tags: , ,

Repairing git cherry-pick authorship information

May 27th, 2011 No comments

I spent just my last night going through few months worth of patches and cherry-picking the bugfixy ones to glibc’s release/2.11/master. But I was tired and didn’t pay attention to git’s messages, so at the end of the evening, I noticed that for all conflicting patches, I have done git commit -a instead of git commit -a -c commitid. This had a definite advantage since the “(cherry picked from commit …)” notices inserted by git cherry-pick -x got preserved, but also a very definitive problem – the author name and date info for each commit was wrong.

(Note that AIUI, 1.7.5 cherry-pick might not have this problem anymore. I’m still using 1.7.4, content with Debian’s packaged version nowadays.)

Due to the -x lines, we still have mapping to original history. Therefore, some scripting should fix this quickly. And sure enough…! Maybe this recipe will come useful to someone:

git filter-branch --commit-filter '
  if [ "$GIT_AUTHOR_NAME" = "Petr Baudis" ]; then
    # Author of this commit is wrong! We could also simply correct
    # all commits containing the "cherry picked" notice.
    cat >/tmp/logm$$ # save log message
    ocommit="$(sed -n '\''s/^(cherry picked from commit \(.*\))$/\1/p'\'' </tmp/logm$$)"
    # Load original authorship information:
    IFS=: read GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE \
        <<<"$(git log -1 --pretty=format:"%an:%ae:%at" $ocommit)"
    # Redo the commit:
    export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_AUTHOR_DATE
    git commit-tree "$@" </tmp/logm$$
    rm /tmp/logm$$
else
    git commit-tree "$@" # preserve commit intact
fi' c55cc45ed76603b380489ee8c91ab5dce92e92f1..HEAD

Note that this requires that /bin/sh is bash (which may NOT be the case on debian!). Otherwise, you need to rewrite the <<< bit.

The c55cc45ed… commit is the first wrong cherry-pick. You may omit that altogether if you wish but the complete branch history is going to be rewritten. Also note that you should never rewrite commits that are already pushed out to a public place.

Categories: linux, software Tags: ,