October 24th, 2014 No comments

As part of my research, I have been heavily involved with building portfolios of optimization algorithms. Optimization algorithms lie at the root of many computational tasks, from designing laser mirror systems to neural network training. We want to find a minimum (or maximum) of some mathematical function, and for some functions it’s easier than for others.

For a great many fairly hairy functions, the best state-of-the-art optimization algorithm is CMA-ES, an evolution strategy from the broader family of evolutionary algorithms. It also has a very nice Python implementation by its original author, Nikolaus Hansen.

CMA-ES is still not as good as it could be on some functions with many local optima, but its performance can be much improved by a restart strategy that repeatedly restarts it with varying population size and parameters. The best-performing restart strategy is BIPOP-CMA-ES; unfortunately, it had no Python implementation so far. I took care of that more than a month ago, but since it’s taking some time to get my modifications upstreamed, in case anyone finds it useful, here is a patch for CMA-1.1.02 adding BIPOP restart strategy.
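To give a rough feel for what a BIPOP-style restart schedule does, here is a minimal Python sketch. It is not the patch itself and does not use the cma package at all: `run` is a hypothetical callback standing in for one optimizer run, and the population-size and step-size formulas are my simplified reading of the published BIPOP scheme, so treat them as assumptions.

```python
import random


def bipop_restarts(run, default_popsize, n_restarts, rng):
    """Drive `run` through a simplified BIPOP-style restart schedule.

    `run(popsize, sigma_scale)` is a hypothetical callback that performs one
    optimizer run and returns (best_value, evaluations_used).
    """
    best, _ = run(default_popsize, 1.0)  # first run: default settings
    history = [("initial", default_popsize)]
    evals_large = evals_small = 0
    n_large = 0
    for _ in range(n_restarts):
        if n_large > 0 and evals_small < evals_large:
            # Small regime: random population size between the default and
            # half of the largest population used so far, smaller initial step.
            largest = default_popsize * 2 ** n_large
            ratio = 0.5 * largest / default_popsize
            popsize = int(default_popsize * ratio ** (rng.random() ** 2))
            sigma_scale = 10 ** (-2 * rng.random())
            value, used = run(popsize, sigma_scale)
            evals_small += used
            history.append(("small", popsize))
        else:
            # Large regime: double the population on every such restart.
            n_large += 1
            popsize = default_popsize * 2 ** n_large
            value, used = run(popsize, 1.0)
            evals_large += used
            history.append(("large", popsize))
        best = min(best, value)
    return best, history


def fake_run(popsize, sigma_scale):
    # Stand-in for a real optimizer run; pretends each run costs
    # 100 generations and that bigger populations find better optima.
    return 1.0 / popsize, 100 * popsize
```

Calling `bipop_restarts(fake_run, 10, 8, random.Random(0))` returns the best value seen plus the list of (regime, population size) pairs, which makes it easy to see how the two regimes take turns based on the evaluation budget each has spent.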

Categories: software

A 16-color default-ish vim color scheme for xterm-256color

May 20th, 2014 2 comments

I recently switched to xterm-256color in my konsole, but unfortunately I found that vim looks exceedingly ugly there. The colors were all washed out and difficult to read with reduced brightness. I decided to explore some alternative color schemes, including popular ones like solarized, but they just don’t work for me – I have the default color scheme burned into my mind and I really like its high contrast, even though I can easily stare at it for 12+ hours in a row. It also works great even in adverse light conditions on a notebook.

In the end, I didn’t find any dark vim color scheme that would just look like its default 16-color color scheme. (The light background color scheme looks mostly the same in 16 and 256 colors by default.) So I had to create my own – 256like16.vim; drop it in ~/.vim/colors. You may want to edit yours to add some more bolds to make it look exactly like the 16-color scheme, but I ended up liking this one more, after all.

(You will need to install colorsupport.vim so that GUI color settings are used in the 256-color terminal. This particular script worked by far the best for me, and :ColorSchemeBrowse is also great when exploring schemes.)

Categories: linux

CLIPBOARD cut’n’paste in xterm

May 16th, 2014 No comments

Call me old-fashioned but I’m still using xterm on my desktop computer (where I use just fluxbox as my window manager) – it suits me just fine, but for one thing that I finally managed to solve. xterm by default ignores the clipboard, and none of the previously published solutions cut it for me, until now.

In X11, we have two commonly used selection buffers – PRIMARY and CLIPBOARD:

  • PRIMARY is used when you simply highlight text in most applications, without pressing anything, and you can paste from it using the middle mouse button; it is of fleeting nature and used for quick cut’n’paste; and it doesn’t work well with all applications, e.g. libreoffice doesn’t put highlighted stuff there at least in some contexts and non-textarea HTML5 text edit widgets usually can’t handle the middle button for pasting.
  • CLIPBOARD is used when you use ctrl-c or ctrl-v and can be used even with the evil applications above, but the problem is it’s not supported by xterm well!

In most terminal emulators, you can use the clipboard either using menus or shift-ctrl-c / shift-ctrl-v. However, in xterm, the best you can do is either…

  • Make it use CLIPBOARD instead of PRIMARY, in the same manner – the moment you select any text in xterm, it will plaster it over whatever else was in the CLIPBOARD before, without any explicit action. This sucks.
  • Have a different set of bindings for selection to PRIMARY and CLIPBOARD. This is a lot better, but I’m out of modifiers since I use shift to cut’n’paste in terminal applications that use mouse themselves (e.g. elinks).

So, my solution is to bring in the shift-ctrl-c / shift-ctrl-v bindings! In your ~/.Xresources or ~/.Xdefaults, add

XTerm*VT100.*translations:      #override \
        Shift Ctrl <Key>C: select-end(CLIPBOARD, CUT_BUFFER0) \n\
        Shift Ctrl <Key>V: insert-selection(CLIPBOARD, CUT_BUFFER0)

(and don’t forget to xrdb ~/.Xresources afterwards).

Now, you can use shift-ctrl-v for pasting from CLIPBOARD, and almost use shift-ctrl-c for copying to CLIPBOARD. There is a catch – you must press shift-ctrl-c while you are still holding the mouse button, i.e. you press the left mouse button, drag your selection, then before releasing it, press shift-ctrl-c; thankfully, that can be done with one hand without too much cramping.

It’s a bit inconvenient because of this bug, and doesn’t quite work with left and right selection; maybe I will sometime get around to adding true clipboard support to xterm code, but I think this is good enough for me at this point. :)

Categories: linux

speedread – A simple terminal-based open source Spritz-alike.

March 2nd, 2014 1 comment

A few hours ago, I read about Spritz, a new speed-reading app, and I was quite impressed by the idea. I know the underlying concept is not new and I didn’t even try out Spritz itself (they announced their idea but did not release the software, huh?), but this was the first time I had heard about it and I really liked it!

So I decided to implement my own terminal version of this idea, acting as a regular command-line filter. Find the new tool speedread at:

(Yes, it may not work well for fiction. Or slides. Yes, it may not work well for emails that you want to just skim for keywords. But then there’s the other 80% of text I need to chew through that does not fall into any of these categories. I’ll have to continue trying it out for longer, but it might be really useful.)
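For the curious, the core of a Spritz-alike is tiny. The following Python sketch is not the speedread code; it only illustrates the general idea of showing one word at a time at a fixed rate, aligned on a pivot letter. The pivot rule (about one third into the word), the default of 300 words per minute and the function names are all my assumptions for the sake of the example.

```python
import sys
import time


def pivot_index(word):
    """Index of the letter to align on (assumed: about a third into the word)."""
    return (len(word) - 1) // 3


def frames(text, wpm=300, column=10):
    """Yield (padded_word, delay_seconds) so all pivot letters share a column."""
    delay = 60.0 / wpm
    for word in text.split():
        pad = max(0, column - pivot_index(word))
        yield " " * pad + word, delay


def play(text, wpm=300, out=sys.stdout):
    """Flash the words one by one on a single terminal line."""
    for line, delay in frames(text, wpm):
        out.write("\r\033[K" + line)  # carriage return, clear line, draw word
        out.flush()
        time.sleep(delay)
    out.write("\n")

# To use it as a command-line filter, one would call: play(sys.stdin.read())
```

Keeping the pivot letter in a fixed column is what lets the eye stay put while the words change; everything else is just timing.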

Meanwhile, I have also learned about OpenSpritz, a web-based implementation of Spritz. It could be a good match for speedread!

Categories: software

SMTP from Exim-equipped roaming notebook (SSH smarthost)

February 13th, 2014 1 comment

I don’t send email from my notebook often, dealing with my correspondence on my server machine via ssh. When I need to do it, it’s usually when I’m sending Git patches or something like that. I didn’t meet much trouble with sending it directly, but SMTP servers of Debian-involved people are some of the most picky one can meet and I decided it’ll be best if I switch the exim4 on my notebook to smarthost mode where all mail is relayed via my main server.

So that should be trivial to do, right? Wrong, apparently. I figured I’d use SMTP auth, but it just seems mind-bogglingly complicated to configure if you don’t want to spend an evening on it. The client part is fairly easy (probably both on exim4 and postfix), but setting up a postfix server to do SMTP auth (for just a single person) is really silly stuff. Maybe not so crazy if you use PAM / shadow for authentication, but that means that on my notebook, I’d have to store (in plaintext) my server password, which anyone could use to log in – no way. It seems I could switch to Dovecot and somehow pass it a simple password to use, but at that point my patience ran out and I just backed off a little.

Why not just use ssh for smarthost SMTP transport? Authentication via ssh is something everyone understands nowadays, it does the best job there, no silly passwords are involved and you can just pipe SMTP through it. You wouldn’t do that in a company setting with Windows notebooks, but for a single geek, it seems ideal.

Someone already did set up ssh as exim transport, but that’s for exim3. So here follows a super-quick HOWTO to do this with exim4:

  • Set up ssh key on client:
    sudo -u Debian-exim /bin/bash
    ssh-keygen # go with the default, and empty password, this will be used in an automated way
    ssh # to fill up known_hosts; it will fail yet
    cat ~/.ssh/ # this is my public key
    exit # ..the sudo
  • Set up ssh key on server – paste the public key printed by the cat above to ~me/.ssh/authorized_keys and prepend command="nc -w1 localhost smtp",no-agent-forwarding,no-port-forwarding,no-X11-forwarding to the key line. This key can now be used only for mail relaying.
  • Do dpkg-reconfigure exim4-config and configure smarthost mode. Also use it to find out whether you are using split or big configuration. You will also probably want to enable “mailname hiding”, otherwise your return-path will contain an unroutable address.
  • Set up ssh transport in exim4 – add the following to the config file:
      ssh_pipe:
      debug_print = "T: ssh_pipe for smarthost delivery"
      driver = pipe
      path = "/bin:/usr/bin:/usr/local/bin"
      command = "ssh nc -w1 localhost smtp"
      message_prefix = "HELO\r\n"

    (it would be nicer if we used the actual smarthost configuration option value and our notebook’s hostname instead of hardcoded strings, I guess).

  • In the smarthost: section of the configuration file, replace transport = remote_smtp_smarthost with transport = ssh_pipe.
  • Run /etc/init.d/exim4 reload and voilà, sending mail from anywhere should work now!
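If you would rather not hand-edit authorized_keys, the server-side step above can be sketched in a few lines of Python. This is only an illustration of how the restricted key line is put together; the function name and the example key in the usage note are made up, and the options string is the one quoted in the list above.

```python
# Options from the HOWTO: force the relay command, forbid everything else.
RELAY_OPTIONS = (
    'command="nc -w1 localhost smtp",'
    "no-agent-forwarding,no-port-forwarding,no-X11-forwarding"
)


def restrict_key(public_key_line, options=RELAY_OPTIONS):
    """Return an authorized_keys entry that can only be used for mail relaying."""
    key = public_key_line.strip()
    if not key or key.startswith("#"):
        raise ValueError("expected a single public key line")
    # authorized_keys format: options, one space, then the key itself
    return f"{options} {key}"
```

For example, `restrict_key("ssh-rsa AAAA...EXAMPLE Debian-exim@notebook")` (a placeholder, not a real key) yields a line that starts with the command= restriction and ends with the untouched key, ready to be appended to ~me/.ssh/authorized_keys.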

I *wish* setting up roaming SMTP nodes would be way easier nowadays and I wouldn’t have to eventually spend about 90 minutes on this stuff…

Categories: linux

systemd: journal listing on /dev/tty12

February 12th, 2014 5 comments

Inspired by the Debian CTTE deliberations on the new default init for Debian, I installed systemd on my notebook after tonight’s forced reboot and played with it a little.

(And I like it! I was very sceptical when I first heard about systemd, but after reading a lot of discussions and trying it myself, I find most of the problematic points either fixed already or a load of FUD. The immediate big selling point for me is actually journald; it and its integration with systemctl are really awesome. I’ll actually find systemd more useful on servers than desktops, I think.)

While writing one is a nice exercise for anyone wanting to get familiar with systemd, I still decided to share a tidbit – a service file that will make log entries show up on /dev/tty12. Many people run with rsyslogd set up for this; you’ll want to disable that (by default, all journal entries are forwarded to rsyslog). The advantage of showing journal entries instead is mainly color coding. :)

The file listing follows, or get it here.

# Simple systemd service that will show journal contents on /dev/tty12
# by running journalctl -af on it.
# Install by:
#  - Saving this as /etc/systemd/system/journal@tty12.service
#  - Running systemctl enable journal@tty12
#  - Running systemctl start journal@tty12
# journald can also log on console itself, but current Debian version won't
# show timestamps and color-coding.
# systemd is under LGPL2.1 etc, this is inspired by getty@.service.

[Unit]
Description=Journal tail on %I
After=systemd-user-sessions.service plymouth-quit-wait.service systemd-journald.service

# On systems without virtual consoles, don't start any getty. (Note
# that serial gettys are covered by serial-getty@.service, not this
# unit.)

[Service]
# the VT is cleared by TTYVTDisallocate
ExecStart=/bin/sh -c "exec /bin/journalctl -af > /dev/%I"

# Unset locale for the console getty since the console has problems
# displaying some internationalized messages.


(P.S.: Creating this service file – my very first one – took me 10 minutes total, including studying documentation and debugging two stupid mistakes I made.)

Categories: linux

GPS coordinates of Czech towns and municipalities

February 1st, 2014 10 comments

To display weather sonde landing locations on IRC, I needed a list of coordinates of Czech towns in a simple CSV format, but it turned out to be surprisingly hard to get hold of one. There is a table on an astronomy website, but its selection of municipalities is rather odd, in some places it lists only a part of a municipality instead of the municipality itself, etc.

In the end I went the do-it-yourself way, combining a list on Wikipedia, the Google Geocoding API and a bit of XPath.

A list of a reasonable subset of towns can be obtained e.g. using:

curl '' |
  sed -ne 's/^# \[\[\([^]|]*|\)*\([^]]*\)\]\].*/\2/p' | sort

Then, given the name of a municipality, I can get its coordinates with this incantation:

m=Aš; curl -s ''"${m// /+},+CZ"'&sensor=false' |
  xmllint --xpath '//location[lat or lng]//text()' -

(The important trick is the ,CZ; otherwise Google will know plenty of Kolíns and Aš will mean American Samoa. Alternatively, you can pick just the Czech results using the XPath //result[address_component/short_name/text()="CZ"]/geometry/location[lat or lng]//text().)

Now, to generate a simple CSV, it’s enough to put these together:

curl '' |
  sed -ne 's/^# \[\[\([^]|]*|\)*\([^]]*\)\]\].*/\2/p' | sort |
  while read m; do
    echo -n $m
    curl -s ''"${m// /+},+CZ"'&sensor=false' |
      xmllint --xpath '//location[lat or lng]//text()' - |
      tr -s '\n' ' ' | tr ' ' ','
    sleep 0.1
  done | sed 's/,$//'
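The same filtering can be sketched in Python for those who prefer it to xmllint. This is just an illustration, not what I actually ran: the sample response below is hand-written to mimic the shape of the geocoding XML that the XPath above relies on (result, address_component/short_name, geometry/location), its coordinates are approximate illustrative values, and the function names are made up.

```python
import xml.etree.ElementTree as ET

# Hand-written sample mimicking the geocoder's XML; values are illustrative.
SAMPLE_RESPONSE = """<GeocodeResponse>
 <status>OK</status>
 <result>
  <address_component><short_name>AS</short_name></address_component>
  <geometry><location><lat>-14.2710</lat><lng>-170.1322</lng></location></geometry>
 </result>
 <result>
  <address_component><short_name>CZ</short_name></address_component>
  <geometry><location><lat>50.2239</lat><lng>12.1950</lng></location></geometry>
 </result>
</GeocodeResponse>"""


def czech_locations(xml_text):
    """Yield (lat, lng) strings of results that have a CZ address component."""
    root = ET.fromstring(xml_text)
    for result in root.iter("result"):
        codes = [c.findtext("short_name")
                 for c in result.findall("address_component")]
        if "CZ" not in codes:
            continue  # skip namesakes elsewhere in the world
        location = result.find("geometry/location")
        if location is not None:
            yield location.findtext("lat"), location.findtext("lng")


def csv_line(name, xml_text):
    """Format the first Czech match as a 'name,lat,lng' CSV line."""
    for lat, lng in czech_locations(xml_text):
        return f"{name},{lat},{lng}"
    return name  # no match: keep the name so the gap is visible
```

With the sample above, `csv_line("Aš", SAMPLE_RESPONSE)` skips the American Samoa result and uses the Czech one.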

Would you like the finished CSV?

Bonus: A similarly generated CSV with Prague districts (cadastral areas).

Bonus 2: And one more CSV with municipalities with delegated powers (further large municipalities and towns).

Categories: linux

Brmson / BlanQA

January 27th, 2014 No comments

I have recently been dabbling in Natural Language Processing, in particular Question Answering. I have been fascinated by the success of IBM Watson and have gradually come to believe that this technology can serve as a great basis for autonomous agents operating in the complex world of human knowledge. (I later came across Project Aristo – I’m not alone.) This approach, compared to projects like OpenCog that aim to create autonomous agents understanding and operating in the physical world, seems to offer many advantages – but let’s talk about that some other time.

Let’s say we wanted to take a stab at approximating IBM Watson with easily available technology, in “at home” conditions (or rather, “at hackerspace” – I gave this aim a temporary callsign “Project Brmson”). What’s the best we can do?

So I took a look at the current open source question-answering technologies and found – well, just one, and none that would be immediately usable by anyone. I have put together a short survey of the current landscape.

The only OSS framework I found that (i) could be used with not-so-many modifications to produce something functional, and (ii) would be a good base to build a truly good system on, is OAQA / OpenQA. It seems appealing from multiple viewpoints – it builds on the UIMA unstructured data processing platform, which is also at the basis of IBM Watson; it originates at CMU, which collaborated with IBM in this area; and, well, it’s the only platform that already exists anyway, so it’s a good starting point for someone who has no prior clue about the field. An honorable mention goes to OpenEphyra, basically a non-UIMA OAQA predecessor by the same institution; it’s not a good base to use for new systems, but can be mined for a lot of NLP functionality.

In my first stab, I checked whether there is actually a working QA system built on top of OAQA, and the answer was non-obvious. There is a helloqa project, but its master branch can currently do nothing useful. However, there is also a prototype branch that can actually answer some terrorism-related questions! It doesn’t work out of the box, but our fork does if you follow the instructions. But overall the project seems to be a bit of a hack and not a good base for a universal system usable by anyone but the original author.

So I set out to rewrite the helloqa-prototype from scratch on top of OAQA and build a different, clean and extendable QA pipeline (that shares bits of the original code and is much simpler). Thus, behold the project BlanQA! :-)

BlanQA is focused on universality, practicality and user-friendliness. That means there is relatively detailed documentation and easy-to-follow installation instructions (try BlanQA out yourself!). By default, BlanQA offers an interactive mode and will answer on top of the Project Gutenberg corpus; but you can also connect it to IRC (#brmson @ freenode) or run it on top of Wikipedia.

BlanQA is still a very stupid program at this point. It gets the answer right about 10-30% of the time, depending on how nicely you ask. But it’s more important as a base on top of which you can add clever algorithms (the smartest parts of BlanQA are currently outsourced from the OpenEphyra project, mainly guessing the type of the answer – is it a person? location? amount of something?). And if you want an OSS question-answering engine now, BlanQA is where to turn!

I want to develop this further, but the way ahead remains a little unclear. The thing is, OAQA appears to have significant architectural problems, as I realized while I continued hacking BlanQA and learning more about both OAQA and the UIMA framework it builds on top of. The rest of this section is a bit technical; cf. also a quick intro to BlanQA architecture.

The basic UIMA principle is that each artifact (in this case: question, document/passage, answer) should have its own CAS (“piece of data” with a set of annotations and other featuresets derived from it) with a dedicated type system and appropriate Sofa (view of this piece of data). This would enable easy creation of stand-off annotations of e.g. fetched documents.

However, the OAQA model works with just a single CAS that has just the question text set as a Sofa and then a variety of types mashed together, partitioned only into phase-based views. This seems to me a substantially less appealing option – it doesn’t allow using third-party UIMA annotators that expect their subject to be the Sofa, it might be harmful for scaleout and it seems generally awkward to use; I actually have a hard time seeing what advantages using UIMA brings to the table in this model.
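To make the difference more tangible, here is a toy Python model of the per-artifact idea. It is emphatically not the UIMA API (UIMA is a Java framework); the `Cas`, `Annotation` and `tag_capitalized` names are invented for illustration, and the "annotator" is a deliberately dumb stand-in for a third-party component that only ever looks at the Sofa it is handed.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str
    begin: int  # character offsets into the Sofa (stand-off annotation)
    end: int


@dataclass
class Cas:
    sofa: str  # the piece of data this CAS is about
    annotations: list = field(default_factory=list)

    def covered_text(self, ann):
        return self.sofa[ann.begin:ann.end]


def tag_capitalized(cas):
    """Toy annotator: marks capitalized tokens of whatever the Sofa is."""
    pos = 0
    for token in cas.sofa.split():
        begin = cas.sofa.index(token, pos)
        pos = begin + len(token)
        if token[0].isupper():
            cas.annotations.append(Annotation("Capitalized", begin, pos))


# One CAS per artifact: the same annotator works on a question and a passage.
question = Cas("Who wrote Dracula?")
passage = Cas("Dracula was written by Bram Stoker.")
for cas in (question, passage):
    tag_capitalized(cas)
```

With a single CAS whose Sofa is only the question text, there would be nothing for `tag_capitalized` to run on when it comes to the passage; the fetched document would have to be smuggled in as some custom type instead, which is exactly the awkwardness described above.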

So it seems the way forward for BlanQA (or likely a differently-named successor) is to break away from OAQA and build directly on top of UIMA (possibly with a hacked version of uima-ecd that supports multiple CASes, but that seems a somewhat intimidating proposition).

Tue Jan 28 2014 update: Note that we have started work on a new Question Answering engine YodaQA built on UIMA from scratch.

Categories: software

Weathersonde – Nearby Landing Notification

January 26th, 2014 No comments

At our hackerspace brmlab, one of the things we do is picking up landed weather sondes. In short, fun hardware literally falling from the sky, several times a day, every day. These are stratospheric balloons used for weather prediction, launched from various sites; each one climbs to an altitude of about 35 km, then the balloon bursts and the sonde lands back on the ground at a random location. The whole time, it transmits its current GPS coordinates via radio, making this a rather exciting sub-class of geocaching.

As a simple hack today (idea by chido), I created a simple script that is designed to be run three times a day; it runs sonde trajectory prediction (a service – example) and if the sonde is predicted to land within a certain radius, reports that with a link to the prediction. By default, it is connected to jendabot, one of our brmlab IRC robotic minions, written in an appealingly crazy way as a collection of bash scripts.
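The "lands within a certain radius" check boils down to a great-circle distance. The actual script is not shown here, so the following Python sketch is only one plausible way to do it: it uses the standard haversine formula with a mean Earth radius of 6371 km, and the function names, message format and the URL argument are placeholders of mine.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; good enough for this purpose


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def landing_report(home, predicted, radius_km, url):
    """Return a message if the predicted landing is within radius_km of home."""
    d = distance_km(home[0], home[1], predicted[0], predicted[1])
    if d <= radius_km:
        return f"Sonde predicted to land {d:.0f} km away: {url}"
    return None  # too far, stay quiet
```

A cron job would call `landing_report` with the hackerspace coordinates and the landing point from the prediction, and pass any non-empty result on to the IRC bot.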

Categories: life, software

Mice TV!

January 20th, 2014 1 comment

Chido has two pet mice (Acomys cahirinus, actually) and we finally did what we had planned to do long ago:


A live video stream of our mouse palace!

Some technical trivia – the IP cam used is an Edimax IC-3110 (it’s pretty crappy, not recommended) and we are restreaming using this vlc invocation:

cvlc -L --sout "#transcode{vcodec=mp4v,vb=1024,scale=1}:duplicate{dst=http{mux=ts,dst=:8090/mouses.flv},select=noaudio}" --no-sout-rtp-sap --no-sout-standard-sap --sout-ts-shaping=1000 --sout-ts-use-key-frames --ttl=40 rtsp://admin:PASSWORD@192.168.6.X:554/ipcam.sdp

(I did not get h264 FLV stream working reliably, unfortunately. I tried #transcode{vcodec=h264,venc=x264{keyint=20},vb=4096,scale=1} and duplicate{dst=http{mux=ffmpeg{mux=flv},dst=...,select=noaudio}.)

Categories: linux