Keras for Binary Classification

January 13th, 2016

I hadn’t gotten around to seriously playing with Keras (a powerful library for building fully differentiable machine learning models, aka neural networks) beyond running a few examples – until now. And I was a bit surprised at how tricky it actually was for me to get a simple task running, despite (or maybe because of) all the docs already available.

The thing is, many of the “basic examples” gloss over exactly what the inputs and, mainly, the outputs look like, and that’s important. Especially since for me, the archetypal simplest machine learning problem is binary classification, but in Keras the canonical task is categorical classification. Only after fumbling around for a few hours did I realize this fundamental rift.

The examples (besides LSTM sequence classification) silently assume that you want to classify into categories (e.g. to predict words etc.), not do a binary 1/0 classification. The consequence is that if you naively copy the example MLP at first, before learning to think about it, your model will never learn anything and, to add insult to injury, will always show the accuracy as 1.0.

So, there are a few important things you need to do to perform binary classification:

  • Pass output_dim=1 to your final Dense layer (this is the obvious one).
  • Use sigmoid activation instead of softmax – obviously, softmax on a single output will always normalize whatever comes in to 1.0.
  • Pass class_mode='binary' to model.compile() (this fixes the accuracy display, possibly more; you will also want to pass show_accuracy=True to model.fit()).
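
The softmax point can be checked without Keras at all. Below is a minimal sketch in plain Python; the softmax and sigmoid helpers are illustrative definitions written for this example (they are not Keras functions), and the input values are arbitrary.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats (illustrative helper)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1) (illustrative helper)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # avoids overflow for very negative x
    return e / (1.0 + e)


# With a single output unit, softmax has nothing to normalize against,
# so it returns [1.0] no matter what the network computed.
for logit in (-3.0, 0.0, 4.2):
    print(logit, softmax([logit]), sigmoid(logit))
```

Whatever the single input is, softmax returns [1.0], so the output carries no information for the loss to work with, while sigmoid still varies with the input.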

Other lessons learned:

  • For some projects, my approach of first cobbling up an example from existing code and then thinking harder about it works great; for others, not so much…
  • In IPython, do not forget to reinitialize model = Sequential() in some of your cells – a lot of confusion ensues otherwise.
  • Keras is pretty awesome and powerful. Conceptually, I think I like NNBlocks’ usage philosophy more (regarding how you build the model), but sadly that library is still at a very early stage (I have created a bunch of GitHub issues).

(Edit: After a few hours, I toned down this post a bit. It wasn’t meant at all to be an attack on Keras, though someone might perceive it as such. It’s just a word of caution to fellow Keras newbies. And it shouldn’t take much to improve the Keras docs.)

Categories: ailao, software

My Conky setup

December 19th, 2015

A couple of weeks ago, I created my own fairly elaborate setup of the Conky system monitor. I have been wanting to fix up some of the weather display aspects, but realistically I’m not getting around to that anytime soon.

So, I have pushed it out to GitHub now.

Categories: linux

Linked Data Mashups

November 13th, 2015

I’m still working on YodaQA and there is quite some interest in it in my mailbox. One thing leads to another and our startup Ailao already has its first few customers; we work together on various related semantic NLP / search projects.

In YodaQA, we have a much neater web interface as well as a mobile app, since the natural way to interact with a QA system is by voice. Plus, on a limited domain (movies), we are getting pretty close to crossing the 80% mark for accuracy on simpler questions, entering the “magic zone” where people might start really trusting the system. A few essential blocks for that are still in the pipeline, though.

I’ll try to post a bit more about YodaQA and other work we are doing in the coming weeks / months (as well as some of my hobby projects, of course).

For a course taught by Jan Šedivý, I prepared a presentation on building apps around the semantic web and linked data. See it here for an intro to the tech; it also includes two silly web mashups that might be inspiring.

Categories: ailao, software

YodaQA Question Answering

April 27th, 2015

I was working on Question Answering last year. Guess what, I’m still on it!

I threw away my first prototype BlanQA and started building a second system, YodaQA. It currently has reasonable performance, answering about a third of trivia questions properly and listing the correct answer among the top five candidates for half of the questions – without doing any googling or binging.

A few weeks ago, I published the first paper on YodaQA. With a few fellow scientists, we also re-started the qa-oss Google Group on open source question answering systems.

Today, I finally made a proper homepage for YodaQA and launched a live demo of the system. It’s pretty primitive, but hopefully will serve as a proof of concept.

Categories: ailao, software

Michi – 15×15 ~6k KGS in 540 lines of Python code

March 25th, 2015

So what’s the strongest program you can make with minimum effort and code size while keeping maximum clarity? Chess programmers have been exploring this for a long time, e.g. with Sunfish, and that inspired me to try out something similar in Go over a few evenings recently:

https://github.com/pasky/michi

Unfortunately, while the rules of Chess are perhaps more complicated for humans, it is a much easier game for computers to play! So the code is longer and more complicated than Sunfish, but hopefully a Computer Go newbie can still understand it within a few hours. I welcome any feedback and/or pull requests.

Contrary to other minimalistic UCT Go players, I wanted to create a program that actually plays reasonably. It can beat many beginners and on 15×15 fares about even with GNUGo; even on 19×19, it can win about 20% of its games against GNUGo on a beefier machine. Based on my observations, the limiting factor is time – Python is sloooow and a faster language with the exact same algorithm should be able to speed this up at least 5x, which should mean a gain of at least two ranks. I also mean to leave the code as my legacy, as I’m not sure I’ll ever get back to Pachi – these are the parts of a Computer Go program I consider most essential. The biggest code omission wrt. strength is probably the lack of 2-liberty semeai reading and of more sophisticated self-atari detection.

P.S.: The 6k KGS estimate is based on playtesting against GNUGo over 40-60 games – the winrate is about 50% with 4000 playouts/move. Best I can do… But you can connect the program itself to KGS too:

http://www.gokgs.com/gameArchives.jsp?user=michibot

Categories: software

Suspend out-of-focus Firefox to save battery from useless CPU usage

November 29th, 2014

My Firefox (or rather Iceweasel) is prone to constantly spinning and eating about 50-70% CPU on average when it is supposed to just sit idle. I tried to find the root cause, but the JavaScript profiler sees nothing and other forays didn’t turn up much (I discovered that the spinning progress wheel shown while some tab is forever loading accounts for about 20% CPU, though). Not sure if always, but sometimes it spins within WebGLImageConverter::run() (no callgraph, sorry).

So in the end I decided to treat the symptoms instead. Cleaning my CPU fan helped with the noise, but the main problem is that running firefox cuts my battery life to about 1/2 to 1/3. So one obvious solution would be to just stop the damn process when I don’t use it. I typically don’t do background downloads while on battery (or otherwise), so that means I want it stopped when the window is inactive – not in focus. This is a surprisingly exotic idea, apparently, and not easy to do in most window managers.

I even tried switching to the awesome or i3 window managers, which should make this easy, but I’m psychologically not up to that; perhaps I’m too conservative, but I decided not to stick with them. I use the MATE desktop environment with the marco window manager. Perhaps switching to sawfish would be a good option, but in the end I just decided to write a shell script that periodically assesses the situation and suspends or resumes firefox as needed. Of course this introduces extra wakeups and ambient CPU load, but when powertop reports that my GoogleTalkPlugin (running at all times for whatever reason) wakes up 150 times per second, the powersaving situation on Linux is still too messy – so who cares?

Here goes the script, in the hope that it will be useful for someone else too. Run it in a terminal or backgrounded in your ~/.xprofile; it will stop the firefox process when it has been out of focus for more than 10s while on battery, and resume it within a second of switching back. In practice, I have found these timings completely acceptable so far, and haven’t noticed any ill effects of the constant STOP/CONT either.

firefox-suspender.sh:

#!/bin/bash
#
# firefox-suspender: Periodically check whether firefox is out of focus
# and STOP it in that case after a time delay; if in focus but stopped,
# send SIGCONT.
#
# (c) Petr Baudis <pasky@ucw.cz>  2014
# MIT licence if this is even copyrightable
 
loop_delay=1    # [s]
stop_delay=10   # [s]
 
last_in_focus=$(date +%s)
firefoxpid=
state=running
 
while true; do
  sleep $loop_delay
 
  # Get active window id
  window=$(xprop -root _NET_ACTIVE_WINDOW)
  window=${window#*# }
  # What kind of window is it?
  class=$(xprop -id "$window" WM_CLASS)
  # echo Active window $window, class $class
 
  if [[ "$class" =~ Navigator ]]; then
    # Firefox!  We know it is running.  Make sure we
    # have its pid and update the last seen date.
    # If we stopped it, resume again.
    if [ "$state" = stopped ]; then
      echo "$(date)  Resuming firefox @ $firefoxpid"
      if kill -CONT $firefoxpid; then
        state=running
      else
        firefoxpid=
      fi
    fi
    last_in_focus=$(date +%s)
    if [ -z "$firefoxpid" ]; then
      firefoxpid=$(pidof iceweasel)
    fi
    if [ -z "$firefoxpid" ]; then
      firefoxpid=$(pidof firefox)
    fi
 
    continue
  fi
 
  # Not Firefox!  If it's running, we are on battery and
  # it's been long enough, stop it now.
  if [ "$state" != running ]; then
    continue
  fi
 
  read battery </sys/class/power_supply/BAT0/status
  if [ "$battery" != Discharging ]; then
    continue
  fi
 
  if [ $(($(date +%s) - last_in_focus)) -ge $stop_delay ]; then
    echo "$(date)  Stopping firefox @ $firefoxpid"
    if ! kill -STOP $firefoxpid; then
      firefoxpid=
    fi
    state=stopped
  fi
done

Categories: linux

BIPOP-CMA-ES Patch

October 24th, 2014

In part of my research, I have been heavily involved with building portfolios of optimization algorithms. Optimization algorithms lie at the root of many computational tasks, from designing laser mirror systems to neural network training. We want to find a minimum (or maximum) of some mathematical function, and for some functions it’s easier than for others.

For very many fairly hairy functions, the best state-of-the-art optimization algorithm is an evolution strategy (a relative of genetic algorithms) called CMA-ES. It also has a very nice Python implementation by its original author, Nikolaus Hansen.

CMA-ES is still not as good as it could be on some functions with many local optima, but its performance can be much improved by establishing a restart strategy that will repeatedly restart it with varying population size and parameters. The best performing restart strategy is BIPOP-CMA-ES and unfortunately, it had no Python implementation so far. I took care of that more than a month ago, but it’s taking some time to get my modifications upstreamed, so in case anyone finds it useful:

Here is a patch for CMA-1.1.02 adding the BIPOP restart strategy.

Categories: software

A 16-color default-ish vim color scheme for xterm-256color

May 20th, 2014

I recently switched to xterm-256color in my konsole, but unfortunately I found that vim looks exceedingly ugly there. The colors were all washed out and difficult to read at reduced brightness. I decided to explore some alternative color schemes, including popular ones like solarized etc., but they just don’t work for me – I have the default color scheme burned into my mind and I really like its high contrast, and I can still easily stare at it for 12+ hours in a row. It also works great even in adverse light conditions on a notebook.

In the end, I didn’t find any dark vim color scheme that would just look like its default 16-color color scheme. (The light background color scheme looks mostly the same in 16 and 256 colors by default.) So I had to create my own – 256like16.vim; drop it in ~/.vim/colors. You may want to edit yours to add some more bolds to make it look exactly like the 16-color scheme, but I ended up liking this one more, after all.

(You will need to install colorsupport.vim so that GUI color settings are used in the 256-color terminal. This particular script worked by far the best for me, and :ColorSchemeBrowse is also great when exploring schemes.)

Categories: linux

CLIPBOARD cut’n’paste in xterm

May 16th, 2014

Call me old-fashioned but I’m still using xterm on my desktop computer (where I use just fluxbox as my window manager) – it suits me just fine, but for one thing that I finally managed to solve. xterm by default ignores the clipboard, and none of the previously published solutions cut it for me, until now.

In X11, we have two commonly used selection buffers – PRIMARY and CLIPBOARD:

  • PRIMARY is used when you simply highlight text in most applications, without pressing anything, and you can paste from it using the middle mouse button; it is of a fleeting nature and used for quick cut’n’paste; and it doesn’t work well with all applications, e.g. LibreOffice doesn’t put highlighted stuff there, at least in some contexts, and non-textarea HTML5 text edit widgets usually can’t handle the middle button for pasting.
  • CLIPBOARD is used when you use ctrl-c or ctrl-v and can be used even with the evil applications above, but the problem is it’s not supported by xterm well!

In most terminal emulators, you can use the clipboard either using menus or shift-ctrl-c / shift-ctrl-v. However, in xterm, the best you can do is either…

  • Make it use CLIPBOARD instead of PRIMARY, in the same manner – the moment you select any text in xterm, it will plaster it over whatever else was in the CLIPBOARD before, without any explicit action. This sucks.
  • Have a different set of bindings for selection to PRIMARY and CLIPBOARD. This is a lot better, but I’m out of modifiers since I use shift to cut’n’paste in terminal applications that use mouse themselves (e.g. elinks).

So, my solution is to bring in the shift-ctrl-c / shift-ctrl-v bindings! In your ~/.Xresources or ~/.Xdefaults, add

XTerm*VT100.*translations:      #override \
        Shift Ctrl <Key>C: select-end(CLIPBOARD, CUT_BUFFER0) \n\
        Shift Ctrl <Key>V: insert-selection(CLIPBOARD, CUT_BUFFER0)

(and don’t forget to xrdb ~/.Xresources afterwards).

Now, you can use shift-ctrl-v for pasting from CLIPBOARD, and almost use shift-ctrl-c for copying to CLIPBOARD. There is a catch – you must press shift-ctrl-c while you are still holding the mouse button, i.e. you press the left mouse button, drag your selection, then before releasing it, press shift-ctrl-c; thankfully, that can be done with one hand without too much cramping.

It’s a bit inconvenient because of this bug, and doesn’t quite work with left and right selection; maybe I will sometime get around to adding true clipboard support to xterm code, but I think this is good enough for me at this point. :)

Categories: linux

speedread – A simple terminal-based open source Spritz-alike.

March 2nd, 2014

A few hours ago, I read about Spritz, a new speed-reading app, and I was quite impressed by the idea. I know the underlying concept is not new and I didn’t even try out Spritz itself (they announced their idea but did not release the software, huh?), but this was the first time I heard about it and I really liked it!

So I decided to implement my own terminal version of this idea, acting as a regular command-line filter. Find the new tool speedread at:

https://github.com/pasky/speedread

(Yes, it may not work well for fiction. Or slides. Yes, it may not work well for emails that you want to just skim for keywords. But then there’s the other 80% of text I need to chew through that does not fall into any of these categories. I’ll have to keep trying it out for longer, but it might be really useful.)

Meanwhile, I have also learned about OpenSpritz, a web-based implementation of Spritz. It could be a good match for speedread!

Categories: software