### Archive

Archive for the ‘software’ Category

With kind support of the Medialab foundation and Jan Šedivý, we are looking hard for students in Prague to work with us on a summer internship! We actually have two options for you:

• eClub Summer Camp (main option) – we have some ambitious projects and ideas for you to try out if you are excited by machine learning, big data and artificial intelligence. Exploratory, exciting, state-of-art research without required previous in-depth knowledge! (Just good basic math and programming.)
• Summer Job (auxiliary option, full-time coder) – we need help polishing the edges of some of our projects, seeking students that are skilled programmers.

We are mainly affiliated with FEL CVUT, but we also have students from MFF UK and we’ll welcome students from other Czech universities too. As long as you are a competent programmer, want to do something more than yet another Android game, and willing to come in person three times a week – let’s do something groundbreaking together!

Categories: Tags:

## YodaQA Grand Challenge!

Recently, the YodaQA team is collaborating with Falk Pollok from RWTH Aachen who is interested in using Question Answering in education to help people digest what they have learned better and to generally assist with studying. To this end, he has created PalmQA – a QA application that multiplexes between many question answering backends, ensembling them together to a more accurate system.

Falk has built backends for IBM Watson’s DeepQA among others (Google, Evi, Kngine and MIT’s Start), but in the end, the combination of YodaQA and Wolfram Alpha is “a match made in heaven,” as Falk said in an email a short while ago.

As a finishing touch to his work (being submitted as a diploma thesis), Falk made a Grand Challenge – letting an independent third party make a list of 30 factoid questions of varying difficulty, and pitching the PalmQA against a wide variety of humans. Perhaps not quite as dramatic or grandiose as IBM Watson’s Jeopardy participation, but still a nice showcase of where we are now.

Well, PalmQA did great! 26 people competed, and typically could get about 15 out of the 30 right. The best human answered 24 questions correctly. But no matter – PalmQA managed to answer 25 out of 30 questions right!

So, in this challenge, Falk’s ensemble-enhanced YodaQA beats the best human!

As mentioned above, PalmQA offers integration of YodaQA with Wolfram Alpha, Google QA, MIT’s Start, Amazon’s Evi and Kngine. We hope to merge this ensembling system into the YodaQA project in the future!

I also entered just plain YodaQA into the Grand Challenge, in the configuration that’s running at live.ailao.eu right now. It got 18 questions right, still better than an average human! If we also included purely “computational” questions (algebra, unit conversions) that YodaQA just isn’t designed to answer (it’s still essentially a search engine), that’d make 24 questions out of 30. Pretty good!

See the Grand Challenge Github issue for more info. We should get the complete details of the challenge, comparisons to other public QA engines (like Google) etc. in Falk’s upcoming thesis.

This is how the plain YodaQA fared:

Question Text correct found
What is the capital of Zimbabwe? Harare Harare
Who invented the Otto engine? Nikolaus Otto Nikolaus Otto
When was Pablo Picasso born? 1881 1881
What is 7*158 + 72 – 72 + 9? 1115 78.182.71.65 78
Who wrote the novel The Light Fantastic? Terry Pratchett Terry Pratchett
In which city was Woody Allen born? New York New York
Who is the current prime minister of Italy? Matteo Renzi Matteo Renzi
What is the equatorial radius of Earth’s moon? 1738 the Moon and Su
When did the Soviet Union dissolve? 1991 1991
What is the core body temperature of a human? 37 Bio 42 and cour
Who is the current Dalai Lama? Tenzin Gyatso Tenzin Gyatso
What is 2^23? 8388608 the Gregorian c
Who is the creator of Star Trek? Gene Roddenberr Gene Roddenberr
In which city is the Eiffel Tower? Paris Paris
12 metric tonnes in kilograms? 12 *000 SI
Where is the mouth of the river Rhine? the Netherlands the Netherlands
Where is Buckingham Palace located? London London
Who directed the movie The Green Mile? Frank Darabont Frank Darabont
When did Franklin D. Roosevelt die? 1945 1945
Who was the first man in space? Yuri Gagarin Yuri Gagarin
Where was the Peace of Westphalia signed? Osnabrück France
Who was the first woman to be awarded a Nobel Priz Marie Curie Elinor Ostrom
12.1147 inches to yards? 0.3365194444 CUX 570 17 577
What is the atomic number of potassium? 19 19
Where is the Tiananmen Square? China China
What is the binomial name of horseradish? Armoracia Rusti Armoracia Rusti
How long did Albert Einstein live? 76 Germany
Who earned the most Academy Awards? . Walt Disney Jimmy Stewart
How many lines does the London Underground have? 11 Soho Revue Bar
When is the next planned German Federal Convention 1850
Categories: Tags:

## YodaQA learned to tweet

April 4th, 2016 1 comment

Guest post by Petr Marek (source)

YodaQA learned how to use twitter during easter holidays. You can ask it by sending tweet with question to @askYodaQA . YodaQA will answer you shortly. How is it possible? I created app in the Google’s App Script, which handles receiving question from twitter and answering them.

Why did I create it? YodaQA can reach more users in the new interesting form thanks to it. I believe they will help us to find even more ways how YodaQA can help them. It is pretty symbiosis. YodaQA will help twitter users, and they will help it back. Let’s look how it is made.

### The two important tools

The most important ingredient was Google’s App Script. It is basically JavaScript with the connection to Google services. You can make your own App Script apps in Google Drive. The best thing is that you can make triggers run the app every minute for example. And it’s for free.

The second thing you need is to create twitter app on account, which your bot will use to communicate with its followers. It will grant you access tokens, which you need to connect to twitter API. I used Twitter Lib for Google Apps Script to simplify the communication with API. It allowed me to tweet and get tweets with questions easily. You just need to call the right function with some arguments.

### General idea behind

That was the tools that I used. But how did I make it work? I will describe the general idea now. I set App Script project to run my code every minute. The code does basically two things.

The first step is to obtain answers from twitter and to ask YodaQA. Bot searches all tweets with @askYodaQA. It saves the users that tweeted them and the time when it found the tweets. Then it sends the text of the tweet to YodaQA. YodaQA replies with dialog id and question id, which it saves to the list of questions.

The second step is to go through list of questions and to ask YodaQA for answers to these questions. Bot sends questions to users as soon as the answers are finished. You can even rely on features of the Hub, such as dialogs and coreference resolution. Two questions are connected to dialog when they are asked within five minutes interval.

I said that I save some information. Where? I used spreadsheet as memory. I use one sheet as “user memory”, the second as “asked question list” and the last as memory for the id of the last served tweet. I even log some information into a Google Docs text document. It may sound simple (and it is simple), but it works.

You can try it right now. Just tweet question with @askYodaQA and answer will arrive within few minutes. You can even use hashtags or mention other users. They will also receive the answer.

You can see the whole code on GitHub. You can use it and modify it for your own twitter bots too, maybe on your own data?

Categories: Tags:

## Dialog for YodaQA!

One of our great student interns at eClub/Ailao Petr Marek who also made the current YodaQA web interface is now working on adding a new element to our ecosystem – the Hub. This is an interface between the web app and the YodaQA system which takes care of various tasks that don’t fit a “pure question answering” system well. For example, it tracks dialog context or allows domain-specific question handling (if you want to add support for retrieving current traffic information, TV schedules or custom question transformations).

We see voice (and chat) as the perfect fit for question answering systems like YodaQA, and together with this, dialog comes naturally. This is why Petr M. has recently transformed our live QA interface to the dialog format (and it now goes through the Hub). The dialog tracking is internally still relatively simplistic from a scientific point of view, but it’s more than enough to already create a great impression. And right now, in a simple, tongue-in-cheek informal test, YodaQA does great compared to “competition”!

See the complete presentation!

Categories: Tags:

## Live Streaming to HTML5?

March 13th, 2016 1 comment

We have our mice TV now streaming our colony of mus minutoides at the canonical URL http://mice.or.cz/ but it would be nice if you could watch them in your web browser (without flash) instead of having to open a media player for the purpose.

I gave that some serious prodding. We still use vlc with the same config as in the original post (mp4v codec + mpegts container). Our video source is an IP cam producing mp4v via rtsp and an important constraint is CPU usage as it runs on my many-purpose server (current setup consumes 10% of one CPU core). We’d like things to work in Debian’s chromium and iceweasel, primarily.

It seems that in the HTML5 world, you have these basic options:

• MP4/H264 in MP4 – this *does not work* with live streaming because you need to make sure the browser receives a correct header with metadata which normally occurs only at the top of the file; it might work with some horrible custom code hacks but nothing off-the-shelf
• VP80/VP90 in webm – this works, but encoding consumes between 150%-250% CPU! even with low bitrates; this may be okay for dedicated streaming servers but completely out of the question for me
• Theora in Ogg – this almost works, but the stream stutters every few seconds (or slips into endless buffering), making it pretty hard to watch; apparently some keyframes are lost and Theora homepage gives a caveat that Ogg encoding is broken in VLC; the CPU usage is about 30%, which would have been acceptable

That’s it for the stock video tag formats, apparently. There are two more alternatives:

• HTTP Live Stream (HLS) has no native support in browsers outside of mobile, might work with a hack like https://github.com/RReverser/mpegts but you may as well use MSE then
• Media Source Extensions (MSE) seem to allow basically implementing decoding custom containers (in javascript) for any codecs, which sounds hopeful if we’d just like to pass mp4v (or h264) through. The most popular such container is DASH, which seems to be all about fragmenting video to smaller HTTP requests with per-fragment bitrate negotiation, but still completely codec agnostic. Re Firefox, needs almost latest version. Media players support DASH too.

So far, the best looking courses seem to be:

• Media server nginx-rtmp-module (in my case with pull directive towards the ipcam’s rtsp) with mpeg-dash output and dash.js based webpage. I might have misunderstood something but it might actually just work (assuming that the bitrate negotiation could always end up just choosing the ipcam’s fixed bitrate; something very low is completely sufficient anyway).
• Debug libogg + libtheora to find out why it produces corrupted streams – have fun!
Categories: Tags:

## Keras for Binary Classification

January 13th, 2016 1 comment

So I didn’t get around to seriously (besides running a few examples) play with Keras (a powerful library for building fully-differentiable machine learning models aka neural networks) – until now. And I have been a bit surprised about how tricky it actually was for me to get a simple task running, despite (or maybe because of) all the docs available already.

The thing is, many of the “basic examples” gloss over exactly how the inputs and mainly outputs look like, and that’s important. Especially since for me, the archetypal simplest machine learning problem consists of binary classification, but in Keras the canonical task is categorical classification. Only after fumbling around for a few hours, I have realized this fundamental rift.

The examples (besides LSTM sequence classification) silently assume that you want to classify to categories (e.g. to predict words etc.), not do a binary 1/0 classification. The consequences are that if you naively copy the example MLP at first, before learning to think about it, your model will never learn anything and to add insult to injury, always show the accuracy as 1.0.

So, there are a few important things you need to do to perform binary classification:

• Pass output_dim=1 to your final Dense layer (this is the obvious one).
• Use sigmoid activation instead of softmax – obviously, softmax on single output will always normalize whatever comes in to 1.0.
• Pass class_mode='binary' to model.compile() (this fixes the accuracy display, possibly more; you want to pass show_accuracy=True to model.fit()).

Other lessons learned:

• For some projects, my approach of first cobbling up an example from existing code and then thinking harder about it works great; for others, not so much…
• In IPython, do not forget to reinitialize model = Sequential() in some of your cells – a lot of confusion ensues otherwise.
• Keras is pretty awesome and powerful. Conceptually, I think I like NNBlocks‘ usage philosophy more (regarding how you build the model), but sadly that library is still very early in its inception (I have created a bunch of gh issues).

(Edit: After a few hours, I toned down this post a bit. It wasn’t meant at all to be an attack at Keras, though it might be perceived by someone as such. Just as a word of caution to fellow Keras newbies. And it shouldn’t take much to improve the Keras docs.)

Categories: Tags:

I’m still working on YodaQA and there is quite some interest in it in my mailbox. One thing leads to another and our startup Ailao already has a few first customers, we work together on various related semantic NLP / search projects.

In YodaQA, we have a much neater web interface as well as a mobile app as the natural way to interact with a QA system is using your voice. Plus, on a limited domain (movies), we are getting pretty close to crossing the 80% mark for accuracy on simpler questions, entering the “magic zone” where people might start really trusting the system. A few essential blocks for that are still in the pipeline, though.

I’ll try to post a bit more about YodaQA and other work we are doing in the coming weeks / months (as well as some of my hobby projects, of course).

For a course of Jan Šedivý, I prepared a presentation on building apps around the semantic web and linked data. See it here for an intro to the tech, it also includes two silly web mashups that might be inspiring.

Categories: Tags:

April 27th, 2015 1 comment

I was working on Question Answering last year. Guess what, I’m still on it!

I threw away my first prototype BlanQA and started building a second system, YodaQA. It currently has reasonable performance of answering about a third of trivia questions properly and listing the correct answer in top five candidates for half of the questions – without doing any googling or binging.

A few weeks ago, I published the first paper on YodaQA. With a few fellow scientists, we also re-started the qa-oss Google Group on open source question answering systems.

Today, I finally made a proper homepage for YodaQA and launched a live demo of the system. It’s pretty primitive, but hopefully will serve as a proof of concept.

Categories: Tags:

## Michi – 15×15 ~6k KGS in 540 lines of Python code

So what’s the strongest program you can make with minimum effort and code size while keeping maximum clarity? Chess programers were exploring this for long time, e.g. with Sunfish, and that inspired me to try out something similar in Go over a few evening recently:

Unfortunately, Chess rules are perhaps more complicated for humans, but much easier to play for computers! So the code is longer and more complicated than Sunfish, but hopefully it is still possible to understand it for a Computer Go newbie over a few hours. I will welcome any feedback and/or pull requests.

Contrary to other minimalistic UCT Go players, I wanted to create a program that actually plays reasonably. It can beat many beginners and on 15×15 fares about even with GNUGo; even on 19×19, it can win about 20% of its games with GNUGo on a beefier machine. Based on my observations, the limiting factor is time – Python is sloooow and a faster language with the exact same algorithm should be able to speed this up at least 5x, which should mean at least two ranks level-up. I attempt to leave the code also as my legacy, not sure if I’ll ever get back to Pachi – these parts of a Computer Go program I consider most essential. The biggest code omission wrt. strength is probably lack of 2-liberty semeai reading and more sophisticated self-atari detection.

P.S.: 6k KGS estimate has been based on playtesting against GNUGo over 40-60 games – winrate is about 50% with 4000 playouts/move. Best I can do… But you can connect the program itself to KGS too:

http://www.gokgs.com/gameArchives.jsp?user=michibot

Categories: software Tags: