Archive for the ‘software’ Category

Modern CUDA + CuDNN Theano/Keras AMI on AWS

January 22nd, 2017 2 comments

Wow, what a jargon-filled post title. Basically, we do a lot of our deep learning currently on the AWS EC2 cloud – but to use the GPU there with all the goodies (up to CuDNN that supports modern Theano’s batch normalization) is a surprisingly arduous process which you basically need to do manually, with a lot of trial and error and googling and hacking. This is awful, mind-boggling and I hate that everyone has to go through this. So, to fix this bad situation, I just released a community AMI that:

  • …is based on Ubuntu 16.04 LTS (as opposed to 14.04)
  • …comes with CUDA + CuDNN drivers and toolkit already set up to work on g2.2xlarge instances
  • …has Theano and Keras preinstalled and preconfigured so that you can run the Keras ResNet model on a GPU right away (or anything else you desire)

To get started, just spin up a GPU (g2.2xlarge) instance from community AMI ami-f0bde196 (1604-cuda80-cudnn5110-theano-keras), ssh in as the ubuntu@ user and get going! No hassles. But of course, EC2 charges apply.

Edit (errata): Actually, there’s a bug – sorry about that! Out of the box, the nvidia kernel driver is not loaded properly on boot. I might update the AMI later; for now, here is how to fix it manually:

  1. Edit /etc/modprobe.d/blacklist.conf (using for example sudo nano) and append the line blacklist nouveau to the end of that file
  2. Run sudo update-initramfs -u
  3. Reboot. Now, everything should finally work.
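
If you would rather script step 1 than edit the file by hand, here is a minimal Python sketch. It is not part of the AMI, and the helper name `ensure_blacklisted` is made up for illustration. It appends the line only when it is missing, so it is safe to run twice; you still need to run sudo update-initramfs -u and reboot afterwards.

```python
from pathlib import Path


def ensure_blacklisted(conf_path, module="nouveau"):
    """Append 'blacklist <module>' to conf_path unless it is already present.

    Returns True if the file was changed, False if the line already existed.
    (Hypothetical helper, for illustration only.)
    """
    path = Path(conf_path)
    line = f"blacklist {module}"
    text = path.read_text() if path.exists() else ""
    if any(existing.strip() == line for existing in text.splitlines()):
        return False
    # Start on a fresh line even if the file lacks a trailing newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as f:
        f.write(prefix + line + "\n")
    return True


# Usage (as root): ensure_blacklisted("/etc/modprobe.d/blacklist.conf")
```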

This AMI was created like this:

  • The stock Ubuntu 16.04 LTS AMI
  • NVIDIA driver 367.57 (older drivers do not support CUDA 8.0, while this is the last driver version to support the K520 GRID GPU used in AWS)
  • To make the driver setup go through, the trick to install apt-get install linux-image-extra-`uname -r` per
  • CUDA 8.0 and CuDNN 5.1 set up from the official though unannounced NVIDIA Debian packages by replaying the nvidia-docker recipes
  • bashrc modified to include cuda in the path
  • Theano and Keras from latest Git as of writing this blogpost (feel free to git pull and reinstall), and some auxiliary python-related etc. packages
  • Theano configured to use GPU and Keras configured to use Theano (and the “th” image dim ordering rather than “tf” – this is currently non-default in Keras!)
  • Example Keras deep learning models, even an elephant.jpg! Just run python
  • Exercise: Install TensorFlow on the system as well, release your own AMI and post its id in the comments!
  • Tip: Use nvidia-docker based containers to package your deep learning software; combine it with docker-machine to easily provision GPU instances in AWS and execute your models as needed. Using this for development is a hassle, though.
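
For reference, the Theano and Keras settings mentioned in the list above usually live in two small files, ~/.theanorc and ~/.keras/keras.json. The sketch below only renders plausible contents for them; the exact files on the AMI may differ, and the values shown (float32, the epsilon) are assumed Keras 1.x era defaults rather than something copied from the image.

```python
import configparser
import io
import json


def keras_config():
    """Keras 1.x style settings: Theano backend with the 'th' dim ordering."""
    return {
        "backend": "theano",
        "image_dim_ordering": "th",  # channels first, as opposed to "tf"
        "floatx": "float32",         # assumed default
        "epsilon": 1e-07,            # assumed default
    }


def theanorc_text():
    """Render a minimal .theanorc that makes Theano use the GPU."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep the capital X in floatX
    cfg["global"] = {"device": "gpu", "floatX": "float32"}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


print(json.dumps(keras_config(), indent=2))  # would go to ~/.keras/keras.json
print(theanorc_text())                       # would go to ~/.theanorc
```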


Categories: ailao, linux, software

Research at Ailao

June 6th, 2016 1 comment

Readers of this blog should be already a little bit familiar with the Ailao brand, which we use for spinning off and commercialization of our academic research. Originally, Ailao was all about text and question answering, but there was always the theme of dealing with unstructured data in general.

Nowadays, Ailao is not just me (Petr Baudiš) anymore – but a partnership with Tomáš Gogár and Tomáš Tunys, fellow PhD students! And we are widening our breadth to cover documents in general, developing a machine learning computational platform Ailao Brain (just a codename!) as well as working hard on some exciting end-user products. We are also working on a prettier website (including a new look for this blog) and many other things, but more on that soon.

What I wanted to point out is our talk at Machine Learning Meetups Prague. The talk itself (video) is in Czech, but you can enjoy at least our English slides on our bleeding edge technology research (webpage information extraction and text understanding). Stay tuned for more!

Categories: ailao, life, software

YodaQA’s abilities are enlarged by traffic domain

May 23rd, 2016 1 comment

Guest post by Petr Marek (source)

Everybody driving a car needs navigation to reach the destination quickly and avoid traffic jams. Two of the biggest problems are how to enter the destination quickly and how to find out where the congestion is, i.e. what the traffic situation looks like. YodaQA Traffic is a project that attempts to answer traffic-related questions quickly and efficiently. Drivers may ask questions in natural language like: “What is the traffic situation in the Evropská street?” or “What is the fastest route from Opletalova street to Kafkova street?” You can try out the prototype (demo available only for a limited time) – try asking, for example, “traffic situation in the Wilsonova street”.

YodaQA Traffic still has some limitations. Currently we only have a browser version, which is not suitable for smartphones, and it answers traffic questions for Prague’s streets only.

But as usual, this whole technology demo is open source – you can find it in the branch f/traffic-flow of our Hub project.

How does it work and where do we get the data from?

All YodaQA questions are first analyzed to recognize and select traffic questions. We do it in two steps. The first step is to recognize the question topic. We use six topics, such as traffic situation, traffic incident or fastest route. The topic is determined by comparing the semantic similarity of the user’s question with a set of reference questions. We estimate the similarity with our Dataset-STS Scoring API. Each reference question is labeled with a “topic”. The Sentence Pair Similarity algorithm selects the “topic” of the reference question with the highest similarity to the user’s question.
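
To make the topic step concrete, here is a small Python sketch of "pick the topic of the most similar reference question". The real system scores similarity with the Dataset-STS Scoring API; the word-overlap function below is only a stand-in so that the example runs on its own, and the reference questions are invented.

```python
def token_overlap(a, b):
    """Stand-in similarity: Jaccard overlap of lowercase word sets.

    The real system uses a learned sentence pair scorer instead.
    """
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def best_topic(question, references, similarity=token_overlap):
    """Return (topic, score) of the reference question closest to `question`."""
    scored = [(similarity(question, ref), topic) for ref, topic in references]
    score, topic = max(scored)
    return topic, score


# Hypothetical reference questions; the real set is not listed in this post.
REFERENCES = [
    ("what is the traffic situation in the street", "traffic situation"),
    ("is there an accident in the street", "traffic incident"),
    ("what is the fastest route from here to there", "fastest route"),
]

topic, score = best_topic("What is the traffic situation in Evropská street", REFERENCES)
print(topic, score)
```

Returning the score together with the topic matters later, because the score is what gets compared against a threshold to reject non-traffic questions.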

Next we need to recognize the location, i.e. the street name. This is handled by another tool called Label-lookup, which we normally use for entity linking in YodaQA. It compares the question’s words with a list of all street names in Prague, which we exported from OpenStreetMap. We do not require an exact match; we try to select the closest street name from the list.
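
The "closest street name" idea can be sketched with Python’s standard difflib module. This is not how Label-lookup is implemented; it is just an approximate string match over a tiny sample list, with a cutoff value picked arbitrarily for the example.

```python
import difflib


def closest_street(question, street_names, cutoff=0.75):
    """Return (street, score) for the street name closest to any question word,
    or (None, 0.0) if no word is similar enough to any street name."""
    by_lower = {name.lower(): name for name in street_names}
    best = (None, 0.0)
    for word in question.lower().replace("?", " ").split():
        matches = difflib.get_close_matches(word, list(by_lower), n=1, cutoff=cutoff)
        for candidate in matches:
            score = difflib.SequenceMatcher(None, word, candidate).ratio()
            if score > best[1]:
                best = (by_lower[candidate], score)
    return best


STREETS = ["Evropská", "Opletalova", "Kafkova", "Wilsonova"]  # tiny sample list

# A misspelled street name still resolves to the intended one.
print(closest_street("traffic situation in the Wilsonowa street", STREETS))
```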

The last step is to decide whether the question really is a traffic question, because the Dataset-STS API and Label-lookup can find a topic and a street name even in a pure movie question like “When was the Nightmare on Elm Street released?”. Fortunately, Dataset-STS and Label-lookup return not only the topic or street name but also a score. We created a dataset of over 70 traffic questions and over 300 movie questions and found the minimal score thresholds at which the recognition makes the lowest classification error on this dataset.
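
Finding such a threshold is a simple search. The sketch below handles a single score (the real setup has one for the topic and one for the street name) and uses made-up numbers; it tries every observed score as a threshold and keeps the one with the fewest misclassified questions.

```python
def best_threshold(scored_examples):
    """Pick the score threshold with the fewest misclassifications.

    `scored_examples` is a list of (score, is_traffic) pairs; a question is
    classified as traffic when score >= threshold.
    Returns (threshold, error_count); ties go to the lowest threshold.
    """
    candidates = sorted({score for score, _ in scored_examples})
    candidates.append(float("inf"))  # option: classify nothing as traffic
    best = None
    for t in candidates:
        errors = sum((score >= t) != is_traffic for score, is_traffic in scored_examples)
        if best is None or errors < best[1]:
            best = (t, errors)
    return best


# Made-up scores purely for illustration.
data = [(0.9, True), (0.8, True), (0.6, True), (0.7, False), (0.3, False), (0.2, False)]
threshold, errors = best_threshold(data)
print(threshold, errors)
```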

Once we know the type of question and the location, we start a small script that accesses traffic situation data from HERE Maps. The only complication is that the API doesn’t return the traffic situation for a particular street, only for a bounding box. To overcome this problem, we have to find a bounding box for the desired location, using an algorithm we developed for this purpose. Then we call the traffic flow API to acquire the information for all streets in the bounding box. Finally, we filter out the traffic situation for the desired street.
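
The box-then-filter flow might look roughly like this in Python. The post does not describe the actual bounding box algorithm, so this sketch simply assumes we already have a few (lat, lon) points for the street (for example from OpenStreetMap); the record layout with "street" and "jam" keys is also invented and is not the HERE API response format.

```python
def bounding_box(points, margin=0.0):
    """Smallest (south, west, north, east) box around a list of (lat, lon) points."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return (min(lats) - margin, min(lons) - margin,
            max(lats) + margin, max(lons) + margin)


def flow_for_street(flow_items, street):
    """Keep only the flow records that belong to the wanted street."""
    wanted = street.casefold()
    return [item for item in flow_items if item["street"].casefold() == wanted]


# Invented sample data: two points on the street and a fake flow response.
street_points = [(50.10, 14.35), (50.09, 14.39)]
flow_items = [
    {"street": "Evropská", "jam": 3.2},
    {"street": "Veleslavínská", "jam": 0.0},
]

box = bounding_box(street_points)
print(box, flow_for_street(flow_items, "evropská"))
```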

It was great fun to work on this application. It is not perfect, but it shows how to create intelligent assistants that help people deal with various everyday situations. We are also excited to see how users will use the new functionality of YodaQA and how it will help them.

Categories: ailao, software

Semantic Sentence Pair Scoring

May 20th, 2016 No comments

The blog has been a little bit silent – a typical sign of us working too hard to worry about that! But we’ll satisfy some of your curiosity in the coming weeks as we have about six posts in the pipeline.

The thing I would like to mention first is some fundamental research we work on now. I stepped back from my daily Question Answering churn, took a little look around and decided that the right thing to focus on for a while are the fundamentals of the NLP field, so that our machine learning works better and makes more sense. Warning: We’ll use some scientific jargon in this one post.

So, in the first months of 2016 I focused a huge chunk of my research on deep learning of natural language. That means neural networks used on unstructured text, in various forms and shapes and with various goals. I set some audacious goals for myself, fell short in some aspects, but hopefully still made some good progress. Here’s the deal – a lot of the current research is about processing a single sentence, maybe to classify its sentiment or translate it or generate other sentences. But I have noticed that many of the problems I have seen recently are about scoring a pair of sentences. So I decided to look into that and try to build something that (A) works better and (B) actually has an API, so that we can use it anywhere for anything.

My original goal was to build awesome new neural network architectures that will turn the field on its head. But I noticed that the field is a bit of a mess – there are a lot of tasks that are about the same thing, but very little cross-talk between them. So you get a paper that improves the task of Answer Sentence Selection, but could the models do better on the Ubuntu Dialogue task then, or on Paraphrasing datasets? Who knows! Meanwhile, each dataset has its own format and a lot of time is spent just writing the adapter code for it. Training protocols (from objectives to segmentation to embedding preinitializations) are inconsistent, and some datasets need a lot of improvement. So my goal turned to sorting out the field, cross-checking the same models on many tasks and providing a better entry point for others than I had.

Software: Getting a few students of the 3C group together, we have created the dataset-sts platform for all tasks and models that are about comparing two sentences using deep learning. We have a pretty good coverage (of both tasks and models), and more brewing in some side branches. It’s in Python and uses the awesome Keras deep learning library.

Paper: To kick things off research-wise, we have posted a paper Sentence Pair Scoring: Towards Unified Framework for Text Comprehension where we summed up what we have learned early in the process. A few highlights:

  • We have a lofty goal of building a universal text comprehension model, a sort of black box that eats your sentences and produces embeddings that correspond to their meaning, which you can use for whatever task you need to do. Long way to go, but we have found that a simple neural model trained on very large data is doing pretty well in this exact setting, even when applied to tasks and data that look very different from the original. Maybe we are on to something.
  • Our framework is state-of-the-art on the Ubuntu Dialogue dataset of 1M techsupport IRC dialogs, beating Facebook’s memory network models.
  • It’s hard to compare neural models because if you train a model 16 times with the same data, the result will always be somewhat different. Not a big deal with large test datasets, but a very big deal with small test datasets which are still popular in the research community. Almost all papers ignore this! Looking at the evolution of model performance in some areas like Answer Sentence Selection, we have found that most of the differences over the last year are far below the per-train variance we see.
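
To illustrate the last point, here is a small Python sketch of what "the difference is below the per-train variance" means. The numbers are made up and the comparison is a deliberately crude rule of thumb, not the statistical procedure from the paper: train each model several times, then check whether the gap between the mean scores is larger than the spread of the individual runs.

```python
import statistics


def summarize_runs(scores):
    """Mean and sample standard deviation of a metric over repeated trainings."""
    return statistics.mean(scores), statistics.stdev(scores)


def clearly_better(scores_a, scores_b):
    """Crude check: is the gap between the means larger than both spreads combined?"""
    mean_a, sd_a = summarize_runs(scores_a)
    mean_b, sd_b = summarize_runs(scores_b)
    return abs(mean_a - mean_b) > sd_a + sd_b


# Made-up scores of repeated training runs of three hypothetical models.
baseline = [0.70, 0.72, 0.71, 0.69]
tweak = [0.71, 0.70, 0.73, 0.70]         # mean differs by less than the run-to-run noise
breakthrough = [0.80, 0.81, 0.79, 0.80]  # gap is far larger than the noise

print(clearly_better(baseline, tweak), clearly_better(baseline, breakthrough))
```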

Please take a look, and tell us what you think! We’ll shortly cover a follow-up paper here that we also already posted, and we plan to continue the work by improving our task and model coverage further, fixing a few issues with our training process and experimenting with some novel neural network ideas.

More to come, both about our research and some more product-related news, in a few days. We will also talk about how the abstract-sounding research connects with some very practical technology we are introducing.

Studying in Prague? Join us at eClub Summer Camp!

April 13th, 2016 No comments

With kind support of the Medialab foundation and Jan Šedivý, we are looking hard for students in Prague to work with us on a summer internship! We actually have two options for you:

  • eClub Summer Camp (main option) – we have some ambitious projects and ideas for you to try out if you are excited by machine learning, big data and artificial intelligence. Exploratory, exciting, state-of-the-art research without required previous in-depth knowledge! (Just good basic math and programming.)
  • Summer Job (auxiliary option, full-time coder) – we need help polishing the edges of some of our projects, seeking students that are skilled programmers.

We are mainly affiliated with FEL CVUT, but we also have students from MFF UK and we’ll welcome students from other Czech universities too. As long as you are a competent programmer, want to do something more than yet another Android game, and are willing to come in person three times a week – let’s do something groundbreaking together!

Categories: ailao, software

YodaQA Grand Challenge!

April 7th, 2016 1 comment

Recently, the YodaQA team has been collaborating with Falk Pollok from RWTH Aachen, who is interested in using Question Answering in education to help people better digest what they have learned and to generally assist with studying. To this end, he has created PalmQA – a QA application that multiplexes between many question answering backends, ensembling them together into a more accurate system.

Falk has built backends for IBM Watson’s DeepQA among others (Google, Evi, Kngine and MIT’s Start), but in the end, the combination of YodaQA and Wolfram Alpha is “a match made in heaven,” as Falk said in an email a short while ago.

As a finishing touch to his work (being submitted as a diploma thesis), Falk made a Grand Challenge – letting an independent third party make a list of 30 factoid questions of varying difficulty, and pitting PalmQA against a wide variety of humans. Perhaps not quite as dramatic or grandiose as IBM Watson’s Jeopardy participation, but still a nice showcase of where we are now.

Well, PalmQA did great! 26 people competed, and typically could get about 15 out of the 30 right. The best human answered 24 questions correctly. But no matter – PalmQA managed to answer 25 out of 30 questions right!

So, in this challenge, Falk’s ensemble-enhanced YodaQA beats the best human!

As mentioned above, PalmQA offers integration of YodaQA with Wolfram Alpha, Google QA, MIT’s Start, Amazon’s Evi and Kngine. We hope to merge this ensembling system into the YodaQA project in the future!

I also entered just plain YodaQA into the Grand Challenge, in the configuration that’s running at right now. It got 18 questions right, still better than an average human! If we also included purely “computational” questions (algebra, unit conversions) that YodaQA just isn’t designed to answer (it’s still essentially a search engine), that’d make 24 questions out of 30. Pretty good!

See the Grand Challenge Github issue for more info. We should get the complete details of the challenge, comparisons to other public QA engines (like Google) etc. in Falk’s upcoming thesis.

This is how the plain YodaQA fared:

Question Text | Result | Correct | Found
What is the capital of Zimbabwe? | ✓ | Harare | Harare
Who invented the Otto engine? | ✓ | Nikolaus Otto | Nikolaus Otto
When was Pablo Picasso born? | ✓ | 1881 | 1881
What is 7*158 + 72 – 72 + 9? | ✗ | 1115 | 78
Who wrote the novel The Light Fantastic? | ✓ | Terry Pratchett | Terry Pratchett
In which city was Woody Allen born? | ✓ | New York | New York
Who is the current prime minister of Italy? | ✓ | Matteo Renzi | Matteo Renzi
What is the equatorial radius of Earth’s moon? | ✗ | 1738 | the Moon and Su
When did the Soviet Union dissolve? | ✓ | 1991 | 1991
What is the core body temperature of a human? | ✗ | 37 | Bio 42 and cour
Who is the current Dalai Lama? | ✓ | Tenzin Gyatso | Tenzin Gyatso
What is 2^23? | ✗ | 8388608 | the Gregorian c
Who is the creator of Star Trek? | ✓ | Gene Roddenberr | Gene Roddenberr
In which city is the Eiffel Tower? | ✓ | Paris | Paris
12 metric tonnes in kilograms? | ✗ | 12 *000 | SI
Where is the mouth of the river Rhine? | ✓ | the Netherlands | the Netherlands
Where is Buckingham Palace located? | ✓ | London | London
Who directed the movie The Green Mile? | ✓ | Frank Darabont | Frank Darabont
When did Franklin D. Roosevelt die? | ✓ | 1945 | 1945
Who was the first man in space? | ✓ | Yuri Gagarin | Yuri Gagarin
Where was the Peace of Westphalia signed? | ✗ | Osnabrück | France
Who was the first woman to be awarded a Nobel Priz | ✗ | Marie Curie | Elinor Ostrom
12.1147 inches to yards? | ✗ | 0.3365194444 | CUX 570 17 577
What is the atomic number of potassium? | ✓ | 19 | 19
Where is the Tiananmen Square? | ✓ | China | China
What is the binomial name of horseradish? | ✓ | Armoracia Rusti | Armoracia Rusti
How long did Albert Einstein live? | ✗ | 76 | Germany
Who earned the most Academy Awards? | . | Walt Disney | Jimmy Stewart
How many lines does the London Underground have? | ✗ | 11 | Soho Revue Bar
When is the next planned German Federal Convention | ✗ | | 1850

Categories: ailao, software

YodaQA learned to tweet

April 4th, 2016 No comments

Guest post by Petr Marek (source)

YodaQA learned how to use Twitter during the Easter holidays. You can ask it a question by sending a tweet with the question to @askYodaQA, and YodaQA will answer you shortly. How is that possible? I created an app in Google Apps Script which handles receiving questions from Twitter and answering them.

Why did I create it? Thanks to it, YodaQA can reach more users in a new, interesting form. I believe they will help us find even more ways in which YodaQA can help them. It is a nice symbiosis: YodaQA will help Twitter users, and they will help it back. Let’s look at how it is made.

The two important tools

The most important ingredient was Google Apps Script. It is basically JavaScript with a connection to Google services. You can make your own Apps Script apps in Google Drive. The best thing is that you can set up triggers that run the app, for example, every minute. And it’s free.

The second thing you need is to create a Twitter app on the account which your bot will use to communicate with its followers. It will grant you the access tokens you need to connect to the Twitter API. I used Twitter Lib for Google Apps Script to simplify the communication with the API. It allowed me to tweet and to get tweets with questions easily. You just need to call the right function with some arguments.

The general idea

Those were the tools that I used. But how did I make it work? I will describe the general idea now. I set the Apps Script project to run my code every minute. The code basically does two things.

The first step is to obtain questions from Twitter and to ask YodaQA. The bot searches for all tweets with @askYodaQA. It saves the users that tweeted them and the time when it found the tweets. Then it sends the text of each tweet to YodaQA. YodaQA replies with a dialog id and a question id, which the bot saves to the list of questions.

The second step is to go through the list of questions and to ask YodaQA for answers to these questions. The bot sends the answers to users as soon as they are finished. You can even rely on features of the Hub, such as dialogs and coreference resolution. Two questions are connected into one dialog when they are asked within a five-minute interval.
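
The five-minute rule is easy to picture in code. The bot itself is written in Apps Script; the Python sketch below only illustrates the grouping logic, and it assumes that the interval is counted per user from that user’s previous question, which the post does not spell out.

```python
from datetime import datetime, timedelta

DIALOG_WINDOW = timedelta(minutes=5)


def assign_dialogs(questions):
    """Group (user, asked_at) pairs into dialogs.

    A question joins the user's previous dialog if it arrives within
    DIALOG_WINDOW of that user's previous question; otherwise it starts a
    new dialog. Returns (user, asked_at, dialog_id) tuples sorted by time.
    """
    last_seen = {}  # user -> (time of previous question, its dialog id)
    next_id = 0
    result = []
    for user, asked_at in sorted(questions, key=lambda q: q[1]):
        prev = last_seen.get(user)
        if prev is not None and asked_at - prev[0] <= DIALOG_WINDOW:
            dialog_id = prev[1]
        else:
            dialog_id = next_id
            next_id += 1
        last_seen[user] = (asked_at, dialog_id)
        result.append((user, asked_at, dialog_id))
    return result


t0 = datetime(2016, 4, 4, 12, 0)
sample = [
    ("alice", t0),
    ("alice", t0 + timedelta(minutes=3)),   # follow-up, same dialog
    ("bob", t0 + timedelta(minutes=4)),     # different user, new dialog
    ("alice", t0 + timedelta(minutes=10)),  # too late, new dialog
]
dialog_ids = [dialog_id for _, _, dialog_id in assign_dialogs(sample)]
print(dialog_ids)
```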

I said that I save some information. Where? I used a spreadsheet as memory. I use one sheet as “user memory”, the second as the “asked question list” and the last as memory for the id of the last served tweet. I even log some information into a Google Docs text document. It may sound simple (and it is simple), but it works.

You can try it right now. Just tweet a question with @askYodaQA and the answer will arrive within a few minutes. You can even use hashtags or mention other users. They will also receive the answer.

You can see the whole code on GitHub. You can use it and modify it for your own Twitter bots too, maybe on your own data?

Categories: ailao, software

Dialog for YodaQA!

March 31st, 2016 No comments

One of our great student interns at eClub/Ailao, Petr Marek, who also made the current YodaQA web interface, is now working on adding a new element to our ecosystem – the Hub. This is an interface between the web app and the YodaQA system which takes care of various tasks that don’t fit a “pure question answering” system well. For example, it tracks dialog context or allows domain-specific question handling (if you want to add support for retrieving current traffic information, TV schedules or custom question transformations).

We see voice (and chat) as the perfect fit for question answering systems like YodaQA, and together with this, dialog comes naturally. This is why Petr M. has recently transformed our live QA interface to the dialog format (and it now goes through the Hub). The dialog tracking is internally still relatively simplistic from a scientific point of view, but it’s more than enough to already create a great impression. And right now, in a simple, tongue-in-cheek informal test, YodaQA does great compared to “competition”!

See the complete presentation!

Categories: ailao, software

Live Streaming to HTML5?

March 13th, 2016 1 comment

We have our mice TV now streaming our colony of Mus minutoides at the canonical URL, but it would be nice if you could watch them in your web browser (without Flash) instead of having to open a media player for the purpose.

I gave that some serious prodding. We still use vlc with the same config as in the original post (mp4v codec + mpegts container). Our video source is an IP cam producing mp4v via rtsp and an important constraint is CPU usage as it runs on my many-purpose server (current setup consumes 10% of one CPU core). We’d like things to work in Debian’s chromium and iceweasel, primarily.

It seems that in the HTML5 world, you have these basic options:

  • MP4/H264 in MP4 – this *does not work* with live streaming because you need to make sure the browser receives a correct header with metadata which normally occurs only at the top of the file; it might work with some horrible custom code hacks but nothing off-the-shelf
  • VP80/VP90 in webm – this works, but encoding consumes 150%-250% CPU even with low bitrates! This may be okay for dedicated streaming servers but is completely out of the question for me
  • Theora in Ogg – this almost works, but the stream stutters every few seconds (or slips into endless buffering), making it pretty hard to watch; apparently some keyframes are lost and Theora homepage gives a caveat that Ogg encoding is broken in VLC; the CPU usage is about 30%, which would have been acceptable

That’s it for the stock video tag formats, apparently. There are two more alternatives:

  • HTTP Live Stream (HLS) has no native support in browsers outside of mobile, might work with a hack like but you may as well use MSE then
  • Media Source Extensions (MSE) seem to basically allow implementing decoders for custom containers (in JavaScript) for any codecs, which sounds hopeful if we’d just like to pass mp4v (or h264) through. The most popular such container is DASH, which seems to be all about fragmenting video into smaller HTTP requests with per-fragment bitrate negotiation, but is still completely codec agnostic. Firefox needs an almost latest version for this. Media players support DASH too.

So far, the best looking courses seem to be:

  • Media server nginx-rtmp-module (in my case with pull directive towards the ipcam’s rtsp) with mpeg-dash output and dash.js based webpage. I might have misunderstood something but it might actually just work (assuming that the bitrate negotiation could always end up just choosing the ipcam’s fixed bitrate; something very low is completely sufficient anyway).
  • Debug libogg + libtheora to find out why it produces corrupted streams – have fun!

Categories: linux, software

Keras for Binary Classification

January 13th, 2016 5 comments

So I didn’t get around to seriously playing with Keras (a powerful library for building fully-differentiable machine learning models aka neural networks), besides running a few examples – until now. And I have been a bit surprised by how tricky it actually was for me to get a simple task running, despite (or maybe because of) all the docs available already.

The thing is, many of the “basic examples” gloss over exactly what the inputs and, mainly, the outputs look like, and that’s important. Especially since for me, the archetypal simplest machine learning problem consists of binary classification, but in Keras the canonical task is categorical classification. Only after fumbling around for a few hours did I realize this fundamental rift.

The examples (besides LSTM sequence classification) silently assume that you want to classify to categories (e.g. to predict words etc.), not do a binary 1/0 classification. The consequences are that if you naively copy the example MLP at first, before learning to think about it, your model will never learn anything and to add insult to injury, always show the accuracy as 1.0.

So, there are a few important things you need to do to perform binary classification:

  • Pass output_dim=1 to your final Dense layer (this is the obvious one).
  • Use sigmoid activation instead of softmax – obviously, softmax on single output will always normalize whatever comes in to 1.0.
  • Pass class_mode='binary' to model.compile() (this fixes the accuracy display, possibly more; you want to pass show_accuracy=True to
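
The softmax point is easy to verify without Keras at all. The following plain Python sketch (standard library only, nothing Keras-specific) shows that softmax over a single output unit returns 1.0 whatever the input is, while sigmoid actually depends on it:

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(value):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


for logit in (-3.0, 0.0, 4.0):
    # With one unit, softmax is exp(0) / exp(0), i.e. 1.0 regardless of the input.
    print(logit, softmax([logit])[0], round(sigmoid(logit), 3))
```

So a single softmax unit can never output anything but 1.0, which is why the naive model cannot learn a 1/0 target.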

Other lessons learned:

  • For some projects, my approach of first cobbling up an example from existing code and then thinking harder about it works great; for others, not so much…
  • In IPython, do not forget to reinitialize model = Sequential() in some of your cells – a lot of confusion ensues otherwise.
  • Keras is pretty awesome and powerful. Conceptually, I think I like NNBlocks‘ usage philosophy more (regarding how you build the model), but sadly that library is still very early in its inception (I have created a bunch of gh issues).

(Edit: After a few hours, I toned down this post a bit. It wasn’t meant at all to be an attack on Keras, though it might be perceived by someone as such. Just as a word of caution to fellow Keras newbies. And it shouldn’t take much to improve the Keras docs.)

Categories: ailao, software