Archive

Posts Tagged ‘keras’

Modern CUDA + CuDNN Theano/Keras AMI on AWS

Wow, what a jargon-filled post title. Basically, we do a lot of our deep learning currently on the AWS EC2 cloud – but to use the GPU there with all the goodies (up to CuDNN that supports modern Theano’s batch normalization) is a surprisingly arduous process which you basically need to do manually, with a lot of trial and error and googling and hacking. This is awful, mind-boggling and I hate that everyone has to go through this. So, to fix this bad situation, I just released a community AMI that:

• …is based on Ubuntu 16.04 LTS (as opposed to 14.04)
• …comes with CUDA + CuDNN drivers and toolkit already set up to work on g2.2xlarge instances
• …has Theano and Keras preinstalled and preconfigured so that you can run the Keras ResNet model on a GPU right away (or anything else you desire)

To get started, just spin up a GPU (g2.2xlarge) instance from community AMI ami-f0bde196 (1604-cuda80-cudnn5110-theano-keras), ssh in as the ubuntu@ user and get going! No hassles. But of course, EC2 charges apply.

Edit (errata): Actually, there’s a bug – sorry about that! Out of the box, the nvidia kernel driver is not loaded properly on boot. I might update the AMI later, for now to fix it manually:

1. Edit /etc/modprobe.d/blacklist.conf (using for example sudo nano) and append the line blacklist nouveau to the end of that file
2. Run sudo update-initramfs -u
3. Reboot. Now, everything should finally work.

This AMI was created like this:

• The stock Ubuntu 16.04 LTS AMI
• NVIDIA driver 367.57 (older drivers do not support CUDA 8.0, while this is the last driver version to support the K520 GRID GPU used in AWS)
• To make the driver setup go through, the trick to install apt-get install linux-image-extra-uname -r per
• CUDA 8.0 and CuDNN 8.0 set up from the official though unannounced NVIDIA Debian packages by replaying the nvidia-docker recipes
• bashrc modified to include cuda in the path
• Theano and Keras from latest Git as of writing this blogpost (feel free to git pull and reinstall), and some auxiliary python-related etc. packages
• Theano configured to use GPU and Keras configured to use Theano (and the “th” image dim ordering rather than “tf” – this is currently non-default in Keras!)
• Example Keras deep learning models, even an elephant.jpg! Just run python resnet50.py
• Exercise: Install TensorFlow on the system as well, release your own AMI and post its id in the comments!
• Tip: Use nvidia-docker based containers to package your deep learning software; combine it with docker-machine to easily provision GPU instances in AWS and execute your models as needed. Using this for development is a hassle, though.

Enjoy!

Categories: Tags:

Semantic Sentence Pair Scoring

The blog has been a little bit silent – a typical sign of us working too hard to worry about that! But we’ll satisfy some of your curiosity in the coming weeks as we have about six posts in the pipeline.

The thing I would like to mention first is some fundamental research we work on now. I stepped back from my daily Question Answering churn and took a little look around and decided the right thing to focus for a while are the fundamentals of the NLP field so that our machine learning works better and makes more sense. Warning: We’ll use some scientific jargon in this one post.

So, in the first months of 2016 I focused huge chunk of my research on deep learning of natural language. That means neural networks used on unstructured text, in various forms, shapes and goals. I have set some audacious goals for myself, fell short in some aspects but still made some good progress hopefully. Here’s the deal – a lot of the current research is about processing a single sentence, maybe to classify its sentiment or translate it or generate other sentences. But I have noticed that recently, I have seen many problems that are about scoring a pair of two sentences. So I decided to look into that and try to build something that (A) works better, (B) actually has an API and we can use it anywhere for anything.

My original goal was to build awesome new neural network architectures that will turn the field on its head. But I noticed that the field is a bit of a mess – there is a lot of tasks that are about the same thing, but very little cross-talk between them. So you get a paper that improves the task of Answer Sentence Selection, but could the models do better on the Ubuntu Dialogue task then, or on Paraphrasing datasets? Who knows! Meanwhile, each dataset has its own format and a lot of time is spent only in writing the adapter code for it. Training protocols (from objectives to segmentation to embedding preinitializations) are inconsistent, and some datasets need a lot of improvement. Well, my goal turned to sorting out the field, cross-check the same models on many tasks and provide a better entry point for others than I had.

Software: Getting a few students of the 3C group together, we have created the dataset-sts platform for all tasks and models that are about comparing two sentences using deep learning. We have a pretty good coverage (of both tasks and models), and more brewing in some side branches. It’s in Python and uses the awesome Keras deep learning library.

Paper: To kick things off research-wise, we have posted a paper Sentence Pair Scoring: Towards Unified Framework for Text Comprehension where we summed up what we have learned early in the process. A few highlights:

• We have a lofty goal of building an universal text comprehension model, a sort of black box that eats your sentences and produces embeddings that correspond to their meaning, which you can use for whatever task you need to do. Long way to go, but we have found that a simple neural model trained on very large data is doing pretty good in this exact setting, and even if applied to tasks and data that look very different from the original. Maybe we are on to something.
• Our framework is state-of-art on the Ubuntu Dialogue dataset of 1M techsupport IRC dialogs, beating Facebook’s memory network models.
• It’s hard to compare neural models because if you train a model 16 times with the same data, the result will always be somewhat different. Not a big deal with large test datasets, but a very big deal with small test datasets which are still popular in the research community. Almost all papers ignore this! If you look at evolution of performance of models in some areas like Answer Sentence Selection, we have found that most differences over the last year are deep below per-train variance we see.

Please take a look, and tell us what you think! We’ll shortly cover a follow-up paper here that we also already posted, and we plan to continue the work by improving our task and model coverage further, fixing a few issues with our training process and experimenting with some novel neural network ideas.

More to come, both about our research and some more product-related news, in a few days. We will also talk about how the abstract-sounding research connects with some very practical technology we are introducing.

Categories: Tags:

Keras for Binary Classification

So I didn’t get around to seriously (besides running a few examples) play with Keras (a powerful library for building fully-differentiable machine learning models aka neural networks) – until now. And I have been a bit surprised about how tricky it actually was for me to get a simple task running, despite (or maybe because of) all the docs available already.

The thing is, many of the “basic examples” gloss over exactly how the inputs and mainly outputs look like, and that’s important. Especially since for me, the archetypal simplest machine learning problem consists of binary classification, but in Keras the canonical task is categorical classification. Only after fumbling around for a few hours, I have realized this fundamental rift.

The examples (besides LSTM sequence classification) silently assume that you want to classify to categories (e.g. to predict words etc.), not do a binary 1/0 classification. The consequences are that if you naively copy the example MLP at first, before learning to think about it, your model will never learn anything and to add insult to injury, always show the accuracy as 1.0.

So, there are a few important things you need to do to perform binary classification:

• Pass output_dim=1 to your final Dense layer (this is the obvious one).
• Use sigmoid activation instead of softmax – obviously, softmax on single output will always normalize whatever comes in to 1.0.
• Pass class_mode='binary' to model.compile() (this fixes the accuracy display, possibly more; you want to pass show_accuracy=True to model.fit()).

Other lessons learned:

• For some projects, my approach of first cobbling up an example from existing code and then thinking harder about it works great; for others, not so much…
• In IPython, do not forget to reinitialize model = Sequential() in some of your cells – a lot of confusion ensues otherwise.
• Keras is pretty awesome and powerful. Conceptually, I think I like NNBlocks‘ usage philosophy more (regarding how you build the model), but sadly that library is still very early in its inception (I have created a bunch of gh issues).

(Edit: After a few hours, I toned down this post a bit. It wasn’t meant at all to be an attack at Keras, though it might be perceived by someone as such. Just as a word of caution to fellow Keras newbies. And it shouldn’t take much to improve the Keras docs.)

Categories: Tags: