Wow, what a jargon-filled post title. Basically, we do a lot of our deep learning currently on the AWS EC2 cloud – but to use the GPU there with all the goodies (up to CuDNN that supports modern Theano’s batch normalization) is a surprisingly arduous process which you basically need to do manually, with a lot of trial and error and googling and hacking. This is awful, mind-boggling and I hate that everyone has to go through this. So, to fix this bad situation, I just released a community AMI that:
- …is based on Ubuntu 16.04 LTS (as opposed to 14.04)
- …comes with CUDA + CuDNN drivers and toolkit already set up to work on g2.2xlarge instances
- …has Theano and Keras preinstalled and preconfigured so that you can run the Keras ResNet model on a GPU right away (or anything else you desire)
To get started, just spin up a GPU (g2.2xlarge) instance from community AMI ami-f0bde196 (1604-cuda80-cudnn5110-theano-keras), ssh in as the ubuntu@ user and get going! No hassles. But of course, EC2 charges apply.
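If you would rather script the launch than click through the console, here is a minimal sketch using boto3's `run_instances` call. The helper names `launch_request` and `launch` and the key pair name are hypothetical; only the AMI id and instance type come from this post. AMI ids are region-specific, so you need a client for the region where this AMI is published.

```python
def launch_request(key_name, ami_id="ami-f0bde196", instance_type="g2.2xlarge"):
    """Build the keyword arguments for the EC2 RunInstances call."""
    return {
        "ImageId": ami_id,              # the community AMI from this post
        "InstanceType": instance_type,  # the GPU instance type the AMI targets
        "KeyName": key_name,            # hypothetical: an existing EC2 key pair
        "MinCount": 1,
        "MaxCount": 1,
    }


def launch(ec2_client, key_name):
    """Start one instance and return its instance id."""
    response = ec2_client.run_instances(**launch_request(key_name))
    return response["Instances"][0]["InstanceId"]
```

With boto3 installed and credentials configured, something like `launch(boto3.client("ec2"), "my-key")` should start the instance. Billing starts as soon as it is running.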
Edit (errata): Actually, there’s a bug – sorry about that! Out of the box, the nvidia kernel driver is not loaded properly on boot. I might update the AMI later; for now, here is how to fix it manually:
- Edit /etc/modprobe.d/blacklist.conf (using for example sudo nano) and append the line blacklist nouveau to the end of that file
- Run sudo update-initramfs -u
- Reboot. Now, everything should finally work.
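If you are provisioning several instances, the manual edit can be scripted. The following is a minimal sketch, not part of the AMI: `ensure_blacklisted` is a hypothetical helper that appends the blacklist line only if it is missing, so it is safe to run twice.

```python
import os


def ensure_blacklisted(conf_path, module="nouveau"):
    """Append 'blacklist <module>' to conf_path unless it is already there.

    Returns True if the file was changed, False otherwise.
    """
    wanted = "blacklist " + module
    content = ""
    if os.path.exists(conf_path):
        with open(conf_path) as f:
            content = f.read()
    # Nothing to do if the exact line is already present.
    if any(line.strip() == wanted for line in content.splitlines()):
        return False
    with open(conf_path, "a") as f:
        # Start on a fresh line if the file does not end with a newline.
        if content and not content.endswith("\n"):
            f.write("\n")
        f.write(wanted + "\n")
    return True
```

Calling `ensure_blacklisted("/etc/modprobe.d/blacklist.conf")` needs root privileges, and you still have to run sudo update-initramfs -u and reboot afterwards.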
Edit 2: Some things work fine, but for others, the latest git Theano version is apparently a bit too bleeding edge. If you get a crash like
mod.cu(313): error: identifier "callkernel_node_0daf510dc96416bc43e98d75a5bc6019_0" is undefined
when compiling your models, downgrade Theano to some older version, e.g.
sudo rm -r /usr/local/lib/python2.7/dist-packages/Theano-0.9.0.dev* ~/.theano/compiledir_*
cd theano
git checkout f66ea7e
sudo python setup.py install
This AMI was created like this:
- The stock Ubuntu 16.04 LTS AMI
- NVIDIA driver 367.57 (older drivers do not support CUDA 8.0, while this is the last driver version to support the K520 GRID GPU used in AWS)
- To make the driver setup go through, the trick is to run apt-get install linux-image-extra-`uname -r` beforehand
- CUDA 8.0 and CuDNN 5.1.10 (the cudnn5110 in the AMI name) set up from the official though unannounced NVIDIA Debian packages by replaying the nvidia-docker recipes
- bashrc modified to include cuda in the path
- Theano and Keras from latest Git as of writing this blogpost (feel free to git pull and reinstall), and some auxiliary python-related etc. packages
- Theano configured to use GPU and Keras configured to use Theano (and the “th” image dim ordering rather than “tf” – this is currently non-default in Keras!)
- Example Keras deep learning models, even an elephant.jpg! Just run
- Exercise: Install TensorFlow on the system as well, release your own AMI and post its id in the comments!
- Tip: Use nvidia-docker based containers to package your deep learning software; combine it with docker-machine to easily provision GPU instances in AWS and execute your models as needed. Using this for development is a hassle, though.
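If you rebuild a similar image yourself, the Theano and Keras configuration step from the list above might look roughly like this. It is a sketch under assumptions, not a copy of the files on the AMI: the helper names are hypothetical, and the settings (`device = gpu` and `floatX = float32` in `~/.theanorc`, `backend` and `image_dim_ordering` in `~/.keras/keras.json`) are the usual ones for the Theano and Keras versions of this era.

```python
import json
import os

try:
    from configparser import ConfigParser  # Python 3
except ImportError:
    from ConfigParser import ConfigParser  # Python 2.7, as on the AMI


def write_theano_config(home):
    """Write a minimal .theanorc that sends computation to the GPU."""
    cfg = ConfigParser()
    cfg.optionxform = str  # keep the capital X in floatX
    cfg.add_section("global")
    cfg.set("global", "device", "gpu")
    cfg.set("global", "floatX", "float32")
    path = os.path.join(home, ".theanorc")
    with open(path, "w") as f:
        cfg.write(f)
    return path


def write_keras_config(home):
    """Write a keras.json selecting the Theano backend and "th" ordering."""
    keras_dir = os.path.join(home, ".keras")
    if not os.path.isdir(keras_dir):
        os.makedirs(keras_dir)
    settings = {
        "backend": "theano",
        "image_dim_ordering": "th",  # channels first, non-default in Keras
        "floatx": "float32",
        "epsilon": 1e-07,
    }
    path = os.path.join(keras_dir, "keras.json")
    with open(path, "w") as f:
        json.dump(settings, f, indent=4)
    return path
```

Both helpers take the home directory as an argument, so you can try them against a scratch directory before pointing them at `os.path.expanduser("~")`.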