Modern CUDA + CuDNN Theano/Keras AMI on AWS
Wow, what a jargon-filled post title. Basically, we currently do a lot of our deep learning on the AWS EC2 cloud. But using the GPU there with all the goodies (up to a CuDNN that supports modern Theano’s batch normalization) is a surprisingly arduous process that you basically need to do manually, with a lot of trial and error, googling and hacking. This is awful and mind-boggling, and I hate that everyone has to go through it. So, to fix this bad situation, I just released a community AMI that:
- …is based on Ubuntu 16.04 LTS (as opposed to 14.04)
- …comes with CUDA + CuDNN drivers and toolkit already set up to work on g2.2xlarge instances
- …has Theano and Keras preinstalled and preconfigured so that you can run the Keras ResNet model on a GPU right away (or anything else you desire)
To get started, just spin up a GPU (g2.2xlarge) instance from community AMI ami-f0bde196 (1604-cuda80-cudnn5110-theano-keras), ssh in as the ubuntu@ user and get going! No hassles. But of course, EC2 charges apply.
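If you prefer the command line to the web console, the launch can be sketched with the AWS CLI roughly as below. The AMI id and instance type come from this post; the key pair name my-key is a hypothetical placeholder, and since AMI ids are region-specific and the region is not named here, no --region is set. The function prints the command when called with --dry-run, so you can review it before anything is billed.

```shell
# Values taken from the post.
AMI_ID="ami-f0bde196"
INSTANCE_TYPE="g2.2xlarge"
# Hypothetical key pair name; replace with one that exists in your account.
KEY_NAME="${KEY_NAME:-my-key}"

# Launch one GPU instance from the community AMI.
# Pass --dry-run to only print the aws command instead of executing it.
launch_gpu_instance() {
  runner=""
  if [ "${1:-}" = "--dry-run" ]; then
    runner="echo"
  fi
  $runner aws ec2 run-instances \
    --image-id "$AMI_ID" \
    --instance-type "$INSTANCE_TYPE" \
    --key-name "$KEY_NAME" \
    --count 1
}

# Show what would be run; nothing is launched here.
launch_gpu_instance --dry-run
```

Calling launch_gpu_instance without arguments actually starts the instance, after which you ssh in as ubuntu@ at the instance's public address.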
Edit (errata): Actually, there’s a bug, sorry about that! Out of the box, the nvidia kernel driver is not loaded properly on boot. I might update the AMI later; for now, fix it manually:
- Edit /etc/modprobe.d/blacklist.conf (using for example sudo nano) and append the line "blacklist nouveau" to the end of that file
- Run sudo update-initramfs -u
- Reboot. Now, everything should finally work.
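The three steps above can be sketched as a small script. The function names are my own, and the append is made idempotent (an addition of mine, not part of the manual fix) so that running it twice does not duplicate the line. The full fix is only defined, not invoked, because it needs root and reboots the machine.

```shell
# Append "blacklist nouveau" to the given file unless the line is already there.
blacklist_nouveau() {
  file="$1"
  if ! grep -qx 'blacklist nouveau' "$file" 2>/dev/null; then
    printf '%s\n' 'blacklist nouveau' >> "$file"
  fi
}

# The whole manual fix; run it as root on the instance.
# Defined here but deliberately not called.
apply_nvidia_boot_fix() {
  blacklist_nouveau /etc/modprobe.d/blacklist.conf
  update-initramfs -u   # rebuild the initramfs so the blacklist applies at boot
  reboot
}
```

On the instance you would load these definitions in a root shell and call apply_nvidia_boot_fix. After the reboot, nvidia-smi (shipped with the NVIDIA driver) should list the GPU if the driver loaded.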
This AMI was created like this:
- The stock Ubuntu 16.04 LTS AMI
- NVIDIA driver 367.57 (older drivers do not support CUDA 8.0, while this is the last driver version to support the K520 GRID GPU used in AWS)
- To make the driver setup go through, the trick is to first run
apt-get install linux-image-extra-`uname -r`
- CUDA 8.0 and CuDNN 5.1.10 (the cudnn5110 in the AMI name) set up from the official though unannounced NVIDIA Debian packages by replaying the nvidia-docker recipes
- bashrc modified to include cuda in the path
- Theano and Keras from the latest Git as of writing this blogpost (feel free to git pull and reinstall), plus some auxiliary Python-related and other packages
- Theano configured to use GPU and Keras configured to use Theano (and the “th” image dim ordering rather than “tf” – this is currently non-default in Keras!)
- Example Keras deep learning models, even an elephant.jpg! Just run
python resnet50.py
- Exercise: Install TensorFlow on the system as well, release your own AMI and post its id in the comments!
- Tip: Use nvidia-docker based containers to package your deep learning software; combine it with docker-machine to easily provision GPU instances in AWS and execute your models as needed. Using this for development is a hassle, though.
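For the Theano and Keras configuration step in the list above, here is a minimal sketch of what such config files typically look like. The exact contents of the files on the AMI are not shown in this post, so treat these as assumptions: device = gpu is the old Theano GPU backend setting, and the keras.json keys are the Keras 1.x ones. The function name and the target-directory argument are hypothetical, added so you can try it on a scratch directory first.

```shell
# Write a Theano config (GPU, float32) and a Keras config (Theano backend,
# "th" image dim ordering) under the given home directory.
# Assumed contents; the AMI's actual files may differ.
write_dl_configs() {
  home_dir="$1"
  mkdir -p "$home_dir/.keras"
  cat > "$home_dir/.theanorc" <<'EOF'
[global]
device = gpu
floatX = float32
EOF
  cat > "$home_dir/.keras/keras.json" <<'EOF'
{
    "backend": "theano",
    "image_dim_ordering": "th",
    "floatx": "float32",
    "epsilon": 1e-07
}
EOF
}
```

Calling write_dl_configs "$HOME" would overwrite any existing ~/.theanorc and ~/.keras/keras.json, so back those up if you have customized them.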
Enjoy!
Hi
Thanks a lot for your AMI! Just wanted to let you know there’s one more gotcha: you have to be running the 4.4.0-53-generic kernel. When I launched the AMI, I was on 4.4.0-112-generic, so the bugfix (update-initramfs -u) didn’t work. I couldn’t find out how to force a specific kernel on boot, so I just uninstalled 4.4.0-112 and rebooted:
dpkg -l | grep linux-image
apt-get remove
sudo reboot
# make sure you’re on 4.4.0-53
uname -a
# edit /etc/modprobe.d/blacklist.conf and run sudo update-initramfs -u
I am also facing this issue: when I launched the AMI I was on 4.4.0-112-generic, so the bugfix (update-initramfs -u) didn’t work on the 4.4.0-112 kernel. If anyone finds a solution, please post it here.
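A quick way to see whether you are hitting the kernel mismatch described in these comments is to compare the running kernel against 4.4.0-53-generic before applying the fix. This is only a sketch: the function name is made up, and the expected version is simply the one reported to work above.

```shell
# Kernel version reported to work in the comments above.
EXPECTED_KERNEL="4.4.0-53-generic"

# Succeeds if the given kernel release (default: the running one) matches.
kernel_ok() {
  [ "${1:-$(uname -r)}" = "$EXPECTED_KERNEL" ]
}

if kernel_ok; then
  echo "running $EXPECTED_KERNEL"
else
  echo "running $(uname -r), expected $EXPECTED_KERNEL"
  # List installed kernel images so you can decide which ones to remove.
  if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | grep linux-image || true
  fi
fi
```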