
Modern CUDA + CuDNN Theano/Keras AMI on AWS

January 22nd, 2017

Wow, what a jargon-filled post title. Basically, we currently do a lot of our deep learning on the AWS EC2 cloud, but using the GPU there with all the goodies (up to a CuDNN recent enough to support modern Theano's batch normalization) is a surprisingly arduous process. You basically have to do it manually, with a lot of trial and error, googling and hacking. This is awful and mind-boggling, and I hate that everyone has to go through it. So, to fix this bad situation, I just released a community AMI that:

  • …is based on Ubuntu 16.04 LTS (as opposed to 14.04)
  • …comes with CUDA + CuDNN drivers and toolkit already set up to work on g2.2xlarge instances
  • …has Theano and Keras preinstalled and preconfigured so that you can run the Keras ResNet model on a GPU right away (or anything else you desire)

To get started, just spin up a GPU (g2.2xlarge) instance from the community AMI ami-f0bde196 (1604-cuda80-cudnn5110-theano-keras), ssh in as the ubuntu user and get going! No hassles. Of course, the usual EC2 charges apply.
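If you'd rather script the launch than click through the EC2 console, the AWS CLI can do it. This is a minimal sketch under assumptions: the CLI is installed and configured for the region that hosts ami-f0bde196 (AMI IDs are per-region and the region isn't stated here), `my-key` stands for a hypothetical EC2 key pair of yours, and the default security group is assumed to allow inbound ssh (otherwise add `--security-group-ids` with one that does). The `--apply` switch is made up for illustration; without it the script only defines `launch`.

```shell
#!/usr/bin/env bash
# Sketch: launch the community AMI and print the ssh command to use.
# Assumes a configured AWS CLI in the AMI's region; the key pair name
# passed on the command line is hypothetical.
set -euo pipefail

AMI_ID="ami-f0bde196"        # community AMI from the post
INSTANCE_TYPE="g2.2xlarge"   # GPU instance type from the post

launch() {
  local key_name="$1" instance_id host
  # Start one instance and keep only its ID.
  instance_id=$(aws ec2 run-instances \
    --image-id "$AMI_ID" --instance-type "$INSTANCE_TYPE" \
    --key-name "$key_name" --count 1 \
    --query 'Instances[0].InstanceId' --output text)
  # Block until EC2 reports the instance as running.
  aws ec2 wait instance-running --instance-ids "$instance_id"
  # Look up the public DNS name to ssh into.
  host=$(aws ec2 describe-instances --instance-ids "$instance_id" \
    --query 'Reservations[0].Instances[0].PublicDnsName' --output text)
  echo "ssh ubuntu@${host}"
}

if [[ "${1:-}" == "--apply" ]]; then
  launch "${2:?usage: $0 --apply KEY_NAME}"
fi
```

Usage would be something like `bash launch-ami.sh --apply my-key` (hypothetical file name), then run the printed ssh command, adding `-i` with your private key file if needed.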


Edit (errata): Actually, there's a bug, sorry about that! Out of the box, the nvidia kernel driver is not loaded properly on boot. I might update the AMI later; for now, fix it manually:

  1. Edit /etc/modprobe.d/blacklist.conf (for example with sudo nano) and append the line blacklist nouveau to the end of that file
  2. Run sudo update-initramfs -u
  3. Reboot. Now, everything should finally work.
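The three steps above can be scripted so that re-running them is harmless. A minimal sketch, assuming bash on the AMI's Ubuntu 16.04; the `--apply`/`--check` switches and the helper names are made up for illustration. `--apply` (run as root) appends the blacklist line only if it is missing and rebuilds the initramfs; after the reboot, `--check` parses `lsmod` to confirm the nvidia module is loaded and nouveau is not.

```shell
#!/usr/bin/env bash
# Sketch of the errata fix: blacklist nouveau, rebuild the initramfs,
# and (after rebooting) check which GPU kernel module is loaded.
# Switch and function names are hypothetical.
set -euo pipefail

BLACKLIST_CONF="/etc/modprobe.d/blacklist.conf"   # file named in the post

# Append LINE to FILE unless an identical line is already there,
# so running the script twice does not duplicate the entry.
ensure_line() {
  local file="$1" line="$2"
  if grep -qxF -- "$line" "$file" 2>/dev/null; then
    return 0
  fi
  # Add a newline first if the file does not end with one.
  if [[ -s "$file" && -n "$(tail -c1 "$file")" ]]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$line" >> "$file"
}

# Read `lsmod` output on stdin (first line is the header); succeed only
# if the nvidia module is listed and nouveau is not.
nvidia_loaded_without_nouveau() {
  local mods
  mods=$(awk 'NR > 1 { print $1 }')
  grep -qx 'nvidia' <<< "$mods" && ! grep -qx 'nouveau' <<< "$mods"
}

case "${1:-}" in
  --apply)   # needs root
    ensure_line "$BLACKLIST_CONF" 'blacklist nouveau'
    update-initramfs -u
    echo "Done; reboot now, then run this script with --check" ;;
  --check)
    if lsmod | nvidia_loaded_without_nouveau; then
      echo "nvidia driver loaded"
    else
      echo "nvidia driver not loaded (kernel: $(uname -r))"
      exit 1
    fi ;;
esac
```

For example `sudo bash fix-nvidia.sh --apply`, reboot, then `bash fix-nvidia.sh --check` (file name hypothetical). If the check still fails, look at the kernel it prints: per the first comment below, the fix reportedly only took on the 4.4.0-53-generic kernel.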

This AMI was created like this:

  • The stock Ubuntu 16.04 LTS AMI
  • NVIDIA driver 367.57 (older drivers do not support CUDA 8.0, while this is the last driver version to support the K520 GRID GPU used in AWS)
  • To make the driver setup go through, the trick of running apt-get install linux-image-extra-`uname -r` per
  • CUDA 8.0 and CuDNN 5.1 (for CUDA 8.0) set up from the official, though unannounced, NVIDIA Debian packages by replaying the nvidia-docker recipes
  • .bashrc modified to add CUDA to the PATH
  • Theano and Keras from the latest Git as of writing this blog post (feel free to git pull and reinstall), plus some auxiliary Python packages
  • Theano configured to use the GPU, and Keras configured to use Theano (with the “th” image dim ordering rather than “tf”; this is currently non-default in Keras!)
  • Example Keras deep learning models, even an elephant.jpg! Just run python resnet50.py
  • Exercise: Install TensorFlow on the system as well, release your own AMI and post its id in the comments!
  • Tip: Use nvidia-docker based containers to package your deep learning software; combine it with docker-machine to easily provision GPU instances in AWS and execute your models as needed. Using this for development is a hassle, though.
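The configuration bullets above (.bashrc, Theano, Keras) boil down to a few small files. The actual files on the AMI aren't shown, so their contents here are assumptions: `device = gpu` and `floatX = float32` for Theano's old CUDA backend, the Keras 1.x keys of `~/.keras/keras.json` with `"th"` ordering, and the CUDA toolkit under `/usr/local/cuda`. The function name and `--apply` switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write assumed Theano/Keras configs and CUDA paths into a home dir.
# File contents are assumptions, not copies of the AMI's files.
set -euo pipefail

write_gpu_configs() {
  local home="$1"
  mkdir -p "$home/.keras"

  # Theano: run on the GPU with 32-bit floats (overwrites any existing file).
  cat > "$home/.theanorc" <<'EOF'
[global]
device = gpu
floatX = float32
EOF

  # Keras 1.x: Theano backend with "th" (channels-first) image dim ordering.
  cat > "$home/.keras/keras.json" <<'EOF'
{
    "backend": "theano",
    "image_dim_ordering": "th",
    "floatx": "float32",
    "epsilon": 1e-07
}
EOF

  # CUDA on PATH / LD_LIBRARY_PATH; the marker line keeps this idempotent.
  local marker='# cuda paths (added by setup sketch)'
  if ! grep -qxF -- "$marker" "$home/.bashrc" 2>/dev/null; then
    cat >> "$home/.bashrc" <<'EOF'
# cuda paths (added by setup sketch)
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
EOF
  fi
}

if [[ "${1:-}" == "--apply" ]]; then
  write_gpu_configs "$HOME"
fi
```

After running it with `--apply`, open a new shell (or `source ~/.bashrc`). The `${LD_LIBRARY_PATH:+...}` form avoids a trailing empty entry, which the dynamic loader would treat as the current directory. Note that Keras 2 later replaced `image_dim_ordering` with `image_data_format` (`channels_first` corresponds to "th").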

Enjoy!

Categories: ailao, linux, software
  1. Eugene
    February 13th, 2018 at 00:22 | #1

    Hi
    Thanks a lot for your AMI! Just wanted to let you know there's one more gotcha: you have to be running the 4.4.0-53-generic kernel. When I launched the AMI I was on 4.4.0-112-generic, and the bugfix (update-initramfs -u) didn't work on the 4.4.0-112 kernel. I didn't find out how to force a specific kernel on boot, so I just uninstalled 4.4.0-112 and rebooted:

    # list the installed kernel images
    dpkg -l | grep linux-image
    sudo apt-get remove <the linux-image package for 4.4.0-112>
    sudo reboot
    # make sure you're on 4.4.0-53
    uname -a
    # edit /etc/modprobe.d/blacklist.conf and run sudo update-initramfs -u

  2. June 29th, 2020 at 13:24 | #2

    I am also facing this issue: when I launched the AMI I was on 4.4.0-112-generic, so the bugfix (update-initramfs -u) didn't work on the 4.4.0-112 kernel. If anyone here finds a solution, please post it here.
