Training DeepMind's DQN to Play 'Pong', the Atari Game

A while ago I saw this video of Google DeepMind’s Deep Q-learning agent playing Atari Breakout.

After some research, I got really intrigued by DeepMind’s “Human-level control through deep reinforcement learning” paper, in which their Atari game AI program exceeded human performance in most of the games tested. I wanted to run the same AI program on Jetson TX1 and reproduce the result, and I found kuz’s DeepMind Atari Deep Q Learner on GitHub. However, the code does not work on Jetson TX1 as-is: the Deep Q Learner uses too much memory, so it gets killed pretty quickly once the system runs out of memory. Note that Jetson TX1 only has 4GB of RAM shared between CPU and GPU, while kuz’s code requires at least 6GB or so.
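
A quick back-of-the-envelope calculation shows where most of that memory goes: the DQN keeps a replay memory of 1,000,000 preprocessed 84×84 frames, and assuming each frame is stored at one byte per pixel (which is how I believe the Torch7 implementation stores them), that alone amounts to roughly 84 × 84 × 1,000,000 bytes ≈ 7 GB, well beyond what Jetson TX1 can hold.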

So I forked kuz’s code and made some changes to reduce memory usage. The resulting code runs successfully on Jetson TX1, with one caveat (a memory leak) explained below. My modified code is here:

https://github.com/jkjung-avt/DeepMind-Atari-Deep-Q-Learner

You can refer to the instructions on the above GitHub page for how to install and run the AI program. Basically you’d need to have Torch7 (with CUDA) installed on your system, and run ./install_dependencies.sh. Then you can do ./run_gpu pong to train the DQN and ./test_gpu pong to test a trained DQN snapshot.
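
For quick reference, the whole workflow boils down to a few commands:

    # One-time setup (after installing Torch7 with CUDA)
    ./install_dependencies.sh

    # Train a DQN on 'pong'; snapshots are saved under the 'dqn' subdirectory
    ./run_gpu pong

    # Test a previously trained DQN snapshot
    ./test_gpu pong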

I trained the Deep Q Learner from scratch for roughly a week on Jetson TX1. The resulting AI agent could play the ‘pong’ game as shown below. It has clearly learned a good policy (actually, a good Q-function) and could beat the game every time. In the following saved test episode, the AI agent, on the right-hand side, won handily with a score of 21:4.

pong-test screenshots

Pretty cool, isn’t it?

Well, here’s a list of the things I’ve modified from kuz’s repository:

  • I removed the Torch7 installation part from ‘install_dependencies.sh’, because I prefer the standard way of installing Torch7. You can refer to my earlier post for that.
  • I reduced the size of ‘replay_memory’ from 1,000,000 to 50,000, which greatly reduced the memory requirement of the program. I also decreased ‘learn_start’ from 50,000 to 5,000. Both parameters are set in the ‘run_gpu’ and ‘test_gpu’ scripts (see the sketch right after this list).
  • The ‘run_gpu’ script always resumes training from the last neural network snapshot if it finds a previously saved one, e.g. ‘dqn/DQN3_0_1_pong_FULL_Y.t7’. If you want to train a network from scratch instead, just remove the corresponding .t7 file (also shown below).
  • I removed the ‘run_cpu’ and ‘test_cpu’ scripts, because I always want to run the GPU (CUDA accelerated) version of the code.
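
Here is a rough sketch of what those changes look like in practice. The parameter names and the snapshot file name are real, but I am heavily abbreviating the scripts, so treat this as illustrative rather than an exact excerpt:

    # In run_gpu (and test_gpu): replay memory settings reduced so the
    # agent fits into Jetson TX1's 4GB of RAM
    learn_start=5000        # was 50000
    replay_memory=50000     # was 1000000

    # To train 'pong' from scratch, delete the saved snapshot before running
    rm dqn/DQN3_0_1_pong_FULL_Y.t7
    ./run_gpu pong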

Finally I’d like to put down some additional notes about this Torch7 DQN implementation:

  • In addition to Jetson TX1, I also tested the same code on my x64 PC with a GTX-1080 GPU. It took less than 1 day to train the DQN to play ‘pong’ for 10 million steps on the PC, and the resulting AI agent played the game well.
  • There is a memory leak in Torch7’s image.display() API, so when I execute ‘run_gpu’ on Jetson TX1, the process still gets killed by the system after a while. I got around this problem by adding a swap partition on Jetson TX1. I highly recommend doing the same if you are training the AI program on Jetson TX1 or another system with limited memory (see the sketch after this list).
  • As explained earlier, I decreased the size of the ‘replay_memory’ in order to reduce the memory requirement of the DQN. Since the main purpose of the replay memory is to provide near-i.i.d. data for training the neural net, reducing its size might increase the correlation among the training samples. For my experiment on the ‘pong’ game, this doesn’t seem to hurt performance too much. However, for more complicated games or games with longer episodes, reducing ‘replay_memory’ might have a fairly bad impact.
  • During training, the DQN saves the latest neural net snapshot every ‘save_freq’ steps. This parameter can be found in the ‘run_gpu’ script and is currently set to 12,500. As mentioned earlier, the saved snapshot file is named, e.g., ‘DQN3_0_1_pong_FULL_Y.t7’ and lives in the ‘dqn’ subdirectory.
  • Note that every time you resume training with the ‘run_gpu’ script (loading a trained neural net snapshot), the epsilon-greedy algorithm will still start off with epsilon at 1 and then gradually decrease it to 0.1. That means the AI agent would always pick actions completely randomly at the beginning. You can change this behavior by modifying line #33 of the ‘run_gpu’ script: ‘ep=1’ (see the sketch after this list).
  • I have also tried training the DQN to play the ‘space_invaders’ game on Jetson TX1, for roughly a month. The resulting AI agent could play the game decently well. It could hit a score of over 1,000 occasionally, which is significantly better than a naive random player (<200). Note that the feature representation of ‘space_invaders’ is much more complicated than that of ‘pong’, so it’s understandable that it took the DQN much more time to learn how to play ‘space_invaders’.
  • When I discuss deep learning and AI training with others, I often hear people say that the NVIDIA Tegra platform is for inference only and that you cannot do neural net training on it. I hold a different opinion. Tegra X1, capable of 512 GFLOPS of FP32 compute, is a very powerful platform. As shown in this DQN ‘pong’ example, if the size of the neural net you’re trying to train is reasonable, I think it’s perfectly OK to train it directly on Tegra X1. This holds especially true for reinforcement learning tasks, where you’d sample trajectories and learn from them directly on the same system. I think there are still a lot of things for me to explore in this area!
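
On the memory-leak workaround: I mentioned adding a swap partition above, but a plain swap file works just as well for this purpose. These are the generic Ubuntu steps (the 8G size is only an example, pick whatever fits your storage):

    # Create and enable an 8GB swap file
    sudo fallocate -l 8G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile

    # To make the swap persistent across reboots, add this line to /etc/fstab:
    # /swapfile none swap sw 0 0

As for the epsilon-greedy start value, the sketch below is schematic (only the relevant pieces of the agent parameter string in ‘run_gpu’ are shown), but it illustrates the kind of edit I mean:

    # In run_gpu: epsilon starts at 1 and anneals down to 0.1 by default
    agent_params="...,ep=1,ep_end=0.1,..."

    # To resume training without the fully random warm-up phase,
    # lower the starting epsilon, e.g.:
    agent_params="...,ep=0.1,ep_end=0.1,..."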

To-do: (Things I’d like to do in subsequent posts if I have the time.)

  • A primer on reinforcement learning, Q-function learning, and deep neural networks; or simply a pointer to this article: Deep Learning in a Nutshell: Reinforcement Learning.
  • More explanation about the parameters/options in train_agent.lua.
  • Reading through and commenting on the Deep Q Learner code.
