1. Neural Atari
How to build a playable fully-neural version of Atari Breakout
@paraschopra, founder of Lossfunk
Code: https://github.com/paraschopra/atari-pixels
5. How cool is it to generate interactive experiences entirely from a neural network?
6. My plan
● Select a game: Atari Breakout
● Train an agent using Reinforcement Learning
○ An excuse to learn RL
● Generate videos of the agent playing the game
● Learn a world model
○ That takes in current frame + latent action to produce the next frame
● Map real actions (LEFT, RIGHT, etc.) to latent actions
● Deploy the world model as a playable game
○ Real actions are mapped to latent actions
○ Latent actions + current frame -> next frame
7. Train an agent to play Atari Breakout
I used Q-learning. The theory is pretty simple!
Exploration in any environment gives us tuples of the form
(state, action, next state, reward, done)
Your job is to learn a function that estimates the cumulative future reward for each
possible action given a state.
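A minimal sketch of that idea (PyTorch; `q_net`, `gamma`, and the tensor shapes are illustrative assumptions, not the repo's actual code): the target for each transition is the immediate reward plus the discounted best Q value of the next state.

```python
import torch

# One batch of transitions from exploration: (state, action, next_state, reward, done).
# q_net maps a state tensor to a vector of Q values, one per action (illustrative).
def q_learning_target(q_net, reward, next_state, done, gamma=0.99):
    with torch.no_grad():
        # Estimated cumulative future reward: immediate reward plus the
        # discounted best Q value of the next state (zero if the episode ended).
        best_next_q = q_net(next_state).max(dim=-1).values
        return reward + gamma * best_next_q * (1.0 - done)
```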
9. I used Double Q Learning
● You have two networks
○ Given (state, action, reward, next state)
○ Policy network: estimates Q value for a given state and the action taken
○ Target network: a lagging copy of the policy network that gives you the target value to compute the loss against
■ Next action chosen = argmax(policy_network(next state))
■ Target Q value = immediate reward + gamma * target_network(next state)[next action chosen] (sketched in code below)
● Exploration is epsilon-greedy, parameterized by epsilon
○ With probability epsilon -> random action (epsilon decays over time)
○ Else -> take the action with the maximum Q value
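A sketch of the Double Q-learning target and the epsilon-greedy rule described above (assuming PyTorch; `policy_net`, `target_net`, and the batch layout are illustrative, not the exact code from the repo):

```python
import random
import torch
import torch.nn.functional as F

def double_dqn_loss(policy_net, target_net, batch, gamma=0.99):
    states, actions, rewards, next_states, dones = batch
    # Q value the policy network assigns to the action actually taken.
    q_taken = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Double DQN: the policy network *chooses* the next action...
        next_actions = policy_net(next_states).argmax(dim=1, keepdim=True)
        # ...but the lagging target network *evaluates* it.
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        target = rewards + gamma * next_q * (1.0 - dones)
    return F.smooth_l1_loss(q_taken, target)

def epsilon_greedy(policy_net, state, epsilon, n_actions):
    # With probability epsilon take a random action (epsilon decays over time),
    # otherwise take the action with the maximum estimated Q value.
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return policy_net(state.unsqueeze(0)).argmax(dim=1).item()
```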
10. This helped me train an agent that reached a score of ~20
● It’s normal to reach scores of 200 or more, but I just wanted to check that water flows through the pipes
13. Caution: I wasted 10 days chasing a subtle bug
My agent was getting stuck in a local optimum. It went LEFT, scored a point, and then did nothing.
I went mad trying to debug it, and ended up learning a lot about RL.
I finally realized that LLM-generated code was normalizing the data twice (dividing by 255): once while passing frames in, and once in the forward pass.
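Roughly what the bug looked like (an illustrative reconstruction, not the actual generated code): the pixels were scaled to [0, 1] twice, so the network effectively saw near-black inputs.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, n_actions=4):
        super().__init__()
        self.head = nn.Linear(84 * 84, n_actions)

    def forward(self, x):
        x = x / 255.0                     # second normalization: the bug
        return self.head(x.flatten(1))

frame = torch.randint(0, 256, (1, 84 * 84)).float()
state = frame / 255.0                     # first normalization, while passing frames
q_values = QNet()(state)                  # inputs end up ~255x too small: near-black
```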
14. But it was fixed, and I had lots and lots of videos of Atari Breakout!
15. World model for dynamics of Atari Breakout
[Architecture diagram: an encoder takes the previous frame and the next frame and produces a quantized latent (the latent action); a decoder takes previous frames plus this quantized latent and reconstructs the next frame.]
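A rough sketch of this kind of latent-action world model (assumptions: grayscale frames, a convolutional encoder/decoder, a small VQ-style codebook, and a single conditioning frame instead of a frame stack; names and sizes are illustrative, the real architecture lives in the repo):

```python
import torch
import torch.nn as nn

class LatentActionWorldModel(nn.Module):
    def __init__(self, n_codes=16, code_dim=32):
        super().__init__()
        # Encoder looks at (previous frame, next frame) and infers what "action"
        # happened between them, as a quantized latent code.
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, code_dim, 4, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.codebook = nn.Embedding(n_codes, code_dim)  # discrete latent actions
        # Decoder takes the previous frame plus the latent action and predicts
        # the next frame. (VQ commitment/codebook losses omitted for brevity.)
        self.decoder = nn.Sequential(
            nn.Conv2d(1 + code_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 3, padding=1), nn.Sigmoid(),
        )

    def quantize(self, z):
        # Nearest codebook entry; straight-through estimator for gradients.
        d = torch.cdist(z, self.codebook.weight)
        idx = d.argmin(dim=1)
        zq = self.codebook(idx)
        return z + (zq - z).detach(), idx

    def forward(self, prev_frame, next_frame):
        z = self.encoder(torch.cat([prev_frame, next_frame], dim=1))
        zq, idx = self.quantize(z)
        # Broadcast the latent action over the spatial grid and decode.
        b, _, h, w = prev_frame.shape
        action_map = zq[:, :, None, None].expand(b, -1, h, w)
        recon = self.decoder(torch.cat([prev_frame, action_map], dim=1))
        return recon, idx
```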
16. First attempt: ball disappeared
Frames were getting reconstructed, but the
ball was missing.
- Top frame is the initial frame
- Middle frame is the actual next frame
- Bottom frame is the reconstructed frame, given the initial frame + predicted latent action
22. First attempt at neural game via learned world model
[Pipeline diagram: real actions (LEFT, RIGHT, NOOP, FIRE) are mapped to a quantized latent; the decoder takes previous frames plus that latent and produces the next-frame reconstruction, which is fed back in as the newest previous frame.]
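The playable loop then looks roughly like this (illustrative sketch: `ACTION_TO_LATENT` is a hypothetical stand-in for the learned real-action-to-latent mapping, and `decoder`/`codebook` are the trained world-model parts):

```python
import torch

# Hypothetical stand-in for the learned mapping from real actions to latent codes.
ACTION_TO_LATENT = {"NOOP": 0, "FIRE": 1, "LEFT": 2, "RIGHT": 3}

def play_step(decoder, codebook, prev_frames, action_name):
    # Map the pressed key to a latent action code...
    idx = torch.tensor([ACTION_TO_LATENT[action_name]])
    zq = codebook(idx)
    # ...decode the next frame from previous frames + latent action...
    b, _, h, w = prev_frames.shape
    action_map = zq[:, :, None, None].expand(b, -1, h, w)
    next_frame = decoder(torch.cat([prev_frames, action_map], dim=1))
    # ...and feed the generated frame back in as the newest "previous frame".
    prev_frames = torch.cat([prev_frames[:, 1:], next_frame], dim=1)
    return next_frame, prev_frames
```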
30. The paddle went to the left, the score increased from 0 to 7, and then it stayed there!
Nope :(
31. Debugging latents: the error in the full pipeline is ~50% (19/35), while the isolated action-to-latent error is 5%
33. After many days of debugging!
The frames were channel-ordered RGB in one place, but RBG in another (Python PIL reorders it!)
“Dammit, Claude”
But also “Thanks Claude”
35. Actions shown at the bottom are mine
Notice the score increase (0 -> 1) and a life lost (5 -> 4)
The entire game (including score and life tracking) is generated in pixels by a neural network