Neural Atari
How to build a playable fully-neural
version of Atari Breakout
@paraschopra, founder of Lossfunk
Code: https://github.com/paraschopra/atari-pixels
Demo
Inspiration
How to build a playable fully-neural version of Atari Breakout - Paras Chopra
How cool is it to generate interactive
experiences entirely from a neural network?
My plan
● Select a game: Atari Breakout
● Train an agent using Reinforcement Learning
○ An excuse to learn RL
● Generate videos of the agent playing the game
● Learn a world model
○ That takes in current frame + latent action to produce the next frame
● Map real actions (LEFT, RIGHT, etc.) to latent actions
● Deploy the world model as a playable game
○ Real actions to latent actions
○ Latent actions + current frame -> next frame
Train an agent to play Atari Breakout
I used Q-learning. The theory is pretty simple!
Exploration in any environment gives us these tuples
(state, action, next state, reward, done)
Your job is to learn a function that estimates cumulative future rewards for each
possible action given a state.
Q-values = future total reward from the state
● LEFT: 40
● RIGHT: 10
● NOOP: 5
● FIRE: 20
The Q-function itself is mostly a CNN.
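To make the idea concrete, here is a minimal sketch of what the Q-function's output means for a single state. The real model is a CNN over stacked frames; the plain dict below is a stand-in, with the numbers taken from the slide above.

```python
# Hypothetical sketch: what the Q-values for one state mean.
# In the real project a CNN produces these; a dict stands in here.
q_values = {"LEFT": 40.0, "RIGHT": 10.0, "NOOP": 5.0, "FIRE": 20.0}

def greedy_action(q):
    """Pick the action whose estimated cumulative future reward is highest."""
    return max(q, key=q.get)

print(greedy_action(q_values))  # LEFT
```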
I used Double Q Learning
● You have two networks
○ Given (state, action, reward, next state)
○ Policy network: estimates Q value for a given state and the action taken
○ Target network: a lagging version of policy network that gives you target value to calculate
loss against
■ Next action chosen = argmax(policy_network(next state))
■ Target Q value = Immediate reward + gamma * target_network(next action chosen)
● Exploration is parameterized by epsilon (epsilon-greedy)
○ Epsilon probability -> random action (this decays over time)
○ Else -> take action with maximum Q value
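The target computation and epsilon-greedy rule above can be sketched as follows. This is an illustrative sketch, not the project's actual code: `policy_q_next` and `target_q_next` stand in for the outputs of the two networks on the next state.

```python
import random
import numpy as np

GAMMA = 0.99  # discount factor (assumed value)

def double_dqn_target(reward, policy_q_next, target_q_next, done):
    """Immediate reward + gamma * target-net Q of the policy-net's argmax action."""
    if done:
        return reward
    next_action = int(np.argmax(policy_q_next))   # action chosen by policy net
    return reward + GAMMA * target_q_next[next_action]

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon take a random action, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))

policy_q_next = np.array([40.0, 10.0, 5.0, 20.0])
target_q_next = np.array([35.0, 12.0, 4.0, 25.0])
print(double_dqn_target(1.0, policy_q_next, target_q_next, done=False))  # 1 + 0.99*35 = 35.65
```

Decoupling the action choice (policy net) from its valuation (target net) is what tames the overestimation bias of plain Q-learning.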
This helped me train an agent that reached a score of ~20
● It’s normal to reach scores of 200 or more, but I just wanted to test that water
flows through the pipes
My process: vibecode!
Plan → Implement → repeat
I vibecoded what was SOTA in 2013
Caution: I wasted 10 days chasing a subtle bug
My agent was getting stuck in a local optimum.
It went LEFT, scored a point, and then did
nothing.
I went mad trying to debug it, but ended up
learning a lot about RL.
Finally realized that the LLM-generated code was
normalizing data twice (dividing by 255), once
while passing frames and once in the forward pass.
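The bug is easy to reproduce in miniature. A sketch (function names are illustrative): a frame normalized once lands in [0, 1], but normalized twice it collapses to near zero, so the network effectively sees blank inputs.

```python
import numpy as np

frame = np.full((84, 84), 128, dtype=np.uint8)  # a mid-gray Atari-sized frame

def preprocess(f):
    """Normalization in the data pipeline."""
    return f.astype(np.float32) / 255.0

def forward(x):
    """The model's forward pass normalized AGAIN -- the bug."""
    x = x / 255.0   # accidental second division
    return x

out = forward(preprocess(frame))
print(out.max())  # ~0.002 instead of ~0.5: inputs are nearly all zero
```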
But the bug was fixed, and I had lots and lots of
videos of Atari Breakout!
World model for dynamics of Atari Breakout
Previous
frame
Next frame
Quantized latent
Previous frames
Next frame
reconstruction
Encoder
Decoder
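The quantized latent in the diagram above can be sketched as a VQ-style codebook lookup (this is my assumed reading of the architecture; sizes and names are illustrative). The encoder emits a continuous vector for a frame pair, which gets snapped to the nearest entry of a small learned codebook; that discrete code is the "latent action" the decoder consumes.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))   # 8 latent "actions", 4-dim each (illustrative)

def quantize(z):
    """Replace a continuous latent with its nearest codebook vector."""
    idx = int(np.argmin(((codebook - z) ** 2).sum(axis=1)))
    return idx, codebook[idx]

z = rng.normal(size=4)               # stand-in for the encoder's output
idx, z_q = quantize(z)
print(idx, z_q.shape)                # a discrete code plus its embedding
```

The discrete bottleneck is what makes "map real actions to latents" possible later: there is a small, enumerable set of latent actions.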
First attempt: ball disappeared
Frames were getting reconstructed, but the
ball was missing.
- Top frame is initial frame
- Middle frame is next frame
- Bottom frame is reconstructed frame given
initial frame + predicted latent action
Latent space showed it’s capturing change! (notice blue)
Claude told me to add a motion loss, and it worked!
TLDR: add the diff between the predicted next frame and
the actual previous frame as an additional loss term
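One plausible reading of that motion loss, sketched below: penalize the model when the predicted frame-to-frame *change* disagrees with the actual change, so a tiny moving object like the ball cannot be averaged away. The exact formulation in the project may differ.

```python
import numpy as np

def motion_loss(pred_next, true_next, prev):
    """L1 distance between predicted and actual frame deltas (assumed form)."""
    pred_delta = pred_next - prev   # what the model says moved
    true_delta = true_next - prev   # what actually moved
    return float(np.abs(pred_delta - true_delta).mean())

prev = np.zeros((4, 4))
true_next = np.zeros((4, 4)); true_next[1, 2] = 1.0  # the "ball" moved one pixel

perfect = motion_loss(true_next, true_next, prev)  # reproduces the motion: 0.0
lazy = motion_loss(prev.copy(), true_next, prev)   # ignores the ball: penalized
print(perfect, lazy)
```

A plain pixel loss barely notices one wrong pixel; the delta term makes that single moving pixel the whole signal.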
Action to latent model
LEFT
RIGHT
NOOP
FIRE Quantized latent
Got 83% accuracy for the real-to-latent prediction model
First attempt at neural game via learned world model
Quantized latent
Previous frames
Next frame
reconstruction
Decoder
LEFT
RIGHT
NOOP
FIRE
Feed generated
frame back
Disappearing act! The generated game descended into randomness
Notice anything odd here?
LEFT
RIGHT
NOOP
FIRE Quantized latent
The same action could lead to different
latents depending on the state
Mommy, it’s a one-to-many mapping!
V2 of action to latent model
LEFT
RIGHT
NOOP
FIRE Quantized latent
Previous frames
Accuracy improved to 95%!
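The V2 fix, sketched: predict the latent from the action *and* the previous frames, so the state disambiguates the one-to-many mapping. The network is omitted; only the input construction is the point, and all shapes here are illustrative.

```python
import numpy as np

def build_input(action_onehot, prev_frames):
    """Concatenate the one-hot action with flattened frame context."""
    return np.concatenate([action_onehot, prev_frames.ravel()])

action = np.array([1.0, 0.0, 0.0, 0.0])   # LEFT, one-hot over 4 actions
frames = np.zeros((2, 4, 4))              # two tiny context frames (stand-in)
x = build_input(action, frames)
print(x.shape)                            # action dims + flattened frame dims
```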
Second attempt at neural game via learned world model
Quantized latent
Previous frames
Next frame
reconstruction
Decoder
LEFT
RIGHT
NOOP
FIRE
Feed generated
frame back
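The "feed generated frame back" loop in the diagram above is the whole playable game, sketched here with placeholder models (`action_to_latent` and `decode` are stand-ins, not the real networks): each step maps the player's real action to a latent conditioned on recent frames, decodes the next frame, and pushes that generated frame back into the context window.

```python
from collections import deque

def action_to_latent(action, frames):
    """Placeholder for the action-to-latent model."""
    return (action, len(frames))

def decode(frames, latent):
    """Placeholder for the decoder; 'frames' are integers standing in for images."""
    return frames[-1] + 1

def play(actions, first_frame, context=2):
    frames = deque([first_frame], maxlen=context)  # sliding window of recent frames
    out = []
    for a in actions:
        latent = action_to_latent(a, frames)
        nxt = decode(list(frames), latent)
        frames.append(nxt)        # generated frame becomes the next step's context
        out.append(nxt)
    return out

print(play(["LEFT", "NOOP", "FIRE"], 0))  # each frame built from the previous one
```

Because every frame is conditioned on generated frames, small errors compound, which is exactly why the early versions "descended into randomness".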
Previous frames
It should work now, right?
The paddle went left, the score increased from 0 to 7, and then it stayed there!
Nope :(
Debugging latents: error in the full pipeline was 50% (19/35),
while the isolated action-to-latent error was 5%
After many days of debugging!
The frames were ordered RGB in one
place, but BGR in another
(Python PIL reorders the channels!)
“Dammit, Claude”
But also “Thanks Claude”
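The channel-order pitfall in miniature, as a sketch: the same bytes read as RGB in one part of the pipeline and as BGR in another silently swap red and blue, so every model downstream trains on subtly wrong colors.

```python
import numpy as np

pixel_rgb = np.array([200, 50, 10], dtype=np.uint8)  # a mostly-red pixel

def as_bgr(rgb):
    """What another stage sees if it assumes the opposite channel order."""
    return rgb[::-1]

seen_elsewhere = as_bgr(pixel_rgb)
print(seen_elsewhere)  # [ 10  50 200] -> the model sees mostly blue
```

No exception is raised anywhere, which is why this class of bug takes days to find.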
Actions at the bottom -> mine
Notice the score increase (0->1) and a
life lost (5->4)
The entire game (including score
and life tracking) is generated in
pixels via a neural network
A walkthrough of the entire thing..
Questions?