TETRIS WAR
Team 4
2008.12.1
JungKyu Lee
SangYun Kim
Shinnyue Kang
1
Contents
• General Idea Description
– Approximation using a feature-based MDP
– Policy iteration

• Apply to Tetris
– Problem description
– MDP formulation
– Feature based MDP formulation

• Result
• Conclusion
2
General Idea
• Infinite-horizon MDP with discount factor α, where 0 < α < 1

• Goal: find a policy μ : X → A that maximizes the value function (cost-to-go vector) V
– Let the policy be π = {μ₀, μ₁, ...}
3
Cost-to-go value V*
• Definition of the optimal cost-to-go vector V*:
  V*(i) = max over policies π of E[ Σ_{t=0}^{∞} α^t g(i_t, μ_t(i_t), i_{t+1}) | i_0 = i ]
• By Bellman's optimality equation, using an optimal stationary policy π = {μ, μ, ...},
• the optimality equation is given as
  V*(i) = max_{a ∈ A(i)} Σ_j p_ij(a) [ g(i, a, j) + α V*(j) ]
  (a value-iteration sketch follows this slide)
4
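The Bellman equation above can be made concrete with a generic value-iteration routine. The sketch below assumes dense transition and reward arrays and is purely illustrative: the Tetris state space is far too large to enumerate, which is why the later slides switch to a feature-based approximation.

```python
import numpy as np

def value_iteration(P, R, alpha, tol=1e-8):
    """Generic value iteration for V*(i) = max_a sum_j P[a,i,j] * (R[a,i,j] + alpha*V*(j)).

    P: (A, N, N) transition probabilities, R: (A, N, N) one-step rewards,
    alpha: discount factor with 0 < alpha < 1.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Q[a, i]: expected one-step reward plus discounted value of the successor state
        Q = (P * (R + alpha * V)).sum(axis=2)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)  # optimal values and a greedy policy
        V = V_new
```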
Policy iteration
• Policy iteration alternates evaluation of the current policy with greedy improvement
• The value is updated from simulated trajectories of the current policy
• The vector V_t has components given by the cost-to-go estimates at iteration t
• Temporal difference (TD) associated with each transition (i, j) under μ_{t+1}:
  d_t(i, j) = g(i, μ_{t+1}(i), j) + α V_t(j) − V_t(i)
  (a TD sketch follows this slide)
5
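As a minimal sketch of the temporal difference above: for every observed transition, the TD is the one-step reward plus the discounted value of the successor minus the value of the current state. The trajectory format and the step size below are assumptions made for illustration, not part of the slides.

```python
def td0_evaluate(V, trajectory, alpha, step=0.1):
    """One TD(0) pass of policy evaluation.

    V: dict mapping state -> current value estimate.
    trajectory: list of (i, reward, j) transitions observed under the current
    policy (a hypothetical recording format).
    """
    for i, reward, j in trajectory:
        d = reward + alpha * V.get(j, 0.0) - V.get(i, 0.0)  # temporal difference d_t(i, j)
        V[i] = V.get(i, 0.0) + step * d                     # move V(i) toward the TD target
    return V
```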
Tetris
• Board size
– Width 10; height 22

• Blocks
– 7 pieces, each appearing with a predetermined probability

• Score
– (number of erased lines)² × 100

• Action
– Left, Right, Rotate, No move
6
MDP formulation
• MDP model for Tetris
– States: X = {wall configuration + piece}
– Actions: A = {rotate, right, left, no move}
– Transitions: deterministic new wall after (i, a) + a uniformly random new piece (a minimal transition sketch follows this slide)
– Reward: r(i, a, j) = number of lines removed after (i, a)

• A value function can be computed only on the set
of wall configurations.
• The optimal value function V* is the best average
score!
7
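A minimal sketch of this transition structure, assuming a hypothetical place_piece helper that performs the deterministic wall update and reports the lines removed; the reward is exactly that line count, and the next piece is drawn uniformly at random.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    wall: tuple   # wall configuration, e.g. a tuple of column heights (assumption)
    piece: int    # index of the current piece, 0..6

ACTIONS = ("rotate", "right", "left", "nomove")

def step(state, action, place_piece):
    """One transition of the Tetris MDP.

    place_piece is a hypothetical helper that applies the action, drops the
    piece, and returns (new_wall, lines_removed); the wall update is
    deterministic and the next piece is uniformly random.
    """
    new_wall, lines_removed = place_piece(state.wall, state.piece, action)
    next_state = State(new_wall, random.randrange(7))
    return next_state, lines_removed   # reward r(i, a, j) = lines removed
```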
Approximation for Tetris
• The number of states is too large to compute the value function exactly
– Use a feature-based MDP instead

• Features (computed in the sketch after this slide)
– The height of each column (one feature per column across the board width)
– Absolute height difference between adjacent columns
– Maximum height
– Number of holes
8
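A sketch of how these features could be computed from a 0/1 board grid; the grid layout and orientation are assumptions, since the slide does not fix them.

```python
import numpy as np

def features(grid):
    """Feature vector for a wall configuration.

    grid: 2-D 0/1 array of shape (height, width) with row 0 at the top
    (an assumed layout). Returns the slide's features: the height of each
    column, the absolute height difference between adjacent columns,
    the maximum height, and the number of holes.
    """
    height, _ = grid.shape
    filled = grid.astype(bool)
    # Column height = board height minus the row index of the topmost filled cell.
    heights = np.where(filled.any(axis=0), height - filled.argmax(axis=0), 0)
    diffs = np.abs(np.diff(heights))
    # A hole is an empty cell with at least one filled cell somewhere above it.
    holes = int((~filled & (np.cumsum(filled, axis=0) > 0)).sum())
    return np.concatenate([heights, diffs, [heights.max(), holes]])
```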
Value function
• We define an approximate value function Ṽ using the above features:
  Ṽ(k) = wᵀ φ(k)
• where φ(k) is the vector of features for state k and w is the weight vector
• Finally, Ṽ is linear in w, so fitting it reduces to choosing the weights
• Our decision rule is as follows (see the sketch after this slide):
  choose the action a that maximizes r(i, a, j) + Ṽ(j), where j is the wall produced by (i, a)
9
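A sketch of the approximate value and the greedy decision rule. The helpers simulate (deterministic wall update plus lines removed for a candidate action) and phi (feature extraction) are hypothetical names standing in for the game engine and the feature code above.

```python
import numpy as np

def v_tilde(phi_k, w):
    """Approximate value V~(k) = w . phi(k)."""
    return float(np.dot(w, phi_k))

def choose_action(state, w, simulate, phi,
                  actions=("rotate", "right", "left", "nomove")):
    """Greedy decision rule: pick the action whose deterministic wall update
    gives the largest immediate reward plus approximate value of the new wall.
    simulate(state, a) -> (new_wall, lines_removed) and phi(wall) -> feature
    vector are assumed helpers."""
    def score(a):
        new_wall, lines = simulate(state, a)
        return lines + v_tilde(phi(new_wall), w)
    return max(actions, key=score)
```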
Weight vector
• Iterate to make Ṽ approximate the optimal value function V*, via policy iteration
• The weight vector w is fitted by least squares (equation 1):
  min over w of  Σ_{m=1}^{M} Σ_k ( Ṽ(i_k^m, w) − (total score collected from state i_k^m to the end of game m) )²
– M games
– i_0^m, i_1^m, ... : state sequence of game m
– N_m : termination state of game m
(the samples are assembled in the sketch after this slide)
10
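One way to assemble the samples behind equation (1), assuming each simulated game is recorded as a list of (state, reward) pairs in play order: the regression target for every visited state is the total score collected from that state until the game terminates.

```python
import numpy as np

def training_data(games, phi):
    """Build the regression samples behind equation (1).

    games: list of simulated games, each a list of (state, reward) pairs in
    play order (an assumed recording format). The target for every visited
    state is the total score collected from that state until termination.
    Returns (Y, b): the n x d matrix of feature vectors and the n-vector of targets.
    """
    rows, targets = [], []
    for game in games:
        to_go = 0.0
        tail = []
        for state, reward in reversed(game):
            to_go += reward                 # score-to-go from this state
            tail.append((phi(state), to_go))
        tail.reverse()                      # restore play order
        for feat, target in tail:
            rows.append(feat)
            targets.append(target)
    return np.array(rows), np.array(targets)
```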
Minimum squared error technique using the pseudoinverse
• To solve equation (1):
• Goal: find a weight vector a satisfying Y a = b
– d : number of features
– n : number of samples (so Y is n × d and b is n × 1)

• Formal solution: a = Y⁻¹ b
– valid only when Y is square and nonsingular

• Error vector: e = Y a − b
11
Squared error criterion function
• Minimize the squared length of the error vector
• Define the error criterion function J(a) = ‖Y a − b‖²
• Using a gradient method to simplify: ∇J(a) = 2 Yᵀ (Y a − b)
• Setting the gradient to zero gives the necessary condition Yᵀ Y a = Yᵀ b, so
  a = (Yᵀ Y)⁻¹ Yᵀ b = Y† b
– Y† = (YᵀY)⁻¹Yᵀ : the pseudoinverse (see the sketch after this slide)
12
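The closed-form solution above translates directly into a least-squares fit with NumPy's pseudoinverse; here Y is the n × d matrix of feature vectors and b the vector of score-to-go targets built in the earlier sketch.

```python
import numpy as np

def fit_weights(Y, b):
    """Minimum squared error fit of the weight vector a.

    Solves min_a ||Y a - b||^2 via a = (Y^T Y)^-1 Y^T b = pinv(Y) b,
    where Y is the n x d matrix of feature vectors and b the target vector.
    np.linalg.pinv also handles the rank-deficient case.
    """
    return np.linalg.pinv(Y) @ b
```

For example, a = fit_weights(*training_data(games, phi)) would fit the weights to the samples built above, with phi adapted to the chosen state representation.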
Apply to the Tetris problem
• Let Y be the matrix whose rows are the feature vectors of the visited states, and b the vector of their observed scores-to-go
• Equation (1) is then solved with the pseudoinverse, where
– M : number of games
– n : number of samples
– φ(·) : feature vector of each visited state
13
Simulation Result
• Setup
– parameter value: 0.6
– tested over 100 games, using random seeds 0 to 100

• A simple TD algorithm is used as our heuristic baseline
• Our learning algorithm improves on the heuristic algorithm by 2010%
14
Simulation result

15
Conclusion
• Goal of the project
– Build an algorithm that achieves the highest possible average score

• Our learning algorithm is powerful
– Its average and maximum scores compare favorably with the heuristic algorithm

• Problem of deviation
– Deviation: the difference between the highest and lowest scores
– Our learning algorithm shows a large deviation

• Suggestion
– Reduce the deviation without lowering the average score
16
References
[1] Bertsekas, D. P. and Tsitsiklis, J. N., 1996, "Neuro-Dynamic Programming", Athena Scientific.
[2] Colin Fahey, 2003, "Tetris AI", http://www.colinfahey.com
[3] Dimitri P. Bertsekas and Sergey Ioffe, 1996, "Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming", LIDS-P-2349.
[4] Donald Carr, 2005, "Applying Reinforcement Learning to Tetris", Dept. of CS, Rhodes University, South Africa.
[5] Niko Böhm et al., 2005, "An Evolutionary Approach to Tetris", MIC2005: The Sixth Metaheuristics International Conference, Vienna, Austria.
[6] Richard S. Sutton and Andrew G. Barto, 1998, "Reinforcement Learning: An Introduction", The MIT Press.

17
