TETRIS WAR
Team 4
2008.12.1
JungKyu Lee
SangYun Kim
Shinnyue Kang
1
Contents
• General Idea Description
– Approximation using a feature-based MDP
– Policy iteration

• Apply to Tetris
– Problem description
– MDP formulation
– Feature based MDP formulation

• Result
• Conclusion
2
General Idea
• Infinite-horizon MDP with discount factor α, where 0 < α < 1

• Goal: find a policy μ : X → A that maximizes the value function (cost-to-go vector) V
– Let the policy be π = {μ₀, μ₁, ...}
3
Cost-to-go value V*
• Definition of the optimal cost-to-go vector V*:
  V*(i) = max over policies π of E[ Σ_{t=0}^{∞} α^t g(i_t, μ_t(i_t), i_{t+1}) | i_0 = i ]
• By Bellman's optimality equation, using an optimal stationary policy π = {μ, μ, ...},
• the optimality equation is given as
  V*(i) = max_{a ∈ A(i)} Σ_j p_ij(a) [ g(i, a, j) + α V*(j) ]
  (a value-iteration sketch follows this slide)
4
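The Bellman equation above can be made concrete with a generic value-iteration routine. The sketch below assumes dense transition and reward arrays and is purely illustrative: the Tetris state space is far too large to enumerate, which is why the later slides switch to a feature-based approximation.

```python
import numpy as np

def value_iteration(P, R, alpha, tol=1e-8):
    """Generic value iteration for V*(i) = max_a sum_j P[a,i,j] * (R[a,i,j] + alpha*V*(j)).

    P: (A, N, N) transition probabilities, R: (A, N, N) one-step rewards,
    alpha: discount factor with 0 < alpha < 1.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Q[a, i]: expected one-step reward plus discounted value of the successor state
        Q = (P * (R + alpha * V)).sum(axis=2)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)  # optimal values and a greedy policy
        V = V_new
```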
Policy iteration
• Policy iteration alternates evaluation of the current policy with greedy improvement
• The value is updated from simulated trajectories of the current policy
• The vector V_t has components given by the cost-to-go estimates at iteration t
• Temporal difference (TD) associated with each transition (i, j) under μ_{t+1}:
  d_t(i, j) = g(i, μ_{t+1}(i), j) + α V_t(j) − V_t(i)
  (a TD sketch follows this slide)
5
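As a minimal sketch of the temporal difference above: for every observed transition, the TD is the one-step reward plus the discounted value of the successor minus the value of the current state. The trajectory format and the step size below are assumptions made for illustration, not part of the slides.

```python
def td0_evaluate(V, trajectory, alpha, step=0.1):
    """One TD(0) pass of policy evaluation.

    V: dict mapping state -> current value estimate.
    trajectory: list of (i, reward, j) transitions observed under the current
    policy (a hypothetical recording format).
    """
    for i, reward, j in trajectory:
        d = reward + alpha * V.get(j, 0.0) - V.get(i, 0.0)  # temporal difference d_t(i, j)
        V[i] = V.get(i, 0.0) + step * d                     # move V(i) toward the TD target
    return V
```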
Tetris
• Board size
– Width 10; height 22

• Blocks
– 7 pieces, each appearing with a predetermined probability

• Score
– (number of erased lines)² × 100

• Action
– Left, Right, Rotate, No move
6
MDP formulation
• MDP model for Tetris
– States: X = {wall configuration + piece}
– Actions: A = {rotate, right, left, no move}
– Transitions: deterministic new wall after (i, a) + a uniformly random new piece (a minimal transition sketch follows this slide)
– Reward: r(i, a, j) = number of lines removed after (i, a)

• A value function can be computed only on the set
of wall configurations.
• The optimal value function V* is the best average
score!
7
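A minimal sketch of this transition structure, assuming a hypothetical place_piece helper that performs the deterministic wall update and reports the lines removed; the reward is exactly that line count, and the next piece is drawn uniformly at random.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    wall: tuple   # wall configuration, e.g. a tuple of column heights (assumption)
    piece: int    # index of the current piece, 0..6

ACTIONS = ("rotate", "right", "left", "nomove")

def step(state, action, place_piece):
    """One transition of the Tetris MDP.

    place_piece is a hypothetical helper that applies the action, drops the
    piece, and returns (new_wall, lines_removed); the wall update is
    deterministic and the next piece is uniformly random.
    """
    new_wall, lines_removed = place_piece(state.wall, state.piece, action)
    next_state = State(new_wall, random.randrange(7))
    return next_state, lines_removed   # reward r(i, a, j) = lines removed
```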
Approximation for Tetris
• The number of states is too large to compute the value function exactly
– Use a feature-based MDP instead

• Features (computed in the sketch after this slide)
– The height of each column (one feature per column across the board width)
– Absolute height difference between adjacent columns
– Maximum height
– Number of holes
8
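A sketch of how these features could be computed from a 0/1 board grid; the grid layout and orientation are assumptions, since the slide does not fix them.

```python
import numpy as np

def features(grid):
    """Feature vector for a wall configuration.

    grid: 2-D 0/1 array of shape (height, width) with row 0 at the top
    (an assumed layout). Returns the slide's features: the height of each
    column, the absolute height difference between adjacent columns,
    the maximum height, and the number of holes.
    """
    height, _ = grid.shape
    filled = grid.astype(bool)
    # Column height = board height minus the row index of the topmost filled cell.
    heights = np.where(filled.any(axis=0), height - filled.argmax(axis=0), 0)
    diffs = np.abs(np.diff(heights))
    # A hole is an empty cell with at least one filled cell somewhere above it.
    holes = int((~filled & (np.cumsum(filled, axis=0) > 0)).sum())
    return np.concatenate([heights, diffs, [heights.max(), holes]])
```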
Value function
• We define an approximate value function Ṽ using the above features:
  Ṽ(k) = wᵀ φ(k)
• where φ(k) is the vector of features for state k and w is the weight vector
• Finally, Ṽ is linear in w, so fitting it reduces to choosing the weights
• Our decision rule is as follows (see the sketch after this slide):
  choose the action a that maximizes r(i, a, j) + Ṽ(j), where j is the wall produced by (i, a)
9
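A sketch of the approximate value and the greedy decision rule. The helpers simulate (deterministic wall update plus lines removed for a candidate action) and phi (feature extraction) are hypothetical names standing in for the game engine and the feature code above.

```python
import numpy as np

def v_tilde(phi_k, w):
    """Approximate value V~(k) = w . phi(k)."""
    return float(np.dot(w, phi_k))

def choose_action(state, w, simulate, phi,
                  actions=("rotate", "right", "left", "nomove")):
    """Greedy decision rule: pick the action whose deterministic wall update
    gives the largest immediate reward plus approximate value of the new wall.
    simulate(state, a) -> (new_wall, lines_removed) and phi(wall) -> feature
    vector are assumed helpers."""
    def score(a):
        new_wall, lines = simulate(state, a)
        return lines + v_tilde(phi(new_wall), w)
    return max(actions, key=score)
```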
Weight vector
• Iterate to make Ṽ approximate the optimal value function V*, via policy iteration
• The weight vector w is fitted by least squares (equation 1):
  min over w of  Σ_{m=1}^{M} Σ_k ( Ṽ(i_k^m, w) − (total score collected from state i_k^m to the end of game m) )²
– M games
– i_0^m, i_1^m, ... : state sequence of game m
– N_m : termination state of game m
(the samples are assembled in the sketch after this slide)
10
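One way to assemble the samples behind equation (1), assuming each simulated game is recorded as a list of (state, reward) pairs in play order: the regression target for every visited state is the total score collected from that state until the game terminates.

```python
import numpy as np

def training_data(games, phi):
    """Build the regression samples behind equation (1).

    games: list of simulated games, each a list of (state, reward) pairs in
    play order (an assumed recording format). The target for every visited
    state is the total score collected from that state until termination.
    Returns (Y, b): the n x d matrix of feature vectors and the n-vector of targets.
    """
    rows, targets = [], []
    for game in games:
        to_go = 0.0
        tail = []
        for state, reward in reversed(game):
            to_go += reward                 # score-to-go from this state
            tail.append((phi(state), to_go))
        tail.reverse()                      # restore play order
        for feat, target in tail:
            rows.append(feat)
            targets.append(target)
    return np.array(rows), np.array(targets)
```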
Minimum squared error technique using the pseudoinverse
• To solve equation (1):
• Goal: find a weight vector a satisfying Y a = b
– d : number of features
– n : number of samples (so Y is n × d and b is n × 1)

• Formal solution: a = Y⁻¹ b
– valid only when Y is square and nonsingular

• Error vector: e = Y a − b
11
Squared error criterion function
• Minimize the squared length of the error vector
• Define the error criterion function J(a) = ‖Y a − b‖²
• Using a gradient method to simplify: ∇J(a) = 2 Yᵀ (Y a − b)
• Setting the gradient to zero gives the necessary condition Yᵀ Y a = Yᵀ b, so
  a = (Yᵀ Y)⁻¹ Yᵀ b = Y† b
– Y† = (YᵀY)⁻¹Yᵀ : the pseudoinverse (see the sketch after this slide)
12
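The closed-form solution above translates directly into a least-squares fit with NumPy's pseudoinverse; here Y is the n × d matrix of feature vectors and b the vector of score-to-go targets built in the earlier sketch.

```python
import numpy as np

def fit_weights(Y, b):
    """Minimum squared error fit of the weight vector a.

    Solves min_a ||Y a - b||^2 via a = (Y^T Y)^-1 Y^T b = pinv(Y) b,
    where Y is the n x d matrix of feature vectors and b the target vector.
    np.linalg.pinv also handles the rank-deficient case.
    """
    return np.linalg.pinv(Y) @ b
```

For example, a = fit_weights(*training_data(games, phi)) would fit the weights to the samples built above, with phi adapted to the chosen state representation.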
Apply to the Tetris problem
• Let Y be the matrix whose rows are the feature vectors of the visited states, and b the vector of their observed scores-to-go
• Equation (1) is then solved with the pseudoinverse, where
– M : number of games
– n : number of samples
– φ(·) : feature vector of each visited state
13
Simulation Result
• Setup
– parameter value: 0.6
– tested over 100 games, using random seeds 0 to 100

• A simple TD algorithm is used as our heuristic baseline
• Our learning algorithm improves on the heuristic algorithm by 2010%
14
Simulation result

15
Conclusion
• Goal of the project
– Build an algorithm that achieves the highest possible average score

• Our learning algorithm is powerful
– Its average and maximum scores compare favorably with the heuristic algorithm

• Problem of deviation
– Deviation: the difference between the highest and lowest scores
– Our learning algorithm shows a large deviation

• Suggestion
– Reduce the deviation without lowering the average score
16
References
[1] Bertsekas, D. P. and Tsitsiklis, J. N., 1996, "Neuro-Dynamic Programming", Athena Scientific.
[2] Colin Fahey, 2003, "Tetris AI", http://www.colinfahey.com
[3] Dimitri P. Bertsekas and Sergey Ioffe, 1996, "Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming", LIDS-P-2349.
[4] Donald Carr, 2005, "Applying Reinforcement Learning to Tetris", Dept. of CS, Rhodes University, South Africa.
[5] Niko Böhm et al., 2005, "An Evolutionary Approach to Tetris", MIC2005: The Sixth Metaheuristics International Conference, Vienna, Austria.
[6] Richard S. Sutton and Andrew G. Barto, 1998, "Reinforcement Learning: An Introduction", The MIT Press.

17
