IaGo: an Othello AI
inspired by AlphaGo
Shion HONDA
@DSP
Overview
2
• I implemented an Othello AI (IaGo) inspired by the AlphaGo algorithm
• AlphaGo is composed of 3 parts:
• SL policy network: predicts the next move
• Value network: evaluates the board state
• MCTS: chooses a move using the two networks
Background
Game    | Search space | AI               | Year
Othello | 10^60        | NEC Logistello   | 1997
Go      | 10^360       | DeepMind AlphaGo | 2016
3
• Go has an extremely huge search space: 10^360
• cf. the estimated number of all atoms in the universe: 10^80
• Before AlphaGo, it was thought that it would take 10 more years for
Go AIs to beat human professionals, due to this huge search space
• Since I don't have enough machine resources to replicate AlphaGo,
I made an Othello version
Dataset
4
[Figure: a board state paired with the place of the next stone; 6 million -> 48 million pairs]
• Data came from online Othello game records
• 6 million pairs of board state & the place of the next stone
• Augmented 8x using rotation & transposition symmetry
(see the sketch below)
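The slides don't include the augmentation code; the following is a minimal NumPy sketch of the 8-fold symmetry expansion, assuming each board state and move target is stored as an 8x8 array (the function name and array layout are my own choices, not from the original).

```python
import numpy as np

def augment_8fold(board, move):
    """Expand one (board, move) pair into its 8 symmetric variants.

    board: (8, 8) array of the current position
    move:  (8, 8) one-hot array marking the square of the next stone
    """
    pairs = []
    b, m = board, move
    for _ in range(4):
        pairs.append((b, m))
        pairs.append((b.T, m.T))          # transposition
        b, m = np.rot90(b), np.rot90(m)   # 90-degree rotation
    return pairs
```

The four rotations combined with transposition give the 8 symmetries of the square board, which is how 6 million pairs become 48 million.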
SL policy network (classification)
• Input: 2-channel matrices of the board state
• Output: probability distribution over the next move
• Network: 9 convolutional layers with a softmax output layer
(sketch below)
• 57% accuracy in predicting human moves
5
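The slide only states "2-channel input, 9 convolutional layers, softmax output"; a minimal PyTorch sketch consistent with that description might look as follows. The framework, channel widths, and the 1x1 final convolution are assumptions.

```python
import torch
import torch.nn as nn

class SLPolicy(nn.Module):
    """2-channel 8x8 board in, probability distribution over the 64 squares out.
    Only "9 conv layers + softmax" comes from the slide; widths are guesses."""
    def __init__(self, channels=64):
        super().__init__()
        layers, in_ch = [], 2          # planes: own stones / opponent stones
        for _ in range(8):
            layers += [nn.Conv2d(in_ch, channels, 3, padding=1), nn.ReLU()]
            in_ch = channels
        layers += [nn.Conv2d(in_ch, 1, 1)]   # 9th conv: 1x1 down to one plane
        self.body = nn.Sequential(*layers)

    def forward(self, x):                    # x: (batch, 2, 8, 8)
        logits = self.body(x).flatten(1)     # (batch, 64)
        return torch.softmax(logits, dim=1)  # probability for each square
```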
RL policy network
• Refined the SL policy with policy gradients
-> Reinforcement Learning (RL) policy network (see the sketch below)
• After training, generated training data for the value network
• Played games between RL policy networks
-> 1.25 million pairs of board state and result
• Augmented 8x -> 10 million
6
[Diagram: SL policy network vs. SL policy network (opponent);
WIN -> encourage its moves, LOSE -> discourage its moves;
repeated 32 * 400 = 12,800 games]
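The slides describe the reinforcement learning step only as "encourage winning plays, discourage losing plays"; a REINFORCE-style sketch of one update in PyTorch, assuming the log-probabilities of the learner's moves are collected during a game, could look like this (the function name and signature are hypothetical, not the original implementation).

```python
import torch

def reinforce_update(optimizer, move_log_probs, reward):
    """One policy-gradient step after a finished self-play game.

    move_log_probs: list of log-probabilities (torch scalars) of the moves
                    the learner actually played during the game
    reward:         +1 if the learner won, -1 if it lost, 0 for a draw
    A win pushes the played moves' probabilities up; a loss pushes them down.
    """
    loss = -reward * torch.stack(move_log_probs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Per the slide, this update is repeated over 32 * 400 = 12,800 self-play games against an SL-policy opponent.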
Value network (regression)
• Input: 2-channel matrices of the board state
• Output: value of the board state
(Win: +1, Lose: -1, Draw: 0)
• Network: 9 convolutional layers (similar to
the SL policy network; sketch below)
7
[Figure: prediction examples]
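A minimal PyTorch sketch of a value network matching the slide's description: the same kind of convolutional body as the SL policy network, but with a regression head. The tanh output, layer widths, and MSE loss are assumptions.

```python
import torch
import torch.nn as nn

class ValueNet(nn.Module):
    """Convolutional body like the SL policy network, but a regression head:
    one scalar in [-1, 1] instead of a softmax over 64 squares.
    Targets: +1 = win, -1 = loss, 0 = draw. Widths are assumptions."""
    def __init__(self, channels=64):
        super().__init__()
        layers, in_ch = [], 2
        for _ in range(9):
            layers += [nn.Conv2d(in_ch, channels, 3, padding=1), nn.ReLU()]
            in_ch = channels
        self.body = nn.Sequential(*layers)
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(channels * 64, 1), nn.Tanh())

    def forward(self, x):               # x: (batch, 2, 8, 8)
        return self.head(self.body(x))  # (batch, 1) value of the position

# Trained by regression against the self-play outcomes, e.g.:
# loss = nn.MSELoss()(value_net(boards), outcomes)
```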
Monte Carlo tree search
• Rollout policy: a simplified SL policy network that runs faster
• MCTS: searches deeper along promising paths (see the sketch below)
1. Expand child nodes with the SL policy network
2. Evaluate the current node with the value network
and the result of a rollout-policy self-play
3. Update the values of ancestor nodes
4. Choose the most visited node
8
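A simplified, single-perspective sketch of the four MCTS steps in Python. The node statistics, the UCB-style selection rule, the 50/50 mixing of value network and rollout, and the helper signatures (sl_policy, value_net, rollout, apply_move) are all assumptions for illustration, not the original implementation.

```python
import math

class Node:
    """One board position in the search tree."""
    def __init__(self, state, prior=1.0):
        self.state = state
        self.prior = prior        # probability the SL policy assigned to this node
        self.children = {}        # move -> Node
        self.visits = 0
        self.value_sum = 0.0

def mcts(root, sl_policy, value_net, rollout, apply_move,
         n_simulations=100, c_puct=1.0, mix=0.5):
    """Sketch of the slide's four steps.

    sl_policy(state)        -> iterable of (move, prior) pairs   (hypothetical)
    value_net(state)        -> scalar in [-1, 1]                 (hypothetical)
    rollout(state)          -> result of a fast self-play game   (hypothetical)
    apply_move(state, move) -> next state                        (hypothetical)
    """
    for _ in range(n_simulations):
        node, path = root, [root]
        # Selection: descend by a UCB-style score until reaching a leaf
        while node.children:
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ch.value_sum / (ch.visits + 1)
                       + c_puct * ch.prior * math.sqrt(parent.visits + 1) / (ch.visits + 1))
            path.append(node)
        # Step 1: expand the leaf with children proposed by the SL policy network
        for move, prior in sl_policy(node.state):
            node.children[move] = Node(apply_move(node.state, move), prior)
        # Step 2: evaluate the leaf by mixing the value network and a rollout result
        leaf_value = (1 - mix) * value_net(node.state) + mix * rollout(node.state)
        # Step 3: propagate the evaluation back up to every ancestor on the path
        for ancestor in path:
            ancestor.visits += 1
            ancestor.value_sum += leaf_value
    # Step 4: play the most-visited move at the root
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

A real implementation would also flip the value's sign between plies and reuse the tree between moves; this sketch leaves those out to keep the four steps visible.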
Results
• The complete IaGo beat the plain SL policy network in
approx. 90% of games!
• Still, there is room for improvement…
• Computation takes too long
• IaGo seems to have a weak point
• Training data came from games between amateurs
• Objective/quantitative evaluation is needed
• Build a graphical user interface
-> upload it to the web!
9
Summary
• IaGo is composed of 3 parts:
• SL policy network: predicts the next move
• Value network: evaluates the board state
• MCTS: chooses a move using the two networks
• IaGo became a good player through training
10


Editor's Notes

  • #2: Thank you, Mr. Bayne. Good afternoon! Recently I learned about AlphaGo, an AI for playing the game of Go, and implemented its algorithm in an Othello version. So let me tell you how I made it and how it works.
  • #3: AlphaGo is composed of these 3 parts. First, the policy network, which predicts the next move. Second, the value network, which evaluates the board state. And third, Monte Carlo tree search, which chooses a move using the two networks. I'll now explain each of them in a little more detail.
  • #4: First of all, let me mention that Go has an extremely huge search space of 10 to the 360th power. I guess that's hard to imagine, so I'll give you one example: the estimated number of all atoms existing in the universe is 10 to the 80th power. Again, the search space of Go is 10 to the 360th power, so it's far, far bigger than the number of all atoms in the universe. Because of this huge search space, before AlphaGo it had been thought that it would take 10 more years for Go AIs to beat human professionals. Imagine what a big achievement AlphaGo made! But since I don't have enough machine resources to replicate AlphaGo, I made an Othello version. The search space of Othello is just 10 to the 60th power.
  • #5: I've now told you about the background, so I'll move on to the dataset I used for training IaGo. The data came from online Othello game records that you can get for free on the internet. They include 6 million pairs of a board state and the place of the next stone. Then I augmented them by 8 times using rotation and transposition symmetry, so finally I got 48 million pairs of board state and place of the next stone.
  • #6: The first part of IaGo is the supervised learning (SL) policy network. It takes a 2-channel matrix of the board state as input and outputs a probability distribution over the next move. The network is 9 convolutional layers with a softmax output layer. After training, it predicted human plays with an accuracy of 57%.
  • #7: Next, I refined the SL policy network with the policy gradients algorithm. The refined network is called the reinforcement learning policy network, or RL policy network for short. In the process of reinforcement learning, two SL policy networks played games against each other. The network's parameters were updated so that good moves were encouraged and bad moves were discouraged, according to the result of each game. I repeated this more than 12,000 times. After training, the RL policy network generated training data for the value network: two RL policy networks played games against each other, which gave me 1.25 million pairs of board state and result. Again I augmented them by 8 times, so finally I got 10 million pairs of board state and result.
  • #8: Next I'll talk about the value network. This network is very similar to the SL policy network in terms of structure. What's the difference? While the SL policy network does classification of the next move, the value network does regression of the game result. The value network takes a 2-channel matrix of the board state and outputs the value of that state. I defined the value of the board state as +1 for a win, -1 for a loss, and 0 for a draw, so the value means the likelihood that the white player wins. Look at the example pictures. In the left one, the white player is almost winning, so the value is 0.67, close to 1. In the center one, the white player is almost losing, so the value is nearly -1. And in the right one, you can't tell the result yet, so the value is around 0.
  • #9: Let's move on to the final part of the algorithm, Monte Carlo tree search. First I made a rollout policy. This is a simplified SL policy network: its prediction accuracy is lower than the SL policy network's, but it runs much faster. In MCTS I have to run many, many simulations, so I need a predictor that works fast. MCTS, in short, is an algorithm that searches deeper along good paths in the game tree using self-play simulation, and it is composed of four steps. Step 1: expand a child node with the SL policy network. Step 2: evaluate the current node with the value network and the result of a rollout-policy self-play. Step 3: update the ancestor nodes' values according to that evaluation. Step 4: choose the most visited node.
  • #10: I've told you about the algorithm of IaGo, so I'll now talk about its performance. IaGo played some games against the simple SL policy network and won approximately 90% of them. Still, there is room for improvement. First, it takes too long to compute; if I can make it faster, IaGo can run more simulations and will become stronger. Second, IaGo seems to have a weak point. The picture on the right side was taken when I beat the complete version of IaGo: I took all of its stones, and the game ended in the course of it. I'm not sure about the cause, but I guess one reason is that the training data came from games between amateur players, not professionals. Third, I couldn't really evaluate IaGo's performance in an objective or quantitative way, so a more appropriate evaluation is needed. And finally, I'd like to develop a sophisticated graphical user interface and upload it to the web so that everyone can play against IaGo easily, just by clicking.
  • #11: Let me summarize my presentation. I've explained IaGo's algorithm and its performance. IaGo is composed of three parts: the SL policy network, which predicts the next move; the value network, which evaluates the board state; and Monte Carlo tree search, which chooses a move using these two networks. And IaGo became a good player through training on a huge dataset. That's it for my presentation. Do you have any questions?