IaGo: an Othello AI
inspired by AlphaGo
Shion HONDA
@DSP
Overview
2
• I implemented an Othello AI (IaGo) inspired by the AlphaGo algorithm
• AlphaGo is composed of 3 parts:
• SL policy network: predicts the next move
• Value network: evaluates the board state
• MCTS: chooses a move using the two networks
Background
Game    | Search space | AI               | Year
Othello | 10^60        | NEC Logistello   | 1997
Go      | 10^360       | DeepMind AlphaGo | 2016
3
• Go has an extremely huge search space: 10^360
• cf. the estimated number of all atoms in the universe: 10^80
• Before AlphaGo, it was thought that it would take 10 more years for
Go AIs to beat human professionals, due to this huge search space
• Since I don't have enough machine resources to replicate AlphaGo,
I made an Othello version
Dataset
4
[Figure: a board state paired with the place of the next stone; 6 million -> 48 million pairs]
• Data came from online Othello game records
• 6 million pairs of board state & the place of the next stone
• Augmented 8x using rotation & transposition symmetry
(see the sketch below)
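The slides don't include the augmentation code; the following is a minimal NumPy sketch of the 8-fold symmetry expansion, assuming each board state and move target is stored as an 8x8 array (the function name and array layout are my own choices, not from the original).

```python
import numpy as np

def augment_8fold(board, move):
    """Expand one (board, move) pair into its 8 symmetric variants.

    board: (8, 8) array of the current position
    move:  (8, 8) one-hot array marking the square of the next stone
    """
    pairs = []
    b, m = board, move
    for _ in range(4):
        pairs.append((b, m))
        pairs.append((b.T, m.T))          # transposition
        b, m = np.rot90(b), np.rot90(m)   # 90-degree rotation
    return pairs
```

The four rotations combined with transposition give the 8 symmetries of the square board, which is how 6 million pairs become 48 million.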
SL policy network (classification)
• Input: 2-channel matrices of the board state
• Output: probability distribution over the next move
• Network: 9 convolutional layers with a softmax output layer
(sketch below)
• 57% accuracy in predicting human moves
5
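The slide only states "2-channel input, 9 convolutional layers, softmax output"; a minimal PyTorch sketch consistent with that description might look as follows. The framework, channel widths, and the 1x1 final convolution are assumptions.

```python
import torch
import torch.nn as nn

class SLPolicy(nn.Module):
    """2-channel 8x8 board in, probability distribution over the 64 squares out.
    Only "9 conv layers + softmax" comes from the slide; widths are guesses."""
    def __init__(self, channels=64):
        super().__init__()
        layers, in_ch = [], 2          # planes: own stones / opponent stones
        for _ in range(8):
            layers += [nn.Conv2d(in_ch, channels, 3, padding=1), nn.ReLU()]
            in_ch = channels
        layers += [nn.Conv2d(in_ch, 1, 1)]   # 9th conv: 1x1 down to one plane
        self.body = nn.Sequential(*layers)

    def forward(self, x):                    # x: (batch, 2, 8, 8)
        logits = self.body(x).flatten(1)     # (batch, 64)
        return torch.softmax(logits, dim=1)  # probability for each square
```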
RL policy network
• Refined the SL policy with policy gradients
-> Reinforcement Learning (RL) policy network (see the sketch below)
• After training, generated training data for the value network
• Played games between RL policy networks
-> 1.25 million pairs of board state and result
• Augmented 8x -> 10 million
6
[Diagram: SL policy network vs. SL policy network (opponent);
WIN -> encourage its moves, LOSE -> discourage its moves;
repeated 32 * 400 = 12,800 games]
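The slides describe the reinforcement learning step only as "encourage winning plays, discourage losing plays"; a REINFORCE-style sketch of one update in PyTorch, assuming the log-probabilities of the learner's moves are collected during a game, could look like this (the function name and signature are hypothetical, not the original implementation).

```python
import torch

def reinforce_update(optimizer, move_log_probs, reward):
    """One policy-gradient step after a finished self-play game.

    move_log_probs: list of log-probabilities (torch scalars) of the moves
                    the learner actually played during the game
    reward:         +1 if the learner won, -1 if it lost, 0 for a draw
    A win pushes the played moves' probabilities up; a loss pushes them down.
    """
    loss = -reward * torch.stack(move_log_probs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Per the slide, this update is repeated over 32 * 400 = 12,800 self-play games against an SL-policy opponent.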
Value network (regression)
• Input: 2-channel matrices of the board state
• Output: value of the board state
(Win: +1, Lose: -1, Draw: 0)
• Network: 9 convolutional layers (similar to
the SL policy network; sketch below)
7
[Figure: prediction examples]
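A minimal PyTorch sketch of a value network matching the slide's description: the same kind of convolutional body as the SL policy network, but with a regression head. The tanh output, layer widths, and MSE loss are assumptions.

```python
import torch
import torch.nn as nn

class ValueNet(nn.Module):
    """Convolutional body like the SL policy network, but a regression head:
    one scalar in [-1, 1] instead of a softmax over 64 squares.
    Targets: +1 = win, -1 = loss, 0 = draw. Widths are assumptions."""
    def __init__(self, channels=64):
        super().__init__()
        layers, in_ch = [], 2
        for _ in range(9):
            layers += [nn.Conv2d(in_ch, channels, 3, padding=1), nn.ReLU()]
            in_ch = channels
        self.body = nn.Sequential(*layers)
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(channels * 64, 1), nn.Tanh())

    def forward(self, x):               # x: (batch, 2, 8, 8)
        return self.head(self.body(x))  # (batch, 1) value of the position

# Trained by regression against the self-play outcomes, e.g.:
# loss = nn.MSELoss()(value_net(boards), outcomes)
```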
Monte Carlo tree search
• Rollout policy: a simplified SL policy network that runs faster
• MCTS: searches deeper along promising paths (see the sketch below)
1. Expand child nodes with the SL policy network
2. Evaluate the current node with the value network
and the result of a rollout-policy self-play
3. Update the values of ancestor nodes
4. Choose the most visited node
8
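A simplified, single-perspective sketch of the four MCTS steps in Python. The node statistics, the UCB-style selection rule, the 50/50 mixing of value network and rollout, and the helper signatures (sl_policy, value_net, rollout, apply_move) are all assumptions for illustration, not the original implementation.

```python
import math

class Node:
    """One board position in the search tree."""
    def __init__(self, state, prior=1.0):
        self.state = state
        self.prior = prior        # probability the SL policy assigned to this node
        self.children = {}        # move -> Node
        self.visits = 0
        self.value_sum = 0.0

def mcts(root, sl_policy, value_net, rollout, apply_move,
         n_simulations=100, c_puct=1.0, mix=0.5):
    """Sketch of the slide's four steps.

    sl_policy(state)        -> iterable of (move, prior) pairs   (hypothetical)
    value_net(state)        -> scalar in [-1, 1]                 (hypothetical)
    rollout(state)          -> result of a fast self-play game   (hypothetical)
    apply_move(state, move) -> next state                        (hypothetical)
    """
    for _ in range(n_simulations):
        node, path = root, [root]
        # Selection: descend by a UCB-style score until reaching a leaf
        while node.children:
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ch.value_sum / (ch.visits + 1)
                       + c_puct * ch.prior * math.sqrt(parent.visits + 1) / (ch.visits + 1))
            path.append(node)
        # Step 1: expand the leaf with children proposed by the SL policy network
        for move, prior in sl_policy(node.state):
            node.children[move] = Node(apply_move(node.state, move), prior)
        # Step 2: evaluate the leaf by mixing the value network and a rollout result
        leaf_value = (1 - mix) * value_net(node.state) + mix * rollout(node.state)
        # Step 3: propagate the evaluation back up to every ancestor on the path
        for ancestor in path:
            ancestor.visits += 1
            ancestor.value_sum += leaf_value
    # Step 4: play the most-visited move at the root
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

A real implementation would also flip the value's sign between plies and reuse the tree between moves; this sketch leaves those out to keep the four steps visible.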
Results
• The complete IaGo beat the plain SL policy network in
approx. 90% of games!
• Still, there is room for improvement…
• Computation takes too long
• IaGo seems to have a weak point
• Training data came from games between amateurs
• Objective/quantitative evaluation is needed
• Build a graphical user interface
-> upload it to the web!
9
Summary
• IaGo is composed of 3 parts:
• SL policy network: predicts the next move
• Value network: evaluates the board state
• MCTS: chooses a move using the two networks
• IaGo became a good player through training
10


Editor's Notes

  • #2: Thank you, Mr. Bayne. Good afternoon! Recently I learned about AlphaGo, an AI for playing the game of Go, and implemented its algorithm in an Othello version. So let me tell you how I made it and how it works.
  • #3: AlphaGo is composed of these 3 parts. First, the policy network, which predicts the next move. Second, the value network, which evaluates the board state. And third, Monte Carlo tree search, which chooses a move using the two networks. I'll now explain each of them in a little more detail.
  • #4: First of all, let me mention that Go has an extremely huge search space of 10 to the 360th power. I guess that's hard to imagine, so I'll give you one example: the estimated number of all atoms existing in the universe is 10 to the 80th power. Again, the search space of Go is 10 to the 360th power, so it's far, far bigger than the number of all atoms in the universe. Because of this huge search space, before AlphaGo it had been thought that it would take 10 more years for Go AIs to beat human professionals. Imagine what a big achievement AlphaGo made! But since I don't have enough machine resources to replicate AlphaGo, I made an Othello version. The search space of Othello is just 10 to the 60th power.
  • #5: I've now told you about the background, so I'll move on to the dataset I used for training IaGo. The data came from online Othello game records that you can get for free on the internet. They include 6 million pairs of a board state and the place of the next stone. Then I augmented them by 8 times using rotation and transposition symmetry, so finally I got 48 million pairs of board state and place of the next stone.
  • #6: The first part of IaGo is the supervised learning (SL) policy network. It takes a 2-channel matrix of the board state as input and outputs a probability distribution over the next move. The network is 9 convolutional layers with a softmax output layer. After training, it predicted human plays with an accuracy of 57%.
  • #7: Next, I refined the SL policy network with the policy gradients algorithm. The refined network is called the reinforcement learning policy network, or RL policy network for short. In the process of reinforcement learning, two SL policy networks played games against each other. The network's parameters were updated so that good moves were encouraged and bad moves were discouraged, according to the result of each game. I repeated this more than 12,000 times. After training, the RL policy network generated training data for the value network: two RL policy networks played games against each other, which gave me 1.25 million pairs of board state and result. Again I augmented them by 8 times, so finally I got 10 million pairs of board state and result.
  • #8: Next I'll talk about the value network. This network is very similar to the SL policy network in terms of structure. What's the difference? While the SL policy network does classification of the next move, the value network does regression of the game result. The value network takes a 2-channel matrix of the board state and outputs the value of that state. I defined the value of the board state as +1 for a win, -1 for a loss, and 0 for a draw, so the value means the likelihood that the white player wins. Look at the example pictures. In the left one, the white player is almost winning, so the value is 0.67, close to 1. In the center one, the white player is almost losing, so the value is nearly -1. And in the right one, you can't tell the result yet, so the value is around 0.
  • #9: Let's move on to the final part of the algorithm, Monte Carlo tree search. First I made a rollout policy. This is a simplified SL policy network: its prediction accuracy is lower than the SL policy network's, but it runs much faster. In MCTS I have to run many, many simulations, so I need a predictor that works fast. MCTS, in short, is an algorithm that searches deeper along good paths in the game tree using self-play simulation, and it is composed of four steps. Step 1: expand a child node with the SL policy network. Step 2: evaluate the current node with the value network and the result of a rollout-policy self-play. Step 3: update the ancestor nodes' values according to that evaluation. Step 4: choose the most visited node.
  • #10: I've told you about the algorithm of IaGo, so I'll now talk about its performance. IaGo played some games against the simple SL policy network and won approximately 90% of them. Still, there is room for improvement. First, it takes too long to compute; if I can make it faster, IaGo can run more simulations and will become stronger. Second, IaGo seems to have a weak point. The picture on the right side was taken when I beat the complete version of IaGo: I took all of its stones, and the game ended in the course of it. I'm not sure about the cause, but I guess one reason is that the training data came from games between amateur players, not professionals. Third, I couldn't really evaluate IaGo's performance in an objective or quantitative way, so a more appropriate evaluation is needed. And finally, I'd like to develop a sophisticated graphical user interface and upload it to the web so that everyone can play against IaGo easily, just by clicking.
  • #11: Let me summarize my presentation. I've explained IaGo's algorithm and its performance. IaGo is composed of three parts: the SL policy network, which predicts the next move; the value network, which evaluates the board state; and Monte Carlo tree search, which chooses a move using these two networks. And IaGo became a good player through training on a huge dataset. That's it for my presentation. Do you have any questions?