Study on Evaluation Function Design of Mahjong
using Supervised Learning
Hokkaido University
Graduate School of Information Science and Technology
Harmonious Systems Engineering Laboratory
Yeqin Zheng
1
Background
• Perfect information games
– 1997 -- Deep Blue vs. world champion on chess
– 2007 -- Quackle vs. world champion on Scrabble
– 2016 -- AlphaGo vs. world champion on Go
• Monte Carlo tree search theory
• Deep learning methods for pre-training networks
– AlphaGo Zero vs. AlphaGo on Go
• Deep learning method
• Reinforcement learning
• Imperfect information games
– Uncertainty
– Randomness
– Complex rules
– Difficult to simulate
2
Previous Research's Model
• Naoki Mizukami and Yoshimasa Tsuruoka: Building a Computer Mahjong Player Based on Monte Carlo Simulation and Opponent Models. Proceedings of the 2015 IEEE Conference on Computational Intelligence and Games (CIG 2015), pp. 275-283, Aug. 2015.
• Monte Carlo tree search to simulate opponents' movements
• Prediction of game states.
3
Purpose
• This study applies supervised learning theory and deep
learning methods to an imperfect information game: Mahjong.
• Improvements:
– New feature engineering
• Improves the training results of the networks
– Discard method
• Improves aggressiveness during games
4
Introduction of Mahjong Rule
• Mahjong tiles come in 4 types and 34 different kinds;
each kind has 4 copies, for 136 tiles in total.
5
Board annotations:
– Hand: the tiles a player holds
– River: a player's discarded tiles
– Dora tile: the indicator of bonus tiles
– Mountain: invisible tiles that players draw (steal) from
– Meld: open sets in a hand
Goal of Mahjong
• The goal of Mahjong is to arrange a winning hand into a special
format.
• There are two ways to earn points:
6
Tsumo: draw the winning tile from the mountain and earn
points from all other players.
Ron: take another player's last discarded tile
and earn points from that player.
Difficulty & Approach
• Difficulty
– An imperfect information game has far more states than a
perfect information game.
• It is almost impossible to encounter the same game state twice
across games.
– Randomness and uncertainty pervade the entire game
process.
• Approach
– Divide the moves during games into several types.
– Use multiple networks and methods to make different
moves in different states.
7
Introduction of Tenhou.net
Tenhou is one of the most popular online Mahjong
services in Japan.
• 4,870,311 users in total.
• About 5,000 players online at the same time.
• Our training data all come
from the "houou" table.
8
Diagram: Tenhou.net sends game states to the model; the model returns decisions.
Introduction of Tenhou.net's API 9
Data from Tenhou.net:

| Message | Meaning | Example |
|---|---|---|
| T/U/V/W (+ ID) | T/U/V/W: player's position; ID: tile ID from 0 to 135 | T123 #Dealer draws a North; V #Player in position west draws a hidden tile |
| D/E/F/G + ID | D/E/F/G: player's position; ID: tile ID from 0 to 135 | E123 #Player in position south discards a North |
| Reach who="position" | A player calls riichi | Reach who="2" #Player in position west calls riichi |
| N who="position" m="meld" | A player calls a meld | N who="3" m="34567" #Player in position north calls a meld |
| Agari | A player calls a win; includes the hand, point changes, waiting tile, yaku, and who loses points | |
| Ryuukyoku | The round ends without a winner; includes the point changes | |

Data to Tenhou.net:

| Message | Meaning | Example |
|---|---|---|
| T + ID | Discard a tile with the given tile ID | T123 #You discard a North |
| Reach who="0" | Call riichi | |
| N who="0" m="meld" | Call a meld | N who="0" m="34567" |
| Agari | Call a win | |
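As a concrete illustration, the sketch below (Python) parses such message strings into event records. It handles only the simplified textual forms shown in the table; real Tenhou logs are XML tags, so the exact attribute syntax accepted here is an assumption.

```python
import re

# Event prefixes from the table: draws and discards use one letter per
# player position (0 = dealer/east, 1 = south, 2 = west, 3 = north).
DRAW = "TUVW"
DISCARD = "DEFG"

def parse_message(msg: str) -> dict:
    """Classify one Tenhou-style message string into an event dict."""
    m = re.fullmatch(r"([TUVW])(\d*)", msg)
    if m:                                    # draw; opponents' tile IDs are hidden
        tile = int(m.group(2)) if m.group(2) else None
        return {"event": "draw", "player": DRAW.index(m.group(1)), "tile": tile}
    m = re.fullmatch(r"([DEFG])(\d+)", msg)
    if m:                                    # discard with tile ID 0..135
        return {"event": "discard", "player": DISCARD.index(m.group(1)),
                "tile": int(m.group(2))}
    m = re.fullmatch(r'Reach who="(\d)"', msg)
    if m:
        return {"event": "riichi", "player": int(m.group(1))}
    m = re.fullmatch(r'N who="(\d)" m="(\d+)"', msg)
    if m:
        return {"event": "meld", "player": int(m.group(1)), "meld": m.group(2)}
    if msg.startswith("Agari"):
        return {"event": "win"}
    if msg.startswith("Ryuukyoku"):
        return {"event": "exhaustive_draw"}
    return {"event": "unknown", "raw": msg}

print(parse_message("E123"))  # {'event': 'discard', 'player': 1, 'tile': 123}
```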
Process of Decision Making 10
1. Player draws (steals) a tile.
2. System: win check. If yes, the player calls win (Tsumo).
3. Player decides a tile to discard.
4. System: riichi check. If yes, the player calls riichi and discards.
5. Player discards.
6. System: win check for the opponents. If yes, an opponent calls win (Ron).
7. Next player's turn.
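A self-contained sketch of this loop in Python. The check_win, check_riichi, and choose_discard helpers are trivial stand-ins (hypothetical, not the study's actual logic) so the control flow runs end to end.

```python
import random

def check_win(hand):          # stand-in for the system's win check
    return False

def check_riichi(hand):       # stand-in for the system's riichi check
    return False

def choose_discard(hand):     # stand-in for the discard decision
    return random.choice(hand)

def play_turn(hand, opponent_hands, wall):
    hand.append(wall.pop())               # player draws (steals) a tile
    if check_win(hand):
        return "tsumo"                    # win check passed: call win
    tile = choose_discard(hand)           # decide a tile to discard
    riichi = check_riichi(hand)           # riichi check before discarding
    hand.remove(tile)
    for opp in opponent_hands:            # opponents' win check on the discard
        if check_win(opp + [tile]):
            return "ron"
    return "riichi" if riichi else "continue"   # then the next player's turn

wall = list(range(136)); random.shuffle(wall)
print(play_turn(wall[:13], [wall[13:26]], wall[26:]))
```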
Introduction of Related Terminology
• Waiting/Tenpai: one or more players have nearly completed
winning hands and are waiting for the last tile to earn points.
• N shanten: after the player draws n effective tiles, the hand
becomes a winning hand and enters the waiting state.
11
Aggressive Move
• Two types of game states
– No one is in waiting (attack route): discard a tile that brings the hand closer to a
winning hand and a score, which may decrease the shanten number.
– Someone may be in waiting (defense route):
• Aggressive move: the player discards a tile that may decrease the shanten number but is
unsafe in the current game state; it may also cost points, because other players may
already be waiting for exactly this tile.
• Safe move: the player discards a tile with little danger of losing points and gives up on
winning; this may move the hand away from a winning hand and increase the shanten number.
12
In this case, player D has called a riichi
(someone is in waiting):
- Aggressive move: discard a tile that turns
the hand into the waiting state, which may
lead to losing points.
- Safe move: discard a tile that player D has
already discarded, which will increase the
shanten number.
Without aggressive moves:
- The player always folds.
- It is difficult to complete a winning hand.
Model Details -- Networks 13
Input: the 6*6*107(108) feature map.

Attack route (WR ≤ threshold):
– Discard network (DR): probabilities of the 34 tiles being discarded → choose a tile to discard.

Defense/fold route (WR > threshold):
– Waiting-or-not network (WR): opponents' waiting probability.
– Waiting-tile network (WTR): probabilities of the 34 tiles being waited on.
– Discard network (DR): probabilities of the 34 tiles being discarded.
– Lose-point network (LP): probable point loss for each tile in hand.
→ combine these outputs to choose a tile to discard.
Model Details -- Networks
• No one is in waiting
– Take the maximum of the output from the discard network.
• Someone may be in waiting
– Take the minimum of the lose-point expectation (LPE):
LPE_i = WR × DR_i × WTR_i × LP_i,
where i indexes the tiles in hand. The DR_i factor is the output of
the discard network, multiplied in to increase aggressive moves.
• The threshold for switching modes
– Collect game states in which some player is in waiting.
– Use the waiting-or-not network to calculate the
probability for these game states.
– Take the average of the outputs, which is 0.245.
A sketch of this selection rule follows.
14
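A minimal NumPy sketch of the rule above, assuming the four network outputs are already computed. Here wr is the scalar waiting-or-not output; dr, wtr, and lp are arrays aligned with the tiles in hand, and lp is treated as a single expected-loss value per tile (the slides derive it from 6 han-class probabilities), which is an assumption.

```python
import numpy as np

THRESHOLD = 0.245  # average WR output over in-waiting states (from the slides)

def choose_discard(hand_tiles, wr, dr, wtr, lp):
    """hand_tiles: tile IDs in hand; dr/wtr/lp are indexed the same way."""
    if wr <= THRESHOLD:
        # Attack route: take the maximum of the discard network's output.
        return hand_tiles[int(np.argmax(dr))]
    # Defense route: take the minimum of LPE_i = WR * DR_i * WTR_i * LP_i,
    # where the DR_i factor is the aggressive-move term from the slides.
    lpe = wr * dr * wtr * lp
    return hand_tiles[int(np.argmin(lpe))]

# Hypothetical usage with random placeholder network outputs:
hand = [0, 4, 8, 35, 70]
rng = np.random.default_rng(0)
print(choose_discard(hand, 0.6, rng.random(5), rng.random(5), rng.random(5)))
```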
Model Details -- Feature Engineering
• A matrix whose adjacent nodes are strongly connected performs
better as input to a convolutional neural network (CNN).
• Each non-repeating tile kind is modeled in a vector space.
• The vector space is turned into a 6 * 6 matrix base.
15
Features in Feature Map
Feature map:
– hands: 4 layers
– river: 4 layers
– turn's movements: 24 * 4 layers
– dora tiles: 1 layer
– invisible tiles: 1 layer
– closed hand: 1 layer
– (discard tile: 1 layer)
16
The 107-layer feature map (4 + 4 + 96 + 1 + 1 + 1) does not include
the discard-tile feature; with it, the map has 108 layers.
A sketch of this layout follows.
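A minimal Python sketch of this encoding. The slides do not specify how the 34 tile kinds are placed on the 6 * 6 grid or how the 4 hand layers encode tile counts, so the row-major placement and the count-thermometer encoding below are assumptions.

```python
import numpy as np

def tile_to_cell(kind: int):
    """Place tile kind 0..33 on the 6*6 grid (row-major; assumed layout)."""
    return divmod(kind, 6)

def encode_hand(hand_kinds):
    """hands, 4 layers: layer k is 1 where the hand holds > k copies (assumed)."""
    layers = np.zeros((6, 6, 4), dtype=np.float32)
    for kind in set(hand_kinds):
        r, c = tile_to_cell(kind)
        layers[r, c, :min(hand_kinds.count(kind), 4)] = 1.0
    return layers

def feature_map(hand_kinds, other_groups):
    """Stack the hand layers with the remaining groups (river, movements,
    dora, invisible tiles, closed hand) into the 6*6*107 map."""
    return np.concatenate([encode_hand(hand_kinds)] + other_groups, axis=-1)

# Example: 4 hand layers plus a placeholder 103-layer block gives 107 layers.
hand = [0, 0, 3, 9, 9, 9, 17]
rest = [np.zeros((6, 6, 103), dtype=np.float32)]
print(feature_map(hand, rest).shape)   # (6, 6, 107)
```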
Networks Details 17
| Network | Content | Output | Data amount |
|---|---|---|---|
| WR (waiting-or-not network) | predicts the probability that other players are waiting | a probability, from 0 to 1, that some other player is in waiting | 300,000 |
| WTR (waiting-tiles network) | predicts the probabilities of tiles that others may wait for | a list of 34 probabilities describing how dangerous each of the 34 tiles is | 4,000*34 |
| DR (discard network) | predicts which tile in hand a high-level Mahjong player would discard | a list of 34 probabilities of each tile being discarded | 100,000*34 |

Training data:
– Waiting-or-not network: input is the 107-layer feature map; output is 1 if someone is in waiting, 0 if no one is.
– Waiting-tiles network: input is the 107-layer feature map; output is 1 for tiles being waited on, 0 for other tiles.

Example: a hand in waiting, waiting for 1s and 4s.
Networks Details 18
| Network | Content | Output | Data amount |
|---|---|---|---|
| LP (lose-point network) | predicts how many points will be lost if a given tile is discarded | a list of 6 probabilities over how many han the opponent's hand scores if he wins the round | 16,500*6 |

Training data:
– Lose-point network: input is the 108-layer feature map; output is the point loss for the discarded tile.
Networks Details 19
Input layer: the 6*6*107(108) feature map.

Hidden layers (7 convolutional layers in total):

| Number of convolutional kernels | Kernel size | Padding | Activation function |
|---|---|---|---|
| 512 | 4*4 | same | relu |
| 512 | 3*3 | same | none |
| 512 | 2*2 | same | relu |
| Dropout | | | |
| 256 | 2*2 | same | none |
| 256 | 3*3 | same | relu |
| Dropout | | | |
| 128 | 3*3 | same | none |
| 128 | 2*2 | same | relu |
| Dropout | | | |

Output layer: fully connected.
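A minimal Keras sketch of this architecture, following the layer order in the table. The dropout rate and the output head (34 outputs for the discard and waiting-tiles networks, 1 for WR, 6 for LP) are not fixed by this diagram, so the defaults below are assumptions.

```python
from tensorflow.keras import layers, models

def build_network(channels=107, num_outputs=34, dropout_rate=0.5):
    """7-conv-layer CNN per the table above; dropout_rate and the
    softmax output head are assumptions, not given in the slides."""
    return models.Sequential([
        layers.Input(shape=(6, 6, channels)),              # 6*6*107(108) feature map
        layers.Conv2D(512, 4, padding="same", activation="relu"),
        layers.Conv2D(512, 3, padding="same", activation=None),
        layers.Conv2D(512, 2, padding="same", activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Conv2D(256, 2, padding="same", activation=None),
        layers.Conv2D(256, 3, padding="same", activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Conv2D(128, 3, padding="same", activation=None),
        layers.Conv2D(128, 2, padding="same", activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Flatten(),
        layers.Dense(num_outputs, activation="softmax"),   # fully connected output
    ])

model = build_network()   # e.g. the discard network (34 outputs)
model.summary()
```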
Final Accuracy of Each Network 20
| Network | Accuracy |
|---|---|
| Waiting-or-not network | 82.7% |
| Waiting-tiles network | 40.2% |
| Lose-point network | 88.7% |
| Discard network | 88.4% |

The waiting-tiles network's accuracy is only 40.2% because a
prediction counts as correct only when the tile with the maximum
output is actually being waited on (see the sketch below).
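For clarity, a short sketch of that evaluation rule, assuming 34-way network outputs and 0/1 waited-tile labels:

```python
import numpy as np

def waiting_tiles_accuracy(outputs, labels):
    """Count a prediction as correct only if the argmax over the 34
    outputs (outputs: (N, 34)) is a waited tile (labels: (N, 34) 0/1)."""
    top = np.argmax(outputs, axis=1)            # highest-scoring tile per state
    return labels[np.arange(len(labels)), top].mean()
```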
Experiment and Result
• Comparison of three models in our experiment
21
| Model | Game state that triggers defense | Attack route | Defense route |
|---|---|---|---|
| Best choice algorithm (BCA) | an opponent calls riichi or opens a hand with three or more melds | choose the tile that brings the hand closest to a winning hand | choose a tile that the in-waiting player has already discarded |
| BCA's attack mode + deep model for defense | the deep model predicts that someone may be in waiting | choose the tile that brings the hand closest to a winning hand | choose the tile that leads to the least loss |
| Deep model | the deep model predicts that someone may be in waiting | imitate expert players' discards based on the current game state | choose the tile that leads to the least loss |
Experiment and Result
• We performed 60 games for each model on the "Ippan" table,
which any player can join.
22
Ippan table (avg. lv. 1.5):

| Model | Top | 2nd | 3rd | 4th | Win rate | Feed rate | Aggressive move |
|---|---|---|---|---|---|---|---|
| BCA | 27% | 30% | 25% | 18% | 24% | 11% | 14% |
| BCA + defense model | 17% | 28% | 45% | 10% | 18% | 8% | 0% |
| Deep model | 22% | 27% | 33% | 18% | 20% | 9% | 8% |
| Players' average (Tenhou) | 20% | 23% | 27% | 30% | 20% | 19% | - |

Green: worst performance; Red: best performance.
Experiment and Result
• We performed 100 games for each model on the "Joukyuu"
table.
23
Joukyuu table (avg. lv. 11.75):

| Model | Top | 2nd | 3rd | 4th | Win rate | Feed rate | Aggressive move |
|---|---|---|---|---|---|---|---|
| BCA | 19% | 23% | 30% | 28% | 16% | 18% | 12% |
| BCA with deep model | 22% | 28% | 33% | 17% | 17% | 8% | 1% |
| Deep model | 24% | 29% | 27% | 20% | 21% | 11% | 7% |
| Players' average (Tenhou) | 25% | 25% | 25% | 25% | 23% | 15% | 17% |

Green: worst performance; Red: best performance.
Competition Between Each Model 24
We performed 20 games for each 1-vs-3 pairing: three copies of one model (rows) against one copy of another (columns). Each cell gives the single model's finishing-position counts (1st/2nd/3rd/4th) over those 20 games.

| 1st/2nd/3rd/4th | 1 BCA | 1 BCA + defense model | 1 Deep model |
|---|---|---|---|
| 3 BCA | - | 2/6/9/3 | 3/7/6/4 |
| 3 BCA + defense model | 6/4/5/5 | - | 5/6/5/4 |
| 3 Deep model | 4/5/5/6 | 4/7/7/2 | - |

The result table shows that:
• BCA
– good at attack
– easy to defend against
• BCA + defense model
– great at defense
– fewer aggressive moves
• Deep model
– good at defense
– balanced between defense and offense
Comparison Between Discard Method
• The two discard methods show different performance during the experiment,
so we compare them directly.
• It is easier to speculate about the non-deep-learning AI's state and which
tiles it is waiting for.
• The deep model plays attack more like a human player than the non-deep-
learning AI does, as the top rate and win rate show.
25

| Discard method | Times in waiting | Waiting rate | Waiting prediction | Waiting-tiles prediction |
|---|---|---|---|---|
| BCA | 438 | 53.94% | 91.32% | 57.53% |
| Discard model | 411 | 49.58% | 83.43% | 39.90% |
Conclusion
• The deep model in this study shows good performance
in Mahjong games.
– High 2nd-place rate.
– Makes aggressive moves.
• The new feature engineering performs well.
• Performance when the model predicts that someone is in
waiting is better than the human players' average.
• It is possible to build a better multi-network model based
on this experiment.
Thank you for listening.
26
Research performance
・Information Processing Society of Japan
1) Yeqin Zheng, Soichiro Yokoyama, Tomohisa Yamashita,
Hidenori Kawamura: Study on Evaluation Function Design of Mahjong
using Supervised Learning, IPSJ Special Interest Group (SIG) Technical Report,
Vol. 194, Hokkaido (2019).
27