Parametric Action Pre-Selection for MCTS in
Real-Time Strategy Games
Abdessamed Ouessai, Mohammed Salem, and Antonio M. Mora
University of Mascara,
Algeria
University of Granada,
Spain
VI CoSECiVi-2020
Overview
→ Introduction
→ RTS Games & AI
→ Monte Carlo Tree Search
→ Parametric Action Pre-Selection
→ Experiments & Results
→ Conclusion & Future Work
Introduction
→ First game AI research domain: Classic board games
→ Evolution of board games is constrained by physics
→ Video games represent an unconstrained medium
→ Real-Time Strategy sub-genre concretized abstract board games (Warfare)
→ RTS Games are an evolution of abstract board games
→ ++ Concrete | ++ Challenging for humans | ++ Complex for AI
RTS Games & AI
→ Multiplayer, zero-sum, non-deterministic game with imperfect information.
→ Top-down perspective. Recognizable mouse and keyboard-based UI.
→ General strategy: Gather → Build & Train → Confront (involving resources, structures, and units)
→ Victory condition: Destruction of opponent's forces
RTS Games & AI
→ What does an RTS game-playing AI have to deal with?
Real-Time Aspect
→ Short decision cycles (~50/s)
→ Simultaneous moves for different units
→ Durative actions (> one decision cycle)
Uncertainty
→ Non-determinism
→ Partial observability (opponent & environment)
Complexity
→ Exponential growth of the decision/state spaces
→ Large topographic environments
Approximate estimates:
                 | Chess     | Go         | StarCraft
Branching Factor | 36        | 180        | $10^{50}$
State Space      | $10^{47}$ | $10^{171}$ | $10^{1685}$
RTS Games & AI
→ Notable developments:
→ Scripts: Portfolio Greedy Search (Churchill et al., 2013), Puppet Search (Barriga et al., 2015)
→ Learning: Bayesian Models (Synnaeve et al., 2011), AlphaStar (Vinyals et al., 2019)
→ Planning: NaïveMCTS (Ontañón, 2013), AHTN (Ontañón and Buro, 2015), CCG (Kantharaju et al., 2018)
→ Evaluation: CNN (Stanescu et al., 2016), (Barriga et al., 2019)
→ Competitions:
→ IEEE CoG (StarCraft & µRTS), AAAI AIIDE (StarCraft), SSCAIT
→ RTS AI Testbeds:
→ ORTS, Wargus, BWAPI (StarCraft), SparCraft, SC2LE, ELF, DeepRTS, µRTS
Monte Carlo Tree Search
→ An iterative, anytime, sampling-based search framework
→ Main components:
→ Tree Policy
→ Default Policy
→ Popular variant:
→ UCT (UCB1 as Tree Policy)
→ Popular application:
→ Go (AlphaGo)
→ Downside:
→ Scalability issues
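[Figure: the four MCTS steps: (1) Selection, (2) Expansion, (3) Simulation, (4) Backpropagation. The Tree Policy drives selection and expansion; the Default Policy drives the simulation, whose reward is backpropagated.]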
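As an illustration of the UCT variant mentioned on this slide, the tree policy can be sketched with the textbook UCB1 rule. This is a minimal sketch, not code from the presentation: the function name `ucb1_select`, the `(visits, total_reward)` representation, and the default exploration constant are assumptions made for the example.

```python
import math


def ucb1_select(children, c=math.sqrt(2)):
    """Return the index of the child with the highest UCB1 score.

    children: list of (visits, total_reward) pairs for one tree node
              (hypothetical representation chosen for this sketch).
    c: exploration constant; sqrt(2) is a common textbook default.
    """
    total_visits = sum(visits for visits, _ in children)
    best_index, best_score = None, -math.inf
    for i, (visits, total_reward) in enumerate(children):
        if visits == 0:
            # Unvisited children are tried first.
            return i
        mean = total_reward / visits
        bonus = c * math.sqrt(math.log(total_visits) / visits)
        if mean + bonus > best_score:
            best_index, best_score = i, mean + bonus
    return best_index


# Equal visit counts: the higher mean reward wins.
print(ucb1_select([(10, 6.0), (10, 4.0)]))
# A rarely visited child gets a large exploration bonus.
print(ucb1_select([(100, 60.0), (1, 0.0)]))
```

The exploration bonus shrinks as a child accumulates visits, so the rule gradually shifts from exploring to exploiting.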
Monte Carlo Tree Search
→ Proposed solutions to enhance MCTS scalability: CMAB and Abstraction
CMAB
→ Selection phase framed as a Combinatorial Multi-Armed Bandit problem
→ NaïveMCTS is based on a CMAB formulation and a naïve assumption
→ Each unit $u_i$ ($i = 1, \dots, n$) contributes a unit-action $a_i$ with value $v_i$; the player action is $\alpha_t = (a_1, \dots, a_n)$
→ The naïve assumption: $\sum_{i=1}^{n} v_i = V(\alpha_t)$
→ Successfully adapts MCTS to combinatorial decision spaces (e.g. RTS games)
→ Downside: The algorithm is still affected by the dimensionality of the decision space.
Abstraction
→ Search the decision space induced by expert-authored scripts instead of the original decision space
→ Downsides: (1) Sacrifices tactical performance. (2) Performance depends on scripts
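To make the naïve assumption concrete, the sketch below treats the value of a player action as the sum of per-unit values, so each unit's best action can be chosen independently. It only illustrates the assumption; it is not NaïveMCTS, which additionally samples from local and global bandits. The value tables and function names are made up for the example.

```python
from math import prod


def greedy_player_action(local_values):
    """Pick, for every unit, the index of its highest-valued unit-action."""
    return tuple(max(range(len(v)), key=v.__getitem__) for v in local_values)


def naive_value(local_values, player_action):
    """Value of a player action under the naive (additive) assumption."""
    return sum(v[a] for v, a in zip(local_values, player_action))


# Hypothetical value estimates: one list per unit, one entry per unit-action.
local_values = [
    [0.1, 0.7, 0.2],       # unit 1 has 3 unit-actions
    [0.5, 0.4],            # unit 2 has 2 unit-actions
    [0.0, 0.3, 0.9, 0.1],  # unit 3 has 4 unit-actions
]

alpha = greedy_player_action(local_values)
combos = prod(len(v) for v in local_values)    # size of the joint decision space
estimates = sum(len(v) for v in local_values)  # values tracked under the assumption

print(alpha, naive_value(local_values, alpha), combos, estimates)
```

The joint space grows with the product of the per-unit action counts, while the additive view only tracks their sum. This is why the formulation helps, and also why it remains sensitive to the number of units and actions.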
Monte Carlo Tree Search
→ Our proposition:
→ A multi-stage parametric action pre-selection scheme to control the decision space
and its granularity
→ Combine abstraction with CMAB (NaïveMCTS) using small-scale parametric scripts
(heuristics)
→ Define a strategy as a collection of heuristics and parameters
Parametric Action Pre-Selection
→ Expert-authored scripts usually encode a deterministic strategy using a limited portion of
the decision space
→ How to generate novel strategies that can better exploit the available actions?
→ How to preserve low-level tactical performance?
→ A strategy is a combination of heuristics
→ Example: the Worker Rush strategy combines the Harvest, Train, and Direct offense heuristics
→ Heuristic: A parametric single-goal procedure for
controlling a sub-group of units
→ Single unit: $h \in H : S \times U \times A^{l} \times R_h \to A^{k}$, with $k \le l$
→ $S$: States, $U$: Units, $A$: Unit-Actions, $R_h$: Parameters
→ Group of units: applied to each member
→ In expert-authored scripts, 𝑘 = 1 and 𝑅ℎ = 1
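The heuristic signature above can be sketched as a function that takes a state, a unit, that unit's $l$ candidate actions and some parameters, and returns $k \le l$ of them. The example below is a hypothetical attack heuristic written for illustration only; the `UnitAction` type, the `state["enemies"]` layout and the `max_targets` parameter are assumptions, not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitAction:
    kind: str      # e.g. "move", "attack", "harvest"
    target: tuple  # (x, y) grid cell the action refers to


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def attack_heuristic(state, unit_pos, actions, max_targets=2):
    """Keep at most `max_targets` attack actions, nearest enemy first.

    If the unit has no attack on a known enemy, all actions are kept (k = l).
    """
    attacks = [a for a in actions
               if a.kind == "attack" and a.target in state["enemies"]]
    attacks.sort(key=lambda a: manhattan(unit_pos, a.target))
    return attacks[:max_targets] if attacks else list(actions)


state = {"enemies": {(2, 2), (5, 5), (1, 0)}}
candidates = [
    UnitAction("move", (0, 1)),
    UnitAction("attack", (5, 5)),
    UnitAction("attack", (1, 0)),
    UnitAction("attack", (2, 2)),
    UnitAction("harvest", (0, 3)),
]
selected = attack_heuristic(state, (0, 0), candidates, max_targets=2)
print(selected)
```

Setting `max_targets=1` makes the heuristic behave like a script that commits to a single action, while larger values leave the final choice to the search.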
Parametric Action Pre-Selection
→ Action Pre-Selection: Downsizing the decision space by selecting a subset of actions satisfying a certain
criterion (strategy), prior to planning
→ When 𝑘 > 1, the final decision is made by a search approach (e.g. MCTS)
→ A unit partitioning 𝑑 ∈ D determines unit groups (manually or automatically)
→ Each unit group is associated with a heuristic. Heuristics’ output defines the search space
[Figure: Original Actions → Action Pre-Selection (Partitioning, Heuristics, Parameters) → Pre-Selected Actions → Planning (MCTS)]
Parametric Action Pre-Selection
→ The general algorithm:
→ Pre-selected actions are refined over successive phases
→ Parametric Action Pre-Selection: $T(s, U, A_0, x_1, \dots, x_n)$ with $x_i(A_{i-1}, d_i, H_i, \theta_i)$
→ A strategy can be expressed as: $\sigma = (d_1, \dots, d_n, H_1, \dots, H_n, \theta_1, \dots, \theta_n)$
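[Figure: the pre-selection pipeline $T$. Given the game state $s$ and units $U$, phase $x_1$ applies the partitioning $d_1$ (groups $g_1, \dots, g_{m_1}$), the heuristics $H_1$ ($h_1, \dots, h_{m_1}$) and the parameters $\theta_1$ to the initial actions; phases $x_2, \dots, x_n$ repeat the process on the previous phase's output, ending with $A_{n-1} \to A_n$. The final set $A_n$ is passed to Search, then Execution.]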
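The multi-phase scheme can be sketched as a loop in which every phase partitions the units, applies each group's heuristic with its parameters, and hands the reduced action sets to the next phase. This is a minimal sketch under assumed data structures (dictionaries and plain strings for actions); `pre_select`, `keep_kinds` and `cap` are hypothetical names, not part of ParaMCTS.

```python
def pre_select(state, units, actions, phases):
    """Refine per-unit action lists over successive phases.

    actions: dict unit -> list of unit-actions (the initial set A_0).
    phases:  list of (partition, heuristics, params) triples, one per phase:
             partition(state, units) -> dict group name -> list of units
             heuristics: dict group name -> h(state, unit, actions, **params)
             params:     dict group name -> keyword parameters for h
    """
    current = {u: list(a) for u, a in actions.items()}
    for partition, heuristics, params in phases:
        for name, members in partition(state, units).items():
            h = heuristics[name]
            for u in members:
                current[u] = h(state, u, current[u], **params.get(name, {}))
    return current


def keep_kinds(state, unit, unit_actions, kinds):
    """Keep actions of the given kinds; keep everything if none match."""
    kept = [a for a in unit_actions if a.split(":")[0] in kinds]
    return kept or unit_actions


def cap(state, unit, unit_actions, max_actions):
    """Keep only the first `max_actions` actions."""
    return unit_actions[:max_actions]


units = ["w1", "w2", "b1"]
actions = {
    "w1": ["move:n", "move:s", "harvest:e", "attack:w"],
    "w2": ["move:n", "harvest:w", "return:s"],
    "b1": ["train:worker", "wait"],
}
phase1 = (
    lambda s, us: {"harvesters": ["w1", "w2"], "structures": ["b1"]},
    {"harvesters": keep_kinds, "structures": keep_kinds},
    {"harvesters": {"kinds": {"harvest", "return"}},
     "structures": {"kinds": {"train"}}},
)
phase2 = (
    lambda s, us: {"all": list(us)},
    {"all": cap},
    {"all": {"max_actions": 1}},
)
result = pre_select(None, units, actions, [phase1, phase2])
print(result)
```

In this toy run the first phase narrows each group to goal-relevant actions and the second phase caps what remains, which mirrors the idea of controlling both the size and the granularity of the space handed to the search.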
Parametric Action Pre-Selection
→ Proposed implementation: ParaMCTS
→ A 2-phase action pre-selection process using NaïveMCTS for search
→ Inspired by the macro- and micro-management task decomposition
→ 47 parameters govern the behaviour of ParaMCTS, tuned manually
→ NaïveMCTS enhancement: Inactive player-action pruning (previous study)
Phase-1 ($x_1$)
Groups     | Heuristics | Parameters
Harvesters | <Harvest>  | maxU, buildMode, pf, …
Offense    | <Attack>   | maxU, offMode, maxTargets, pf, …
Defense    | <Defend>   | maxU, defMode, defPerimeter, pf, …
Structures | <Train>    | maxU, trainMode, …
Phase-2 ($x_2$)
Groups     | Heuristics           | Parameters
Front-Line | <Front-Line Tactics> | maxU, waitDuration, …
Back       | <Back Tactics>       | waitDuration, …
→ The pre-selected actions are then searched by NaïveMCTS
Experiments & Results
→ How can MCTS benefit from the downsized decision space?
→ Should we increase the playout duration, the maximum search depth, or both? By how much?
→ How does the performance of ParaMCTS compare to state-of-the-art agents?
→ Experimental setting:
→ Computation budget: 100 ms per game cycle. Maps: basesWorkers 8 × 8, 16 × 16, 32 × 32
→ Tested maximum search depths: {10, 15, 20, 30, 50}. Tested playout durations (cycles): {100, 150, 200, 300, 500}
Testbed: µRTS (or microRTS)
→ A lightweight, AI research-focused RTS simulator
→ Open source, written in Java by Santiago Ontañón
→ Includes a forward model and many baseline agents
→ Subject of a yearly AI competition as part of IEEE CoG
Experiments & Results
→ Experiment 1: Two 120-iteration round-robin tournaments
1) Between ParaMCTS variants with a fixed playout duration (100 cycles) and different maximum search depths
2) Between ParaMCTS variants with a fixed maximum search depth (10) and different playout durations
→ Total matches: 4800 in each map. Score = Wins + (Draws / 2), normalized.
→ Results:
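The scoring rule from Experiment 1 can be sketched as follows. The slide does not say how the score is normalized; dividing by the number of matches played is an assumption made for this example, and the function name is hypothetical.

```python
def normalized_score(wins, draws, matches):
    """Score = wins + draws / 2, divided by matches played (assumed normalization)."""
    if matches <= 0:
        raise ValueError("matches must be positive")
    if wins + draws > matches:
        raise ValueError("wins + draws cannot exceed matches")
    return (wins + draws / 2) / matches


# Hypothetical tallies, not results from the presentation.
print(normalized_score(wins=60, draws=20, matches=100))
```

Under this normalization a score of 1 means every match was won and 0.5 is what an agent obtains by drawing every match.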
Experiments & Results
→ Experiment 2: Maximum search depth and playout duration combinations
→ 100 matches between each ParaMCTS(search depth, playout duration) variant and MixedBot
→ Sides switched after 50 matches. ParaMCTS implements a similar strategy to MixedBot
→ Total matches: 2500 in each map
→ Results:
Experiments & Results
→ Experiment 3: ParaMCTS vs. state-of-the-art agents
→ 100-iteration round-robin tournament
→ Participants:
→ ParaMCTS
→ MixedBot
→ Izanagi
→ Droplet
→ NaïveMCTS*
→ NaïveMCTS
→ Total Matches: 3000 in each map
→ 11.9 to 19.1 overall margin
→ Participant notes (slide callouts): top-ranking agents from the 2019 µRTS competition; same hyperparameters as ParaMCTS; using best hyperparameters
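The match totals reported for the three experiments can be cross-checked with a little arithmetic. The sketch below assumes that one round-robin iteration plays every ordered pair of agents once (each pairing from both sides). That reading is an assumption, but it reproduces the reported totals; the helper name is hypothetical.

```python
def round_robin_matches(agents, iterations):
    """Matches in a round-robin where each ordered pair plays once per iteration."""
    return agents * (agents - 1) * iterations


depths = [10, 15, 20, 30, 50]
durations = [100, 150, 200, 300, 500]

# Experiment 1: two tournaments (depth variants, duration variants), 120 iterations each.
exp1 = round_robin_matches(len(depths), 120) + round_robin_matches(len(durations), 120)
# Experiment 2: every (depth, duration) combination plays 100 matches against MixedBot.
exp2 = len(depths) * len(durations) * 100
# Experiment 3: six participants, 100 iterations.
exp3 = round_robin_matches(6, 100)

print(exp1, exp2, exp3)
```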
Conclusion & Future Work
→ Parametric action pre-selection describes a general action/state abstraction framework,
applicable to any game with similar characteristics to RTS games
→ Using heuristics instead of scripts grants greater flexibility
→ A proposed implementation, ParaMCTS, significantly outperformed state-of-the-art
agents, using manually tuned parameters
→ Recovered computation budget is better used for deeper search
Future Work
→ ParaMCTS parameter optimization for different objectives (maps, opponents, …)
→ Dynamic parameter adaptation through RL
→ Heuristic/partitioning discovery
→ Difficulty adjustment given adequate heuristics and parameters
Thank You
abdessamed.ouessai@univ-mascara.dz
salem@univ-mascara.dz
amorag@ugr.es
