CoSECiVi 2020 - Parametric Action Pre-Selection for MCTS in Real-Time Strategy Games

Parametric Action Pre-Selection for MCTS in
Real-Time Strategy Games
Abdessamed Ouessai, Mohammed Salem, and Antonio M. Mora
University of Mascara,
Algeria
University of Granada,
Spain
VI CoSECiVi-2020

Overview
→ Introduction
→ RTS Games & AI
→ Monte Carlo Tree Search
→ Parametric Action Pre-Selection
→ Experiments & Results
→ Conclusion & Future Work

Introduction
→ First game AI research domain: Classic board games
→ Evolution of board games is constrained by physics
→ Video games represent an unconstrained medium
→ Real-Time Strategy sub-genre concretized abstract board games (Warfare)
→ RTS Games are an evolution of abstract board games
→ ++ Concrete | ++ Challenging for humans | ++ Complex for AI

RTS Games & AI
→ Multiplayer, zero-sum, non-deterministic game with imperfect information.
→ Top-down perspective. Recognizable mouse and keyboard-based UI.
→ General strategy: Gather, then Build & Train, then Confront
→ Game elements: Units, Structures, Resources
→ Victory condition: Destruction of Opponent’s Forces

RTS Games & AI
→ What does an RTS game-playing AI have to deal with?
→ Real-time aspect: short decision cycles (~50/s), simultaneous moves for different units, durative actions (> one decision cycle)
→ Uncertainty: non-determinism, partial observability (opponent & environment)
→ Complexity: large topographic environments, exponential growth of the decision/state spaces
→ Approximate estimates:
                    Chess     Go        StarCraft
  Branching factor  36        180       10^50
  State space       10^47     10^171    10^1685

RTS Games & AI
→ Notable developments:
→ Scripts: Portfolio Greedy Search (Churchill et al, 2013), Puppet Search (Barriga et al, 2015)
→ Learning: Bayesian Models (Synnaeve et al, 2011), AlphaStar (Vinyals et al, 2019)
→ Planning: NaïveMCTS (Ontañón, 2013), AHTN (Ontañón and Buro, 2015), CCG (Kantharaju et al, 2018)
→ Evaluation: CNN (Stanescu et al, 2016), (Barriga et al, 2019)
→ Competitions:
→ IEEE CoG (StarCraft & µRTS), AAAI AIIDE (StarCraft), SSCAIT
→ RTS AI Testbeds:
→ ORTS – Wargus – BWAPI(SC) – SparCraft – SC2LE – ELF – DeepRTS - µRTS.

Monte Carlo Tree Search
→ An iterative, anytime, sampling-based search framework
→ Main components:
→ Tree Policy
→ Default Policy
→ Popular variant:
→ UCT (UCB1 as Tree Policy)
→ Popular application:
→ Go (AlphaGo)
→ Downside:
→ Scalability issues
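→ Each iteration: (1) Selection, (2) Expansion, (3) Simulation, (4) Backpropagation
→ The Tree Policy guides selection and expansion, the Default Policy guides simulation, and the resulting reward is backpropagated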
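As an illustration of the UCT tree policy mentioned above, here is a minimal sketch of UCB1-based child selection. The dictionary layout of a child node and the names `ucb1` and `select_child` are assumptions made for this example; they are not taken from the talk or from any particular MCTS library.

```python
import math

def ucb1(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: mean reward plus an exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    mean = total_reward / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean + bonus

def select_child(children, parent_visits):
    """One selection step of the tree policy: pick the highest UCB1 score."""
    return max(children, key=lambda ch: ucb1(ch["reward"], ch["visits"], parent_visits))

# Hypothetical statistics for three children of a node visited 12 times.
children = [
    {"name": "a", "reward": 6.0, "visits": 10},
    {"name": "b", "reward": 2.0, "visits": 2},
    {"name": "c", "reward": 0.0, "visits": 0},
]
chosen = select_child(children, parent_visits=12)
```

The exploration constant `c` trades off revisiting well-performing children against trying rarely visited ones. The value sqrt(2) used here is the textbook default, not a setting reported in the talk.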

→ Proposed solutions to enhance MCTS scalability:
CMAB
→ Selection phase framed as a Combinatorial Multi-Armed Bandit problem
→ NaïveMCTS is based on a CMAB formulation and a naïve assumption
→ Player action $\alpha_t = (a_1, a_2, a_3, \dots, a_n)$: one unit-action $a_i$ with value $v_i$ for each unit $u_i$
→ The naïve assumption: $V(\alpha_t) = \sum_{i=1}^{n} v_i$
→ Successfully adapts MCTS to combinatorial decision spaces (ex. RTS Games)
→ Downside: The algorithm is still affected by the dimensionality of the decision space.
Abstraction
→ Search the decision space induced by expert-authored scripts instead of the original decision space
→ Downsides: (1) Sacrifices tactical performance. (2) Performance depends on scripts
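To make the naïve assumption concrete, the sketch below treats the value of a player action as the sum of per-unit values, so the best combination can be found by choosing each unit's best action independently. The unit-action labels, the numeric values, and the function names are hypothetical. NaïveMCTS itself samples actions with an exploration step; only the greedy step is shown here.

```python
from itertools import product

def naive_value(local_values, player_action):
    """Naive assumption: V(alpha_t) is the sum of the per-unit values v_i."""
    return sum(local_values[i][a] for i, a in enumerate(player_action))

def greedy_player_action(local_values):
    """Pick each unit's best action independently (one local bandit per unit)."""
    return tuple(max(vals, key=vals.get) for vals in local_values)

# Hypothetical value estimates for three units (one dict per unit).
local_values = [
    {"move": 0.2, "attack": 0.7},
    {"harvest": 0.5, "wait": 0.1},
    {"train": 0.3, "wait": 0.0},
]

best = greedy_player_action(local_values)

# Exhaustive check over all combinations, feasible only for tiny examples.
all_actions = list(product(*(vals.keys() for vals in local_values)))
brute_force_best = max(all_actions, key=lambda pa: naive_value(local_values, pa))
```

Here 6 local estimates stand in for 8 combined player actions. The gap widens quickly with more units, although the per-unit estimates still grow with the number of units and actions, which is the dimensionality downside noted above.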

→ Our proposition:
→ A multi-stage parametric action pre-selection scheme to control the decision space
and its granularity
→ Combine abstraction with CMAB (NaïveMCTS) using small-scale parametric scripts
(heuristics)
→ Define a strategy as a collection of heuristics and parameters

Parametric Action Pre-Selection
→ Expert-authored scripts usually encode a deterministic strategy using a limited portion of
the decision space
→ How to generate novel strategies that can better exploit the available actions?
→ How to preserve low-level tactical performance?
→ A strategy is a combination of heuristics
→ Example: the Worker Rush strategy combines the Harvest, Train, and Direct offense heuristics
→ Heuristic: A parametric single-goal procedure for
controlling a sub-group of units
→ Single unit:
$h \in H : S \times U \times A^{l} \times R_h \to A^{k}, \quad k \le l$
→ 𝑆 : States, 𝑈 : Units, 𝐴 : Unit-Actions, 𝑅ℎ : Parameters
→ Group of units: applied to each member
→ In expert-authored scripts, 𝑘 = 1 and 𝑅ℎ = 1
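A heuristic in this sense can be read as a function that takes a state, a unit, that unit's candidate actions, and some parameters, and returns at most as many actions as it received. The sketch below is a hypothetical attack heuristic: the action encoding, the function name, and the interpretation of a `maxTargets` parameter as "keep the nearest targets" are assumptions made for illustration only.

```python
from typing import Any, List, Mapping, Sequence, Tuple

# Hypothetical unit-action encoding: (kind, distance to the action's target).
Action = Tuple[str, int]

def attack_heuristic(state: Any, unit: str, actions: Sequence[Action],
                     params: Mapping[str, Any]) -> List[Action]:
    """Keep up to params['maxTargets'] attack actions, nearest targets first.

    Maps l candidate actions to k <= l pre-selected actions.
    """
    attacks = sorted((a for a in actions if a[0] == "attack"), key=lambda a: a[1])
    return attacks[: params["maxTargets"]]

candidates = [("move", 0), ("attack", 5), ("attack", 2), ("attack", 9)]

# k > 1: several actions survive, and a search algorithm makes the final choice.
kept = attack_heuristic(None, "u1", candidates, {"maxTargets": 2})

# k = 1: the heuristic behaves like a script and fully decides the action.
scripted = attack_heuristic(None, "u1", candidates, {"maxTargets": 1})
```

Applying the same function to every member of a unit group gives the group-level behaviour described above.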

→ Action Pre-Selection: Downsizing the decision space by selecting a subset of actions satisfying a certain
criterion (strategy), prior to planning
→ When 𝑘 > 1 the final decision will be made by a search approach (ex. MCTS)
→ A unit partitioning 𝑑 ∈ D determines unit groups (manually or automatically)
→ Each unit group is associated with a heuristic. Heuristics’ output defines the search space
→ Pipeline: the Original Actions feed Action Pre-Selection (driven by Partitioning, Heuristics, and Parameters), which outputs the Pre-Selected Actions passed to Planning (MCTS)

→ The general algorithm:
→ Pre-selected actions are refined over successive phases
→ Parametric Action Pre-Selection: $T(s, U, A_0, x_1, \dots, x_n)$ with $x_i(A_{i-1}, d_i, H_i, \theta_i)$
→ A strategy can be expressed as: $\sigma = (d_1, \dots, d_n, H_1, \dots, H_n, \theta_1, \dots, \theta_n)$
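→ Diagram: the game state $s$, units $U$, and original actions $A_0$ enter phase $x_1$, where partitioning $d_1$ forms groups $g_1, \dots, g_{m_1}$ and heuristics $H_1 = \{h_1, \dots, h_{m_1}\}$ are applied with parameters $\theta_1$, producing $A_1$. Phases $x_2, \dots, x_n$ repeat the process on the previous output, and the final set $A_n$ is passed to Search, then Execution.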
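The multi-phase scheme can be sketched as a loop in which each phase partitions the units, applies one heuristic per group, and hands its refined action sets to the next phase. Everything named here (`pre_select`, `filter_kinds`, `take_first`, the unit IDs, the action labels, and the parameter keys `allowed` and `k`) is hypothetical and chosen for a small runnable example. Leaving ungrouped units' actions unchanged is also an assumption.

```python
from math import prod

def filter_kinds(state, unit, actions, params):
    """Hypothetical heuristic: keep only the allowed kinds of action."""
    return [a for a in actions if a in params["allowed"]]

def take_first(state, unit, actions, params):
    """Hypothetical heuristic: keep at most k actions."""
    return actions[: params["k"]]

def pre_select(state, units, actions, phases):
    """T(s, U, A_0, x_1..x_n): each phase x_i refines the output of x_(i-1)."""
    current = dict(actions)  # A_0: unit -> list of unit-actions
    for partition, heuristics, params in phases:  # x_i = (d_i, H_i, theta_i)
        refined = dict(current)  # units outside every group keep their actions
        for group, members in partition(state, units).items():
            for unit in members:
                refined[unit] = heuristics[group](state, unit, current[unit], params[group])
        current = refined
    return current

def space_size(actions):
    """Number of combined player actions: product of per-unit action counts."""
    return prod(len(a) for a in actions.values())

units = ["w1", "s1"]
A0 = {"w1": ["harvest", "move", "attack", "wait"], "s1": ["attack", "move", "wait"]}
phase1 = (
    lambda s, U: {"harvesters": ["w1"], "offense": ["s1"]},
    {"harvesters": filter_kinds, "offense": filter_kinds},
    {"harvesters": {"allowed": {"harvest", "move"}}, "offense": {"allowed": {"attack", "move"}}},
)
phase2 = (
    lambda s, U: {"back": ["w1"], "front": ["s1"]},
    {"back": take_first, "front": take_first},
    {"back": {"k": 1}, "front": {"k": 2}},
)
A2 = pre_select(None, units, A0, [phase1, phase2])
```

In this toy case the decision space shrinks from 12 combined actions to 2, and the search algorithm would then plan over `A2` only.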

→ Proposed implementation: ParaMCTS
→ A 2-phase action pre-selection process using NaïveMCTS for search
→ Inspired by the macro- and micro-management task decomposition
→ 47 parameters govern the behaviour of ParaMCTS, tuned manually
→ NaïveMCTS enhancement: Inactive player-action pruning (previous study)
Phase-1 (𝑥1)
  Groups       Heuristics            Parameters
  Harvesters   <Harvest>             maxU, buildMode, pf, …
  Offense      <Attack>              maxU, offMode, maxTargets, pf, …
  Defense      <Defend>              maxU, defMode, defPerimeter, pf, …
  Structures   <Train>               maxU, trainMode, …
Phase-2 (𝑥2)
  Groups       Heuristics            Parameters
  Front-Line   <Front-Line Tactics>  maxU, waitDuration, …
  Back         <Back Tactics>        waitDuration, …
→ The actions remaining after Phase-2 are searched with NaïveMCTS

Experiments & Results
→ How can MCTS benefit from the downsized decision space?
→ Should we increase the playout duration, the maximum search depth, or both? By how much?
→ How does the performance of ParaMCTS compare to state-of-the-art agents?
→ Experiments setting:
→ Computation budget: 100 ms per game cycle, Maps: basesWorkers 8 × 8, 16 × 16, 32 × 32
→ Tested maximum search depths: {10, 15, 20, 30, 50}. Tested playout durations: {100, 150, 200, 300, 500}
→ Testbed: µRTS (or microRTS)
→ A lightweight, AI research-focused RTS simulator
→ Open source, written in Java by Santiago Ontañón
→ Includes a forward model and many baseline agents
→ Subject of a yearly AI competition as part of IEEE CoG

→ Experiment 1: Two 120-iteration round-robin tournaments
1) Between ParaMCTS variants with a fixed playout duration (100 cycles) and different max search depths
2) Between ParaMCTS variants with a fixed max search depth (10) and different playout durations
→ Total matches: 4800 in each map. Score = Wins + (Draws / 2), normalized.
→ Results:

→ Experiment 2: Maximum search depth and playout duration combinations
→ 100 matches between each ParaMCTS(search depth, playout duration) variant and MixedBot
→ Sides switched after 50 matches. ParaMCTS implements a similar strategy to MixedBot
→ Total matches: 2500 in each map
→ Results:

→ Experiment 3: Vs. state-of-the-art.
→ 100-iteration round-robin tournament
→ Participants:
→ ParaMCTS
→ MixedBot, Izanagi, Droplet (top-ranking agents from 2019’s µRTS competition)
→ NaïveMCTS* (same hyperparameters as ParaMCTS)
→ NaïveMCTS (using best hyperparameters)
→ Total matches: 3000 in each map
→ 11.9 to 19.1 overall margin
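The match totals reported for the three experiments can be reproduced with simple arithmetic, assuming that one round-robin iteration plays every ordered pair of agents once (each pairing from both starting sides). The sketch below checks that assumption and also computes the Experiment 1 score, with "normalized" read as division by the number of matches played. Both readings, the function names, and the example win/draw counts are assumptions, not statements from the talk.

```python
def round_robin_matches(n_agents, iterations):
    """Matches when every ordered pair of agents meets once per iteration."""
    return n_agents * (n_agents - 1) * iterations

def normalized_score(wins, draws, matches):
    """Score = Wins + Draws / 2, divided by matches played (assumed normalization)."""
    return (wins + draws / 2) / matches

depths = [10, 15, 20, 30, 50]
durations = [100, 150, 200, 300, 500]

# Experiment 1: two tournaments, one per varied hyperparameter, 5 variants each.
exp1_total = round_robin_matches(len(depths), 120) + round_robin_matches(len(durations), 120)

# Experiment 2: every (depth, duration) combination plays 100 matches vs. MixedBot.
exp2_total = len(depths) * len(durations) * 100

# Experiment 3: six participants, 100 iterations.
exp3_total = round_robin_matches(6, 100)

# Hypothetical record: 60 wins and 20 draws out of 100 matches.
example_score = normalized_score(wins=60, draws=20, matches=100)
```

Under these assumptions the totals come out to 4800, 2500, and 3000 matches per map, matching the figures on the slides.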

Conclusion & Future Work
→ Parametric action pre-selection describes a general action/state abstraction framework,
applicable to any game with similar characteristics to RTS games
→ Using heuristics instead of scripts grants greater flexibility
→ A proposed implementation, ParaMCTS, significantly outperformed state-of-the-art
agents, using manually tuned parameters
→ Recovered computation budget is better used for deeper search
Future Work
→ ParaMCTS parameter optimization for different objectives (maps, opponents, …)
→ Dynamic parameter adaptation through RL
→ Heuristic/partitioning discovery
→ Difficulty adjustment given adequate heuristics and parameters

Thank You
abdessamed.ouessai@univ-mascara.dz
salem@univ-mascara.dz
amorag@ugr.es
