Malmotutorial

Artificial General Intelligence in Games

Human-level Control Through Deep Reinforcement Learning
V. Mnih et al.
https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf

General Video Game AI: a Multi-Track Framework for Evaluating Agents, Games and Content Generation Algorithms
Diego Perez-Liebana, Jialin Liu, Ahmed Khalifa, Raluca D. Gaina, Julian Togelius, Simon M. Lucas
https://arxiv.org/pdf/1802.10363
http://www.gvgai.net

Project Malmo
aka.ms/Malmo
github.com/Microsoft/malmo
The Malmo Platform for Artificial Intelligence Experimentation
Matthew Johnson, Katja Hofmann, Tim Hutton, & David Bignell, 2016

Malmo design principles
Beyond “narrow AI” with multi-task learning
Wired for multi-agent tasks (including human agents)

Use Cases and Design Principles
into the game through an intuitive yet powerful API, building on existing Minecraft capabilities
Built for extensions and novel uses: open source; “plug-and-play” design of observation, command, reward handlers
Low entry barrier: provide cross-language (currently: Java, .NET, C/C++, Python, Lua) & cross-platform (Windows, Linux, macOS) API

Malmo = Minecraft Mod + API + tools

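Python example

import MalmoPython
import time  # assumed: the second import on the slide was lost

agent_host = MalmoPython.AgentHost()
mission = MalmoPython.MissionSpec(open("my_mission.xml").read(), True)
mission_record = MalmoPython.MissionRecordSpec("save.tgz")

# start a new mission
agent_host.startMission(mission, mission_record)
world_state = agent_host.getWorldState()
while not world_state.has_mission_begun:
    time.sleep(0.1)
    world_state = agent_host.getWorldState()

while world_state.is_mission_running:
    # interpret world state
    world_state = agent_host.getWorldState()
    for reward in world_state.rewards:
        print("Summed reward:", reward.getValue())
    for observation in world_state.observations:
        print("Observation:", observation.text)
    # act
    agent_host.sendCommand("move 1")
    agent_host.sendCommand("turn 0.5")
    agent_host.sendCommand("jump 1")
    time.sleep(0.1)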

Example: Tabular Q-Learning in Malmo
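A minimal sketch of the tabular Q-learning update, assuming a hypothetical one-dimensional corridor in place of a real Malmo mission. The `CorridorEnv` class, its reward values (-1 per command, +100 at the goal) and the hyperparameters are illustrative assumptions, not the code shown in the tutorial.

```python
import random
from collections import defaultdict


class CorridorEnv:
    """Hypothetical stand-in for a Malmo mission: walk right to reach the goal."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action 0 = move left, action 1 = move right (clamped to the corridor)
        delta = 1 if action == 1 else -1
        self.pos = max(0, min(self.length - 1, self.pos + delta))
        done = self.pos == self.length - 1
        reward = 100.0 if done else -1.0  # goal bonus, otherwise a per-command cost
        return self.pos, reward, done


def q_learning(env, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1,
               max_steps=100, seed=0):
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0, 0.0])  # state -> [Q(left), Q(right)]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = max((0, 1), key=lambda a: q[state][a])
            next_state, reward, done = env.step(action)
            # Q-learning update: bootstrap from the best action in the next state
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = q_learning(CorridorEnv())
greedy_policy = [max((0, 1), key=lambda a: q_table[s][a]) for s in range(4)]
print(greedy_policy)
```

In a Malmo setting the state would typically be derived from observations such as the agent's grid position, and the actions would be discrete movement commands; that mapping is left out of this sketch.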

Example: Deep Q-Learning in Malmo

Low entry barrier, yet powerful
Wired for multi-agent tasks (including human agents)

<ServerHandlers>
<FlatWorldGenerator
generatorString="3;7,220*..."/>
<DrawingDecorator>
[...]
<DrawCuboid x1="-2" y1="45" z1="-2" x2="7"
y2="45" z2="18" type="lava" /> 
<DrawCuboid x1="1" y1="45" z1="1" [...]
type="sandstone" /> 
<DrawBlock x="4" y="45" z="1" type="cobblestone"
/> 
[...]
</ServerHandlers>
<AgentHandlers>
<ObservationFromFullStats/>
<DiscreteMovementCommands>
<ModifierList type="deny-list">
<command>attack</command>
</ModifierList>
</DiscreteMovementCommands>
<RewardForTouchingBlockType>
<Block reward="-100.0" type="lava" behaviour="onceOnly"/>
<Block reward="100.0" type="lapis_block" behaviour="onceOnly"/>
</RewardForTouchingBlockType>
<RewardForSendingCommand reward="-1"/>
</AgentHandlers>
Example Task (Mission XML)
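To make the reward structure of the example task concrete, the sketch below parses a trimmed copy of the reward handlers with Python's standard `xml.etree.ElementTree` and adds up the reward for one episode. It is only an illustration: real mission files declare an XML namespace, and the `episode_return` helper is a hypothetical name, not part of Malmo.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the reward handlers from the example mission (no namespace).
AGENT_HANDLERS = """
<AgentHandlers>
  <RewardForTouchingBlockType>
    <Block reward="-100.0" type="lava" behaviour="onceOnly"/>
    <Block reward="100.0" type="lapis_block" behaviour="onceOnly"/>
  </RewardForTouchingBlockType>
  <RewardForSendingCommand reward="-1"/>
</AgentHandlers>
"""

root = ET.fromstring(AGENT_HANDLERS)
block_rewards = {b.get("type"): float(b.get("reward")) for b in root.iter("Block")}
command_cost = float(root.find("RewardForSendingCommand").get("reward"))


def episode_return(num_commands, touched_block):
    """Total reward for an episode that ends by touching `touched_block`."""
    return num_commands * command_cost + block_rewards.get(touched_block, 0.0)


print(block_rewards)
print(episode_return(12, "lapis_block"))
```

Under these handlers, an agent that reaches the lapis block after 12 commands collects 88 in total, while one that falls into lava after 3 commands collects -103.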

Creating new tasks is easy
http://sameersingh.org/courses/aiproj/sp17/projects.html

Low entry barrier, yet powerful
Beyond “narrow AI” with multi-task learning

A natural environment for multi-agent learning

Goal: foster research in
collaborative AI
Details: https://www.microsoft.com/en-us/research/academic-program/collaborative-ai-challenge

MARLÖ Competition: The Multi-Agent Reinforcement Learning in MalmÖ

MARLO: Motivation
• General-reward settings are the most realistic for many real-world applications but are also notoriously challenging
• More research is needed on insights and approaches that generalize beyond individual tasks and opponent types
• The cost of creating tasks and opponents amortizes, as both can be shared by a large community

Overview
• Participants develop agents which play tasks on the Malmo platform
• The agents play multiple games with different scenarios
• Each game has a different set of multi-agent tasks for training, validation and final test
• Participants use the training and validation tasks to train and validate their agents
• The agents play the final test task in a tournament to determine the winner of MARLO

Evaluation
• Each league (P players in a group) is played across the same N games, with T repetitions on the private task of each game.
• Each game has its own leaderboard, ranking entries and awarding points: 25 points for 1st place, 18 for 2nd, then 15, 12, 10, 8, 6, 4, 2 and 1 for 3rd to 10th, and 0 from 11th onwards.
• The final ranking for each league is determined by summing points across all games.
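The scoring scheme can be sketched as follows. The function names and the sample rankings are hypothetical, and the slide does not say how ties are broken, so this sketch simply keeps the order produced by the sort.

```python
# Points for 1st to 10th place; 11th onwards score 0.
POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]


def points_for_rank(rank):
    """Points awarded for a 1-based leaderboard position in one game."""
    return POINTS[rank - 1] if 1 <= rank <= len(POINTS) else 0


def league_ranking(game_rankings):
    """game_rankings: one list per game of entry names, best first."""
    totals = {}
    for ranking in game_rankings:
        for position, entry in enumerate(ranking, start=1):
            totals[entry] = totals.get(entry, 0) + points_for_rank(position)
    # Highest total first
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical league with three entries and N = 3 games
games = [["A", "B", "C"], ["B", "A", "C"], ["B", "C", "A"]]
final = league_ranking(games)
print(final)
```

Here entry B wins the league with 68 points, ahead of A with 58 and C with 48.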

Schedule (draft)
• Same version as the multi-agent tasks, but using bots, which run locally
• The top 32 evaluated teams are invited to the final round
• Multi-agent games on a remote server for the final tournament
• Live competition!!

Participation: Eligibility
• A team consists of up to five participants
• Participants must be 18 years of age or older. Any team member who is 18 or older but is considered a minor in their place of residence should ask their parent's or legal guardian's permission before submitting an entry to the Competition
• Award: available only to participants affiliated with a university or a non-profit research organization

What you get from the competition
• Award
  • 1st place: 10,000 USD-equivalent Azure plus a travel grant to join a relevant academic conference or workshop.
  • 2nd place: 5,000 USD-equivalent Azure.
  • 3rd place: 3,000 USD-equivalent Azure.
• Publication
  • The top three entries will be invited as co-authors of a paper summarizing the competition structure, rules, approaches, results and main take-aways.

Mob Chase
1 point
0.2 points
-0.02 points

Mob Chase
__________
_wwwwwwwww
_w*.....=w
ww......ww
w=...*..w_
ww......w_
_w.*..*.ww
_w......=w
_ww=wwwwww
__www_____

_www______
_w=wwwwww_
_w..*.*.w_
_w*.....w_
_w......w_
_w.*.*..w_
_w......w_
_w......w_
_w==wwwww_
_wwww_____

________www_
wwwwwwwww=w_
w=......*.w_
ww........ww
_w....*.*.=w
ww........ww
w=.....*..w_
ww*.*.....w_
_w........w_
_w........w_
_wwwwwwwwww_
____________

__________
__________
__________
__________
__________
__________
__________
__________
__________
__________

__________
_wwwwwwww_
_w______w_
_w______w_
_w______w_
_w______w_
_w______w_
_w______w_
_wwwwwwww_
__________

__________
_wwwwwwww_
_w......w_
_w......w_
_w......w_
_w......w_
_w......w_
_w......w_
_wwwwwwww_
__________

__________
_wwwwwwwww
_w......=w
ww......ww
w=......w_
ww......w_
_w......ww
_w......=w
_ww=wwwwww
__www_____

__________
_wwwwwwww_
_w......=_
_w......w_
_=......w_
_w......w_
_w......w_
_w......=_
_ww=wwwww_
__________

__________
_wwwwwwwww
_w*.....=w
ww......ww
w=...*..w_
ww......w_
_w.*..*.ww
_w......=w
_ww=wwwwww
__www_____

Build Battle
1 point
• +0.2 points
• -0.2 points
-0.02 points

Treasure Hunt
0.5 points
• 0.25 points
• -1 point
-0.02 points

Treasure Hunt
wwwwwwwwwwwwwwwwwwww
w...e+.............w
w..................w
w.gggggggggggggggggw
w...............A..w
w.................Aw
w..................w
w.......e.....+..+.w
w.......+..........w
w..................w
w..................w
w......=...........w

wwwwwwwwwwwwwwwwwww
wg...............=w
wg................w
wg................w
wg...A............w
w.................w
wg........*......Aw
wg................w
wg................w
wg............+...w
wg......e.........w
wg..............*.w
wg................w
wg.e..............w
wg............*...w
wg................w
wggggggggg.gggggggw
wg................w
wg................w
wwwwwwwwwwwwwwwwwww
wwwwwwwwwwwwwwwwwwwww
w........g..........w
we.+.....g..........w
w........g......+...w
w*....e..g..........w
w........g..........w
w........g..........w
w........g..........w
wA.......g..........w
w...A...............w
w........g.........=w
w........g..........w
w........g...+......w
w........g..........w
w........g..........w
w..*.....g..........w
wwwwwwwwwwwwwwwwwwwww

Qualifying Task
• MarLo-FindTheGoal-v0
• 7x7 room
• Goal: find the goal ☺ (yellow block)
• Rewards:
  • -0.01 per command
  • 100 commands max
  • 1.0 find goal
  • -0.1 out of time
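As a worked example of these rewards, the sketch below computes the total return of one episode. It assumes the out-of-time penalty applies only when the goal was not found, which the slide does not state explicitly; the function name and parameters are hypothetical.

```python
def find_the_goal_return(commands_used, found_goal, max_commands=100,
                         command_cost=-0.01, goal_reward=1.0,
                         timeout_penalty=-0.1):
    """Episode return for the qualifying task under the listed rewards."""
    # An episode cannot use more than the command budget.
    commands_used = min(commands_used, max_commands)
    total = commands_used * command_cost
    # Assumption: the agent either finds the goal or runs out of time.
    total += goal_reward if found_goal else timeout_penalty
    return round(total, 4)


print(find_the_goal_return(20, found_goal=True))
print(find_the_goal_return(100, found_goal=False))
```

Under these assumptions the best possible return approaches 1.0 (goal found almost immediately) and the worst is -1.1 (all 100 commands spent without finding the goal).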

Participating in the Competition

Project MARLÖ
• Multi-Agent Reinforcement Learning in Malmo
• Reinforcement learning wrapper built on top of Project Malmo
• Aims to inspire the creation of highly capable general agents through a multi-agent, multi-game environment
• Uses the OpenAI Gym format
• Also on GitHub!
  • https://github.com/crowdAI/marLo

Installation instructions
• Install Malmo
  • Anaconda (recommended)
  • Pip (+ git)
  • Repack
  • Manual compilation
• Install Marlo

Agents: a semi-technical view
• Agents in Marlo are simple and work in a very Gym-like format:
  • Start up a Minecraft client on port 10000
  • Use the "marlo.make()" function to make an environment. This returns a user token
  • Use the user token to generate an image of the environment for agent use with "marlo.init()"
  • Run an agent to play the game
• We have seen a sample random agent that plays any game it connects to
• We also provide examples of more complex agents:
  • ChainerRL agents (DQN, PPO)
  • TensorBoard-Chainer plotting compatible
• Other frameworks (TensorFlow, KerasRL, PyBrain) are possible; the only requirement is that they comply with the Gym API
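The random agent mentioned above boils down to the usual Gym interaction loop. Because a real Marlo environment needs a running Minecraft client, the sketch below uses a hypothetical `StubEnv` that follows the classic Gym interface (`reset()`, `step()` returning a 4-tuple, `action_space.sample()`); the stub and its reward values are assumptions for illustration only.

```python
import random


class StubActionSpace:
    """Minimal discrete action space with a Gym-style sample() method."""

    def __init__(self, n, seed=0):
        self.n = n
        self._rng = random.Random(seed)

    def sample(self):
        return self._rng.randrange(self.n)


class StubEnv:
    """Hypothetical Gym-style environment standing in for a Marlo game."""

    def __init__(self, episode_length=10):
        self.action_space = StubActionSpace(4)
        self.episode_length = episode_length
        self._t = 0

    def reset(self):
        self._t = 0
        return self._t  # dummy observation

    def step(self, action):
        self._t += 1
        done = self._t >= self.episode_length
        return self._t, -0.01, done, {}  # observation, reward, done, info


def run_random_agent(env):
    """Play one episode with uniformly random actions."""
    env.reset()
    done, steps, total = False, 0, 0.0
    while not done:
        _, reward, done, _ = env.step(env.action_space.sample())
        steps += 1
        total += reward
    return steps, total


steps, total_reward = run_random_agent(StubEnv())
print(steps, total_reward)
```

With Marlo installed and a client listening on port 10000, `env` would instead be the environment obtained through `marlo.make()` and `marlo.init()` as described in the steps above; the loop itself stays the same.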

Experiments
• A simple script which trains an agent over a set number of steps and episodes is provided within the Marlo package
• The underlying functionality is simple: at the beginning of training, reset the environment

Experiments
• Main loop with stopping condition:
  • Episode ends or maximum number of steps reached

Experiments
• Log results of the episode
• We incorporate an example to plot using Tensorboard-Chainer
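The three stages of an experiment (reset, main loop with a stopping condition, per-episode logging) can be sketched as below. This is not the script shipped with Marlo: `StubEnv`, `run_experiment` and the budget values are hypothetical, and the learning update of a real agent is omitted.

```python
import random


class StubEnv:
    """Hypothetical Gym-style environment; episodes end after 5 steps."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 5, {}


def run_experiment(env, choose_action, total_steps=12, max_episode_len=3):
    """Run episodes until the step budget is spent; return one log entry per episode."""
    log, steps_done = [], 0
    while steps_done < total_steps:
        obs = env.reset()  # reset the environment at the start of each episode
        episode_reward, episode_len, done = 0.0, 0, False
        # Main loop: stop when the episode ends or a step limit is reached
        while (not done and episode_len < max_episode_len
               and steps_done < total_steps):
            obs, reward, done, _ = env.step(choose_action(obs))
            episode_reward += reward
            episode_len += 1
            steps_done += 1
        # Log the results of the episode
        log.append({"episode": len(log) + 1,
                    "steps": episode_len,
                    "reward": episode_reward})
    return log


rng = random.Random(0)
history = run_experiment(StubEnv(), lambda obs: rng.randrange(4))
for entry in history:
    print(entry)
```

A plotting tool such as Tensorboard-Chainer would consume the per-episode values that this sketch simply prints.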

Plotting results (Tensorboard-Chainer)
• Works much like standard TensorBoard, but adapted to work with Chainer
• Can be used to log images, text, audio and histograms

Submission
1. Create a private repository on gitlab.crowdai.org. It must contain:
• A Dockerfile that installs dependencies and sets everything up
• A crowdai.json file with these mandatory fields:
  • challenge_id - "marLo"
  • grader_id - "marLo"
  • author - name of the author (string); for teams, please also create a field 'authors' containing a list with all authors
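A minimal sketch of generating such a file with Python's standard `json` module. Only the field names and the "marLo" values come from the slide; the author names are placeholders, and any additional fields the grader may expect are not covered here.

```python
import json

submission = {
    "challenge_id": "marLo",
    "grader_id": "marLo",
    "author": "author-name-here",               # placeholder
    "authors": ["member-one", "member-two"],    # placeholder, teams only
}

crowdai_json = json.dumps(submission, indent=2)
print(crowdai_json)

# To write the file into the repository root:
# with open("crowdai.json", "w") as handle:
#     handle.write(crowdai_json)
```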

Submission
2. Submitting to crowdAI:
• Create and push a new tag
• Each tag counts as a new submission
• You will be able to see your AI agent actually play the game, and see more details about the evaluation of your submission, at: https://gitlab.crowdai.org/<your-crowdAI-user-name>/marLo/issues
• A video of the game will also be generated and available from the leaderboard
Follow the project
• Malmo: @Project_Malmo and website (aka.ms/malmo)
• People on Twitter: @diego_pliebana, @katjahofmann, @MeMohanty
• MARLO GitHub: https://github.com/crowdAI/marLo
• MARLO Documentation: https://marlo.readthedocs.io/en/latest/
• Competition website: https://www.crowdai.org/challenges/marlo-2018
• AIIDE 2018 Workshop: https://marlo-ai.github.io/

Hands-On Time
1. Install Malmo and Marlo
2. Play the games
3. Execute agents
Doc: https://marlo.readthedocs.io/en/latest/
Code: https://github.com/crowdAI/marLo/
Competition: https://www.crowdai.org/challenges/marlo-2018
We’re here to help!
