Malmotutorial
This tutorial
General Game AI Research
Game AI Competitions
Artificial General Intelligence in Games
Human-level Control Through Deep Reinforcement Learning
V. Mnih et al.
https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
General Video Game AI: a Multi-Track Framework for Evaluating Agents, Games and Content Generation Algorithms
Diego Perez-Liebana, Jialin Liu, Ahmed Khalifa, Raluca D. Gaina, Julian Togelius, Simon M. Lucas
https://arxiv.org/pdf/1802.10363
http://www.gvgai.net
Project Malmo
aka.ms/Malmo
github.com/Microsoft/malmo
The Malmo Platform for Artificial Intelligence Experimentation
Matthew Johnson, Katja Hofmann, Tim Hutton and David Bignell, 2016
Malmo design principles
Beyond “narrow AI” with multi-task learning
Wired for multi-agent tasks (including human agents)
Use Cases and Design Principles
into the game through an intuitive yet powerful API, building on existing Minecraft capabilities
Built for extensions and novel uses: open source, with a “plug-and-play” design of observation, command and reward handlers
Low entry barrier: provides a cross-language (currently Java, .NET, C/C++, Python, Lua) and cross-platform (Windows, Linux, macOS) API
Malmo = Minecraft Mod + API + tools
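Python example
import time
import MalmoPython

agent_host = MalmoPython.AgentHost()
with open("my_mission.xml") as f:
    mission = MalmoPython.MissionSpec(f.read(), True)
mission_record = MalmoPython.MissionRecordSpec("save.tgz")

# start a new mission and wait until it has begun
agent_host.startMission(mission, mission_record)
world_state = agent_host.getWorldState()
while not world_state.has_mission_begun:
    time.sleep(0.1)
    world_state = agent_host.getWorldState()

while world_state.is_mission_running:
    time.sleep(0.1)
    world_state = agent_host.getWorldState()
    # interpret world state
    for reward in world_state.rewards:
        print("Summed reward:", reward.getValue())
    for observation in world_state.observations:
        print("Observation:", observation.text)
    # act
    agent_host.sendCommand("move 1")
    agent_host.sendCommand("turn 0.5")
    agent_host.sendCommand("jump 1")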
Example: Tabular Q-Learning in Malmo
Example: Deep Q-Learning in Malmo
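The Q-learning examples themselves are not reproduced in this text. As a hedged, framework-free sketch of the tabular variant, the one-step update Q(s, a) ← Q(s, a) + α(r + γ·max_a′ Q(s′, a′) − Q(s, a)) can be written as below. The action strings, the class name `TabularQAgent` and the default α, γ and ε values are illustrative assumptions, not taken from the Malmo example; in Malmo the state could be derived from observations such as the agent's grid position.

```python
import random
from collections import defaultdict

# Illustrative discrete actions (assumed; the original example may use others)
ACTIONS = ["movenorth 1", "movesouth 1", "moveeast 1", "movewest 1"]


class TabularQAgent:
    """One-step tabular Q-learning with epsilon-greedy exploration."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # probability of taking a random action
        # Q-table: state -> one value per action, created lazily at 0.0
        self.q = defaultdict(lambda: [0.0] * len(ACTIONS))

    def act(self, state):
        """Return an action index: random with prob. epsilon, else greedy."""
        if random.random() < self.epsilon:
            return random.randrange(len(ACTIONS))
        values = self.q[state]
        return values.index(max(values))

    def update(self, state, action, reward, next_state, done):
        """Move Q(s, a) towards r + gamma * max_a' Q(s', a')."""
        target = reward
        if not done:  # no bootstrapping from a terminal state
            target += self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

The agent would send `ACTIONS[agent.act(state)]` as a command and call `update` once the resulting reward and next state are known.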
Malmo design principles
Low entry barrier, yet powerful
Wired for multi-agent tasks (including human agents)
<ServerHandlers>
<FlatWorldGenerator
generatorString="3;7,220*..."/>
<DrawingDecorator>
[...]
<DrawCuboid x1="-2" y1="45" z1="-2" x2="7"
y2="45" z2="18" type="lava" /> <!-- lava floor -->
<DrawCuboid x1="1" y1="45" z1="1" [...]
type="sandstone" /> <!-- floor of the arena -->
<DrawBlock x="4" y="45" z="1" type="cobblestone"
/> <!-- the starting marker -->
[...]
</ServerHandlers>
<AgentHandlers>
<ObservationFromFullStats/>
<DiscreteMovementCommands>
<ModifierList type="deny-list">
<command>attack</command>
</ModifierList>
</DiscreteMovementCommands>
<RewardForTouchingBlockType>
<Block reward="-100.0" type="lava"
behaviour="onceOnly"/>
<Block reward="100.0" type="lapis_block"
behaviour="onceOnly"/>
</RewardForTouchingBlockType>
<RewardForSendingCommand reward="-1"/>
</AgentHandlers>
Example Task
(Mission XML)
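The rewards in this mission can be checked by hand: every command costs 1, and touching lava or the lapis block is worth −100 or +100. A minimal sketch, assuming `onceOnly` means each block-type reward is paid at most once per mission and that rewards are summed without discounting; `episode_return` is a hypothetical helper, not part of Malmo:

```python
# Reward values copied from the mission XML above
COMMAND_REWARD = -1.0   # <RewardForSendingCommand reward="-1"/>
LAVA_REWARD = -100.0    # <Block reward="-100.0" type="lava" .../>
GOAL_REWARD = 100.0     # <Block reward="100.0" type="lapis_block" .../>


def episode_return(num_commands, touched_blocks):
    """Undiscounted return of one episode (hypothetical helper).

    num_commands: commands sent during the episode.
    touched_blocks: block types touched, in any order, possibly repeated;
    assumes behaviour="onceOnly" pays each block-type reward at most once.
    """
    total = COMMAND_REWARD * num_commands
    touched = set(touched_blocks)  # repeats collapse because of onceOnly
    if "lava" in touched:
        total += LAVA_REWARD
    if "lapis_block" in touched:
        total += GOAL_REWARD
    return total
```

For instance, reaching the lapis block after 20 commands gives 100 − 20 = 80, while falling into the lava after 5 commands gives −105.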
Creating new tasks is easy: http://sameersingh.org/courses/aiproj/sp17/projects.html
Malmo design principles
Low entry barrier, yet powerful
Beyond “narrow AI” with multi-task learning
A natural environment for multi-agent learning
Goal: foster research in collaborative AI
Details: https://www.microsoft.com/en-us/research/academic-program/collaborative-ai-challenge
MARLÖ Competition: The Multi-Agent Reinforcement Learning in MalmÖ
Organizers
MARLO: Motivation
• General-reward settings are the most realistic for many real-world applications, but are also notoriously challenging
• More research is needed on insights and approaches that generalize beyond individual tasks and opponent types
• The cost of creating tasks and opponents is amortized, because both can be shared by a large community
Overview
• Participants develop agents that play tasks on the Malmo platform
• The agents play multiple games with different scenarios
• Each game has a different set of multi-agent tasks for training, validation and final testing
• Participants use those tasks to train and validate their agents
• The agents play the final test task in a tournament that determines the winner of MARLO
Competition structure
MARLO Tournament
Evaluation
• Each league (P players in a group) is played across the same N games, with T repetitions on the private task of each game.
• Each game has its own leaderboard, ranking entries and awarding points: 25 points for the 1st, 18 for the 2nd, then 15, 12, 10, 8, 6, 4, 2, 1, and 0 for the 11th onwards.
• The final ranking for each league is determined by summing points across all games.
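The scoring rule above amounts to a lookup table plus a sum. The sketch below assumes each game's leaderboard is already ordered best to worst (how the T repetitions are aggregated into that order is not specified here) and keeps tied entries in first-seen order, since no tie-break rule is given; the function names are hypothetical.

```python
from collections import defaultdict

# Points for leaderboard positions 1..10; 11th place onwards scores 0
POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]


def points_for_rank(rank):
    """Points for a 1-based leaderboard position in a single game."""
    return POINTS[rank - 1] if 1 <= rank <= len(POINTS) else 0


def league_ranking(game_leaderboards):
    """Sum points over all games; return (entry, total) pairs, best first.

    game_leaderboards: one list per game, entries ordered best to worst.
    Ties keep first-seen order (no tie-break rule is assumed).
    """
    totals = defaultdict(int)
    for leaderboard in game_leaderboards:
        for rank, entry in enumerate(leaderboard, start=1):
            totals[entry] += points_for_rank(rank)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

With two games ranked A, B, C and B, C, A, the totals are B = 18 + 25 = 43, A = 25 + 15 = 40 and C = 15 + 18 = 33, so B wins the league despite A winning the first game.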
Schedule (draft)
• Same version as the multi-agent tasks, but using bots that run locally
• The top 32 evaluated teams are invited to the final round
• Multi-agent games on a remote server for the final tournament
• Live competition!
Participation: Eligibility
• A team consists of up to five participants
• Participants must be 18 years of age or older. A team member who is 18 or older but is considered a minor in their place of residence should ask their parent’s or legal guardian’s permission before submitting an entry to the Competition
• Award: available only to participants affiliated with a university or a non-profit research organization
What you get from the competition
• Award
• 1st place: Azure credits equivalent to USD 10,000, plus a travel grant to attend a relevant academic conference or workshop.
• 2nd place: Azure credits equivalent to USD 5,000.
• 3rd place: Azure credits equivalent to USD 3,000.
• Publication
• The top three entries will be invited to co-author a paper summarizing the competition structure, rules, approaches, results and main take-aways.
Challenge Games
Mob Chase
1 point
0.2 points
-0.02 points
__________
_wwwwwwwww
_w*.....=w
ww......ww
w=...*..w_
ww......w_
_w.*..*.ww
_w......=w
_ww=wwwwww
__www_____
_www______
_w=wwwwww_
_w..*.*.w_
_w*.....w_
_w......w_
_w.*.*..w_
_w......w_
_w......w_
_w==wwwww_
_wwww_____
________www_
wwwwwwwww=w_
w=......*.w_
ww........ww
_w....*.*.=w
ww........ww
w=.....*..w_
ww*.*.....w_
_w........w_
_w........w_
_wwwwwwwwww_
____________
__________
__________
__________
__________
__________
__________
__________
__________
__________
__________
__________
_wwwwwwww_
_w______w_
_w______w_
_w______w_
_w______w_
_w______w_
_w______w_
_wwwwwwww_
__________
__________
_wwwwwwww_
_w......w_
_w......w_
_w......w_
_w......w_
_w......w_
_w......w_
_wwwwwwww_
__________
__________
_wwwwwwwww
_w......=w
ww......ww
w=......w_
ww......w_
_w......ww
_w......=w
_ww=wwwwww
__www_____
__________
_wwwwwwww_
_w......=_
_w......w_
_=......w_
_w......w_
_w......w_
_w......=_
_ww=wwwww_
__________
Build Battle
1 point
• +0.2 points
• -0.2 points
-0.02 points
Treasure Hunt
0.5 points
• 0.25 points
• -1 points
-0.02 points
wwwwwwwwwwwwwwwwwwww
w...e+.............w
w..................w
w.gggggggggggggggggw
w...............A..w
w.................Aw
w..................w
w.......e.....+..+.w
w.......+..........w
w..................w
w..................w
w......=...........w
wwwwwwwwwwwwwwwwwwww
wwwwwwwwwwwwwwwwwww
wg...............=w
wg................w
wg................w
wg...A............w
w.................w
wg........*......Aw
wg................w
wg................w
wg............+...w
wg......e.........w
wg..............*.w
wg................w
wg.e..............w
wg............*...w
wg................w
wggggggggg.gggggggw
wg................w
wg................w
wwwwwwwwwwwwwwwwwww
wwwwwwwwwwwwwwwwwwwww
w........g..........w
we.+.....g..........w
w........g......+...w
w*....e..g..........w
w........g..........w
w........g..........w
w........g..........w
wA.......g..........w
w...A...............w
w........g.........=w
w........g..........w
w........g...+......w
w........g..........w
w........g..........w
w..*.....g..........w
wwwwwwwwwwwwwwwwwwwww
wwwwwwwwwwwwwwwwwwww
w..................w
w..................w
w..................w
w..................w
w..................w
w..................w
w..................w
w..................w
w..................w
w..................w
w..................w
wwwwwwwwwwwwwwwwwwww
wwwwwwwwwwwwwwwwwwww
w....+.............w
w..................w
w.gggggggggggggggggw
w..................w
w..................w
w..................w
w.............+..+.w
w.......+..........w
w..................w
w..................w
w..................w
wwwwwwwwwwwwwwwwwwww
wwwwwwwwwwwwwwwwwwww
w..................w
w..................w
wggggggggggggggggggw
w..................w
w..................w
w..................w
w..................w
w..................w
w..................w
w..................w
w..................w
wwwwwwwwwwwwwwwwwwww
wwwwwwwwwwwwwwwwwwww
w..................w
w..................w
w.gggggggggggggggggw
w..................w
w..................w
w..................w
w..................w
w..................w
w..................w
w..................w
w..................w
wwwwwwwwwwwwwwwwwwww
wwwwwwwwwwwwwwwwwwww
w...e+.............w
w..................w
w.gggggggggggggggggw
w..................w
w..................w
w..................w
w.......e.....+..+.w
w.......+..........w
w..................w
w..................w
w..................w
wwwwwwwwwwwwwwwwwwww
wwwwwwwwwwwwwwwwwwww
w...e+.............w
w..................w
w.gggggggggggggggggw
w..................w
w..................w
w..................w
w.......e.....+..+.w
w.......+..........w
w..................w
w..................w
w......=...........w
wwwwwwwwwwwwwwwwwwww
Qualifying Task
• MarLo-FindTheGoal-v0
• 7x7 room
• Goal: find the goal ☺ (yellow block)
• Rewards:
• -0.01 per command
• 100 commands max
• 1.0 find goal
• -0.1 out of time
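Assuming the qualifying-task rewards simply add up without discounting, and that the per-command penalty also applies to the command that reaches the goal, the return of an episode follows directly from the numbers above. `qualifying_return` is a hypothetical helper, not part of MarLo:

```python
STEP_PENALTY = -0.01     # reward per command
MAX_COMMANDS = 100       # command limit per episode
GOAL_REWARD = 1.0        # reward for finding the goal
TIMEOUT_PENALTY = -0.1   # reward for running out of time


def qualifying_return(commands_to_goal=None):
    """Undiscounted return for MarLo-FindTheGoal-v0 under the stated assumptions.

    commands_to_goal: number of commands sent up to and including the one that
    reaches the goal, or None if the agent runs out of time.
    """
    if commands_to_goal is None:
        # every allowed command was spent, then the timeout penalty applies
        return MAX_COMMANDS * STEP_PENALTY + TIMEOUT_PENALTY
    if not 1 <= commands_to_goal <= MAX_COMMANDS:
        raise ValueError("commands_to_goal must be between 1 and MAX_COMMANDS")
    return commands_to_goal * STEP_PENALTY + GOAL_REWARD
```

Under these assumptions any successful episode (1.0 − 0.01k ≥ 0 for k ≤ 100) scores higher than a timeout (−1.0 − 0.1 = −1.1), and shorter paths score higher than longer ones.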
Participating in the Competition
Project MARLÖ
• Multi-Agent Reinforcement Learning in Malmo
• Reinforcement learning wrapper built on top of Project Malmo
• Aims to inspire the creation of highly capable general agents through a multi-agent, multi-game environment
• Uses the OpenAI Gym format
• Also on GitHub!
• https://github.com/crowdAI/marLo
Installation instructions
• Install Malmo
• Anaconda (recommended)
• Pip (+ git)
• Repack
• Manual compilation
• Install Marlo
First MARLO agent
How about multiple agents?
Agents: a semi-technical view
• Agents in Marlo are simple and work in a very Gym-like format:
• Start up a Minecraft client on port 10000
• Use the “marlo.make()” function to make an environment. This returns join tokens, one per agent
• Use a join token to create an instance of the environment for the agent with “marlo.init()”
• Run an agent to play the game
• We have seen a sample random agent that plays any game it connects to
• We also provide examples of more complex agents:
• ChainerRL agents (DQN, PPO)
• Compatible with TensorBoard-Chainer plotting
• Other frameworks (TensorFlow, Keras-RL, PyBrain) can also be used; the only requirement is that they comply with the Gym API
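The Gym-style contract described above (reset the environment, then call `step` until it reports `done`) can be sketched without Minecraft. `ToyCorridorEnv` is a hypothetical stand-in for an environment returned by `marlo.init()`, and it exposes `n_actions` where a real Gym environment would offer `env.action_space.sample()`; the random-agent loop has the same shape as one run against any Gym-compliant environment.

```python
import random


class ToyCorridorEnv:
    """Hypothetical stand-in for a MarLo environment (not part of MarLo).

    Follows the classic Gym contract: reset() -> observation,
    step(action) -> (observation, reward, done, info).
    """

    def __init__(self, length=5, max_steps=50):
        self.length = length        # goal position
        self.max_steps = max_steps  # episode step limit
        self.n_actions = 2          # 0 = step back, 1 = step forward

    def reset(self):
        self.position, self.steps = 0, 0
        return self.position

    def step(self, action):
        self.steps += 1
        move = 1 if action == 1 else -1
        self.position = max(0, self.position + move)
        reached = self.position == self.length
        done = reached or self.steps >= self.max_steps
        reward = 1.0 if reached else -0.01
        return self.position, reward, done, {"steps": self.steps}


def run_random_episode(env, rng=random):
    """Random agent: sample actions until the environment reports done."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = rng.randrange(env.n_actions)
        observation, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward, info["steps"]
```

Because the agent only relies on `reset` and `step`, the same function works unchanged with any environment that follows this contract.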
DQN Example
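The DQN example slides relied on ChainerRL, whose code is not reproduced here. As a framework-free sketch of two ingredients of DQN from Mnih et al. (cited at the start of this tutorial), the snippet shows a fixed-size experience replay buffer and the target y = r + γ·max_a′ Q_target(s′, a′), which reduces to y = r at terminal states. The names are hypothetical, `target_q` stands in for the periodically synchronized target network, and the network itself and the gradient step are omitted.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size experience replay: the oldest transitions are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        """Uniformly sample a minibatch without replacement."""
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, target_q, gamma=0.99):
    """y = r + gamma * max_a' Q_target(s', a'), or just r for terminal s'.

    target_q(state) must return a list with one value per action.
    """
    targets = []
    for state, action, reward, next_state, done in batch:
        y = reward if done else reward + gamma * max(target_q(next_state))
        targets.append(y)
    return targets
```

The online network would then be regressed towards these targets on the sampled minibatch; sampling from the buffer rather than using consecutive steps breaks the correlation between updates.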
Experiments
• A simple script that trains an agent over a set number of steps and episodes is provided within the Marlo package
• The underlying functionality is simple: at the beginning of training, reset the environment
• Main loop with stopping condition:
• Episode ends or maximum number of steps reached
• Log results of the episode
• We incorporate an example to plot using Tensorboard-Chainer
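The training procedure described above (reset, loop until the episode ends or the step budget runs out, then log the episode) can be sketched as follows. This is not the script shipped with Marlo; `FixedLengthEnv`, `RandomAgent` and the `act`/`observe` agent interface are hypothetical stand-ins chosen to keep the example self-contained.

```python
import random


class FixedLengthEnv:
    """Hypothetical toy env: each episode lasts `length` steps, reward 1.0 per step."""

    def __init__(self, length):
        self.length = length

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.length, {}


class RandomAgent:
    """Hypothetical agent interface: act() picks an action, observe() would learn."""

    def __init__(self, n_actions, seed=0):
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.randrange(self.n_actions)

    def observe(self, observation, reward, done):
        pass  # a learning agent would update its parameters here


def train(env, agent, max_steps, log):
    """Run episodes until `max_steps` environment steps have been taken.

    `log(episode, episode_return, episode_length)` is called once per episode,
    including a final episode cut short by the step budget.
    """
    total_steps, episode = 0, 0
    while total_steps < max_steps:
        obs = env.reset()  # reset at the beginning of every episode
        episode_return, episode_length, done = 0.0, 0, False
        # stopping condition: episode ends or the step budget is used up
        while not done and total_steps < max_steps:
            action = agent.act(obs)
            obs, reward, done, _ = env.step(action)
            agent.observe(obs, reward, done)
            episode_return += reward
            episode_length += 1
            total_steps += 1
        log(episode, episode_return, episode_length)  # log results of the episode
        episode += 1
    return total_steps
```

The `log` callback is the natural place to forward episode returns to a plotting tool such as TensorBoard-Chainer.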
Plotting results (Tensorboard-Chainer)
• Works much like the usual TensorBoard, but abstracted to work with Chainer
• Can be used to log images, text, audio and histograms
Submission
1. Create a private repository on gitlab.crowdai.org. It must contain:
• A Dockerfile that installs the dependencies and sets everything up
• A crowdai.json file with these mandatory fields:
• challenge_id: "marLo"
• grader_id: "marLo"
• author: name of the author (string); for teams, please also create a field 'authors' containing a list of all authors
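Generating `crowdai.json` from code avoids quoting mistakes in the mandatory fields. This is a hedged sketch: the field names and the "marLo" values come from the list above, while using the first name as `author` for teams is an assumption, and the helper names and example author names are hypothetical.

```python
import json


def make_crowdai_config(authors):
    """Return the mandatory crowdai.json fields for a MARLO submission.

    `authors` is a list of names; using the first one as `author` for a
    team is an assumption, and the full list goes into `authors`.
    """
    if not authors:
        raise ValueError("at least one author is required")
    config = {
        "challenge_id": "marLo",
        "grader_id": "marLo",
        "author": authors[0],
    }
    if len(authors) > 1:  # teams also list every member
        config["authors"] = list(authors)
    return config


def write_crowdai_config(path, authors):
    """Write the config as indented JSON to `path` (e.g. the repository root)."""
    with open(path, "w") as f:
        json.dump(make_crowdai_config(authors), f, indent=2)
```

For example, `write_crowdai_config("crowdai.json", ["Jane Doe", "John Roe"])` would write a team entry with both `author` and `authors`.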
2. Submitting to crowdAI:
• Create and push a new tag
• Each tag counts as a new submission
• You will be able to watch your AI agent play the game and see more details about the evaluation of your submission at:
https://gitlab.crowdai.org/<your-crowdAI-user-name>/marLo/issues
• A video of the game will also be generated and made available from the leaderboard
Follow the project
• Malmo: @Project_Malmo and the website (aka.ms/malmo)
• People on Twitter: @diego_pliebana, @katjahofmann, @MeMohanty
• MARLO GitHub: https://github.com/crowdAI/marLo
• MARLO documentation: https://marlo.readthedocs.io/en/latest/
• Competition website: https://www.crowdai.org/challenges/marlo-2018
• AIIDE 2018 Workshop: https://marlo-ai.github.io/
Hands-On Time
1. Install Malmo and Marlo
2. Play the games
3. Execute agents
Doc: https://marlo.readthedocs.io/en/latest/
Code: https://github.com/crowdAI/marLo/
Competition: https://www.crowdai.org/challenges/marlo-2018
We’re here to help!
