Machine Learning in Unity - How to give your game AI a real brain

2
Hello!
My name is Ciro Continisio
Technical Evangelist at Unity

3
Hello!
My name is Alessia Nigretti
Technical Evangelist at Unity

5
Introduction
What is Machine Learning?

7
What is Machine Learning?
Reinforcement Learning

8
What is Machine Learning?
Neural Networks
A computing system loosely modelled on the human brain and nervous system
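In practice, each artificial "neuron" computes a weighted sum of its inputs plus a bias and passes the result through a non-linear function. A network stacks layers of these neurons. Below is a minimal pure-Python sketch of one layer. The weights are arbitrary values chosen for illustration; a trained network learns its own.

```python
import math


def dense_layer(inputs, weights, biases):
    """One fully connected layer: each neuron outputs tanh(w . x + b)."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        # Weighted sum of the inputs plus a bias, like a neuron summing signals.
        activation = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        # Non-linear "firing" response, squashed into (-1, 1).
        outputs.append(math.tanh(activation))
    return outputs


# Two inputs feeding three neurons. The weights are made-up values;
# in a trained network they would be learned during training.
x = [0.5, -1.0]
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
b = [0.0, 0.0, 0.25]
print(dense_layer(x, W, b))
```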

15
3D Ball
Goal:
Balance the ball on the platform
Reward:
● +0.1 for every frame the ball remains on the platform
● -1.0 if the ball falls from the platform
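The reward scheme above can be written as a per-frame function. This Python sketch assumes the episode ends when the ball falls; that is common practice, but the slide does not state it.

```python
def ball_reward(ball_on_platform: bool) -> tuple[float, bool]:
    """Reward for one frame of 3D Ball, plus whether the episode ends."""
    if ball_on_platform:
        return 0.1, False  # still balanced: small positive reward
    return -1.0, True      # ball fell: penalty (ending the episode is an assumption)


def episode_return(frames_balanced: int) -> float:
    """Total reward when the ball stays up for N frames and then falls."""
    total = 0.0
    for _ in range(frames_balanced):
        reward, _done = ball_reward(True)
        total += reward
    reward, _done = ball_reward(False)
    return total + reward


print(episode_return(100))  # long balance: clearly positive total
print(episode_return(10))   # early fall: roughly zero
```

Under this scheme, balancing for 100 frames and then dropping the ball scores about 9.0. Dropping it after 10 frames scores about 0. Keeping the ball up longer is therefore always worth more than the one-off penalty costs.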

16
Propellers
Goal:
Have the cubes learn to float
Reward:
● +0.1 for each frame the cube floats
● -1.0 for each collision with the floor

17
Arena
Goal:
Push the crate out of the arena
Rewards:
● +0.2 for closing in on the crate
● +0.5 when the crate gets further from the center
● Negative rewards for delaying or falling

18
Bounce Ball
Goal:
Bounce the ball on top of the agent’s head
Reward:
● +0.1 for each frame the ball is closer to the agent
● -0.1 for each frame the ball is further away from the agent
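A Python sketch of this distance-based reward follows. It interprets "closer" as "closer than on the previous frame", which is an assumption. Returning 0 when the distance is unchanged is also an added assumption, since the slide does not cover that case.

```python
import math


def bounce_reward(prev_ball, ball, agent_head):
    """Per-frame reward for Bounce Ball.

    Positions are (x, y, z) tuples. "Closer" is interpreted as closer
    than on the previous frame (an assumption).
    """
    prev_distance = math.dist(prev_ball, agent_head)
    distance = math.dist(ball, agent_head)
    if distance < prev_distance:
        return 0.1   # ball moved towards the agent
    if distance > prev_distance:
        return -0.1  # ball moved away from the agent
    return 0.0       # no change: neutral (assumed)
```

A shaping reward like this gives the Agent a signal on every frame, rather than only when the ball lands or drops.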

19
Problem
Can you use Machine Learning in a real game?

21
Roguelike Game
Ingredients
• A simple action game
• All entities are Agents, both the player and the enemies
• Establish a common “interaction language”
• The goal is survival, while attacking other entities

23
Setting up the training
Design and ideas
• What the game actions are
• What you want the Agent to learn
• What’s right or wrong (what to reward)

24
Discrete vs. Continuous
Discrete means that a State or Action takes one value at a time from a fixed set of options,
like an Enum: it’s either 0, or 1, or 2, or 3, etc.
⬝ Easier: Agents associate actions with rewards more easily
In Roguelike, we use Discrete for Actions. It can have 6 values:
0: Stay still / 1-4: Move in one direction / 5: Attack
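The six Roguelike actions map naturally onto an enum. In this Python sketch, the names, and which of values 1 to 4 means which direction, are hypothetical; the slide only says 1 to 4 are movement directions.

```python
from enum import IntEnum


class Action(IntEnum):
    """The six discrete Roguelike actions from the slide.

    Which of 1-4 maps to which direction is not specified in the talk;
    the order below is a hypothetical choice.
    """
    STAY = 0
    MOVE_UP = 1
    MOVE_RIGHT = 2
    MOVE_DOWN = 3
    MOVE_LEFT = 4
    ATTACK = 5


# Grid offset (dx, dy) for each movement action (hypothetical layout).
MOVE_OFFSETS = {
    Action.MOVE_UP: (0, 1),
    Action.MOVE_RIGHT: (1, 0),
    Action.MOVE_DOWN: (0, -1),
    Action.MOVE_LEFT: (-1, 0),
}


def decode(raw_action: int):
    """Map the integer chosen by the Brain to (action, movement offset)."""
    action = Action(raw_action)  # raises ValueError for anything outside 0-5
    return action, MOVE_OFFSETS.get(action, (0, 0))
```

With a single discrete value, the Agent picks exactly one of these per decision, so it cannot move and attack in the same step.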

25
Discrete vs. Continuous
Continuous means you can have multiple States (or Actions) at once,
and each one is a float value.
⬝ They require more memory for training (e.g. larger batch and buffer sizes in the hyperparameters)
⬝ Harder to use: they can confuse the Agent
In Roguelike, we use Continuous for States:
health, canAttack, hasTarget, distanceFromTarget, …
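The States listed above end up as one flat vector of floats that the Brain reads each step. This Python sketch covers only the four named States; the slide's "…" stands for others not shown. Normalising to roughly 0 to 1, the field names and the `max_distance` parameter are assumptions, though keeping inputs in a small range generally helps training.

```python
from dataclasses import dataclass


@dataclass
class AgentState:
    health: float
    max_health: float
    can_attack: bool
    has_target: bool
    distance_from_target: float


def collect_observations(s: AgentState, max_distance: float) -> list[float]:
    """Pack the state into a flat list of floats, each roughly in [0, 1].

    The normalisation scheme and max_distance are assumptions.
    """
    return [
        s.health / s.max_health,                           # fraction of health left
        1.0 if s.can_attack else 0.0,                      # bool encoded as float
        1.0 if s.has_target else 0.0,                      # bool encoded as float
        min(s.distance_from_target / max_distance, 1.0),   # clamped distance
    ]
```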

26
The pseudo-algorithm (AgentStep)
Movement:
If health > 50% then
    If current distance < previous distance then
        Reward
    End
Else
    If current distance > previous distance then
        Reward
    End
End
Attack:
If input is attack then
    If can attack then
        Start attack
    Else
        Punish
    End
Else
    If is not healing and health < max health then
        Start healing
    End
End
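Before porting the pseudo-algorithm above to C#, a small Python translation can be used to check its branch logic. The following are all illustrative assumptions:

- the reward and penalty magnitudes (the slide only says "Reward" and "Punish");
- the `Fighter` fields;
- the choice to store the previous distance on the agent.

The `ATTACK = 5` index comes from the Discrete action list.

```python
from dataclasses import dataclass

ATTACK = 5  # Discrete action list: 0 stay, 1-4 move, 5 attack

# Hypothetical magnitudes: the slide only says "Reward" and "Punish".
MOVE_REWARD = 0.1
INVALID_ATTACK_PENALTY = -0.1


@dataclass
class Fighter:
    health: float
    max_health: float
    can_attack: bool
    is_healing: bool = False
    is_attacking: bool = False
    prev_distance: float = 0.0


def agent_step(agent: Fighter, action: int, distance: float) -> float:
    """Return the reward for one step, following the pseudo-algorithm."""
    reward = 0.0

    # Movement: approach the target when healthy, flee from it when weak.
    if agent.health > 0.5 * agent.max_health:
        if distance < agent.prev_distance:
            reward += MOVE_REWARD
    else:
        if distance > agent.prev_distance:
            reward += MOVE_REWARD
    agent.prev_distance = distance

    # Attack: attacking when unable is punished; not attacking allows healing.
    if action == ATTACK:
        if agent.can_attack:
            agent.is_attacking = True
        else:
            reward += INVALID_ATTACK_PENALTY
    else:
        if not agent.is_healing and agent.health < agent.max_health:
            agent.is_healing = True
    return reward
```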

27
Spoiler
This initial algorithm has changed a lot

28
Tips on rewards
• Rewards can be given in the AgentStep function, but also at other times (OnCollisionEnter, etc.)
• Agents will find a way to exploit the rewards!
• Small details in rewards influence the learning process:
reward = .2f / (distanceSqr + .01f); > reward = .2f;
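Evaluating the two C# formulas from the slide at a few distances shows how differently they behave. Here they are ported to Python, with `distanceSqr` taken to be the squared distance to the target. This sketch only compares the numbers; it does not say which formula the Agent learns better from.

```python
def shaped_reward(distance_sqr: float) -> float:
    """0.2 / (d^2 + 0.01): grows sharply as the agent gets close."""
    return 0.2 / (distance_sqr + 0.01)


def flat_reward(distance_sqr: float) -> float:
    """Constant 0.2: the same payout regardless of distance."""
    return 0.2


for d in (0.0, 0.5, 1.0, 3.0):
    d2 = d * d
    print(f"d={d}: shaped={shaped_reward(d2):.3f}, flat={flat_reward(d2):.3f}")
```

At distance 0 the first formula pays 20, and at distance 1 it pays about 0.2. That hundredfold swing over a single unit is the kind of detail Agents pick up on and exploit. The flat version gives the same 0.2 wherever it is granted.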

29
Training scene
• Position and configure the agent(s)
• Connect them to the relevant Brains
• Configure the Academy

31
Tips for the training environment
• Different situations in parallel help Agents learn better
• Heuristic Agents are the perfect training dummies!
• Before launching a 1-hour training session:
  • Double-check your logic so you don’t make wrong assumptions
  • Launch a training at 1x speed to see what’s happening

32
Building and training
• Set the Brain to External
• Build!
• Set up the Python environment and hyperparameters
• Launch training
Training

34
Training with TensorFlow
• Observe the mean reward
• Stop when it looks stable
• Export the model and import it into Unity
• Set the Brain to Internal
• Play!
Training
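"Looks stable" is a judgment call, usually made by watching the mean reward curve (for example in TensorBoard). To automate it, a simple plateau check over the logged values might look like this Python sketch. The function name, window size and tolerance are hypothetical and not part of ML-Agents.

```python
from statistics import mean


def looks_stable(mean_rewards, window=5, tolerance=0.02):
    """Heuristic plateau check on logged mean rewards (one per summary period).

    Returns True when the average of the last `window` values differs from
    the average of the `window` values before it by less than `tolerance`
    (relative change). Window and tolerance are arbitrary assumptions.
    """
    if len(mean_rewards) < 2 * window:
        return False  # not enough history to judge yet
    recent = mean(mean_rewards[-window:])
    previous = mean(mean_rewards[-2 * window:-window])
    scale = max(abs(previous), 1e-8)  # avoid dividing by zero
    return abs(recent - previous) / scale < tolerance
```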

36
Final tips and key takeaways

37
Tips on hyperparameters
• Beta: controls how random the actions are. If Agents settle on one behaviour too quickly, increase beta
• Batch size, Buffer size, Hidden units: good values differ a lot between Discrete and Continuous spaces
Read the guide: github.com/Unity-Technologies/ml-agents
Training
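ML-Agents trains with PPO. In that setting, beta weights an entropy bonus that rewards the policy for keeping its action distribution spread out, and a higher beta pushes harder towards randomness. This Python sketch shows the generic entropy-regularised objective; it is not ML-Agents' exact loss code.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def regularised_loss(policy_loss, probs, beta):
    """Subtracting beta * entropy makes higher-entropy (more random)
    policies score a lower loss; a larger beta strengthens that pull."""
    return policy_loss - beta * entropy(probs)


uniform = [1 / 6] * 6          # all six Roguelike actions equally likely
greedy = [0.95] + [0.01] * 5   # almost always the same action
print(round(entropy(uniform), 3), round(entropy(greedy), 3))
```

The uniform policy has an entropy of about 1.79 nats; the collapsed one has about 0.28, so it gets almost no bonus. That is why raising beta helps Agents that corner themselves on one behaviour too early.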

38
Tips and takeaways
Physics
Because ML runs in FixedUpdate (for stability):
• Remember that Rigidbody.position doesn’t change mid-frame
• Switch the Animator Update Mode from Normal to Animate Physics if animation is key in the training
• If the Rigidbody uses interpolation, setting Rigidbody.position and calling Rigidbody.MovePosition() behave differently

39
Tips and takeaways
Build tools
The training process can be long and repetitive.
Make your life easier by building some little tools.

41
Next
What now?
• Mix trained AI with heuristic AI to obtain the final behaviour
Learning from the player
• Gather players’ behaviour and train Agents (offline) on the data you collect
• Coming soon: Imitation Learning!

42
Thank you!
Ciro Continisio
ciro@unity3d.com
@CiroContns
Alessia Nigretti
alessian@unity3d.com
@AlessiaNigretti
Get the demo and presentation:
bit.ly/UnityDevgammMinsk
