(deep) reinforcement learning - CAB420

Juxi Leitner
arc centre of excellence for robotic vision
queensland university of technology
<j.leitner@qut.edu.au> 
http://Juxi.net
(deep) reinforcement learning
Guest Lecture, CAB420 Machine Learning

http://roboticvision.org/
tinyurl.com/QUTRobotics
roboticvision.org

About Me
Juxi Leitner
PhD Informatics / Intelligent Systems
MSc Space Robotics & Automation
BSc Information & Software Engineering
Work: Intelligent (Space) Robots
Erasmus: Intelligent Systems
Work: (Humanoid) Robot Vision
Mobility: Intelligent Space Systems Laboratory
Current: Robotic Vision and Actions
Dalle Molle Institute for AI (IDSIA)
European Space Agency (ESA)
Instituto Superior Técnico (IST)
Queensland University of Technology (QUT)
arc centre of excellence for robotic vision | qut 
juxi.net | roboticvision.org | bne-robotics.net | brisbane.ai

create agents that see and
interact with the world
http://Juxi.net/aboutme

Vision and Action
eye-hand coordination
http://Juxi.net/projects/VisionAndAction/

BRISBANE.AI
defining AI
study of "intelligent agents":
any device that perceives its environment and takes actions
that maximize its chance of success at some goal

agent interacting 
with the world

IM-CLeVeR Teaser
https://www.youtube.com/watch?v=OyfonCDxUiU
full video: https://vimeo.com/51011081

machine learning
supervised learning
…

machine learning
unsupervised learning

machine learning
reinforcement learning
agent: see, act, reward
run? hug?

machine learning
agent: see, act, reward
run? hug?
hug :)

machine learning
reward

robot RL
Konidaris et al.: Autonomous Skill Acquisition on a Mobile Manipulator
https://www.youtube.com/watch?v=yUICAkSQTZY

robot RL
Jan Peters et al.: Motor Skill Learning from Demonstration
https://www.youtube.com/watch?v=qtqubguikMk

foundations
a policy, a reward signal, a value function,
and, optionally, a model of the environment
http://cs.stanford.edu/people/karpathy/reinforcejs/
http://karpathy.github.io/2016/05/31/rl/

policy
A policy defines the action to be taken
in each state

value function
A value function is a prediction of the future reward
expected from each state
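As a sketch in standard notation (the symbols are assumptions, not taken from the slide: \pi is the policy, \gamma a discount factor, r_t the reward at time step t), the value of a state s can be written as the expected discounted sum of future rewards:

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1} \;\middle|\; s_t = s \right],
\qquad 0 \le \gamma < 1
```

With \gamma close to 0 the agent values mostly the immediate reward; with \gamma close to 1 it weighs rewards far in the future almost as much.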

mdp
An information state (a.k.a. Markov state)
contains all useful information from the history,
i.e. the state is a sufficient statistic of the future

pomdp
what if: a robot with camera vision isn't told its absolute location
agent state != environment state

Formally this is a partially observable Markov decision process
(POMDP)

mdp

Example 3.2: Pick-and-Place Robot 
Consider using reinforcement learning to control the motion of a robot arm in a
repetitive pick-and-place task. If we want to learn movements that are fast and smooth,
the learning agent will have to control the motors directly and have low-latency
information about the current positions and velocities of the mechanical linkages. The
actions in this case might be the voltages applied to each motor at each joint, and the
states might be the latest readings of joint angles and velocities. The reward might be
+1 for each object successfully picked up and placed. To encourage smooth movements,
on each time step a small, negative reward can be given as a function of the
moment-to-moment "jerkiness" of the motion.
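The reward described in this example could be sketched as below. This is only an illustration: the function name, the `jerk` measure and the `jerk_weight` scale are hypothetical choices, since the example only says the penalty is "small" and "a function of" the jerkiness.

```python
def step_reward(placed_object, jerk, jerk_weight=0.01):
    """Reward for one time step of the pick-and-place task.

    placed_object: True if an object was picked up and placed on this step.
    jerk: non-negative jerkiness measure for this step (hypothetical units).
    jerk_weight: hypothetical scale that keeps the smoothness penalty small.
    """
    if jerk < 0:
        raise ValueError("jerk must be non-negative")
    task_reward = 1.0 if placed_object else 0.0   # +1 per successful pick-and-place
    smoothness_penalty = -jerk_weight * jerk      # small negative reward for jerky motion
    return task_reward + smoothness_penalty
```

Choosing `jerk_weight` is a design decision: if it is too large the arm may learn to stay still, if it is too small the smoothness term has no effect.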

policy iteration

value iteration
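A minimal sketch of value iteration on a tiny, made-up MDP. The state names, the transition table format and the discount factor are assumptions for illustration; the slide itself only names the algorithm.

```python
def action_value(outcomes, values, gamma):
    """Expected return of one action: sum over (prob, next_state, reward)."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """transitions[s][a] is a list of (prob, next_state, reward) tuples.

    States with no actions (an empty dict) are treated as terminal.
    """
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:
                continue  # terminal state keeps value 0
            best = max(action_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {
        s: max(actions, key=lambda a: action_value(actions[a], values, gamma))
        for s, actions in transitions.items()
        if actions
    }
    return values, policy


# Hypothetical three-state chain: start -> mid -> goal, reward 1 on reaching goal.
toy_mdp = {
    "start": {"forward": [(1.0, "mid", 0.0)], "stay": [(1.0, "start", 0.0)]},
    "mid": {"forward": [(1.0, "goal", 1.0)], "back": [(1.0, "start", 0.0)]},
    "goal": {},
}
values, policy = value_iteration(toy_mdp)
```

Policy iteration differs in that it alternates a full policy evaluation step with a greedy policy improvement step, instead of folding the maximisation into every value update.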

RL agents

Q function

on/off policy
Q-learning (Watkins, 1989)
https://studywolf.wordpress.com/2013/07/01/reinforcement-learning-sarsa-vs-q-learning/
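The on/off-policy distinction can be sketched with the two tabular update rules side by side. Function names, the learning rate `alpha` and the discount `gamma` are illustrative assumptions; the only difference between the two is which next-state value is used as the bootstrap target.

```python
from collections import defaultdict


def q_learning_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the best action in s_next,
    regardless of which action the agent will actually take."""
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action a_next actually taken in s_next."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


# Same experience, different targets (all values here are made up).
actions = ["left", "right"]
q_off = defaultdict(float, {("s1", "left"): 1.0})
q_on = defaultdict(float, {("s1", "left"): 1.0})

q_learning_update(q_off, "s0", "right", 0.0, "s1", actions)
sarsa_update(q_on, "s0", "right", 0.0, "s1", a_next="right")  # exploratory next action
```

In this toy case the Q-learning estimate for ("s0", "right") increases because the greedy action in "s1" is valuable, while the SARSA estimate does not because the action actually taken next has value 0.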

deep hype
Dartmouth 1956
Initial Hype
AI Winter
Expert Systems Hype
AI Winter
AI Spring
Deep Learning Hype

deep RL
Sergey Levine / Google Brain
DeepMind
QUT

deep RL
DeepMind
idea: learn a policy to play a computer game
using only visual information
state: game screen
actions: button press
reward: game score
problems?

deep learning visual control
understanding limitations of deep nets,
reinforcement learning and transfer of knowledge
[Tow et al, ACRA 2015]

[Zhang et al, ACRA 2015]

[Zhang et al, arxiv.org]

transfer visual control 
from simulation to real world
reward issue
perception issue
noise issue
exploration issue

deep learning visual servoing
[Figure: network architecture. Perception Module: 84×84 input image I; Conv1 (7×7 conv + ReLU, stride 2, 64 filters); Conv2 (4×4 conv + ReLU, stride 2, 64 filters); Conv3 (3×3 conv + ReLU, stride 1, 64 filters); fully connected bottleneck (BN) with 5 units, θ. Control Module: fully connected layers FC_c1, FC_c2, FC_c3 (layer sizes shown: 400, 300 and 9 units, two of them with ReLU); output: Q-values. Panels A to E: occlusion cases.]
[Zhang et al, arxiv.org]

ARC Centre of Excellence for Robotic Vision roboticvision.org
limitations of current robotic systems
reproducible research on TASKS not datasets  
picking benchmark
http://Juxi.net/dataset/acrv-picking-benchmark/
https://arxiv.org/abs/1609.05258

deep RL
Sergey Levine, Peter Pastor et al. (Google Brain)

artificial curiosity
idea:
Reward the reward-optimizing controller for
actions yielding data that cause
improvements of the adaptive predictor or
data compressor!
similar: intrinsic motivation
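A toy sketch of this idea, assuming the "improvement of the adaptive predictor" is measured as the drop in prediction error on a datum after learning from it. The running-mean predictor and all names are hypothetical stand-ins chosen for brevity, not the formulation used in the curiosity literature.

```python
class MeanPredictor:
    """Tiny adaptive predictor: predicts the running mean of what it has seen."""

    def __init__(self):
        self.mean = 0.0
        self.count = 0

    def error(self, observation):
        """Squared prediction error for one observation."""
        return (observation - self.mean) ** 2

    def update(self, observation):
        """Incrementally move the running mean toward the observation."""
        self.count += 1
        self.mean += (observation - self.mean) / self.count


def curiosity_reward(predictor, observation):
    """Intrinsic reward = how much learning from this datum improved the predictor."""
    error_before = predictor.error(observation)
    predictor.update(observation)
    error_after = predictor.error(observation)
    return error_before - error_after


predictor = MeanPredictor()
first = curiosity_reward(predictor, 4.0)    # surprising datum: predictor improves
second = curiosity_reward(predictor, 4.0)   # already predictable: nothing to learn
```

Under this measure, data the predictor already explains yields no reward, so the controller is pushed toward situations where it can still learn something.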

robot RL
Schmidhuber (IDSIA)
http://journal.frontiersin.org/article/10.3389/fnbot.2013.00025/full

robot RL
Oudeyer (INRIA)
https://www.youtube.com/watch?v=NOLAwD4ZTW0

evolution and RL
Is NeuroEvolution coming back?
NEAT (2002): NeuroEvolution of Augmenting Topologies
some recent papers:
"Evolving Deep Neural Networks" by Miikkulainen et al. (Sentient Technologies)
"Large-scale Evolution of Image Classifiers" by Real et al. (Google Brain)
"PathNet: Evolution Channels Gradient Descent in Super Neural Networks" by Fernando et al. (DeepMind)
"Evolution Strategies as a Scalable Alternative to Reinforcement Learning" by Salimans et al. (OpenAI)

RL applications
Motion Planning
Grasping
End-to-end
Vision (?)

challenges
Curiosity?
Exploration/Exploitation?
Oracles?
discrete -> continuous?
real-world reward?
https://github.com/aikorea/awesome-rl#theory

new developments
Tools and toolboxes
Neuroscience vs Deep Learning & Evolutionary approaches
Generative Adversarial Networks
Unsupervised Learning, Embodied Learning
http://Juxi.net/workshop/deep-learning-rss-2017/
how to keep in the loop?
arxiv-sanity, twitter & get your hands dirty
come to Brisbane.AI meetups! :)

BRISBANE ARTIFICIAL INTELLIGENCE
TUTORIAL ONE
In which we try to explain why we consider artificial
intelligence to be a subject most worthy of study, and
in which we try to decide what exactly it is, this
being a good thing to decide before embarking.
Jürgen ‘Juxi’ Leitner
http://Juxi.net
<juxi.leitner@gmail.com>

interested?
BEB801/2 Projects
PhD positions
ideas…
Juxi
j.leitner@roboticvision.org
http://Juxi.net/projects

thank you for listening
ICRA 2018
Brisbane, Australia
