Deep Reinforcement Learning | Amazon Robotics Challenge, Image Processing Lecture (EGH444, QUT)

Juxi Leitner
arc centre of excellence for robotic vision
queensland university of technology
<j.leitner@qut.edu.au> 
http://Juxi.net
(deep) reinforcement learning

http://roboticvision.org/
tinyurl.com/QUTRobotics

About Me
Juxi Leitner
BSc Information & Software Engineering
MSc Space Robotics & Automation
PhD Informatics / Intelligent Systems
Erasmus: Intelligent Systems, Instituto Superior Técnico (IST)
Mobility: Intelligent Space Systems Laboratory
Work: Intelligent (Space) Robots, European Space Agency (ESA)
Work: (Humanoid) Robot Vision, Dalle Molle Institute for AI (IDSIA)
Current: Robotic Vision and Actions, Queensland University of Technology (QUT)
arc centre of excellence for robotic vision | qut 
juxi.net | roboticvision.org | bne-robotics.net | brisbane.ai

BRISBANE ARTIFICIAL INTELLIGENCE
@BrisbaneAI #brai
sponsors
Event hosts

create agents that see and
interact with the world
http://Juxi.net/aboutme

Vision and Action
eye-hand coordination
http://Juxi.net/projects/VisionAndAction/

BRISBANE.AI
defining AI
study of "intelligent agents":
any device that perceives its environment and takes actions
that maximize its chance of success at some goal

agent interacting 
with the world

IM-CLeVeR Teaser
https://www.youtube.com/watch?v=OyfonCDxUiU
full video: https://vimeo.com/51011081

machine learning
supervised learning
…

machine learning
unsupervised learning

machine learning
reinforcement learning
agent: see, act, reward
actions: run or hug?

machine learning
agent: see, act, reward
actions: run or hug
hug :)

machine learning
reward

robot RL
Konidaris et al., Autonomous Skill Acquisition on a Mobile Manipulator
https://www.youtube.com/watch?v=yUICAkSQTZY

robot RL
Jan Peters et al., Motor Skill Learning from Demonstration
https://www.youtube.com/watch?v=qtqubguikMk

foundations
a policy, a reward signal, a value function,
and, optionally, a model of the environment
http://cs.stanford.edu/people/karpathy/reinforcejs/
http://karpathy.github.io/2016/05/31/rl/

policy
A policy defines the action to be taken
in each state
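
As a minimal sketch (the state and action names are hypothetical, not from the lecture), a deterministic policy can be stored as a lookup table from state to action, and a stochastic policy as a table of action probabilities per state:

```python
import random

# Hypothetical states and actions, for illustration only.
deterministic_policy = {
    "far_from_object": "move_closer",
    "near_object": "grasp",
}

stochastic_policy = {
    "far_from_object": {"move_closer": 0.9, "grasp": 0.1},
    "near_object": {"move_closer": 0.2, "grasp": 0.8},
}


def act(policy, state, rng=random):
    """Return the action the policy selects in `state`."""
    choice = policy[state]
    if isinstance(choice, str):
        # Deterministic policy: exactly one action per state.
        return choice
    # Stochastic policy: sample an action according to its probability.
    actions = list(choice)
    weights = [choice[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

Calling `act(deterministic_policy, "near_object")` always returns the same action, while the stochastic table may return either action for the same state.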

value function
A value function is a prediction of the future reward
expected from each state
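
One common way to make "future reward" concrete is the discounted return. The sketch below assumes a discount factor `gamma` (not given on the slide) and estimates the value of each state by averaging the returns observed after first visiting it; the episode format is an illustrative choice:

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each discounted by gamma per time step."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def monte_carlo_value(episodes, gamma=0.9):
    """Estimate V(s) as the average return following the first visit to s.

    `episodes` is a list of episodes; each episode is a list of
    (state, reward) pairs, where `reward` is received on leaving `state`.
    """
    returns = {}
    for episode in episodes:
        seen = set()
        for t, (state, _) in enumerate(episode):
            if state in seen:
                continue
            seen.add(state)
            g = discounted_return([r for _, r in episode[t:]], gamma)
            returns.setdefault(state, []).append(g)
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}
```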

mdp
An information state (a.k.a. Markov state)
contains all useful information from the history.
i.e. the state is a sufficient statistic of the future
pomdp
what if: robot with camera vision isn’t told its absolute location
agent state != environment state 
 
Formally this is a partially observable Markov decision process
(POMDP)
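
A small sketch of the gap between agent state and environment state, using a hypothetical corridor world (not from the lecture): the environment state is the robot's cell index, but the camera only reports whether a wall is directly ahead, so several positions produce the same observation.

```python
# Hypothetical corridor of 5 cells; the wall is at the far end.
CORRIDOR_LENGTH = 5


def observe(position):
    """Observation the agent receives; it does not contain the position."""
    return "wall_ahead" if position == CORRIDOR_LENGTH - 1 else "free_ahead"


def consistent_positions(observation):
    """All environment states that would produce this observation."""
    return [p for p in range(CORRIDOR_LENGTH) if observe(p) == observation]
```

Here `consistent_positions("free_ahead")` contains four different cells, so a single observation is not a Markov state and the agent would need to use its history to disambiguate.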

mdp

policy iteration
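
A minimal sketch of policy iteration, alternating policy evaluation and greedy policy improvement until the policy stops changing. The three-state chain MDP, the discount factor and all names below are assumptions made for illustration:

```python
# Hypothetical chain MDP: P[state][action] = [(probability, next_state, reward), ...]
P = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {"left": [(1.0, 2, 0.0)], "right": [(1.0, 2, 0.0)]},  # absorbing goal
}
GAMMA = 0.9  # assumed discount factor


def q_value(V, state, action):
    """Expected one-step reward plus discounted value of the next state."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[state][action])


def evaluate(policy, tol=1e-8):
    """Policy evaluation: iterate until V stops changing for this policy."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(V, s, policy[s])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def improve(V):
    """Policy improvement: act greedily with respect to V."""
    return {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}


def policy_iteration():
    policy = {s: "left" for s in P}  # arbitrary starting policy
    while True:
        V = evaluate(policy)
        new_policy = improve(V)
        if new_policy == policy:
            return policy, V
        policy = new_policy
```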

value iteration
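
A minimal sketch of value iteration, which repeatedly applies the Bellman optimality backup to the value estimates and reads off a greedy policy at the end. The three-state chain MDP, the discount factor and all names are assumptions made for illustration:

```python
# Hypothetical chain MDP: P[state][action] = [(probability, next_state, reward), ...]
P = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {"left": [(1.0, 2, 0.0)], "right": [(1.0, 2, 0.0)]},  # absorbing goal
}
GAMMA = 0.9  # assumed discount factor


def backup(V, state, action):
    """Expected one-step reward plus discounted value of the next state."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[state][action])


def value_iteration(tol=1e-8):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup: take the best action's value.
            best = max(backup(V, s, a) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: backup(V, s, a)) for s in P}
    return V, policy
```

Unlike policy iteration, no explicit policy is maintained during the sweeps; the policy is only extracted once the values have converged.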

Q function
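
In the usual textbook notation (the symbols below are assumed, not taken from the slide), the Q function gives the expected return of taking action a in state s and following policy π afterwards; the last line is the Bellman optimality form that Q-learning approximates:

```latex
Q^{\pi}(s, a) = \mathbb{E}\left[\, r_{t+1} + \gamma\, V^{\pi}(s_{t+1}) \;\middle|\; s_t = s,\ a_t = a \,\right]

V^{\pi}(s) = \sum_{a} \pi(a \mid s)\, Q^{\pi}(s, a)

Q^{*}(s, a) = \mathbb{E}\left[\, r_{t+1} + \gamma \max_{a'} Q^{*}(s_{t+1}, a') \;\middle|\; s_t = s,\ a_t = a \,\right]
```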

on/off policy
Q-learning (Watkins, 1989)
https://studywolf.wordpress.com/2013/07/01/reinforcement-learning-sarsa-vs-q-learning/
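
The on/off-policy distinction shows up in a single line of the tabular update. In this sketch (learning rate and discount factor are illustrative values), Q-learning bootstraps from the best next action regardless of what the agent does next, while SARSA bootstraps from the action actually taken:

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (illustrative value)
GAMMA = 0.9  # discount factor (illustrative value)


def q_learning_update(Q, s, a, r, s_next, actions):
    """Off-policy: the target uses the greedy action in the next state."""
    target = r + GAMMA * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next):
    """On-policy: the target uses the action the agent actually takes next."""
    target = r + GAMMA * Q[(s_next, a_next)]
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])


def make_table():
    """Q-table with two known next-state values and zeros elsewhere."""
    Q = defaultdict(float)
    Q[("s1", "safe")] = 1.0
    Q[("s1", "risky")] = -1.0
    return Q
```

Starting from the same table, an exploratory "risky" step in the next state lowers the SARSA estimate for the preceding action but leaves the Q-learning estimate unaffected, since Q-learning always assumes the greedy continuation.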

deep hype
Dartmouth 1956
Initial Hype
AI Winter
Expert Systems Hype
AI Winter
AI Spring
Deep Learning Hype

deep RL
Sergey Levine / Google Brain
DeepMind
QUT

deep RL
DeepMind
idea: learn a policy to play a computer game
using only visual information
state: game screen
actions: button press
reward: game score
problems?
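
The state/action/reward mapping can be sketched as an interaction loop. `ToyGame` below is a hypothetical stand-in for a video game (a one-row "screen" with a player that scores while at the right edge), not the setup used by DeepMind:

```python
class ToyGame:
    """Hypothetical game: the 'screen' is a row of pixels with one player."""

    WIDTH = 5

    def __init__(self):
        self.player = 0
        self.score = 0

    def screen(self):
        """Pixel row: 1 where the player is, 0 elsewhere."""
        return [1 if i == self.player else 0 for i in range(self.WIDTH)]

    def press(self, button):
        """Apply a button press and return the change in game score."""
        step = 1 if button == "right" else -1
        self.player = min(self.WIDTH - 1, max(0, self.player + step))
        gained = 1 if self.player == self.WIDTH - 1 else 0
        self.score += gained
        return gained


def run_episode(choose_button, steps=10):
    """Run one episode; `choose_button` maps a screen to a button name."""
    game = ToyGame()
    total_reward = 0
    for _ in range(steps):
        state = game.screen()          # state: game screen
        action = choose_button(state)  # action: button press
        reward = game.press(action)    # reward: change in game score
        total_reward += reward
    return total_reward
```

For example, `run_episode(lambda screen: "right")` plays with a fixed policy that ignores the screen; a learned policy would replace that function with one computed from the pixels.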

deep learning visual control
understanding limitations of deep nets,
reinforcement learning and transfer of knowledge
[Tow et al, ACRA 2015]

[Zhang et al, ACRA 2015]

[Zhang et al, arxiv.org]

transfer visual control 
from simulation to real world
reward issue
perception issue
noise issue
exploration issue

deep learning visual servoing
Perception Module: input I (84×84) → Conv1 (7×7 conv + ReLU, stride 2, 64 filters) → Conv2 (4×4 conv + ReLU, stride 2, 64 filters) → Conv3 (3×3 conv + ReLU, stride 1, 64 filters) → fully conn. → Bottleneck BN (θ, 5 units)
Control Module: FC_c1 (fully conn. + ReLU, 400 units) → FC_c2 (fully conn. + ReLU, 300 units) → FC_c3 (fully conn., 9 units) → Q-values
Occlusion (panels A, B, C, D, E)
[Zhang et al, arxiv.org]
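
As a worked example of the sizes involved, the sketch below computes the feature-map size after each of the three convolutions for an 84×84 input, using the kernel sizes and strides listed on the slide. Zero padding is an assumption; the slide does not state the padding used.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Spatial output size of a convolution (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride) for Conv1..Conv3 as listed on the slide.
# padding=0 is an assumption, not stated in the source.
LAYERS = [(7, 2), (4, 2), (3, 1)]
FILTERS = 64


def feature_map_sizes(input_size=84):
    """Side length of the square feature map after each conv layer."""
    sizes = []
    size = input_size
    for kernel, stride in LAYERS:
        size = conv_output_size(size, kernel, stride)
        sizes.append(size)
    return sizes


sizes = feature_map_sizes()
# Number of values fed to the first fully connected layer.
flattened = sizes[-1] * sizes[-1] * FILTERS
```

Under these assumptions the perception module compresses a much larger flattened feature vector down to the 5-unit bottleneck before the control module computes Q-values for the 9 actions.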

picking benchmark
limitations of current robotic systems
reproducible research on TASKS not datasets
http://Juxi.net/dataset/acrv-picking-benchmark/
https://arxiv.org/abs/1609.05258

deep RL
Sergey Levine, Peter Pastor et al. (Google Brain)

artificial curiosity
idea: Reward the reward-optimizing controller for
actions yielding data that cause
improvements of the adaptive predictor or
data compressor!
similar: intrinsic motivation
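
One way to read the idea is as an intrinsic reward equal to the predictor's learning progress on each new data point. The sketch below uses a hypothetical running-mean predictor and an absolute-error measure; both are illustrative assumptions, not the formulation used in the lecture.

```python
class RunningMeanPredictor:
    """Hypothetical adaptive predictor: a running mean per context."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.estimate = {}

    def error(self, context, observation):
        """Absolute prediction error for this observation."""
        return abs(observation - self.estimate.get(context, 0.0))

    def update(self, context, observation):
        """Move the estimate for this context towards the observation."""
        old = self.estimate.get(context, 0.0)
        self.estimate[context] = old + self.learning_rate * (observation - old)


def curiosity_reward(predictor, context, observation):
    """Intrinsic reward: how much the predictor improved on this data."""
    error_before = predictor.error(context, observation)
    predictor.update(context, observation)
    error_after = predictor.error(context, observation)
    return error_before - error_after
```

Repeating the same observation yields a shrinking reward, so under this sketch the controller is pushed towards data it can still learn from rather than data it already predicts well.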

RL applications
Motion Planning
Grasping
End-to-end
Vision (?)

challenges
Curiosity?
Exploration/Exploitation?
Oracles?
discrete -> continuous?
real-world reward?
https://github.com/aikorea/awesome-rl#theory
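
For the exploration/exploitation question in the list above, epsilon-greedy action selection is one simple and widely used answer. It is shown here as an illustrative sketch, not as the method the lecture recommends:

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the best one.

    `q_values` maps each action to its current value estimate.
    """
    if rng.random() < epsilon:
        return rng.choice(list(q_values))   # explore
    return max(q_values, key=q_values.get)  # exploit
```

With `epsilon=0.0` the agent always exploits its current estimates; raising `epsilon` trades immediate reward for information about actions it has tried less often.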

new developments
Tools and toolboxes
Neuroscience vs Deep Learning
Evolutionary approaches
Generative Adversarial Networks
Unsupervised Learning, Embodied Learning
http://Juxi.net/workshop/deep-learning-rss-2017/
how to keep in the loop?
arxiv-sanity, twitter & get your hands dirty
come to Brisbane.AI meetups! :)

Jürgen ‘Juxi’ Leitner
In which we try to explain why we consider artificial
intelligence to be a subject most worthy of study, and
in which we try to decide what exactly it is, this 
being a good thing to decide before embarking.
TUTORIAL ONE
BRISBANE ARTIFICIAL INTELLIGENCE
http://Juxi.net
<juxi.leitner@gmail.com>

interested?
j.leitner@roboticvision.org
http://Juxi.net/projects
Juxi
BEB801/2 Projects
PhD positions
ideas…

Amazon Robotics Challenge
Jürgen ‘Juxi’ Leitner
arc centre of excellence for robotic vision
queensland university of technology
<j.leitner@qut.edu.au> http://Juxi.net

object manipulation in clutter
long term goal

#cartman

Hardware

smart.robots.
SMRTRobots
END-EFFECTOR

Perception
semantic segmentation

Perception
rapid training

Perception

Perception
grasp synthesis

in Action
Videos
https://www.youtube.com/watch?time_continue=5&v=p-WhO0LF4oY (ARC Pick failure)
https://www.youtube.com/watch?v=BB5Pyh4dtxw (ARC Quick Learning of Items)
https://www.youtube.com/watch?v=VEKanLH2gFY (ARC Finals)
https://www.youtube.com/watch?v=a4_j6EAK3rs&feature=youtu.be (Reaching Learning)

in Action
Papers / TechRep
Cartman: The low-cost Cartesian Manipulator that won the Amazon Robotics Challenge. Douglas Morrison, et al.
https://arxiv.org/abs/1709.06283
Mechanical Design of a Cartesian Manipulator for Warehouse Pick and Place. M. McTaggart, et al.
http://juxi.net/papers/ACRV-TR-2017-02.pdf
Design of a Multi-Modal End-Effector and Grasping System: How Integrated Design helped win the ARC. S. Wade-McCue, N. Kelly-Boxall, et al.
http://juxi.net/papers/ACRV-TR-2017-03.pdf
Semantic Segmentation from Limited Training Data. Anton Milan, et al.
http://juxi.net/papers/ACRV-TR-2017-04.pdf
Sim-to-real Transfer of Visuo-motor Policies for Reaching in Clutter: Domain Randomization & Adaptation with Modular Nets. Fangyi Zhang, et al.
https://arxiv.org/abs/1709.05746
Training Deep Neural Networks for Visual Servoing. Quentin Bateux, et al.
https://arxiv.org/abs/1705.08940
http://Juxi.net/papers

Nice Features
HW=12k
project=42k
travel=22k
w/o salaries

#teamACRVRoboticVisionAU
Adam Tow
Steve Martin
Rohan Smith
Jordan Erskine
Anthony Gillespie
Riccardo Grinover
Alec Gurman
Tom Hunn
Darryl Lee
Nathan Perkins
Gerard Rallos
Andrew Razjigaev
Juxi Leitner, Ian Reid, Peter Corke
http://facebook.com/TeamACRV
Doug Morrison
Matt McTaggart
Zheyu Zhuang
Norton Kelly-Boxall
Sean Wade-McCue
Thomas Rowntree
Trung Pham
Vijay Kumar
Ming Cai
Saroj Weerasekera
Chris Lehnert
Anton Milan
Thank You!
