Juxi Leitner
arc centre of excellence for robotic vision
queensland university of technology
<j.leitner@qut.edu.au>

http://Juxi.net
(deep) reinforcement learning
Guest Lecture, CAB420 Machine Learning
http://roboticvision.org/
tinyurl.com/QUTRobotics
roboticvision.org
Dalle Molle Institute for AI (IDSIA)
Work
Juxi
Leitner
PhD Informatics / Intelligent Systems
MSc Space Robotics & Automation
BSc Information & Software Engineering
Intelligent (Space) Robots
European Space Agency (ESA)
Erasmus Intelligent Systems
Work (Humanoid) Robot Vision
Instituto Superior Técnico (IST)
Mobility Intelligent Space Systems Laboratory
About Me
Current Robotic Vision and Actions
Queensland University of Technology (QUT)
arc centre of excellence for robotic vision | qut

juxi.net | roboticvision.org | bne-robotics.net | brisbane.ai
startups research industry
create agents that see and
interact with the world
http://Juxi.net/aboutme
eye-hand coordination
Vision and Action
http://Juxi.net/projects/VisionAndAction/
BRISBANE.AI
defining AI
study of "intelligent agents":
any device that perceives its environment and takes actions
that maximize its chance of success at some goal
agent interacting with the world
IM-CLeVeR Teaser
https://www.youtube.com/watch?v=OyfonCDxUiU
full video: https://vimeo.com/51011081
machinelearning
supervised learning
…
machinelearning
unsupervised learning
machinelearning
reinforcement learning
run
hug
?
agent
see act
reward
machinelearning
reinforcement learning
run
hug
agent
see act
reward
hug :)
machinelearning
reinforcement learning
reward
reward
robot RL
Konidaris et al.
Autonomous Skill Acquisition on a Mobile Manipulator
https://www.youtube.com/watch?v=yUICAkSQTZY
robot RL
Jan Peters et al.
Motor Skill Learning from Demonstration
https://www.youtube.com/watch?v=qtqubguikMk
foundations
a policy, a reward signal, a value function,
and, optionally, a model of the environment
http://cs.stanford.edu/people/karpathy/reinforcejs/
http://karpathy.github.io/2016/05/31/rl/
policy
Policy defines the actions to be taken per state
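A minimal sketch of "actions per state", assuming a made-up two-state task: a deterministic policy is a lookup table, a stochastic policy is a distribution over actions for each state. The state names, action names and the helper `act` are illustrative assumptions, not from the lecture.

```python
import random

# Hypothetical states and actions, chosen only for illustration.
deterministic_policy = {"far_from_object": "reach", "holding_object": "place"}

stochastic_policy = {
    "far_from_object": {"reach": 0.9, "wait": 0.1},
    "holding_object": {"place": 1.0},
}

def act(policy, state, rng=random):
    """Return an action for `state` from either kind of policy."""
    choice = policy[state]
    if isinstance(choice, dict):
        # Stochastic case: sample an action according to its probability.
        actions, probs = zip(*choice.items())
        return rng.choices(actions, weights=probs, k=1)[0]
    # Deterministic case: the table entry is the action itself.
    return choice
```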
value function
Value function is a prediction of future reward per state
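One common way to write this down, in the textbook notation (the symbols are assumptions here: discount factor \gamma, reward R_t, policy \pi; none of them appear on the slide):

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1},
\qquad
v_\pi(s) = \mathbb{E}_\pi\left[\, G_t \mid S_t = s \,\right]
```

The value of a state is the expected discounted sum of future rewards when starting in s and following \pi afterwards.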
mdp
An information state (a.k.a. Markov state)
contains all useful information from the history.
i.e. the state is a sufficient statistic of the future
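Written as a formula, in the usual notation (S_t for the state at time t is an assumed symbol), a state is Markov if and only if:

```latex
\mathbb{P}\left[ S_{t+1} \mid S_t \right]
=
\mathbb{P}\left[ S_{t+1} \mid S_1, \dots, S_t \right]
```

Once the current state is known, the rest of the history adds nothing to the prediction of the next state.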
pomdp
what if: robot with camera vision isn’t told its absolute location
agent state != environment state
Formally this is a partially observable Markov decision process (POMDP)
mdp
Example 3.2: Pick-and-Place Robot

Consider using reinforcement learning to control the motion of a robot arm in a
repetitive pick-and-place task. If we want to learn movements that are fast and smooth,
the learning agent will have to control the motors directly and have low-latency
information about the current positions and velocities of the mechanical linkages. The
actions in this case might be the voltages applied to each motor at each joint, and the
states might be the latest readings of joint angles and velocities. The reward might be
+1 for each object successfully picked up and placed. To encourage smooth movements,
on each time step a small, negative reward can be given as a function of the
moment-to-moment “jerkiness” of the motion.
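A rough sketch of such a reward signal. The example does not say how "jerkiness" is measured; here it is assumed to be the squared change in joint acceleration per time step, and `jerk_weight` is a hypothetical tuning constant.

```python
def step_reward(object_placed, prev_accel, accel, dt, jerk_weight=0.01):
    """+1 for a successful pick-and-place, minus a small penalty for jerky motion.

    prev_accel / accel: per-joint accelerations at the previous / current step.
    jerk_weight: hypothetical constant that scales the smoothness penalty.
    """
    # Jerk is approximated as the change in acceleration over one time step.
    jerk = [(a - p) / dt for a, p in zip(accel, prev_accel)]
    jerkiness = sum(j * j for j in jerk)
    task_reward = 1.0 if object_placed else 0.0
    return task_reward - jerk_weight * jerkiness
```

With a larger `jerk_weight` the agent is pushed towards smoother but possibly slower movements, so the constant trades off speed against smoothness.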
policy iteration
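A minimal sketch of policy iteration (alternate policy evaluation and greedy policy improvement until the policy stops changing). The three-state chain `CHAIN` is a made-up toy MDP, and the transition format `(probability, next_state, reward)` is an assumption of this sketch.

```python
# Toy MDP: states 0, 1, 2; state 2 is absorbing; reaching it from 1 pays +1.
CHAIN = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {"left": [(1.0, 2, 0.0)], "right": [(1.0, 2, 0.0)]},
}

def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

def policy_iteration(P, gamma=0.9, tol=1e-8):
    policy = {s: next(iter(P[s])) for s in P}  # arbitrary initial policy
    V = {s: 0.0 for s in P}
    while True:
        # Policy evaluation: sweep until V matches the current policy.
        while True:
            delta = 0.0
            for s in P:
                v = q_value(P, V, s, policy[s], gamma)
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: switch only to strictly better actions.
        stable = True
        for s in P:
            best = max(P[s], key=lambda a: q_value(P, V, s, a, gamma))
            if q_value(P, V, s, best, gamma) > q_value(P, V, s, policy[s], gamma) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return V, policy
```

On this chain the returned policy moves right in states 0 and 1.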
value iteration
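A minimal sketch of value iteration (repeatedly apply the Bellman optimality backup, then read off a greedy policy). The toy MDP `TOY_MDP` and its `(probability, next_state, reward)` transition format are assumptions made for illustration.

```python
# Toy MDP: states 0, 1, 2; state 2 is absorbing; reaching it from 1 pays +1.
TOY_MDP = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {"left": [(1.0, 2, 0.0)], "right": [(1.0, 2, 0.0)]},
}

def backup(P, V, s, a, gamma):
    """One-step lookahead: expected reward plus discounted next-state value."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

def value_iteration(P, gamma=0.9, tol=1e-8):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup: best action value in this state.
            best = max(backup(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: backup(P, V, s, a, gamma)) for s in P}
    return V, policy
```

Unlike policy iteration, no explicit policy is kept during the sweeps; it is extracted once at the end.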
RL agents
Q function
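For reference, the action-value (Q) function in the usual textbook notation (return G_t, discount \gamma and policy \pi are assumed symbols, not taken from the slide), together with the Bellman optimality equation it satisfies for the optimal policy:

```latex
q_\pi(s, a) = \mathbb{E}_\pi\left[\, G_t \mid S_t = s,\ A_t = a \,\right]

q_*(s, a) = \mathbb{E}\left[\, R_{t+1} + \gamma \max_{a'} q_*(S_{t+1}, a') \mid S_t = s,\ A_t = a \,\right]
```

Acting greedily with respect to q_* gives an optimal policy without needing a model of the environment.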
on/off policy
Q-learning (Watkins, 1989)
https://studywolf.wordpress.com/2013/07/01/reinforcement-learning-sarsa-vs-q-learning/
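The difference between off-policy and on-policy learning shows up in a single line of the tabular update. A small sketch, assuming a Q-table stored as a dict keyed by `(state, action)`; the function names and the default `alpha` and `gamma` are illustrative.

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the best next action, whatever is taken next."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action the agent actually takes next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Usage: an empty table whose unseen entries default to 0.
Q = defaultdict(float)
q_learning_update(Q, "s0", "right", 1.0, "s1", ["left", "right"])
```

If the behaviour policy explores (for example with random actions), SARSA's values reflect that exploration while Q-learning's do not.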
deephype
Dartmouth 1956 -> Initial Hype -> AI Winter -> Expert Systems Hype -> AI Winter -> AI Spring -> Deep Learning Hype
deep RL
Sergey Levine / Google Brain
DeepMind
QUT
deep RL
DeepMind
idea: learn a policy to play a computer game
using only visual information
state: ?
actions: ?
reward: ?
state: game screen
actions: button press
reward: game score
problems?
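To make the state / action / reward mapping concrete, here is a toy interaction loop. `TinyGame` is a hypothetical stand-in for a video game, not DeepMind's setup: the agent only receives a row of "pixels", chooses a button, and is rewarded with the change in score.

```python
class TinyGame:
    """Hypothetical stand-in for a video game: the agent only sees pixels."""
    BUTTONS = ("left", "right", "noop")

    def __init__(self, width=5, target=4, max_steps=20):
        self.width, self.target, self.max_steps = width, target, max_steps

    def reset(self):
        self.pos, self.steps, self.score = 0, 0, 0
        return self._screen()

    def _screen(self):
        # One row of "pixels": 1 = paddle, 2 = target, 3 = paddle on target.
        row = [0] * self.width
        row[self.target] += 2
        row[self.pos] += 1
        return row

    def step(self, button):
        if button == "left":
            self.pos = max(0, self.pos - 1)
        elif button == "right":
            self.pos = min(self.width - 1, self.pos + 1)
        old_score = self.score
        if self.pos == self.target:
            self.score += 1
        self.steps += 1
        reward = self.score - old_score  # reward = change in game score
        done = self.steps >= self.max_steps
        return self._screen(), reward, done

def run_episode(env, policy):
    """state = screen, action = button press, reward = score change."""
    screen, total, done = env.reset(), 0, False
    while not done:
        screen, reward, done = env.step(policy(screen))
        total += reward
    return total
```

A learned policy would replace the `policy` argument with a function that maps the screen to a button, for example via the Q-values of a deep network.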
understanding limitations of deep nets,
reinforcement learning and transfer of knowledge
deep learning visual control
[Tow et al, ACRA 2015]
[Zhang et al, ACRA 2015]
[Zhang et al, arxiv.org]
understanding limitations of deep nets,
reinforcement learning and transfer of knowledge
transfer visual control from simulation to real world
reward issue
perception issue
noise issue
exploration issue
deep learning visual servoing
[Figure: network architecture with a Perception Module and a Control Module. Input I: 84×84. Conv1: 7×7 conv + ReLU, stride 2; Conv2: 4×4 conv + ReLU, stride 2; Conv3: 3×3 conv + ReLU, stride 1; 64 filters each. Bottleneck (BN): θ, 5 units. Fully connected layers FC_c1, FC_c2, FC_c3 (unit counts shown: 400, 300, 9), output: Q-values. Panels A to E, with occlusion cases.]
[Zhang et al, arxiv.org]
understanding limitations of deep nets,
reinforcement learning and transfer of knowledge
ARC Centre of Excellence for Robotic Vision roboticvision.org
limitations of current robotic systems
reproducible research on TASKS not datasets

picking benchmark
http://Juxi.net/dataset/acrv-picking-benchmark/
https://arxiv.org/abs/1609.05258
deep RL
Sergey Levine, Peter Pastor et al. (Google Brain)
artificial curiosity
idea:
Reward the reward-optimizing controller for
actions yielding data that cause
improvements of the adaptive predictor or
data compressor!
similar: intrinsic motivation
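A toy sketch of this idea, assuming the "improvement of the adaptive predictor" is measured as the drop in squared prediction error on an observation after learning from it. `RunningMeanPredictor` is a deliberately trivial, hypothetical predictor used only to keep the example short.

```python
class RunningMeanPredictor:
    """Hypothetical adaptive predictor: predicts observations by a running mean."""

    def __init__(self, lr=0.5):
        self.estimate, self.lr = 0.0, lr

    def error(self, observation):
        return (observation - self.estimate) ** 2

    def update(self, observation):
        self.estimate += self.lr * (observation - self.estimate)

def curiosity_reward(predictor, observation):
    """Intrinsic reward = how much the predictor improved on this data."""
    before = predictor.error(observation)
    predictor.update(observation)
    after = predictor.error(observation)
    return before - after
```

Seeing the same observation again yields less and less reward, so data that is already predictable becomes boring and the controller is pushed towards data it can still learn from.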
robot RL
Schmidhuber (IDSIA)
http://journal.frontiersin.org/article/10.3389/fnbot.2013.00025/full
robot RL
Oudeyer (INRIA)
https://www.youtube.com/watch?v=NOLAwD4ZTW0
Is NeuroEvolution coming back?
some recent papers:
"Evolving Deep Neural Networks" by Miikkulainen et al (Sentient Technologies)
"Large-scale Evolution of Image Classifiers" by Real et al (Google Brain)
"PathNet: Evolution Channels Gradient Descent in Super Neural Networks" by
Fernando et al. (DeepMind)
"Evolution Strategies as a Scalable Alternative to Reinforcement Learning" by
Salimans et al (OpenAI)
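To give a flavour of the evolution-strategies idea in the last paper, here is a bare-bones sketch: perturb the parameters with Gaussian noise, evaluate the fitness of each perturbation, and move the parameters towards the perturbations that scored better. This is an illustrative simplification, not the authors' implementation; the mirrored (plus/minus) sampling, the hyperparameters and the toy fitness function are assumptions of this sketch.

```python
import random

def evolution_strategy(fitness, theta, iterations=100, pairs=20,
                       sigma=0.1, lr=0.1, seed=0):
    """Maximise `fitness` using only function evaluations, no backpropagation."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(iterations):
        grad = [0.0] * len(theta)
        for _ in range(pairs):
            # Sample one noise vector and evaluate it in both directions.
            eps = [rng.gauss(0.0, 1.0) for _ in theta]
            plus = fitness([t + sigma * e for t, e in zip(theta, eps)])
            minus = fitness([t - sigma * e for t, e in zip(theta, eps)])
            # Fitness difference weights the noise: a search-gradient estimate.
            for i, e in enumerate(eps):
                grad[i] += (plus - minus) * e / (2 * sigma * pairs)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta

# Usage: toy fitness with its maximum at (3, -2).
def toy_fitness(params):
    return -((params[0] - 3.0) ** 2 + (params[1] + 2.0) ** 2)

best = evolution_strategy(toy_fitness, [0.0, 0.0])
```

In an RL setting the fitness would be the return of an episode run with the perturbed policy parameters, and each perturbation can be evaluated in parallel.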
evolution and RL
NEAT (2002): NeuroEvolution of Augmenting Topologies
RL applications
Motion Planning
Grasping
End-to-end
Vision (?)
challenges
Curiosity?
Exploration/Exploitation?
Oracles?
discrete -> continuous?
real-world reward?
https://github.com/aikorea/awesome-rl#theory
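One of the simplest answers to the exploration/exploitation question is epsilon-greedy action selection, sketched below. The function name and the dict-based Q-value format are assumptions for illustration.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore (random action), otherwise exploit.

    q_values: dict mapping each action to its current value estimate.
    """
    if rng.random() < epsilon:
        return rng.choice(list(q_values))       # explore
    return max(q_values, key=q_values.get)      # exploit the best estimate

# Usage: mostly greedy, with 10% random exploration.
action = epsilon_greedy({"left": 0.1, "right": 0.7}, epsilon=0.1)
```

Decaying `epsilon` over time is a common refinement: explore a lot early, exploit more once the estimates are trusted.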
new developments
how to keep in the loop?
arxiv-sanity, twitter & get your hands dirty
come to Brisbane.AI meetups! :)
http://Juxi.net/workshop/deep-learning-rss-2017/
Tools and toolboxes
Neuroscience vs Deep Learning
&
Evolutionary approaches
Generative Adversarial Networks
Unsupervised Learning, Embodied Learning
Jürgen ‘Juxi’ Leitner
arc centre of excellence for robotic vision | qut

juxi.net | roboticvision.org | bne-robotics.net | brisbane.ai
In which we try to explain why we consider artificial
intelligence to be a subject most worthy of study, and
in which we try to decide what exactly it is, this
being a good thing to decide before embarking.
TUTORIAL ONE
BRISBANE ARTIFICIAL INTELLIGENCE
http://Juxi.net

<juxi.leitner@gmail.com>
interested?
j.leitner@roboticvision.org
http://Juxi.net/projects
Juxi
BEB801/2 Projects
PhD positions
ideas…
thank you for listening
ICRA 2018
Brisbane, Australia

(deep) reinforcement learning - CAB420