An Introduction to
Reinforcement Learning
Jie-Han Chen
NetDB, National Cheng Kung University
3/27, 2018 @ National Cheng Kung University, Taiwan
Disclaimer
The content of this lecture was borrowed from:
1. Rich Sutton’s textbook
2. David Silver’s Reinforcement Learning class at UCL
3. Sergey Levine’s Deep Reinforcement Learning class at UC Berkeley
Syllabus
● Introduction to Reinforcement Learning
● Markov Decision Process
● Dynamic Programming
● Monte Carlo method
● Temporal Difference method
● Deep Reinforcement Learning
● Policy Gradient
● Hierarchical Reinforcement Learning and Multiagent Reinforcement Learning
● Active Research Issues
Resources
Textbooks:
● Reinforcement Learning: An Introduction, Sutton and Barto
● Algorithms for Reinforcement Learning, Szepesvari
Course:
● CS 294 Deep Reinforcement Learning, Berkeley
● David Silver’s Reinforcement Learning course, UCL
● CMU 10703 Deep Reinforcement Learning and Control, CMU
● Shan-Hung Wu’s Deep Learning course at NTHU
All of them serve as reference materials for this lecture.
Outline
● Syllabus
● Introduction
● Elements of reinforcement learning and its objective
● History of RL
● Applications
● Challenges and active research fields in RL
● Research institutes and notable researchers
Machine Learning
(Figure from David Silver’s RL course)
Introduction to Reinforcement Learning
Reinforcement learning is a learning framework different from both supervised learning
and unsupervised learning.
It consists of a series of perceptions and interactions between an agent and its
environment.
(Figure from Sutton’s book)
Agent and Environment
At each step t the agent:
● Receives scalar reward R_t
● Receives observation O_t
● Executes action A_t
The environment:
● Receives action A_t
● Emits observation O_{t+1}
● Emits scalar reward R_{t+1}
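The interaction above can be sketched as a loop. This is a minimal illustration under stated assumptions: `CorridorEnv` is a hypothetical toy environment (a corridor of positions with a reward of +1 at the right end), and its `reset`/`step` methods and the `run_episode` helper are names chosen for this example, not an API from the lecture.

```python
class CorridorEnv:
    """Hypothetical toy environment: positions 0..length-1, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos  # initial observation O_0

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clipped to the corridor
        self.pos = min(max(self.pos + action, 0), self.length - 1)
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0
        return self.pos, reward, done  # O_{t+1}, R_{t+1}, episode finished?


def run_episode(env, policy, max_steps=100):
    """Agent-environment loop: observe, act, then receive the next observation and reward."""
    obs = env.reset()
    rewards = []
    for _ in range(max_steps):
        action = policy(obs)                  # agent executes A_t
        obs, reward, done = env.step(action)  # environment emits O_{t+1}, R_{t+1}
        rewards.append(reward)
        if done:
            break
    return rewards


always_right = lambda obs: +1
print(run_episode(CorridorEnv(length=5), always_right))  # [0.0, 0.0, 0.0, 1.0]
```

Note that the next observation depends on the action the agent just took; this is the property that distinguishes RL data from i.i.d. data.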
Introduction to Reinforcement Learning
Reinforcement Learning is often used to solve sequential decision problems.
● Goal: select actions to maximize total future reward
● Actions may have long-term consequences
● Rewards may be delayed
● It may be better to sacrifice immediate reward to gain more long-term reward
● E.g.:
○ A financial investment
○ A chess game
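One common way to make "total future reward" concrete is the discounted return G = R_1 + γR_2 + γ²R_3 + ... . The discount factor γ is an assumption added here (the slide only says "total"); setting γ = 1 gives the plain undiscounted sum. The two reward sequences below are hypothetical and only illustrate why sacrificing an immediate reward can pay off.

```python
def discounted_return(rewards, gamma=0.9):
    """G_0 = R_1 + gamma*R_2 + gamma^2*R_3 + ... for a finite list of rewards."""
    g = 0.0
    for r in reversed(rewards):  # accumulate from the last reward backwards
        g = r + gamma * g
    return g


# Hypothetical reward sequences for two behaviours
greedy = [1.0, 0.0, 0.0, 0.0]   # grab a small reward now
patient = [0.0, 0.0, 0.0, 5.0]  # sacrifice now, collect a larger reward later

print(discounted_return(greedy))   # 1.0
print(discounted_return(patient))  # approximately 3.645 (= 5 * 0.9**3)
```

Even after discounting, the patient behaviour has the larger return, so an agent maximizing return would prefer it.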
Supervised Learning & Unsupervised Learning
The input data are assumed to be independent and identically distributed (i.i.d.).
The current output does not affect the next input.
Reinforcement Learning
The agent’s actions do affect the data it
receives in the future.
(Figure from Wikipedia, made by waldoalvarez)
Introduction to Reinforcement Learning
● In reinforcement learning, the agent learns from trial and error.
● Better experience helps the agent learn a better policy.
● What kind of experience is better?
The image is from: http://www.homemeeting.us/franktmc/maze_2.jpg
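Trial-and-error learning can be illustrated on a hypothetical multi-armed bandit (one state, several actions). The agent below uses ε-greedy exploration, a common technique swapped in here for illustration rather than taken from the slides: it usually picks the action with the best estimated value, but with probability `epsilon` it tries a random one. The arm means, the noise level, and all names are assumptions for this sketch.

```python
import random


def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """Trial-and-error learning on a hypothetical multi-armed bandit.

    The agent keeps a running average of the reward seen for each action and
    usually picks the best-looking one, but explores a random action with
    probability epsilon.
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions  # estimated value of each action
    counts = [0] * n_actions       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)                           # explore
        else:
            a = max(range(n_actions), key=lambda i: estimates[i])  # exploit
        reward = rng.gauss(true_means[a], 1.0)  # noisy feedback from the environment
        counts[a] += 1
        estimates[a] += (reward - estimates[a]) / counts[a]  # incremental mean
    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 1.0], steps=5000)
print([round(v, 2) for v in estimates], counts)
```

Without the exploration step, the agent could lock onto the first action that returned a positive reward and never discover the better one, which is one answer to "what kind of experience is better?".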
Elements of reinforcement learning
● Policy
● Reward signal
● Value function
● Model of environment (optional)
Elements of reinforcement learning - policy
Policy
● Defines the learning agent’s way of behaving at a given time. It could be a
simple function, a lookup table, or a search process
● Often denoted by π
● Can be deterministic or stochastic
Elements of reinforcement learning - policy
Suppose you are Russell Westbrook and
James Harden is defending you. In
this situation, you have 3 choices:
● Cut
● Shoot
● Pass
Stochastic policy
(Figure: probability of each action; y-axis: Probability, x-axis: Action)
Deterministic policy
(Figure: probability of each action; y-axis: Probability, x-axis: Action)
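Using the Westbrook example, the two kinds of policy can be sketched as lookup tables. This is a minimal illustration: the state name `"defended_by_harden"` and all probabilities are hypothetical, chosen only to show the difference between a = π(s) and a ~ π(·|s).

```python
import random

ACTIONS = ("Cut", "Shoot", "Pass")

# Hypothetical lookup tables for a single state; the numbers are illustrative only.
DETERMINISTIC_PI = {"defended_by_harden": "Shoot"}
STOCHASTIC_PI = {"defended_by_harden": {"Cut": 0.2, "Shoot": 0.5, "Pass": 0.3}}


def act_deterministic(state):
    """a = pi(s): the same state always yields the same action."""
    return DETERMINISTIC_PI[state]


def act_stochastic(state, rng):
    """a ~ pi(.|s): sample an action according to its probability."""
    probs = STOCHASTIC_PI[state]
    return rng.choices(ACTIONS, weights=[probs[a] for a in ACTIONS], k=1)[0]


rng = random.Random(0)
s = "defended_by_harden"
print([act_deterministic(s) for _ in range(5)])    # always "Shoot"
print([act_stochastic(s, rng) for _ in range(5)])  # varies with the random draws
```

A stochastic policy makes the agent harder to predict (useful against an opponent like Harden) and naturally produces some exploration.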
Policies - Action space
In reinforcement learning, we can categorize problems into 2 types by their action
space:
● Discrete action space
● Continuous action space
In the previous example, the actions lie in a discrete space, but there are many
examples of continuous control, e.g., a robotic arm. For a continuous control
problem, a stochastic policy takes the form of a probability density function.
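A common choice of density for continuous actions is a Gaussian policy, a ~ N(μ(s), σ²). The sketch below assumes a one-dimensional action (for example, a torque on one joint of a robotic arm) and a hypothetical linear mean `policy_mean`; both are illustrative assumptions, not part of the lecture.

```python
import math
import random


def policy_mean(state, weight=0.5):
    """Hypothetical linear mean: map a scalar state (e.g. joint-angle error) to a mean torque."""
    return -weight * state


def sample_action(state, std, rng):
    """Draw a continuous action a ~ N(mu(s), std^2) from the Gaussian policy."""
    return rng.gauss(policy_mean(state), std)


def action_density(action, state, std):
    """Evaluate pi(a|s) as a Gaussian probability density (not a probability)."""
    z = (action - policy_mean(state)) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


rng = random.Random(0)
print(sample_action(state=1.0, std=0.2, rng=rng))  # a real number, typically near -0.5
print(action_density(-0.5, state=1.0, std=0.2))    # about 1.99: a density can exceed 1
```

Unlike the discrete case, no single action has a nonzero probability; only intervals of actions do, which is why π(a|s) here is a density.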
Elements of reinforcement learning - reward
Reward: r / R_t
● Defines the goal in a reinforcement learning problem
● Indicates how well the agent is doing at step t
● Immediately perceived from the environment
Elements of reinforcement learning - reward
+2
0 or -0.2?
Elements of reinforcement learning - reward
In chess or Go, the reward is defined
by the outcome of the game.
● Win: +1
● Draw: 0
● Lose: -1
At most steps, we don’t receive any
reward (value = 0). This makes it a sparse
reward problem.
Elements of reinforcement learning - reward
If we want to reach the goal in fewer
steps, we often define the reward as
-1 for each step taken.
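Why a -1 step reward encourages short paths can be checked with a small calculation. The sketch compares two hypothetical paths (6 and 14 steps) that both reach the goal, under a win-only sparse reward and under the step penalty, without discounting (an assumption for this example; with discounting the comparison can change).

```python
def returns_for_path(num_steps, scheme):
    """Undiscounted return of an episode that reaches the goal after num_steps moves.

    scheme == "sparse": 0 per move, +1 on reaching the goal (win-only reward)
    scheme == "step_penalty": -1 per move, no bonus at the goal
    """
    if scheme == "sparse":
        rewards = [0.0] * (num_steps - 1) + [1.0]
    elif scheme == "step_penalty":
        rewards = [-1.0] * num_steps
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return sum(rewards)


for scheme in ("sparse", "step_penalty"):
    print(scheme, returns_for_path(6, scheme), returns_for_path(14, scheme))
# sparse 1.0 1.0            -> both paths look equally good
# step_penalty -6.0 -14.0   -> the shorter path has the higher return
```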
Elements of reinforcement learning - value function
Value function
● Indicates which decision is good in the long run.
● There are two forms:
○ state-value function
○ action-value function
● Unlike the reward, the value function is an estimated value.
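The two forms of value function are related: for a policy π, the state value is the policy-weighted average of the action values, v_π(s) = Σ_a π(a|s) q_π(s, a) (as in Sutton and Barto). The sketch below applies this identity to a single state; the probabilities and estimated action values are hypothetical numbers, reusing the actions from the Westbrook example.

```python
def state_value(pi_s, q_s):
    """v_pi(s) = sum over a of pi(a|s) * q_pi(s, a), for one state s."""
    return sum(prob * q_s[a] for a, prob in pi_s.items())


# Hypothetical numbers for a single state (illustrative only, not from the lecture)
pi_s = {"Cut": 0.2, "Shoot": 0.5, "Pass": 0.3}  # pi(a|s)
q_s = {"Cut": 0.4, "Shoot": 0.6, "Pass": 0.5}   # estimated q_pi(s, a)

print(state_value(pi_s, q_s))  # approximately 0.53
print(max(q_s, key=q_s.get))   # Shoot: the action with the highest estimated value
```

The state value tells us how good the situation is overall; the action values tell us which decision looks best in the long run.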
Elements of reinforcement learning - value function
The score is 99 to 98, with our team at 98, and
only 5 seconds are left in the game.
If you need to inbound the ball at midcourt,
whom would you pass it to?
1. 櫻木花道 (Hanamichi Sakuragi)
2. 三井壽 (Hisashi Mitsui)
Elements of reinforcement learning - model
Model of the environment (optional)
● Something that mimics the behavior of the environment.
● Allows inferences to be made about how the environment will behave
(planning).
● Methods for solving reinforcement learning problems that use models for
planning are called model-based methods. Methods that do not use a model are
called model-free methods.
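A model can be learned from interaction and then used for inference. The sketch below is one simple, hypothetical way to do this for small discrete problems: count observed transitions (s, a, r, s') to estimate the expected reward and next-state probabilities, then plan by a one-step lookahead. `TabularModel`, `plan_one_step`, the state names, and the value table `v` are all assumptions for this example.

```python
from collections import defaultdict


class TabularModel:
    """Hypothetical tabular model learned from experienced transitions (s, a, r, s')."""

    def __init__(self):
        self.next_counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
        self.reward_sum = defaultdict(float)                      # (s, a) -> total reward
        self.visits = defaultdict(int)                            # (s, a) -> count

    def update(self, s, a, r, s_next):
        self.next_counts[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        self.visits[(s, a)] += 1

    def predict(self, s, a):
        """Return (expected reward, {s': estimated probability}) for (s, a)."""
        n = self.visits[(s, a)]
        if n == 0:
            raise KeyError(f"no experience for {(s, a)}")
        probs = {s2: c / n for s2, c in self.next_counts[(s, a)].items()}
        return self.reward_sum[(s, a)] / n, probs


def plan_one_step(model, s, actions, v, gamma=0.9):
    """Model-based choice: pick the action with the best predicted r + gamma * V(s')."""
    def lookahead(a):
        r, probs = model.predict(s, a)  # raises KeyError if (s, a) was never tried
        return r + gamma * sum(p * v.get(s2, 0.0) for s2, p in probs.items())
    return max(actions, key=lookahead)


model = TabularModel()
model.update("A", "left", 0.0, "B")
model.update("A", "right", 0.0, "C")
model.update("A", "right", 0.0, "C")
print(model.predict("A", "right"))                                            # (0.0, {'C': 1.0})
print(plan_one_step(model, "A", ["left", "right"], {"B": 1.0, "C": 2.0}))  # right
```

The agent chooses "right" without trying anything new: the decision comes entirely from inferences made with the learned model, which is what makes the method model-based.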
Elements of reinforcement learning - model
(Figure: interaction and inferences; learn the model. The image is from David Silver’s RL course)
Just like ...
Elements of reinforcement learning - model
Elements of reinforcement learning - model
Elements of reinforcement learning
● Policy
● Reward signal
● Value function
● Model of environment (optional)
The objective of reinforcement learning
Reinforcement learning is a framework
of goal-directed learning.
The objective of reinforcement learning
is to maximize the cumulative reward in
each task.
The image is from:
https://www.wikijob.co.uk/content/interview-advice/competencies/decision-making
History of Reinforcement Learning
Reinforcement Learning is inspired by knowledge from two domains:
● Optimal control
● Biological learning systems: animal learning
Optimal control
Optimal control is a mathematical optimization method for deriving control policies,
especially under certain constraints.
The method is largely due to the work of Lev Pontryagin and
Richard Bellman in the 1950s.
Richard Bellman
Richard Bellman was an applied
mathematician, who introduced dynamic
programming in 1953.
Work:
● Bellman Equation
● Curse of dimensionality
● Bellman-Ford algorithm
Animal Learning
● Teaching a dog - positive reward
Animal Learning
● Teaching a dog - penalty (negative reward)
Some questions about RL
● Why do we need to learn Reinforcement Learning?
● What makes Reinforcement Learning spring up like mushrooms?
Backgammon (IBM, 1992)
Temporal difference learning and TD-Gammon, by
Gerald Tesauro, 1992
Backgammon is 雙陸棋 in Chinese.
Source: Wikipedia
Autonomous Helicopter (Stanford, 2000)
Helicopter aerobatics have been studied at Stanford since 2000 by Andrew Ng and
Pieter Abbeel.
You can see more details at: http://heli.stanford.edu/
Deep reinforcement learning in Atari game (2013)
Deep Q-Network (DQN): proposed by V. Mnih et al. It was the first end-to-end reinforcement
learning model to combine deep learning with raw inputs.
Deep reinforcement learning in Atari game (2013)
Deep Reinforcement Learning for Robotic Manipulation
AlphaGo (DeepMind, 2016)
AlphaGo (DeepMind, 2016)
AlphaGo: David Silver, Aja Huang et al. used Monte Carlo Tree Search (MCTS) and
deep reinforcement learning (policy gradient) to master the game of Go.
AlphaGo Zero (DeepMind, 2017)
AlphaGo Zero: David Silver et al. used MCTS and policy iteration with a two-headed
ResNet architecture to learn from scratch without human knowledge.
AlphaGo Zero (DeepMind, 2017)
Dota2 (OpenAI, 2017)
● Beat the world’s top professionals in 1v1 matches
● The bot learned from scratch through self-play
Dota2 (OpenAI, 2017)
Dota2 (OpenAI, 2017)
Alibaba (StarCraft I, multiagent)
Deep RL for Dialogue Generation (Li et al., 2016)
● RL agent generates more interactive responses
● RL agent tends to end a sentence with a question and hand the conversation
over to the user
● Next step: explore intrinsic rewards, large-scale training
From the slides at http://opendialogue.miulab.tw
Challenges of reinforcement learning
● Sparse rewards
● Reward credit assignment
● Large space for exploration (trial and error)
● Imperfect information, partial observation
Active research domains
● Multiagent reinforcement learning
● Hierarchical reinforcement learning
● Inverse reinforcement learning
● Multi-task and transfer learning in reinforcement learning
● Meta learning
● One-shot reinforcement learning
● Deep reinforcement learning in dialogue generation
Research institutes and notable researchers
The RL research scientists you must know!
● Richard S. Sutton
● David Silver
● Pieter Abbeel
● Sergey Levine
Richard S. Sutton
● One of the founding fathers of reinforcement learning
● Professor of Computer Science at the University of Alberta
● Temporal difference learning
● Dyna architecture
David Silver
● Research scientist at DeepMind
● Lead researcher on the AlphaGo and AlphaGo Zero teams
● Ph.D. supervised by Sutton
● Previously a professor at University College London
Pieter Abbeel
● Professor at UC Berkeley
● Director of the UC Berkeley Robot Learning Lab
● Research scientist and advisor at OpenAI
Sergey Levine
● Assistant Professor at UC Berkeley
● Research scientist at Google Brain
● Autonomous robots
Questions?