Introduction to
Reinforcement Learning
Pramod R,
Senior Lead Data Scientist
Fidelity
Types of Machine Learning
Supervised Learning
Learn from labelled data to predict the
right label. E.g. fraudulent transaction
classification, the probability that a customer
purchases a product after seeing an online
ad, etc.
Unsupervised Learning
No labelled data. Instead, it relies on the
underlying patterns in the data to find the
relationships between the data elements. E.g.
marketing segmentation of customers based
on their demographic attributes, finding
product associations, etc.
What is Reinforcement Learning
Modelled loosely on how humans learn: we take an action, receive a reward
for that action and use it to decide which action to take next. E.g. a baby
learning to walk
There is no labelled data, and we do not look for relationships between the data
points. We simply collect the reward at every step and choose the next
action based on the rewards received
The data forms a time sequence following this paradigm:
State→Action→Reward→State→Action
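The State→Action→Reward sequence can be sketched as a small interaction loop. This is a minimal illustration, not the author's code: the `Corridor` environment, its reward of 1 at the last cell, and the `run_episode` helper are all hypothetical names made up for this example.

```python
import random


class Corridor:
    """Hypothetical toy environment: cells 0..length-1, reward 1 at the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clipped to the corridor
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Collect (state, action, reward) triples for one episode."""
    trajectory = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state)                       # State -> Action
        next_state, reward, done = env.step(action)  # Action -> Reward, next State
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


rng = random.Random(0)
random_episode = run_episode(Corridor(), lambda state: rng.choice([-1, 1]))
always_right = run_episode(Corridor(), lambda state: 1)
```

Here `always_right` reaches the goal in four steps, while the random policy usually needs more; a learning agent would use the recorded rewards to move from the second kind of behaviour toward the first.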
Applications of RL
Self Driving Cars
Online Ad Recommendations
Robotics
Chatbot
Medication and treatment planning for patients
Stock Trading
Online Education
Components of Reinforcement Learning - Markov Decision Process
● State S_t: The condition of the environment at time t
● Agent: The model/robot which learns about
the environment and decides the action
● Action A_t: The agent's move, chosen based on
the current state
● Policy π: Mapping from State → Action
● Reward R_t: Feedback received for the action
The central idea of reinforcement learning is to
maximize the expected cumulative reward
Markov Property
For a sequence {q_1, q_2, q_3, q_4, ..., q_n}:
P(q_n | q_{n-1}, q_{n-2}, ..., q_1) = P(q_n | q_{n-1})
Example: Under this assumption, India's chance of winning tomorrow's match depends only on the last match that
India played
“The future is independent of the past given the present”
- Markov
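To make the property concrete, here is a small sketch of a two-state Markov chain for the match example. The transition probabilities are invented for illustration and are not from the slides; the point is only that the next result is computed from the current one, with no reference to earlier matches.

```python
# Hypothetical transition probabilities: P(next result | last result).
transition = {
    "win": {"win": 0.7, "loss": 0.3},
    "loss": {"win": 0.4, "loss": 0.6},
}


def next_distribution(current):
    """Propagate a probability distribution over results one match forward."""
    result = {state: 0.0 for state in transition}
    for state, prob in current.items():
        for nxt, p in transition[state].items():
            result[nxt] += prob * p
    return result


# Start from a certain win in the last match.
after_one = next_distribution({"win": 1.0, "loss": 0.0})
after_two = next_distribution(after_one)
```

With these made-up numbers, the chance of a win is 0.7 after one match and 0.7 * 0.7 + 0.3 * 0.4 = 0.61 after two.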
Basic working of Reinforcement Learning
● Action Space: Left, Right, Jump
● State: Position of Mario, position of the
enemy, places where the reward is, etc.
● Reward: Coins
● Discounted cumulative expected reward: E[R_{t+1} + γ R_{t+2} + γ^2 R_{t+3} + ...]
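As a rough sketch of how a discounted cumulative reward is computed for one observed run of the game: each later reward is multiplied by one more factor of a discount γ between 0 and 1. The reward list and the value γ = 0.5 below are assumptions chosen for easy arithmetic, and `discounted_return` is a hypothetical helper name.

```python
def discounted_return(rewards, gamma):
    """Sum rewards, weighting the k-th future reward by gamma**k."""
    g = 0.0
    # Work backwards so each step is simply: reward now + gamma * (return afterwards).
    for reward in reversed(rewards):
        g = reward + gamma * g
    return g


# Coins collected on three consecutive steps (made-up numbers).
coins = [1.0, 0.0, 2.0]
total = discounted_return(coins, gamma=0.5)  # 1 + 0.5*0 + 0.25*2 = 1.5
```

The expected value in the bullet above is the average of this quantity over many possible runs.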
Types of Reinforcement Learning Algorithms
Multi-Armed Bandits:
● Used in A/B testing of marketing
ads, drug vs. placebo comparisons
in clinical trials, etc.
● Explore-Exploit Dilemma
● Epsilon-Greedy
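A minimal sketch of epsilon-greedy on a bandit, assuming Bernoulli (0 or 1) rewards: with probability epsilon the agent explores a random arm, otherwise it exploits the arm with the best estimate so far. The function name, the arm success rates, and the default parameters are all illustrative assumptions.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit; return pull counts and value estimates."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick any arm at random
        else:
            # exploit: pick the arm with the highest current estimate
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental running average of the rewards seen for this arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


# Three ad variants with made-up click-through rates.
counts, estimates = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

Setting epsilon to 0 removes exploration entirely, which illustrates the dilemma: the agent then sticks with whichever arm looked best first, even if a better one exists.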
Types of Reinforcement Learning Algorithms
Temporal Difference (TD) Learning
Value of a state V(S): Tells us how
good it is to be in state S at time t
Cumulative Discounted Reward:
G_t = R_{t+1} + γ R_{t+2} + γ^2 R_{t+3} + γ^3 R_{t+4} + ...
TD(1):
V(S_t) ← V(S_t) + α (G_t - V(S_t))
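The TD(1) update can be sketched as follows: after an episode ends, compute the discounted return from each visited state and move that state's value a fraction α toward it. Because the target is the full return, this behaves like a Monte Carlo update. The function name, the episode format, and the values of α and γ are assumptions for illustration.

```python
def td1_update(values, episode, alpha=0.1, gamma=0.9):
    """Apply V(S_t) <- V(S_t) + alpha * (G_t - V(S_t)) for every step of one episode.

    episode is a list of (state, reward) pairs, where reward is the
    reward received after leaving that state (R_{t+1}).
    """
    g = 0.0
    # Walk backwards so G_t = R_{t+1} + gamma * G_{t+1} is available at each step.
    for state, reward in reversed(episode):
        g = reward + gamma * g
        old = values.get(state, 0.0)
        values[state] = old + alpha * (g - old)
    return values


# Two-step episode: A -> B -> end, with a reward of 1 on the final transition.
values = td1_update({}, [("A", 0.0), ("B", 1.0)], alpha=0.5, gamma=0.9)
# values["B"] moves halfway to 1.0, values["A"] halfway to 0.9
```

Repeating the update over many episodes lets the estimates settle near the average return observed from each state.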
Learning Resources
● David Silver Reinforcement Learning Videos
● Sutton and Barto - Reinforcement Learning
● Prof. Balaraman Ravindran Videos on Reinforcement Learning
● GitHub repo: Awesome RL
● Deep RL Bootcamp Lectures
