Reinforcement Learning
BY:
DR. SUKHNANDAN KAUR
ASSISTANT PROFESSOR,
CSED, TIET
Reinforcement Learning
How can an agent learn behaviors when it doesn’t have a
supervisor to tell it how to perform?
This problem is called reinforcement learning.
Rewards:
Positive / Negative
Reinforcement Learning (cont.)
The goal is to get the agent to act in the world so as
to maximize its rewards.
The agent has to figure out which of its actions led to the
reward or punishment (the credit assignment problem).
Components of Reinforcement Learning
1. Agent
2. Environment
3. Rewards
4. State
5. Action(s)
Q-Learning
The agent can derive an optimal policy directly from its interaction with the
environment, without first building a model of that environment.
Q-learning is a model-free learning technique that can be
used to find the optimal action-selection policy using a Q
function.
The optimal Q function gives, for each possible state-action pair, the largest
expected return achievable by any policy π.
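Written as a formula, and assuming (the slide does not say this) that the return is the sum of future rewards discounted by a factor γ with 0 ≤ γ < 1, the optimal Q function is:

```latex
Q^{*}(s, a) = \max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1} \;\middle|\; s_t = s,\ a_t = a \right]
```

A policy that picks \arg\max_{a} Q^{*}(s, a) in every state s is then optimal.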
Reinforcement Learning
In reinforcement learning we want to learn a function Q(S, A) that scores each action A in a
state S, so that choosing the best-scoring actions yields the maximum cumulative reward.
cumulative reward 1 = Q(s1, a1) + Q(s2, a1) = -1 + 0 = -1
cumulative reward 2 = Q(s1, a2) + Q(s2, a2) = 1 + 0.5 = 1.5
cumulative reward 3 = Q(s1, a2) + Q(s2, a1) = 1 + 0 = 1
cumulative reward 4 = Q(s1, a1) + Q(s2, a2) = -1 + 0.5 = -0.5
maximum cumulative reward = max(cumulative reward 1, cumulative reward 2, cumulative reward 3, cumulative reward 4) = 1.5,
obtained by taking a2 in both s1 and s2
Q table:
        a1    a2
  s1    -1    1
  s2     0    0.5
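The enumeration of the four combinations can be checked with a short sketch. It uses the Q table from this slide and sums entries exactly as the slide's toy example does; the names `Q`, `cumulative_rewards`, `totals`, and `best` are illustrative only, and this sum is the slide's simplification rather than the Q-learning update itself.

```python
from itertools import product

# Q table from the slide: Q[state][action]
Q = {
    "s1": {"a1": -1.0, "a2": 1.0},
    "s2": {"a1": 0.0, "a2": 0.5},
}

def cumulative_rewards(q):
    """Map (action in s1, action in s2) -> Q(s1, .) + Q(s2, .) for every combination."""
    actions = list(q["s1"])
    return {
        (a, b): q["s1"][a] + q["s2"][b]
        for a, b in product(actions, repeat=2)
    }

totals = cumulative_rewards(Q)
best = max(totals, key=totals.get)  # combination with the largest sum
print(totals)
print(best, totals[best])  # ('a2', 'a2') 1.5
```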
How Does Q-Learning Work?
Initialize Q
Choose action from Q
Take action
Measure reward
Update Q
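The cycle of steps can be sketched as tabular Q-learning on a small made-up problem. Everything concrete here is an assumption for illustration: the 5-state corridor, the hypothetical `step` function, the ε-greedy way of choosing an action, and the parameters `alpha` (learning rate), `gamma` (discount factor), and `epsilon` (exploration rate). The update line is the standard Q-learning rule Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') − Q(s, a)], which the slide does not spell out.

```python
import random

# Hypothetical 5-state corridor: states 0..4, state 4 is the terminal goal.
# Actions: 0 = move left, 1 = move right. Reaching the goal pays 1, every other step pays 0.
N_STATES, GOAL = 5, 4
ACTIONS = (0, 1)

def step(state, action):
    """Deterministic transition: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Initialize Q: one row per state, one column per action
    Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Choose action from Q (epsilon-greedy, ties broken at random)
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                best = max(Q[s])
                a = rng.choice([x for x in ACTIONS if Q[s][x] == best])
            # Take action, then observe the reward and next state
            s2, r, done = step(s, a)
            # Update Q: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning()
policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(GOAL)]
print(policy)  # [1, 1, 1, 1]: always move right
```

Because this toy environment is deterministic and all rewards are non-negative, the Q estimates only grow toward their true values, so after a couple of hundred episodes the greedy policy reliably points toward the goal.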
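Example
The user is in State 1 and can take one of two actions: "Go to house" or "Retrieve Flower".
Q(State = 1, Action = "Retrieve Flower") = 0.5
Q(State = 1, Action = "Go to house") = 3.0
Because 3.0 > 0.5, an agent acting greedily on these Q values chooses "Go to house" in State 1.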
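The greedy choice in this example takes only a few lines. The dictionary holds the two Q values from the slide; `greedy_action` is a hypothetical helper name.

```python
# Q values for State 1, taken from the example
q_state_1 = {"Retrieve Flower": 0.5, "Go to house": 3.0}

def greedy_action(q_values):
    """Return the action with the highest Q value (ties go to the first key listed)."""
    return max(q_values, key=q_values.get)

print(greedy_action(q_state_1))  # Go to house
```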
Reactive Agent Algorithm
Accessible or observable state
Repeat:
  s ← sensed state
  If s is terminal then exit
  a ← choose action (given s)
  Perform a
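A minimal Python rendering of the reactive loop, assuming a hypothetical `CountdownEnv` whose `sense`, `is_terminal`, and `perform` methods stand in for the agent's sensors and actuators; `choose_action` can be any function of the sensed state.

```python
from typing import Callable, List

class CountdownEnv:
    """Hypothetical environment: the state is an integer, and 0 is terminal."""
    def __init__(self, start: int):
        self.state = start
        self.log: List[str] = []  # actions performed, for inspection

    def sense(self) -> int:
        return self.state

    def is_terminal(self, s: int) -> bool:
        return s == 0

    def perform(self, action: str) -> None:
        self.log.append(action)
        if action == "decrement":
            self.state -= 1

def run_reactive_agent(env: CountdownEnv, choose_action: Callable[[int], str]) -> None:
    while True:                    # Repeat:
        s = env.sense()            #   s <- sensed state
        if env.is_terminal(s):     #   if s is terminal then exit
            return
        a = choose_action(s)       #   a <- choose action (given s)
        env.perform(a)             #   perform a

env = CountdownEnv(start=3)
run_reactive_agent(env, lambda s: "decrement")
print(env.log)  # ['decrement', 'decrement', 'decrement']
```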
Reactive Agent Algorithm using Reinforcement Learning
Repeat:
  s ← sensed state
  If s is terminal then exit
  a ← P(s)   /* Choose action using policy P */
  Perform a
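One common way to obtain the policy P(s), assumed here rather than stated on the slide, is to act greedily on a learned Q table. The Q values, the positions 0 to 3 (with 3 terminal), and the movement rule below are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical learned Q table for positions 0..2 (position 3 is terminal)
Q: Dict[int, Dict[str, float]] = {
    0: {"left": 0.1, "right": 0.7},
    1: {"left": 0.5, "right": 0.8},
    2: {"left": 0.6, "right": 0.9},
}

def make_greedy_policy(q: Dict[int, Dict[str, float]]) -> Callable[[int], str]:
    """Return P(s) = the action with the largest Q(s, a)."""
    def P(s: int) -> str:
        return max(q[s], key=q[s].get)
    return P

def run_agent(start: int, P: Callable[[int], str], terminal: int = 3) -> List[str]:
    s, trace = start, []
    while True:
        if s == terminal:          # if s is terminal then exit
            return trace
        a = P(s)                   # a <- P(s): choose action using the policy
        trace.append(a)
        s = s + 1 if a == "right" else max(0, s - 1)  # perform a (hypothetical dynamics)

print(run_agent(0, make_greedy_policy(Q)))  # ['right', 'right', 'right']
```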
Approaches
• Learn a policy directly: a function mapping from states to actions
  Q(S, A)
  where S = {s1, s2, s3, s4} and A = {a1, a2, a3}
• Learn utility values for states (i.e., the value function)
  If the outcome of performing an action at a state is deterministic, then the
  agent can update the utility value U() of states:
  ◦ U(old state) = reward + U(new state)
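Both approaches can be sketched briefly. The first part builds a Q table over the slide's state set {s1, s2, s3, s4} and action set {a1, a2, a3}, one entry per state-action pair. The second part backs utilities up along a hypothetical deterministic, undiscounted episode: the utility of the state the agent left equals the reward received plus the utility of the state it reached. The states A, B, C, END and their rewards are made up for illustration.

```python
# Approach 1: Q(S, A) as a table with one entry per state-action pair
S = ["s1", "s2", "s3", "s4"]
A = ["a1", "a2", "a3"]
Q = {(s, a): 0.0 for s in S for a in A}
print(len(Q))  # 12

# Approach 2: utility values for states.
# Hypothetical deterministic episode as (old_state, reward, new_state) transitions.
episode = [("A", -1.0, "B"), ("B", -1.0, "C"), ("C", 10.0, "END")]

def back_up_utilities(transitions, terminal="END"):
    """Apply U(old) = reward + U(new), last transition first,
    so every new_state already has a utility when it is used."""
    U = {terminal: 0.0}
    for old, reward, new in reversed(transitions):
        U[old] = reward + U[new]
    return U

print(back_up_utilities(episode))  # {'END': 0.0, 'C': 10.0, 'B': 9.0, 'A': 8.0}
```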
Exploration / Exploitation policy
Wacky approach (exploration): act randomly in the
hope of eventually exploring the entire environment
Greedy approach (exploitation): act to maximize
utility using the current estimate
Reasonable balance: act more wacky (exploratory)
when the agent has little idea of the environment, and more
greedy as its estimates approach the correct ones.
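One widely used way to strike this balance, not named on the slide, is ε-greedy action selection with a decaying ε: with probability ε the agent acts randomly (exploration), otherwise it takes the best-valued action (exploitation). The schedule parameters `start`, `end`, and `decay`, and the example Q values, are assumptions chosen for illustration.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action; otherwise pick the best one."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)

def decayed_epsilon(episode, start=1.0, end=0.05, decay=0.99):
    """Fully exploratory at first, then shrink epsilon geometrically down to a floor."""
    return max(end, start * decay ** episode)

rng = random.Random(42)
q = {"left": 0.2, "right": 0.8}
for ep in (0, 100, 500):
    eps = decayed_epsilon(ep)
    picks = [epsilon_greedy(q, eps, rng) for _ in range(1000)]
    # The share of greedy "right" picks rises as epsilon falls
    print(ep, round(eps, 3), picks.count("right") / 1000)
```

Keeping a small floor on ε means the agent never stops exploring completely, which helps if its current estimates are still wrong.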
Summary
Reinforcement learning is an active area of research.
It is applicable to game playing, robot controllers, and
other domains.