Reinforcement Learning ⇒ Dynamic Programming ⇒
Markov Decision Process
Subject: Machine Learning
Dr. Varun Kumar
Subject: Machine Learning Dr. Varun Kumar Lecture 9 1 / 16
Outline
1 Introduction to Reinforcement Learning
2 Application of Reinforcement Learning
3 Approach for Studying Reinforcement Learning
4 Basics of Dynamic Programming
5 Markov Decision Process
6 References
Introduction to reinforcement learning:
Key Features
1 There is no supervisor for performing the learning process.
2 Instead of a supervisor, there is a critic that evaluates the end outcome.
3 If the outcome is favorable, the whole process is rewarded; otherwise, the whole process is penalized.
4 This learning process is based on reward and penalty.
5 The critic converts the primary reinforcement signal into a heuristic reinforcement signal.
6 Primary reinforcement signal → the signal observed from the environment.
7 Heuristic reinforcement signal → a higher-quality signal.
Difference between critic and supervisor
Consider a complex system described as follows.
Note
⇒ The critic does not provide a step-by-step solution.
⇒ The critic does not provide any method, training data, suitable learning system, or logical operation for making the necessary correction if the output does not reach the expected value.
⇒ It comments only on the end output, whereas a supervisor helps in many ways.
Block diagram of reinforcement learning
Block diagram
Aim of reinforcement learning
⇒ To minimize the cost-to-go function.
⇒ Cost-to-go function → the expectation of the cumulative cost of actions taken over a sequence of steps, rather than the immediate cost.
⇒ Learning system: it discovers several actions and feeds them back to the environment.
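The cost-to-go idea can be sketched numerically. Below is a minimal Monte Carlo illustration (the function name, the discount factor, and the uniform cost model are all hypothetical, not from the slides): average the discounted cumulative cost over many simulated step sequences instead of looking only at the immediate cost.

```python
import random

def cost_to_go(step_cost, n_steps=50, gamma=0.9, n_runs=2000, seed=0):
    """Average discounted cumulative cost over many simulated trajectories."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        # Cumulative cost of one sequence of steps, discounted by gamma per step.
        run = sum(gamma ** t * step_cost(rng) for t in range(n_steps))
        total += run
    return total / n_runs

# Each step incurs a random cost in [0, 2], so the expected immediate cost is 1,
# while the expected cost-to-go approaches 1 / (1 - gamma) = 10 for long horizons.
estimate = cost_to_go(lambda rng: rng.uniform(0.0, 2.0))
print(round(estimate, 1))
```

The gap between the immediate cost (about 1) and the cost-to-go (about 10) is exactly the distinction the slide draws.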
Application of reinforcement learning
Major application areas
♦ Game theory.
♦ Simulation-based optimization.
♦ Operations research.
♦ Control theory.
♦ Swarm intelligence.
♦ Multi-agent systems.
♦ Information theory.
Note :
⇒ Reinforcement learning is also called approximate dynamic programming.
Approach for studying reinforcement learning
Classical approach: Learning takes place through a process of reward and penalty, with the goal of achieving highly skilled behavior.
Modern approach:
⇒ Based on a mathematical framework, such as dynamic programming.
⇒ It decides on a course of action by considering possible future stages without actually experiencing them.
⇒ It emphasizes planning.
⇒ It is a credit-assignment problem.
⇒ Credit or blame is distributed among the interacting decisions.
Dynamic programming
Basics
⇒ How can an agent/decision maker/learning system improve its long-term performance in a stochastic environment?
⇒ Attaining improved long-term performance without disrupting short-term performance.
Markov decision process (MDP)
Markov decision process (MDP):
Key features of MDP
♦ The environment is modeled through a probabilistic framework; some known probability mass function (pmf) may form the basis for this modeling.
♦ It consists of a finite set of discrete states.
♦ States do not carry any past statistics.
♦ A set of discrete sample data is generated through a well-defined pmf.
♦ For each environmental state, there is a finite set of possible actions that may be taken by the agent.
♦ Every time the agent takes an action, a certain cost is incurred.
♦ States are observed, actions are taken, and costs are incurred at discrete times.
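These ingredients can be written down as plain data. The following is a minimal sketch (state names, action names, and all probabilities are illustrative, not from the slides): a finite state set, a finite action set per state, a transition pmf for each state-action pair, and a cost per action.

```python
# Hypothetical two-state MDP expressed as plain Python data.
mdp = {
    "states": ["s1", "s2"],
    "actions": {"s1": ["stay", "go"], "s2": ["stay"]},
    # transition[(state, action)] is a pmf over the next state.
    "transition": {
        ("s1", "stay"): {"s1": 0.9, "s2": 0.1},
        ("s1", "go"):   {"s1": 0.2, "s2": 0.8},
        ("s2", "stay"): {"s2": 1.0},
    },
    # cost[(state, action)] is the cost incurred when the agent acts.
    "cost": {("s1", "stay"): 1.0, ("s1", "go"): 5.0, ("s2", "stay"): 0.0},
}

# Each transition pmf must sum to one over the next states; it depends only
# on the current state and action, never on earlier history.
for pmf in mdp["transition"].values():
    assert abs(sum(pmf.values()) - 1.0) < 1e-9
print("all transition pmfs sum to 1")
```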
Continued–
An MDP operates in a stochastic environment; it is essentially a random process.
The decision (action) is a time-dependent random variable.
Mathematical description:
⇒ Si is the ith state at sample instant n.
⇒ Sj is the next state at sample instant n + 1.
⇒ pij is known as the transition probability, ∀ 1 ≤ i ≤ K and 1 ≤ j ≤ K:
pij(Ai) = P(Xn+1 = Sj | Xn = Si, An = Ai)
⇒ Ai is the ith action taken by the agent at sample instant n.
Markov chain rule
The Markov chain rule is based on the partition theorem.
Statement of the partition theorem: Let B1, ..., BN form a partition of Ω; then for any event A,
P(A) = Σ_{i=1}^{N} P(A ∩ Bi) = Σ_{i=1}^{N} P(A | Bi) P(Bi)
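The partition theorem can be checked on a small example. The sketch below uses a fair six-sided die (the choice of partition and event is illustrative): B1 = {1, 2, 3} and B2 = {4, 5, 6} partition Ω, and A is the event "the roll is even".

```python
from fractions import Fraction as F

# Fair six-sided die; B1, B2 partition the sample space omega.
omega = {1, 2, 3, 4, 5, 6}
B = [{1, 2, 3}, {4, 5, 6}]
A = {2, 4, 6}  # "roll is even"

def P(event):
    """Probability of an event under the uniform distribution on omega."""
    return F(len(event & omega), len(omega))

direct = P(A)
# P(A) = sum_i P(A ∩ Bi) = sum_i P(A | Bi) P(Bi)
via_intersections = sum(P(A & Bi) for Bi in B)
via_conditionals = sum((P(A & Bi) / P(Bi)) * P(Bi) for Bi in B)
assert direct == via_intersections == via_conditionals == F(1, 2)
print(direct)  # → 1/2
```

Both decompositions recover P(A) exactly, which is all the partition theorem claims.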
Markov property
1 The basic property of a Markov chain is that only the most recent point in the trajectory affects what happens next:
P(Xn+1 | Xn, Xn−1, ..., X0) = P(Xn+1 | Xn)
2 Transition matrix (stochastic matrix):
P = [ p11  p12  ...  p1K ]
    [ p21  p22  ...  p2K ]
    [ ...  ...  ...  ... ]
    [ pK1  pK2  ...  pKK ]
⇒ Each row sums to unity → Σ_j pij = 1
⇒ That is, p11 + p12 + ... + p1K = 1, but a column sum such as p11 + p21 + ... + pK1 need not equal 1.
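The row-sum property is easy to verify numerically. A sketch with an illustrative 3-state matrix (the values are hypothetical): each row is a pmf over the next state, so rows sum to one while columns carry no such constraint.

```python
# Illustrative 3x3 row-stochastic transition matrix.
P = [
    [0.5, 0.5, 0.0],
    [0.1, 0.6, 0.3],
    [0.0, 0.2, 0.8],
]

# Each row is a pmf over the next state, so every row sums to 1 ...
row_sums = [sum(row) for row in P]
assert all(abs(s - 1.0) < 1e-12 for s in row_sums)

# ... but the column sums are unconstrained.
col_sums = [sum(P[i][j] for i in range(3)) for j in range(3)]
print([round(s, 3) for s in col_sums])  # → [0.6, 1.3, 1.1]
```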
Continued–
3 n-step transition probability:
Statement: Let X0, X1, X2, ... be a Markov chain with state space S = {1, 2, ..., N}. Recall that the elements of the transition matrix P are defined as:
pij = P(X1 = j | X0 = i) = P(Xn+1 = j | Xn = i) for any n.
⇒ pij is the probability of making a transition from state i to state j in a single step.
Q What is the probability of making a transition from state i to state j over two steps? In other words, what is P(X2 = j | X0 = i)?
Ans The (i, j) entry of P², i.e. pij^(2) = Σ_k pik pkj.
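This can be confirmed with a small matrix product. A sketch (the 2-state matrix is illustrative): the two-step probability P(X2 = j | X0 = i) is the (i, j) entry of P², obtained by summing over the intermediate state k.

```python
# Illustrative 2-state transition matrix.
P = [
    [0.5, 0.5],
    [0.2, 0.8],
]

def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P2 = matmul(P, P)  # P2[i][j] = P(X2 = j | X0 = i)

# Direct check for i = 0, j = 1: sum over the intermediate state k.
direct = sum(P[0][k] * P[k][1] for k in range(2))
assert abs(P2[0][1] - direct) < 1e-12
print(round(P2[0][1], 10))  # → 0.65
```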
References
E. Alpaydin, Introduction to Machine Learning. MIT Press, 2020.
J. Grus, Data Science from Scratch: First Principles with Python. O'Reilly Media, 2019.
T. M. Mitchell, The Discipline of Machine Learning. Carnegie Mellon University, School of Computer Science, 2006, vol. 9.
S. Haykin, Neural Networks and Learning Machines, 3/E. Pearson Education India, 2010.