Mridula G Narang
1RV24SCS07
Q-LEARNING
INTRODUCTION
What is Reinforcement Learning?
• A type of Machine Learning where an agent learns to make decisions by interacting with an environment.
• The agent receives rewards or penalties based on its actions and aims to maximize total reward.
• Core elements (see the code sketch below):
• Agent: The learner or decision-maker.
• Environment: Where the agent operates.
• Action: Choices the agent can make.
• State: Current situation returned by the environment.
• Reward: Feedback from the environment.
• Reinforcement Learning ≠ Supervised Learning (no labeled data; learns from experience).
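The interaction loop these elements describe can be made concrete in a few lines of Python. The following is a minimal sketch using a made-up one-dimensional corridor environment; the Corridor class, its states, and its rewards are illustrative assumptions, not taken from the slides:

```python
import random

class Corridor:
    """Toy environment: cells 0..4, start at 0, goal at 4 (illustrative only)."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):                     # action: 0 = left, 1 = right
        self.state = max(0, min(4, self.state + (1 if action == 1 else -1)))
        done = (self.state == 4)                # the episode ends at the goal
        reward = 1.0 if done else -0.1          # small step penalty, reward at the goal
        return self.state, reward, done

env = Corridor()                                # environment
state = env.reset()                             # initial state
total_reward, done = 0.0, False
while not done:                                 # one episode
    action = random.choice([0, 1])              # the (untrained) agent picks an action
    state, reward, done = env.step(action)      # environment returns next state and reward
    total_reward += reward
print("episode return:", total_reward)
```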
INTRODUCTION
What is Q-Learning?
• Q-learning is a model-free, value-based, off-policy algorithm that learns the best sequence of actions to take from the agent's current state.
• The “Q” stands for quality.
• Quality represents how valuable the action is in maximizing future rewards.
• Model-free algorithms learn the consequences of their actions directly from experience, without requiring the environment's transition and reward functions.
• Value-based methods learn a value function that estimates how good each state (or state-action pair) is, and choose actions based on those estimates.
• Policy-based methods train the policy directly to learn which action to take in a given state.
• Off-policy means the algorithm evaluates and improves a policy that differs from the policy it uses to select actions; for example, Q-learning can act ε-greedily while its update targets the greedy (max) action.
TERMINOLOGIES
• State (s): the agent's current situation or position in the environment.
• Action (a): a step taken by the agent in a particular state.
• Reward (r): the feedback the agent receives after each action, either a reward or a penalty.
• Episode: one complete run of interaction that ends when the agent reaches the goal or fails, after which no new actions can be taken.
• Q(St, At): the current estimate of the value of taking action At in state St; this is the entry the update rule revises.
• Q(St+1, a): the estimated Q-value of an action a in the next state St+1; its maximum over actions forms the update target.
• Q-Table: a table maintained by the agent with one Q-value for every state-action pair.
• Temporal Difference (TD): the error r + γ max Q(St+1, a) − Q(St, At) between the update target and the current estimate, used to drive the update.
ALGORITHM
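The algorithm can be summarised as: initialise the Q-table, then repeatedly choose an action, observe the reward and next state, and update the corresponding Q-value. Below is a minimal tabular Q-learning sketch in Python that follows the update rule on the next slide; the toy corridor environment, the hyperparameter values, and the ε-greedy exploration strategy are illustrative assumptions, not taken from the slides:

```python
import random

# Toy corridor environment (same idea as the earlier sketch): states 0..4, goal at 4.
def step(state, action):                        # action: 0 = left, 1 = right
    next_state = max(0, min(4, state + (1 if action == 1 else -1)))
    done = (next_state == 4)
    reward = 1.0 if done else -0.1
    return next_state, reward, done

n_states, n_actions = 5, 2
alpha, gamma, epsilon = 0.1, 0.9, 0.1           # learning rate, discount, exploration rate
Q = [[0.0] * n_actions for _ in range(n_states)]  # Q-table, initialised to zero

for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy: explore with probability epsilon, otherwise act greedily
        if random.random() < epsilon:
            action = random.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: Q[state][a])
        next_state, reward, done = step(state, action)
        # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max Q(s',a') - Q(s,a)]
        best_next = 0.0 if done else max(Q[next_state])
        Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
        state = next_state

print(Q)   # after training, "right" (action 1) should score higher in every state
```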
THE BELLMAN EQUATION
To find the Q-value, we use the update rule based on the Bellman equation:
Q(s, a) ← Q(s, a) + α [r + γ max Q(s', a') − Q(s, a)]
Where:
• Q(s, a): Current Q-value for state s and action a
• α (Learning rate): Impact of new information
• r (Reward): Immediate reward received
• γ (Discount factor): Future reward significance
• max Q(s', a'): Highest Q-value over all actions a' in the next state s'
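As a quick worked example with illustrative numbers (not from the slides): take α = 0.1, γ = 0.9, a current estimate Q(s, a) = 0.5, an immediate reward r = 1, and a best next-state value max Q(s', a') = 0.8. The update gives Q(s, a) ← 0.5 + 0.1 × [1 + 0.9 × 0.8 − 0.5] = 0.5 + 0.1 × 1.22 = 0.622, so the estimate moves a tenth of the way toward its target.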
APPLICATIONS OF Q-LEARNING
Game Strategy Learning
Q-Learning is widely used in training agents to play games such as Tic-Tac-Toe, Chess, and Gridworld.
The agent learns optimal strategies by interacting with the environment and receiving rewards,
allowing it to improve performance over time without needing a model of the game.
Robot Navigation
Robots use Q-Learning to learn how to navigate mazes or physical spaces by trial and error. The agent
learns the best sequence of actions (e.g., move forward, turn) to reach a target location while avoiding
obstacles, even when the environment is partially unknown.
Control Problems (e.g., Cart-Pole Balancing)
In classic control tasks like the Cart-Pole problem, Q-Learning teaches the agent to take actions (like
moving the cart left or right) that keep the pole balanced. These problems are fundamental benchmarks in reinforcement learning; with a suitably discretized state space, Q-Learning can handle the continuous feedback they provide.
ADVANTAGES AND DISADVANTAGES
Advantages:
• Does not require a model of the environment (model-free).
• Can handle stochastic transitions and rewards.
• Converges to the optimal policy given sufficient exploration and learning time.
• Simple to implement and understand.
Disadvantages:
• Not scalable for environments with large state/action spaces (Q-table becomes too big).
• Requires lots of interactions with the environment.
• Balancing exploration and exploitation can be tricky (a common remedy, ε-greedy selection with a decaying ε, is sketched below).
• May converge slowly in complex environments.
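The exploration-exploitation balance mentioned above is commonly handled with ε-greedy action selection: explore at random with probability ε and exploit the best-known action otherwise, often decaying ε over time so the agent explores early and exploits later. A minimal sketch, where the decay schedule and numbers are illustrative assumptions:

```python
import random

def epsilon_greedy(q_row, epsilon):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_row))                     # explore
    return max(range(len(q_row)), key=lambda a: q_row[a])       # exploit

# Decay epsilon across episodes: explore heavily at first, exploit later.
epsilon, epsilon_min, decay = 1.0, 0.05, 0.995
for episode in range(1000):
    epsilon = max(epsilon_min, epsilon * decay)
    # inside the training loop: action = epsilon_greedy(Q[state], epsilon)
```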
THANK YOU