Chapter 1: Introduction
Seungjae Ryan Lee
● Learning by interacting with the environment
● Goal: maximize a numerical reward signal by choosing correct actions
○ Trial and error: learner is not told the best action
○ Delayed rewards: actions can affect all future rewards
Reinforcement Learning
● No external supervisor / teacher
○ No training set with labeled examples (answers)
○ Need to interact with environment in uncharted territories
● Different goals
○ Supervised Learning: Generalize existing data to minimize test set error
○ Reinforcement Learning: Maximize reward through interactions
○ Unsupervised Learning: Find hidden structure
→ Reinforcement Learning is a third paradigm of Machine Learning, alongside Supervised and Unsupervised Learning
vs. Supervised and Unsupervised Learning
● Interactions between agent and environment
● Uncertainty about the environment
○ Effects of actions cannot be fully predicted
○ Monitor environment and react appropriately
● Defined goal
○ Judge progress through rewards
● Present affects the future
○ Effect can be delayed
● Experience improves performance
Characteristics of Reinforcement Learning
● Complex sequence of interactions to achieve goal
● Need to observe and react to the uncertainty of the environment
○ Grab different bowl if current bowl is dirty
○ Stop pouring if the bowl is about to overflow
● Actions have delayed consequences
○ Failing to get spoon does not matter until you start eating
● Experience improves performance
Example: Preparing Breakfast
[Figure: breakfast steps: Get cereal, Get bowl, Get spoon, Get milk, Pour cereal, Pour milk, Grab spoon]
● Exploration: Try different actions
● Exploitation: Choose best known action
● Need both to obtain high reward
Exploration vs Exploitation
[Figure: an agent deciding between "Try a new cereal?" and "Eat the usual cereal?"]
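One common way to balance the two is an epsilon-greedy rule, sketched below. The slide does not name a specific method; the function name `epsilon_greedy`, the value estimates, and the choice of epsilon = 0.1 are assumptions made for illustration.

```python
import random

def epsilon_greedy(estimates, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the best-known one."""
    if rng.random() < epsilon:
        return rng.choice(list(estimates))      # exploration: try any action
    return max(estimates, key=estimates.get)    # exploitation: best known action

# Hypothetical value estimates for the cereal example.
estimates = {"usual cereal": 0.6, "new cereal": 0.0}
rng = random.Random(0)
choices = [epsilon_greedy(estimates, 0.1, rng) for _ in range(1000)]
```

With a small epsilon the agent mostly eats the usual cereal but still samples the new one occasionally, which is how it could discover that its estimate for the new cereal is too low.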
● Policy defines the agent’s behavior
● Reward Signal defines the goal of the problem
● Value Function indicates the long-term desirability of states
● Model of the environment mimics the behavior of the environment
Elements of Reinforcement Learning
● Mapping from observation to action
● Defines the agent’s behavior
● Can be stochastic
Policy
[Figure: a policy mapping states s1, s2, s3, s4 in S to actions a1, a2 in A]
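A minimal sketch of a policy as a lookup table, assuming a small discrete state and action set. The state and action names follow the figure labels; the specific mappings and probabilities are made up for illustration.

```python
import random

# Deterministic policy: each state maps to exactly one action.
deterministic_policy = {"s1": "a1", "s2": "a1", "s3": "a2", "s4": "a2"}

# Stochastic policy: each state maps to a probability distribution over actions.
# The probabilities are hypothetical.
stochastic_policy = {
    "s1": {"a1": 0.9, "a2": 0.1},
    "s2": {"a1": 0.5, "a2": 0.5},
    "s3": {"a1": 0.2, "a2": 0.8},
    "s4": {"a1": 1.0, "a2": 0.0},
}

def sample_action(policy, state, rng):
    """Draw an action for `state` according to the policy's probabilities."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

rng = random.Random(0)
action = sample_action(stochastic_policy, "s1", rng)
```

A deterministic policy is the special case where one action has probability 1 in every state, as in `s4` above.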
● Reward
○ Immediate reward of action
○ Defines good/bad events for the agent
○ Given by the environment
● Value Function
○ Expected sum of future rewards starting from a state
○ Long-term desirability of states
○ Difficult to estimate
○ Primary basis of choosing action
Reward Signal vs. Value Function
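The gap between immediate reward and long-term value can be illustrated with a small sketch that sums the rewards along one trajectory. The discount factor `gamma` is an assumption not mentioned on the slide; with `gamma=1.0` the function returns the plain sum of future rewards. The reward sequence is hypothetical.

```python
def discounted_return(rewards, gamma=1.0):
    """Sum of future rewards, each discounted by gamma per step of delay."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Hypothetical breakfast episode: no reward until the agent finally eats.
rewards = [0.0, 0.0, 0.0, 1.0]
immediate_reward = rewards[0]               # reward for the first action alone
return_from_start = discounted_return(rewards)
```

Here the first action earns no immediate reward, yet the return from the starting state is positive. A value function estimates the expectation of this return, which is why it is harder to estimate than the reward and more useful for choosing actions.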
● Mimics the behavior of the environment
● Allows planning a future course of actions
● Not necessary for all RL methods
○ Model-based methods use the model for planning
○ Model-free methods only use trial-and-error
Model
https://worldmodels.github.io/
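A minimal sketch of how a model enables planning, assuming a tiny deterministic model stored as a dictionary. The states, actions, rewards, and the `plan` function are all hypothetical; real models are usually stochastic and learned or given.

```python
# Hypothetical deterministic model: (state, action) -> (next_state, reward).
model = {
    ("hungry", "pour cereal"): ("bowl full", 0.0),
    ("hungry", "skip breakfast"): ("still hungry", -1.0),
    ("bowl full", "eat"): ("fed", 1.0),
}

def plan(model, state, depth):
    """Exhaustive lookahead in the model: return (best total reward, best first action)."""
    options = [(a, nxt, r) for (s, a), (nxt, r) in model.items() if s == state]
    if depth == 0 or not options:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for action, next_state, reward in options:
        future, _ = plan(model, next_state, depth - 1)
        if reward + future > best_value:
            best_value, best_action = reward + future, action
    return best_value, best_action

best_value, best_action = plan(model, "hungry", depth=2)
```

The planner never acts in the real environment; it only queries the model. A model-free method would instead have to try the actions and learn from the rewards it actually receives.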
● Assume imperfect opponent
● Agent needs to find and exploit imperfections
Example: Tic Tac Toe
Tic Tac Toe with Reinforcement Learning
● Initialize the value of every state to 0.5 (except terminal states: 1 for a win, 0 for a loss or draw)
● Learn by playing games
○ Move greedily most times, but explore sometimes
● Incrementally update value functions by playing games
● Decrease learning rate over time to converge
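The steps above can be sketched as a small tabular learner that plays X against a random (and therefore imperfect) opponent. This is a sketch under stated assumptions, not the deck's own implementation: the update used is the temporal-difference rule V(s) ← V(s) + α [V(s') − V(s)] from Sutton and Barto's tic-tac-toe example, while the board encoding, the function names, the random opponent, the exploration rate of 0.1, and the learning-rate schedule are all hypothetical choices.

```python
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def initial_value(board):
    """1 for an X win, 0 for a loss or draw, 0.5 for every other state."""
    w = winner(board)
    if w == "X":
        return 1.0
    if w == "O" or " " not in board:
        return 0.0
    return 0.5

def place(board, i, mark):
    return board[:i] + mark + board[i + 1:]

def td_update(values, state, next_state, alpha):
    """Move V(state) a fraction alpha of the way toward V(next_state)."""
    values[state] += alpha * (values[next_state] - values[state])

def play_game(values, alpha, epsilon, rng):
    """Play one game as X against a random opponent, updating `values` in place."""
    board, prev = " " * 9, None          # prev: the state after X's previous move
    while True:
        candidates = [place(board, i, "X") for i, c in enumerate(board) if c == " "]
        for s in candidates:
            values.setdefault(s, initial_value(s))
        explore = rng.random() < epsilon
        board = rng.choice(candidates) if explore else max(candidates, key=values.get)
        if prev is not None and not explore:   # no learning from exploratory moves
            td_update(values, prev, board, alpha)
        prev = board
        if winner(board) or " " not in board:
            return winner(board)
        empty = [i for i, c in enumerate(board) if c == " "]
        board = place(board, rng.choice(empty), "O")   # opponent moves at random
        if winner(board):
            values.setdefault(board, initial_value(board))
            td_update(values, prev, board, alpha)      # back up the loss
            return winner(board)

def train(games=20000, seed=0):
    rng, values = random.Random(seed), {}
    for g in range(games):
        alpha = 0.5 / (1 + g / 2000)     # hypothetical decaying learning rate
        play_game(values, alpha, 0.1, rng)
    return values
```

Calling `train()` returns a table mapping board strings to estimated winning probabilities. Because each update is a weighted average of two values in [0, 1], every estimate stays in that range.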
● Minimax algorithm
○ Assumes best play for opponent → Cannot exploit opponent
● Classic optimization
○ Requires complete specification of opponent → Impractical
○ ex. Dynamic Programming
● Evolutionary methods
○ Search the space of policies directly, judging each policy only by the outcomes of whole games
○ Ignores useful structure of RL problems
○ Works best when good policy can be found easily
Tic Tac Toe with other algorithms
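For comparison, a minimal sketch of minimax on tic-tac-toe, assuming a board encoded as a 9-character string with X moving first. The encoding and function names are hypothetical. The point is that minimax scores every position as if the opponent played perfectly, so it has no way to prefer moves that exploit a weak opponent.

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def minimax(board, player):
    """Value for X (+1 win, 0 draw, -1 loss) with `player` to move and best play by both."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if " " not in board:
        return 0
    other = "O" if player == "X" else "X"
    values = [minimax(board[:i] + player + board[i + 1:], other)
              for i, cell in enumerate(board) if cell == " "]
    # X maximizes the value; O is assumed to minimize it.
    return max(values) if player == "X" else min(values)

empty_board_value = minimax(" " * 9, "X")
```

The empty board evaluates to 0, a draw, because both sides are assumed to play optimally. Against an imperfect opponent some openings win more often than others, but minimax treats them as equal.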
● Can be applied to:
○ more complex games (ex. Backgammon)
○ problems without opponents (“games against nature”)
○ problems with partially observable environments
○ non-episodic problems
○ continuous-time problems
Reinforcement Learning beyond Tic-Tac-Toe
https://blog.openai.com/evolution-strategies/
Thank you!
Original content from
● Reinforcement Learning: An Introduction by Sutton and Barto
You can find more content in
● github.com/seungjaeryanlee
● www.endtoend.ai
Reinforcement Learning 1. Introduction