Reinforcement Learning – a Rewards Based Approach to Machine Learning - Marko Lohert - DORS CLUC 2024
Agenda
What is reinforcement learning?
Where is RL used?
What are the advantages of RL?
What algorithms are used in RL?
How to get started?
What is reinforcement learning?
[Figure: action-reward feedback loop of a generic RL model (labels: Action, Reward)]
Reinforcement learning is a branch of machine learning that relies on learning through the mechanism of rewards and punishments.
Policy
How does the agent decide which action to take?
The policy determines the probability that the agent takes action A_t when in state S_t
Policy: π(a|s)
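A minimal sketch of a stochastic policy, assuming a small table of probabilities. The state names, action names, and numbers are hypothetical and chosen only for illustration; `pi[s][a]` plays the role of π(a|s).

```python
import random

# Hypothetical policy table: for each state, a probability for each action.
# pi[s][a] plays the role of π(a|s); the probabilities in each row sum to 1.
pi = {
    "low_battery": {"recharge": 0.9, "explore": 0.1},
    "high_battery": {"recharge": 0.2, "explore": 0.8},
}

def sample_action(policy, state, rng=random):
    """Draw one action for `state` according to the probabilities π(·|state)."""
    actions = list(policy[state].keys())
    weights = list(policy[state].values())
    return rng.choices(actions, weights=weights, k=1)[0]

rng = random.Random(0)
print(sample_action(pi, "low_battery", rng))
```

Over many draws in the state "low_battery", roughly nine out of ten sampled actions would be "recharge".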
Goal == maximize total reward
γ == discount factor
Determines how much less important a reward in the distant future is than a reward in the near future
G_t (return) == total reward in the future
Learning is done in discrete steps
R_k == reward in step k
The number of steps can be fixed (T) or infinite (∞)
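Assuming the common textbook convention G_t = R_{t+1} + γ·R_{t+2} + γ²·R_{t+3} + … (summed up to step T, or without end), the return for a finite list of rewards can be computed as in the sketch below. The function name and the sample rewards are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Return G_t for rewards [R_{t+1}, R_{t+2}, ...] and discount factor gamma."""
    g = 0.0
    # Work backwards so that each step applies G = R + gamma * G_next.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

rewards = [1.0, 0.0, 2.0]
print(discounted_return(rewards, gamma=0.5))  # 1.0 + 0.5*0.0 + 0.25*2.0 = 1.5
```

With γ close to 0 the agent cares almost only about the next reward; with γ == 1 all future rewards count equally.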
Reinforcement learning in the world of AI
Artificial Intelligence
Machine Learning
… …
Supervised learning
Unsupervised learning
Reinforcement learning
Reinforcement learning in the world of ML
Supervised learning vs reinforcement learning
- Supervised learning relies on a labeled data set
Unsupervised learning vs reinforcement learning
- Unsupervised learning == training based on unlabeled data == finding patterns in data
- Reinforcement learning == learning through the mechanism of rewards and punishments
Robotics
RL is used for building robust robots
Industrial robots for more complex applications
Sophisticated grasping strategies, object manipulation techniques, and enhanced hand-eye coordination
RL can be used to teach a robot to walk on two or four legs:
https://www.freethink.com/hard-tech/robot-legs
https://bostondynamics.com/blog/starting-on-the-right-foot-with-reinforcement-learning
https://youtu.be/goxCjGPQH7U
Gaming
RL can be used for testing games
RL can perform many iterations without human input
Reinforcement learning and Atari games
Deep Q-Learning was used to teach an AI to play Atari 2600 games
The AI system was given no domain knowledge about how to play the games (no rules)
The system saw only pixels and was instructed to maximize the score
Implemented for many Atari 2600 games: Pong, Breakout …
In 2013, DeepMind published “Playing Atari with Deep Reinforcement Learning” (Mnih et al.): https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
Game: Breakout
After 240 minutes, the RL system had learned the best strategy:
Create a tunnel and send the ball above the blocks
-> The ball bounces between the roof and the blocks
“The implications go far beyond my beloved chessboard... Not only do these self-taught expert machines perform incredibly well, but we can actually learn from the new knowledge they produce.”
Garry Kasparov, former world chess champion
AlphaGo
Presented in 2015 by Google DeepMind (https://deepmind.google)
The first program to win a match against a world champion in Go
- Chinese strategy board game
- Bigger challenge than chess
AlphaZero
2017 AlphaZero == a single AI system that is an expert in:
Go
Chess
Shogi (Japanese chess)
https://deepmind.google/discover/blog/alphazero-shedding-new-light-on-chess-shogi-and-go
Healthcare
Reinforcement learning is applied to:
- Development of new drugs
- Diagnostics
- Dynamic treatment regimes (DTRs)
- Surgery
- …
Trading and Finance
Reinforcement learning can achieve better results than supervised learning when applied to trading and finance
IBM has developed a sophisticated RL-based platform that is able to make financial trades
Autonomous driving
RL can be used for:
Trajectory optimization
Collision avoidance
Lane changing
Automatic parking
…
More info: https://wayve.ai | https://youtu.be/eRwTbRtnT1I
And other areas …
Data center cooling (Google reduced the energy used for cooling by up to 40%)
News recommendation
Marketing
…
Advantages of Reinforcement Learning
✅ RL can solve complex problems that cannot be solved using other methods.
✅ It functions in dynamic environments.
✅ RL does not need a separate data-preparation step (a difference between RL and supervised learning).
✅ It can be used when the only way to collect data from an environment is for an agent to interact with that environment.
…
Disadvantages of Reinforcement Learning
⚠ Sparse-reward environment - an agent receives a reward only when the goal is reached
It is harder to know which steps were actually useful
Popular solution == reward shaping -> adding hand-crafted rewards to help RL
Hand-crafted rewards require a human expert to design them correctly, and humans can be biased
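Reward shaping can be illustrated with a toy corridor in which only the last cell gives a reward. The sketch below assumes one well-known variant, potential-based shaping, where the bonus is γ·Φ(s') − Φ(s) for a hand-crafted potential Φ. The corridor, the potential function, and all names are hypothetical.

```python
GOAL = 5  # hypothetical corridor with positions 0..5; the goal is at position 5

def sparse_reward(next_state):
    """Reward only when the goal is reached."""
    return 1.0 if next_state == GOAL else 0.0

def potential(state):
    """Hand-crafted guess of how promising a state is: closer to the goal is better."""
    return -abs(GOAL - state)

def shaped_reward(state, next_state, gamma=1.0):
    """Sparse reward plus a potential-based shaping bonus."""
    bonus = gamma * potential(next_state) - potential(state)
    return sparse_reward(next_state) + bonus

print(sparse_reward(3), shaped_reward(2, 3))  # step towards the goal
print(sparse_reward(1), shaped_reward(2, 1))  # step away from the goal
```

The sparse reward is 0.0 for both steps, while the shaped reward already tells the two apart. The quality of that signal depends entirely on how well the expert chose the potential.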
Disadvantages of Reinforcement Learning
⚠ RL needs to collect a lot of data from the environment and requires a lot of computation (it is data hungry)
Not a problem when RL is applied to gaming, because it can play the same game many times and collect a lot of data.
⚠ It can be expensive to learn by trying (and failing)
For example: in robotics, where robots are expensive and can get damaged when used (for learning)
Solution to the disadvantages - general advice
Combine RL with other techniques
For example:
RL + Deep Learning
RL Algorithms
Source: https://spinningup.openai.com/en/latest/spinningup/rl_intro2.html
Q-Learning Algorithm
Most famous RL algorithm
“Q” in “Q-Learning” stands for quality
Example (Python):
https://www.datacamp.com/tutorial/introduction-q-learning-beginner-tutorial
Q-Table
Source: www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python
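A Q-table can be pictured as a lookup table with one row per state and one value per action. The sketch below assumes a tiny hypothetical problem with four states and two actions, and shows how an agent might read the table, either greedily or with some random exploration (epsilon-greedy). All names and numbers are illustrative.

```python
import random

ACTIONS = ["left", "right"]  # hypothetical action set
STATES = range(4)            # hypothetical states 0..3

# Q-table: one row per state, one entry per action, all starting at 0.
q_table = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
q_table[2]["right"] = 0.7  # pretend the agent has already learned these values
q_table[2]["left"] = 0.1

def greedy_action(q, state):
    """Pick the action with the highest Q-value in this state."""
    return max(q[state], key=q[state].get)

def epsilon_greedy(q, state, epsilon, rng=random):
    """With probability epsilon explore randomly, otherwise exploit the table."""
    if rng.random() < epsilon:
        return rng.choice(list(q[state]))
    return greedy_action(q, state)

print(greedy_action(q_table, 2))
```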
Q-Learning Algorithm
Source: https://www.cse.unsw.edu.au/~cs9417ml/RL1/algorithms.html
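A minimal sketch of tabular Q-learning, assuming the standard update Q(s,a) ← Q(s,a) + α·[r + γ·max Q(s',a') − Q(s,a)]. The environment is a hypothetical five-cell corridor invented for this example, and the hyperparameters are arbitrary; it is not the code from the linked tutorials.

```python
import random

N_STATES, GOAL = 5, 4   # hypothetical corridor: positions 0..4, goal at 4
LEFT, RIGHT = 0, 1

def step(state, action):
    """Move one cell; reward 1.0 only on reaching the goal."""
    next_state = max(0, state - 1) if action == LEFT else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit, sometimes explore; ties are broken randomly.
            if rng.random() < epsilon or q[state][LEFT] == q[state][RIGHT]:
                action = rng.choice([LEFT, RIGHT])
            else:
                action = LEFT if q[state][LEFT] > q[state][RIGHT] else RIGHT
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s,a) towards r + gamma * max_a' Q(s',a').
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = train()
policy = ["left" if row[LEFT] > row[RIGHT] else "right" for row in q[:GOAL]]
print(policy)
```

After training, the greedy policy read from the table should point towards the goal in every non-goal cell, even though the agent was never told the rules of the corridor.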
Deep Q-Learning Algorithm
Deep neural network instead of a “simple” Q-Table
Used for large environments
Example (Python):
https://www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python
Deep Q-Learning Algorithm
Source: www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python
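The core idea, replacing the table lookup with a parameterized function Q(s, a; w) that is trained on the same target, can be sketched without any deep learning library. The example below deliberately swaps the deep neural network for a tiny linear model to stay dependency-free, so it illustrates function approximation in general rather than a real Deep Q-Network. The features, names, and numbers are hypothetical.

```python
def features(state, n_states=5):
    """Hypothetical hand-made features of a corridor position: bias and scaled position."""
    return [1.0, state / (n_states - 1)]

def q_value(weights, state, action):
    """Q(s, a) as a dot product of the action's weight vector with the state features."""
    return sum(w * x for w, x in zip(weights[action], features(state)))

def td_update(weights, state, action, reward, next_state, done, alpha=0.1, gamma=0.9):
    """One semi-gradient Q-learning step on the weights (modified in place)."""
    n_actions = len(weights)
    best_next = 0.0 if done else max(q_value(weights, next_state, a) for a in range(n_actions))
    td_error = reward + gamma * best_next - q_value(weights, state, action)
    for i, x in enumerate(features(state)):
        weights[action][i] += alpha * td_error * x
    return td_error

weights = [[0.0, 0.0], [0.0, 0.0]]  # one weight vector per action (0 = left, 1 = right)
before = q_value(weights, 3, 1)
td_update(weights, state=3, action=1, reward=1.0, next_state=4, done=True)
print(before, q_value(weights, 3, 1))
```

Because the weights are shared across states, one update also changes the estimate for states the agent has not visited yet, which is what makes this approach usable when a table would be too large.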
API for reinforcement learning
Python
One Agent is used
Different environments
https://gymnasium.farama.org
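The agent-environment loop used with Gymnasium can be sketched as follows. To keep the example runnable without installing anything, it uses a hypothetical stand-in class, `CoinGuessEnv`, that only mimics the return shapes of Gymnasium's `reset()` (observation, info) and `step()` (observation, reward, terminated, truncated, info); it is not part of Gymnasium.

```python
import random

class CoinGuessEnv:
    """Hypothetical stand-in that mimics the shape of Gymnasium's reset()/step() API."""

    def __init__(self, max_steps=10):
        self.max_steps = max_steps

    def reset(self, seed=None):
        self.rng = random.Random(seed)
        self.steps = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin, {}  # (observation, info)

    def step(self, action):
        reward = 1.0 if action == self.coin else 0.0
        self.steps += 1
        self.coin = self.rng.randint(0, 1)
        terminated = False                         # this toy task has no terminal state
        truncated = self.steps >= self.max_steps   # time limit reached
        return self.coin, reward, terminated, truncated, {}

def run_episode(env, policy, seed=None):
    """Generic agent-environment loop: observe, act, collect reward, repeat."""
    observation, info = env.reset(seed=seed)
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated
    return total_reward

print(run_episode(CoinGuessEnv(), policy=lambda obs: obs, seed=0))
```

With Gymnasium installed, the same loop should work with a real environment, for example `env = gymnasium.make("CartPole-v1")`, using `env.action_space.sample()` as a random policy. The one agent stays the same while the environment is swapped.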
Key points
Reinforcement learning is a branch of machine learning where an agent learns about its environment using the mechanism of rewards and punishments.
RL doesn’t rely on a labeled data set.
RL learns by trial and error through interacting with its environment, so it can reach conclusions / knowledge that humans didn’t reach.
@MarkoLohert



Editor's Notes

  • #31: RL achieves excellent results when applied to complex problems
  • #35: https://youtu.be/Lu56xVlZ40M?si=DtUTUBi8-hpdFzhQ