Reinforcement Learning – a Rewards Based Approach to Machine Learning - Marko Lohert - DORS CLUC 2024

Agenda
What is reinforcement learning?
Where is RL used?
What are the advantages of RL?
What algorithms are used in RL?
How to get started?

What is reinforcement learning?
[Figure: action-reward feedback loop of a generic RL model; labels: Reward, Action]

Reinforcement learning is a branch
of machine learning that relies on
learning through the mechanism of
rewards and punishments.

Policy
How does the agent decide which action to take?
The policy determines the probability that the agent takes action A_t when in state S_t
Policy: π(a|s)
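
A minimal sketch of a policy π(a|s) as a lookup of action probabilities per state, with one function that samples an action from it. The states, actions, and probabilities are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy policy: for each state, a probability for each action.
# The state and action names are illustrative only.
policy = {
    "low_battery": {"recharge": 0.9, "explore": 0.1},
    "full_battery": {"recharge": 0.2, "explore": 0.8},
}


def action_probability(policy, state, action):
    """Return pi(a|s): the probability of taking `action` in `state`."""
    return policy[state].get(action, 0.0)


def sample_action(policy, state, rng=random):
    """Draw one action at random according to pi(.|state)."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

The probabilities for each state sum to 1, so `sample_action` always returns one of the actions listed for that state.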

Goal == maximize total reward
G_t (return) == total reward in the future
R_k == reward in step k
𝜸 == discount factor
Determines how much less important a reward in the distant future is
than a reward in the near future
Learning is done in discrete steps
The number of steps can be fixed (T) or infinite (∞)
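
A minimal sketch of how the return could be computed for a finite number of steps, assuming the common textbook form G_t = R_{t+1} + 𝜸·R_{t+2} + 𝜸²·R_{t+3} + … (the exact indexing is an assumption). The function name is hypothetical.

```python
def discounted_return(rewards, gamma):
    """Compute G_t = R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ...

    `rewards` lists the rewards received after step t, in order.
    """
    g = 0.0
    # Work backwards so each step needs one multiply and one add.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With rewards 1, 1, 1 and 𝜸 = 0.5 this gives 1 + 0.5 + 0.25 = 1.75. With 𝜸 = 1 all rewards count equally, and with 𝜸 = 0 only the next reward counts.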

Reinforcement learning in the world of AI
Artificial Intelligence
Machine Learning
… …
Supervised learning
Unsupervised learning
Reinforcement learning

Reinforcement learning in the world of ML
Supervised learning vs reinforcement learning
- Supervised learning relies on a labeled data set
Unsupervised learning vs reinforcement learning
- Unsupervised learning == training based on unlabeled data == finding patterns in data
- Reinforcement learning == learning through the mechanism of rewards and punishments

Robotics
RL is used for building robust robots
Industrial robots for more complex applications
Sophisticated grasping strategies, object manipulation techniques, and
enhanced hand-eye coordination

RL can be used to teach a robot to walk on two or four legs
https://www.freethink.com/hard-tech/robot-legs
https://bostondynamics.com/blog/starting-on-the-right-foot-with-reinforcement-learning
https://youtu.be/goxCjGPQH7U

Gaming
RL can be used for testing games
RL can perform many iterations
without human input

Reinforcement learning and Atari games
Deep Q-Learning was used to teach an AI system to play Atari 2600 games

The AI system was given no domain knowledge (rules) about how to play the games
The system only saw pixels and was instructed to maximize points
Implemented for many Atari 2600 games: Pong, Breakout …
In 2013, DeepMind published "Playing Atari with Deep Reinforcement
Learning" (Mnih et al.): https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf

Game: Breakout
After 240 minutes, the RL system had learned the
best strategy:
Create a tunnel and send the ball above the blocks
-> The ball bounces between the roof and the blocks

"The implications go far beyond my
beloved chessboard... Not only do these
self-taught expert machines perform
incredibly well, but we can actually learn
from the new knowledge they produce."
Garry Kasparov
former world chess champion

AlphaGo
Presented in 2015 by Google
DeepMind (https://deepmind.google)
The first program to win a match
against a world champion in Go
- Go is a Chinese strategy board game
- A bigger challenge than chess

AlphaZero
2017: AlphaZero == a single AI system that is an expert in:
Go
Chess
Shogi (Japanese chess)
https://deepmind.google/discover/blog/alphazero-shedding-new-light-on-chess-shogi-and-go

Healthcare
Reinforcement learning is applied to:
- Development of new drugs
- Diagnostics
- Dynamic treatment regimes (DTRs)
- Surgery
- …

Trading and Finance
Reinforcement learning can achieve better
results than supervised learning when
applied to trading and finance
IBM has developed a sophisticated RL-based
platform that has the ability to make
financial trades

Autonomous driving
RL can be used for:
Trajectory optimization
Collision avoidance
Lane changing
Automatic parking
…

More info: https://wayve.ai | https://youtu.be/eRwTbRtnT1I

And other areas …
Data center cooling (Google reduced the energy used for cooling by up to 40%)
News recommendation
Marketing
…

Advantages of Reinforcement Learning
✅RL can solve complex problems that cannot be solved using other
methods.
✅It functions in dynamic environments
✅RL does not need a separate data preparation step
(this is a difference between RL and supervised learning)
✅It can be used when the only way to collect data from an environment is
for an agent to interact with that environment
…

Disadvantages of Reinforcement Learning
⚠ Sparse-reward environment: an agent receives a reward only when the
goal is reached
Harder to know which steps were actually useful
Popular solution == reward shaping -> adding hand-crafted
rewards to help RL
Hand-crafted rewards require a human expert to design them
correctly, and humans can be biased
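
A minimal sketch of the difference between a sparse reward and a shaped reward, assuming a hypothetical agent that moves along a 1-D line toward a goal position. The goal, the bonus size, and the function names are illustrative assumptions, not taken from any library.

```python
GOAL = 5  # hypothetical goal position on a 1-D line


def sparse_reward(next_pos):
    """Reward only when the goal is reached."""
    return 1.0 if next_pos == GOAL else 0.0


def shaped_reward(pos, next_pos, bonus=0.1):
    """Sparse reward plus a hand-crafted bonus for moving closer to the goal."""
    # Positive when the step reduced the distance to the goal, negative otherwise.
    progress = abs(GOAL - pos) - abs(GOAL - next_pos)
    return sparse_reward(next_pos) + bonus * progress
```

The size of `bonus` is exactly the kind of hand-crafted choice that a human expert has to get right: too large, and the agent may learn to chase the bonus instead of the goal.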

Disadvantages of Reinforcement Learning
⚠ RL needs to collect a lot of data from the environment, and it needs a lot of
computation (data hungry)
Not a problem when RL is applied to gaming because it can play the
same game many times and collect a lot of data.
⚠ It can be expensive to learn by trying (and failing)
For example: in robotics where robots are expensive and can get
damaged when used (for learning)

Solution to the disadvantages - general advice
Combine RL with other techniques
For example:
RL + Deep Learning

RL Algorithms
Source: https://spinningup.openai.com/en/latest/spinningup/rl_intro2.html

Q-Learning Algorithm
The most famous RL algorithm
"Q" in "Q-Learning" stands for quality
Example (Python):
https://www.datacamp.com/tutorial/introduction-q-learning-beginner-tutorial

Q-Table
Source: www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python
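
A minimal sketch of a Q-table as a data structure: one row per state, one column per action, each cell holding the estimated quality of taking that action in that state. The states, actions, and values are hypothetical.

```python
# Hypothetical Q-table: one row per state, one column per action.
ACTIONS = ["left", "right"]
q_table = {
    "s0": [0.0, 0.5],
    "s1": [0.2, 0.1],
}


def greedy_action(q_table, state):
    """Return the action with the highest Q-value in `state`."""
    values = q_table[state]
    best = max(range(len(values)), key=lambda i: values[i])
    return ACTIONS[best]
```

With the values above, the greedy choice is "right" in state "s0" and "left" in state "s1".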

Q-Learning Algorithm
Source: https://www.cse.unsw.edu.au/~cs9417ml/RL1/algorithms.html
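
A minimal sketch of tabular Q-learning with epsilon-greedy exploration, assuming a hypothetical five-state corridor in which the agent starts at state 0 and is rewarded only on reaching state 4. The environment, the hyperparameters, and the function names are illustrative assumptions.

```python
import random

N_STATES = 5      # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table (list of [Q(s, left), Q(s, right)]) by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore with probability epsilon (or on a tie),
            # otherwise take the action with the higher Q-value.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) towards
            # reward + gamma * max over a' of Q(s', a').
            best_next = 0.0 if done else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
    return q
```

After training, the "right" action should carry the higher Q-value in every non-goal state, which is the behavior that leads to the goal.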

Deep Q-Learning Algorithm
Deep neural network instead of a "simple" Q-Table
Used in case of large environments
Example (Python):
https://www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python

Deep Q-Learning Algorithm
Source: www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python
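
A minimal sketch of the value a Q-network is trained towards for a single transition. In Deep Q-Learning the network takes a state and outputs one Q-value per action; here those outputs are stood in for by a plain list of floats, and the network itself is not shown. The function names and the default discount factor are assumptions.

```python
def td_target(reward, next_q_values, gamma=0.99, done=False):
    """Target value the network is trained towards for one transition.

    `next_q_values` are the network's Q-value estimates for every action
    in the next state (here just a list of floats).
    """
    if done:
        return reward  # no future reward after a terminal state
    return reward + gamma * max(next_q_values)


def squared_td_error(predicted_q, target):
    """Loss for one transition: (target - Q(s, a))^2."""
    return (target - predicted_q) ** 2
```

This is the same update target as in tabular Q-learning. The difference is that the error is reduced by adjusting network weights rather than by overwriting a table cell.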

API for reinforcement learning
Gymnasium (Python)
A standard API for single-agent RL
Many different environments
https://gymnasium.farama.org
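
A minimal sketch of the agent-environment loop against a Gymnasium-style environment, that is, one whose `reset()` returns `(observation, info)` and whose `step(action)` returns `(observation, reward, terminated, truncated, info)`. The helper `run_episode` and its `choose_action` parameter are hypothetical names, not part of Gymnasium.

```python
def run_episode(env, choose_action, max_steps=1000):
    """Run one episode on a Gymnasium-style environment; return total reward.

    `choose_action` is a hypothetical callable mapping an observation
    to an action.
    """
    observation, info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = choose_action(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        # An episode ends when the task finishes or a time limit is hit.
        if terminated or truncated:
            break
    return total_reward
```

With Gymnasium installed, an environment can be created with `env = gymnasium.make("CartPole-v1")`, and a random agent is then `run_episode(env, lambda obs: env.action_space.sample())`.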

Key points
Reinforcement learning is a branch of machine learning where
an agent learns about its environment using the mechanism of rewards and
punishments.
RL doesn't rely on a labeled data set.
RL learns by trial and error through interacting with its environment, so it
can reach conclusions / knowledge that humans haven't reached.
