Basics of Reinforcement Learning

Spotle.ai Study Material
Spotle.ai/Learn

Let’s play chess!
I don’t just make any possible move without thinking about how my opponent could counter it.
I consider all the possible moves that are safe, and then choose the one I judge to be the best.
Machines can learn this way too. This kind of learning is called reinforcement learning.

What is reinforcement learning?
Reinforcement learning applies to a particular kind of situation.
You start at a point and go through several steps to reach a goal.
Along the way you earn a reward point for every correct step and lose a reward point for every wrong step.
Finally, you choose the path with the highest total reward in that situation.
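[Figure: the agent-environment loop. The agent sends an action to the environment; the environment returns a state and a reward to the agent.]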
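The path-scoring idea on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the three paths and their step outcomes are made up, and the scoring simply follows the slide (+1 for a correct step, -1 for a wrong one).

```python
# Hypothetical paths: each is a list of step outcomes (True = correct step).
paths = {
    "path_1": [True, True, False, True],
    "path_2": [True, True, True, True],
    "path_3": [False, True, False, False],
}


def total_reward(steps):
    """Earn +1 for every correct step and lose 1 for every wrong step."""
    return sum(1 if correct else -1 for correct in steps)


# Score every path, then choose the one with the highest total reward.
scores = {name: total_reward(steps) for name, steps in paths.items()}
best_path = max(scores, key=scores.get)

print(scores)
print("best path:", best_path)
```

In a real problem the agent usually cannot list every path in advance; it has to discover the rewards by trying steps one at a time.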

Terminologies
Agent: The learner and decision maker.
Environment: The world in which the agent learns and decides which actions to perform.
Action: One of the set of moves the agent can perform.
State: The current situation of the agent in the environment.
Reward: The feedback the environment provides for each action the agent selects, usually a scalar value.
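To see how these five terms fit together, here is a minimal sketch of one interaction loop. Everything in it is an assumption made for illustration: the `Corridor` environment, the `RandomAgent`, and the reward of +1 for a step toward the goal and -1 otherwise are hypothetical and not part of the original material.

```python
import random


class Corridor:
    """Environment (hypothetical): positions 0..4 with the goal at position 4."""

    def __init__(self):
        self.state = 0  # State: where the agent currently stands

    def step(self, action):
        move = 1 if action == "right" else -1
        new_state = min(4, max(0, self.state + move))
        # Reward: +1 for a step toward the goal, -1 for any other step
        reward = 1 if new_state > self.state else -1
        self.state = new_state
        return self.state, reward, self.state == 4


class RandomAgent:
    """Agent that picks an action at random; it does not learn yet."""

    actions = ("left", "right")  # the set of actions available to the agent

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, state):
        return self.rng.choice(self.actions)


env, agent = Corridor(), RandomAgent()
state, rewards = env.state, []
for _ in range(100):  # cap the episode length
    action = agent.act(state)               # the agent chooses an action
    state, reward, done = env.step(action)  # the environment responds
    rewards.append(reward)
    if done:
        break

print("final state:", state, "total reward:", sum(rewards))
```

The agent here ignores the rewards it receives. A learning agent would use them to prefer actions that paid off in the past.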

Is it supervised learning? No
In supervised learning the training data includes the output, that is, the correct answer, and the model is trained on it.
In reinforcement learning no answer is given. The agent decides which action to perform based on the rewards it receives, aiming for the maximum reward.
There is no labeled training data in reinforcement learning. The machine learns from its own experience.
[Figure: training data, marked "Not available"]

Reinforcing your learning
Which one to choose: Topic A, Topic B, or Topic C?
Give a reward to every possible option, step by step.
Choose the one with the maximum reward.
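A minimal sketch of this try-everything-then-choose idea is shown below. The payoff probabilities for the three topics are invented for illustration, and the agent is assumed not to know them; it only sees a reward of 1 or 0 after each try and keeps a running average per topic.

```python
import random

rng = random.Random(42)

# Hypothetical chance that each topic pays off (hidden from the agent).
true_payoff = {"Topic A": 0.2, "Topic B": 0.8, "Topic C": 0.5}

counts = {topic: 0 for topic in true_payoff}
avg_reward = {topic: 0.0 for topic in true_payoff}

for _ in range(3000):
    topic = rng.choice(list(true_payoff))  # try the options over and over
    reward = 1 if rng.random() < true_payoff[topic] else 0
    counts[topic] += 1
    # Incremental mean: nudge the estimate toward the newest reward.
    avg_reward[topic] += (reward - avg_reward[topic]) / counts[topic]

# Choose the option with the maximum estimated reward.
best = max(avg_reward, key=avg_reward.get)
print(avg_reward)
print("choose:", best)
```

Trying options uniformly at random keeps the sketch short. Practical agents usually balance exploring new options against exploiting the best one found so far.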

Pavlov Experiment
TRIAL 1
In the first trial Pavlov
gives meat to his dog and
the dog starts salivating.

Pavlov Experiment
TRIAL 2
In the second trial Pavlov
does not give meat to his
dog but rings a bell.
Without seeing the meat
the dog does not start
salivating.

Pavlov Experiment
TRIAL 3
In trial 3 Pavlov rings the bell and then gives meat to his dog.
Seeing the meat, the dog starts salivating.

Pavlov Experiment
TRIAL 4
In trial 4 Pavlov rings the bell, and at this his dog starts salivating, expecting that meat will follow the ringing of the bell.
This is learning by reinforcement: in the earlier trials the dog was rewarded with meat after the ringing of the bell.
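One simple way to picture how the bell comes to predict meat is a number that moves a little closer to the reward after every paired trial. The sketch below is in the spirit of the Rescorla-Wagner rule used in conditioning research; the learning rate of 0.3 and the ten trials are assumptions chosen for illustration, not values from Pavlov's experiment.

```python
def update(strength, reward, learning_rate=0.3):
    """Move the association strength a fraction of the way toward the reward."""
    return strength + learning_rate * (reward - strength)


bell_strength = 0.0  # how strongly the bell predicts meat (0 = not at all)
history = [bell_strength]

for trial in range(10):  # the bell is followed by meat, ten times
    bell_strength = update(bell_strength, reward=1.0)
    history.append(bell_strength)

print([round(value, 3) for value in history])
```

Each pairing closes 30% of the remaining gap, so the association grows quickly at first and then levels off near 1.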

Summarizing
❖ The input is an initial state from which the machine starts learning.
❖ There is more than one possible output for a given problem.
❖ Each output is given a reward or a punishment.
❖ The output with the maximum reward is selected to be performed.
❖ The reinforcement learning process is continuous: the agent keeps learning from new experience.
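The points above can be tied together with a small worked sketch. It uses tabular Q-learning, a standard reinforcement learning algorithm that these slides do not name, on a hypothetical five-position corridor. The reward scheme (1 for reaching the goal, 0 otherwise), the learning rate, and the discount factor are all assumptions chosen for illustration.

```python
import random

N_STATES, GOAL = 5, 4       # positions 0..4, goal at position 4
ACTIONS = (-1, +1)          # index 0 = move left, index 1 = move right
ALPHA, GAMMA = 0.5, 0.9     # learning rate and discount factor (assumed)
rng = random.Random(0)

# q[state][action_index] estimates the long-run reward of taking that action.
q = [[0.0, 0.0] for _ in range(N_STATES)]


def step(state, action_index):
    """Environment: move, then report the new state, reward, and done flag."""
    new_state = min(N_STATES - 1, max(0, state + ACTIONS[action_index]))
    reward = 1.0 if new_state == GOAL else 0.0
    return new_state, reward, new_state == GOAL


for episode in range(500):  # learning continues episode after episode
    state, done = 0, False  # every episode starts from the initial state
    while not done:
        a = rng.randrange(2)  # explore by acting at random
        new_state, reward, done = step(state, a)
        # Q-learning update: reward now plus discounted best future value.
        target = reward + (0.0 if done else GAMMA * max(q[new_state]))
        q[state][a] += ALPHA * (target - q[state][a])
        state = new_state

# In each non-goal state, select the action with the maximum estimated value.
policy = ["left" if row[0] > row[1] else "right" for row in q[:GOAL]]
print(policy)
```

Acting at random during training works here because Q-learning can learn about the best action while following a different behavior. After training, the learned values point toward the goal from every position.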

#HappyLearning
#BeCareerReady
That’s all for today.
