Reinforcement Learning
Topics
7.1 Upper Confidence Bound
7.2 Thompson Sampling
7.3 Q-Learning
Reinforcement Learning
• Reinforcement Learning is used to solve interactive problems
where the data observed up to time t is used to decide
which action to take at time t + 1.
• It is also used in Artificial Intelligence to train
machines to perform tasks such as walking. Desired
outcomes earn the AI a reward, undesired outcomes a
punishment. The machine learns through trial and error.
• Reinforcement Learning models:
Upper Confidence Bound (UCB)
Thompson Sampling
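
The trial-and-error loop described above can be sketched in a few lines. This is only an illustration and not part of the original slides: the two-action environment, its reward probabilities, and the names `pull` and `run_trial_and_error` are hypothetical, and the selection rule is a simple epsilon-greedy rule (pick a random action 10% of the time, otherwise the action with the best average reward so far) rather than UCB or Thompson Sampling.

```python
import random


def pull(action, rng):
    """Hypothetical environment: action 1 is rewarded more often than action 0."""
    success_prob = [0.2, 0.8][action]
    return 1 if rng.random() < success_prob else -1  # reward or punishment


def run_trial_and_error(rounds=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    counts = [0, 0]      # how often each action was taken
    totals = [0.0, 0.0]  # sum of rewards collected per action
    for _ in range(rounds):
        # The action for this round depends only on the data observed so far.
        if rng.random() < epsilon or 0 in counts:
            action = rng.randrange(2)               # explore
        else:
            averages = [totals[a] / counts[a] for a in range(2)]
            action = averages.index(max(averages))  # exploit
        reward = pull(action, rng)
        counts[action] += 1
        totals[action] += reward
    return counts


counts = run_trial_and_error()
print(counts)  # number of times each action was chosen
```

With these assumed probabilities, the better action should end up being chosen in most rounds, because its running average reward overtakes the other one after a few trials.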
What is RL?
RL Process
RL Definitions:
RL Concept: Reward Maximization
Exploration and Exploitation
Upper Confidence Bound (UCB)
Thompson Sampling Algorithm
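
As a rough sketch of the idea behind this slide, the following assumes the common Bernoulli-reward form of Thompson Sampling with a Beta(1, 1) prior per arm. The function name `thompson_sampling`, the simulated click rates, and the data layout (one row per round, one 0/1 column per arm) are assumptions made for illustration, not taken from the slides.

```python
import random


def thompson_sampling(rewards, seed=0):
    """rewards[n][i] is the 0/1 reward arm i would give in round n."""
    rng = random.Random(seed)
    d = len(rewards[0])
    wins = [0] * d    # rounds in which the chosen arm returned 1
    losses = [0] * d  # rounds in which the chosen arm returned 0
    selected = []
    for row in rewards:
        # Draw one sample per arm from its Beta posterior and play the largest.
        draws = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(d)]
        arm = draws.index(max(draws))
        selected.append(arm)
        if row[arm] == 1:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return selected, sum(wins)


# Simulated data with assumed click rates; arm 2 is the best arm.
data_rng = random.Random(1)
click_rates = [0.05, 0.10, 0.30]
rewards = [[1 if data_rng.random() < p else 0 for p in click_rates]
           for _ in range(3000)]
selected, total_reward = thompson_sampling(rewards)
print([selected.count(i) for i in range(3)], total_reward)
```

Because each arm is chosen according to a random draw from its posterior, uncertain arms still get tried occasionally, while arms with a good track record are chosen more and more often.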
Difference between UCB & Thompson Sampling Algorithm
Q-Learning Algorithm
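
A minimal sketch of tabular Q-learning, assuming the standard update Q(s, a) ← Q(s, a) + α [r + γ max Q(s', a') − Q(s, a)]. The five-cell corridor environment, the function names `step` and `q_learning`, and all parameter values are hypothetical and chosen only to keep the example small.

```python
import random

N_STATES = 5      # corridor cells 0..4; cell 4 is the goal
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def step(state, action):
    """Hypothetical environment: reward 1 only when the goal cell is reached."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[state])          # exploit, ties broken at random
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Move Q(s, a) towards r + gamma * max over a' of Q(s', a').
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q = q_learning()
policy = [row.index(max(row)) for row in q[:-1]]  # greedy action per non-goal cell
print(policy)
```

The greedy policy read off the learned table should prefer stepping right in every non-goal cell, since that is the shortest route to the only reward.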
Multi-Armed Bandit Problem using the Upper
Confidence Bound Algorithm
Upper Confidence Bound (UCB)
UCB Python Code
# Importing the libraries
import math

import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset (one row per round, one 0/1 reward column per ad)
dataset = pd.read_csv('Ads_CTR_Optimisation.csv')

# Implementing UCB
N = 10000  # number of rounds
d = 10     # number of ads
ads_selected = []
numbers_of_selections = [0] * d
sums_of_rewards = [0] * d
total_reward = 0
for n in range(0, N):
    ad = 0
    max_upper_bound = 0
    for i in range(0, d):
        if numbers_of_selections[i] > 0:
            average_reward = sums_of_rewards[i] / numbers_of_selections[i]
            # n is zero-based, so n + 1 is the current round number
            delta_i = math.sqrt(3 / 2 * math.log(n + 1) / numbers_of_selections[i])
            upper_bound = average_reward + delta_i
        else:
            # Ad not shown yet: an infinite bound makes sure it is tried once
            upper_bound = float('inf')
        if upper_bound > max_upper_bound:
            max_upper_bound = upper_bound
            ad = i
    ads_selected.append(ad)
    numbers_of_selections[ad] = numbers_of_selections[ad] + 1
    reward = dataset.values[n, ad]
    sums_of_rewards[ad] = sums_of_rewards[ad] + reward
    total_reward = total_reward + reward

# Visualising the results
plt.hist(ads_selected)
plt.title('Histogram of ads selections')
plt.xlabel('Ads')
plt.ylabel('Number of times each ad was selected')
plt.show()
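
The selection rule implemented in the loop above can be written out as follows. The symbols are notation introduced here for readability, not taken from the slides: N_i(n) corresponds to `numbers_of_selections[i]`, R_i(n) to `sums_of_rewards[i]`, and n is the round number counted from 1 (the code uses `n + 1` because its loop index starts at 0).

```latex
\begin{align*}
\bar{r}_i(n) &= \frac{R_i(n)}{N_i(n)}, \\
\Delta_i(n)  &= \sqrt{\frac{3}{2}\,\frac{\ln n}{N_i(n)}}, \\
\mathrm{ad}(n) &= \arg\max_{i}\,\bigl[\bar{r}_i(n) + \Delta_i(n)\bigr].
\end{align*}
```

An ad that has never been shown has N_i(n) = 0, so the code assigns it an infinite upper bound instead; this forces every ad to be selected once before the averages are compared.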
UCB Output
Markov Decision Process
Markov Decision Process: Shortest Path Problem
This is an example of EXPLOITATION