UCB
Upper Confidence Bound
Challenge Description
• RL serves the option that aims to maximize the reward
(e.g. if we measure clicks, we wish to serve the option that is most
likely to be clicked)
Problem: After a certain duration, one stronger option emerges and will
always be served.
Epsilon-Greedy
• What is epsilon greedy?
• Causata’s implementation
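For reference, the generic textbook form of epsilon-greedy can be sketched as below. This is not Causata's implementation, which the slides do not describe. The function name `epsilon_greedy_choice`, the example rates and `epsilon=0.1` are all illustrative assumptions.

```python
import random


def epsilon_greedy_choice(estimated_rates, epsilon, rng=random):
    """Return the index of the option to serve next.

    With probability epsilon a uniformly random option is explored;
    otherwise the option with the highest estimated rate is exploited.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(estimated_rates))  # explore
    return max(range(len(estimated_rates)), key=lambda i: estimated_rates[i])  # exploit


# Illustrative run: estimated rates are assumed, not taken from real data.
rng = random.Random(0)
picks = [epsilon_greedy_choice([0.6, 0.3, 0.4], epsilon=0.1, rng=rng) for _ in range(1000)]
share_best = picks.count(0) / len(picks)
print(f"share of plays on the best option: {share_best:.2f}")
```

With a fixed epsilon, a constant fraction of traffic keeps going to random options, which is how this scheme avoids locking onto one option forever.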
Multi-Armed Bandit (bandit)
• The problem:
Consider a casino with many slot machines, each with a certain
unknown pay-out rate (e.g. 0.6, 0.3, 0.4).
We aim to maximize our reward, hence we should learn the rates.
Exploration – We explore the payouts
Exploitation – We assume that we have learned the rates and play the optimal machine
Q: How do we balance exploration and exploitation?
Bandit algorithms ensure that exploration always takes place
Bandit (Cont.)
• We can do A/B testing
1. Consider K machines
2. Play each of them randomly and measure the reward
3. Take the machine with the best measured rate.
• We can do UCB
• Impressions
• Responses (Positive responses)
• Opportunities
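The A/B procedure above can be sketched as a small simulation. The payout rates reuse the slide's example values (0.6, 0.3, 0.4). The function name `ab_test`, the number of plays and the seed are illustrative assumptions. The counters follow the slide's vocabulary: impressions are the times a machine was played, and responses are the positive outcomes.

```python
import random


def ab_test(payout_rates, n_plays, rng):
    """Play machines uniformly at random, then return the best measured one."""
    k = len(payout_rates)
    impressions = [0] * k  # how often each machine was played
    responses = [0] * k    # how often each machine paid out
    for _ in range(n_plays):
        arm = rng.randrange(k)                 # step 2: play a random machine
        impressions[arm] += 1
        if rng.random() < payout_rates[arm]:   # simulated Bernoulli pay-out
            responses[arm] += 1
    measured = [r / n if n else 0.0 for r, n in zip(responses, impressions)]
    best = max(range(k), key=lambda i: measured[i])  # step 3: best measured rate
    return best, measured, impressions


best, measured, impressions = ab_test([0.6, 0.3, 0.4], n_plays=3000, rng=random.Random(0))
print(best, [round(m, 3) for m in measured], impressions)
```

Note that this scheme spends as many plays on the weak machines as on the strong one, which is the cost that UCB tries to reduce.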
UCB – How does it work?
• We measure the pay-out rate of each option as in A/B
• Rather than taking the option with the biggest rate, we take the one with the biggest rate + std
• It can be used as an exploration mechanism (we follow this approach)
• It can be used in exploitation (i.e. explore, and use this mechanism
while exploiting)
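A minimal sketch of the "rate + std" idea, assuming the std is the binomial standard error of the measured rate, sqrt(p(1-p)/impressions). The two arms and their counts are made-up numbers chosen to show how a rarely served option can win on rate + std while losing on the plain rate.

```python
import math


def rate_plus_std(responses, impressions):
    """Measured pay-out rate plus its binomial standard error."""
    p = responses / impressions
    return p + math.sqrt(p * (1 - p) / impressions)


# Hypothetical counts: (responses, impressions) per option.
arms = {"A": (5000, 10000), "B": (9, 20)}

best_by_rate = max(arms, key=lambda a: arms[a][0] / arms[a][1])
best_by_ucb = max(arms, key=lambda a: rate_plus_std(*arms[a]))
print(best_by_rate, best_by_ucb)
```

Option A has the higher measured rate (0.50 against 0.45), but B has been shown only 20 times, so its std term is large and it is the one selected for further trials.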
Visual Example
Chernoff/Hoeffding
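• Chernoff/Hoeffding
• Let $X_i \in [a_i, b_i]$ be independent random variables with $\mu_i = E[X_i]$. Then, for every $\varepsilon > 0$:
$$P\left(\left|\sum_i (X_i - \mu_i)\right| \ge \varepsilon\right) \le 2\exp\left(\frac{-2\varepsilon^2}{\sum_i (b_i - a_i)^2}\right)$$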
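As a sanity check, the bound can be compared with a simulated tail probability. The sketch below assumes Bernoulli(0.5) variables, so each range is [0, 1]. The helper names, sample sizes and seed are illustrative choices.

```python
import math
import random


def hoeffding_bound(eps, ranges):
    """Two-sided Hoeffding bound for the sum of independent bounded variables."""
    return 2 * math.exp(-2 * eps ** 2 / sum((b - a) ** 2 for a, b in ranges))


def empirical_tail(n, p, eps, trials, rng):
    """Fraction of trials in which |sum - n*p| >= eps for n Bernoulli(p) draws."""
    hits = 0
    for _ in range(trials):
        total = sum(1 for _ in range(n) if rng.random() < p)
        if abs(total - n * p) >= eps:
            hits += 1
    return hits / trials


n, p, eps = 100, 0.5, 15
bound = hoeffding_bound(eps, [(0, 1)] * n)
observed = empirical_tail(n, p, eps, trials=2000, rng=random.Random(0))
print(f"bound={bound:.4f} observed={observed:.4f}")
```

The observed frequency should sit well below the bound, since Hoeffding's inequality is a worst-case guarantee over all distributions on the given ranges.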
Chernoff/Hoeffding (cont.)
• For UCB we take:
• $\varepsilon = \sqrt{2\log(t)/s}$, where t is the total number of samples and s is the number of
impressions for a single arm.
• With some manipulation we get
• $P\left(\hat{\mu}_i + \sqrt{2\log(t)/s} \le \mu_i\right) \le \exp(-4\log(t)) = t^{-4}$, where $\hat{\mu}_i$ is the measured mean of arm i
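The manipulation can be spelled out as follows. This is a sketch under stated assumptions: rewards lie in [0, 1] (so every $b_i - a_i = 1$), $\hat{\mu}_i$ denotes the measured mean of arm i over its s impressions (a symbol introduced here), and the one-sided form of Hoeffding's inequality is used with $\varepsilon = \sqrt{2\log t / s}$, the choice that produces the $\exp(-4\log t)$ bound.

```latex
\begin{align*}
P\left(\hat{\mu}_i + \varepsilon \le \mu_i\right)
  &\le \exp\left(-2 s \varepsilon^2\right)
  && \text{(one-sided Hoeffding, } s \text{ samples in } [0,1]\text{)} \\
\varepsilon = \sqrt{\frac{2\log t}{s}}
  \;\Rightarrow\; 2 s \varepsilon^2 &= 4\log t \\
P\left(\hat{\mu}_i + \sqrt{\frac{2\log t}{s}} \le \mu_i\right)
  &\le \exp(-4\log t) = t^{-4}
\end{align*}
```

In words: the chance that the upper bound falls below the true rate shrinks polynomially in t, so underestimating a good arm becomes rare as samples accumulate.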
Formulas
• UCB = p + sqrt((1 - p) * p / impressions)
• Auer improvement
UCB = p + sqrt((1 - p) * p * log(opportunities) / impressions)
• Next improvement
• UCB = p + sqrt((1 - p) * p * log(opportunities) / impressions) + log(opportunities) / impressions
• Note that this correction term may go to infinity, thus we have a window.
• Further reading – Chernoff/Hoeffding inequality
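The three formulas can be written directly as functions. This is a sketch, assuming p is the measured pay-out rate of one option, impressions is how often that option was served, and opportunities is the total number of serving occasions across all options (the slides do not define it precisely). The function names are illustrative, and the window mentioned above is not modelled.

```python
import math


def ucb_basic(p, impressions):
    """Rate plus standard error. Requires impressions >= 1."""
    return p + math.sqrt((1 - p) * p / impressions)


def ucb_auer(p, impressions, opportunities):
    """Standard error scaled by log(opportunities). Requires opportunities >= 1."""
    return p + math.sqrt((1 - p) * p * math.log(opportunities) / impressions)


def ucb_next(p, impressions, opportunities):
    """Auer variant plus the extra log(opportunities) / impressions term."""
    return ucb_auer(p, impressions, opportunities) + math.log(opportunities) / impressions


# Illustrative values: one option with rate 0.5 after 100 of 1000 opportunities.
scores = (
    ucb_basic(0.5, 100),
    ucb_auer(0.5, 100, 1000),
    ucb_next(0.5, 100, 1000),
)
print([round(s, 3) for s in scores])
```

For a fixed number of impressions, both log terms grow as opportunities grow, so an option that has not been served for a while sees its score rise until it is tried again.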
Where is it used?
• In Causata’s engine – for exploration, and solely for exploration
• One can keep the current exploration mechanism and use UCB for
exploitation (i.e. rather than taking the best mean, take the best UCB)

