1
Multi-armed bandits
• At each time t, pick arm i;
• get an independent payoff f_t with mean u_i
• Classic model for the exploration–exploitation tradeoff
• Extensively studied (Robbins ’52, Gittins ’79)
• Typically assume each arm is tried multiple times
• Goal: minimize regret
[Figure: K arms with unknown mean payoffs u_1, u_2, u_3, …, u_K]
Regret after T rounds: R_T = T · u_opt − E[ Σ_{t=1..T} f_t ]
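To make the regret definition concrete, here is a minimal sketch (the arm means, noise level, and the uniform-exploration policy are illustrative assumptions, not taken from the slides) that simulates T pulls and computes R_T:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.2, 0.5, 0.9, 0.4])   # unknown arm means u_1..u_K (illustrative)
T = 1000

# Naive policy: pick arms uniformly at random (pure exploration).
arms = rng.integers(0, len(mu), size=T)
payoffs = mu[arms] + rng.normal(0.0, 0.1, size=T)   # independent noisy payoffs f_t

# Regret: T times the best mean, minus the expected total payoff collected.
regret = T * mu.max() - mu[arms].sum()
print(f"Cumulative regret R_T ~ {regret:.1f}")
```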
2
Infinite-armed bandits
[Figure: infinitely many arms p_1, p_2, p_3, …, p_k, …, p_∞]
• In many applications, the number of arms is huge (sponsored search, sensor selection)
• Cannot try each arm even once
• Assumptions on the payoff function f are essential
Optimizing Noisy, Unknown Functions
• Given: Set of possible inputs D;
black-box access to unknown function f
• Want: Adaptive choice of inputs x_1, x_2, … from D maximizing Σ_t f(x_t)
• Many applications: robotic control [Lizotte
et al. ’07], sponsored search [Pande &
Olston, ’07], clinical trials, …
• Sampling is expensive
• Algorithms evaluated using regret
Goal: minimize the cumulative regret R_T = Σ_{t=1..T} [ max_{x∈D} f(x) − f(x_t) ]
Running example: Noisy Search
• How to find the hottest point in a building?
• Many noisy sensors available but sampling is expensive
• D: set of sensors; f(x_i): temperature at the sensor x_i chosen at step i
• Observe y_i = f(x_i) + ε_i (a noisy measurement)
• Goal: Find the hottest sensor, x* = argmax_{x∈D} f(x), with a minimal number of queries
4
Relation to our work: Active learning for PMF
A bandit setting for movie recommendation
Task: recommend movies for a new user
M-armed Bandit
Movie item as arm of bandit
For a new user i
At each round t, pick a movie j
Observe a rating Xij
Goal: maximize cumulative reward
sum of the ratings of all recommended movies
Model: PMF
X = UV + E, where
U: N×K matrix, V: K×M matrix, E: N×M matrix with zero-mean normal entries
Assume the movie features V are fully observed; the user feature vector U_i is unknown at first
X_i(j) = U_i V_j + ε (regard the i-th row of X as a function X_i)
X_i(·): a random linear function
5
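Since V is assumed known and X_i(j) = U_i V_j + ε, recommending to a new user is a linear bandit over the movie feature vectors. Below is a minimal sketch of the Bayesian (Gaussian/ridge) update for U_i; the dimensions, prior precision, noise variance, and the random arm choice are illustrative placeholders, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
K, M = 5, 100                      # latent dimension, number of movies (illustrative)
V = rng.normal(size=(K, M))        # movie features, assumed fully observed
u_true = rng.normal(size=K)        # unknown feature vector U_i of the new user
sigma2, lam = 0.25, 1.0            # noise variance and prior precision (assumed)

A = lam * np.eye(K)                # posterior precision of U_i
b = np.zeros(K)

for t in range(50):
    u_hat = np.linalg.solve(A, b)                         # current posterior mean of U_i
    j = rng.integers(M)                                   # placeholder arm pick (e.g., swap in UCB)
    x = V[:, j]
    rating = u_true @ x + rng.normal(0, np.sqrt(sigma2))  # observed rating X_ij
    A += np.outer(x, x) / sigma2                          # Bayesian linear-regression update
    b += rating * x / sigma2

print("estimated user vector:", np.round(np.linalg.solve(A, b), 2))
```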
Key insight: Exploit correlation
• Sampling f(x) at one point x yields information about f(x’)
for points x’ near x
• In this paper:
Model correlation using a Gaussian process (GP) prior for f
6
Temperature is
spatially correlated
Gaussian Processes to model payoff f
• Gaussian process (GP) = normal distribution over functions
• Finite marginals are multivariate Gaussians
• Closed-form formulas exist for the Bayesian posterior update
• Parameterized by covariance function K(x,x’) = Cov(f(x),f(x’))
7
[Figure: Normal dist. (1-D Gaussian) → Multivariate normal (n-D Gaussian) → Gaussian process (∞-D Gaussian)]
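As a concrete illustration of the closed-form posterior update, here is a minimal sketch assuming a squared-exponential kernel and Gaussian observation noise (the kernel bandwidth, noise variance, and example data are illustrative):

```python
import numpy as np

def k_se(a, b, h=0.2):
    """Squared-exponential kernel K(x, x') = exp(-(x - x')^2 / h^2)."""
    return np.exp(-np.subtract.outer(a, b) ** 2 / h ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise_var=0.01, h=0.2):
    """Posterior mean and variance of f at x_query, given noisy observations."""
    K = k_se(x_obs, x_obs, h) + noise_var * np.eye(len(x_obs))
    k_star = k_se(x_obs, x_query, h)
    mean = k_star.T @ np.linalg.solve(K, y_obs)
    var = k_se(x_query, x_query, h).diagonal() - np.einsum(
        "ij,ij->j", k_star, np.linalg.solve(K, k_star))
    return mean, var

# Tiny usage example (illustrative data):
x_obs = np.array([0.1, 0.4, 0.8])
y_obs = np.array([0.2, 1.0, -0.3])
mean, var = gp_posterior(x_obs, y_obs, np.linspace(0, 1, 5))
print(np.round(mean, 2), np.round(var, 2))
```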
8
Thinking about GPs
• Kernel function K(x, x’) specifies covariance
• Encodes smoothness assumptions
[Figure: a GP sample f(x) over inputs x, with the marginal distribution P(f(x)) at a fixed x]
9
Example of GPs
• Squared-exponential kernel
K(x,x′) = exp(−(x−x′)² / h²)
[Figure: samples from P(f) for bandwidth h=.1 and h=.3; kernel value as a function of distance |x−x′|]
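A short sketch of drawing samples from P(f) under the squared-exponential kernel for the two bandwidths shown above (domain, seed, and jitter are illustrative choices):

```python
import numpy as np

def k_se(a, b, h):
    return np.exp(-np.subtract.outer(a, b) ** 2 / h ** 2)

xs = np.linspace(0, 1, 200)
rng = np.random.default_rng(2)
for h in (0.1, 0.3):
    K = k_se(xs, xs, h) + 1e-8 * np.eye(len(xs))     # small jitter for numerical stability
    f_samples = rng.multivariate_normal(np.zeros(len(xs)), K, size=3)
    # Smaller bandwidth h -> faster kernel decay with |x - x'| -> wigglier sample paths.
    print(f"h={h}: mean abs increment {np.abs(np.diff(f_samples, axis=1)).mean():.3f}")
```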
Gaussian process optimization
[e.g., Jones et al ’98]
10
[Figure: unknown function f(x) over inputs x]
Goal: Adaptively pick inputs x_1, x_2, … such that the cumulative regret Σ_t [ max_{x∈D} f(x) − f(x_t) ] stays small
Key question: how should we pick samples?
So far, only heuristics:
Expected Improvement [Močkus et al. ‘78]
Most Probable Improvement [Močkus ‘89]
Used successfully in machine learning [Ginsbourger et al. ‘08,
Jones ‘01, Lizotte et al. ’07]
No theoretical guarantees on their regret!
11
Simple algorithm for GP optimization
• In each round t do:
• Pick x_t = argmax_x μ_{t−1}(x), the maximizer of the current posterior mean
• Observe y_t = f(x_t) + ε_t
• Use Bayes' rule to update the posterior mean μ_t
Can get stuck in local maxima!
11
[Figure: greedy posterior-mean maximization stuck at a local maximum of f(x)]
12
Uncertainty sampling
Pick: x_t = argmax_x σ_{t−1}(x), the point of maximal posterior variance
That is equivalent to (greedily) maximizing the information gain I(y_A; f)
Popular objective in Bayesian experimental design
(where the goal is pure exploration of f)
But…wastes samples by exploring f everywhere!
12
[Figure: uncertainty sampling spreads samples across the whole domain]
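A minimal sketch of uncertainty sampling on a discretized domain, together with the information gain I(y_A; f) = ½ log det(I + σ⁻² K_A) that it greedily maximizes (the kernel, bandwidth, and noise level are illustrative assumptions):

```python
import numpy as np

def k_se(a, b, h=0.2):
    return np.exp(-np.subtract.outer(a, b) ** 2 / h ** 2)

xs = np.linspace(0, 1, 50)          # discretized input domain D
noise_var = 0.01

def info_gain(idx):
    """I(y_A; f) = 0.5 * log det(I + sigma^-2 K_A) for the sampled subset A."""
    K_A = k_se(xs[idx], xs[idx])
    return 0.5 * np.linalg.slogdet(np.eye(len(idx)) + K_A / noise_var)[1]

# Uncertainty sampling: repeatedly pick the point with the largest posterior variance,
# which greedily maximizes the information gain above.
picked = []
for t in range(5):
    if picked:
        K_obs = k_se(xs[picked], xs[picked]) + noise_var * np.eye(len(picked))
        k_star = k_se(xs[picked], xs)
        var = 1.0 - np.einsum("ij,ij->j", k_star, np.linalg.solve(K_obs, k_star))
    else:
        var = np.ones_like(xs)      # prior variance k(x, x) = 1 everywhere
    picked.append(int(np.argmax(var)))

print("picked points:", xs[picked], "info gain:", round(info_gain(picked), 3))
```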
Avoiding unnecessary samples
Key insight: Never need to sample where Upper
Confidence Bound (UCB) < best lower bound! 13
[Figure: confidence bands around f(x); the best lower bound rules out regions where the UCB falls below it]
14
Upper Confidence Bound (UCB) Algorithm
Naturally trades off explore and exploit; no samples wasted
Regret bounds: classic [Auer ’02] & linear f [Dani et al. ‘07]
But none in the GP optimization setting! (popular heuristic)
[Figure: posterior mean with confidence band; the UCB pick]
Pick the input that maximizes the Upper Confidence Bound (UCB):
x_t = argmax_{x∈D} μ_{t−1}(x) + √β_t · σ_{t−1}(x)
How should we choose β_t? Need theory!
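A minimal sketch of the GP-UCB loop on a discretized 1-D domain. The test function, kernel bandwidth, noise level, and the β_t schedule (a choice that grows like log t, often used for finite domains) are illustrative assumptions, not the exact constants from the theorems that follow:

```python
import numpy as np

rng = np.random.default_rng(3)

def k_se(a, b, h=0.2):
    return np.exp(-np.subtract.outer(a, b) ** 2 / h ** 2)

f = lambda x: np.sin(6 * x) + 0.5 * np.cos(15 * x)   # illustrative unknown payoff
xs = np.linspace(0, 1, 200)                           # discretized domain D
noise_var, delta = 0.01, 0.1

X, y = [], []
for t in range(1, 31):
    if X:
        K = k_se(np.array(X), np.array(X)) + noise_var * np.eye(len(X))
        k_star = k_se(np.array(X), xs)
        mu = k_star.T @ np.linalg.solve(K, np.array(y))
        var = 1.0 - np.einsum("ij,ij->j", k_star, np.linalg.solve(K, k_star))
        sigma = np.sqrt(np.clip(var, 0.0, None))
    else:
        mu, sigma = np.zeros_like(xs), np.ones_like(xs)
    # One beta_t schedule analyzed for finite domains (illustrative constants).
    beta_t = 2 * np.log(len(xs) * t ** 2 * np.pi ** 2 / (6 * delta))
    x_t = xs[np.argmax(mu + np.sqrt(beta_t) * sigma)]        # UCB pick
    X.append(x_t)
    y.append(f(x_t) + rng.normal(0.0, np.sqrt(noise_var)))   # noisy observation

print("best sampled input:", X[int(np.argmax(y))])
```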
15
How well does UCB work?
• Intuitively, performance should depend on how
“learnable” the function is
15
The quicker the confidence bands collapse, the easier f is to learn.
Key idea: rate of collapse ↔ growth of information gain
[Figure: “Easy” function (bandwidth h=.3, smooth samples) vs. “Hard” function (bandwidth h=.1, wiggly samples)]
Learnability and information gain
• We show that regret bounds depend on how quickly we can
gain information
• Mathematically: the regret bound scales with √(T · γ_T), where γ_T is the maximal information gain obtainable from T samples
• Establishes a novel connection between GP optimization
and Bayesian experimental design
16
17
Performance of optimistic sampling
Theorem
If we choose β_t = Θ(log t), then with high probability,
R_T = O*( √(T · γ_T) )   (O* hides log factors)
Hereby γ_T = max_{A ⊆ D, |A| ≤ T} I(y_A; f), the maximal information gain due to sampling!
The slower γ_T grows, the easier f is to learn
Key question: How quickly does γ_T grow?
17
Learnability and information gain
• Information gain exhibits diminishing returns (submodularity)
[Krause & Guestrin ’05]
• Our bounds depend on “rate” of diminishment
18
[Figure: information gain as a function of samples: little diminishing returns vs. returns diminishing fast]
Dealing with high dimensions
Theorem: For various popular kernels, we have:
• Linear kernel: γ_T = O(d log T)
• Squared-exponential kernel: γ_T = O((log T)^(d+1))
• Matérn kernel with ν > 1: γ_T = O(T^(d(d+1)/(2ν + d(d+1))) · log T)
Smoothness of f helps battle the curse of dimensionality!
Our bounds rely on submodularity of the information gain I(y_A; f)
19
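Because I(y_A; f) is submodular, γ_T can be estimated near-optimally by greedy selection. The sketch below compares the greedy information-gain growth for a linear and a squared-exponential kernel on a small random 2-D domain (domain size, bandwidth, and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
xs = rng.uniform(-1, 1, size=(300, 2))     # candidate inputs in d = 2 (illustrative)
noise_var = 0.1

def gamma_greedy(K, T):
    """Greedy (near-optimal, by submodularity) estimate of gamma_T for kernel matrix K."""
    picked, best_gain = [], 0.0
    for _ in range(T):
        best = None
        best_gain = -np.inf
        for i in range(len(K)):
            if i in picked:
                continue
            idx = picked + [i]
            gain = 0.5 * np.linalg.slogdet(
                np.eye(len(idx)) + K[np.ix_(idx, idx)] / noise_var)[1]
            if gain > best_gain:
                best, best_gain = i, gain
        picked.append(best)
    return best_gain

K_lin = xs @ xs.T                                                   # linear kernel
K_se = np.exp(-np.sum((xs[:, None] - xs[None]) ** 2, -1) / 0.25)    # squared-exp, h = 0.5
for name, K in [("linear", K_lin), ("squared-exp", K_se)]:
    print(name, [round(gamma_greedy(K, T), 2) for T in (5, 10, 20)])
```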
What if f is not from a GP?
• In practice, f may not be Gaussian
Theorem: Let f lie in the RKHS of kernel K with ‖f‖²_K ≤ B,
and let the noise be bounded almost surely by σ.
Choose β_t = 2B + 300 γ_t log³(t/δ). Then with high probability,
R_T = O*( √(T · γ_T · (B + γ_T)) )
• Frees us from knowing the “true prior”
• Intuitively, the bound depends on the “complexity” of
the function through its RKHS norm
20
Experiments: UCB vs. heuristics
• Temperature data
• 46 sensors deployed at Intel Research, Berkeley
• Collected data for 5 days (1 sample/minute)
• Want to adaptively find highest temperature as quickly as
possible
• Traffic data
• Speed data from 357 sensors deployed along highway I-880
South
• Collected during 6am-11am, for one month
• Want to find most congested (lowest speed) area as quickly as
possible
21
Comparison: UCB vs. heuristics
22
GP-UCB compares favorably with existing heuristics
23
Assumptions on f
• Linear? [Dani et al. '07]: fast convergence, but a strong assumption
• Lipschitz-continuous (bounded slope)? [Kleinberg '08]: very flexible, but convergence is slow in high dimensions
Conclusions
• First theoretical guarantees and convergence rates
for GP optimization
• Both true prior and agnostic case covered
• Performance depends on “learnability”, captured by
maximal information gain
• Connects GP Bandit Optimization & Experimental Design!
• Performance on real data comparable to other heuristics
24
Editor's Notes
• #2: Explanation of k-armed bandit!
• #4: Repeat what f is – give an example!
• #5: Floorplan looks funny (pixelated)
• #6: Floorplan looks funny (pixelated)
• #17: Add cartoon plot for γ_T; need axes, etc.