Sequential Learning in the Position-Based Model

Claire Vernade, Olivier Cappé, Paul Lagrée (Télécom ParisTech)
B. Kveton, S. Katariya, Z. Wen, C. Szepesvári (Adobe Research, U. Alberta)

«Don’t use Bandit Algorithms, they probably don’t work for you.»
Chris Stucchio
C. Stucchio's blog: https://www.chrisstucchio.com/blog/2015/dont_use_bandits.html

Position-Based Model
1
2
3
4
✓
Xt ⇠ B(1 ⇥ ✓)
Chucklin et al. (2008):
!
Cascade Model,
User Browsing,
DCN,
CCN,
DCM,
…
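The click model on this slide can be illustrated with a short simulation. This is a minimal sketch, assuming a click on an item with attractiveness theta shown in a position with examination probability kappa is a Bernoulli draw with parameter kappa × theta. The function name `simulate_click` and all numeric values are hypothetical and chosen only for illustration.

```python
import random


def simulate_click(theta_k, kappa_l, rng):
    """Draw one click X_t ~ Bernoulli(kappa_l * theta_k)."""
    return 1 if rng.random() < kappa_l * theta_k else 0


# Hypothetical values, not taken from the slides.
kappa = [1.0, 0.6, 0.3, 0.1]  # examination probability of positions 1..4
theta = 0.5                   # attractiveness of the displayed item

rng = random.Random(0)
n = 20000
# Empirical click rate of the same item in each position.
click_rates = [
    sum(simulate_click(theta, k, rng) for _ in range(n)) / n
    for k in kappa
]
```

Under these assumptions, the observed click rate of the same item falls as it moves to less examined positions, which is why raw click counts alone do not identify theta.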

Multi-Armed Bandit
[Figure: three arms. Values shown: 0.53, 0.61, 0.42 and 0.40, 0.60, 0.55. Legend: unobserved expected reward; estimated empirical averages after a few pulls.]

Multi-Armed Bandit
[Figure: the three arms labelled $\theta_1$, $\theta_2$, $\theta_3$, with values 0.53, 0.61, 0.42 and 0.40, 0.60, 0.55.]
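The gap between unobserved expected rewards and empirical averages can be sketched as follows. This is an illustrative example only: the helper `empirical_means` is hypothetical, and using 0.53, 0.61, 0.42 as the true means is an assumption, since the slide does not say which set of values is which.

```python
import random


def empirical_means(true_means, pulls_per_arm, rng):
    """Pull every Bernoulli arm the same number of times and average the rewards."""
    estimates = []
    for mu in true_means:
        rewards = [1 if rng.random() < mu else 0 for _ in range(pulls_per_arm)]
        estimates.append(sum(rewards) / pulls_per_arm)
    return estimates


# Assumed true means (which slide values are the true ones is not stated).
true_means = [0.53, 0.61, 0.42]

rng = random.Random(1)
few = empirical_means(true_means, 10, rng)       # noisy: may rank the arms wrongly
many = empirical_means(true_means, 100000, rng)  # close to the true means
```

With only a few pulls per arm the estimates can rank the arms in the wrong order, which is the reason a bandit algorithm has to keep exploring.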

Two Bandit Games
1. Website optimization: you are the website manager.
2. Ad placement: you want to place the right ad in the right location.
[Figure: positions 1, 2, 3, 4; example items: Balzac, Zola]

Website Optimization
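$A_t = (a_1, a_2, a_3, a_4)$, the items shown in positions 1 to 4
$r_t = \kappa_1 \theta_{a_1} + \kappa_2 \theta_{a_2} + \kappa_3 \theta_{a_3} + \kappa_4 \theta_{a_4}$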
Multiple-Plays Bandits in the Position-Based Model. NIPS 2016
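The reward of a displayed list can be sketched in a few lines. This is a minimal example, assuming the expected reward of a list is the sum over positions of the position's examination probability times the attractiveness of the item placed there, and assuming the examination probabilities are sorted in decreasing order. The names `expected_reward` and `best_action` and all values are hypothetical.

```python
def expected_reward(action, theta, kappa):
    """Sum of kappa[l] * theta[item] over the positions l of the displayed list."""
    return sum(kappa[l] * theta[item] for l, item in enumerate(action))


def best_action(theta, kappa):
    """Put the most attractive items in the most examined positions.

    Assumes kappa is sorted in decreasing order, so sorting the items by
    theta is optimal (rearrangement inequality).
    """
    ranked = sorted(range(len(theta)), key=lambda k: theta[k], reverse=True)
    return ranked[:len(kappa)]


# Hypothetical parameters: 5 candidate items, 4 positions.
theta = [0.2, 0.5, 0.3, 0.4, 0.1]
kappa = [1.0, 0.6, 0.3, 0.1]

optimal = best_action(theta, kappa)                       # item indices, 0-based
optimal_reward = expected_reward(optimal, theta, kappa)
naive_reward = expected_reward([0, 1, 2, 3], theta, kappa)
```

The regret of a learning algorithm is then the accumulated difference between `optimal_reward` and the expected reward of the lists it actually displays.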

The C-KLUCB algorithm
The KL-UCB algorithm for Bounded Stochastic Bandits and Beyond. Cappé, Garivier, COLT 2011
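For reference, the index of the plain KL-UCB algorithm for Bernoulli rewards can be computed by bisection. This sketch covers only the standard KL-UCB index from the cited paper, not the C-KLUCB variant, whose details are not given on the slide. The function names, the default `c=0.0` and the iteration count are assumptions made for illustration.

```python
import math


def kl_bernoulli(p, q, eps=1e-12):
    """Kullback-Leibler divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def klucb_index(mean, pulls, t, c=0.0, iters=50):
    """Largest q >= mean with pulls * kl(mean, q) <= log(t) + c * log(log(t)).

    Requires t >= 1 and pulls >= 1. Solved by bisection on [mean, 1].
    """
    log_t = math.log(t)
    budget = (log_t + c * math.log(max(log_t, 1.0))) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= budget:
            lo = mid  # mid is still a plausible mean, move up
        else:
            hi = mid  # mid is ruled out by the data, move down
    return lo


# Same empirical mean, different amounts of data (hypothetical numbers).
index_few_pulls = klucb_index(0.5, pulls=10, t=100)
index_many_pulls = klucb_index(0.5, pulls=1000, t=100)
```

The index shrinks towards the empirical mean as an arm is pulled more often, so rarely pulled arms keep an optimistic score and get explored.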

Complexity Theorem (Lower Bound on the Regret)
For any uniformly efficient algorithm, the regret is asymptotically bounded from below: for T large enough,
$R(T) \geq \log(T) \times C(\kappa, \theta)$
[Figure: regret R(T) versus round t (log scale, 10^2 to 10^4); curves: Lower Bound, C-KLUCB, Ranked-UCB]

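Ad Placement
$$\begin{pmatrix}
\cdot & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & \theta_k \kappa_l & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot
\end{pmatrix}$$
[Figure: ads $\theta_1$, $\theta_2$, $\theta_3$, $\theta_4$ and positions 1, 2, 3, 4]
$A_t = (k, l)$
$r_t = \theta_k \kappa_l$
Stochastic Rank-1 Bandits. AISTATS 2017
K×L arms but K+L parameters!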
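The parameter count can be checked on a toy instance. This is a minimal sketch, assuming the mean reward of arm (k, l) is the product of a row parameter and a column parameter, so the mean-reward matrix has rank 1. The name `mean_reward_matrix` and the numeric values are hypothetical.

```python
def mean_reward_matrix(theta, kappa):
    """Outer product: entry (k, l) is theta[k] * kappa[l]."""
    return [[t * k for k in kappa] for t in theta]


# Hypothetical parameters: K = 3 ads, L = 4 positions.
theta = [0.8, 0.5, 0.3]
kappa = [1.0, 0.6, 0.3, 0.1]

matrix = mean_reward_matrix(theta, kappa)
num_arms = len(theta) * len(kappa)    # K x L entries to choose from
num_params = len(theta) + len(kappa)  # K + L numbers determine all of them

# The best arm pairs the best row with the best column.
best_arm = max(
    ((k, l) for k in range(len(theta)) for l in range(len(kappa))),
    key=lambda kl: matrix[kl[0]][kl[1]],
)
```

Treating every entry as an independent arm would ignore this structure, which is what a rank-1 algorithm is meant to exploit.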

Ad Placement
Stochastic Rank-1 Bandits. AISTATS 2017
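Complexity Theorem (Lower Bound on the Regret)
For any uniformly efficient algorithm, the regret is asymptotically bounded from below by
$$\liminf_{T \to \infty} \frac{R(T)}{\log(T)} \geq \sum_{k=2}^{K} \frac{\theta_1 \kappa_1 - \theta_k \kappa_1}{d(\theta_k \kappa_1;\, \theta_1 \kappa_1)} + \sum_{l=2}^{L} \frac{\theta_1 \kappa_1 - \theta_1 \kappa_l}{d(\theta_1 \kappa_l;\, \theta_1 \kappa_1)}$$
Which can be rewritten: for any T sufficiently large,
$$R(T) \geq \log(T) \left( C_{\mathrm{col}}(\kappa, \theta) + C_{\mathrm{row}}(\kappa, \theta) \right)$$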
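The two constants of the lower bound can be evaluated numerically on a toy instance. This is a sketch under stated assumptions: rewards are Bernoulli, d is the Bernoulli Kullback-Leibler divergence, arm (k, l) has mean theta[k] * kappa[l], and index 0 holds the unique largest row and column parameter. The function names and values are hypothetical.

```python
import math


def kl_bernoulli(p, q, eps=1e-12):
    """Kullback-Leibler divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def rank1_lower_bound_constants(theta, kappa):
    """Return (C_col, C_row), assuming theta[0] and kappa[0] are the unique maxima."""
    best = theta[0] * kappa[0]
    # Suboptimal entries of the best column: vary the row k, keep column 0.
    c_col = sum(
        (best - theta[k] * kappa[0]) / kl_bernoulli(theta[k] * kappa[0], best)
        for k in range(1, len(theta))
    )
    # Suboptimal entries of the best row: vary the column l, keep row 0.
    c_row = sum(
        (best - theta[0] * kappa[l]) / kl_bernoulli(theta[0] * kappa[l], best)
        for l in range(1, len(kappa))
    )
    return c_col, c_row


# Hypothetical instances: the first has a row that is hard to tell from the best one.
hard_col, hard_row = rank1_lower_bound_constants([0.8, 0.7], [0.9, 0.5])
easy_col, easy_row = rank1_lower_bound_constants([0.8, 0.3], [0.9, 0.5])
```

The sum of the two constants multiplies log(T) in the bound, and it grows when a suboptimal row or column gets closer to the best one.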

Ad Placement
BM-KLUCB
Idea: alternately explore the rows and the columns of the matrix using KL-UCB
[Figure: regret R(T) versus round t (log scale, 10^2 to 10^6), K = 3, L = 3; curves: Lower Bound, R1klucb]

Take-Home Message
‘Real-life’ bandit algorithms are getting real… but not yet.
What comes next on bandit models for recommendation and conversion optimization: stochastic bandits with delays, Rank-1 best arm identification, higher rank models?
No free lunch theorems: exploring comes at some price, which depends on the complexity of the problem.
Existing ‘super theoretical’ works on bandits provide us super efficient algorithms in the end…
