Ensemble Contextual Bandits for Personalized Recommendation

Liang Tang, Yexi Jiang, Lei Li, Tao Li
Florida International University

10/7/14, ACM RecSys 2014
  
Cold Start Problem for Learning-based Recommendation

•  Issue: we do not have enough appropriate data.
   – Historical user log data is biased.
   – User interest may change over time.
   – New items (or users) are added.
•  Approach: Exploitation and Exploration
   – Contextual Multi-Armed Bandit Algorithm

The contextual information consists of item features and user features.
  
Contextual Bandit Algorithm for Personalized Recommendation

•  Contextual Bandit
   – Let a1, …, am be a set of arms.
   – Given a context xt, the model decides which arm to pull.
   – After each pull, you receive a random reward, which is determined by the pulled arm and xt.
   – Goal: maximize the total received reward.
•  Online Recommendation (sketched in code below)
   – Arm     → Item        Pull → Recommend
   – Context → User feature
   – Reward  → Click
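To make the arm-to-item mapping concrete, here is a minimal sketch of this interaction loop. The `Policy` interface and the `feedback_fn` callback are illustrative placeholders, not part of the original paper.

```python
# Minimal sketch of the contextual-bandit view of online recommendation.
# The Policy interface and feedback_fn callback are hypothetical placeholders.

class Policy:
    """One recommendation model: picks an item (arm) for a context and learns."""

    def select_item(self, x_t, items):
        raise NotImplementedError   # pull an arm = recommend an item

    def update(self, x_t, item, reward):
        raise NotImplementedError   # learn from the click / no-click reward


def serve(policy, items, user_visits, feedback_fn):
    """Interaction loop: context = user features, pull = recommend, reward = click."""
    total_reward = 0
    for x_t in user_visits:                    # each user visit provides a context x_t
        item = policy.select_item(x_t, items)  # decide which arm to pull
        reward = feedback_fn(x_t, item)        # observe click (1) or no click (0)
        policy.update(x_t, item, reward)
        total_reward += reward
    return total_reward                        # goal: maximize the total reward
```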
  
Problem Statement

•  Problem Setting: we have many different recommendation models (or policies):
   – Different CTR prediction algorithms.
   – Different exploration-exploitation algorithms.
   – Different parameter choices.
•  No data to do model validation.
•  Problem Statement: how do we build an ensemble model that is close to the best model in the cold-start situation?
  
How to Ensemble?

•  Classifier ensemble methods do not work in this setting.
   – The recommendation decision is NOT purely based on the predicted CTR.
•  Each individual model only tells us:
   – Which item to recommend.
  
Ensemble Method

•  Our Method:
   – Allocate recommendation chances to individual models.
•  Problem:
   – Better models should have more chances.
   – We do not know which one is good or bad in advance.
   – Ideal solution: allocate all chances to the best one.
  
Current Practice: Online Evaluation (or A/B Testing)

Let π1, π2, …, πm be the individual models.
1.  Deploy π1, π2, …, πm into the online system at the same time.
2.  Dispatch a small percentage of user traffic to each model.
3.  After a period, choose the model with the best CTR as the production model (sketched below).

If we have too many models, this will hurt the performance of the online system.
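As a rough illustration of this practice (not code from the paper), the sketch below splits traffic uniformly across the candidate models and then keeps the one with the best empirical CTR. `feedback_fn` is a hypothetical stand-in for the live click signal, and the `Policy` interface is carried over from the earlier snippet.

```python
import random

def ab_test(models, user_visits, items, feedback_fn):
    """Illustrative A/B test: split traffic across candidate models, then keep
    the model with the best empirical CTR as the production model."""
    clicks = [0] * len(models)
    impressions = [0] * len(models)
    for x_t in user_visits:
        k = random.randrange(len(models))        # dispatch traffic uniformly
        item = models[k].select_item(x_t, items)
        clicks[k] += feedback_fn(x_t, item)      # 1 if clicked, else 0
        impressions[k] += 1
    ctrs = [c / n if n else 0.0 for c, n in zip(clicks, impressions)]
    return max(range(len(models)), key=lambda k: ctrs[k])
```

With many candidate models, each one receives only a sliver of traffic during the test, which is exactly the drawback noted on this slide.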
  	
  
Our Idea 1 (HyperTS)

•  The CTR of model πi is an unknown random variable, Ri.
•  Goal:
   – Maximize (1/N) · Σ_{t=1}^{N} rt, which is the CTR of our ensemble model,
     where rt is a random reward drawn from R_{s(t)}, s(t) = 1, 2, …, or m.
     For each t = 1, …, N, we decide s(t).
•  Solution (a sketch follows this slide):
   – Bernoulli Thompson Sampling (flat prior: Beta(1,1)); no tricky parameters to set.
   – π1, π2, …, πm are the bandit arms.
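The following is a minimal sketch of this idea, reusing the illustrative `Policy` interface from the earlier snippet: each candidate model is treated as a Bernoulli arm with a Beta(1,1) prior, and Thompson Sampling decides which model serves the current visit. Class and method names are mine, not the paper's.

```python
import random

class HyperTS:
    """Bernoulli Thompson Sampling over models pi_1..pi_m as bandit arms (sketch)."""

    def __init__(self, models):
        self.models = models
        self.alpha = [1.0] * len(models)   # Beta(1,1) flat prior: one pseudo-click
        self.beta = [1.0] * len(models)    # and one pseudo-non-click per model

    def recommend(self, x_t, items):
        # Sample a plausible CTR for every model and trust the highest sample.
        sampled = [random.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        k = max(range(len(self.models)), key=lambda i: sampled[i])
        return k, self.models[k].select_item(x_t, items)

    def update(self, k, reward):
        # Only the selected model's CTR estimate R_k is updated (cf. HyperTSFB later).
        if reward:
            self.alpha[k] += 1.0
        else:
            self.beta[k] += 1.0
```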
  
An Example of HyperTS

In memory, we keep the estimated CTRs R1, R2, …, Rk, …, Rm for π1, π2, …, πm.

1.  A user visit arrives, with context features xt.
2.  HyperTS selects a candidate model, πk.
3.  πk recommends item A to the user.
4.  The user clicks or does not click (reward rt); HyperTS updates the estimate of Rk based on rt.
  
Two-Layer Decision

[Diagram] Layer 1: Bernoulli Thompson Sampling selects one model πk among π1, π2, …, πm. Layer 2: the selected model πk picks the item to recommend (Item A, Item B, Item C, …).
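Tying the pieces together, here is a hypothetical usage of the `HyperTS` sketch above as the top layer of the two-layer decision; `models`, `items`, `user_visits`, and `feedback_fn` are placeholders carried over from the earlier snippets.

```python
# Hypothetical two-layer loop: Thompson Sampling picks a model, the model picks an item.
hyper = HyperTS(models)                    # layer 1: bandit over the candidate models
for x_t in user_visits:
    k, item = hyper.recommend(x_t, items)  # layer 2: pi_k chooses the item to show
    reward = feedback_fn(x_t, item)        # click (1) or no click (0)
    hyper.update(k, reward)                # refine the CTR estimate of pi_k only
```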
  
Our Idea 2 (HyperTSFB)

•  Limitation of the previous idea:
   – For each recommendation, the user feedback is used by only one individual model (e.g., πk).
•  Motivation:
   – Can we update all of R1, R2, …, Rm with every piece of user feedback? (Share every user feedback with every individual model.)
  
Our Idea 2 (HyperTSFB)

•  Assume each model can output the probability of recommending any item given xt.
   – E.g., for deterministic recommendation, it is 1 or 0.
•  For a user visit xt:
   1.  πk is selected to perform the recommendation (k = 1, 2, …, or m).
   2.  Item A is recommended by πk given xt.
   3.  Receive the user feedback (click or not click), rt.
   4.  Ask every model π1, π2, …, πm for the probability of recommending A given xt.
•  Estimate the CTRs of π1, π2, …, πm by importance sampling (a sketch follows).
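Below is a minimal sketch of this shared-feedback idea, assuming each model exposes a hypothetical `prob(item, x_t)` method in addition to the earlier `Policy` interface. The weights follow the standard importance-sampling form; the paper's exact estimator may differ in detail.

```python
import random

class HyperTSFB:
    """Sketch: share every feedback with every model via importance sampling."""

    def __init__(self, models):
        self.models = models
        self.alpha = [1.0] * len(models)   # Beta(1,1) pseudo-counts, as in HyperTS
        self.beta = [1.0] * len(models)

    def recommend(self, x_t, items):
        sampled = [random.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        k = max(range(len(self.models)), key=lambda i: sampled[i])
        return k, self.models[k].select_item(x_t, items)

    def update(self, k, x_t, item, reward):
        q = self.models[k].prob(item, x_t)   # probability that pi_k actually showed the item
        if q <= 0.0:
            return
        for i, model in enumerate(self.models):
            # Fractional credit: how likely model i was to make the same choice,
            # relative to the model pi_k that actually made it (1 or 0 if deterministic).
            w = model.prob(item, x_t) / q
            self.alpha[i] += w * reward
            self.beta[i] += w * (1.0 - reward)
```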
  
Experimental Setup

•  Experimental Data
   – Yahoo! Today News data logs (items randomly displayed).
   – KDD Cup 2012 Online Advertising data set.
•  Evaluation Methods
   – Yahoo! Today News: Replay (see Lihong Li et al.'s WSDM 2011 paper); a sketch follows.
   – KDD Cup 2012 data: simulation by a logistic regression model.
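For readers unfamiliar with the replay method of Li et al. (WSDM 2011), here is a minimal sketch under assumed field names: because the logged items were displayed uniformly at random, an impression counts toward a policy's CTR only when the policy picks the same item that was actually shown.

```python
def replay_ctr(policy, logged_events):
    """Replay evaluation sketch on randomly displayed logs.
    logged_events yields (x_t, candidate_items, shown_item, click) tuples
    (field names assumed for illustration)."""
    clicks, matched = 0, 0
    for x_t, items, shown_item, click in logged_events:
        chosen = policy.select_item(x_t, items)
        if chosen == shown_item:        # only matching impressions are scored
            matched += 1
            clicks += click
            policy.update(x_t, shown_item, click)   # the policy may learn online
    return clicks / matched if matched else 0.0
```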
  
Comparative Methods

•  CTR Prediction Algorithm
   – Logistic Regression
•  Exploitation-Exploration Algorithms
   – Random, ε-greedy (a minimal sketch follows), LinUCB, Softmax, Epoch-greedy, Thompson sampling
•  HyperTS and HyperTSFB
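As one example of these baselines, here is a minimal ε-greedy selection step; `ctr_model.predict(x_t, item)` stands in for the logistic-regression CTR prediction and is an assumed interface, not the paper's code.

```python
import random

def epsilon_greedy_select(x_t, items, ctr_model, epsilon=0.1):
    """Epsilon-greedy baseline sketch: explore a random item with probability
    epsilon, otherwise exploit the item with the highest predicted CTR."""
    if random.random() < epsilon:
        return random.choice(items)                                  # explore
    return max(items, key=lambda item: ctr_model.predict(x_t, item))  # exploit
```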
  
Results for Yahoo! News Data

•  Every 100,000 impressions are aggregated into a bucket.
  
Results for Yahoo! News Data (Cont.)
  
Conclusions from the Experimental Results

1.  The performance of the baseline exploitation-exploration algorithms is very sensitive to the parameter setting.
    – In the cold-start situation, there is not enough data to tune parameters.
2.  HyperTS and HyperTSFB can come close to the optimal baseline algorithm (with no guarantee of being better than the optimal one), even when some bad individual models are included.
3.  For contextual Thompson sampling, the performance depends on the choice of prior distribution for the logistic regression.
    – For online Bayesian learning, the posterior distribution approximation is not accurate (the past data cannot be stored).
  
Questions & Thank You

•  Thank you!
•  Questions?
  
