Ensemble Contextual Bandits for Personalized Recommendation

Liang Tang, Yexi Jiang, Lei Li, Tao Li
Florida International University

10/7/14, ACM RecSys 2014
  
Cold Start Problem for Learning-based Recommendation

•  Issue: we do not have enough appropriate data.
   – Historical user log data is biased.
   – User interest may change over time.
   – New items (or users) are added.
•  Approach: Exploitation and Exploration
   – Contextual Multi-Armed Bandit Algorithm

The contextual information consists of item features and user features.
  
Contextual Bandit Algorithm for Personalized Recommendation

•  Contextual Bandit
   – Let a1, …, am be a set of arms.
   – Given a context xt, the model decides which arm to pull.
   – After each pull, you receive a random reward, which is determined by the pulled arm and xt.
   – Goal: maximize the total received reward.
•  Online Recommendation (sketched in code below)
   – Arm     → Item        Pull → Recommend
   – Context → User feature
   – Reward  → Click
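To make the arm-to-item mapping concrete, here is a minimal sketch of this interaction loop. The `Policy` interface and the `feedback_fn` callback are illustrative placeholders, not part of the original paper.

```python
# Minimal sketch of the contextual-bandit view of online recommendation.
# The Policy interface and feedback_fn callback are hypothetical placeholders.

class Policy:
    """One recommendation model: picks an item (arm) for a context and learns."""

    def select_item(self, x_t, items):
        raise NotImplementedError   # pull an arm = recommend an item

    def update(self, x_t, item, reward):
        raise NotImplementedError   # learn from the click / no-click reward


def serve(policy, items, user_visits, feedback_fn):
    """Interaction loop: context = user features, pull = recommend, reward = click."""
    total_reward = 0
    for x_t in user_visits:                    # each user visit provides a context x_t
        item = policy.select_item(x_t, items)  # decide which arm to pull
        reward = feedback_fn(x_t, item)        # observe click (1) or no click (0)
        policy.update(x_t, item, reward)
        total_reward += reward
    return total_reward                        # goal: maximize the total reward
```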
  
Problem Statement

•  Problem Setting: we have many different recommendation models (or policies):
   – Different CTR prediction algorithms.
   – Different exploration-exploitation algorithms.
   – Different parameter choices.
•  No data to do model validation.
•  Problem Statement: how do we build an ensemble model that is close to the best model in the cold-start situation?
  
How to Ensemble?

•  Classifier ensemble methods do not work in this setting.
   – The recommendation decision is NOT purely based on the predicted CTR.
•  Each individual model only tells us:
   – Which item to recommend.
  
Ensemble Method

•  Our Method:
   – Allocate recommendation chances to individual models.
•  Problem:
   – Better models should have more chances.
   – We do not know which one is good or bad in advance.
   – Ideal solution: allocate all chances to the best one.
  
Current Practice: Online Evaluation (or A/B Testing)

Let π1, π2, …, πm be the individual models.
1.  Deploy π1, π2, …, πm into the online system at the same time.
2.  Dispatch a small percentage of user traffic to each model.
3.  After a period, choose the model with the best CTR as the production model (sketched below).

If we have too many models, this will hurt the performance of the online system.
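As a rough illustration of this practice (not code from the paper), the sketch below splits traffic uniformly across the candidate models and then keeps the one with the best empirical CTR. `feedback_fn` is a hypothetical stand-in for the live click signal, and the `Policy` interface is carried over from the earlier snippet.

```python
import random

def ab_test(models, user_visits, items, feedback_fn):
    """Illustrative A/B test: split traffic across candidate models, then keep
    the model with the best empirical CTR as the production model."""
    clicks = [0] * len(models)
    impressions = [0] * len(models)
    for x_t in user_visits:
        k = random.randrange(len(models))        # dispatch traffic uniformly
        item = models[k].select_item(x_t, items)
        clicks[k] += feedback_fn(x_t, item)      # 1 if clicked, else 0
        impressions[k] += 1
    ctrs = [c / n if n else 0.0 for c, n in zip(clicks, impressions)]
    return max(range(len(models)), key=lambda k: ctrs[k])
```

With many candidate models, each one receives only a sliver of traffic during the test, which is exactly the drawback noted on this slide.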
  	
  
Our Idea 1 (HyperTS)

•  The CTR of model πi is an unknown random variable, Ri.
•  Goal:
   – Maximize (1/N) · Σ_{t=1}^{N} rt, which is the CTR of our ensemble model,
     where rt is a random reward drawn from R_{s(t)}, s(t) = 1, 2, …, or m.
     For each t = 1, …, N, we decide s(t).
•  Solution (a sketch follows this slide):
   – Bernoulli Thompson Sampling (flat prior: Beta(1,1)); no tricky parameters to set.
   – π1, π2, …, πm are the bandit arms.
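The following is a minimal sketch of this idea, reusing the illustrative `Policy` interface from the earlier snippet: each candidate model is treated as a Bernoulli arm with a Beta(1,1) prior, and Thompson Sampling decides which model serves the current visit. Class and method names are mine, not the paper's.

```python
import random

class HyperTS:
    """Bernoulli Thompson Sampling over models pi_1..pi_m as bandit arms (sketch)."""

    def __init__(self, models):
        self.models = models
        self.alpha = [1.0] * len(models)   # Beta(1,1) flat prior: one pseudo-click
        self.beta = [1.0] * len(models)    # and one pseudo-non-click per model

    def recommend(self, x_t, items):
        # Sample a plausible CTR for every model and trust the highest sample.
        sampled = [random.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        k = max(range(len(self.models)), key=lambda i: sampled[i])
        return k, self.models[k].select_item(x_t, items)

    def update(self, k, reward):
        # Only the selected model's CTR estimate R_k is updated (cf. HyperTSFB later).
        if reward:
            self.alpha[k] += 1.0
        else:
            self.beta[k] += 1.0
```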
  
An Example of HyperTS

In memory, we keep the estimated CTRs R1, R2, …, Rk, …, Rm for π1, π2, …, πm.

1.  A user visit arrives, with context features xt.
2.  HyperTS selects a candidate model, πk.
3.  πk recommends item A to the user.
4.  The user clicks or does not click (reward rt); HyperTS updates the estimate of Rk based on rt.
  
Two-Layer Decision

[Diagram] Layer 1: Bernoulli Thompson Sampling selects one model πk among π1, π2, …, πm. Layer 2: the selected model πk picks the item to recommend (Item A, Item B, Item C, …).
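Tying the pieces together, here is a hypothetical usage of the `HyperTS` sketch above as the top layer of the two-layer decision; `models`, `items`, `user_visits`, and `feedback_fn` are placeholders carried over from the earlier snippets.

```python
# Hypothetical two-layer loop: Thompson Sampling picks a model, the model picks an item.
hyper = HyperTS(models)                    # layer 1: bandit over the candidate models
for x_t in user_visits:
    k, item = hyper.recommend(x_t, items)  # layer 2: pi_k chooses the item to show
    reward = feedback_fn(x_t, item)        # click (1) or no click (0)
    hyper.update(k, reward)                # refine the CTR estimate of pi_k only
```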
  
Our Idea 2 (HyperTSFB)

•  Limitation of the previous idea:
   – For each recommendation, the user feedback is used by only one individual model (e.g., πk).
•  Motivation:
   – Can we update all of R1, R2, …, Rm with every piece of user feedback? (Share every user feedback with every individual model.)
  
Our Idea 2 (HyperTSFB)

•  Assume each model can output the probability of recommending any item given xt.
   – E.g., for deterministic recommendation, it is 1 or 0.
•  For a user visit xt:
   1.  πk is selected to perform the recommendation (k = 1, 2, …, or m).
   2.  Item A is recommended by πk given xt.
   3.  Receive the user feedback (click or not click), rt.
   4.  Ask every model π1, π2, …, πm for the probability of recommending A given xt.
•  Estimate the CTRs of π1, π2, …, πm by importance sampling (a sketch follows).
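Below is a minimal sketch of this shared-feedback idea, assuming each model exposes a hypothetical `prob(item, x_t)` method in addition to the earlier `Policy` interface. The weights follow the standard importance-sampling form; the paper's exact estimator may differ in detail.

```python
import random

class HyperTSFB:
    """Sketch: share every feedback with every model via importance sampling."""

    def __init__(self, models):
        self.models = models
        self.alpha = [1.0] * len(models)   # Beta(1,1) pseudo-counts, as in HyperTS
        self.beta = [1.0] * len(models)

    def recommend(self, x_t, items):
        sampled = [random.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        k = max(range(len(self.models)), key=lambda i: sampled[i])
        return k, self.models[k].select_item(x_t, items)

    def update(self, k, x_t, item, reward):
        q = self.models[k].prob(item, x_t)   # probability that pi_k actually showed the item
        if q <= 0.0:
            return
        for i, model in enumerate(self.models):
            # Fractional credit: how likely model i was to make the same choice,
            # relative to the model pi_k that actually made it (1 or 0 if deterministic).
            w = model.prob(item, x_t) / q
            self.alpha[i] += w * reward
            self.beta[i] += w * (1.0 - reward)
```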
  
Experimental Setup

•  Experimental Data
   – Yahoo! Today News data logs (items randomly displayed).
   – KDD Cup 2012 Online Advertising data set.
•  Evaluation Methods
   – Yahoo! Today News: Replay (see Lihong Li et al.'s WSDM 2011 paper); a sketch follows.
   – KDD Cup 2012 data: simulation by a logistic regression model.
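For readers unfamiliar with the replay method of Li et al. (WSDM 2011), here is a minimal sketch under assumed field names: because the logged items were displayed uniformly at random, an impression counts toward a policy's CTR only when the policy picks the same item that was actually shown.

```python
def replay_ctr(policy, logged_events):
    """Replay evaluation sketch on randomly displayed logs.
    logged_events yields (x_t, candidate_items, shown_item, click) tuples
    (field names assumed for illustration)."""
    clicks, matched = 0, 0
    for x_t, items, shown_item, click in logged_events:
        chosen = policy.select_item(x_t, items)
        if chosen == shown_item:        # only matching impressions are scored
            matched += 1
            clicks += click
            policy.update(x_t, shown_item, click)   # the policy may learn online
    return clicks / matched if matched else 0.0
```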
  
Comparative Methods

•  CTR Prediction Algorithm
   – Logistic Regression
•  Exploitation-Exploration Algorithms
   – Random, ε-greedy (a minimal sketch follows), LinUCB, Softmax, Epoch-greedy, Thompson sampling
•  HyperTS and HyperTSFB
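As one example of these baselines, here is a minimal ε-greedy selection step; `ctr_model.predict(x_t, item)` stands in for the logistic-regression CTR prediction and is an assumed interface, not the paper's code.

```python
import random

def epsilon_greedy_select(x_t, items, ctr_model, epsilon=0.1):
    """Epsilon-greedy baseline sketch: explore a random item with probability
    epsilon, otherwise exploit the item with the highest predicted CTR."""
    if random.random() < epsilon:
        return random.choice(items)                                  # explore
    return max(items, key=lambda item: ctr_model.predict(x_t, item))  # exploit
```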
  
Results for Yahoo! News Data

•  Every 100,000 impressions are aggregated into a bucket.
  
Results for Yahoo! News Data (Cont.)
  
Conclusions from the Experimental Results

1.  The performance of the baseline exploitation-exploration algorithms is very sensitive to the parameter setting.
    – In the cold-start situation, there is not enough data to tune parameters.
2.  HyperTS and HyperTSFB can come close to the optimal baseline algorithm (with no guarantee of being better than the optimal one), even when some bad individual models are included.
3.  For contextual Thompson sampling, the performance depends on the choice of prior distribution for the logistic regression.
    – For online Bayesian learning, the posterior distribution approximation is not accurate (the past data cannot be stored).
  
Questions & Thank You

•  Thank you!
•  Questions?
  
