Personalized List
Recommendation based on
Multi-armed Bandit Algorithms
Weiwen LIU
Computer Science & Engineering
Chinese University of Hong Kong
wwliu@cse.cuhk.edu.hk
wwliu, Term Presentation, Term 1
Content
o Background
• Existing Methods
• Multi-armed Bandits
• Dependency Click Model
o Algorithm
o Results
o Experiments
o Conclusion and Future Work
Background
o For users:
• How to discover interesting items (music, news, apps) among a large number of items.
o For companies:
• How to create economic opportunities.
• How to provide better personalized services.
Existing methods
o Content-based methods / collaborative filtering
• Pros: perform well when users have enough click or download records
• Cons: cold-start problem
o Context-based methods / regression
• Pros: efficient and easy to implement
• Cons: lack of diversity
Exploration vs Exploitation?
Multi-armed bandits
o Rewards $x_{i,1}, x_{i,2}, \dots$ of machine $i$ are i.i.d. $\{0,1\}$-valued random variables
o An allocation policy prescribes which machine $I_t$ to play at time $t$ based on the realization of $x_{I_1,1}, \dots, x_{I_{t-1},t-1}$
o The target is to play as often as possible the machine with the largest expected reward
$$\mu^* = \max_{i=1,\dots,K} \mathbb{E}[x_i]$$
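To make the setting concrete, here is a minimal Python sketch of a K-armed Bernoulli bandit played with the classical UCB1 index rule. UCB1 is a standard baseline used here only for illustration; it is not the algorithm proposed in this talk. The function name `ucb1`, the arm means, and the horizon are assumptions.

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Play a Bernoulli bandit with UCB1 and return the pull count of each arm.

    means   -- hypothetical expected rewards of the machines (unknown to the policy)
    horizon -- number of rounds to play
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of times each machine was played
    sums = [0.0] * k   # total reward collected from each machine
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play every machine once to initialise the estimates
        else:
            # empirical mean plus an exploration bonus that shrinks with more pulls
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

For example, `ucb1([0.2, 0.5, 0.8], horizon=5000)` should concentrate most pulls on the third machine, the one with the largest expected reward.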
Bandit Solutions
o Stochastic Bandits:
• Select items repeatedly and separately, one at a time
• Limitations: ignore the underlying relations; high computational cost
o Combinatorial Cascade Bandits:
• Select a set or sequence of arms
• Limitations: can only deal with the single-click setting
Click Models
o Cascade Click Model:
• Stops when the first click occurs
• Can only model a single click
o Dependency Click Model:
• Introduces a set of termination parameters
• Can handle settings with multiple clicks
Dependency Click Model
o Allows the user to continue checking more items after a click.
o An extension of the Cascade Model (CM)
• Reduces to CM if the termination weights $\bar{v}(k) = 1$
[Flowchart: Start → Examine next item $a_k$ → Attracted by the item? (probability $\bar{w}(a_k)$). If yes → Would like to terminate? (probability $\bar{v}(k)$); if yes → Satisfied. Otherwise → Reach the end of the list? If yes → Not satisfied; if no → examine the next item.]
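The user behaviour in the Dependency Click Model can be sketched as a short simulation. This is an illustrative sketch, not code from the talk: the function name `simulate_dcm` and the way probabilities are passed in (one attraction and one termination probability per position) are assumptions.

```python
import random

def simulate_dcm(attraction, termination, rng):
    """Simulate one user scanning a list from top to bottom under the DCM.

    attraction[k]  -- probability that the item at position k attracts the user
    termination[k] -- probability that the user stops after clicking at position k
    Returns the click vector (1 = clicked, 0 = not clicked).
    """
    clicks = [0] * len(attraction)
    for k, (w, v) in enumerate(zip(attraction, termination)):
        if rng.random() < w:      # attracted by the item, so the user clicks
            clicks[k] = 1
            if rng.random() < v:  # satisfied, so the user stops examining
                break
    return clicks
```

With every termination probability set to 1 the user stops at the first click, which is the Cascade Model; with smaller values the same session can contain several clicks.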
Problem Formulation
o Given the ground item set $E = \{1, \dots, L\}$, a contextual vector $\mathbf{x}_{i,t} \in \mathbb{R}^d$ is known to the agent at time $t$.
o Attraction weight $\mathbf{w}_t(a)$, with $\mathbf{w}_t \in \{0,1\}^E$
• is a Bernoulli random variable with mean $w_t(a)$
• denotes whether the user is attracted by $a$ or not
• the attraction weights $(\mathbf{w}_t(a))_{t=1}^{n}$ are i.i.d.
o Termination weight $\mathbf{v}_t(k)$, with $\mathbf{v}_t \in \{0,1\}^K$
• is a Bernoulli random variable with mean $\bar{v}(k)$
• denotes whether the user wants to terminate examining the list
• only depends on the position $k$
• the termination weights $(\mathbf{v}_t(k))_{t=1}^{n}$ are i.i.d.
Recommended list: $\mathbf{A}_t = (\mathbf{a}_1^t, \dots, \mathbf{a}_K^t)$
Feedback: 010 ⋯ 100
Objective
o The reward function is defined as
$$f(A, v, w) = 1 - \prod_{k=1}^{K}\bigl(1 - v(k)\,w(a_k)\bigr),$$
so that $f(\mathbf{A}_t, \mathbf{v}_t, \mathbf{w}_t) = 1$ if the user clicks on an item, feels satisfied, and terminates examination.
o The pseudo-regret is defined as
$$\mathcal{R}(n) = \mathbb{E}\left[\sum_{t=1}^{n}\bigl(f(A_t^*, v_t, w_t) - f(A_t, v_t, w_t)\bigr)\right]$$
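As a small worked example, the reward function and the per-round regret term can be evaluated directly. The sketch below assumes the weights are passed per position, so `w[k]` stands for the attraction weight of the item placed at position k; the function names are hypothetical.

```python
def dcm_reward(v, w):
    """f(A, v, w) = 1 - prod_k (1 - v[k] * w[k]) for the items of one list.

    v[k] -- termination weight at position k
    w[k] -- attraction weight of the item shown at position k
    """
    prod = 1.0
    for vk, wk in zip(v, w):
        prod *= 1.0 - vk * wk
    return 1.0 - prod

def round_regret(v, w_optimal, w_played):
    """One summand of the pseudo-regret: reward of the optimal list minus reward of the played list."""
    return dcm_reward(v, w_optimal) - dcm_reward(v, w_played)
```

With realised weights `v = [0, 1, 0]` and `w = [1, 1, 0]` the user clicks at positions 1 and 2 and terminates at position 2, so the reward is 1. With `v = [0, 0, 0]` the same clicks give reward 0, because the user never terminates satisfied.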
Partial Knowledge
o Click sequence is the only feedback for the agent
• The termination position is unobserved
• The reward is not revealed
[Figure: click sequence 010011000 over positions 1 to 9, shown twice: once labeled reward=1 and once labeled reward=0]
Use feedback before the last click to update the model
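One way to read this update rule is sketched below: only the positions up to and including the last click are treated as observed attraction samples, since the user is known to have examined them. Including the last clicked position, and treating a list with no clicks as fully examined, are assumptions based on the click model rather than statements from the slide; the function name is hypothetical.

```python
def observed_samples(items, clicks):
    """Return (item, attraction) pairs the agent can safely use for learning.

    items  -- the recommended list, in display order
    clicks -- click vector of the same length (1 = clicked, 0 = not clicked)
    """
    if 1 not in clicks:
        # Assumption: with no click the user never terminated early,
        # so every position was examined and found unattractive.
        return [(a, 0) for a in items]
    last = len(clicks) - 1 - clicks[::-1].index(1)  # index of the last click
    return list(zip(items[:last + 1], clicks[:last + 1]))
```

For the click sequence 010011000 the last click is at position 6, so positions 1 to 6 are used and positions 7 to 9 are discarded, because the agent cannot tell whether the user examined them.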
Proposed Model: attraction weight
o Assume the expected attraction weight $w_t(a)$ follows
$$w_t(a) = \mathbb{E}[\mathbf{w}_t(a) \mid \mathcal{H}_t] = \mu(\theta_*^\top x_{t,a})$$
o Use the generalized linear model as a flexible extension
• Admits a wider range of distributions, e.g. Gaussian, binomial, Poisson…
Attracted or not?
Proposed Model: termination weight
o Due to the limited feedback, we assume the order of the expected termination weights is known
• For simplicity of explanation, assume $\bar{v}(1) \ge \dots \ge \bar{v}(K)$
o The expected reward is maximized by placing the more attractive items at the higher positions.
Terminate or not?
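The ordering claim can be checked by brute force on a small instance. The sketch below assumes the expected reward of a list equals the reward function evaluated at the expected weights (which holds when attraction and termination weights are independent); the function names and the numbers are hypothetical.

```python
from itertools import permutations

def expected_reward(v_bar, w_bar):
    """1 - prod_k (1 - v_bar[k] * w_bar[k]) for expected weights per position."""
    prod = 1.0
    for v, w in zip(v_bar, w_bar):
        prod *= 1.0 - v * w
    return 1.0 - prod

def best_order(v_bar, attractions):
    """Try every ordered choice of len(v_bar) items and keep the best one."""
    return max(permutations(attractions, len(v_bar)),
               key=lambda order: expected_reward(v_bar, order))
```

With termination weights `[0.9, 0.5, 0.2]` in decreasing order and attraction probabilities `[0.3, 0.7, 0.5]`, the search returns `(0.7, 0.5, 0.3)`, the items sorted by decreasing attraction. Swapping two neighbours changes the product only through the sign of $(\bar{v}(k) - \bar{v}(k+1))(w(a_k) - w(a_{k+1}))$, which is why sorting suffices.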
Proposed Model: parameter estimation
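o The model parameter $\theta$ can be estimated using MLE:
$$\sum_{s=1}^{t}\sum_{k=1}^{C_t}\left(\mathbf{w}_s(a_k^s) - \mu(\theta^\top x_{s,a_k^s})\right)x_{s,a_k^s} = 0.$$
o Upper Confidence Bound (UCB):
$$U_t(a) = \min\left\{\mu(\tilde{\theta}_{t-1}^\top x_{t,a}) + \rho(t-1)\,\|x_{t,a}\|_{V_{t-1}^{-1}},\ 1\right\},$$
where $\beta_t^a(\delta) = \rho(t)\,\|x_{t,a}\|_{V_t^{-1}}$.
Lemma: For any $t \ge 1$ and $a \in E$, denote
$$\beta_t^a(\delta) = \frac{2k_\mu}{c_\mu}\,\|x_{t,a}\|_{V_t^{-1}}\sqrt{\log\frac{\left(1+\frac{Kt}{\lambda d}\right)^{d}}{\delta^2}}.$$
For all $0 \le \delta \le 1$, with probability at least $1-\delta$, it holds that
$$\left|\mu(\theta_*^\top x_{t,a}) - \mu(\tilde{\theta}_t^\top x_{t,a})\right| \le \beta_t^a(\delta), \quad \forall t \ge 1.$$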
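The estimation and UCB steps can be sketched in plain Python for the logistic link. This is a minimal illustration under stated assumptions, not the talk's implementation: a ridge term `lam` is added to the score equation so that the Newton step is always well defined, $V$ is taken to be `lam * I` plus the sum of outer products of observed feature vectors, and `rho` is passed in as a plain number. All function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - dot(M[i][i + 1:n], x[i + 1:])) / M[i][i]
    return x

def score(theta, xs, ys, lam):
    """Ridge-regularised score: sum_s (y_s - mu(theta^T x_s)) x_s - lam * theta."""
    g = [-lam * t for t in theta]
    for x, y in zip(xs, ys):
        r = y - sigmoid(dot(theta, x))
        g = [gi + r * xi for gi, xi in zip(g, x)]
    return g

def fit_theta(xs, ys, lam=1.0, iters=25):
    """Newton iterations that drive the regularised score to zero."""
    d = len(xs[0])
    theta = [0.0] * d
    for _ in range(iters):
        # negative Hessian of the regularised log-likelihood
        H = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
        for x in xs:
            p = sigmoid(dot(theta, x))
            for i in range(d):
                for j in range(d):
                    H[i][j] += p * (1.0 - p) * x[i] * x[j]
        step = solve(H, score(theta, xs, ys, lam))
        theta = [t + s for t, s in zip(theta, step)]
    return theta

def ucb(theta, x, V, rho):
    """U(a) = min(mu(theta^T x) + rho * ||x||_{V^{-1}}, 1)."""
    width = math.sqrt(dot(x, solve(V, x)))
    return min(sigmoid(dot(theta, x)) + rho * width, 1.0)
```

Here `xs` and `ys` stand for the feature vectors and observed attraction indicators collected so far. Items are then ranked by `ucb(...)`, so an item is shown either because its estimated attraction is high or because its feature direction has rarely been observed.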
Proposed Model: UCB
o Analyze the mean and a measure of uncertainty (variance) for each item
o Make decisions based on mean + variance
[Figure: estimates for items A, B, and C on an axis with ticks at 0, 0.2, and 0.4]
Proposed Model: UCB
o The value of $\rho(t)$ decreases w.r.t. $t$
o The uncertainty of $A$ reduces after several time steps
o Automatically balances exploration and exploitation
[Figure: estimates for items A, B, and C on an axis with ticks at 0, 0.2, and 0.4]
Proposed Model: Algorithm
[Diagram: three steps: Recommend based on UCB; Estimate $\theta$; Update statistics]
Theoretical Results
o The regret upper bound is of order $O(d\sqrt{n}\log n)$, which depends linearly on the dimension $d$ of the feature space, but not on the number $L$ of base arms.
Theorem: If the reward function is given as $f(A, v, w) = 1 - \prod_{k=1}^{K}\bigl(1 - v(k)\,w(a_k)\bigr)$, then the cumulative regret $\mathcal{R}(n)$ of the proposed algorithm has the following bound,
$$\mathcal{R}(n) \le \frac{4K\Delta_v k_\mu}{c_\mu p^*}\sqrt{dn(K+1)\,\log\frac{\left(1+\frac{Kn}{\lambda d}\right)^{d}}{\delta^2}\,\log\left(1+\frac{Kn}{\lambda d}\right)},$$
where $k_\mu$ is the Lipschitz constant of $\mu$ and $c_\mu = \inf \mu'$.
Experimental Results
o Synthetic data
• $L = 200$, $K = 4$, and $d = 10$
• $\mu(x) = \frac{1}{1+\exp(-x)}$
o GL-CDCM outperforms KL-DCM by 80.27% and Lin-CDCM by 49.04%.
Experimental Results
o Real-world data
• MovieLens 20M dataset
• $L = 200$, $K = 5$, $d = 100$
o The result of GL-CDCM is 5.69 times that of KL-DCM and 1.45 times that of Lin-CDCM
Conclusion
o Conclusion
• Formulate the DCM bandit problem
• Incorporate contextual information
• Make a weaker assumption on the expected attraction weight function
• Prove a regret upper bound
o Future work
• Prove a tighter bound
• Consider other practical click models
• Verify the effectiveness on more real-world datasets
Thank you