IR & E
Personalized News Article Recommendation (Stream Data Based)
Monsoon 17, IIIT Hyderabad
Keywords
● Contextual Bandit
● Web Service
● Personalization
● Recommender Systems
● Exploration/Exploitation dilemma
Example of Learning through Exploration
Repeatedly:
1. A user comes to Yahoo! (with a history of previous visits, IP address, and data tied to their Yahoo! account)
2. Yahoo! chooses information to present (URLs, ads, news stories)
3. The user reacts to the presented information (clicks on something, doesn't click, comes back and clicks again, etc.)
Yahoo! wants to interactively choose content and use the observed feedback to improve future content
choices.
Another Example: Clinical Decision Making
Repeatedly:
1. A patient comes to a doctor with symptoms, medical history, test results
2. The doctor chooses and suggests a treatment
3. The patient responds to it
The doctor wants a policy for choosing targeted treatments for individual patients.
Current Scenario
Which article to feature?
Challenges:
● Many new users and articles arrive constantly (cold start).
● Content information must be incorporated.
● The relevance of articles changes over time.
Goal:
"Quickly" identify news stories that are relevant at a personal level.
The Contextual Bandit Setting
For t = 1, . . . , T:
1. The world produces some context x_t ∈ X
2. The learner chooses an action a_t ∈ {1, . . . , K}
3. The world reacts with reward r_t(a_t) ∈ [0, 1]
Goal: Learn a good policy for choosing actions given context.
What does learning mean?
The Contextual Bandit Setting (Contd.)
What does learning mean?
Efficiently competing with a large reference class of possible policies Π = { π : X → {1, ..., K} }
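The interaction protocol above can be sketched as a simulation loop. This is a minimal sketch, not from the slides: the function names and the toy world (where the context is itself the label of the best arm) are placeholder assumptions to make the protocol concrete.

```python
import random

def run_contextual_bandit(T, draw_context, choose_action, reward_fn):
    """One pass of the contextual-bandit protocol: context -> action -> reward.
    Only the reward of the chosen action is ever observed by the learner."""
    total = 0.0
    for t in range(T):
        x_t = draw_context(t)        # 1. world produces context x_t ∈ X
        a_t = choose_action(x_t)     # 2. learner picks a_t ∈ {1, ..., K}
        r_t = reward_fn(x_t, a_t)    # 3. world reveals r_t(a_t) ∈ [0, 1]
        total += r_t
    return total / T                 # average observed reward

# Toy world: K arms, reward 1 iff the action matches the context's label.
rng = random.Random(0)
K = 3
avg = run_contextual_bandit(
    T=1000,
    draw_context=lambda t: rng.randrange(K),
    choose_action=lambda x: x,       # oracle policy π(x) = x
    reward_fn=lambda x, a: 1.0 if a == x else 0.0,
)
```

A policy π ∈ Π is just the `choose_action` function; learning means finding one whose average reward competes with the best policy in the class.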
Some Remarks
This is not a supervised learning problem.
● We don’t know the reward of actions not taken;
○ the loss function is unknown even at training time.
● Exploration is needed to succeed.
● Simpler than reinforcement learning,
○ We know which action is responsible for each reward.
Some Remarks (Contd.)
This is not a standard (context-free) bandit problem.
● In the bandit setting, there is no x, and the goal is to compete with the set of constant actions.
○ Too weak in practice.
● Generalization across x is required to succeed.
Mapping to our current problem
For each time t = 1, 2, 3, …, T, the news page is loaded:
1. Arms (actions) are the articles that can be shown to the user; the context consists of user and article information.
2. If the displayed article a is clicked, r_t,a = 1; otherwise r_t,a = 0.
3. Article selection improves with the observed click feedback.
Goal: Maximize the expected click-through rate (CTR), i.e., the expected number of clicks per page view.
Balancing Exploration and Exploitation
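The tradeoff this slide names can be made concrete with the simplest possible scheme, ε-greedy, shown here as an illustrative sketch (ε and the `ctr_estimates` list are assumptions, not part of the slides; LinUCB below handles the tradeoff more cleverly via confidence bounds):

```python
import random

def epsilon_greedy(ctr_estimates, epsilon, rng=None):
    """Pick an article index: with probability epsilon, explore a uniformly
    random article; otherwise exploit the current best CTR estimate."""
    rng = rng or random.Random()
    if rng.random() < epsilon:
        return rng.randrange(len(ctr_estimates))                  # explore
    return max(range(len(ctr_estimates)), key=lambda i: ctr_estimates[i])  # exploit
```

Too little exploration and a good new article is never discovered; too much and known-good articles are shown too rarely.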
LinUCB (Disjoint Linear Model)
Assumption: The expected reward for action a is a linear function of the context features, i.e.:
E[r_t,a | x_t,a] = x_t,a^T θ*_a
1. In each trial t, for each a ∈ A_t, estimate θ_a via regularized (ridge) linear regression: θ̂_a = (D_a^T D_a + I_d)^-1 D_a^T c_a, where the design matrix D_a stacks the contexts observed for arm a and c_a holds the corresponding click responses.
2. Choose a_t = argmax_{a ∈ A_t} ( x_t,a^T θ̂_a + α √( x_t,a^T (D_a^T D_a + I_d)^-1 x_t,a ) ), i.e., the arm with the highest upper confidence bound on its estimated reward.
LinUCB (Hybrid Model)
Assumption: The expected reward for action a is the sum of two linear terms, one shared across all actions and one specific to each action, i.e.:
E[r_t,a | x_t,a] = z_t,a^T β* + x_t,a^T θ*_a
Here z_t,a are shared features with coefficients β* common to all arms, and x_t,a are arm-specific features as in the disjoint model.
The algorithm works similarly to the disjoint LinUCB algorithm, except that the shared coefficients β are updated jointly across all arms.
Evaluation
● Testing on live data?
○ Too expensive.
● Then, offline testing on logged data?
○ The logging policy differs from the policy being evaluated.
● Then, a simulator-based approach?
○ Biased by the simulator's modeling errors.
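One way out, used in the work behind these slides, is to replay logs collected under a uniformly random logging policy and keep only the events where the evaluated policy agrees with the logged action. The sketch below assumes a hypothetical `policy` object with `select`/`update` methods; it is an illustration of the replay idea, not the authors' code.

```python
def replay_evaluate(policy, logged_events):
    """Replay (rejection-sampling) offline evaluation. `logged_events` is a
    sequence of (contexts, logged_arm, reward) triples recorded under a
    uniformly random logging policy; an event counts only when the evaluated
    policy picks the same arm that was actually shown."""
    clicks, matches = 0, 0
    for contexts, logged_arm, reward in logged_events:
        if policy.select(contexts) == logged_arm:   # event is usable on a match
            matches += 1
            clicks += reward
            policy.update(logged_arm, contexts[logged_arm], reward)
    return clicks / max(matches, 1)                 # estimated CTR
```

Because the logged actions were uniformly random, the retained events are an unbiased sample of what the evaluated policy would have faced live.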
Results
● Training set: 4.7 million events
● Test set: 36 million events
● Articles and users each clustered into 5 clusters:
○ yielding two 6-dimensional feature vectors (one constant feature in each)
Questions?
Ask in the comment section.
