IR & E
Personalized News Article Recommendation (Stream Data Based)
Monsoon 17, IIIT Hyderabad
Keywords
● Contextual Bandit
● Web Service
● Personalization
● Recommender Systems
● Exploration/Exploitation dilemma
Example of Learning through Exploration
Repeatedly:
1. A user comes to Yahoo! (with a history of previous visits, IP address, and data tied to their Yahoo! account)
2. Yahoo! chooses information to present (URLs, ads, news stories)
3. The user reacts to the presented information (clicks on something, doesn't click, comes back and clicks again, etc.)
Yahoo! wants to interactively choose content and use the observed feedback to improve future content
choices.
Another Example: Clinical Decision Making
Repeatedly:
1. A patient comes to a doctor with symptoms, medical history, test results
2. The doctor chooses and suggests a treatment
3. The patient responds to it
The doctor wants a policy for choosing targeted treatments for individual patients.
Current Scenario
Which article to feature?
Challenges:
● Many new users and articles arrive constantly (cold start).
● Content information must be incorporated.
● The relevance of articles changes over time.
Goal:
"Quickly" identify news stories that are relevant at a personal level.
The Contextual Bandit Setting
For t = 1, . . . , T:
1. The world produces some context x_t ∈ X
2. The learner chooses an action a_t ∈ {1, . . . , K}
3. The world reacts with reward r_t(a_t) ∈ [0, 1]
Goal: Learn a good policy for choosing actions given context.
What does learning mean?
The Contextual Bandit Setting (Contd.)
What does learning mean?
Efficiently competing with a large reference class of possible policies Π = { π : X → {1, ..., K} }
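The interaction protocol above can be sketched as a simulation loop. This is a minimal sketch, not from the slides: the function names and the toy world (where the context is itself the label of the best arm) are placeholder assumptions to make the protocol concrete.

```python
import random

def run_contextual_bandit(T, draw_context, choose_action, reward_fn):
    """One pass of the contextual-bandit protocol: context -> action -> reward.
    Only the reward of the chosen action is ever observed by the learner."""
    total = 0.0
    for t in range(T):
        x_t = draw_context(t)        # 1. world produces context x_t ∈ X
        a_t = choose_action(x_t)     # 2. learner picks a_t ∈ {1, ..., K}
        r_t = reward_fn(x_t, a_t)    # 3. world reveals r_t(a_t) ∈ [0, 1]
        total += r_t
    return total / T                 # average observed reward

# Toy world: K arms, reward 1 iff the action matches the context's label.
rng = random.Random(0)
K = 3
avg = run_contextual_bandit(
    T=1000,
    draw_context=lambda t: rng.randrange(K),
    choose_action=lambda x: x,       # oracle policy π(x) = x
    reward_fn=lambda x, a: 1.0 if a == x else 0.0,
)
```

A policy π ∈ Π is just the `choose_action` function; learning means finding one whose average reward competes with the best policy in the class.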
Some Remarks
This is not a supervised learning problem.
● We don’t know the reward of actions not taken;
○ the loss function is unknown even at training time.
● Exploration is needed to succeed.
● Simpler than reinforcement learning,
○ We know which action is responsible for each reward.
Some Remarks (Contd.)
This is not a standard (context-free) bandit problem.
● In the bandit setting, there is no x, and the goal is to compete with the set of constant actions.
○ Too weak in practice.
● Generalization across x is required to succeed.
Mapping to our current problem
For each time t = 1, 2, 3, …, T, the news page is loaded:
1. Arms (actions) are the articles that can be shown to the user; the context consists of user and article information.
2. If the displayed article a is clicked, r_t,a = 1; otherwise r_t,a = 0.
3. Article selection improves with the observed click feedback.
Goal: Maximize the expected click-through rate (CTR), i.e., the expected number of clicks per page view.
Balancing Exploration and Exploitation
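The tradeoff this slide names can be made concrete with the simplest possible scheme, ε-greedy, shown here as an illustrative sketch (ε and the `ctr_estimates` list are assumptions, not part of the slides; LinUCB below handles the tradeoff more cleverly via confidence bounds):

```python
import random

def epsilon_greedy(ctr_estimates, epsilon, rng=None):
    """Pick an article index: with probability epsilon, explore a uniformly
    random article; otherwise exploit the current best CTR estimate."""
    rng = rng or random.Random()
    if rng.random() < epsilon:
        return rng.randrange(len(ctr_estimates))                  # explore
    return max(range(len(ctr_estimates)), key=lambda i: ctr_estimates[i])  # exploit
```

Too little exploration and a good new article is never discovered; too much and known-good articles are shown too rarely.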
LinUCB (Disjoint Linear Model)
Assumption: The expected reward for action a is a linear function of the context features, i.e.:
E[r_t,a | x_t,a] = x_t,a^T θ*_a
1. In each trial t, for each a ∈ A_t, estimate θ_a via regularized (ridge) linear regression: θ̂_a = (D_a^T D_a + I_d)^-1 D_a^T c_a, where the design matrix D_a stacks the contexts observed for arm a and c_a holds the corresponding click responses.
2. Choose a_t = argmax_{a ∈ A_t} ( x_t,a^T θ̂_a + α √( x_t,a^T (D_a^T D_a + I_d)^-1 x_t,a ) ), i.e., the arm with the highest upper confidence bound on its estimated reward.
LinUCB (Hybrid Model)
Assumption: The expected reward for action a is the sum of two linear terms, one shared across all actions and one specific to each action, i.e.:
E[r_t,a | x_t,a] = z_t,a^T β* + x_t,a^T θ*_a
Here z_t,a are shared features with coefficients β* common to all arms, and x_t,a are arm-specific features as in the disjoint model.
The algorithm works similarly to the disjoint LinUCB algorithm, except that the shared coefficients β are updated jointly across all arms.
Evaluation
● Testing on live data?
○ Too expensive.
● Then, offline testing on logged data?
○ The logging policy differs from the policy being evaluated.
● Then, a simulator-based approach?
○ Biased by the simulator's modeling errors.
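One way out, used in the work behind these slides, is to replay logs collected under a uniformly random logging policy and keep only the events where the evaluated policy agrees with the logged action. The sketch below assumes a hypothetical `policy` object with `select`/`update` methods; it is an illustration of the replay idea, not the authors' code.

```python
def replay_evaluate(policy, logged_events):
    """Replay (rejection-sampling) offline evaluation. `logged_events` is a
    sequence of (contexts, logged_arm, reward) triples recorded under a
    uniformly random logging policy; an event counts only when the evaluated
    policy picks the same arm that was actually shown."""
    clicks, matches = 0, 0
    for contexts, logged_arm, reward in logged_events:
        if policy.select(contexts) == logged_arm:   # event is usable on a match
            matches += 1
            clicks += reward
            policy.update(logged_arm, contexts[logged_arm], reward)
    return clicks / max(matches, 1)                 # estimated CTR
```

Because the logged actions were uniformly random, the retained events are an unbiased sample of what the evaluated policy would have faced live.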
Results
● Training set: 4.7 million events
● Test set: 36 million events
● Articles and users each clustered into 5 clusters:
○ yielding two 6-dimensional feature vectors (one constant feature in each)
Questions?
Ask in the comment section.
