CIKM 2013 - Beyond Data From User Information to Business Value

Beyond Data
From User Information to Business Value

October, 2013
Xavier Amatriain
Director - Algorithms Engineering - Netflix

@xamat

“In a simple Netflix-style item recommender, we would
simply apply some form of matrix factorization (i.e. NMF)”

From the Netflix Prize to today

2006

2013

Everything is personalized
Ranking

Over 75% of what people watch comes from a recommendation

Top 10
Personalization awareness

Diversity

Support for Recommendations

Social Support

EVERYTHING is a Recommendation

Consumer (Data) Science
1. Start with a hypothesis:
■ Algorithm/feature/design X will increase member engagement
with our service, and ultimately member retention

2. Design a test
■ Develop a solution or prototype
■ Think about dependent & independent variables, control,
significance…

3. Execute the test

4. Let data speak for itself

Offline/Online testing process

Executing A/B tests
Measure differences in metrics across statistically identical populations
that each experience a different algorithm.

■ Decisions on the product always data-driven
■ Overall Evaluation Criteria (OEC) = member retention
■ Use long-term metrics whenever possible
■ Short-term metrics can be informative and allow faster decisions
■ But, not always aligned with OEC

■ Significance and hypothesis testing (1000s of members and 2-20 cells)

■ A/B Tests allow testing many (radical) ideas at the same
time (typically 100s of customer A/B tests running)
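
As a rough illustration of the significance check between two cells, the sketch below runs a two-sided two-proportion z-test on retention counts. The choice of test, the function name and the member counts are assumptions made for this example; the slides do not say which statistical test is used.

```python
import math

def two_proportion_z_test(retained_a, n_a, retained_b, n_b):
    """Two-sided z-test for a difference in retention rate between two cells."""
    p_a = retained_a / n_a
    p_b = retained_b / n_b
    # Pooled rate under the null hypothesis that both cells share one rate.
    pooled = (retained_a + retained_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal CDF, via the error function.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts, invented for illustration only.
z, p_value = two_proportion_z_test(retained_a=4500, n_a=5000,
                                   retained_b=4580, n_b=5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With many cells compared against one control, a correction for multiple comparisons would normally be layered on top of a per-pair test like this one.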

Offline testing
■ Measure model performance, using (IR) metrics
■ Offline performance used as an indication to make
informed decisions on follow-up A/B tests
■ A critical (and mostly unsolved) issue is how offline
metrics can correlate with A/B test results.
■ Extremely important to define offline evaluation
framework that maps to online OEC
■ e.g. How to create training/testing datasets may not be trivial
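
A minimal sketch of two IR metrics that an offline evaluation might report, NDCG@k and reciprocal rank. The slides name IR metrics in general but not a specific formulation, so the linear gain used here (rather than the exponential 2^rel - 1 variant) and the function names are assumptions.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k relevance values, in ranked order."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def reciprocal_rank(relevances):
    """1 / position of the first relevant item, or 0 if none is relevant."""
    for pos, rel in enumerate(relevances, start=1):
        if rel > 0:
            return 1.0 / pos
    return 0.0

# Hypothetical relevance labels for one user's ranked list, best-ranked first.
ranked_relevances = [0, 2, 1, 0, 3]
print(ndcg_at_k(ranked_relevances, k=5), reciprocal_rank(ranked_relevances))
```

Averaging `reciprocal_rank` over users gives MRR. Whether either number tracks the online OEC is exactly the open issue raised above.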

Big Data @Netflix
■ > 40M subscribers
■ Ratings: ~5M/day
■ Searches: >3M/day
■ Plays: > 50M/day
■ Streamed hours:
○ 5B hours in Q3 2013

[Figure labels: Time, Impressions, Metadata, Social, Geo-information, Member Behavior, Ratings, Device Info, Demographics]

Smart Models

■ Regression models (Logistic,
Linear, Elastic nets)
■ SVD & other MF models
■ Factorization Machines
■ Restricted Boltzmann Machines
■ Markov Chains & other graph
models
■ Clustering (from k-means to
HDP)
■ Deep ANN
■ LDA
■ Association Rules
■ GBDT/RF
■ …

SVD for Rating Prediction
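■ User factor vectors $p_u \in \mathbb{R}^f$ and item-factor vectors $q_v \in \mathbb{R}^f$
■ Baseline (bias) $b_{uv} = \mu + b_u + b_v$ (user & item deviation from average)
■ Predict rating as $\hat{r}_{uv} = b_{uv} + p_u^T q_v$
■ SVD++ (Koren et al.) asymmetric variation w. implicit feedback
$\hat{r}_{uv} = b_{uv} + q_v^T \left( |R(u)|^{-1/2} \sum_{j \in R(u)} (r_{uj} - b_{uj}) x_j + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \right)$
■ Where $q_v, x_v, y_v \in \mathbb{R}^f$ are three item factor vectors
■ Users are not parametrized, but rather represented by:
○ R(u): items rated by user u & N(u): items for which the user
has given implicit preference (e.g. rated/not rated)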
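
The two prediction rules can be sketched with NumPy as below. The formulas are assumed to follow Koren's published formulation (baseline plus factor dot product, and a user vector built only from item vectors). The function names, array shapes and numbers are hypothetical, and no training step is shown.

```python
import numpy as np

def predict_rating(mu, b_u, b_v, p_u, q_v):
    """Baseline (global mean + user and item deviations) plus factor interaction."""
    return mu + b_u + b_v + float(np.dot(p_u, q_v))

def asymmetric_user_vector(rated_residuals, x_rated, y_implicit):
    """Represent a user from item vectors only (assumes R(u) and N(u) are non-empty).

    rated_residuals: r_uj - b_uj for each item j in R(u), shape (|R(u)|,)
    x_rated:         x_j for each item j in R(u), shape (|R(u)|, f)
    y_implicit:      y_j for each item j in N(u), shape (|N(u)|, f)
    """
    explicit = rated_residuals @ x_rated / np.sqrt(len(x_rated))
    implicit = y_implicit.sum(axis=0) / np.sqrt(len(y_implicit))
    return explicit + implicit

# Hypothetical values with f = 2 factors, for illustration only.
q_v = np.array([0.5, 2.0])
print(predict_rating(mu=3.6, b_u=0.2, b_v=-0.1, p_u=np.array([1.0, 0.0]), q_v=q_v))

user_vec = asymmetric_user_vector(
    rated_residuals=np.array([1.0, -1.0]),
    x_rated=np.array([[1.0, 0.0], [0.0, 1.0]]),
    y_implicit=np.array([[2.0, 2.0]]),
)
# In the asymmetric variation, user_vec takes the place of p_u.
print(predict_rating(mu=3.6, b_u=0.2, b_v=-0.1, p_u=user_vec, q_v=q_v))
```

Because the user vector is assembled from item vectors, a new user with a few ratings can be scored without learning user-specific parameters.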

Restricted Boltzmann Machines
■ Restrict the connectivity in ANN to
make learning easier.
■ Only one layer of hidden units.
○ Although multiple layers are possible
■ No connections between hidden
units.
■ Hidden units are independent given
the visible states.
■ RBMs can be stacked to form Deep
Belief Networks (DBN) – 4th generation
of ANNs
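
Because hidden units are independent given the visible states, the whole hidden layer can be computed in one matrix product. The sketch below assumes binary units with the standard sigmoid conditional; the layer sizes, weights and function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_probabilities(v, W, c):
    """P(h_j = 1 | v) for every hidden unit j.

    With no hidden-to-hidden connections, each unit depends only on v.
    v: visible states, shape (n_visible,)
    W: weights, shape (n_visible, n_hidden)
    c: hidden biases, shape (n_hidden,)
    """
    return sigmoid(c + v @ W)

def sample_hidden(v, W, c, rng):
    """Sample all hidden units independently given the visible states."""
    probs = hidden_probabilities(v, W, c)
    return (rng.random(probs.shape) < probs).astype(int)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(6, 3))  # hypothetical sizes: 6 visible, 3 hidden
c = np.zeros(3)
v = np.array([1, 0, 1, 1, 0, 0])
print(hidden_probabilities(v, W, c))
print(sample_hidden(v, W, c, rng))
```

The same independence holds for visible units given the hidden states, which is what makes alternating (Gibbs) sampling between the two layers cheap.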

Ranking
■ Ranking = Scoring + Sorting + Filtering
bags of movies for presentation to a user
■ Key algorithm, sorts titles in most contexts
■ Goal: Find the best possible ordering of a
set of videos for a user within a specific
context in real-time
■ Objective: maximize consumption &
“enjoyment”

■ Factors
○ Accuracy
○ Novelty
○ Diversity
○ Freshness
○ Scalability
○ …

Example: Two features, linear model

Linear Model:
f_rank(u,v) = w1 p(v) + w2 r(u,v) + b

[Figure: videos plotted by Popularity and Predicted Rating (1 to 5), with the Final Ranking produced by the linear model]
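
The two-feature linear model can be sketched as a scoring function followed by a sort. The candidate titles, feature values and weights below are hypothetical; in practice the weights would be learned from data rather than set by hand.

```python
def rank_score(popularity, predicted_rating, w1, w2, b=0.0):
    """f_rank(u, v) = w1 * p(v) + w2 * r(u, v) + b"""
    return w1 * popularity + w2 * predicted_rating + b

def rank_videos(candidates, w1, w2, b=0.0):
    """Score each (title, popularity, predicted_rating) tuple and sort, best first."""
    scored = [(title, rank_score(p, r, w1, w2, b)) for title, p, r in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical candidates and weights, chosen only to illustrate the mechanics.
candidates = [("A", 0.9, 3.1), ("B", 0.2, 4.8), ("C", 0.6, 4.0)]
for title, score in rank_videos(candidates, w1=2.0, w2=0.5):
    print(title, round(score, 2))
```

Changing the ratio of w1 to w2 shifts the final ranking between a popularity-driven order and a predicted-rating-driven order.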

Learning to Rank Approaches
■ ML problem: construct ranking model from training data
1. Pointwise (Ordinal regression, Logistic regression, SVM, GBDT, …)
■ Loss function defined on individual relevance judgment
2. Pairwise (RankSVM, RankBoost, RankNet, FRank…)
■ Loss function defined on pair-wise preferences
■ Goal: minimize number of inversions in ranking
3. Listwise
■ Indirect Loss Function (RankCosine, ListNet…)
■ Directly optimize IR measures (NDCG, MRR, FCP…)
○ Genetic Programming or Simulated Annealing
○ Use boosting to optimize NDCG (AdaRank)
○ Gradient descent on smoothed version (CLiMF, TFMAP, GAPfm @cikm13)
○ Iterative Coordinate Ascent (Direct Rank @kdd13)
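
To make the pairwise goal concrete, the sketch below counts inversions (pairs the model orders against the known preference) and computes a logistic loss on score differences, a smooth surrogate similar in spirit to what pairwise methods optimize. It is an illustrative assumption, not the implementation of any method named above; the function names and data are hypothetical.

```python
import math
from itertools import combinations

def count_inversions(scores, relevances):
    """Count item pairs whose score order contradicts their relevance order."""
    inversions = 0
    for i, j in combinations(range(len(scores)), 2):
        # Negative product: scores and relevances disagree on this pair.
        if (scores[i] - scores[j]) * (relevances[i] - relevances[j]) < 0:
            inversions += 1
    return inversions

def pairwise_logistic_loss(scores, relevances):
    """Mean logistic loss over all pairs that have a strict preference."""
    losses = []
    for i, j in combinations(range(len(scores)), 2):
        if relevances[i] == relevances[j]:
            continue  # no preference between tied items
        hi, lo = (i, j) if relevances[i] > relevances[j] else (j, i)
        losses.append(math.log1p(math.exp(-(scores[hi] - scores[lo]))))
    return sum(losses) / len(losses) if losses else 0.0

# Hypothetical model scores and true relevance labels for three items.
scores = [0.9, 0.1, 0.5]
relevances = [2, 1, 0]
print(count_inversions(scores, relevances), pairwise_logistic_loss(scores, relevances))
```

The inversion count is piecewise constant in the scores, so it cannot be optimized by gradient descent directly; the logistic surrogate is differentiable and decreases as preferred items are scored further above the others.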

Other research questions we are working on
● Row selection
● Diversity
● Similarity
● Context-aware recommendations
● Explore/exploit
● Presentation bias correction
● Mood and session intent inference
● Unavailable Title Search
● ...

More data or better models?

Really?

Anand Rajaraman: Former Stanford Prof. &
Senior VP at Walmart

Sometimes, it’s not
about more data

[Banko and Brill, 2001]

Norvig: “Google does not
have better Algorithms,
only more Data”

Many features/
low-bias models


“Data without a sound approach = noise”

More data +
Smarter models +
More accurate metrics +
Better approaches
Lots of room for improvement!

Xavier Amatriain (@xamat)
xavier@netflix.com

Thanks!

We are hiring!
