Shallow and Deep Latent Models for Recommender System

Shallow & Deep Latent Models for Recommender Systems
Anoop Deoras, Dawen Liang
PRS Workshop, Netflix
06/08/2018
@adeoras, @dawen_liang

Theme of the talk
● Personalization and Recommendations at Netflix
● Discuss evolution of latent models in the Recommender System space
● Showcase some experimental results and interesting findings
● Take away points

Personalization
● Recommendation systems are a means to an end.
● Our primary goal:
○ Maximize each Netflix member's enjoyment of the shows they select
■ Enjoyment integrated over time
○ Minimize the time it takes to find them
■ Interaction cost integrated over time
● How? Personalization

Ordering of the titles in each row is personalized
From what shows to recommend

Selection and placement of the row types is personalized
... To how to construct the page

Personalized images.
[Figure: images shown to Profile 1 and Profile 2]
... To what images to select

Personalization
● When the catalog size is very large, recommendations are the only saving grace.
● A good recommender system should consider:
○ What is recommended
○ How it is recommended
○ When it is recommended
○ Where it is recommended

Personalization
● We try to model
○ User’s taste
○ Context
■ Time
■ Device
■ Country
■ Language
■ …
○ Difference in local tastes
■ What is popular in the US may not be popular in India
■ Not available != Not Popular

Latent Models for Recommendation

Latent Models
● Shallow
○ Latent Factor Models -- Matrix Factorization (MF)
○ Latent Dirichlet Allocation (LDA)
● Deep
○ Variational Autoencoder
○ Feedforward Neural Networks
○ Sequential Neural Networks (RNNs)
○ Convolutional Neural Networks

Latent Factor Model: Explicit Feedback
[Figure: observed ratings (a sparse #users × #items matrix with entries such as 1.0, 2.0, 3.0, 4.0, 5.0) ≈ user latent factors (#users × K) * item latent factors (K × #items)]
Latent Factor Model: Implicit Feedback
[Figure: observed plays (a binary #users × #items matrix) ≈ user latent factors (#users × K) * item latent factors (K × #items)]
Example play matrix:
1 0 0 1 0
0 1 0 0 1
0 0 0 1 0
0 0 1 0 0

Gaussian matrix factorization
[Figure: observed plays (a binary #users × #items matrix) ≈ user latent factors (#users × K) * item latent factors (K × #items), with a confidence term attached to the observed plays]
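One common way to attach confidence to implicit feedback is a weighted squared loss solved by alternating least squares. The sketch below assumes the weighting c = 1 + alpha * plays (a widely used choice, not one stated in the talk); weighted_als, alpha, reg, and the iteration count are illustrative assumptions.

```python
import numpy as np

def weighted_als(plays, k=3, alpha=10.0, reg=0.1, iters=20, seed=0):
    """Confidence-weighted matrix factorization via alternating least squares."""
    rng = np.random.default_rng(seed)
    n_users, n_items = plays.shape
    conf = 1.0 + alpha * plays           # assumed confidence: higher on observed plays
    U = 0.1 * rng.standard_normal((n_users, k))
    V = 0.1 * rng.standard_normal((n_items, k))
    ridge = reg * np.eye(k)
    for _ in range(iters):
        # Solve a small weighted ridge regression per user, holding V fixed.
        for u in range(n_users):
            weighted_V = V * conf[u][:, None]
            U[u] = np.linalg.solve(weighted_V.T @ V + ridge, weighted_V.T @ plays[u])
        # Then per item, holding U fixed.
        for i in range(n_items):
            weighted_U = U * conf[:, i][:, None]
            V[i] = np.linalg.solve(weighted_U.T @ U + ridge, weighted_U.T @ plays[:, i])
    return U, V

# The binary play matrix from the slides.
plays = np.array([[1, 0, 0, 1, 0],
                  [0, 1, 0, 0, 1],
                  [0, 0, 0, 1, 0],
                  [0, 0, 1, 0, 0]], dtype=float)
U, V = weighted_als(plays)
scores = U @ V.T
```

Unplayed cells stay in the loss with low confidence, so a zero is treated as weak evidence of disinterest rather than as a missing value.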

Topic Models (Latent Dirichlet Allocation)
[Figure: observed plays (a binary #users × #items matrix) modeled with user latent factors (#users × K) * item latent factors (K × #items), together with the # plays of user ‘u’]

Deep Latent Factor Model
[Figure: a DNN links user latent factors (#users × K) to the observed plays (a binary #users × #items matrix)]

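Variational Autoencoders
[Figure: r_u → Encoder f_ψ → z_u (Taste) → Decoder f_θ → r_u; encoder and decoder are DNNs]
Liang et al. (2018), Variational Autoencoders for Collaborative Filtering, WWW.
Generative model (following the cited paper): $z_u \sim \mathcal{N}(0, I_K)$, $\pi(z_u) \propto \exp\{f_\theta(z_u)\}$, $r_u \sim \mathrm{Mult}(N_u, \pi(z_u))$, where $N_u$ is the number of plays of user u
Inference model (following the cited paper): $q_\psi(z_u \mid r_u) = \mathcal{N}(\mu_\psi(r_u), \mathrm{diag}\{\sigma_\psi^2(r_u)\})$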

Why Multinomial?
● Commonly used in language models and economics
● Close proxy to the top-N ranking loss
○ The likelihood (cross-entropy) rewards the model for putting probability mass on the non-zero entries
○ The items have to compete for a limited budget (since the item probabilities sum to 1: $\sum_i \pi_i = 1$)
● Effectively ranks non-zero entries higher
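The competition for probability mass can be seen in a few lines. This is a small sketch, assuming a softmax over per-item logits; the two logit vectors are made-up examples, and multinomial_log_likelihood is a hypothetical helper name.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log of the softmax probabilities."""
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())

def multinomial_log_likelihood(logits, plays):
    """Sum of log-probabilities over the items the user actually played."""
    return float((plays * log_softmax(logits)).sum())

plays = np.array([1.0, 0.0, 0.0, 1.0, 0.0])        # one user's row of the play matrix
good_logits = np.array([2.0, 0.0, 0.0, 2.0, 0.0])  # mass on the played items
bad_logits = np.array([0.0, 2.0, 2.0, 0.0, 2.0])   # mass on the unplayed items

ll_good = multinomial_log_likelihood(good_logits, plays)
ll_bad = multinomial_log_likelihood(bad_logits, plays)
total_mass = float(np.exp(log_softmax(bad_logits)).sum())
```

Because the probabilities always sum to 1, raising the score of an unplayed item necessarily lowers the probability of the played ones, which is why the likelihood behaves like a ranking objective.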

Why VAEs (or rather, Bayesian)?
● VAEs generalize linear latent factor models
○ Recover LDA as a special linear case
● No ‘Fold-In’ necessary
○ Only evaluate inference and generative functions (amortized inference)
● Per user, RecSys is more of a “small data” than a “big data” problem
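Amortized inference means a new user's latent taste comes from one encoder pass, with no per-user optimization. The sketch below illustrates that idea with a multinomial likelihood; the class TinyMultVAE, its layer sizes, and its random (untrained) weights are all assumptions for illustration, and training is omitted.

```python
import numpy as np

def log_softmax(logits):
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())

class TinyMultVAE:
    """Untrained toy VAE: encoder gives q(z|r), decoder gives item probabilities."""

    def __init__(self, n_items, k=3, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = 0.1 * rng.standard_normal((n_items, hidden))
        self.W_mu = 0.1 * rng.standard_normal((hidden, k))
        self.W_logvar = 0.1 * rng.standard_normal((hidden, k))
        self.W_dec = 0.1 * rng.standard_normal((k, hidden))
        self.W_out = 0.1 * rng.standard_normal((hidden, n_items))

    def encode(self, r):
        """Inference function: play vector -> mean and log-variance of z."""
        h = np.tanh(r @ self.W_enc)
        return h @ self.W_mu, h @ self.W_logvar

    def decode(self, z):
        """Generative function: latent taste -> log-probabilities over items."""
        h = np.tanh(z @ self.W_dec)
        return log_softmax(h @ self.W_out)

    def neg_elbo(self, r, rng):
        """Negative evidence lower bound for one user (one Monte Carlo sample)."""
        mu, logvar = self.encode(r)
        z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
        nll = -(r * self.decode(z)).sum()
        kl = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)
        return float(nll + kl)

    def recommend(self, r, top_n=2):
        """Rank unplayed items using only forward passes (no fold-in step)."""
        mu, _ = self.encode(r)
        log_pi = np.where(r > 0, -np.inf, self.decode(mu))
        return np.argsort(-log_pi)[:top_n]

model = TinyMultVAE(n_items=5)
r_new = np.array([1.0, 0.0, 0.0, 1.0, 0.0])  # a user not seen before
recs = model.recommend(r_new)
loss = model.neg_elbo(r_new, np.random.default_rng(1))
```

A matrix factorization model would need an extra optimization to fit factors for r_new; here the encoder plays that role directly.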

Neural Multi Class Models
Feed Forward: P(next-video | <user, cntxt>)
[Figure, two panels:
N-GRAM: play (t-n), ..., play (t-1) and cntxt → soft-max over entire vocabulary
BoW-n: bag of play (t-n), ..., play (t-1) plus cntxt → soft-max over vocabulary]
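The BoW-n variant can be sketched as a single forward pass. This is a minimal illustration with random, untrained weights; the embedding sizes, the ReLU hidden layer, and the names init_params and next_play_probs are assumptions, not the architecture used in the talk.

```python
import numpy as np

def init_params(n_items, n_contexts, d=4, hidden=8, seed=0):
    """Random (untrained) parameters for the toy feed-forward model."""
    rng = np.random.default_rng(seed)
    item_emb = rng.standard_normal((n_items, d))
    ctx_emb = rng.standard_normal((n_contexts, d))
    W_h = rng.standard_normal((2 * d, hidden))
    W_out = rng.standard_normal((hidden, n_items))
    return item_emb, ctx_emb, W_h, W_out

def next_play_probs(recent_plays, context_id, params):
    """P(next-video | recent plays, context) from a bag-of-plays input."""
    item_emb, ctx_emb, W_h, W_out = params
    bow = item_emb[recent_plays].mean(axis=0)       # order of plays is discarded
    x = np.concatenate([bow, ctx_emb[context_id]])  # append the context embedding
    h = np.maximum(0.0, x @ W_h)                    # one ReLU hidden layer
    logits = h @ W_out
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                          # soft-max over the vocabulary

params = init_params(n_items=6, n_contexts=3)
p_a = next_play_probs([0, 3, 5], context_id=1, params=params)
p_b = next_play_probs([5, 0, 3], context_id=1, params=params)  # same bag, new order
```

Averaging the play embeddings makes the prediction independent of play order; an N-GRAM style input would instead concatenate the embeddings position by position.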

Neural Multi Class Models
P(next-video | <user, cntxt>)
[Figure, two panels:
RNN Family (Recurrent): play (t-1), cntxt and state (t-1) → state (t) → soft-max over vocabulary
CNN Family (Convolution): play (t-n), play (t-n+1), ..., play (t-4), play (t-3), play (t-2), play (t-1) and cntxt → soft-max over vocabulary]
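For the recurrent panel, a minimal sketch of the state update is shown below, assuming a plain tanh (Elman-style) cell with random, untrained weights. The function names and dimensions are illustrative assumptions; a production model would more likely use a gated cell.

```python
import numpy as np

def init_rnn_params(n_items, n_contexts, d=4, hidden=8, seed=0):
    """Random (untrained) parameters for the toy recurrent model."""
    rng = np.random.default_rng(seed)
    item_emb = rng.standard_normal((n_items, d))
    ctx_emb = rng.standard_normal((n_contexts, d))
    W_xh = rng.standard_normal((2 * d, hidden))
    W_hh = rng.standard_normal((hidden, hidden))
    W_out = rng.standard_normal((hidden, n_items))
    return item_emb, ctx_emb, W_xh, W_hh, W_out

def rnn_next_play_probs(play_sequence, context_id, params):
    """P(next-video | play history, context) using a simple recurrent state."""
    item_emb, ctx_emb, W_xh, W_hh, W_out = params
    state = np.zeros(W_hh.shape[0])
    for item in play_sequence:
        x = np.concatenate([item_emb[item], ctx_emb[context_id]])
        state = np.tanh(x @ W_xh + state @ W_hh)  # state(t) from play(t-1), state(t-1)
    logits = state @ W_out
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                        # soft-max over the vocabulary

rnn_params = init_rnn_params(n_items=6, n_contexts=3)
p_fwd = rnn_next_play_probs([0, 3, 5], context_id=1, params=rnn_params)
p_rev = rnn_next_play_probs([5, 3, 0], context_id=1, params=rnn_params)
```

Unlike a bag-of-plays input, the recurrent state depends on the order of the plays, so reversing the history changes the predicted distribution.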

Why Conditional Models?
● Directly maximize the likelihood of the user’s next play
● No ‘Fold-In’ necessary
○ Only need to evaluate forward graph
● Enables encoding of temporal and sequential information seamlessly
● Rich literature around model adaptation and bootstrapping

Results (internal Netflix dataset)

Interpreting a CNN CF Model
● In images, deeper CNN layers have discovered higher-level features:
○ Edges
○ Faces, etc.
● What would a CNN learn if it is trained on a user-item interaction dataset?
○ Can it discover semantic topics?

Interpreting a CNN CF Model
Horror Filter
Kids Filter
Narcotics Filter
Thanks to Ko-Jen Hsiao for the CNN viz

Take Away Points
● Shallow models
○ Presented a unified view of various latent factor models
○ Discussed limited modeling capacity ⇒ inferior prediction power
● Deep models
○ Encoding of rich nonlinear user-item interactions ⇒ superior prediction power
○ Discussed how VAEs can be thought of as nonlinear LDA
○ Showcased how ‘Next Play models’ directly model the task at hand

Thank you
Anoop Deoras: adeoras@netflix.com
Dawen Liang: dliang@netflix.com
