Matrix Factorizations for Recommender Systems
Dmitriy Selivanov
selivanov.dmitriy@gmail.com
2017-11-16
Recommender systems are everywhere
[Figures 1-4: examples of recommender systems in the wild]
Goals
Propose “relevant” items to customers
Retention
Exploration
Up-sell
Personalized offers
recommended items for a customer given their history of activities (transactions, browsing history, favourites)
Similar items
substitutions
bundles - frequently bought together
. . .
Live demo
Dataset - LastFM-360K:
360k users
160k artists
17M observations
sparsity ~ 0.9997 (17M observed out of ~57.6B possible user-artist pairs)
Explicit feedback
Ratings, likes/dislikes, purchases:
cleaner data
smaller
hard to collect
$$\mathrm{RMSE}^2 = \frac{1}{|D|} \sum_{u,i \in D} (r_{ui} - \hat{r}_{ui})^2$$
Netflix prize
~ 480k users, 18k movies, 100m ratings
sparsity ~ 99%
goal is to reduce RMSE by 10% - from 0.9514 to 0.8563
Implicit feedback
noisy feedback (clicks, likes, purchases, searches, . . . )
much easier to collect
wider user/item coverage
usually sparsity > 99.9%
One-Class Collaborative Filtering
observed entries are positive preferences
should have high confidence
missing entries in the matrix are a mix of negative and positive preferences
consider them negative with low confidence
we cannot really distinguish whether a user did not click a banner because of a lack of interest or a lack of awareness
Evaluation
Recap: we only care about producing a small set of highly relevant items.
RMSE is a bad metric - it has a very weak connection to business goals.
We only care about the relevance (precision) of the retrieved items:
space on the screen is limited
only order matters - the most relevant items should be at the top
Ranking - Mean average precision
$$\mathrm{AveragePrecision} = \frac{\sum_{k=1}^{n} P(k) \times rel(k)}{\text{number of relevant documents}}$$
## index relevant precision_at_k
## 1: 1 0 0.0000000
## 2: 2 0 0.0000000
## 3: 3 1 0.3333333
## 4: 4 0 0.2500000
## 5: 5 0 0.2000000
map@5 = 0.1566667
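A minimal R sketch (with a hypothetical 0/1 relevance vector) that reproduces the table above; note that the map@5 value here averages precision over all 5 positions, as on the slide:

```r
# Relevance of the top-5 recommended items for one user (1 = relevant)
relevant <- c(0, 0, 1, 0, 0)
# Precision at each cutoff k: share of relevant items among the top k
precision_at_k <- cumsum(relevant) / seq_along(relevant)
precision_at_k          # 0.00 0.00 0.3333 0.25 0.20 -- as in the table
mean(precision_at_k)    # 0.1566667 -- the map@5 value above
```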
Ranking - Normalized Discounted Cumulative Gain
Intuition is the same as for MAP@K, but it also takes the magnitude of relevance into account:
$$DCG_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i + 1)}$$
$$nDCG_p = \frac{DCG_p}{IDCG_p}, \qquad IDCG_p = \sum_{i=1}^{|REL|} \frac{2^{rel_i} - 1}{\log_2(i + 1)}$$
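The same formulas as a small R sketch; `rel` is a hypothetical vector of graded relevance scores in predicted rank order:

```r
# Discounted cumulative gain of a ranked list of relevance grades
dcg <- function(rel) sum((2^rel - 1) / log2(seq_along(rel) + 1))

# nDCG@p: DCG of the predicted order normalized by the ideal (sorted) order
ndcg <- function(rel, p = length(rel)) {
  rel <- rel[seq_len(p)]
  dcg(rel) / dcg(sort(rel, decreasing = TRUE))
}

ndcg(c(0, 0, 3, 1, 0))  # < 1: the highly relevant item is ranked too low
```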
Approaches
Content based
good for cold start
not personalized
Collaborative filtering
vanilla collaborative filtering
matrix factorizations
. . .
Hybrid and context aware recommender systems
best of two worlds
Focus today
WRMF (Weighted Regularized Matrix Factorization) - Collaborative Filtering for
Implicit Feedback Datasets (2008)
efficient learning with accelerated approximate Alternating Least Squares
inference time
Linear-Flow - Practical Linear Models for Large-Scale One-Class Collaborative Filtering (2016)
efficient truncated SVD
cheap cross-validation with a full regularization path
Matrix Factorizations
Users can be described by a small number of latent factors $p_{uk}$
Items can be described by a small number of latent factors $q_{ki}$
Sparse data
[Figure: sparse users × items interaction matrix]
Low rank matrix factorization
$$R = P \times Q$$
[Figure: P is a users × factors matrix, Q is a factors × items matrix]
Reconstruction
[Figure: dense users × items matrix reconstructed from the factors]
Truncated SVD
Take the k largest singular values:
$$X \approx U_k D_k V_k^T$$
- $X_k \in \mathbb{R}^{m \times n}$
- $U_k$, $V_k$ - columns are orthonormal bases (dot product of any 2 columns is zero, unit norm)
- $D_k$ - diagonal matrix with the singular values on the diagonal

Truncated SVD is the best rank-k approximation of the matrix $X$ in terms of the Frobenius norm $\|X - U_k D_k V_k^T\|_F$. Splitting $D_k$ symmetrically gives the factors:
$$P = U_k D_k^{1/2}, \qquad Q = D_k^{1/2} V_k^T$$
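A sketch of computing these factors in R; it assumes a sparse user-item matrix X (Matrix package) and uses the irlba package, which computes a truncated SVD of large sparse matrices:

```r
library(Matrix)
library(irlba)

k <- 64                          # number of latent factors (illustrative)
s <- irlba(X, nv = k)            # X ~ s$u %*% diag(s$d) %*% t(s$v)
P <- s$u %*% diag(sqrt(s$d))     # user embeddings, m x k
Q <- diag(sqrt(s$d)) %*% t(s$v)  # item embeddings, k x n
# predicted scores for user u: P[u, ] %*% Q
```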
Issue with truncated SVD for “explicit” feedback
Optimal in terms of Frobenius norm - it takes into account the zeros of unobserved ratings:
$$\mathrm{RMSE}^2 = \frac{1}{|users| \times |items|} \sum_{u \in users,\ i \in items} (r_{ui} - \hat{r}_{ui})^2$$
Overfits the data.
A better objective = error only on "observed" ratings:
$$\mathrm{RMSE}^2 = \frac{1}{|Observed|} \sum_{u,i \in Observed} (r_{ui} - \hat{r}_{ui})^2$$
SVD-like matrix factorization with ALS
$$J = \sum_{u,i \in Observed} (r_{ui} - p_u q_i)^2 + \lambda(\|P\|_F^2 + \|Q\|_F^2)$$
Given Q fixed, solve for each user vector $p_u$:
$$\min_{p_u} \sum_{i \in Observed(u)} (r_{ui} - p_u q_i)^2 + \lambda \sum_{j=1}^{k} p_{uj}^2$$
Given P fixed, solve for each item vector $q_i$:
$$\min_{q_i} \sum_{u \in Observed(i)} (r_{ui} - p_u q_i)^2 + \lambda \sum_{j=1}^{k} q_{ij}^2$$
Each subproblem is a ridge regression: $P = (Q^T Q + \lambda I)^{-1} Q^T r$, $Q = (P^T P + \lambda I)^{-1} P^T r$
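A toy sketch of the user half-step in R, assuming a small dense ratings matrix with NA marking unobserved entries (a production implementation works on sparse structures):

```r
# Re-solve all user factors with the item factors Q (items x k) held fixed.
als_user_step <- function(R_mat, Q, lambda) {
  k <- ncol(Q)
  P <- matrix(0, nrow(R_mat), k)
  for (u in seq_len(nrow(R_mat))) {
    obs <- which(!is.na(R_mat[u, ]))   # items rated by user u
    Qo  <- Q[obs, , drop = FALSE]
    # ridge regression: p_u = (Qo'Qo + lambda*I)^-1 Qo' r_u
    P[u, ] <- solve(crossprod(Qo) + lambda * diag(k),
                    crossprod(Qo, R_mat[u, obs]))
  }
  P
}
# the item half-step is symmetric: swap the roles of P and Q
```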
“Collaborative Filtering for Implicit Feedback Datasets”
WRMF - Weighted Regularized Matrix Factorization
“Default” approach
Proposed in 2008, but still widely used in industry (even at YouTube)
several high-quality open-source implementations
$$J = \sum_{u,i} C_{ui} (P_{ui} - X_u Y_i)^2 + \lambda(\|X\|_F^2 + \|Y\|_F^2)$$
Preferences - binary:
$$P_{ui} = \begin{cases} 1 & \text{if } R_{ui} > 0 \\ 0 & \text{otherwise} \end{cases}$$
Confidence - $C_{ui} = 1 + f(R_{ui})$, e.g. the linear $C_{ui} = 1 + \alpha R_{ui}$ from the original paper
Alternating Least Squares for implicit feedback
For fixed $Y$:
$$\frac{dL}{dx_u} = -2 \sum_{i} c_{ui}(p_{ui} - x_u^T y_i)\, y_i + 2\lambda x_u = -2 \sum_{i} c_{ui}(p_{ui} - y_i^T x_u)\, y_i + 2\lambda x_u = -2 Y^T C^u p(u) + 2 Y^T C^u Y x_u + 2\lambda x_u$$
Setting $dL/dx_u = 0$ for the optimal solution gives us $(Y^T C^u Y + \lambda I)\, x_u = Y^T C^u p(u)$.
$x_u$ can be obtained by solving a system of linear equations:
$$x_u = \mathrm{solve}(Y^T C^u Y + \lambda I,\ Y^T C^u p(u))$$
Alternating Least Squares for implicit feedback
Similarly, for fixed $X$:
$$\frac{dL}{dy_i} = -2 X^T C^i p(i) + 2 X^T C^i X y_i + 2\lambda y_i$$
$$y_i = \mathrm{solve}(X^T C^i X + \lambda I,\ X^T C^i p(i))$$
Another optimization:
$$X^T C^i X = X^T X + X^T (C^i - I) X$$
$$Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y$$
$X^T X$ and $Y^T Y$ can be precomputed; $(C^i - I)$ is non-zero only for observed entries.
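Putting this together, a sketch of the user update in R; update_user() is a hypothetical helper, and confidence is assumed linear, $C_{ui} = 1 + \alpha R_{ui}$, as in the original paper:

```r
# Solve (Y'C_u Y + lambda*I) x_u = Y'C_u p(u) using the decomposition above.
# Y: item factors (n_items x k); r_u: user u's interaction counts;
# YtY = crossprod(Y) is computed once per sweep and shared by all users.
update_user <- function(Y, r_u, YtY, lambda, alpha) {
  k   <- ncol(Y)
  obs <- which(r_u > 0)
  Yo  <- Y[obs, , drop = FALSE]
  c_o <- 1 + alpha * r_u[obs]          # confidence of observed interactions
  # Y'C_u Y = Y'Y + Yo'(C_o - I)Yo: only observed rows contribute
  A <- YtY + crossprod(Yo, (c_o - 1) * Yo) + lambda * diag(k)
  b <- crossprod(Yo, c_o)              # Y'C_u p(u), since p(u) = 1 on observed
  solve(A, b)
}
```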
Accelerated Approximate Alternating Least Squares
$$y_i = \mathrm{solve}(X^T C^i X + \lambda I,\ X^T C^i p(i))$$
Instead of an exact solve, use iterative methods:
Conjugate Gradient
Coordinate Descent
Run a fixed number of steps (usually 3-4 is enough), as sketched below:
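A sketch of the conjugate gradient variant in R: a handful of CG iterations on the same k x k system instead of an exact solve (plain CG, assuming A is symmetric positive definite):

```r
# Approximate x in A x = b with a fixed number of conjugate gradient steps.
cg_steps <- function(A, b, x0 = rep(0, length(b)), n_steps = 3) {
  x <- x0
  r <- b - A %*% x                     # residual
  p <- r                               # search direction
  for (i in seq_len(n_steps)) {
    Ap    <- A %*% p
    alpha <- sum(r * r) / sum(p * Ap)  # optimal step size along p
    x     <- x + alpha * p
    r_new <- r - alpha * Ap
    beta  <- sum(r_new * r_new) / sum(r * r)
    p     <- r_new + beta * p          # next A-conjugate direction
    r     <- r_new
  }
  x
}
```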
Inference time
How to make recommendations for new users?
There are no user embeddings, since these users are not in the original matrix!
Inference time
Make one step of ALS with the item embeddings matrix fixed => get new user embeddings:
given fixed $Y$ and $C^{u_{new}}$ - the confidence of the new user-item interactions:
$$x_{u_{new}} = \mathrm{solve}(Y^T C^{u_{new}} Y + \lambda I,\ Y^T C^{u_{new}} p(u_{new}))$$
$$scores = X_{new} Y^T$$
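A sketch of this fold-in step in R, reusing the hypothetical update_user() helper from the earlier slide; r_new, lambda, and alpha are the new user's interaction vector and the model's hyperparameters:

```r
YtY    <- crossprod(Y)                            # reused across new users
x_new  <- update_user(Y, r_new, YtY, lambda, alpha)
scores <- as.numeric(Y %*% x_new)                 # one score per item
top_n  <- order(scores, decreasing = TRUE)[1:10]  # recommend the top 10
```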
WRMF Implementations
python implicit - implements Conjugate Gradient, with GPU support recently!
R reco - implements Conjugate Gradient
Spark ALS
Quora qmf
Google tensorflow
Linear-Flow
The idea is to learn an item-item similarity matrix $W$ from the data.
First:
$$\min_W J = \|X - XW\|_F^2 + \lambda \|W\|_F^2$$
with the constraint:
$$rank(W) \le k$$
Linear-Flow observations
1. Without L2 regularization the optimal solution is $W_k = Q_k Q_k^T$, where $SVD_k(X) = P_k \Sigma_k Q_k^T$
2. Without the $rank(W) \le k$ constraint the optimal solution is just the ridge regression solution: $W = (X^T X + \lambda I)^{-1} X^T X$ - infeasible at this scale (an items × items matrix).
Linear-Flow reparametrization
$$SVD_k(X) = P_k \Sigma_k Q_k^T$$
Let $W = Q_k Y$:
$$\operatorname{argmin}_Y\ \|X - X Q_k Y\|_F^2 + \lambda \|Q_k Y\|_F^2$$
Motivation: with $\lambda = 0$ this recovers $W = Q_k Q_k^T$, which is also the solution of the current problem with $Y = Q_k^T$.
Linear-Flow closed-form solution
Notice that if $Q_k$ is orthogonal then $\|Q_k Y\|_F = \|Y\|_F$.
Solve $\|X - X Q_k Y\|_F^2 + \lambda \|Y\|_F^2$ - a simple ridge regression with a closed-form solution:
$$Y = (Q_k^T X^T X Q_k + \lambda I)^{-1} Q_k^T X^T X$$
Very cheap inversion of a k × k matrix!
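A sketch of the closed-form solve in R; Qk comes from a truncated SVD of X (irlba again), and Z is formed as (X Qk)' X so the items × items matrix X'X is never materialized. k and lambda are illustrative:

```r
library(Matrix)
library(irlba)

k      <- 128
lambda <- 1
s  <- irlba(X, nv = k)                 # truncated SVD of the user-item matrix
Qk <- s$v                              # n_items x k, orthonormal columns
Z  <- crossprod(X %*% Qk, X)           # Qk' X' X, a k x n_items matrix
Y  <- solve(Z %*% Qk + lambda * diag(k), Z)
W  <- Qk %*% Y                         # item-item similarity of rank <= k
# scores for a user with history row x_u: x_u %*% W
```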
Linear-Flow hassle-free cross-validation
$$Y = (Q_k^T X^T X Q_k + \lambda I)^{-1} Q_k^T X^T X$$
How to find lambda with cross-validation?
pre-compute $Z = Q_k^T X^T X$, so $Y = (Z Q_k + \lambda I)^{-1} Z$
pre-compute $Z Q_k$
notice that the value of lambda affects only the diagonal of $Z Q_k$
generate a sequence of lambdas (say of length 50) based on the min/max diagonal values
solving 50 small k × k ridge regressions is super fast
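A sketch of the full path in R, continuing the previous snippet: ZQk is computed once, each lambda only shifts its diagonal, so every candidate model costs one small k × k solve:

```r
ZQ <- Z %*% Qk                                   # k x k, precomputed once
d  <- diag(ZQ)
lambdas <- seq(min(d), max(d), length.out = 50)  # grid from diagonal range
Ys <- lapply(lambdas, function(l) solve(ZQ + l * diag(k), Z))
# score each candidate Y (e.g. map@k) on a validation split, keep the best
```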
Suggestions
start simple - SVD, WRMF
design proper cross-validation - both the objective and the data split
think about how to incorporate business logic (for example, how to exclude certain items)
use single-machine implementations
think about inference time
don't waste time on libraries/articles/blog posts which demonstrate MF with dense matrices
Questions?
http://dsnotes.com/tags/recommender-systems/
https://github.com/dselivanov/reco
Contacts:
selivanov.dmitriy@gmail.com
https://github.com/dselivanov
https://www.linkedin.com/in/dselivanov1

Similar to Matrix Factorizations for Recommender Systems (20)

Matrix Factorization
Building Data Pipelines for Music Recommendations at Spotify
PhD Consortium ADBIS presetation.
CF Models for Music Recommendations At Spotify
Music Recommendations at Scale with Spark
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
Recommendation System --Theory and Practice
lecture26-mf.pptx
lecture244-mf.pptx
Modeling Social Data, Lecture 8: Recommendation Systems
Recommendation system
Mining at scale with latent factor models for matrix completion
Collaborative Filtering at Spotify
Scala Data Pipelines for Music Recommendations
Introduction to behavior based recommendation system
Recommender system
Latent Factor Model For Collaborative Filtering

