Matrix Factorization and Collaborative Filtering
Matt Gormley
Lecture 26
November 30, 2016
School of Computer Science
Readings:
Koren et al. (2009)
Gemulla et al. (2011)
10-601B Introduction to Machine Learning
Reminders
• Homework 7
– due Mon., Dec. 5
• In-class Review Session
– Mon., Dec. 5
• Final Exam
– in-class Wed., Dec. 7
Outline
• Recommender Systems
– Content Filtering
– Collaborative Filtering
– CF: Neighborhood Methods
– CF: Latent Factor Methods
• Matrix Factorization
– User / item vectors
– Prediction model
– Training by SGD
• Extra: Matrix Multiplication in ML
– Matrix Factorization
– Linear Regression
– PCA
– (Autoencoders)
– K-means
RECOMMENDER SYSTEMS
Recommender Systems
A Common Challenge:
– Assume you’re a company selling items of some sort: movies, songs, products, etc.
– Company collects millions of ratings from users of their items
– To maximize profit / user happiness, you want to recommend items that users are likely to want
Problem Setup
• 500,000 users
• 20,000 movies
• 100 million ratings
• Goal: To obtain lower root mean squared error (RMSE) than Netflix’s existing system on 3 million held out ratings
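The goal above is stated in terms of RMSE. As a minimal sketch of how that metric compares two systems on held-out ratings, the snippet below uses made-up ratings and predictions; the variable names and numbers are illustrative assumptions, not data from the lecture.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical held-out ratings and two systems' predictions (made-up numbers).
held_out = [4, 3, 5, 1]
baseline = [3, 3, 4, 3]
candidate = [4, 3, 4, 2]

print(rmse(baseline, held_out))   # existing system
print(rmse(candidate, held_out))  # lower is better
```

A candidate system meets the goal when its RMSE on the held-out set is lower than the baseline's.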
Recommender Systems
• Setup:
– Items: movies, songs, products, etc. (often many thousands)
– Users: watchers, listeners, purchasers, etc. (often many millions)
– Feedback: 5-star ratings, not-clicking ‘next’, purchases, etc.
• Key Assumptions:
– Can represent ratings numerically as a user/item matrix
– Users only rate a small number of items (the matrix is sparse)
Example user/item matrix (movies: Doctor Strange, Star Trek: Beyond, Zootopia):
Alice 1 5
Bob 3 4
Charlie 3 5 2
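One way to hold such a sparse user/item matrix is to store only the observed ratings. The sketch below is an illustration, not the lecture's code: the rating values come from the example table, but the assignment of Alice's and Bob's two ratings to specific movies is an assumption, since the column positions are not recoverable from the text. The `density` helper is a hypothetical name.

```python
# Observed ratings only, keyed by (user, movie). Values are from the example
# table; WHICH movie each of Alice's and Bob's two ratings belongs to is an
# assumption made for illustration.
ratings = {
    ("Alice", "Doctor Strange"): 1,
    ("Alice", "Zootopia"): 5,
    ("Bob", "Doctor Strange"): 3,
    ("Bob", "Star Trek: Beyond"): 4,
    ("Charlie", "Doctor Strange"): 3,
    ("Charlie", "Star Trek: Beyond"): 5,
    ("Charlie", "Zootopia"): 2,
}
users = sorted({u for u, _ in ratings})
movies = sorted({m for _, m in ratings})

def density(n_ratings, n_users, n_movies):
    """Fraction of the user/item matrix that is actually observed."""
    return n_ratings / (n_users * n_movies)

print(density(len(ratings), len(users), len(movies)))
# Same calculation for the problem setup quoted earlier in the deck:
print(density(100_000_000, 500_000, 20_000))
```

Missing pairs are simply absent from the dictionary, so an unrated movie is never confused with a rating of zero.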
Two Types of Recommender Systems
Content Filtering
• Example: Pandora.com music recommendations (Music Genome Project)
• Con: Assumes access to side information about items (e.g. properties of a song)
• Pro: Got a new item to add? No problem, just be sure to include the side information
Collaborative Filtering
• Example: Netflix movie recommendations
• Pro: Does not assume access to side information about items (e.g. does not need to know about movie genres)
• Con: Does not work on new items that have no ratings
Collaborative Filtering
• Everyday Examples of Collaborative Filtering...
– Bestseller lists
– Top 40 music lists
– The “recent returns” shelf at the library
– Unmarked but well-used paths through the woods
– The printer room at work
– “Read any good books lately?”
– …
• Common insight: personal tastes are correlated
– If Alice and Bob both like X and Alice likes Y then Bob is more likely to like Y
– especially (perhaps) if Bob knows Alice
Slide from William Cohen
Two Types of Collaborative Filtering
1. Neighborhood Methods 2. Latent Factor Methods
Figures from Koren et al. (2009)
Two Types of Collaborative Filtering
1. Neighborhood Methods
In the figure, assume that a green line indicates the movie was watched
Algorithm:
1. Find neighbors based on similarity of movie preferences
2. Recommend movies that those neighbors watched
Figures from Koren et al. (2009)
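The two-step algorithm above can be sketched in a few lines. This is an illustration under assumptions: similarity is measured with Jaccard overlap of watch histories (the slide does not name a similarity measure), and the users, movie labels, and function names are invented.

```python
def jaccard(a, b):
    """Overlap between two sets of watched movies (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target, watched, k=2):
    """Recommend unseen movies watched by the k users most similar to `target`."""
    others = [u for u in watched if u != target]
    # Step 1: find neighbors by similarity of watch histories.
    neighbors = sorted(
        others, key=lambda u: jaccard(watched[target], watched[u]), reverse=True
    )[:k]
    # Step 2: count how many neighbors watched each movie the target has not seen.
    votes = {}
    for u in neighbors:
        for movie in watched[u] - watched[target]:
            votes[movie] = votes.get(movie, 0) + 1
    # Most-voted first; ties broken alphabetically.
    return sorted(votes, key=lambda m: (-votes[m], m))

# Hypothetical watch histories.
watched = {
    "Alice": {"A", "B", "C"},
    "Bob": {"A", "B", "D"},
    "Carol": {"A", "C", "D", "E"},
    "Dave": {"F"},
}
print(recommend("Alice", watched))
```

Here Bob and Carol are Alice's nearest neighbors, so the movies they watched that Alice has not seen are the candidates.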
Two Types of Collaborative Filtering
2. Latent Factor Methods
• Assume that both movies and users live in some low-dimensional space describing their properties
• Recommend a movie based on its proximity to the user in the latent space
Figures from Koren et al. (2009)
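As a rough illustration of recommending by proximity in a latent space, the sketch below scores movies by the inner product between a user vector and each movie vector. The inner product as the proximity measure is an assumption here (it matches the prediction rule in Koren et al. (2009)), and the 2-D vectors and titles are invented.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical 2-D latent vectors; the meaning of each axis is not specified.
user = (0.9, -0.2)
movies = {
    "Movie A": (1.0, 0.1),
    "Movie B": (-0.8, 0.5),
    "Movie C": (0.3, -0.9),
}

scores = {title: dot(user, vec) for title, vec in movies.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # movies ordered from best to worst match for this user
```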
MATRIX FACTORIZATION
Matrix Factorization (with matrices)
• User vectors:
• Item vectors:
• Rating prediction:
Figures from Koren et al. (2009) and Gemulla et al. (2011)
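The formulas for these three bullets were images and are missing from the extracted text. A plausible reconstruction, assuming the W, H, V, r notation of the matrix diagrams later in the deck (V is n users × m items, W is n × r, H is r × m) and the inner-product model of Koren et al. (2009); the hat on the predicted rating is added notation:

```latex
\begin{align*}
\text{User vectors:} \quad & (W_{u*})^{\top} \in \mathbb{R}^{r} \\
\text{Item vectors:} \quad & H_{*i} \in \mathbb{R}^{r} \\
\text{Rating prediction:} \quad & \hat{v}_{ui} = W_{u*} H_{*i} = [WH]_{ui}
\end{align*}
```

Here W_{u*} is row u of W and H_{*i} is column i of H, so each predicted rating is one entry of the product WH.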
Matrix Factorization (with vectors)
• User vectors:
• Item vectors:
• Rating prediction:
Figures from Koren et al. (2009)
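The vector form of these bullets did not survive extraction either. A hedged reconstruction, assuming one r-dimensional vector per user (written w_u) and per item (written h_i), with r the latent dimension from the deck's matrix diagrams:

```latex
\begin{align*}
\text{User vectors:} \quad & \mathbf{w}_u \in \mathbb{R}^{r} \\
\text{Item vectors:} \quad & \mathbf{h}_i \in \mathbb{R}^{r} \\
\text{Rating prediction:} \quad & \hat{v}_{ui} = \mathbf{w}_u^{\top} \mathbf{h}_i
\end{align*}
```

This is the same model as the matrix form: w_u is row u of W written as a column vector, and h_i is column i of H.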
Matrix Factorization (with vectors)
• Set of non-zero entries:
• Objective:
Figures from Koren et al. (2009)
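The set definition and objective were images on the slide. A sketch of what they most likely state, assuming squared error over the observed (non-zero) entries as in Koren et al. (2009); the symbol for the index set is an assumption, and w_u, h_i denote the user and item vectors:

```latex
\begin{align*}
\mathcal{Z} &= \{ (u, i) : v_{ui} \neq 0 \} \\
\hat{W}, \hat{H} &= \operatorname*{argmin}_{W, H} \sum_{(u,i) \in \mathcal{Z}} \left( v_{ui} - \mathbf{w}_u^{\top} \mathbf{h}_i \right)^{2}
\end{align*}
```

Only observed ratings contribute to the sum, which is what keeps training tractable on a sparse matrix.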
Matrix Factorization (with vectors)
• Regularized Objective:
• SGD update for random (u,i):
Figures from Koren et al. (2009)
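The regularized objective and update rule were images on the slide. The block below is a reconstruction under assumptions, following the form given in Koren et al. (2009): w_u and h_i are the user and item vectors, the sum runs over the set of observed entries (written here as a calligraphic Z), lambda is the regularization strength, and gamma is the step size. The factor of 2 from differentiating the square is absorbed into gamma.

```latex
\begin{align*}
\operatorname*{argmin}_{W, H} \; & \sum_{(u,i) \in \mathcal{Z}} \left[ \left( v_{ui} - \mathbf{w}_u^{\top} \mathbf{h}_i \right)^{2} + \lambda \left( \lVert \mathbf{w}_u \rVert^{2} + \lVert \mathbf{h}_i \rVert^{2} \right) \right] \\
e_{ui} &\leftarrow v_{ui} - \mathbf{w}_u^{\top} \mathbf{h}_i \\
\mathbf{w}_u &\leftarrow \mathbf{w}_u + \gamma \left( e_{ui} \mathbf{h}_i - \lambda \mathbf{w}_u \right) \\
\mathbf{h}_i &\leftarrow \mathbf{h}_i + \gamma \left( e_{ui} \mathbf{w}_u - \lambda \mathbf{h}_i \right)
\end{align*}
```

Both updates use the values of w_u and h_i from before the step. Each step follows the negative gradient of the single bracketed term for the sampled pair (u, i).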
Matrix Factorization (with matrices)
• SGD
Figure: matrix factorization as SGD (why does this work?), with the step size labeled
Figures from Koren et al. (2009) and Gemulla et al. (2011)
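A minimal sketch of training a factorization by SGD, assuming the per-rating update from Koren et al. (2009): compute the error e = v − w·h for one observed rating, then move the user vector w and item vector h along e·h − λw and e·w − λh. The function names, hyperparameter defaults, and toy ratings are assumptions for illustration, not the lecture's implementation.

```python
import numpy as np

def sgd_mf(ratings, n_users, n_items, r=2, step=0.02, lam=0.01, epochs=500, seed=0):
    """Factor observed ratings into W (n_users x r) and H (r x n_items) by SGD."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((n_users, r))
    H = 0.1 * rng.standard_normal((r, n_items))
    for _ in range(epochs):
        # Visit the observed ratings in a fresh random order each epoch.
        for idx in rng.permutation(len(ratings)):
            u, i, v = ratings[idx]
            w_u = W[u].copy()        # copies so both updates use the old values
            h_i = H[:, i].copy()
            e = v - w_u @ h_i        # prediction error for this rating
            W[u] += step * (e * h_i - lam * w_u)
            H[:, i] += step * (e * w_u - lam * h_i)
    return W, H

def squared_error(ratings, W, H):
    """Sum of squared errors over the observed ratings only."""
    return sum((v - W[u] @ H[:, i]) ** 2 for u, i, v in ratings)

# Hypothetical (user index, item index, rating) triples.
ratings = [(0, 0, 1.0), (0, 2, 5.0), (1, 0, 3.0), (1, 1, 4.0),
           (2, 0, 3.0), (2, 1, 5.0), (2, 2, 2.0)]
W, H = sgd_mf(ratings, n_users=3, n_items=3)
print(squared_error(ratings, W, H))
print(W @ H)  # dense predictions, including the unobserved cells
```

The product W @ H fills in every cell of the matrix, which is how unobserved ratings get predicted.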
Matrix Factorization
Example Factors
Figure from Koren et al. (2009)
Matrix Factorization
Comparison of Optimization Algorithms
ALS = alternating least squares
Figure from Gemulla et al. (2011)
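For readers unfamiliar with ALS, here is a rough sketch of the idea: with H fixed, each user's vector is the solution of a small ridge regression over the items that user rated, and vice versa for items. This is an illustrative version with a plain ridge penalty `lam` on every vector; it is not the specific variant benchmarked in Gemulla et al. (2011), and the function name and toy matrix are assumptions.

```python
import numpy as np

def als_mf(V, mask, r=2, lam=0.1, sweeps=20, seed=0):
    """Alternating least squares on the observed entries of V (where mask is True)."""
    n, m = V.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n, r))
    H = rng.standard_normal((r, m))
    reg = lam * np.eye(r)
    for _ in range(sweeps):
        # Fix H: each user's vector solves a ridge regression on their rated items.
        for u in range(n):
            Hu = H[:, mask[u]]                      # r x (#items rated by u)
            W[u] = np.linalg.solve(Hu @ Hu.T + reg, Hu @ V[u, mask[u]])
        # Fix W: each item's vector solves one over the users who rated it.
        for i in range(m):
            Wi = W[mask[:, i]]                      # (#users who rated i) x r
            H[:, i] = np.linalg.solve(Wi.T @ Wi + reg, Wi.T @ V[mask[:, i], i])
    return W, H

# Hypothetical ratings matrix; 0 marks an unobserved entry.
V = np.array([[1.0, 0.0, 5.0],
              [3.0, 4.0, 0.0],
              [3.0, 5.0, 2.0]])
mask = V > 0
W, H = als_mf(V, mask)
err = ((V - W @ H)[mask] ** 2).sum()
print(err)  # squared error on the observed entries
```

Unlike SGD, ALS needs no step size, since each sub-problem is solved exactly.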
MATRIX MULTIPLICATION IN MACHINE LEARNING
Slides in this section from William Cohen
Recovering latent factors in a matrix
• V: n users × m movies, with entries v11 … vij … vnm; V[i,j] = user i’s rating of movie j
• W: n users × r, with rows (x1, y1), (x2, y2), …, (xn, yn)
• H: r × m movies, with rows (a1, a2, …, am) and (b1, b2, …, bm)
• V ≈ W H
Slide from William Cohen
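To make the shapes concrete, the sketch below builds a toy V as an exact product W H and checks that it has rank at most r. The sizes and random values are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 6, 5, 2                    # users, movies, latent factors (toy sizes)
W = rng.standard_normal((n, r))      # one r-dimensional row per user
H = rng.standard_normal((r, m))      # one r-dimensional column per movie
V = W @ H                            # V[i, j] = W[i] . H[:, j]

print(V.shape)                       # (6, 5)
print(np.linalg.matrix_rank(V))      # at most r
print(n * m, "entries vs", n * r + r * m, "parameters")
```

The factorization describes all n × m entries with only n·r + r·m numbers, which is why a small r forces the model to share structure across users and movies.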
… is like Linear Regression …
• W: n instances (e.g., 150) × r features (e.g., 4), with rows (pl1, pw1, sl1, sw1), (pl2, pw2, sl2, sw2), …
• H: r × 1 weights (w1, w2, w3, w4), i.e. m = 1 regressors
• Y: n × 1 predictions (y1, …, yi, …, yn); Y[i,1] = instance i’s prediction
• Y ≈ W H
Slide from William Cohen
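In this view the data matrix W is known and only the weight column H is fit. A minimal sketch with NumPy's least-squares solver, using synthetic data with the slide's sizes (150 instances, 4 features); the "true" weights and noise level are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 150, 4
W = rng.standard_normal((n, r))                    # known data: n instances x r features
true_H = np.array([[1.0], [-2.0], [0.5], [3.0]])   # hypothetical weights, r x 1
Y = W @ true_H + 0.01 * rng.standard_normal((n, 1))

# Least-squares solution of W H ~ Y for the single weight column H.
H, *_ = np.linalg.lstsq(W, Y, rcond=None)
print(H.ravel())  # close to the hypothetical true weights
```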
… for many outputs at once …
• W: n instances (e.g., 150) × r features (e.g., 4), with rows (pl1, pw1, sl1, sw1), (pl2, pw2, sl2, sw2), …
• H: r × m weights for m regressors, with entries w11, w12, …, w21, …, w31, …, w41, …
• Y: n × m predictions, with entries y11, y12, …, yn1, …, ynm; Y[i,j] = instance i’s prediction for regression task j
• Y ≈ W H
… where we also have to find the dataset!
Slide from William Cohen
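With m outputs, the same least-squares call fits all m weight columns at once, and each column matches the solution of its own single-output problem. The sketch below uses synthetic data and assumed sizes. In matrix factorization the difference is that W is unknown too, which is the "find the dataset" remark above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, m = 150, 4, 3
W = rng.standard_normal((n, r))          # known data matrix
true_H = rng.standard_normal((r, m))     # hypothetical weights, one column per task
Y = W @ true_H                           # noiseless targets for all m tasks

# One call solves all m regressions: H has one weight column per task.
H, *_ = np.linalg.lstsq(W, Y, rcond=None)

# Column j of H equals the solution of the single-output problem for task j.
H_task0, *_ = np.linalg.lstsq(W, Y[:, 0], rcond=None)
print(np.allclose(H[:, 0], H_task0))
```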
… vs PCA
• V: n users × m movies, with entries v11 … vij … vnm; V[i,j] = user i’s rating of movie j
• W: n users × r, with rows (x1, y1), (x2, y2), …, (xn, yn)
• H: r × m movies, with rows (a1, a2, …, am) and (b1, b2, …, bm)
• V ≈ W H
Minimize squared reconstruction error and force the “prototype” users to be orthogonal → PCA
Slide from William Cohen
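As a sketch of this connection, a truncated SVD of the column-centered matrix gives a factorization whose rows of H are orthonormal and whose squared reconstruction error is the smallest possible for rank r. The toy matrix below is random and fully observed, which is an assumption; real rating matrices are sparse and would not be handled this way directly.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.standard_normal((8, 5))          # toy stand-in for a fully observed n x m matrix
Vc = V - V.mean(axis=0)                  # PCA works on column-centered data
r = 2

U, s, Vt = np.linalg.svd(Vc, full_matrices=False)
W = U[:, :r] * s[:r]                     # n x r coordinates of each user
H = Vt[:r]                               # r x m "prototype" rows

print(np.allclose(H @ H.T, np.eye(r)))   # True: prototypes are orthonormal
print(np.linalg.norm(Vc - W @ H) ** 2)   # squared reconstruction error
```

The reconstruction error equals the sum of the squared singular values that were dropped.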
… vs k-means
• X: original data set, n examples, with entries v11 … vij … vnm
• Z: n × r indicators for r clusters, with rows such as (0, 1), (1, 0), …
• M: r cluster means, with rows (a1, a2, …, am) and (b1, b2, …, bm)
• X ≈ Z M
Slide from William Cohen
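A small sketch of k-means written as a factorization: Z holds a one-hot cluster indicator per example and M holds the cluster means, so Z M replaces each example by its cluster's mean. The function name, the initialization from the first r rows, and the toy data are simplifying assumptions.

```python
import numpy as np

def kmeans_factor(X, r, iters=10):
    """Lloyd's algorithm, returning one-hot indicators Z (n x r) and means M (r x m)."""
    M = X[:r].astype(float).copy()       # init: first r rows as means (a simplification)
    for _ in range(iters):
        # Squared distance from every example to every mean: shape n x r.
        dists = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(r):
            if np.any(labels == k):      # leave an empty cluster's mean unchanged
                M[k] = X[labels == k].mean(axis=0)
    Z = np.eye(r)[labels]                # row i is the one-hot indicator of example i
    return Z, M

# Hypothetical data: two well-separated groups.
X = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]])
Z, M = kmeans_factor(X, r=2)
print(Z)
print(Z @ M)  # each row of X replaced by its cluster mean
```

Compared with the general factorization, the only change is the constraint that each row of Z contains a single 1.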
Summary
• Recommender systems solve many real-world (*large-scale) problems
• Collaborative filtering by Matrix Factorization (MF) is an efficient and effective approach
• MF is just another example of a common recipe:
1. define a model
2. define an objective function
3. optimize with SGD
