INTRODUCTION TO MATRIX FACTORIZATION METHODS
COLLABORATIVE FILTERING
USER RATINGS PREDICTION

Alex Lin
Senior Architect
Intelligent Mining
Outline
- Factor analysis
- Matrix decomposition
- Matrix Factorization Model
- Minimizing Cost Function
- Common Implementation
Factor Analysis
- A procedure that helps identify the factors that might be used to explain the interrelationships among the variables
- Model-based approach
Refresher: Matrix Decomposition
- r (5 x 6 matrix) = q (5 x 3 matrix) x p (3 x 6 matrix)
- [Figure: a 5 x 6 matrix with entries X11 ... X56 decomposed into two factor matrices; row 3 of q is the factor vector (a, b, c) and the matching column of p is the factor vector (x, y, z)]
- X32 = (a, b, c) . (x, y, z) = a * x + b * y + c * z
- Rating Prediction: r̂_ui = q_i^T p_u
  - q_i : Movie Preference Factor Vector
  - p_u : User Preference Factor Vector
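As a quick illustration of the dot-product prediction above, here is a minimal NumPy sketch; the factor values are made up for illustration, not taken from the slide's matrix.

```python
import numpy as np

# Hypothetical 3-dimensional factor vectors for one movie and one user
q_i = np.array([0.8, -0.3, 1.1])   # movie preference factor vector (a, b, c)
p_u = np.array([1.2,  0.5, 0.7])   # user preference factor vector (x, y, z)

# Predicted rating r̂_ui = q_i^T p_u
r_hat = q_i @ p_u                  # = 0.8*1.2 + (-0.3)*0.5 + 1.1*0.7
print(r_hat)
```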
Making Prediction as Filling Missing Value
- [Figure: an m x n item-user rating matrix (users 1..n across the columns, items 1..m down the rows); only a few cells hold known ratings such as 5, 4, 3, most cells are empty, and "?" marks the missing ratings to be predicted]
- Rating Prediction: r̂_ui = q_i^T p_u
  - q_i : Movie Preference Factor Vector
  - p_u : User Preference Factor Vector
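Because most cells are missing, the rating matrix is normally held in a sparse structure rather than a dense array. A small sketch using SciPy; the triples below are illustrative, not the slide's exact matrix.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Known ratings only, stored as (item, user, rating) triples; missing cells stay missing
item_idx = np.array([0, 0, 1, 2, 2])
user_idx = np.array([0, 2, 6, 0, 3])
ratings  = np.array([5, 4, 3, 3, 5], dtype=float)

R = coo_matrix((ratings, (item_idx, user_idx)), shape=(4, 10))  # 4 items x 10 users
print(R.nnz, "known ratings out of", R.shape[0] * R.shape[1], "cells")
```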
Learn Factor Vectors
- [Figure: the same sparse item-user rating matrix; each known rating contributes one equation in the unknown user and item factor values]
- 4 = U3-1 * I1-1 + U3-2 * I1-2 + U3-3 * I1-3 + U3-4 * I1-4
- 3 = U7-1 * I2-1 + U7-2 * I2-2 + U7-3 * I2-3 + U7-4 * I2-4
- ...
- 3 = U86-1 * I12-1 + U86-2 * I12-2 + U86-3 * I12-3 + U86-4 * I12-4
- Note: only train on known entries
- Analogy with a small linear system (exactly determined vs. over-determined):

      2X + 3Y = 5          2X + 3Y = 5
      4X - 2Y = 2          4X - 2Y = 2
                           3X - 2Y = 2
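The system on the right of this slide is over-determined (three equations, two unknowns), just as the known ratings over-constrain the factor values. As a sketch, NumPy can solve it in the least-squares sense:

```python
import numpy as np

# Over-determined system: 2X + 3Y = 5, 4X - 2Y = 2, 3X - 2Y = 2
A = np.array([[2.0,  3.0],
              [4.0, -2.0],
              [3.0, -2.0]])
b = np.array([5.0, 2.0, 2.0])

# Least-squares solution minimizes ||A·[X, Y] - b||², analogous to
# minimizing squared error on the known ratings
solution, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print(solution)
```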
Why not use standard SVD?
- Standard SVD assumes all missing entries are zero. This leads to poor prediction accuracy, especially when the dataset is extremely sparse (98% - 99.9% missing).
- See Appendix for SVD
- Some published literature refers to this kind of Matrix Factorization as SVD, but note that it is NOT the same as the classical low-rank SVD produced by svdlibc.
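A toy illustration (made-up numbers) of why imputing zeros is harmful: at 96% sparsity the zeros dominate whatever structure the real ratings have.

```python
import numpy as np

known = np.array([5.0, 4.0, 3.0, 4.0])               # the ratings we actually observed
with_zeros = np.concatenate([known, np.zeros(96)])   # 96 missing cells treated as 0

print(known.mean())       # ~4.0 - the signal we care about
print(with_zeros.mean())  # ~0.16 - dominated by the artificial zeros
```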
How to Learn Factor Vectors
- How do we learn the preference factor vectors (a, b, c) and (x, y, z)?
- Minimize errors on the known ratings:

      min_{q*,p*} Σ_{(u,i)∈K} (r_ui − x_ui)²

  Minimizing Cost Function (Least Squares Problem), to learn the factor vectors p_u and q_i
  - r_ui : actual rating for user u on item i
  - x_ui : predicted rating for user u on item i
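A minimal sketch of this cost function in Python; the (u, i) -> rating dictionary and the factor arrays P, Q are assumed data structures for illustration.

```python
import numpy as np

def squared_error_cost(known_ratings, P, Q):
    """Sum of squared errors over the known ratings only.

    known_ratings : dict mapping (u, i) -> actual rating r_ui
    P             : user factor vectors, shape (num_users, f)
    Q             : item factor vectors, shape (num_items, f)
    """
    cost = 0.0
    for (u, i), r_ui in known_ratings.items():
        x_ui = Q[i] @ P[u]            # predicted rating
        cost += (r_ui - x_ui) ** 2    # error on a known entry only
    return cost
```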
Data Normalization
- Remove the global mean
- [Figure: the same item-user matrix after subtracting the global mean; the known ratings become residuals such as 1.5, -.9, -.2, .49, .79, 0.6, .46, -.4, .39, .82, .76, .69, .52, .8]
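Taking out the global mean is just subtracting the training-set average from every known rating; a small sketch with illustrative values:

```python
import numpy as np

# (u, i) -> known rating
ratings = {(0, 0): 5.0, (2, 0): 4.0, (0, 2): 3.0, (3, 2): 5.0}

mu = np.mean(list(ratings.values()))              # global mean over known ratings only
residuals = {ui: r - mu for ui, r in ratings.items()}
print(mu, residuals)
```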
Factorization Model
- Only preference factors:

      min_{q*,p*} Σ_{(u,i)∈K} (r_ui − µ − q_i^T p_u)²

  To learn the factor vectors (p_u and q_i)
- Example: a rating of 4 is modeled as Global Mean + Preference Factor
- Notation:
  - r_ui : actual rating of user u on item i
  - µ : training rating average (global mean)
  - b_u : user u bias
  - b_i : item i bias
  - q_i : latent factor vector of item i
  - p_u : latent factor vector of user u
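With only preference factors, the predicted rating is the global mean plus the factor dot product. A one-line sketch (q_i and p_u are assumed NumPy vectors):

```python
def predict(mu, p_u, q_i):
    # r̂_ui = µ + q_i^T p_u  (global mean plus preference-factor term)
    return mu + q_i @ p_u
```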
Adding Item Bias and User Bias
- Add item bias and user bias as parameters:

      min_{q*,p*} Σ_{(u,i)∈K} (r_ui − µ − b_i − b_u − q_i^T p_u)²

  To learn the item bias and user bias
- Example: a rating of 4 is modeled as Global Mean + Item Bias + User Bias + Preference Factor
- Notation:
  - r_ui : actual rating of user u on item i
  - µ : training rating average (global mean)
  - b_u : user u bias
  - b_i : item i bias
  - q_i : latent factor vector of item i
  - p_u : latent factor vector of user u
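Adding the two bias terms, the prediction becomes global mean + item bias + user bias + preference-factor term; a sketch extending the previous one:

```python
def predict_with_bias(mu, b_u, b_i, p_u, q_i):
    # r̂_ui = µ + b_i + b_u + q_i^T p_u
    return mu + b_i + b_u + q_i @ p_u
```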
Regularization
- To prevent model overfitting:

      min_{q*,p*} Σ_{(u,i)∈K} (r_ui − µ − b_i − b_u − q_i^T p_u)² + λ (‖q_i‖² + ‖p_u‖² + b_i² + b_u²)

  The λ term is the regularization that prevents overfitting
- Example: a rating of 4 is modeled as Global Mean + Item Bias + User Bias + Preference Factor
- Notation:
  - r_ui : actual rating of user u on item i
  - µ : training rating average (global mean)
  - b_u : user u bias
  - b_i : item i bias
  - q_i : latent factor vector of item i
  - p_u : latent factor vector of user u
  - λ : regularization parameter
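The regularized objective as a Python sketch, extending the earlier cost function; `lam` stands for the λ hyperparameter and the bias arrays are assumed structures for illustration.

```python
import numpy as np

def regularized_cost(known_ratings, mu, b_user, b_item, P, Q, lam):
    """Squared error on known ratings plus an L2 penalty on every learned parameter."""
    cost = 0.0
    for (u, i), r_ui in known_ratings.items():
        err = r_ui - (mu + b_item[i] + b_user[u] + Q[i] @ P[u])
        cost += err ** 2
        cost += lam * (Q[i] @ Q[i] + P[u] @ P[u] + b_item[i] ** 2 + b_user[u] ** 2)
    return cost
```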
Optimize Factor Vectors
- Find the optimal factor vectors by minimizing the cost function
- Algorithms:
  - Stochastic gradient descent
  - Others: Alternating least squares, etc.
- Most frequently used:
  - Stochastic gradient descent
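A minimal sketch of one stochastic-gradient-descent pass for this model, using the standard update rules for the regularized objective; γ (learning rate) and λ are assumed hyperparameters.

```python
def sgd_epoch(known_ratings, mu, b_user, b_item, P, Q, gamma=0.005, lam=0.02):
    """One pass over the known ratings, updating all parameters in place."""
    for (u, i), r_ui in known_ratings.items():
        err = r_ui - (mu + b_item[i] + b_user[u] + Q[i] @ P[u])  # prediction error
        b_user[u] += gamma * (err - lam * b_user[u])
        b_item[i] += gamma * (err - lam * b_item[i])
        p_old = P[u].copy()
        P[u] += gamma * (err * Q[i] - lam * P[u])
        Q[i] += gamma * (err * p_old - lam * Q[i])
```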
Matrix Factorization Tuning
- Number of factors in the preference vectors
- Learning rate of gradient descent
  - The best results usually come from using a different learning rate for each parameter, especially for the user/item bias terms (a sketch follows this list)
- Parameters in the factorization model
  - Time-dependent parameters
  - Seasonality-dependent parameters
- Many other considerations!
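One way to keep these tuning knobs together is a small configuration block; the values below are purely illustrative, and good settings depend entirely on the dataset.

```python
config = {
    "num_factors": 50,      # size of the preference factor vectors
    "lr_factors": 0.005,    # learning rate for p_u and q_i
    "lr_bias": 0.010,       # separate (often larger) rate for b_u and b_i
    "reg_lambda": 0.02,     # regularization strength λ
    "num_epochs": 30,
}
```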
High-Level Implementation Steps
- Construct the User-Item Matrix (use a sparse data structure!)
- Define the factorization model - the cost function
- Take out the global mean
- Decide which parameters go into the model (bias, preference factors, anything else? SVD++)
- Minimize the cost function - model fitting
  - Stochastic gradient descent
  - Alternating least squares
- Assemble the predictions
- Evaluate the predictions (RMSE, MAE, etc.)
- Continue to tune the model (an end-to-end sketch follows this list)
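A minimal end-to-end sketch that strings these steps together; it reuses the hypothetical sgd_epoch from the earlier sketch and is illustration only, not a production pipeline.

```python
import numpy as np

def fit_and_evaluate(train, test, num_users, num_items, f=50, epochs=30):
    """train/test: dicts mapping (u, i) -> rating. Returns RMSE on the test set."""
    mu = np.mean(list(train.values()))              # take out the global mean
    b_user, b_item = np.zeros(num_users), np.zeros(num_items)
    P = np.random.normal(0, 0.1, (num_users, f))    # user factor vectors
    Q = np.random.normal(0, 0.1, (num_items, f))    # item factor vectors

    for _ in range(epochs):                          # model fitting
        sgd_epoch(train, mu, b_user, b_item, P, Q)

    # Assemble and evaluate predictions on held-out ratings
    sq_err = [(r - (mu + b_item[i] + b_user[u] + Q[i] @ P[u])) ** 2
              for (u, i), r in test.items()]
    return np.sqrt(np.mean(sq_err))
```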
Thank you
- Any questions or comments?
Appendix
- Stochastic Gradient Descent
- Batch Gradient Descent
- Singular Value Decomposition (SVD)
Stochastic Gradient Descent

    Repeat until convergence {
        for i = 1 to m in random order {
            θ_j := θ_j + α (y^(i) − h_θ(x^(i))) x_j^(i)    (for every j)
        }
    }

  (y^(i) − h_θ(x^(i))) x_j^(i) is the partial derivative term

  Your code here:
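One possible way to fill in the "Your code here" box, assuming a linear hypothesis h_θ(x) = θ^T x; this is a sketch, not the author's original code.

```python
import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.01, epochs=100):
    """X: (m, n) feature matrix, y: (m,) targets. Returns the learned θ."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):                  # "repeat until convergence" (fixed epochs here)
        for i in np.random.permutation(m):   # for i = 1..m in random order
            error = y[i] - X[i] @ theta      # y^(i) - h_θ(x^(i))
            theta += alpha * error * X[i]    # update every θ_j at once
    return theta
```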
Batch Gradient Descent

    Repeat until convergence {
        θ_j := θ_j + α Σ_{i=1..m} (y^(i) − h_θ(x^(i))) x_j^(i)    (for every j)
    }

  The sum over i = 1..m of (y^(i) − h_θ(x^(i))) x_j^(i) is the partial derivative term

  Your code here:
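And a matching sketch for the batch version, again assuming a linear hypothesis; the whole gradient is summed over all m examples before each step.

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, epochs=100):
    """X: (m, n) feature matrix, y: (m,) targets. Returns the learned θ."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        errors = y - X @ theta            # y^(i) - h_θ(x^(i)) for all i
        theta += alpha * (X.T @ errors)   # Σ_i error_i * x_j^(i), for every j
    return theta
```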
Singular Value Decomposition (SVD)

    A = U × S × V^T

- A : m x n matrix (items x users)
- U : m x r matrix
- S : r x r matrix
- V^T : r x n matrix
- Low-rank approximation: keep only the top k singular values (rank = k, k < r)

    A_k = U_k × S_k × V_k^T
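A short NumPy sketch of the decomposition and the rank-k truncation; the matrix is random, for illustration only.

```python
import numpy as np

A = np.random.rand(6, 5)                        # m x n matrix (items x users)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                           # keep the top k singular values (k < r)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]     # A_k = U_k × S_k × V_k^T

print(np.linalg.norm(A - A_k))                  # reconstruction error of the rank-k approx
```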
