AIM3 – Scalable Data Analysis and Data Mining



             11 – Latent factor models for Collaborative Filtering
              Sebastian Schelter, Christoph Boden, Volker Markl




         Fachgebiet Datenbanksysteme und Informationsmanagement
                        Technische Universität Berlin

20.06.2012
                         http://www.dima.tu-berlin.de/
Recap: Item-Based Collaborative Filtering


Item-based Collaborative Filtering


    • compute pairwise similarities between the columns of
      the rating matrix using some similarity measure
    • store the top 20 to 50 most similar items per item
      in the item-similarity matrix
    • prediction: use a weighted sum over all items similar
      to the unknown item that have been rated by the
      current user


              $p_{ui} = \frac{\sum_{j \in S(i,u)} s_{ij} \, r_{uj}}{\sum_{j \in S(i,u)} |s_{ij}|}$


Drawbacks of similarity-based neighborhood methods


   • the assumption that a rating is determined by the user's
     ratings for all commonly co-rated items is hard to justify
     in general

   • lack of bias correction

   • every co-rated item is looked at in isolation:
     say a movie is similar to „Lord of the Rings“, do
     we want each part of the trilogy to contribute as
     a separate similar item?

   • the best choice of similarity measure is based on
     experimentation, not on mathematical reasoning

Latent factor models


■ Idea

    • ratings are deeply influenced by a set of factors that are
      very specific to the domain (e.g. amount of action in movies,
      complexity of characters)

    • these factors are in general not obvious; we might be able to
      think of some of them, but it's hard to estimate their impact on
      the ratings

    • the goal is to infer those so-called latent factors from the
      rating data using mathematical techniques




Latent factor models

■ Approach

    • users and items are characterized by latent factors;
      each user and item is mapped onto a latent feature space:

          $u_i, m_j \in \mathbb{R}^f$

    • each rating is approximated by the dot product of the
      user feature vector and the item feature vector:

          $r_{ij} \approx m_j^T u_i$

    • prediction of unknown ratings also uses this dot product

    • squared error as a measure of loss:

          $(r_{ij} - m_j^T u_i)^2$
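
To make the dot-product approximation concrete, a small numpy sketch
(vectors are toy values, purely illustrative):

    import numpy as np

    u_i = np.array([0.2, 1.1, -0.3])   # user feature vector, u_i in R^f, f = 3
    m_j = np.array([0.5, 0.9, 0.1])    # item feature vector, m_j in R^f

    r_ij = 4.0                         # a known rating
    prediction = m_j @ u_i             # rating approximated by the dot product
    loss = (r_ij - prediction) ** 2    # squared error for this single rating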




Latent factor models


■ Approach

    • decomposition of the rating matrix into the product of a user
      feature matrix and an item feature matrix:

          $R \approx U M^T$

    • row in U: vector of a user's affinity to the features
    • row in M: vector of an item's relation to the features

    • closely related to the Singular Value Decomposition, which
      produces an optimal low-rank approximation of a matrix
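
The SVD connection can be demonstrated on a small, fully known matrix;
a numpy sketch with toy data:

    import numpy as np

    R = np.array([[5., 4., 1.],
                  [4., 5., 1.],
                  [1., 1., 5.]])

    f = 2                                            # number of factors to keep
    U_svd, s, Vt = np.linalg.svd(R)
    R_f = U_svd[:, :f] @ np.diag(s[:f]) @ Vt[:f, :]
    # R_f is the best rank-f approximation of R in the least-squares sense

Note that this requires every entry of R to be known, which is exactly the
problem discussed on the next slide.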




Latent factor models


■ Properties of the decomposition
   • automatically ranks features by their „impact“ on the ratings
   • features might not necessarily be intuitively understandable




Latent factor models

■ Problematic situation with explicit feedback data

    • the rating matrix is not only sparse, but also only partially
      defined; missing entries cannot be interpreted as 0, they are
      just unknown
    • standard decomposition algorithms like the Lanczos method for
      SVD are therefore not applicable

Solution

    • decomposition has to be done using the known ratings only
    • find the set of user and item feature vectors that minimizes the
      squared error to the known ratings



          $\min_{U,M} \; \sum_{(i,j)\,\mathrm{known}} (r_{ij} - m_j^T u_i)^2$
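
A numpy sketch of evaluating this objective over the known ratings only
(using 0 as a stand-in for "unknown" is a toy convention of this sketch):

    import numpy as np

    R = np.array([[5., 0., 1.],        # 0 marks an unknown rating here
                  [4., 5., 0.]])
    known = R > 0                      # mask of observed entries

    U = np.random.rand(2, 2)           # user feature matrix (2 users, f = 2)
    M = np.random.rand(3, 2)           # item feature matrix (3 items, f = 2)

    E = R - U @ M.T                    # residuals of all predictions
    loss = np.sum(E[known] ** 2)       # squared error over known ratings only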




Latent factor models


■ the quality of the decomposition is not measured by the
  reconstruction error on the original data, but by the
  generalization to unseen data
■ regularization necessary to avoid overfitting

■ model has hyperparameters (regularization, learning rate)
  that need to be chosen

■ process: split data into training, test and validation set
    □   train model using the training set
    □   choose hyperparameters according to performance on the test set
    □   evaluate generalization on the validation set
    □   ensure that each datapoint is used in each set once
        (cross-validation)
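
A minimal sketch of such a split (the 60/20/20 proportions are an
assumption, not prescribed by the slides):

    import numpy as np

    ratings = np.arange(100)           # stand-ins for (user, item, rating) triples
    rng = np.random.default_rng(seed=42)
    rng.shuffle(ratings)

    # split into training, test and validation portions;
    # for cross-validation, rotate which fold plays which role
    train, test, validation = np.split(ratings, [60, 80])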



Stochastic Gradient Descent


   • add a regularization term

          $\min_{U,M} \; \sum_{(i,j)} \left[ (r_{ij} - m_j^T u_i)^2 + \lambda \left( \|u_i\|^2 + \|m_j\|^2 \right) \right]$
   • loop through all ratings in the training set, compute the
     associated prediction error

          $e_{ij} = r_{ij} - m_j^T u_i$

   • modify parameters in the opposite direction of the gradient

          $u_i \leftarrow u_i + \gamma \, (e_{ij} \, m_j - \lambda \, u_i)$
          $m_j \leftarrow m_j + \gamma \, (e_{ij} \, u_i - \lambda \, m_j)$
   • problem: approach is inherently sequential (although recent
     research might have unveiled a parallelization technique)
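
A compact sketch of this update loop (learning rate and regularization
values are arbitrary toy choices):

    import numpy as np

    def sgd_epoch(ratings, U, M, gamma=0.01, lam=0.05):
        # one pass over the known ratings, given as (i, j, r_ij) triples
        for i, j, r in ratings:
            e = r - M[j] @ U[i]                      # prediction error e_ij
            u_old = U[i].copy()                      # update both with old values
            U[i] += gamma * (e * M[j] - lam * U[i])
            M[j] += gamma * (e * u_old - lam * M[j])

    # toy usage: 2 users, 3 items, f = 2 latent factors
    U, M = np.random.rand(2, 2) * 0.1, np.random.rand(3, 2) * 0.1
    ratings = [(0, 0, 5.0), (0, 2, 1.0), (1, 1, 4.0)]
    for _ in range(100):
        sgd_epoch(ratings, U, M)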



Alternating Least Squares with Weighted λ-Regularization
■ Model

    • feature matrices are modeled directly by using only
      the observed ratings
    • add a regularization term to avoid overfitting
    • minimize regularized error of:

          f U, M   =    r   ij
                                      m j ui  + λ
                                         T    2
                                                       n   u
                                                                 i
                                                                     ui
                                                                          2
                                                                              +      nm
                                                                                           j
                                                                                               m   j
                                                                                                       2
                                                                                                           
Solving technique

    • fixing one of the two unknown matrices turns this into a simple
      quadratic (least-squares) problem
    • rotate between fixing U and M until convergence
      („Alternating Least Squares“)
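
A dense-matrix sketch of the alternating solves; the per-row closed form
follows the weighted-λ variant of Zhou et al., and all data and parameters
are toy choices:

    import numpy as np

    def als_step(R, known, F, lam):
        # re-solve every row's feature vector with the other matrix F fixed;
        # each row is an independent regularized least-squares problem
        n_rows, f = R.shape[0], F.shape[1]
        out = np.zeros((n_rows, f))
        for i in range(n_rows):
            F_i = F[known[i]]                    # feature vectors of rated columns
            r_i = R[i, known[i]]                 # the known ratings in row i
            n_i = known[i].sum()                 # rating count weights the lambda
            A = F_i.T @ F_i + lam * n_i * np.eye(f)
            out[i] = np.linalg.solve(A, F_i.T @ r_i)
        return out

    R = np.array([[5., 0., 1.], [4., 5., 0.]])   # 0 marks unknown
    known = R > 0
    U, M = np.random.rand(2, 2), np.random.rand(3, 2)
    for _ in range(10):                          # alternate until convergence
        U = als_step(R, known, M, lam=0.05)
        M = als_step(R.T, known.T, U, lam=0.05)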



ALS-WR is scalable


■ Which properties make this approach scalable?

    • all the feature vectors in one iteration can be computed
      independently of each other
    • only a small portion of the data is necessary to compute
      a single feature vector

Parallelization with Map/Reduce

    • Computing user feature vectors: the mappers need to send
      each user's rating vector and the feature vectors of his/her
      rated items to the same reducer

    • Computing item feature vectors: the mappers need to send
      each item's rating vector and the feature vectors of users who
      rated it to the same reducer
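
As a sketch (function name and shapes are hypothetical), the computation a
single reducer performs for one user only touches that user's data:

    import numpy as np

    def reduce_user(rated_item_features, user_ratings, lam=0.05):
        # rated_item_features: (n_i x f) feature vectors of the rated items
        # user_ratings: the user's n_i known ratings for exactly those items
        M_i = rated_item_features
        n_i, f = M_i.shape
        A = M_i.T @ M_i + lam * n_i * np.eye(f)
        return np.linalg.solve(A, M_i.T @ user_ratings)

The item-side computation is symmetric, which is why both phases fit the
same map/reduce pattern.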

Incorporating biases


■ Problem: explicit feedback data is highly biased
    □ some users tend to rate more extremely than others
    □ some items tend to get higher ratings than others


■ Solution: explicitly model biases
    □ the bias of a rating is modeled as a combination of the overall
      average rating, the user bias and the item bias

          $b_{ij} = \mu + b_i + b_j$


    □ the rating bias can be incorporated into the prediction

          $\hat{r}_{ij} = \mu + b_i + b_j + m_j^T u_i$
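
A one-line sketch of the biased prediction (all values are toy assumptions):

    import numpy as np

    mu = 3.6                          # overall average rating
    b_i, b_j = 0.3, -0.5              # user bias and item bias
    u_i = np.array([0.2, 1.1])        # user feature vector
    m_j = np.array([0.5, 0.9])        # item feature vector

    r_hat = mu + b_i + b_j + m_j @ u_i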




Latent factor models


■ implicit feedback data is very different from explicit data!

    □ e.g. use the number of clicks on a product page of an online shop

    □   the whole matrix is defined!
    □   no negative feedback
    □   interactions that did not happen produce zero values
    □   however, we should place only little confidence in these zeros
        (maybe the user never had the chance to interact with these items)

    □ using standard decomposition techniques like SVD would give us a
      decomposition that is biased towards the zero entries; again, not
      applicable




Latent factor models

■ Solution for working with implicit data:
  weighted matrix factorization

                                                                                           1        rij  0
■ create a binary preference matrix P                                             p ij    
                                                                                             0       rij  0
                                                                                           

■ each entry in this matrix can be weighted
  by a confidence function:

          $c(i,j) = 1 + \alpha \, r_{ij}$

    □ zero values should get low confidence
    □ values that are based on a lot of interactions
      should get high confidence


■ confidence is incorporated into the model
    □ the factorization will ‚prefer‘ more confident values


  f U, M     =                        T
                     c ( i , j ) p ij  m j u i   
                                                      2
                                                          + λ      ui
                                                                          2
                                                                              +            m    j
                                                                                                      2
                                                                                                          
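
A sketch of building P and C from implicit counts and solving for one
user's feature vector in the weighted model (per-user closed form in the
style of Hu et al.; α, λ and all data are toy assumptions):

    import numpy as np

    R = np.array([[3., 0., 1.],             # implicit counts, e.g. clicks
                  [0., 7., 0.]])
    alpha, lam, f = 40.0, 0.1, 2

    P = (R > 0).astype(float)               # binary preference matrix
    C = 1.0 + alpha * R                     # confidence for every entry

    M = np.random.rand(3, f)                # current item feature matrix

    # weighted least-squares solve for user i:
    #   u_i = (M^T C_i M + lam I)^(-1) M^T C_i p_i
    i = 0
    C_i = np.diag(C[i])
    A = M.T @ C_i @ M + lam * np.eye(f)
    u_i = np.linalg.solve(A, M.T @ C_i @ P[i])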
Sources


   • Sarwar et al.: „Item-Based Collaborative Filtering
     Recommendation Algorithms“, 2001
   • Koren et al.: „Matrix Factorization Techniques for Recommender
     Systems“, 2009
   • Funk: „Netflix Update: Try This at Home“,
     http://sifter.org/~simon/journal/20061211.html, 2006
   • Zhou et al.: „Large-scale Parallel Collaborative Filtering for the
     Netflix Prize“, 2008
   • Hu et al.: „Collaborative Filtering for Implicit Feedback
     Datasets“, 2008





More Related Content

• Matrix Factorization Techniques For Recommender Systems
• Yurii Pashchenko: Zero-shot learning capabilities of CLIP model from OpenAI
• Movie lens movie recommendation system
• Matrix Factorization In Recommender Systems
• Expert System With Python -1
• Spectral Clustering
• Recommendation Systems
• The Science and the Magic of User Feedback for Recommender Systems

What's hot (20)

• Supervised learning
• Deep Learning for Recommender Systems
• Recommender system
• Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
• Introduction to Recommendation Systems
• Recommendation System
• Moving Object Detection And Tracking Using CNN
• Collaborative filtering
• Machine Learning presentation.
• Tutorial on Sequence Aware Recommender Systems - ACM RecSys 2018
• Recommender Systems
• How to Build Recommender System with Content based Filtering
• Deep Learning With Neural Networks
• Presentation - Msc Thesis - Machine Learning Techniques for Short-Term Electr...
• Recommender systems
• SVD and the Netflix Dataset
• Recommendation system
• Overview of recommender system
• Few shot learning/ one shot learning/ machine learning
• Diffusion models beat gans on image synthesis

Viewers also liked (11)

• Simple Matrix Factorization for Recommendation in Mahout
• 国際化時代の40カ国語言語判定
• どの言語でつぶやかれたのか、機械が知る方法 #WebDBf2013
• RecSys 2015: Large-scale real-time product recommendation at Criteo
• 情報推薦システム入門:講義スライド
• Deep forest
• JP Chaosmap 2015-2016
• Beginners Guide to Non-Negative Matrix Factorization
• Ensembles of example dependent cost-sensitive decision trees slides
• 機会学習ハッカソン:ランダムフォレスト

Similar to Latent factor models for Collaborative Filtering (20)

• Recommender Systems Tutorial (Part 2) -- Offline Components
• Collaborative Filtering Based on Star Users
• R package Recommendation Engine
• Context-aware similarities within the factorization framework (CaRR 2013 pres...
• Comparing State-of-the-Art Collaborative Filtering Systems
• Introduction to Matrix Factorization Methods Collaborative Filtering
• Predicting performance in Recommender Systems - Poster
• Big Practical Recommendations with Alternating Least Squares
• Diversity versus accuracy: solving the apparent dilemma facing recommender sy...
• Leveraging collaborativetaggingforwebitemdesign ajithajjarani
• Incremental Item-based Collaborative Filtering
• Recommendation system
• Context-aware similarities within the factorization framework - presented at ...
• Kddcup2011
• Matrix Factorization
• Principal Component Analysis For Novelty Detection
• Predicting performance in Recommender Systems - Slides
• Introduction to behavior based recommendation system
• Multivariate statistics
• Machine learning for medical imaging data

More from sscdotopen (9)

• Co-occurrence Based Recommendations with Mahout, Scala and Spark
• Bringing Algebraic Semantics to Mahout
• Next directions in Mahout's recommenders
• New Directions in Mahout's Recommenders
• Introduction to Collaborative Filtering with Apache Mahout
• Scalable Similarity-Based Neighborhood Methods with MapReduce
• Large Scale Graph Processing with Apache Giraph
• Introducing Apache Giraph for Large Scale Graph Processing
• mahout-cf

