A SCALABLE
    COLLABORATIVE
    FILTERING FRAMEWORK
    BASED ON CO-CLUSTERING
    Authors/ Thomas George and Srujana Merugu
    Source/ ICDM’05, pp. 628-628
    Presenter/ Allen
OUTLINE
 Introduction
 Related Work
 Problem Definition
 Collaborative Filtering via Co-clustering
 Scalable Collaborative Filtering System
 Experimental Results
 Conclusion
INTRODUCTION
   Due to the overwhelming increase in web-based
    activities, users are often forced to choose from a large
    number of products or content items.

   To aid users in the decision-making process, it has
    become increasingly important to design recommender
    systems.

   Collaborative filtering identifies the likely preferences of a
    user based on the known preferences of other users.
INTRODUCTION (CONT.)
   Existing collaborative filtering methods are based on
      Correlation criteria
      Singular value decomposition (SVD)
      Non-negative matrix factorization (NNMF)
           Drawback:
              The training component is computationally expensive.

   Practical scenarios such as real-time news personalization
    require dynamic collaborative filtering.

   The key idea
      Simultaneously obtain user and item neighborhoods via co-clustering.
      Generate predictions based on average ratings.
INTRODUCTION (CONT.)
   Two new contributions:
     A dynamic collaborative filtering approach
          Supports the entry of new users, items and ratings via a hybrid of
           incremental and batch versions of the co-clustering algorithm.

     A scalable, real-time collaborative filtering system
          Built on parallel versions of the co-clustering, prediction and
           incremental training routines.

   Notation:
     $A$: a matrix, with $A_{ij}$ denoting its elements.
     χ: a set, enumerated as $\{x_i\}_{i=1}^{n}$, where $x_i$ are the elements of
      the set.
RELATED WORK
   Recommender systems
     Content-based filtering systems
     Collaborative filtering systems

   Co-clustering
     SVD- and NNMF-based filtering techniques predict the
      unknown ratings based on a low-rank approximation of the
      original ratings matrix.
          The missing values are filled with the average ratings.
     Incremental versions of SVD have been proposed to reduce the
      computational cost (SDM 2003).
PROBLEM DEFINITION
   Let $U=\{u_i\}_{i=1}^{m}$ be the set of users such that $|U|=m$ and
    $P=\{p_j\}_{j=1}^{n}$ be the set of items such that $|P|=n$.

   Let $A$ be the $m \times n$ ratings matrix such that $A_{ij}$ is the rating
    given by user $u_i$ to item $p_j$.
     Let $W$ be the $m \times n$ matrix corresponding to the confidence of
       the ratings in $A$.
          $W_{ij}=1$ if the rating is known and 0 otherwise.

   Let the user clustering be $\rho: \{1, \dots, m\} \to \{1, \dots, k\}$ and the item
    clustering be $\gamma: \{1, \dots, n\} \to \{1, \dots, l\}$.
     $k$: number of user clusters; $l$: number of item clusters
PROBLEM DEFINITION (CONT.)
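   The approximate matrix $\hat{A}$ is given by

     $\hat{A}_{ij} = A^{COC}_{gh} + (A^{R}_{i} - A^{RC}_{g}) + (A^{C}_{j} - A^{CC}_{h})$

     where $g=\rho(i)$, $h=\gamma(j)$.
     $A^{R}_{i}$, $A^{C}_{j}$ are the average ratings of user $u_i$ and item $p_j$.
     $A^{COC}_{gh}$, $A^{RC}_{g}$ and $A^{CC}_{h}$ are the average ratings of the corresponding
      co-cluster, user cluster and item cluster.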
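The approximation above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the authors' implementation: the function names, the dictionary layout, the toy ratings, and the fallback to the global mean when a group has no known ratings are all assumptions made for the example.

```python
def masked_mean(values, weights, default=0.0):
    """Mean of the values whose weight is 1; `default` if none are known."""
    count = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / count if count else default


def cocluster_averages(A, W, rho, gamma, k, l):
    """Compute the five kinds of averages used by the approximation."""
    m, n = len(A), len(A[0])
    cells = [(i, j) for i in range(m) for j in range(n)]
    glob = masked_mean([A[i][j] for i, j in cells], [W[i][j] for i, j in cells])

    def mean_over(select):
        # Average of the known ratings among the cells picked by `select`.
        # Assumption: empty groups fall back to the global mean.
        picked = [(i, j) for i, j in cells if select(i, j)]
        return masked_mean([A[i][j] for i, j in picked],
                           [W[i][j] for i, j in picked], default=glob)

    return {
        "R": [mean_over(lambda i, j, u=u: i == u) for u in range(m)],
        "C": [mean_over(lambda i, j, p=p: j == p) for p in range(n)],
        "RC": [mean_over(lambda i, j, g=g: rho[i] == g) for g in range(k)],
        "CC": [mean_over(lambda i, j, h=h: gamma[j] == h) for h in range(l)],
        "COC": [[mean_over(lambda i, j, g=g, h=h: rho[i] == g and gamma[j] == h)
                 for h in range(l)] for g in range(k)],
    }


def predict(avg, rho, gamma, i, j):
    """Co-cluster average plus the user and item offsets from their clusters."""
    g, h = rho[i], gamma[j]
    return (avg["COC"][g][h]
            + (avg["R"][i] - avg["RC"][g])
            + (avg["C"][j] - avg["CC"][h]))


# Toy data (hypothetical): 3 users, 3 items, 0-based cluster labels.
A = [[5, 4, 0], [4, 0, 1], [1, 2, 5]]
W = [[1, 1, 0], [1, 0, 1], [1, 1, 1]]   # 1 = rating known
rho, gamma = [0, 0, 1], [0, 0, 1]       # user and item cluster assignments
avg = cocluster_averages(A, W, rho, gamma, k=2, l=2)
print(predict(avg, rho, gamma, 0, 2))   # prediction for an unknown rating
```

Once the averages are stored, a prediction costs only a handful of lookups and additions, which is what makes the approach attractive for real-time use.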
COLLABORATIVE FILTERING VIA
CO-CLUSTERING
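   Static training (co-clustering): the goal is to minimize

     $\sum_{i=1}^{m} \sum_{j=1}^{n} W_{ij} (A_{ij} - \hat{A}_{ij})^2$

    over the user clustering $\rho$ and the item clustering $\gamma$.

   The row and column assignment steps can be
    implemented efficiently by pre-computing the invariant
    parts of the update cost functions.
     Required info.
     Row updating: each user is assigned to the user cluster that minimizes its row cost.
     Column updating: each item is assigned to the item cluster $h$ that minimizes a cost built from
        $A^{tmp3}_{\rho(i)j} - A^{COC}_{\rho(i)h} + A^{CC}_{h}$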
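The alternating scheme can be sketched as follows. This is a minimal, unoptimized illustration, not the paper's algorithm as published: it recomputes the averages from scratch instead of pre-computing the invariant parts, and the random initialization, the fixed iteration cap, the fallback to the global mean for empty groups, and all function names are assumptions. The row and column costs are obtained by substituting the approximation into the squared-error objective with one assignment left free.

```python
import random


def averages(A, W, rho, gamma, k, l):
    """Known-rating averages; empty groups fall back to the global mean (assumption)."""
    m, n = len(A), len(A[0])
    known = [(i, j) for i in range(m) for j in range(n) if W[i][j]]
    glob = sum(A[i][j] for i, j in known) / len(known)

    def mean(key, size):
        tot, cnt = [0.0] * size, [0] * size
        for i, j in known:
            tot[key(i, j)] += A[i][j]
            cnt[key(i, j)] += 1
        return [t / c if c else glob for t, c in zip(tot, cnt)]

    coc = mean(lambda i, j: rho[i] * l + gamma[j], k * l)
    return {"R": mean(lambda i, j: i, m), "C": mean(lambda i, j: j, n),
            "RC": mean(lambda i, j: rho[i], k), "CC": mean(lambda i, j: gamma[j], l),
            "COC": [coc[g * l:(g + 1) * l] for g in range(k)]}


def approx(avg, i, j, g, h):
    """Approximate rating of user i for item j under clusters g and h."""
    return avg["COC"][g][h] + avg["R"][i] - avg["RC"][g] + avg["C"][j] - avg["CC"][h]


def row_cost(A, W, avg, gamma, i, g):
    """Squared error of user i's known ratings if user i joined cluster g."""
    return sum(W[i][j] * (A[i][j] - approx(avg, i, j, g, gamma[j])) ** 2
               for j in range(len(gamma)))


def col_cost(A, W, avg, rho, j, h):
    """Squared error of item j's known ratings if item j joined cluster h."""
    return sum(W[i][j] * (A[i][j] - approx(avg, i, j, rho[i], h)) ** 2
               for i in range(len(rho)))


def objective(A, W, rho, gamma, avg):
    """Weighted squared error between A and its approximation."""
    return sum(row_cost(A, W, avg, gamma, i, rho[i]) for i in range(len(rho)))


def cocluster(A, W, k, l, iters=20, seed=0):
    """Alternate row and column reassignments until nothing changes."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    rho = [rng.randrange(k) for _ in range(m)]
    gamma = [rng.randrange(l) for _ in range(n)]
    for _ in range(iters):
        avg = averages(A, W, rho, gamma, k, l)
        new_rho = [min(range(k), key=lambda g: row_cost(A, W, avg, gamma, i, g))
                   for i in range(m)]
        avg = averages(A, W, new_rho, gamma, k, l)
        new_gamma = [min(range(l), key=lambda h: col_cost(A, W, avg, new_rho, j, h))
                     for j in range(n)]
        if new_rho == rho and new_gamma == gamma:
            break
        rho, gamma = new_rho, new_gamma
    return rho, gamma


# Toy block-structured ratings (hypothetical), all entries known.
A = [[5, 5, 1, 1], [5, 5, 1, 1], [1, 1, 5, 5], [1, 1, 5, 5]]
W = [[1] * 4 for _ in range(4)]
rho, gamma = cocluster(A, W, k=2, l=2)
print(rho, gamma, objective(A, W, rho, gamma, averages(A, W, rho, gamma, 2, 2)))
```

The result depends on the random starting assignment, so in practice one would likely run several seeds and keep the clustering with the lowest objective value.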
STATIC TRAINING: CO-CLUSTERING




PREDICTION




INCREMENTAL TRAINING




SCALABLE COLLABORATIVE
FILTERING SYSTEM
   The system uses a distributed-memory representation of the data
    objects, and each of the processors P1 and P2 is in
    fact a cluster of processors.
     P1 handles prediction and incremental training.
     P2 is responsible for static training.
PARALLEL CO-CLUSTERING




EXPERIMENTAL RESULTS
   Datasets and algorithm
     MovieLens (100K): 100,000 ratings (1-5) from 943 users on 1682 movies.
     BookCrossing: 269392 ratings (1-10) from 470034 users on 133438 books.
     Movie1-Movie10: subsets containing 10-100% of the ratings of MovieLens 100K.

 80% training and 20% testing splits are used for all the datasets.
 Evaluation metric: Mean Absolute Error (MAE)
     The experiments evaluate effectiveness and efficiency in
      terms of MAE and execution time.
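As a reminder of how the evaluation metric is computed, here is a small Python sketch of an 80/20 split followed by MAE. The random shuffling, the seed, the function names and the sample numbers are assumptions for illustration; the slides do not say how the split was drawn.

```python
import random


def split_ratings(ratings, train_fraction=0.8, seed=0):
    """Shuffle (user, item, rating) triples and cut them into train/test parts."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between held-out and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical ratings: 10 triples, so the split is 8 training and 2 test.
ratings = [(u, p, 1 + (u + p) % 5) for u in range(5) for p in range(2)]
train, test = split_ratings(ratings)

# Hypothetical held-out ratings and model predictions.
held_out = [4, 2, 5, 3]
predictions = [3.5, 2.0, 4.0, 4.5]
mae = mean_absolute_error(held_out, predictions)
print(len(train), len(test), mae)
```

A lower MAE means the predicted ratings are, on average, closer to the held-out ones.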
MAE COMPARISON
 Mov1: MovieLens
 Mov2: BookCrossing
 Mov3: 10 subsets of MovieLens
 K=3
VARIATION OF MAE WITH #
PARAMETERS
   Number of prediction parameters:
     COCLUST: (m+n+kl-k-l) values
     SVD, NNMF: (m+n)(k+l) values
   Movie3 dataset
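To get a feel for the difference in model size, the two expressions quoted on this slide can be evaluated for the MovieLens 100K dimensions. This is plain arithmetic on the slide's formulas; the function names and the choice k = l = 3 are assumptions for the example.

```python
def coclust_params(m, n, k, l):
    """Parameter count quoted on the slide for COCLUST: m + n + kl - k - l."""
    return m + n + k * l - k - l


def factorization_params(m, n, k, l):
    """Parameter count quoted on the slide for SVD and NNMF: (m + n)(k + l)."""
    return (m + n) * (k + l)


m, n = 943, 1682   # users and movies in MovieLens 100K
k = l = 3          # assumed number of user and item clusters
print(coclust_params(m, n, k, l), factorization_params(m, n, k, l))
```

With these values the co-clustering model stores 2628 numbers against 15750 for the factorization expression, roughly a sixfold difference.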
EFFICIENCY
   Prediction time for each given test pair
    of MovieLens.

   Training time (co-clustering) vs. data size
     MovieLens dataset
     Experimental setup
        1.4 GHz AMD processors on 128 compute
       nodes with 384 MB RAM
TRAINING TIME VS. # OF
PROCESSORS
 MovieLens dataset
 Experimental setup
     1.4 GHz AMD processors, varying number of processors, with 384 MB RAM
CONCLUSION
   Recommender systems are proving to be extremely useful
    for a number of online activities such as e-commerce.

   In dynamic scenarios, both efficiency and
    effectiveness must be addressed.
     New users, items and ratings enter the system at a rapid rate.

   This paper proposes a new dynamic CF approach based
    on co-clustering.

   Empirical results indicate high-quality predictions at
    a much lower computational cost.
