A scalable collaborative filtering framework based on co clustering

A SCALABLE
COLLABORATIVE
FILTERING FRAMEWORK
BASED ON CO-CLUSTERING
Authors/ Thomas George and Srujana Merugu
Source/ ICDM’05, pp. 625-628
Presenter/ Allen

OUTLINE
 Introduction
 Related Work

 Problem Definition

 Collaborative Filtering via Co-clustering

 Scalable Collaborative Filtering System

 Experimental Results

 Conclusion


INTRODUCTION
 Due to the overwhelming increase in web-based
activities, users are often forced to choose from a large
number of products or content items.

 To aid users in the decision making process, it has
become increasingly important to design recommender
systems.

 Collaborative filtering identifies the likely preferences of a
user based on the known preferences of other users.

INTRODUCTION (CONT.)
 Existing collaborative filtering methods based on correlation criteria
 Singular value decomposition (SVD)
 Non-negative matrix factorization (NNMF)
 Drawbacks:
 The training component is computationally expensive

 Practical scenarios such as real-time news personalization
require dynamic collaborative filtering.

 The key idea
 Simultaneously obtaining user and item neighborhoods via co-clustering.
 Generating predictions based on average ratings.

INTRODUCTION (CONT.)
 Two new contributions:
 Dynamic collaborative filtering approach
 Supporting the entry of new users, items and ratings via a hybrid of
incremental and batch versions of the co-clustering algorithm.

 A scalable, real-time collaborative filtering system
 Developing parallel versions of co-clustering, prediction and
incremental training routines.

 Notation:
 A: matrix, with $A_{ij}$ denoting the corresponding matrix elements.
 χ: sets, enumerated as $\{x_i\}_{i=1}^{n}$, where $x_i$ are the elements of
the set.

RELATED WORK
 Recommender System
 Content-based filtering system
 Collaborative filtering system

 Co-clustering
 SVD- and NNMF-based filtering techniques predict the
unknown ratings based on a low-rank approximation of the
original ratings matrix.
 The missing values are filled with the average ratings.
 Incremental versions of SVD have been proposed to reduce the
computational cost. (SDM 2003)

PROBLEM DEFINITION
 Let $U=\{u_i\}_{i=1}^{m}$ be the set of users such that |U|=m and
$P=\{p_j\}_{j=1}^{n}$ be the set of items such that |P|=n.

 Let A be the m×n ratings matrix such that $A_{ij}$ is the rating
of the user $u_i$ to the item $p_j$.
 Let W be the m×n matrix corresponding to the confidence of
the ratings in A.
 $W_{ij}=1$ if the rating is known and 0 otherwise.

 Let user clustering ρ: {1, …, m} → {1, …, k}, and item
clustering γ: {1, …, n} → {1, …, l}
 k: # user clusters; l: # item clusters
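The definitions above can be sketched in a few lines. This is a minimal illustration, assuming ratings arrive as (user, item, rating) triples with 0-based indices (the slides use 1-based indices). The helper name `build_ratings` and the toy data are hypothetical, not taken from the paper.

```python
def build_ratings(triples, m, n):
    """Build the ratings matrix A and the 0/1 confidence matrix W."""
    A = [[0.0] * n for _ in range(m)]   # unknown ratings are stored as 0.0
    W = [[0] * n for _ in range(m)]     # W[i][j] == 1 only for known ratings
    for i, j, rating in triples:
        A[i][j] = float(rating)
        W[i][j] = 1
    return A, W

# Toy data (hypothetical): 3 users, 3 items, 4 known ratings.
triples = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (2, 2, 1)]
A, W = build_ratings(triples, m=3, n=3)

# Clusterings as lookup lists: rho[i] is the user cluster of user i,
# gamma[j] is the item cluster of item j (here k = 2, l = 2).
rho = [0, 0, 1]
gamma = [0, 0, 1]
```

Storing ρ and γ as plain lists keeps the cluster lookup g=ρ(i), h=γ(j) a constant-time index operation.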

PROBLEM DEFINITION (CONT.)
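 The approximate matrix Â is given by
$$\hat{A}_{ij} = A^{COC}_{gh} + (A^{R}_{i} - A^{RC}_{g}) + (A^{C}_{j} - A^{CC}_{h})$$
 where g=ρ(i), h=γ(j).
 $A^{R}_{i}$, $A^{C}_{j}$ are the average ratings of user $u_i$ and item $p_j$.

 $A^{COC}_{gh}$, $A^{RC}_{g}$ and $A^{CC}_{h}$ are the average ratings of the corresponding
co-cluster, user-cluster and item-cluster.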
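A prediction of this form (co-cluster average plus user and item offsets) might be computed as in the sketch below. The function names and the toy data are hypothetical, and falling back to 0.0 for an empty group is an assumption of this sketch, not something the slides specify.

```python
def summary_averages(A, W, rho, gamma, k, l):
    """Return the five kinds of averages over known ratings (W[i][j] == 1)."""
    m, n = len(A), len(A[0])
    # Each accumulator is [sum, count].
    row = [[0.0, 0] for _ in range(m)]
    col = [[0.0, 0] for _ in range(n)]
    rc = [[0.0, 0] for _ in range(k)]
    cc = [[0.0, 0] for _ in range(l)]
    coc = [[[0.0, 0] for _ in range(l)] for _ in range(k)]
    for i in range(m):
        for j in range(n):
            if W[i][j]:
                g, h = rho[i], gamma[j]
                for acc in (row[i], col[j], rc[g], cc[h], coc[g][h]):
                    acc[0] += A[i][j]
                    acc[1] += 1

    def avg(acc):
        # Assumption of this sketch: an empty group averages to 0.0.
        return acc[0] / acc[1] if acc[1] else 0.0

    return {
        "R": [avg(a) for a in row],
        "C": [avg(a) for a in col],
        "RC": [avg(a) for a in rc],
        "CC": [avg(a) for a in cc],
        "COC": [[avg(a) for a in r] for r in coc],
    }


def predict(i, j, S, rho, gamma):
    """Co-cluster average plus the user and item offsets from their clusters."""
    g, h = rho[i], gamma[j]
    return S["COC"][g][h] + (S["R"][i] - S["RC"][g]) + (S["C"][j] - S["CC"][h])


# Toy data (hypothetical): the rating of user 1 for item 1 is unknown.
A = [[5.0, 3.0], [4.0, 0.0]]
W = [[1, 1], [1, 0]]
rho, gamma = [0, 0], [0, 0]          # a single user cluster and item cluster
S = summary_averages(A, W, rho, gamma, k=1, l=1)
prediction = predict(1, 1, S, rho, gamma)   # 4 + (4 - 4) + (3 - 4) = 3.0
```

Because a prediction only reads five precomputed averages, its cost does not depend on the number of users or items.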


COLLABORATIVE FILTERING VIA
CO-CLUSTERING
 Static training (co-clustering): the goal is to minimize, over ρ and γ,
$$\sum_{i=1}^{m}\sum_{j=1}^{n} W_{ij}\,(A_{ij} - \hat{A}_{ij})^2$$
 The row and column assignment steps can be
implemented efficiently by pre-computing the invariant
parts of the update cost functions.
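 Required information:
 Row updating: assign user i to the user cluster g minimizing
$$\sum_{j=1}^{n} W_{ij}\,\bigl(A^{tmp}_{ij} - A^{COC}_{g\gamma(j)} + A^{RC}_{g}\bigr)^2, \quad A^{tmp}_{ij} = A_{ij} - A^{R}_{i} - A^{C}_{j} + A^{CC}_{\gamma(j)}$$

 Column updating: assign item j to the item cluster h minimizing
$$\sum_{i=1}^{m} W_{ij}\,\bigl(A^{tmp}_{ij} - A^{COC}_{\rho(i)h} + A^{CC}_{h}\bigr)^2, \quad A^{tmp}_{ij} = A_{ij} - A^{R}_{i} - A^{C}_{j} + A^{RC}_{\rho(i)}$$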
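One row-assignment pass could look like the sketch below, which evaluates the squared error of the prediction Â for every candidate user cluster and keeps the best one. For clarity it does not pre-compute the invariant parts. The function names are hypothetical, and the averages in `S` were worked out by hand for the toy data under the initial assignment ρ = [0, 0, 1, 0].

```python
def row_cost(i, g, A, W, gamma, S):
    """Squared error over the known ratings of user i if placed in cluster g."""
    cost = 0.0
    for j, known in enumerate(W[i]):
        if known:
            h = gamma[j]
            residual = (A[i][j] - S["COC"][g][h] - S["R"][i] + S["RC"][g]
                        - S["C"][j] + S["CC"][h])
            cost += residual * residual
    return cost


def reassign_rows(A, W, gamma, S, k):
    """Assign every user to the user cluster with the lowest cost."""
    return [min(range(k), key=lambda g: row_cost(i, g, A, W, gamma, S))
            for i in range(len(A))]


# Toy data (hypothetical): users 0-1 like item 0, users 2-3 like item 1.
A = [[5.0, 1.0], [5.0, 1.0], [1.0, 5.0], [1.0, 5.0]]
W = [[1, 1]] * 4
gamma = [0, 1]
# Averages computed by hand for rho = [0, 0, 1, 0] (user 3 is misplaced).
S = {
    "R": [3.0, 3.0, 3.0, 3.0],
    "C": [3.0, 3.0],
    "RC": [3.0, 3.0],
    "CC": [3.0, 3.0],
    "COC": [[11 / 3, 7 / 3], [1.0, 5.0]],
}
new_rho = reassign_rows(A, W, gamma, S, k=2)   # user 3 moves to cluster 1
```

A column pass is symmetric: swap the roles of users and items and minimize over h instead of g. In a full training loop the averages would be recomputed after each pass.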

STATIC TRAINING: CO-CLUSTERING


INCREMENTAL TRAINING


SCALABLE COLLABORATIVE
FILTERING SYSTEM
 The data objects use a distributed-memory representation, so
each of the processors P1 and P2 is in fact a cluster of
processors.
 P1 handles the prediction and incremental training.
 P2 is responsible for the static training.


PARALLEL CO-CLUSTERING


EXPERIMENTAL RESULTS
 Datasets and algorithm
 Movie-lens (100K): 100,000 ratings (1-5) from 943 users on
1682 movies.
 BookCrossing: 269392 ratings (1-10) from 470034 users on
133438 books.
 Movie1-Movie10: subsets containing 10-100% of the Movie-lens 100K ratings.

 80% training and 20% testing for all the datasets.
 Evaluation metrics: Mean Absolute Error (MAE)
 The experiments evaluated the effectiveness and efficiency in
terms of MAE and execution time.
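For reference, MAE is the mean of the absolute differences between true and predicted ratings on the held-out test pairs. The sketch below is a minimal illustration; the function name and the sample values are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired test ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical test ratings and predictions.
actual = [4, 3, 5, 1]
predicted = [3.5, 3.0, 4.0, 2.0]
mae = mean_absolute_error(actual, predicted)   # (0.5 + 0 + 1 + 1) / 4 = 0.625
```

Lower MAE means predictions are closer to the true ratings on average.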

MAE COMPARISON
 Mov1: movie-lens
 Mov2: BookCrossing

 Mov3: 10 subsets of movie-lens

K=3


VARIATION OF MAE WITH #
PARAMETERS
 # prediction parameters:
 COCLUST: (m+n+kl-k-l) values
 SVD, NNMF: (m+n)(k+l) values
 Movie3 dataset
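To make the two parameter counts concrete, the sketch below evaluates both formulas as stated on the slide. The function names are hypothetical, and the sizes are only illustrative: m=943 and n=1682 are the Movie-lens 100K dimensions, with k=l=3 assumed.

```python
def coclust_params(m, n, k, l):
    """Number of prediction parameters for COCLUST: m + n + kl - k - l."""
    return m + n + k * l - k - l


def factorization_params(m, n, k, l):
    """Number of prediction parameters for SVD/NNMF as given: (m + n)(k + l)."""
    return (m + n) * (k + l)


# Illustrative sizes: Movie-lens 100K dimensions, k = l = 3 assumed.
m, n, k, l = 943, 1682, 3, 3
n_coclust = coclust_params(m, n, k, l)        # 2625 + 9 - 6 = 2628
n_factor = factorization_params(m, n, k, l)   # 2625 * 6 = 15750
```

Under these assumed sizes the co-clustering model stores roughly one sixth as many values as the factorization models.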


EFFICIENCY
 Time needed for prediction on each given test pair
of Movie-lens.

 Training time (co-clustering) vs. Data size
 Movie-lens dataset
 Experimental devices
 AMD 1.4 GHz on 128 compute
nodes with 384 MB RAM


TRAINING TIME VS. # OF
PROCESSORS
 Movie-lens dataset
 Experimental devices
 AMD 1.4 GHz on different # of processors with 384 MB RAM


CONCLUSION
 Recommender systems are proving to be extremely useful
for a number of online activities such as e-commerce.

 In the dynamic scenario, both efficiency and
effectiveness must be addressed.
 New users, items and ratings enter the system at a rapid rate.

 This paper proposed a new dynamic CF approach based
on co-clustering.

 Empirical results indicate high-quality predictions at
a much lower computational cost.
