Clustering Techniques for Collaborative Filtering and the Application to Venue Recommendation
Manh Cuong Pham, Yiwei Cao, Ralf Klamma
Information Systems and Database Technology, RWTH Aachen, Germany
I-KNOW 2010, Graz, Austria, September 01, 2010
Agenda
  • Introduction
  • Clustering techniques for collaborative filtering
  • Case study: venue recommendation
      • Data sets: DBLP and CiteSeerX
      • User-based
      • Item-based
  • Conclusions and Outlook
Introduction
  • Recommender systems: help users deal with information overload
  • Components of a recommender system [Burke 2002]
      • Set of users, set of items (products)
      • Implicit/explicit user ratings on items
      • Additional information: trust, collaboration, etc.
      • Algorithms for generating recommendations
  • Recommendation techniques [Adomavicius and Tuzhilin 2005]
      • Collaborative Filtering (CF) [Breese et al. 1998]
          • Memory-based algorithms: user-based, item-based [Sarwar 2001]
          • Model-based algorithms: Bayesian network [Breese 1998]; clustering [Ungar 1998]; rule-based [Sarwar 2000]; machine learning on graphs [Zhou 2005, 2008]; PLSA [Hofmann 1999]; matrix factorization [Koren 2009]
      • Content-based recommendation [Sarwar et al. 2001]
      • Hybrid approaches [Burke 2002]
Clustering and Collaborative Filtering
  • [Figure: rating matrix with user clustering and item clustering; labels: Cluster 1, Cluster 2, item-based CF]
  • Problems: large-scale data; sparse rating matrix; diversity of users and items
  • Previous approaches: clustering based on ratings
      • K-means, Metis, etc. [Rashid 2006, Xue 2005, O’Connor 2001]
  • Our approach
      • Clustering based on additional information: relationships between users, items
      • Improvement in both efficiency and accuracy
Evaluation: Venue Recommendation
  • Recommend venues (conferences, journals, workshops) to researchers
  • User-based CF
      • Populate the user-item matrix using venue participation history
      • Ratings: normalized venue publication counts
      • User clustering: co-authorship network
  • Item-based CF
      • Similarity between venues based on citation
      • Similarity measure: cosine
      • Venue clustering: similarity network
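The user-based setup above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The slide says only that ratings are "normalized" publication counts, so dividing by each author's total paper count is an assumption, as is using cosine similarity between authors. The names `build_ratings`, `recommend`, and `cluster_of` are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def build_ratings(publications):
    """publications: iterable of (author, venue) pairs, one per paper.

    Returns {author: {venue: rating}}. A rating is the share of the author's
    papers published in that venue (an assumed normalization).
    """
    counts = defaultdict(Counter)
    for author, venue in publications:
        counts[author][venue] += 1
    ratings = {}
    for author, venue_counts in counts.items():
        total = sum(venue_counts.values())
        ratings[author] = {v: c / total for v, c in venue_counts.items()}
    return ratings


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(x * x for x in a.values()))
    norm_b = sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(author, ratings, cluster_of, top_n=3):
    """Score unseen venues using only neighbors from the author's own cluster."""
    seen = ratings[author]
    scores = defaultdict(float)
    for other, other_ratings in ratings.items():
        if other == author or cluster_of[other] != cluster_of[author]:
            continue  # restrict the neighborhood to the co-authorship cluster
        sim = cosine(seen, other_ratings)
        for venue, rating in other_ratings.items():
            if venue not in seen:
                scores[venue] += sim * rating
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [venue for venue, score in ranked[:top_n] if score > 0]
```

Restricting the neighbor loop to one cluster is where the efficiency gain would come from: similarities are computed only against cluster members rather than against every author.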
Data Sets
  • DBLP (http://www.informatik.uni-trier.de/~ley/db/)
      • 788,259 author names
      • 1,226,412 publications
      • 3,490 venues (conferences, workshops, journals)
  • CiteSeerX (http://citeseerx.ist.psu.edu/)
      • 7,385,652 publications (including publications in reference lists)
      • 22,735,240 citations
      • Over 4 million author names
  • Combination
      • Canopy clustering [McCallum 2000]
      • Result: 864,097 matched pairs
      • On average, a venue cites 2,306 times and is cited 2,037 times
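A rough sketch of how canopy clustering can limit the number of record comparisons when matching two bibliographic collections. The slide does not say which fields or distance were used, so the title-token Jaccard distance, both threshold values, and the function names here are assumptions for illustration only.

```python
from itertools import combinations


def tokens(title):
    """Lower-cased word set used by the cheap distance measure."""
    return frozenset(title.lower().split())


def jaccard_distance(a, b):
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def canopies(titles, loose=0.7, tight=0.4):
    """Group title indices into possibly overlapping canopies.

    loose and tight are distance thresholds with tight <= loose. Every title
    within `loose` of a center joins its canopy; titles within `tight` can no
    longer become centers themselves.
    """
    if tight > loose:
        raise ValueError("tight threshold must not exceed loose threshold")
    sets = [tokens(t) for t in titles]
    remaining = list(range(len(titles)))
    result = []
    while remaining:
        center = remaining[0]
        dist = [jaccard_distance(sets[center], s) for s in sets]
        result.append([i for i, d in enumerate(dist) if d <= loose])
        remaining = [i for i in remaining if dist[i] > tight]
    return result


def candidate_pairs(canopy_list):
    """Only pairs sharing a canopy are passed on to an expensive comparison."""
    pairs = set()
    for canopy in canopy_list:
        pairs.update(combinations(canopy, 2))
    return pairs
```

With this scheme, the costly exact match runs only on the pairs returned by `candidate_pairs` instead of on all pairs of records.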
User-based CF: Author Clustering
  • Data: DBLP
      • Two test cases, for the years 2005 and 2006
      • Clustering of co-authorship networks
          • 2005 network: 478,108 nodes; 1,427,196 edges
          • 2006 network: 544,601 nodes; 1,686,867 edges
      • Prediction of venue participation
  • Clustering algorithm
      • Density-based algorithm [Clauset 2004]
      • Obtained modularity: 0.829 and 0.82
      • Cluster size distribution follows a power law
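The modularity values reported above measure how much denser the clusters are internally than a random graph with the same degrees would be. As a hedged sketch, the standard Newman-Girvan modularity of a given partition can be computed as below. The clustering algorithm itself is not reproduced, and the function name and input layout are assumptions.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each edge listed once, no self-loops.
    community_of: dict mapping every node to a community label.
    Q = sum over communities c of  e_c / m - (d_c / (2 m)) ** 2,
    where e_c counts edges inside c, d_c sums the degrees of nodes in c,
    and m is the total number of edges.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For example, two triangles joined by a single edge and split into their two natural groups give Q = 5/14, while putting all nodes into one community gives Q = 0.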
User-based CF: Performance
  • Precision for 1,000 randomly chosen authors
  • Precision computed at the 11 standard recall levels: 0%, 10%, ..., 100%
  • Results
      • Clustering performs better
      • Improvement is not significant
      • Better efficiency
  • Further improvement
      • Different networks: citation
      • Overlapping clustering
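Precision at the 11 standard recall levels is usually computed with interpolation: the precision at recall level r is the highest precision observed at any recall of at least r. A minimal sketch for a single author's ranked recommendation list follows. The function name is hypothetical, and the ranked list is assumed to contain no duplicates.

```python
def eleven_point_precision(ranked, relevant):
    """Interpolated precision at recall 0.0, 0.1, ..., 1.0 for one ranked list."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("need at least one relevant item")
    points = []  # (recall, precision) measured after each relevant hit
    hits = 0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    levels = [i / 10 for i in range(11)]
    # Small tolerance guards against floating-point error in the comparison.
    return [
        max((p for r, p in points if r >= level - 1e-12), default=0.0)
        for level in levels
    ]
```

Averaging these 11-value curves over all sampled authors would give one precision-recall curve per method, which is presumably how the clustered and non-clustered variants were compared.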
Item-based CF: Venue Network Creation and Clustering
  • Knowledge network
      • Aggregate bibliographic coupling counts at venue level
      • Undirected graph G(V, E), where V: venues, E: edges weighted by cosine similarity
      • Threshold:
      • Clustering: density-based algorithm [Newman 2004, Clauset 2004]
      • Network visualization: force-directed paradigm [Fruchterman 1991]
  • Knowledge flow network (for venue ranking, see Pham & Klamma 2010)
      • Aggregate bibliographic coupling counts at venue level
      • Threshold: citation counts >= 50
  • Domains from Microsoft Academic Search (http://academic.research.microsoft.com/)
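A sketch of how the knowledge network might be built from venue-level reference counts. Each venue is represented by a vector of how often it cites each item, two venues are coupled when they cite the same items, and an edge is kept when the cosine similarity reaches a threshold. The slide's threshold value did not survive, so `min_similarity` is a hypothetical parameter, as are the function names and the input layout.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(x * x for x in a.values()))
    norm_b = sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def venue_graph(references, min_similarity):
    """references: {venue: {cited_item: citation_count}}.

    Returns {(venue_a, venue_b): weight} for venue pairs whose cosine
    similarity is at least min_similarity. Keys are sorted name pairs, so
    each undirected edge appears exactly once.
    """
    edges = {}
    for a, b in combinations(sorted(references), 2):
        weight = cosine(references[a], references[b])
        if weight >= min_similarity:
            edges[(a, b)] = weight
    return edges
```

The all-pairs loop is quadratic in the number of venues, which is manageable for a few thousand venues but would need an inverted index on cited items for much larger collections.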
Knowledge Network: the Visualization
Knowledge Network: Clustering
Interdisciplinary Venues: Top Betweenness Centrality
High Prestige Series: Top PageRank
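For reference, PageRank on a citation-style network can be sketched with plain power iteration. This is a generic textbook version, not the ranking procedure of Pham & Klamma 2010. The damping factor 0.85, the fixed iteration count, the unweighted edges, and the function name are all assumptions.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """out_links: {node: [nodes it cites]}. Returns {node: score}; scores sum to 1."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for node, targets in out_links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

A venue that is cited by many venues, or by a few highly ranked ones, ends up with a high score, which matches the "high prestige" reading on the slide.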
Conclusions and Future Research
  • Clustering and recommender systems
      • Advantage of using additional information for clustering
      • Application of clustering to both user-based and item-based CF
      • Key issue: impact of the communities (clusters) on the quality of recommendations; non-overlapping vs. overlapping communities
  • Outlook
      • Further evaluation: trust network clustering, paper and potential collaborator recommendation
      • Datasets: Epinions, Last.fm, etc.
      • Digital libraries in Web 2.0: Mendeley, ResearchGate, etc.




Editor's Notes

  • #3: Pham Manh Cuong