Collaborative Filtering Tayfun Şen 18 December 2006 You can reach the author at: stayfun{at}metu.edu.tr
What is the problem? In a nutshell: life is too short! We don't have time to watch all the movies, listen to all the music or read every book.
Overwhelming quantity of information on the web We all ask our friends for recommendations. We read newspapers and web sites, and watch TV, to form our own opinions. We want to be sure that the activities we spend our time on are worthwhile, so we take into consideration the recommendations made by people we trust.
Time Person of the Year 2006: You? Yes, you. You control the Information Age. Welcome to your world.
From Time (25 Dec. 2006 edition) “It's a story about community and collaboration on a scale never seen before. It's about the cosmic compendium of knowledge Wikipedia and the million-channel people's network YouTube and the online metropolis MySpace. It's about the many wresting power from the few and helping one another for nothing and how that will not only change the world, but also change the way the world changes.”
Futurism Semantic Web? In his seminal article in Scientific American [1], Tim Berners-Lee, the creator of the WWW, describes the Semantic Web. Adding meaning to the Internet looks like a groundbreaking idea, but when will it be implemented? It needs standard ontologies, mappings between them and some acceptance by the web community: maybe 10-15 years? Collaborative filtering saves the day.
Recommendation on the Internet There are basically two types of filtering techniques in use on the Internet today: content based filtering and collaborative filtering.
Examples on the Internet Netflix, Amazon, Pandora.com, Last.fm ... It is natural for Web 2.0 too: Digg, Flickr, StumbleUpon etc. All these websites rely on their users' interaction to generate content relevant to each user. That's what Web 2.0 means: user interaction.
Content based algorithms These rely on data describing the items themselves. For example, on a movie recommendation site this could be the director, movie length, PG rating, cast etc. For song recommendation it could be the song's date, other albums/songs by the same group, or the type of song (jazz, classical, rock etc.). This item data is used to generate recommendations. For example: you see that a user has rated Brad Pitt movies highly, so you recommend her Babel.
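A minimal sketch of the content based idea, assuming each movie is described by a hypothetical set of attributes (cast plus one genre label) and that a rating of 4 or more means "liked". The user's profile counts the attributes of liked movies, and each unrated movie is scored by how many of those attributes it shares. The movie data and the `like_threshold` parameter are illustrative assumptions, not the author's method.

```python
from collections import Counter

# Hypothetical item metadata: each movie maps to a set of attributes
# (cast members and one genre label), chosen purely for illustration.
MOVIES = {
    "Babel": {"Brad Pitt", "Cate Blanchett", "drama"},
    "Fight Club": {"Brad Pitt", "Edward Norton", "drama"},
    "Troy": {"Brad Pitt", "Eric Bana", "action"},
    "Heat": {"Al Pacino", "Robert De Niro", "crime"},
    "The Godfather": {"Al Pacino", "Marlon Brando", "crime"},
}


def build_profile(ratings, movies, like_threshold=4):
    """Count how often each attribute appears in the movies the user liked."""
    profile = Counter()
    for title, rating in ratings.items():
        if rating >= like_threshold:
            profile.update(movies[title])
    return profile


def recommend(ratings, movies, like_threshold=4):
    """Rank unrated movies by how many liked attributes they share."""
    profile = build_profile(ratings, movies, like_threshold)
    scores = {
        title: sum(profile[attr] for attr in attrs)
        for title, attrs in movies.items()
        if title not in ratings
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda title: (-scores[title], title))


user = {"Fight Club": 5, "Troy": 4, "Heat": 2}
print(recommend(user, MOVIES))  # ['Babel', 'The Godfather']
```

Babel comes first because it shares "Brad Pitt" (liked twice) and "drama" with the user's liked movies; note that no other user's data is involved.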
Collaborative Filtering algorithms In CF, it is a little different: other users have an impact on the recommendations, so users implicitly generate recommendations for each other. Users similar to the active user (the user for whom recommendations are being prepared) are found, and by weighting these users, a recommendation list is prepared from their data.
CF Example Suppose it is found that many users who like Ayumi Hamasaki songs also like Ai Otsuka songs. In this case, if the active user likes Ayumi Hamasaki but does not know about Ai Otsuka, then Ai Otsuka is recommended to her.
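The example above can be sketched with simple co-occurrence counting, assuming hypothetical binary "likes" data. For each artist the active user does not know, the code measures what fraction of the fans of her artists also like it, and recommends artists above an assumed cutoff (`min_rate`). This is a simplified association-style illustration, not the neighborhood algorithm described later in the slides.

```python
# Hypothetical "likes" data: each user maps to the set of artists she likes.
LIKES = {
    "u1": {"Ayumi Hamasaki", "Ai Otsuka"},
    "u2": {"Ayumi Hamasaki", "Ai Otsuka", "Utada Hikaru"},
    "u3": {"Ayumi Hamasaki", "Ai Otsuka"},
    "u4": {"Ayumi Hamasaki"},
    "u5": {"Utada Hikaru"},
}


def conditional_like_rate(likes, a, b):
    """Fraction of the users who like artist a that also like artist b."""
    fans_of_a = [user for user, liked in likes.items() if a in liked]
    if not fans_of_a:
        return 0.0
    return sum(1 for user in fans_of_a if b in likes[user]) / len(fans_of_a)


def recommend_artists(likes, active_likes, min_rate=0.5):
    """Recommend unknown artists that fans of the active user's artists like."""
    if not active_likes:
        return []
    all_artists = set().union(*likes.values())
    scores = {}
    for b in all_artists - active_likes:
        # Use the strongest co-occurrence with any artist she already likes.
        rate = max(conditional_like_rate(likes, a, b) for a in active_likes)
        if rate >= min_rate:
            scores[b] = rate
    return sorted(scores, key=lambda b: (-scores[b], b))


print(conditional_like_rate(LIKES, "Ayumi Hamasaki", "Ai Otsuka"))  # 0.75
print(recommend_artists(LIKES, {"Ayumi Hamasaki"}))  # ['Ai Otsuka']
```

Three of the four Ayumi Hamasaki fans also like Ai Otsuka, so she is recommended; Utada Hikaru (1 of 4) falls below the cutoff.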
CF Example (continued) In the movie domain there is a user-movie-rating table, and it is very sparse: most users have rated only a small fraction of the movies, so for most (user, movie) pairs no rating exists.
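One common way to hold such a sparse table, sketched here with hypothetical data, is a dictionary of dictionaries that stores only the observed ratings. An empty cell is simply a missing key, and the table's density is the fraction of cells that hold a rating. The toy table is far denser than real rating data.

```python
# A sparse user-movie-rating table: only observed ratings are stored,
# so a missing (user, movie) cell is simply absent. Data is hypothetical.
RATINGS = {
    "alice": {"Babel": 5, "Heat": 3},
    "bob": {"Heat": 4},
    "carol": {"Babel": 2, "Troy": 4, "Heat": 5},
}


def get_rating(ratings, user, movie):
    """Return the rating, or None when the cell of the table is empty."""
    return ratings.get(user, {}).get(movie)


def density(ratings):
    """Fraction of the user x movie cells that actually hold a rating."""
    movies = {movie for row in ratings.values() for movie in row}
    observed = sum(len(row) for row in ratings.values())
    return observed / (len(ratings) * len(movies))


print(get_rating(RATINGS, "bob", "Babel"))  # None
print(density(RATINGS))  # 0.6666666666666666
```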
CF Algorithms Two types of algorithms exist for CF: model based algorithms and memory based algorithms. In model based algorithms, you create a model of the domain; most of the work is done offline. In memory based algorithms, you use the whole database when creating recommendations; most of the work is done online.
Model based CF Model based algorithms are efficient (fast when recommending) and quite accurate (their predictions are good). But they rely on long offline computations, so they are harder to maintain and update. On the Internet, new users need to be added all the time, which is a setback for model based algorithms. An example is Bayesian Networks:
Bayesian Networks
Memory based CF Many memory based CF algorithms exist; the best known is the neighborhood based algorithm described by Herlocker et al. [4]. In neighborhood based algorithms, the users most similar to the active user are selected as that user's neighborhood. After the neighborhood is found, predictions are made using a weighted sum of the ratings given by those neighboring users.
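A minimal sketch of the weighted-sum prediction step, assuming the neighbors and their similarity weights are already known. The `(weight, ratings)` pair layout is a hypothetical choice, and the sketch normalizes by the total weight and ignores non-positive weights to keep the result on the rating scale; the z-score variant mentioned in the slides is not shown here.

```python
def predict_weighted_sum(neighbors, item):
    """Weighted average of the neighbors' ratings for `item`.

    `neighbors` is a list of (weight, ratings) pairs, where `weight` is the
    neighbor's similarity to the active user and `ratings` maps item -> rating.
    Only positively weighted neighbors who rated the item contribute.
    """
    numerator = 0.0
    total_weight = 0.0
    for weight, ratings in neighbors:
        if weight > 0 and item in ratings:
            numerator += weight * ratings[item]
            total_weight += weight
    # None signals that no neighbor can support a prediction for this item.
    return numerator / total_weight if total_weight else None


neighbors = [(0.9, {"Babel": 5}), (0.3, {"Babel": 1}), (0.5, {"Heat": 4})]
print(predict_weighted_sum(neighbors, "Babel"))  # approximately 4.0
print(predict_weighted_sum(neighbors, "Troy"))  # None
```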
Neighborhood based algorithms Several correlation measures can be used to find the neighbors. One such measure is Pearson's correlation coefficient.
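Pearson's correlation between two users can be sketched as follows, assuming both are compared only on the items they have both rated, with means taken over those co-rated items. Dividing the covariance sum by the square root of the two squared-deviation sums is algebraically the same as the usual form with standard deviations. Returning 0.0 for fewer than two common items or for constant raters is an assumption of this sketch.

```python
from math import sqrt


def pearson(ratings_a, ratings_u):
    """Pearson correlation between two users over the items both have rated.

    Means and spreads are computed over the co-rated items only; users with
    fewer than two common items, or with constant ratings on them, get 0.0.
    """
    common = ratings_a.keys() & ratings_u.keys()
    m = len(common)
    if m < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / m
    mean_u = sum(ratings_u[i] for i in common) / m
    cov = sum((ratings_a[i] - mean_a) * (ratings_u[i] - mean_u) for i in common)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    var_u = sum((ratings_u[i] - mean_u) ** 2 for i in common)
    if var_a == 0 or var_u == 0:
        return 0.0
    return cov / sqrt(var_a * var_u)


alice = {"Babel": 1, "Heat": 2, "Troy": 3}
print(pearson(alice, {"Babel": 2, "Heat": 4, "Troy": 6, "Jaws": 1}))  # 1.0
print(pearson(alice, {"Babel": 6, "Heat": 4, "Troy": 2}))  # -1.0
```

The second user in the first call also rated "Jaws", but that item is ignored because Alice has not rated it.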
Neighborhood based algorithms   is the standard deviation, a is the subscript for active user, u for the user considered as neighbor. After the similarity weights are found, one needs to select the most similar users and generate a prediction. The neighborhood used in prediction can be selected in many ways: Top-n method Thresholding method
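Neighborhood based algorithms After selecting the neighbors to be considered, you weight these users and generate a prediction. Z-scores are used to normalize the ratings: each neighbor's rating is expressed as its deviation from that neighbor's mean, divided by that neighbor's standard deviation, so that users who rate on different scales become comparable.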
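A sketch of z-score based prediction, assuming the form $p_{a,i} = \bar{r}_a + \sigma_a \frac{\sum_u w_{a,u} (r_{u,i} - \bar{r}_u)/\sigma_u}{\sum_u |w_{a,u}|}$: neighbors' z-scores for the item are averaged with the similarity weights and mapped back onto the active user's scale. Dividing by the sum of absolute weights, using population standard deviations over all of each user's ratings, and falling back to the active user's mean are choices made for this sketch.

```python
from statistics import mean, pstdev


def zscore_predict(active, neighbors, item):
    """Predict the active user's rating for `item` from neighbors' z-scores.

    `active` maps item -> rating for the active user (at least one rating);
    `neighbors` is a list of (weight, ratings) pairs. Each neighbor's rating
    becomes a z-score using that neighbor's own mean and population standard
    deviation; the weighted z-score is mapped back onto the active user's scale.
    """
    active_values = list(active.values())
    mean_a, sd_a = mean(active_values), pstdev(active_values)
    numerator = 0.0
    total_weight = 0.0
    for weight, ratings in neighbors:
        if item not in ratings:
            continue
        values = list(ratings.values())
        sd_u = pstdev(values)
        if sd_u == 0:
            continue  # constant rater: z-score is undefined
        numerator += weight * (ratings[item] - mean(values)) / sd_u
        total_weight += abs(weight)
    if total_weight == 0:
        return mean_a  # fall back to the active user's mean rating
    return mean_a + sd_a * numerator / total_weight


active = {"A": 4, "B": 2}
neighbors = [(1.0, {"A": 5, "X": 3}), (0.5, {"B": 1, "X": 5})]
print(round(zscore_predict(active, neighbors, "X"), 3))  # 2.667
```

The first neighbor rated X one standard deviation below her mean and the second one above; the heavier weight on the first pulls the prediction below the active user's mean of 3.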
Cluster based algorithms The naive neighborhood based algorithm is computationally expensive: it is O(mn), where m and n are the numbers of items and users respectively, since the active user is compared with every other user over the items. In the clustering approach, with a constant number of clusters, the complexity is O(m). It is also easier to compute predictions for new users. Details are given next.
Cluster based algorithms Users are members of clusters. Clusters can be formed using many different algorithms, described in detail in the paper by Jain et al., Data Clustering: A Review [7]. The goal is to group similar users together and use these clusters when choosing the neighborhood of the active user. The approach is very efficient, scalable and easy to update. If the number of clusters equals n (one user per cluster), it degrades into the neighborhood based algorithm. There are accuracy considerations, however.
Cluster based algorithms If you choose a small number of clusters, your predictions get worse: there is a trade-off between speed and accuracy. The number and size of the clusters are best determined empirically.
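A minimal sketch of cluster based CF, assuming plain k-means as the clustering algorithm (one option among the many surveyed in [7]) and filling a user's missing ratings with her own mean before clustering. Centroids are seeded with the first k users for determinism, and a prediction averages the item's ratings from the user's cluster mates. All data, names and the imputation rule are hypothetical choices for illustration.

```python
def to_vector(ratings, items):
    """Dense rating vector; missing cells filled with the user's mean (an assumption)."""
    user_mean = sum(ratings.values()) / len(ratings)
    return [ratings.get(item, user_mean) for item in items]


def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans(vectors, k, iterations=10):
    """Plain k-means; the first k vectors seed the centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_predict(ratings, cluster_of, user, item):
    """Average the item's ratings from the user's cluster mates who rated it."""
    peers = [
        r[item]
        for other, r in ratings.items()
        if other != user and cluster_of[other] == cluster_of[user] and item in r
    ]
    if peers:
        return sum(peers) / len(peers)
    return sum(ratings[user].values()) / len(ratings[user])  # fallback: user's mean


RATINGS = {
    "u1": {"A": 5, "B": 5, "C": 1},
    "u2": {"A": 1, "B": 1, "C": 5, "D": 5},
    "u3": {"A": 5, "B": 4, "D": 1},
    "u4": {"A": 2, "C": 4, "D": 4},
}
users = list(RATINGS)
items = sorted({i for r in RATINGS.values() for i in r})
labels = kmeans([to_vector(RATINGS[u], items) for u in users], k=2)
cluster_of = dict(zip(users, labels))
print(cluster_predict(RATINGS, cluster_of, "u3", "C"))  # 1.0
```

Once the clusters exist, a prediction only scans the active user's cluster rather than every user; raising `k` toward the number of users brings the method back toward the plain neighborhood algorithm.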
CF Metrics The two main metrics for CF algorithms are accuracy and complexity. For accuracy, the Mean Absolute Error (MAE) is frequently used: the absolute differences between predicted and actual ratings are averaged. For complexity, one can use big-O notation. Other qualities of the predictions are also important: coverage, novelty and serendipity, confidence, and user feedback.
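MAE can be computed as below, assuming predictions and held-out actual ratings are both stored as hypothetical `(user, item) -> rating` dictionaries. Pairs the system could not predict are skipped in this sketch, so the MAE figure says nothing about how many predictions were actually made; that is what coverage measures.

```python
def mean_absolute_error(predicted, actual):
    """MAE over the (user, item) pairs for which a prediction exists.

    `predicted` and `actual` map (user, item) -> rating; pairs the system could
    not predict (missing or None) are skipped.
    """
    errors = [
        abs(predicted[key] - rating)
        for key, rating in actual.items()
        if predicted.get(key) is not None
    ]
    if not errors:
        raise ValueError("no predictions to evaluate")
    return sum(errors) / len(errors)


actual = {("alice", "Babel"): 5, ("alice", "Heat"): 3, ("bob", "Troy"): 4}
predicted = {("alice", "Babel"): 4.5, ("alice", "Heat"): 4.0, ("bob", "Troy"): None}
print(mean_absolute_error(predicted, actual))  # 0.75
```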
CF Metrics Coverage is the percentage of movies for which the system is able to make a prediction. Novelty and serendipity refer to how new, and how pleasantly unexpected, the recommendations are to the user. Confidence expresses how confident the system is when making a recommendation. User feedback is important for fine tuning the system, so it should be used as well.
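Coverage can be sketched as the percentage of `(user, item)` requests for which a predictor returns a value rather than `None`. The toy predictor `mean_of_others` (the item's mean rating among other users) and the data are hypothetical; any predictor with the same calling convention would do.

```python
def coverage(predict, requests):
    """Percentage of (user, item) requests for which `predict` returns a value."""
    if not requests:
        return 0.0
    made = sum(1 for user, item in requests if predict(user, item) is not None)
    return 100.0 * made / len(requests)


RATINGS = {
    "alice": {"Babel": 5},
    "bob": {"Heat": 4},
    "carol": {"Troy": 4, "Babel": 2},
}


def mean_of_others(user, item):
    """Hypothetical toy predictor: mean rating of the item by the other users."""
    others = [r[item] for u, r in RATINGS.items() if u != user and item in r]
    return sum(others) / len(others) if others else None


requests = [("alice", "Troy"), ("bob", "Babel"), ("bob", "Jaws")]
print(coverage(mean_of_others, requests))  # about 66.67
```

Nobody has rated "Jaws", so that request cannot be served and coverage is two out of three requests.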
Conclusion CF is already in use on the Internet, although its history dates back only several years. It still has potential for development, and it offers great improvements to user enjoyment. Thanks for your attention. Any questions?
References [1] T. Berners-Lee, J. Hendler, O. Lassila. The Semantic Web. Scientific American, May 2001: http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21 [2] For more information about Web 2.0, see the Wikipedia article at: http://en.wikipedia.org/wiki/Web_2.0 [3] Jon Herlocker, Joseph A. Konstan, John Riedl. An empirical analysis of design choices in neighborhood-based collaborative filtering algorithms. Information Retrieval, 2002. [4] Jon Herlocker, Joseph A. Konstan, Al Borchers, John Riedl. An algorithmic framework for performing collaborative filtering. SIGIR'99. [5] K. Goldberg, T. Roeder, D. Gupta, C. Perkins. Eigentaste: A constant time collaborative filtering algorithm. [6] Al Mamunur Rashid, Shyong K. Lam, George Karypis, John Riedl. ClustKNN: A highly scalable hybrid model- & memory-based CF algorithm. WEBKDD'06, 2006. [7] A. K. Jain, M. N. Murty, P. J. Flynn. Data clustering: a review. ACM Computing Surveys, 1999.