Collaborative Filtering Algorithms: Getting Started
Vivek A. Ganesan
vivganes@gmail.com
Big Data Gods Meetup, Santa Clara, CA, May 13, 2013
Before we start
Copyright 2013, Vivek A. Ganesan, All rights reserved
o A BIG thank you to our sponsors –
Big Data Cloud
o Meeting Space
o Support
o Check out their big data training
Introduction
o Program Outline
o This is an opt-in program, and it is FREE! (as in beer)
o We do social coding (which means you share your
code as open source under the Apache v2 license)
o Program duration = 1 month, weekly sprints
o Weekly meetup (topical + social coding + Q/A)
o A weekend hackathon (Sat. afternoon) on alternate
weeks (deep technical immersion)
o Demo at the end of the program
Agenda
o Introduction to CF Algorithms
o When to use CF?
o Metrics
o Exercise
o Questions?
Introduction to CF Algorithms
o A family of algorithms used to predict
o the preference of a user for an item, given
o a matrix of user preferences for items, where
o preferences must be expressed numerically (e.g.
user ratings of items on a 1 to 5 integer scale)
o Collaborative because it looks only at user
preferences and does not take into account user or
item attributes
o Filtering is math-speak for selecting a subset
CF : Common sense version
o Out of a large group of users who have rated
items :
o Pick a “small” subset of users who are “similar” to
you
o Now, for an item that you have not yet rated but your
“similar” users have rated :
o Figure out an “average” rating for the item from your
“similar” group of users
o Weigh it with your rating history and predict a rating
CF : Visual
User/Movie                     Sleepless in Seattle   Titanic   Terminator 2
Alice                          5                      5         3
Bob                            1                      3         5
Chandra                        3                      5         4
Dawood                         2                      3         5
Eduardo (you or active user)   2                      4         ?
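
One way to hold this table in code is a dictionary of dictionaries, where a missing key stands for "not yet rated". This is only a sketch of one possible layout; the names `ratings` and `unrated_items` are made up for illustration and are not from the talk.

```python
# Ratings copied from the table above.
# Eduardo has no entry for "Terminator 2": that is the rating to predict.
ratings = {
    "Alice":   {"Sleepless in Seattle": 5, "Titanic": 5, "Terminator 2": 3},
    "Bob":     {"Sleepless in Seattle": 1, "Titanic": 3, "Terminator 2": 5},
    "Chandra": {"Sleepless in Seattle": 3, "Titanic": 5, "Terminator 2": 4},
    "Dawood":  {"Sleepless in Seattle": 2, "Titanic": 3, "Terminator 2": 5},
    "Eduardo": {"Sleepless in Seattle": 2, "Titanic": 4},
}


def unrated_items(ratings, user):
    """Return items rated by someone in the data set but not by `user`."""
    all_items = {item for prefs in ratings.values() for item in prefs}
    return sorted(all_items - set(ratings[user]))


print(unrated_items(ratings, "Eduardo"))  # ['Terminator 2']
```

A sparse layout like this only stores ratings that exist, which matters once the matrix has far more empty cells than filled ones.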
A sample approach
o Compute Eduardo’s “similarity” to all other
users
o Pick the three users “most similar” to Eduardo
o Weigh their ratings for Terminator 2 by their
degree of similarity to Eduardo
o Make sure that the predicted rating is within
the given scale (0 to 5)
o … and predict Eduardo’s rating for Terminator 2
Step 1 : Measuring Similarity
o Start with a distance metric
o There are several; let's pick Euclidean as an example
o In n-dimensional space: the square root of the sum of
squared differences
o Convert it to a similarity score (0 to 1)
o 1/(1 + Euclidean distance) (adding 1 avoids
division by zero)
o Approaches 0 for no match, equals 1 for a perfect match
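
The distance and similarity steps can be sketched in a few lines of Python. This assumes each user's ratings are a dictionary keyed by item and that the distance is taken only over items both users have rated; the function names are illustrative, and returning 0.0 when there is no overlap is a choice made here, not something the slides specify.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance over the items that both users have rated."""
    shared = set(a) & set(b)
    return math.sqrt(sum((a[item] - b[item]) ** 2 for item in shared))


def similarity(a, b):
    """Map distance to a score in (0, 1]; 1.0 means identical ratings."""
    if not set(a) & set(b):
        return 0.0  # assumption: no shared items means no evidence of similarity
    return 1.0 / (1.0 + euclidean_distance(a, b))


eduardo = {"Sleepless in Seattle": 2, "Titanic": 4}
alice = {"Sleepless in Seattle": 5, "Titanic": 5, "Terminator 2": 3}
dawood = {"Sleepless in Seattle": 2, "Titanic": 3, "Terminator 2": 5}

print(round(euclidean_distance(eduardo, alice), 2))  # 3.16
print(round(similarity(eduardo, alice), 2))          # 0.24
print(similarity(eduardo, dawood))                   # 0.5
```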
CF : Distances & Similarities
User      Distance to Eduardo   Similarity to Eduardo
Alice     3.16                  0.24
Bob       1.414                 0.414
Chandra   1.414                 0.414
Dawood    1                     0.5
• Pick the top three users most similar to Eduardo :
• Dawood, Bob and Chandra
• Weigh their ratings for Terminator 2 by their
degree of similarity to Eduardo :
• (0.414 x 5) + (0.414 x 4) + (0.5 x 5) = 6.226
• Oops, too big a rating (0 to 5 scale)!
• Divide by sum of similarities (0.414 + 0.414 + 0.5)
• Answer : 6.226/1.328 = 4.688 (our prediction)
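
The arithmetic above is a similarity-weighted average. A minimal sketch, using the rounded similarities from the slide as inputs; the function name, the tuple layout, and the final clamp to the 0 to 5 scale are assumptions made for illustration.

```python
# (neighbor, similarity to Eduardo, neighbor's rating for Terminator 2)
neighbors = [
    ("Dawood", 0.5, 5),
    ("Bob", 0.414, 5),
    ("Chandra", 0.414, 4),
]


def weighted_prediction(neighbors, lo=0.0, hi=5.0):
    """Similarity-weighted average of neighbor ratings, clamped to [lo, hi]."""
    weighted_sum = sum(sim * rating for _, sim, rating in neighbors)
    total_sim = sum(sim for _, sim, _ in neighbors)
    if total_sim == 0:
        return None  # no usable neighbors, so no prediction
    return max(lo, min(hi, weighted_sum / total_sim))


print(round(weighted_prediction(neighbors), 3))  # 4.688
```

Dividing by the sum of similarities is what keeps the result inside the range of the neighbors' ratings; the clamp is only a safety net.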
Improvements
o Some users rate movies consistently higher and
others rate them consistently lower
o Adjust for this by using each neighbor's deviation
from their own mean rating, and then finally adding
the mean rating of the active user
o Consult the GroupLens paper for details
o Use other measures that correct for “grade
inflation”, e.g. Pearson’s correlation
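
Both improvements can be sketched briefly. The code below is an illustration under stated assumptions, not the GroupLens implementation: each neighbor's mean is taken over all of that neighbor's ratings, the weights are the rounded Euclidean similarities from the earlier slide, and the function names are invented here. Note that Pearson's correlation is not informative with only two shared items (it is always +1, -1, or undefined), so it is shown on separate toy vectors.

```python
import math


def pearson(a, b):
    """Pearson correlation over shared items; 0.0 when it is undefined."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        return 0.0
    mean_a = sum(a[i] for i in shared) / len(shared)
    mean_b = sum(b[i] for i in shared) / len(shared)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in shared)
    var_a = sum((a[i] - mean_a) ** 2 for i in shared)
    var_b = sum((b[i] - mean_b) ** 2 for i in shared)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def mean_centered_prediction(active_mean, neighbors):
    """neighbors: list of (similarity, neighbor's rating, neighbor's mean)."""
    total = sum(abs(sim) for sim, _, _ in neighbors)
    if total == 0:
        return active_mean
    offset = sum(sim * (r - mean) for sim, r, mean in neighbors) / total
    return active_mean + offset


# Eduardo's mean is (2 + 4) / 2 = 3; neighbor means come from the ratings table.
neighbors = [
    (0.5, 5, (2 + 3 + 5) / 3),    # Dawood
    (0.414, 5, (1 + 3 + 5) / 3),  # Bob
    (0.414, 4, (3 + 5 + 4) / 3),  # Chandra
]
print(round(mean_centered_prediction(3.0, neighbors), 2))  # 4.25

# Toy vectors (hypothetical): a strict rater and a generous rater who agree.
strict = {"x": 1, "y": 2, "z": 3}
generous = {"x": 2, "y": 4, "z": 6}
print(round(pearson(strict, generous), 6))  # 1.0
```

The toy pair shows why Pearson helps with "grade inflation": the two raters use different parts of the scale, yet their correlation is perfect.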
A recommendation engine
o Imagine a much larger data set of users and
movie ratings
o Do the same math for all users against all other
users
o Then predict ratings for the movies that each
user has not yet rated
o For a given user, pick the N movies with the highest
predicted ratings and recommend those
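
Put together, the steps above might look like the sketch below. It is a toy, in-memory version under several assumptions: similarity is 1/(1 + Euclidean distance) over shared items, the top `k` neighbors who rated the item are used, and the function names are invented for illustration. The "Notting Hill" column is hypothetical data added here so that there is more than one movie to rank; it is not in the original table.

```python
import math

ratings = {
    "Alice":   {"Sleepless in Seattle": 5, "Titanic": 5, "Terminator 2": 3, "Notting Hill": 5},
    "Bob":     {"Sleepless in Seattle": 1, "Titanic": 3, "Terminator 2": 5, "Notting Hill": 1},
    "Chandra": {"Sleepless in Seattle": 3, "Titanic": 5, "Terminator 2": 4, "Notting Hill": 4},
    "Dawood":  {"Sleepless in Seattle": 2, "Titanic": 3, "Terminator 2": 5, "Notting Hill": 2},
    "Eduardo": {"Sleepless in Seattle": 2, "Titanic": 4},
}


def similarity(a, b):
    """1 / (1 + Euclidean distance) over shared items; 0.0 with no overlap."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dist = math.sqrt(sum((a[i] - b[i]) ** 2 for i in shared))
    return 1.0 / (1.0 + dist)


def predict(ratings, user, item, k=3):
    """Similarity-weighted average over the k most similar users who rated item."""
    scored = [
        (similarity(ratings[user], prefs), prefs[item])
        for other, prefs in ratings.items()
        if other != user and item in prefs
    ]
    top = sorted(scored, reverse=True)[:k]
    total = sum(sim for sim, _ in top)
    if total == 0:
        return None
    return sum(sim * rating for sim, rating in top) / total


def recommend(ratings, user, n=2, k=3):
    """Return the n unrated items with the highest predicted ratings."""
    candidates = {i for prefs in ratings.values() for i in prefs} - set(ratings[user])
    preds = [(predict(ratings, user, item, k), item) for item in candidates]
    preds = [(p, item) for p, item in preds if p is not None]
    return [item for _, item in sorted(preds, reverse=True)[:n]]


print(recommend(ratings, "Eduardo", n=2))  # ['Terminator 2', 'Notting Hill']
```

Comparing every user against every other user is quadratic in the number of users, so a real engine on a large data set would likely precompute or approximate the neighbor search.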
Questions? Comments?
Thank You!
E-mail: vivganes@gmail.com
Twitter : onevivek

