Jobin Wilson
                         jobin.wilson@flytxt.com


Copyright © 2011 Flytxt B.V. All rights reserved.   9/13/2011
Who am I?
  • Architect @ Flytxt (Big Data Analytics & Automation)

  • Passionate about data, distributed computing, and machine learning

  • Previously

       • Virtualization & Cloud Lifecycle Management (BMC)

             • Designed and implemented the Cloud Lifecycle Management Interface at BMC

       • Large-Scale Data Centre Automation (AOL)

             • Implemented a Centralized Data Center Management Framework for AOL

       • Workflow Systems & Automation (Accenture)

             • Implemented Service Management Suite for various customers
Session Agenda!

• Recommendation Engines – What's the big deal?

• Conceptual Overview

• Collaborative Filtering

• Engineering Challenges

• Apache Mahout

• Getting your recommender to production

• Q&A




What's the big deal?
Ooh Ads too!
Big deal?
 [Diagram: Users, Content Publishers (Content), Advertisers (Ads), and an Ad Network that recommends the best ads, drawing on ML algorithms, user behavior modelling, and maximization criteria.]
BTW, What was the challenge?
User base: 2 billion+ users worldwide

Content base: 12.51 billion+ indexed pages

Advertiser base: millions of active advertisers

Real-time nature: responses in < 200 ms

Multi-objective optimization problem

Noisy data
Recommendation Engines: Overview
 A specific type of information filtering system
 technique that attempts to recommend information
 items or social elements that are likely to be of interest
 to the user.

 Technologies that can help us sift through all the
 available information to predict products or services
 that could be interesting to us.

 Applying knowledge discovery techniques to the
 problem of making personalized recommendations for
 information, products or services, usually during a live
 interaction.
Do we need a crystal ball to predict?
  We all have opinions and tastes, which we express as likes or dislikes.

  Our tastes follow patterns.

  We tend to like things that are similar to things we already
  like (e.g. songs).

  We tend to like things that are liked by people who are similar to
  us (e.g. movies).

  From fancy research to mainstream
Collaborative Filtering
 Problem: We have U users and I items in the system. A user Uk needs to
 be recommended a set of m items that they have not yet picked and
 might be interested in picking up.

 Solution:

 Maintain a database of users’ ratings of a variety of items.

 For a given user, find other similar users whose ratings strongly
 correlate with the current user's: the User Neighborhood.

 Recommend items rated highly by these similar users, but not rated by
 the current user.

 E.g. Amazon, Flipkart, etc.
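
 The three solution steps above can be sketched in a few lines of Python. This is a minimal, in-memory illustration, not a production design: the user names, item ids, ratings, and the `pearson` and `recommend` helpers are all assumptions made up for the example. Similarity is a Pearson correlation over co-rated items, and unseen items are scored by a similarity-weighted average of the neighbors' ratings.

```python
from math import sqrt

# Toy ratings database: user -> {item: rating}. All values are made up.
ratings = {
    "alice": {"i1": 5, "i2": 3, "i3": 4},
    "bob":   {"i1": 4, "i2": 2, "i3": 5, "i4": 5},
    "carol": {"i1": 1, "i2": 5, "i3": 2, "i5": 4},
}

def pearson(a, b):
    """Pearson correlation computed over the items both users have rated."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = (sqrt(sum((a[i] - mean_a) ** 2 for i in common))
           * sqrt(sum((b[i] - mean_b) ** 2 for i in common)))
    return num / den if den else 0.0

def recommend(user, ratings, k=2, m=3):
    """Return up to m unrated items, scored from the k most similar users."""
    mine = ratings[user]
    # Step 2: user neighborhood = k most similar users with positive correlation.
    sims = [(pearson(mine, other_ratings), other)
            for other, other_ratings in ratings.items() if other != user]
    neighbors = sorted((s for s in sims if s[0] > 0), reverse=True)[:k]
    # Step 3: score items the user has not rated by a weighted average.
    totals, weights = {}, {}
    for sim, other in neighbors:
        for item, rating in ratings[other].items():
            if item not in mine:
                totals[item] = totals.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    scored = {item: totals[item] / weights[item] for item in totals}
    return sorted(scored, key=scored.get, reverse=True)[:m]

print(recommend("alice", ratings))
```

 In this toy data only bob correlates positively with alice, so the only candidate is the one item bob rated and alice did not.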
Utility Matrix
 A matrix of values representing each user’s level of affinity to each item.
 The matrix is typically sparse.

 The recommendation engine needs to predict the values for the empty cells
 based on the available cell values.

 The denser the matrix, the better the quality of recommendations.

 User | Item i1           i2           i3           i4           i5
 u1                       r12                       r14          r15
 u2          r21          r22                                    r25
 u3                       r32                       r34
 u4                                    r43                       r45
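
 One way to hold such a matrix in code is to store only the known cells. The sketch below mirrors the layout of the table above; the numeric ratings are placeholders invented for illustration (the slide only gives symbolic values such as r12), and the `density` and `empty_cells` helpers are hypothetical names.

```python
users = ["u1", "u2", "u3", "u4"]
items = ["i1", "i2", "i3", "i4", "i5"]

# Sparse utility matrix: only known (user, item) cells are stored.
# The rating values are made-up placeholders.
utility = {
    ("u1", "i2"): 4.0, ("u1", "i4"): 3.0, ("u1", "i5"): 5.0,
    ("u2", "i1"): 2.0, ("u2", "i2"): 4.0, ("u2", "i5"): 1.0,
    ("u3", "i2"): 5.0, ("u3", "i4"): 4.0,
    ("u4", "i3"): 3.0, ("u4", "i5"): 2.0,
}

def density(utility, users, items):
    """Fraction of cells that hold a known rating."""
    return len(utility) / (len(users) * len(items))

def empty_cells(utility, users, items):
    """Cells the recommendation engine would have to predict."""
    return [(u, i) for u in users for i in items if (u, i) not in utility]

print(density(utility, users, items))
print(len(empty_cells(utility, users, items)))
```

 Real systems are usually far sparser than this toy matrix, which is why a cell-keyed or row-keyed layout tends to be preferred over a dense array.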
Engineering Challenges
 Massive data volume: how do I deal with TBs of raw data to build my
 recommendations?

 Hadoop and MapReduce shine!


 How can I make it work in ‘real time’?

 Batch pre-computing recommendations and storing them in HBase could help!



 Will my solution scale? Soon my user base is going to double!

 Sure, you can make it scale!
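
 The "batch pre-compute, then look up" idea can be sketched as follows. A plain Python dict stands in for the key-value store here (the slide suggests HBase), and the score values, the `batch_precompute` and `serve` functions, and the fallback list are all assumptions for illustration.

```python
def batch_precompute(scores, top_n=2):
    """Offline step: keep each user's top-N items, ordered by score."""
    store = {}
    for user, item_scores in scores.items():
        ranked = sorted(item_scores.items(), key=lambda kv: kv[1], reverse=True)
        store[user] = [item for item, _ in ranked[:top_n]]
    return store

def serve(store, user, fallback=()):
    """Online step: a single key lookup, with a fallback for unknown users."""
    return store.get(user, list(fallback))

# Made-up scores, as a batch job might produce them.
scores = {
    "u1": {"i1": 0.9, "i3": 0.4, "i4": 0.7},
    "u2": {"i2": 0.8},
}
store = batch_precompute(scores)
print(serve(store, "u1"))
print(serve(store, "u9", fallback=["i1"]))
```

 The expensive work happens in the batch step, so the request path only pays for a lookup. The fallback argument hints at one way to handle users with no precomputed entry.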
Engineering Challenges

 Do I need a cloud-based infrastructure?

 Depends!


 Is there a Hadoop-compatible machine learning library?

 Mahout would help!


 How can I represent/transform my input data appropriately?

 Pig or Hive might help! If not, MapReduce is always there!
Apache Mahout Overview
 Scalable machine learning library

 Core algorithms for clustering, classification, and batch-based
 collaborative filtering, implemented on top of Hadoop

 A few popular algorithms: K-Means, fuzzy K-Means, Canopy clustering, LDA,
 etc.

 Vibrant community support

 Used by: Adobe, Yahoo!, Amazon, AOL, Flytxt, and others

 Developer mailing list (subscribe): mahout-dev-subscribe@apache.org
Taking Recommendation Engines to production

 Analyzing the input data: what kind of information can I collect from users?

 Selecting the appropriate recommender (e.g. user-based, item-based)

 Strategy for recommending to anonymous (or first-time) users

 Strategy for distributed computing: modeling the problem as MapReduce

 Choosing the deployment model

 Monitoring the system
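
 To make "modeling the problem as MapReduce" concrete, here is a small single-process simulation of one possible job: counting how often two items are picked by the same user, a statistic an item-based recommender can build on. This is a sketch under assumptions, not Mahout's or Hadoop's actual API; `map_user`, `reduce_pair`, `run`, and the input data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def map_user(user, items):
    """Map: emit ((item_a, item_b), 1) for every pair of items one user picked."""
    for a, b in combinations(sorted(items), 2):
        yield (a, b), 1

def reduce_pair(pair, counts):
    """Reduce: sum the counts emitted for one item pair."""
    return pair, sum(counts)

def run(user_items):
    """Simulate the shuffle phase by grouping mapper output by key."""
    groups = defaultdict(list)
    for user, items in user_items.items():
        for key, value in map_user(user, items):
            groups[key].append(value)
    return dict(reduce_pair(key, values) for key, values in groups.items())

# Made-up input: which items each user has picked.
user_items = {
    "u1": {"i2", "i4", "i5"},
    "u2": {"i1", "i2", "i5"},
    "u3": {"i2", "i4"},
    "u4": {"i3", "i5"},
}
cooccurrence = run(user_items)
print(cooccurrence[("i2", "i4")])
```

 Each mapper call only needs one user's items and each reducer call only needs one pair's counts, which is what would let a framework such as Hadoop spread the work across machines.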
Conclusion

 Very popular field of research and implementation

 More and more products and services are leveraging the concept

 From fancy research to live production systems at scale

 Making people's lives easier by assisting in making decisions
Some more concepts…

 Concept of similarity: distance measures, etc.

 Pearson Correlation

 User neighborhood computation
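
 For reference, the Pearson correlation between two users and the resulting neighborhood can be written as below. The notation is an assumption introduced here, not taken from the slides: r_{a,i} is user a's rating of item i, I_{ab} is the set of items rated by both a and b, and the means are taken over I_{ab}.

```latex
\[
\mathrm{sim}(a,b) =
\frac{\sum_{i \in I_{ab}} (r_{a,i} - \bar{r}_a)(r_{b,i} - \bar{r}_b)}
     {\sqrt{\sum_{i \in I_{ab}} (r_{a,i} - \bar{r}_a)^2}\;
      \sqrt{\sum_{i \in I_{ab}} (r_{b,i} - \bar{r}_b)^2}},
\qquad -1 \le \mathrm{sim}(a,b) \le 1
\]

% One simple neighborhood rule: the k users most similar to a.
\[
N_k(a) = \text{the } k \text{ users } b \neq a \text{ with the largest } \mathrm{sim}(a,b)
\]
```

 A similarity can be turned into a distance-like value, for example d(a,b) = 1 - sim(a,b), so that strongly correlated users end up "close". A threshold on sim(a,b) is another common way to define the neighborhood instead of a fixed k.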
THANK YOU
  Contact : jobin.wilson@flytxt.com
http://www.flytxt.com/community/






Recommendation engines matching items to users
