Sonpvh – VNG.R&D.12.2017
Zalo.Recommendation.System.Review
1. Introduction
2. Goals of recommender systems
3. The high-level architecture of a recommender system
4. Basic models of recommender systems
5. Experiences
6. Q&A
1
2
DISCOVERY
3
Xavier 2014 [1]
BIG
4
Xavier 2014 [1]
• 80M users
• Thousands of shops, articles, OA, music …
• >20K products
• Billions of interactions
• …
VALUE
5
Xavier 2014 [1]
ZingMp3: >30% of traffic
ZOA: >30% improvement in total clicks and follows
PERSONAL EXPERIENCES
6
• “Mining” the user’s preferences
• Improve the user experience
7
8
1. Prediction version of the problem (matrix completion problem):
   1. m users, n items, r ratings: an (m x n) matrix
   2. Given the specified values for training, predict the missing rating values
2. Ranking version of the problem (top-k recommendation problem):
   1. Recommend the top-k items/users for a particular user/item
Charu 2016 [2]
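The two formulations above can be contrasted on a toy ratings matrix. This is only a minimal sketch: the matrix values are invented, and the predictor (the mean observed rating of each item) is a deliberately naive stand-in chosen for brevity, not a method taken from the slides.

```python
from typing import List, Optional

# Toy (m x n) ratings matrix: rows are users, columns are items.
# None marks a missing rating. All values are made up for illustration.
ratings: List[List[Optional[float]]] = [
    [5.0, 3.0, None, 1.0],
    [4.0, None, None, 1.0],
    [1.0, 1.0, None, 5.0],
    [None, 1.0, 5.0, 4.0],
]


def item_means(matrix):
    """Mean observed rating per item (column); 0.0 if an item has no ratings."""
    means = []
    for j in range(len(matrix[0])):
        observed = [row[j] for row in matrix if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means


def predict_missing(matrix):
    """Prediction version: return a copy with every missing cell filled in."""
    means = item_means(matrix)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in matrix]


def top_k(matrix, user, k):
    """Ranking version: the k unrated items with the highest predicted rating."""
    means = item_means(matrix)
    candidates = [j for j, v in enumerate(matrix[user]) if v is None]
    return sorted(candidates, key=lambda j: means[j], reverse=True)[:k]


if __name__ == "__main__":
    print(predict_missing(ratings)[1])
    print(top_k(ratings, user=1, k=2))  # -> [2, 1]
```

The ranking version only needs the relative order of the predicted scores, which is why top-k systems can often get away with cheaper or less calibrated models than rating prediction.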
9
1. The ultimate goal is increasing PROFIT
   1. Direct: increasing product sales, total clicks (product …), total time spent on pages (music, news …)
   2. Indirect: increasing total connections (Facebook)
   3. The common operational and technical goals are:
      • Relevance
      • Novelty
      • Serendipity
      • Diversity
Charu 2016 [2]
10
Charu 2016 [2]
11
12
Xavier [4]
13
Mendeley RS [5]
14
Bogers, T., & Van Den Bosch [6]
15
16
Xavier [3]
17
Tencent [9]
18
1. Architecture: pipeline for big data
   1. Ex: Mp3 - daily (1M users x 100K songs), 50G app + 8G web logs
   2. Streaming, batch
2. Algorithm: scalable
   1. This is not Machine Learning, this is Scalable Machine Learning
   2. Combine multiple models
3. Monitoring system
4. Evaluation
19
[Diagram: Machine learning, Online Evaluation, Offline Evaluation]
80%-90% of time
“Garbage in, garbage out”
20
[Pipeline diagram: Cron.d, Luigi, ZDB]
(1) Call Luigi
(3) Extract column
(4) Union ...
(5) Call Java
(6) Stream data
(7) Upload DB
(8) Visualize
21
22
1. Memory-based methods (neighborhood-based CF)
   1. User-based CF
   2. Item-based CF
2. Model-based methods: predictive models
   1. Decision trees
   2. Rule-based models
   3. Bayesian methods
   4. Latent factor models
   5. Clustering
   6. LDA
   7. SVD/MF
   8. Deep learning
   9. …
Charu 2016 [2]
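As an illustration of the memory-based family listed above, here is a minimal item-based CF sketch. It assumes plain cosine similarity between item rating vectors and a similarity-weighted average for prediction; the data, function names, and the choice of cosine (rather than, say, adjusted cosine or Pearson) are assumptions made for the example, not details from the slides.

```python
import math

# Toy user -> {item: rating} data; names and values are invented.
ratings = {
    "u1": {"A": 5.0, "B": 4.0, "C": 1.0},
    "u2": {"A": 4.0, "B": 5.0},
    "u3": {"A": 1.0, "C": 5.0},
}


def item_vectors(ratings):
    """Transpose user -> item ratings into item -> {user: rating}."""
    vectors = {}
    for user, items in ratings.items():
        for item, r in items.items():
            vectors.setdefault(item, {})[user] = r
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(ratings, user, target):
    """Predict the user's rating of `target` as a similarity-weighted
    average of the ratings that user gave to other items."""
    vectors = item_vectors(ratings)
    if target not in vectors:
        return 0.0
    num = den = 0.0
    for item, r in ratings[user].items():
        if item == target:
            continue
        sim = cosine(vectors[target], vectors[item])
        num += sim * r
        den += abs(sim)
    return num / den if den else 0.0


if __name__ == "__main__":
    print(predict(ratings, "u2", "C"))
```

Item-item similarities change slowly compared with user activity, so in practice they are often precomputed in a batch job and only the weighted average is done at request time.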
23
1. Recommend similar items based on their “description”
   • Requires analyzing the content of items the user has rated in the past
   • Usually keywords, prices, ...
2. Knowledge-based recommender systems
   1. Useful when items are not purchased often (house, car …)
   2. Useful for search engines, conversational systems, navigation-based recommenders
3. Demographic recommender systems
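The first item above, recommending by item “description”, can be sketched with keyword sets. This assumes each item is described by a small set of keywords and uses Jaccard overlap as the similarity; the item names, keywords, and the choice of Jaccard are hypothetical and only meant to show the shape of a content-based recommender.

```python
# Hypothetical item descriptions as keyword sets.
items = {
    "song_1": {"pop", "vietnamese", "ballad"},
    "song_2": {"pop", "vietnamese", "dance"},
    "song_3": {"rock", "english"},
}


def jaccard(a, b):
    """Jaccard similarity of two keyword collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(liked, items, k=1):
    """Build a keyword profile from the items the user liked in the past,
    then return the k unseen items whose keywords overlap it the most."""
    profile = set().union(*(items[name] for name in liked))
    scored = [(jaccard(profile, keywords), name)
              for name, keywords in items.items() if name not in liked]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    print(recommend(["song_1"], items, k=1))  # -> ['song_2']
```

Because the profile is built only from the user's own history, this approach needs no other users' data, which is why it is commonly used for new items that have no interactions yet.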
24
1. User preferences change …
2. Context-aware:
   1. Location-based recommender systems
   2. Time-sensitive recommender systems
   3. Social recommender systems => social network analysis
3. Feature learning
25
1. Trustworthy recommender systems
2. Social feedback analysis
3. Group recommender systems
4. Multi-criteria recommender systems
5. Active learning
6. Privacy in recommender systems
26
27
[Diagram: Machine learning, Online Evaluation, Offline Evaluation]
1. Log
2. Feature
3. Indexes, chart …
4. Model
5. Problems: What do they want?
6. Indexes
28
1. Advertisement (ZaloFeed)
2. Search engine (ZingMp3)
3. Next song (ZingMp3)
4. Personal playlist (ZingMp3)
5. Personal suggestion (OA, Product)
6. Trade-offs
   1. Accuracy vs diversity (short term vs long term)
   2. Discovery vs continuation
   3. Freshness vs stability
   4. Efficiency vs accuracy
29
1. Business KPI – indexes
2. Causal effect measurement
3. User behavior analysis
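One common way to connect a business KPI to causal effect measurement is a randomized A/B test. The slides do not say which method was used, so the sketch below is an assumption: it compares click-through rate between a control group and a group served by the recommender, and reports the relative lift and a two-proportion z statistic. All numbers and names are invented.

```python
import math


def ctr(clicks, impressions):
    """Click-through rate; 0.0 when there are no impressions."""
    return clicks / impressions if impressions else 0.0


def relative_lift(control, treatment):
    """Relative CTR change of treatment over control.
    Each argument is a (clicks, impressions) pair."""
    base = ctr(*control)
    return (ctr(*treatment) - base) / base if base else float("inf")


def two_proportion_z(control, treatment):
    """z statistic for the difference of two proportions (pooled variance)."""
    (c1, n1), (c2, n2) = control, treatment
    pooled = (c1 + c2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (c2 / n2 - c1 / n1) / se if se else 0.0


if __name__ == "__main__":
    control = (100, 1000)    # hypothetical: clicks, impressions without the recommender
    treatment = (130, 1000)  # hypothetical: clicks, impressions with the recommender
    print(relative_lift(control, treatment))
    print(two_proportion_z(control, treatment))
```

Random assignment is what makes the lift interpretable as a causal effect; comparing users who happened to see recommendations with those who did not would mix in selection bias.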
30
1. Candidate generation or ranking prediction
2. Classification or CF
3. Cost function?
4. Performance: Time, Accuracy …
31
Andrew Ng [7]
1. Explaining recommendations - Yifan Hu [10]
2. Is a complex model good?
3. Deep learning vs boosting
32
[Diagram: Machine learning, Online Evaluation, Offline Evaluation]
1. Log
2. Feature
3. Indexes, chart …
4. Model
5. Problems: What do they want?
6. Indexes
33
1. What kind of rating?
   1. Explicit or implicit
2. Preferences vs confidences
3. Impression or no negative feedback
4. How to treat the missing data?
5. Appropriate evaluation measurement
   1. YouTube: total listening time [11]
   2. rank [10]
   3. …
Yifan Hu [10]
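The “preferences vs confidences” split and the rank measure credited to Yifan Hu [10] can be sketched as follows. This assumes the formulation of that paper: a binary preference derived from the raw implicit count, a confidence that grows linearly with the count, and an expected percentile ranking where 0 means every consumed item was ranked first. The function names, the `alpha` default, and the toy data are illustrative assumptions.

```python
def preference(r):
    """Binary preference: 1 if any interaction was observed, else 0."""
    return 1 if r > 0 else 0


def confidence(r, alpha=40.0):
    """Confidence in the preference, c = 1 + alpha * r.
    alpha is a tunable constant; the default here is only a placeholder."""
    return 1.0 + alpha * r


def expected_percentile_rank(observed, ranked):
    """Weighted average percentile position of the items users consumed.

    observed: {(user, item): r} interactions from the test period
    ranked:   {user: [items ordered from most to least recommended]}
    Returns a value in [0, 1]; lower is better.
    """
    num = den = 0.0
    for (user, item), r in observed.items():
        order = ranked[user]
        pct = order.index(item) / (len(order) - 1) if len(order) > 1 else 0.0
        num += r * pct
        den += r
    return num / den if den else 0.0


if __name__ == "__main__":
    ranked = {"u": ["a", "b", "c"]}
    observed = {("u", "a"): 3.0, ("u", "c"): 1.0}
    print(expected_percentile_rank(observed, ranked))  # -> 0.25
```

Treating missing entries as preference 0 with low confidence, instead of dropping them, is one answer to the “how to treat the missing data” question: no interaction is weak evidence of dislike rather than an unknown.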
34
1. Power laws and rich-get-richer phenomena [13]
2. The long-tail effect – Stanford [12]
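A quick way to see whether an interaction log shows this kind of skew is to measure how much of the activity the most popular items capture. The sketch below is an assumption about how one might check it: the function name, the 20% head cut-off, and the toy log are invented for illustration.

```python
from collections import Counter


def head_share(interactions, head_fraction=0.2):
    """Share of all interactions captured by the most popular items.

    interactions:  iterable of item ids, one entry per interaction
    head_fraction: fraction of distinct items counted as the "head"
    """
    counts = sorted(Counter(interactions).values(), reverse=True)
    head_size = max(1, int(len(counts) * head_fraction))
    total = sum(counts)
    return sum(counts[:head_size]) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical log: one very popular item and a tail of rarely played ones.
    log = ["a"] * 80 + ["b"] * 10 + ["c"] * 5 + ["d"] * 3 + ["e"] * 2
    print(head_share(log))  # -> 0.8
```

A high head share is a warning that popularity alone will dominate naive models, so tail items need explicit treatment if novelty and diversity are goals.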
35
1. Watch out for bias
2. Watch out for data snooping
36
1. Outliers and anomalies
37
1. Keep moving and think about average
2. Monitoring
38
1. Picking the right features really helps
   1. Spotify: same playlist
   2. YouTube: pair count
   3. Tencent: co-items
   4. ZingMp3: same playlist, co-items
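The “same playlist” and “co-items” features above boil down to counting how often two items appear together. A minimal sketch, assuming playlists are available as lists of song ids; the function names and data are hypothetical, and a production version would run as a batch or streaming job rather than in memory.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(playlists):
    """Count, for every unordered pair of songs, how many playlists contain both."""
    pairs = Counter()
    for playlist in playlists:
        # set() so a song repeated inside one playlist is counted once.
        for a, b in combinations(sorted(set(playlist)), 2):
            pairs[(a, b)] += 1
    return pairs


def co_items(pairs, item, k=3):
    """The k songs that most often share a playlist with `item`."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common(k)]


if __name__ == "__main__":
    playlists = [["s1", "s2", "s3"], ["s1", "s2"], ["s2", "s3"], ["s1", "s2", "s1"]]
    pairs = co_occurrence(playlists)
    print(co_items(pairs, "s3", k=2))  # -> ['s2', 's1']
```

Raw pair counts favor globally popular songs, so they are often normalized by each item's own frequency before being used as a similarity feature.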
39
40
41
1. This is not Machine Learning, this is Scalable Machine Learning
2. Build a robust pipeline for big data
3. Monitoring – control the data quality
4. Evaluation – user behavior and causal effect
5. Defining indexes and features is really important
6. Keep in touch with the product people and the log people
7. Start simple, improve with patience
8. …
42
ZingMp3: >30% of traffic
ZOA: >30% improvement in total clicks and follows
Product and personal playlist are coming soon
43
44
1. https://www.slideshare.net/xamat/recommender-systems-machine-learning-summer-school-2014-cmu
2. Aggarwal, C. C. (2016). Recommender Systems: The Textbook.
3. Amatriain, X. (2013). Mining large streams of user data for personalized recommendations. ACM SIGKDD Explorations Newsletter, 14(2), 37. https://doi.org/10.1145/2481244.2481250
4. Amatriain, X. (2013). Big & personal: data and models behind Netflix recommendations. Proceedings of the 2nd International Workshop on Big Data, Streams and Heterogeneous Source Mining Algorithms, Systems, Programming Models and Applications - BigMine ’13, 1–6. https://doi.org/10.1145/2501221.2501222
5. https://buildingrecommenders.wordpress.com/2016/10/10/mendeley-suggest-architecture/
6. Bogers, T., & Van Den Bosch, A. (2009). Collaborative and content-based filtering for item recommendation on social bookmarking websites. CEUR Workshop Proceedings (Vol. 532). https://doi.org/10.1007/978-0-387-85820-3
7. Andrew Ng. https://www.youtube.com/watch?v=n1ViNeWhC24
8. https://hackernoon.com/spotifys-discover-weekly-how-machine-learning-finds-your-new-music-19a41ab76efe
9. TencentRec: Real-time Stream Recommendation in Practice
10. Collaborative Filtering for Implicit Feedback Datasets – Yifan Hu
11. https://research.google.com/pubs/pub45530.html – YouTube
12. http://bid.berkeley.edu/cs294-1-spring13/index.php/About_People – Stanford
13. http://www.cs.cornell.edu/home/kleinber/networks-book/networks-book-ch18.pdf
46

Practical Recommendation System - Scalable Machine Learning
