Recommendations from the search engine
Sesam Hackathon, Warsaw, 2014-03-23
Lars Marius Garshol, larsga@bouvet.no, http://twitter.com/larsga
This whole presentation is about Ted Dunning’s
proposed approach to recommendations
Based on his 1993 paper (below)
– references at the end
Very simple method, dead easy to implement
– seems to work pretty well
Inspiration
Recommenders are usually designed to predict ratings
– Dunning believes this is the wrong approach
– people’s ratings don’t necessarily reflect what they’ll
buy
– go by what people do rather than what they say
You don’t want to recommend Bob Dylan
– everyone’s already heard about him, and knows what
they think
– you want to recommend things that are new to the user
You don’t want to recommend things everyone
likes
Thoughts on recommendations
Step 1
– work out which things tend to occur together
– that is, if you buy this, you’re likely to also buy this
– however, we only want pairs which are statistically
significant
Step 2
– index up the significant pairs in a search engine
– use search to produce the actual results
The actual approach
Statistically significant co-occurrence
Part the first
User Item
u1 i1
u1 i2
u2 i1
u3 i2
u3 i3
u3 i4
... ...
The starting point
Some kind of log of user actions
User has
– bought a movie | album | book | ...
– opened a document
– ...
From this raw material, we can work
out what things tend to go together
– and whether this is significant
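As a rough illustration of this step, the sketch below counts how often each pair of items turns up for the same user, using the six sample rows from the table. The function name `cooccurrences` and the choice to count each unordered pair once per user are assumptions made for this example; they are not taken from the talk's code.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrences(log):
    """Count, for every unordered item pair, how many users touched both."""
    items_by_user = defaultdict(set)
    for user, item in log:
        items_by_user[user].add(item)

    counts = defaultdict(int)
    for items in items_by_user.values():
        # sorted() gives each pair one canonical (a, b) ordering
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return dict(counts)

# The sample user/item rows from the slide
log = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"),
       ("u3", "i2"), ("u3", "i3"), ("u3", "i4")]
pairs = cooccurrences(log)
```

User u2 contributes no pair because only one item was logged for that user.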
      i1   i2   i3   i4   i5   i6   i7
i1     -   23   42    0    0    5    7
i2    23    -    6    1  129    2   10
i3    42    6    -    3    0  492    1
i4     0    1    3    -    2    3    1
i5     0  129    0    2    -   94    2
i6     5    2  492    3   94    -    1
i7     7   10    1    1    2    1    -
Item-to-item matrix
k[0][0] = the number in the matrix on previous slide
k[0][1] = the sum of that whole column minus k[0][0]
k[1][0] = the sum of that whole row minus k[0][0]
k[1][1] = the sum of the entire matrix minus k[0][0] minus k[1][0] minus k[0][1]
Producing the k 2x2 matrix
How to compute the k matrix for a given cell in the matrix
on the previous slide
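The four definitions of k can be sketched in a few lines of Python. The numbers are the ones from the item-to-item matrix slide; filling the blank diagonal with 0 and the function name `k_matrix` are assumptions of this example.

```python
# Item-to-item counts from the matrix slide; the blank diagonal is
# filled with 0 here, which is an assumption of this sketch.
MATRIX = [
    [0, 23, 42, 0, 0, 5, 7],
    [23, 0, 6, 1, 129, 2, 10],
    [42, 6, 0, 3, 0, 492, 1],
    [0, 1, 3, 0, 2, 3, 1],
    [0, 129, 0, 2, 0, 94, 2],
    [5, 2, 492, 3, 94, 0, 1],
    [7, 10, 1, 1, 2, 1, 0],
]

def k_matrix(matrix, row, col):
    """Build the 2x2 table k for the cell at (row, col)."""
    k00 = matrix[row][col]
    k01 = sum(r[col] for r in matrix) - k00   # rest of that column
    k10 = sum(matrix[row]) - k00              # rest of that row
    total = sum(sum(r) for r in matrix)
    k11 = total - k00 - k10 - k01             # everything else
    return [[k00, k01], [k10, k11]]

k = k_matrix(MATRIX, 2, 5)   # the i3/i6 cell, which holds 492
print(k)                     # → [[492, 105], [52, 999]]
```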
If the output of LLR(k) is above some threshold, the pair is considered significant.
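The slide does not spell out LLR(k). One common way to compute the log-likelihood ratio for a 2x2 table is the entropy-based form sketched below; whether the talk's own script computes it exactly this way is not confirmed here, and the threshold value is purely hypothetical.

```python
import math

def x_log_x(x):
    # x * log(x), with the usual convention that 0 * log(0) = 0
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    # unnormalised entropy: N*log(N) - sum(k*log(k))
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k):
    """Log-likelihood ratio score for a 2x2 table of counts."""
    row = entropy(k[0][0] + k[0][1], k[1][0] + k[1][1])
    col = entropy(k[0][0] + k[1][0], k[0][1] + k[1][1])
    mat = entropy(k[0][0], k[0][1], k[1][0], k[1][1])
    # clamp tiny negative values caused by floating-point rounding
    return max(0.0, 2.0 * (row + col - mat))

THRESHOLD = 10.0  # hypothetical cut-off, to be tuned on real data

k = [[492, 105], [52, 999]]   # table for the i3/i6 cell of the slide matrix
significant = llr(k) > THRESHOLD
```

A table in which the two items are independent, such as [[1, 1], [1, 1]], scores 0, while a strongly associated pair like the one above scores far above any plausible threshold.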
Check the Python code on
– https://github.com/larsga/py-snippets/tree/master/machine-learning/llr
– this requires a lot of memory and CPU
Or just use Mahout
– RowSimilarityJob does exactly this
Doing it for real
Search engine as recommender
Part the second
Take all the items and index them up with the
search engine in the usual way
– that is, each item has an id, a title, a description, etc.
Then, add a “magic” field
– put into it the IDs of all the items that appear in a
significant pair with this item
– let’s call this field “indicators”
Now we’re ready to do recommendations
Indexing with the search engine
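To make the "indicators" field concrete, here is a minimal sketch that builds the documents as plain Python dictionaries before they are handed to a search engine. No real search-engine API is used; the item titles, the function name `build_documents`, and the input format for the significant pairs are all made up for illustration.

```python
def build_documents(items, significant_pairs):
    """Attach an 'indicators' field listing each item's significant partners."""
    docs = {item_id: dict(fields, id=item_id, indicators=[])
            for item_id, fields in items.items()}
    for a, b in significant_pairs:
        # a pair is symmetric, so each item becomes an indicator of the other
        docs[a]["indicators"].append(b)
        docs[b]["indicators"].append(a)
    return docs

# Hypothetical items and significant pairs
items = {
    "i1": {"title": "First item"},
    "i2": {"title": "Second item"},
    "i3": {"title": "Third item"},
}
docs = build_documents(items, [("i1", "i2"), ("i2", "i3")])
```

Each resulting dictionary would then be indexed as one document, with "indicators" as a multi-valued field of IDs.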
Collect some set of items for which the user has
expressed a preference
– by buying them, looking at them, rating them, whatever
The IDs of these items are your query
– search the “indicators” field
– the search results are your recommendations
That’s it!
– pack up, go home
Doing recommendations
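The query step can be imitated without a search engine. The sketch below ranks documents by how many of the liked IDs appear in their indicators; a real engine would rank by relevance score rather than a raw hit count. Skipping items the user already likes is an assumption of this example, as are the names `recommend` and `limit`.

```python
def recommend(docs, liked, limit=10):
    """Rank documents by how many liked IDs appear in their indicators."""
    liked = set(liked)
    scored = []
    for doc_id, doc in docs.items():
        if doc_id in liked:
            continue  # assumption: do not recommend what the user already has
        hits = len(liked.intersection(doc["indicators"]))
        if hits:
            scored.append((hits, doc_id))
    # highest hit count first; ties broken by ID for a stable order
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]

# Hypothetical index contents
docs = {
    "i1": {"indicators": ["i2"]},
    "i2": {"indicators": ["i1", "i3"]},
    "i3": {"indicators": ["i2", "i4"]},
    "i4": {"indicators": ["i3"]},
}
result = recommend(docs, ["i1", "i3"])
```

Here i2 comes first because both liked items are among its indicators, followed by i4 with one match.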
Imagine that you’re searching for movies, and you
type “the godfather”
– “the” appears in all documents, so documents matching that
get a low relevance score
– “godfather” appears in very few documents, so matches on
that get a high score
– this is basically TF/IDF in a nutshell
Now, imagine you liked two movies: “The Godfather”
and “The Daytrippers”
– nearly all movies have “The Godfather” as an indicator
– very few have “The Daytrippers”
– the second will therefore influence recommendations much
more
Why does it work?
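The effect of rarity on the score can be shown with a toy inverse-document-frequency calculation. The smoothed formula used here is one common textbook variant and is not claimed to match what any particular search engine computes; the document IDs and indicator names are invented for the example.

```python
import math

def idf(term, docs):
    """Inverse document frequency of an indicator ID across all documents."""
    df = sum(1 for doc in docs.values() if term in doc["indicators"])
    return math.log(len(docs) / (1 + df)) + 1.0

def score(doc, liked, docs):
    """Sum the IDF weights of the liked IDs found in this document."""
    return sum(idf(t, docs) for t in liked if t in doc["indicators"])

# Every movie has "godfather" as an indicator, only one has "daytrippers"
docs = {
    "m1": {"indicators": ["godfather"]},
    "m2": {"indicators": ["godfather"]},
    "m3": {"indicators": ["godfather", "daytrippers"]},
    "m4": {"indicators": ["godfather"]},
}
common = idf("godfather", docs)
rare = idf("daytrippers", docs)
liked = ["godfather", "daytrippers"]
best = max(docs, key=lambda d: score(docs[d], liked, docs))
```

The rare indicator gets the larger weight, so the one document that matches it ends up ranked first.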
Trying it out for real
Part the third
Again, the code is on GitHub
– very simple webapp based on web.py and Lucene
– https://github.com/larsga/py-snippets/tree/master/machine-learning/llr
The underlying data is the MovieLens dataset
– 10 million ratings of 10,000 movies by 72,000 users
– http://grouplens.org/datasets/movielens/
Real demo with real data
llr.py
– this chews the data, producing the significant pairs
– takes a huge amount of memory and about 30 minutes
– I have made absolutely no attempt to optimize it
llr_index.py
– reads output of previous script, makes Lucene index
recom-ui.py
– the actual web application
Three scripts
Liked one movie
Liked two movies
Movies with highest llr score together with this movie
Liked three movies
Recommendations are actually now spot-on. At least for me.
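class Movie:
    def GET(self, movieid):
        nocache()
        # Look up the movie being displayed by its ID.
        doc = search.do_query('id', movieid)[0]
        #recoms = search.do_query('indicators', movieid)
        # doc.bets presumably holds the IDs of the movies with the highest
        # LLR scores for this one; fetch the full document for each.
        recoms = [search.do_query('id', bet_id)[0] for bet_id in doc.bets]
        # Personal recommendations: search the indicators field with
        # everything the user has liked so far in this session.
        if hasattr(session, 'liked'):
            youlike = search.do_query('indicators', session.liked)
        else:
            youlike = []
        return render.movie(doc, recoms, youlike)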
Complete code for movie page
Further work
Winding up
Tweak the parameters a bit to see what happens
Can we support a “Dislike” button?
Test it with more kinds of data
Learn how to do this with Mahout
Things left to do
What is this?
From Ted Dunning’s slides
And this?
From Ted Dunning’s slides
And this?
From Ted Dunning’s slides
The original 1993 paper
– http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.5962
Ebook with lots of background but little detail
– http://www.mapr.com/practical-machine-learning
Slides covering the same material
– www.slideshare.net/tdunning/building-multimodal-recommendation-engines-using-search-engines
Blog post with actual equations
– http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html
References