Personalizing Web Search using Long Term Browsing History
Nicolaas Matthijs (University of Cambridge, UK), Filip Radlinski (Microsoft, Vancouver)
WSDM 2011, 10/02/2011
What is personalized web search?
What is personalized web search? Personalized web search means presenting each user with a different ranking, tailored to their personal interests and information need.
Related Work
Many approaches exist:
- Clickthrough-based approaches, e.g. PClick (Dou et al., 2007): promote URLs previously clicked by the same user for the same query.
- Profile-based approaches, e.g. Teevan et al., 2005: a rich model of user interests, built from search-related information, previously visited web sites, documents on the hard drive, e-mails, etc., used to re-rank the top returned search results.
Goal
- Improve on existing personalized web search techniques: combine a profile-based approach with a clickthrough-based approach, select new features, and build an improved user representation from long-term browsing history.
- Improve on the evaluation methodology: find out whether search personalization makes a difference in real-life usage.
- Improve search result ranking without changing the search environment: develop a tool used by real people.
Search Personalization Process
- User interest extraction: build a user profile representing the user's interests, consisting of a list of weighted terms, a list of all visited URLs with their number of visits, and a list of all search queries and the results clicked.
- Result re-ranking: change the order of results to better reflect the user's interests. Get the first 50 results for the query from Google and re-rank them based on the user profile by giving a score to each snippet.
User Profile Extraction, Step 1: Term List Generation
- Don't treat web pages as normal flat documents, but as structured documents.
- Use different sources of input data: title unigrams, metadata description unigrams, metadata keywords, full text unigrams, extracted terms (Vu et al., 2008), and extracted noun phrases (Clark et al., 2007).
- Specify how important each data source is (weight vector).
- The combination of data sources gives a list of terms to be associated with the user (a sketch follows below).
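The slides do not include code, so the following is only a minimal Python sketch of Step 1, assuming the structured fields named on the slide (title, metadata description, metadata keywords, full text). The extracted-term and noun-phrase sources (Vu et al., 2008; Clark et al., 2007) are omitted; the per-source term counts feed the weighting step later.

```python
import re
from collections import Counter
from html.parser import HTMLParser

class PageExtractor(HTMLParser):
    """Collects the title, metadata fields and visible text of one HTML page."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = ""
        self.meta_keywords = ""
        self.text_parts = []
        self._in_title = False
        self._skip = 0          # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1
        elif tag == "meta":
            name = attrs.get("name", "").lower()
            if name == "description":
                self.meta_description = attrs.get("content", "")
            elif name == "keywords":
                self.meta_keywords = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip:
            self.text_parts.append(data)

def unigrams(text):
    return re.findall(r"[a-z]+", text.lower())

def term_counts_per_source(html):
    """Return {data source: Counter(term -> occurrences)} for one visited page."""
    parser = PageExtractor()
    parser.feed(html)
    return {
        "title": Counter(unigrams(parser.title)),
        "meta_description": Counter(unigrams(parser.meta_description)),
        "meta_keywords": Counter(unigrams(parser.meta_keywords)),
        "full_text": Counter(unigrams(" ".join(parser.text_parts))),
    }
```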
User Profile Extraction, Step 2: Term List Filtering
- Three options: no filtering, WordNet-based POS filtering, or Google N-Gram corpus based filtering.
- Result: a filtered list of terms.
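As a rough illustration of Step 2 only: the slide does not fix the allowed POS tag set or the frequency threshold, so the values below are placeholders. A term survives POS filtering if one of its parts of speech is allowed, and N-Gram filtering if it occurs often enough in the Google N-Gram counts.

```python
def filter_terms(terms, pos_lookup=None, ngram_counts=None,
                 allowed_pos=frozenset({"noun", "adjective"}), min_ngram_count=10_000):
    """Apply the optional filters to a term list.

    pos_lookup:   term -> set of parts of speech (e.g. derived from WordNet); None = skip.
    ngram_counts: term -> occurrence count in the Google N-Gram corpus; None = skip.
    Passing None for both corresponds to the "no filtering" option.
    """
    kept = []
    for term in terms:
        if pos_lookup is not None and not (pos_lookup.get(term, set()) & allowed_pos):
            continue   # WordNet-based POS filtering
        if ngram_counts is not None and ngram_counts.get(term, 0) < min_ngram_count:
            continue   # Google N-Gram corpus based filtering
        kept.append(term)
    return kept
```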
User Profile Extraction, Step 3: Term Weighting
- Calculate a weight for each term, using TF, TF-IDF or pBM25 (Teevan et al., 2005).
- TF is the dot product of the data-source weight vector and the term's frequency vector; TF-IDF divides this by the log of the term's document frequency, estimated from the Google N-Gram corpus.
- Result: the user profile, a list of terms and term weights.
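A sketch of the three weighting schemes, following the description in the speaker notes. The exact pBM25 form is not given on the slide; the function below uses the standard Robertson/Sparck Jones relevance weight as a stand-in for Teevan et al.'s personalized BM25, and the data-source weights in the example are made up, so treat this as an approximation rather than the authors' exact formulas.

```python
import math

def tf_weight(source_weights, freq_by_source):
    """TF: dot product of the data-source weight vector and the term's frequency vector."""
    return sum(w * freq_by_source.get(source, 0) for source, w in source_weights.items())

def tf_idf_weight(source_weights, freq_by_source, doc_freq):
    """TF-IDF: the TF weight divided by the log of the term's document frequency,
    estimated from the Google N-Gram corpus. The +2 keeps the denominator positive."""
    return tf_weight(source_weights, freq_by_source) / math.log(doc_freq + 2)

def pbm25_weight(N, n_ti, R, r_ti):
    """Stand-in for pBM25 (Teevan et al., 2005), using the classic relevance weight.
    N    : number of documents on the web (derived from the Google N-Gram corpus)
    n_ti : number of those documents containing the term
    R    : number of documents in the user's browsing history
    r_ti : number of history documents containing the term
    """
    return math.log(((r_ti + 0.5) * (N - n_ti - R + r_ti + 0.5)) /
                    ((n_ti - r_ti + 0.5) * (R - r_ti + 0.5)))

# Hypothetical data-source weights (the paper tunes these; the values here are made up).
weights = {"title": 1.0, "meta_description": 1.0, "meta_keywords": 1.0, "full_text": 1.0}
print(tf_weight(weights, {"title": 2, "full_text": 7}))                         # 9.0
print(tf_idf_weight(weights, {"title": 2, "full_text": 7}, doc_freq=1_000_000))
```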
Search Personalization Process (recap)
- User interest extraction: a user profile representing the user's interests (a list of weighted terms), a list of all visited URLs and number of visits, and a list of all search queries and results clicked.
- Result re-ranking: change the order of results to better reflect the user's interests. Get the first 50 results for the query from Google and re-rank them based on the user profile by giving a score to each snippet.
Result Re-ranking
- Step 1: Snippet scoring, using one of three scoring methods: Matching, Unique Matching, or a Language Model (a sketch follows below).
- Step 2: Take the Google rank into account.
- Step 3: Give extra weight to previously visited pages.
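The speaker notes define the three snippet scores: Matching sums, over all snippet terms, the term's frequency in the snippet times its profile weight; Unique Matching ignores repeated occurrences; the Language Model score is the probability of the snippet given the user profile. A minimal sketch follows; the smoothing, the rank prior and the visited-page bonus are illustrative placeholders, since the paper treats these as tunable parameters.

```python
import math
import re

def tokenize(snippet):
    return re.findall(r"[a-z]+", snippet.lower())

def matching_score(snippet, profile):
    """Matching: sum over snippet terms of (frequency in snippet) x (profile weight)."""
    return sum(profile.get(t, 0.0) for t in tokenize(snippet))

def unique_matching_score(snippet, profile):
    """Unique Matching: like Matching, but each distinct term counts only once."""
    return sum(profile.get(t, 0.0) for t in set(tokenize(snippet)))

def lm_score(snippet, profile, mu=1.0):
    """Language Model: log-probability of the snippet under the smoothed profile distribution."""
    total = sum(profile.values()) or 1.0
    vocab = len(profile) or 1
    return sum(math.log((profile.get(t, 0.0) + mu) / (total + mu * vocab))
               for t in tokenize(snippet))

def rerank(results, profile, visited_urls, rank_weight=1.0, visited_bonus=1.0):
    """Re-rank Google's top results: snippet score + original-rank prior + visited-URL boost.
    `results` is a list of (url, snippet) pairs in the original Google order; the LM score
    is used here because the offline evaluation found it to perform best."""
    scored = []
    for rank, (url, snippet) in enumerate(results, start=1):
        score = lm_score(snippet, profile)
        score += rank_weight * math.log(1.0 / rank)   # Step 2: take the Google rank into account
        if url in visited_urls:
            score += visited_bonus                    # Step 3: extra weight for visited pages
        scored.append((score, url, snippet))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(url, snippet) for _, url, snippet in scored]
```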
Evaluation
- Evaluation is a difficult problem. Most previous work either had a small number of users judge the relevance of a small number of queries (Teevan et al., 2005), simulated a personalized search setting using a TREC query and document collection, or performed after-the-fact log-based analysis (Dou et al., 2007).
- We wanted to find out whether personalization yields a real difference in real-life usage.
- Ideally: real-life usage data from many users over a long time. Unfeasible, because of the high number of parameters. Hence a two-step evaluation process.
Evaluation: Capturing Data
- We need users and data to work with; full browsing histories are not publicly available.
- Firefox add-on: 41 users over 3 months, capturing 530,334 page visits and 39,838 Google searches.
Step 1: Offline Relevance Judgments
- Goal: identify the most promising parameter configurations.
- Offline evaluation session: 6 users assess the relevance of the top 50 results for 12 queries.
- Assess all possible combinations of all parameters and calculate the NDCG score for each ranking (Järvelin et al., 2000); a sketch of the metric follows below.
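NDCG scores a ranking by summing graded relevance with a logarithmic discount by position, normalized by the score of the ideal ordering. A minimal implementation, assuming the 0/1/2 relevance grades mentioned in the speaker notes and the usual log2 discount (the paper's exact gain/discount variant is not stated on the slide):

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of graded relevance labels (0, 1, 2)."""
    return sum(rel / math.log2(position + 2) for position, rel in enumerate(relevances))

def ndcg(ranked_relevances, k=50):
    """NDCG@k: DCG of the ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(ranked_relevances, reverse=True)[:k])
    return dcg(ranked_relevances[:k]) / ideal if ideal > 0 else 0.0

# Example: relevance labels of the top 5 results under some re-ranking.
print(round(ndcg([2, 0, 1, 2, 0]), 2))   # ~0.89; the ideal ordering is [2, 2, 1, 0, 0]
```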
Step 1: Results
- 15,878 profile + re-ranking combinations investigated, compared against 3 baseline systems (Google, PClick and Teevan).
- 4,455 combinations beat Google, 3,335 beat Teevan, and 1,580 beat PClick.
- Identified the 4 most promising personalization approaches.
Step 1: Results (continued)
- Treating web pages as flat documents does not work; advanced NLP techniques and keyword-focused approaches work best.
- One re-ranking method outperforms all of the others: the Language Model, with extra weight for visited URLs, taking the Google rank into account.
Step 2: Online Interleaved Evaluation
- Assess the selected personalization techniques.
- Extend the Firefox add-on to personalize results in the user's browser as they search.
- Interleaved evaluation using the Team-Draft Interleaving algorithm (Radlinski et al., 2008), which has been shown to accurately reflect differences in ranking relevance (Radlinski et al., 2010).
Step 2: Online Interleaved Evaluation. Count which ranking is clicked most often.
Original ranking (Google):
1. Infrared - Wikipedia (http://wikipedia.org/infrared)
2. IRTech - Infrared technologies (http://www.irtech.org)
3. International Rectifier - Stock Quotes (http://finance.yahoo.co.uk/IRE)
4. SIGIR - New York Conference (http://www.sigir.org)
5. About Us - International Rectifier (http://www.inrect.com)
Personalized ranking:
1. SIGIR - New York Conference (http://www.sigir.org)
2. Information Retrieval - Wikipedia (http://wikipedia.org/ir)
3. IRTech - Infrared technologies (http://www.irtech.org)
4. Infrared - Wikipedia (http://wikipedia.org/infrared)
5. About Us - International Rectifier (http://www.inrect.com)
Interleaved ranking:
1. SIGIR - New York Conference (P)
2. Infrared - Wikipedia (O)
3. IRTech - Infrared technologies (O)
4. Information Retrieval - Wikipedia (P)
5. International Rectifier - Stock Quotes (O)
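As a rough sketch of the Team-Draft Interleaving step: the two "teams" alternately contribute their highest-ranked result that is not yet in the merged list, with a coin flip breaking ties on whose turn it is, and the team that contributed more of the clicked results wins the query. The published algorithm also restricts click credit to results at or above the lowest click; the bookkeeping below is simplified.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, rng=random):
    """Merge two rankings; returns the interleaved list and the team behind each result."""
    interleaved, team_of, seen = [], {}, set()
    picks_a = picks_b = 0
    remaining = lambda ranking: [doc for doc in ranking if doc not in seen]
    while remaining(ranking_a) or remaining(ranking_b):
        # The team with fewer picks goes next; a coin flip breaks ties.
        a_turn = picks_a < picks_b or (picks_a == picks_b and rng.random() < 0.5)
        if a_turn and not remaining(ranking_a):
            a_turn = False   # team A is exhausted, let B pick
        elif not a_turn and not remaining(ranking_b):
            a_turn = True    # team B is exhausted, let A pick
        source = ranking_a if a_turn else ranking_b
        doc = remaining(source)[0]            # the team's highest-ranked unused result
        interleaved.append(doc)
        team_of[doc] = "original" if a_turn else "personalized"
        seen.add(doc)
        if a_turn:
            picks_a += 1
        else:
            picks_b += 1
    return interleaved, team_of

def winner(team_of, clicked_docs):
    """Credit each click to the contributing team; the team with more credited clicks wins."""
    credits = {"original": 0, "personalized": 0}
    for doc in clicked_docs:
        if doc in team_of:
            credits[team_of[doc]] += 1
    if credits["original"] == credits["personalized"]:
        return "tie"
    return max(credits, key=credits.get)

# Example using the result titles from the slide above.
google = ["Infrared", "IRTech", "Intl Rectifier Quotes", "SIGIR", "Intl Rectifier About"]
personalized = ["SIGIR", "IR - Wikipedia", "IRTech", "Infrared", "Intl Rectifier About"]
merged, teams = team_draft_interleave(google, personalized)
print(winner(teams, clicked_docs=["SIGIR"]))   # "personalized": that side contributed the click
```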
Results
- 41 users / 2 weeks / 7,997 queries.
- MaxNDCG significantly (p < 0.001) outperforms Google; MaxBestPar significantly (p < 0.01) outperforms Google; MaxQuer significantly (p < 0.05) outperforms Google.
- Run on all queries: 70% of queries untouched, 20% improved, 10% made worse; average improvement of 4 ranks, average deterioration of 1 rank.
- One strategy is consistently the best: TF-IDF, RTitle, RMKeyw, RCCParse, NoFilt - LM, Look At Rank, Visited.
Future Work
- Expand the set of parameters: learn the optimal weight vector, use other fields, and use temporal information (how much browsing history should be used? decay weighting of older items), page visit duration and other behavioral information.
- Use the extracted profile for other purposes.
Conclusion
- Outperform Google and the previous best personalization strategies by building an improved user profile: don't treat web pages as flat documents, and use more advanced NLP techniques.
- Improve upon the evaluation methodology: the first large online comparative evaluation of personalization techniques, investigating whether personalization makes a difference in real-life usage, done in an academic setting with no large datasets available.
- A tool that can be downloaded and used by everyone; the code is open-sourced, clean and readable.
Questions

Editor's Notes

• #2: Talking about the paper (say the title), which was done as part of my Master's thesis at the University of Cambridge, supervised by Filip from Microsoft Research.
• #3: A search for "IR" is a short, ambiguous query. To the search engine it looks the same for every user, even though the information need differs: a physicist is more likely to be interested in infrared, an attendee of this conference in information retrieval, and a stock broker in stock information about International Rectifier. All are presented the same ranking, which is not optimal.
• #4: Put more emphasis on their interests.
• #5: There is quite a lot of research on personalized web search, but in general we see two different approaches. PClick is the best clickthrough-based approach that we found and compare against; Teevan is the best profile-based approach that we found and compare against.
• #6: Three major goals: improve personalization, improve evaluation, and create a tool that people can use.
• #7: Search personalization is a two-step process: the first step is extracting the user's interests and the second is re-ranking the search results. The user is represented by the items listed on the slide; the last two can be trivially extracted from the browsing history, while the user profile has to be learned.
• #8: Use the structure encapsulated in the HTML code: title, metadata description, full text, metadata keywords, extracted terms, and noun phrases. Specify how important each data source is; we limited ourselves to giving each data source a weight of 0, 1, or a relative value.
• #9: WordNet: include only terms with one of a given set of POS tags. N-Gram: include only terms that appear more than a given number of times on the web.
• #10: Calculate a weight for each term. The frequency vector holds the number of occurrences of the term in each of the data sources. TF weighting: the dot product of the weight vector and the frequency vector. TF-IDF: divide by the log of the document frequency. Normally the document frequency is calculated from the browsing history, but a word that shows up a lot in your browsing history really is relevant relative to all the information on the internet, so we used the Google N-Gram information instead. pBM25: N = number of documents on the internet, derived from the Google N-Gram corpus; n_ti = number of documents containing the term (N-Gram); R = number of documents in the browsing history; r_ti = number of documents in the browsing history that contain the term.
• #11: Second step: re-rank the results given the user profile. It was previously shown that re-ranking snippets works just as well, as they are less noisy and more keyword-focused, and it also makes for a more realistic implementation.
• #12: The score is an indication of how relevant the result is for the current user. Matching: the sum over all snippet terms of the frequency of the term in the snippet times the weight of the term. Unique matching: ignore multiple occurrences of the same term. Language Model: the probability of the snippet given the user profile. The extra weight for previously visited pages is an extension of the PClick concept.
• #13: It is difficult to show how the personalization impacts day-to-day search activity. The first step is an offline relevance judgment exercise in which we try to come up with parameter configurations that work well. The second step is a large-scale online evaluation to check how well those parameter configurations generalize to unseen users and browsing histories, and whether personalization makes a difference in real life.
• #14: We capture data implicitly, since we don't want to require additional user actions. A unique identifier is generated for every user, so the data stays anonymous. On every page visit the add-on stores the URL, the length of the HTML, the visit duration, and the time and date, except for secure HTTPS pages. This is stored in a database, and the server then fetches the actual HTML.
• #15: Relevance grades: Not Relevant (0), Relevant (1), Very Relevant (2). Normalized Discounted Cumulative Gain (NDCG) is the rank quality score.
• #16: MaxNDCG = the approach that yielded the highest average NDCG score (0.568 versus 0.506). MaxQuer = the approach that improved the highest number of queries (52 out of 72). MaxBestPar = obtained by greedily selecting each parameter in a given order. MaxNoRank = the best approach that does not take the Google ranking into account; it is interesting that we were able to find an approach that outperformed Google on its own, but we later found that this is probably a case of overfitting the training data, as it did not generalize in the online evaluation.
• #17: Using the entire list of words performed considerably worse.
• #18: Interleaved evaluation presents a single ranking that interleaves the two rankings, in order to evaluate which one is of higher quality.
• #22: Better than anything published so far.