Using Query Reformulation for User Profiling
Jim Jansen
College of Information Sciences and Technology
The Pennsylvania State University
[email_address]
Interested in how much descriptive information we can generate about a person by leveraging search log data.
What Did We Find Out?
- We can tell quite a lot about a user!
- When combined with other information, query reformulation is a revealing searching characteristic.
The State of Web Search: why search data is important
The Power of Search and the Web
- Search is the top online activity.
- Search drives over 7 billion monthly queries in the U.S.
- Online activity has a huge impact on people’s daily lives:
  - 70 minutes less with family
  - 30 minutes less TV
  - 8.5 minutes less sleep
Sources: comScore, U.S., Feb. ’06; Stanford Institute for the Quantitative Study of Society, Nov. ’05
Analysis of Search Marketplace
Holding fairly stable over the last year or so, albeit with some Bing flux.
Search Logs
- Contain the trace data recorded when a person visits the search engine, submits a query, views results, etc.
- On one hand, logs have been criticized for not being rich enough (i.e., they record behaviors but not the ‘why’ factors).
- On the other hand, logs have been criticized for recording too much about us (i.e., logging a lot of personal information about a person).
- How much can we learn about a person from the data stored in search logs?
- Specifically, how rich a searcher profile can we build of what a person is doing and why they are doing it, and how well can we predict what they are going to do next?
An illustrative example
How much can we tell from a single query?
- ASIS&T is an acronym for the American Society for Information Science and Technology.
- Good probability that this user is an academic, a researcher, a librarian, or a student in one of these disciplines.
- Leveraging demographic information:
  - 57 percent female / 43 percent male probability
  - 66.2 percent chance the user works in the information science field
  - 55.6 percent probability this user has a master’s degree
How much can we tell from a single query?
- Leveraging demographic information (cont’d):
  - 32.3 percent probability this user has a doctorate
  - 53 percent likelihood the user works in academia
- Using IP, we can locate the geographical area.
- Based on time, we could infer that this person is:
  - searching for the conference’s schedule for travel (if the query is submitted prior to the meeting), or
  - looking for presentations or papers from the meeting (if the query is submitted after the conference).
- Theoretically, we can tell a lot! However, with billions of queries per month, we can’t do the analysis by hand as in this example. To develop user profiles, we need automated methods.
Research Question: How complete a profile can one develop for a Web search engine user from search log data? [(a) what the user is doing, (b) what the user is interested in, and (c) what the user intends to do]
Specific aspects with automated methods …
- Location: where the user is
- Geographical interest: where the user is going
- Topical interest: what the user is interested in
- Topical complexity: how motivated the user is
- Content desires: Info, Nav, Transactional
- Commercial intent: eCommerce related
- Purchase intent: getting ready to buy
- Potential to click on a link: will the user click on a link
- Gender: demographic targeting/personalization
- User identification: specific user targeting
Automated methods using query reformulation
- Location
- Geographical interest
- Topical interest
- Topical complexity (n-grams pattern analysis)
- Content desires
- Commercial intent
- Purchase intent
- Potential to click on a link
- Gender
- User identification
Where to get the full story?
The methodological implementation is reported in the paper in your ASIST proceedings:
Jansen, B.J., Zhang, M., Booth, D., Park, D., Zhang, Y., Kathuria, A. and Bonner, P. (2009) To What Degree Can Log Data Profile a Web Searcher? Proceedings of the American Society for Information Science and Technology 2009 Annual Meeting. Vancouver, British Columbia. 6-11 November.
Topical Complexity
The number of queries by a user in a session on a topic can tell us many things:
- the complexity of the topic
- the user’s motivation for the need
- a prediction of future action
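As a rough illustration of the measure described here, the sketch below counts queries per user session. The row layout `(user_id, session_id, query)` and the sample rows are assumptions made for the example, and a session is treated as a stand-in for a single topic, which a real log would not guarantee.

```python
from collections import Counter

# Hypothetical log rows: (user_id, session_id, query).
# The field layout and the sample data are assumptions, not from the talk.
log = [
    ("u1", "s1", "asist 2009"),
    ("u1", "s1", "asist 2009 schedule"),
    ("u1", "s1", "asist 2009 vancouver hotel"),
    ("u2", "s2", "weather"),
]

def queries_per_session(rows):
    """Count how many queries each (user, session) pair submitted."""
    return Counter((user, session) for user, session, _query in rows)

counts = queries_per_session(log)
for (user, session), n in counts.items():
    print(user, session, n)
```

A longer session on one topic would then be read as a sign of a more complex topic or a more motivated searcher, in line with the points above.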
Information Searching
- Probabilistic user modeling
  - increasingly important area
  - allows computer systems to adapt to users
- Algorithmic techniques typically employ state models
  - Simple Bayesian Classifier
  - Markov Modeling
  - n-grams
Note: searching is not always ‘informational’ anymore. Many times people are searching for ‘other things’. Rose & Levinson (2004); Jansen, Booth, & Spink (2008).
Illustration of Probabilistic User Modeling Using n-grams
Given these states …

User   Search State Transitions
1      ABCF
2      ABCDE
3      ABCDE
4      A
5      AC

… how accurately can we predict these?

Predictive Pattern   Next State?   Accuracy
AB                   C             100%
BC                   D             66%
CD                   E             100%
A                    B             60%
C                    D             40%
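The n-gram idea on this slide can be sketched in a few lines of Python. This is a minimal illustration and not the implementation from the paper: the five sessions are taken from the slide, while the `END` marker and the choice to count "session ended" as a possible outcome are assumptions. Under that counting the sketch reproduces the figures for the patterns AB, BC, CD and A; for the pattern C it yields 50% (2 of 4) rather than the 40% listed, so the slide may use a different denominator for that row.

```python
from collections import Counter, defaultdict

END = "$"  # assumed marker for "session ended"; not part of the slide

# The five state sequences from the slide, one per user.
sessions = ["ABCF", "ABCDE", "ABCDE", "A", "AC"]

def next_state_table(sessions, n):
    """Map each length-n pattern to a Counter of the states that follow it."""
    table = defaultdict(Counter)
    for s in sessions:
        padded = s + END
        for i in range(len(s) - n + 1):
            pattern = padded[i:i + n]
            table[pattern][padded[i + n]] += 1
    return table

def best_prediction(table, pattern):
    """Return (most frequent next state, its share of all observed outcomes)."""
    outcomes = table[pattern]
    state, hits = outcomes.most_common(1)[0]
    return state, hits / sum(outcomes.values())

first_order = next_state_table(sessions, 1)
second_order = next_state_table(sessions, 2)

print(best_prediction(second_order, "AB"))
print(best_prediction(second_order, "BC"))
print(best_prediction(first_order, "A"))
```

Raising `n` corresponds to moving to a higher-order model: longer patterns are more specific but are observed less often.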
Example Using Search Log
- ~965,000 searching sessions
- ~1,500,000 queries
- 8 states focusing on query reformulation
- Similar results for other aspects of searching. See Qiu (1993), Jansen (2005), Jansen & McNeese (2006).
- Maybe ‘states’ are not the correct paradigm?
- 10% improvement from 1st to 2nd order: okay, but would like to do better
- Drop-out rate (folks who don’t submit a query): ~40%
[Chart: Accuracy of Prediction (y-axis, 0.1 to 0.6) by Order of the Model (x-axis: 0, 1st, 2nd, 3rd, 4th); values shown: 0.28, 0.40, 0.47, 0.44, 0.44, 0.60]
Jansen, B. J., Booth, D. L., & Spink, A. (2009). Patterns of query modification during Web searching. Journal of the American Society for Information Science and Technology.
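The talk models eight query reformulation states, but this transcript does not name them. To make the idea of turning consecutive queries into states concrete, here is a hedged sketch that labels each query by comparing its terms with the previous query. The label names and the term-set rules are illustrative assumptions and should not be read as the scheme used in the study.

```python
def classify_reformulation(previous, current):
    """Label the move from one query to the next by comparing term sets.

    The labels are illustrative assumptions, not the eight states from the talk.
    """
    if previous is None:
        return "initial"
    prev_terms = set(previous.lower().split())
    curr_terms = set(current.lower().split())
    if prev_terms == curr_terms:
        return "repeat"
    if prev_terms < curr_terms:
        return "specialization"  # terms were only added
    if curr_terms < prev_terms:
        return "generalization"  # terms were only removed
    if prev_terms & curr_terms:
        return "reformulation"   # some terms kept, others swapped
    return "new"                 # no terms in common

def session_states(queries):
    """Convert a session's ordered queries into a list of state labels."""
    states = []
    previous = None
    for query in queries:
        states.append(classify_reformulation(previous, query))
        previous = query
    return states

# Hypothetical session used only to exercise the function.
example = [
    "asist",
    "asist conference",
    "asist conference vancouver",
    "asist conference",
    "asist papers",
    "hotels vancouver",
]
print(session_states(example))
```

A sequence of labels like this is the kind of input an n-gram or Markov model over states would consume.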
User Profiling Framework
- Classify user aspects into two levels: internal and external.
  - Internal aspects refer to attributes of the users themselves.
  - External aspects relate to the behavior or interest of the users.
- Interaction between internal and external aspects:
  - Can infer external aspects from internal aspects.
  - External aspects reflect internal aspects.
Thank you! (open for questions and further discussion)
Jim Jansen
College of Information Sciences and Technology
The Pennsylvania State University
[email_address]
Search logs have some common fields, such as time, queries, results, etc. We can enrich the log with additional fields.
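One way to picture an enriched log record is the sketch below. The three base fields follow the ones named above (time, query, results); treating `results` as a count, and the two derived fields `query_length` and `hour_of_day`, are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class LogRecord:
    # Common fields named on the slide.
    time: datetime
    query: str
    results: int  # assumed here to be the number of results returned
    # Enrichment fields derived from the common ones; names are hypothetical.
    query_length: int = field(init=False)
    hour_of_day: int = field(init=False)

    def __post_init__(self):
        # Number of whitespace-separated terms in the query.
        self.query_length = len(self.query.split())
        # Hour at which the query was submitted (0-23).
        self.hour_of_day = self.time.hour

record = LogRecord(datetime(2009, 11, 9, 14, 30), "asist 2009 schedule", 10)
print(record)
```

Derived fields like these can be computed once when the log is loaded, so later analyses do not need to re-parse the raw query text.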
The Use of Query Reformulation to Predict Future User Actions
