Techniques for 
Deep Query Understanding 
“Beware of the man who knows the answer before he understands the 
question” 
Guided By: 
Dr. Dhaval Patel, 
Assistant Professor, 
Department Of CSE, 
IIT Roorkee. 
Presented By: 
Abhay Prakash, 
En. No. – 10211002, 
CSI, V Year, 
IIT Roorkee.
(Source: Google) 
Introduction: Query Understanding
 Purpose:
 To understand what exactly the user is searching for – the user's precise intent
 To correct mistakes and guide the user to formulate a precisely intended query
Query Refinement 
Why only this phrase in Bold? 
Query Suggestion (Source: Google)
Emerging Variety of Queries 
 Natural Language Queries instead of Keyword Represented Queries 
 “who is the best classical singer in India” instead of “best classical singer India” 
 Use of NL queries is increasing (Makoto et al. in [1])
 Local Search Queries 
 “Where can I eat cheesecake right now?” 
 Context Dependent Queries (Interactive Question Answering)
(Source: Bing – location set to US)
Background: How results are generated
High Level Architecture of Search Mechanism (Source: Self Made):
Data (text documents, user reviews, blogs, tweets, LinkedIn ...) → Document Understanding (what and how to index) → INDEX (Knowledge Base)
User Query → Query Understanding (which index parameters to be used) → lookup in the INDEX → Results Ranking
Example document: "Review: Hotel ABC, Civil Lines: I ate cheesecake, which was really awesome. (4/5 star)" [Time: 8:15 PM], indexed as {Entities: Hotel ABC, cheesecake; Location: Civil Lines; Quality: 0.8; Time: 8:15 PM}
Example query: "where can I eat cheesecake right now?", understood as {Intent: Hotel Search; Search for: cheesecake; Location: Civil Lines; Time: 8:20 PM}
Background: QU & Adv. In Search (Weotta in [3]) 
1. Basic Search 
 Direct text match based retrieval of documents 
 Restrict search space using facet values provided by user 
 Current day example: Online shopping sites 
Mechanism in Basic Search (Source: Self Made) 
Example of Facets (Source: Flipkart.com)
Background: QU & Adv. In Search (Weotta in [3]) 
2. Advanced Search 
 Ranking of result documents based on: 
 TF-IDF to identify more relevant documents 
 Website authority and popularity 
 Keyword weighting 
 Not Considered: 
 Context, NLP for semantic understanding 
 Location of query, time of query 
 Example: Google in its early stages
Background: QU & Adv. In Search (Weotta in [3]) 
3. Deep Search 
 What difference does it bring? 
 Requirements: 
 Semantic Understanding of Query 
 Knowledge of Context, previous tasks 
 User Understanding and Personalization
Architecture: Query Understanding Module
Components of Query Understanding Module (Source: Self Made):
1. Query Suggestion
2. Query Refinement: Query Correction, Query Expansion
3. Query Intent Detection: Query Classification, Semantic Tagging
The QUERY UNDERSTANDING MODULE takes the user Query as input; its output feeds the ANSWER GENERATION MODULE, which produces the Result.
Architecture: Query Understanding Module
Example of the purpose of each component (Source: Self Made):
Query: "michal jrdan"
Query Correction: "michael jordan"
Query Suggestion: i) michael jordan berkley  ii) michael jordan NBA
Query Expansion: i) michael jordan berkley  ii) michael l. jordan berkley
Query Classification: i) michael jordan berkley: academic  ii) michael l. jordan berkley: academic
Semantic Tagging: i) [michael jordan: PersonName] [berkley: Location]: academic  ii) [michael l. jordan: PersonName] [berkley: Location]: academic
Query Correction
 Reformulates ill-formed (mistaken) search queries
 ex. "Macine learning" → "Machine Learning"
 Refinements:
 Spelling error, two words merged together, one word separated
 Phrase segmentation (machine + learning → machine learning)
 Acronym expansion (CSE → Computer Science & Engineering)
 Refinements may be mutually dependent
 "lectures on machne learn"
 "learn" is a correct term, but should have been "learning"
 Hence, different terms need to be addressed simultaneously
 Problem modeled by Jiafeng et al. in [10] as
 Original query x = x_1 x_2 ... x_n → corrected query y = y_1 y_2 ... y_n
 Get the complete sequence y with maximum probability given the sequence x, i.e. argmax_y Pr(y | x)
 Simple Technique (a minimal sketch follows below)
 Assume terms are independent → take each y_i with max Pr(y_i | x_i)
 Prime Disadvantage:
 Reality deviates a lot from this assumption
 Ex. "Lectures on machine learning"
Independent Corrections 
Query Correction
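To make the weakness concrete, here is a minimal Python sketch of such an independent, per-term corrector; the unigram counts, the difflib-based candidate generation, and the keep-if-in-vocabulary rule are illustrative assumptions, not the method of [10].

import difflib

# Hypothetical unigram counts, as if harvested from a query log (illustrative only).
UNIGRAM_COUNTS = {"machine": 120000, "learning": 95000, "learn": 30000,
                  "lectures": 40000, "on": 500000}

def correct_term(x_i, n_candidates=5):
    """Pick y_i with maximum (crude) Pr(y_i | x_i), treating every term in isolation."""
    if x_i in UNIGRAM_COUNTS:        # already a valid term -> left untouched
        return x_i
    candidates = difflib.get_close_matches(x_i, list(UNIGRAM_COUNTS),
                                           n=n_candidates, cutoff=0.6)
    if not candidates:
        return x_i                   # nothing close enough; keep the original term
    return max(candidates, key=UNIGRAM_COUNTS.get)   # most frequent close term wins

def correct_query(query):
    return " ".join(correct_term(t) for t in query.lower().split())

print(correct_query("lectures on machne learn"))
# -> "lectures on machine learn": "machne" is fixed, but "learn" survives because,
#    taken in isolation, it is a valid term -- exactly the failure case noted above.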
Query Correction 
Using Conventional CRF 
 What is CRF? 
 Probabilistic graphical model, models conditional distribution 
of unobserved state sequences 
 Conditioned on a given observation sequence
 Trained to estimate Pr(y | sequence x)
 Why use CRF? Conditioned on? 
 Sequence of words matters (learning machine?) 
 y_i is conditioned on the other y_i's as well, along with x_i
 Corrections are mutually dependent (e.g. machine learning) 
 Disadvantage: 
 Would require a very large amount of data; the domain of y_i candidates is open
Conventional CRF
 Restricting space of y for the given x 
 Condition y_i on an operation as well
 o = o_1 o_2 ... o_n, such that o_i is the operation required to get y_i from x_i
 o_i is an operation like deletion or insertion of characters, etc.
 Learning and Prediction
 Dataset of (x^(1), y^(1), o^(1)), ..., (x^(N), y^(N), o^(N))
 Features (sketched below)
 log Pr(y_{i-1} | y_i), where the probability is calculated from a corpus
 Whether y_i is obtained from x_i after operation o_i -- {0|1}
Basic CRF-QR Model 
Query Correction 
Basic CRF-QR Model (Jiafeng et al. in [10])
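A rough Python sketch of how the two feature values above could be computed for one (x_i, y_i, o_i) position; the bigram table, the add-one smoothing, and the operation encoding are illustrative assumptions, not the exact feature functions of [10].

import math

# Hypothetical corpus counts used for the log Pr(y_{i-1} | y_i) feature.
BIGRAM = {("machine", "learning"): 9000, ("learning", "machine"): 300}
UNIGRAM = {"machine": 120000, "learning": 95000}

def f_bigram(y_prev, y_i):
    """log Pr(y_{i-1} | y_i), with add-one smoothing so unseen pairs stay finite."""
    return math.log((BIGRAM.get((y_prev, y_i), 0) + 1) /
                    (UNIGRAM.get(y_i, 0) + len(UNIGRAM)))

def apply_operation(x_i, o_i):
    """Apply a character-level operation, e.g. ('insert', pos, ch) or ('delete', pos)."""
    if o_i[0] == "insert":
        _, pos, ch = o_i
        return x_i[:pos] + ch + x_i[pos:]
    if o_i[0] == "delete":
        _, pos = o_i
        return x_i[:pos] + x_i[pos + 1:]
    return x_i                        # identity / no-op

def f_operation(x_i, y_i, o_i):
    """Binary feature: does operation o_i really turn x_i into y_i? -> {0|1}"""
    return int(apply_operation(x_i, o_i) == y_i)

# x_i = "machne", o_i inserts 'i' at position 4, giving y_i = "machine"
print(f_operation("machne", "machine", ("insert", 4, "i")))   # 1
print(round(f_bigram("machine", "learning"), 3))              # log Pr(machine | learning)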
 What is new? 
 Handles scenarios with more than one refinement
 Ex. "Machine learm" → learn → learning
 Sequence of (sequences of operations)
 o_i = o_{i,1}, o_{i,2}, ..., o_{i,n}, i.e. multiple operations on each word
 Intermediate results: z_i = z_{i,1} z_{i,2} ... z_{i,m-1}
Extended CRF-QR 
Query Correction 
Extended CRF-QR Model (Jiafeng et al. in [10])
Query Suggestion 
 Purpose: 
 Suggest similar queries 
 Query auto-completion 
 Requirements 
 Context consideration [7] 
 Identifying Interleaved Tasks [9] 
 Personalized suggestion [2] 
Suggestions on “iit r..”
Context-aware Query Suggestion (Huanhuan et al. in [7])
Query Suggestion Mechanism (Source: [7]) 
Query Suggestion 
 Each query is mapped to a Concept
 A concept suffix tree is built from the query log
 At suggestion time: transition on the tree with each query's concept
 Suggest the top queries stored at the reached state (a toy sketch follows below)
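A toy Python sketch of the suggestion-time walk; the dict-based "tree" keyed by most-recent-first concept sequences and the query-to-concept lookup are simplifying assumptions standing in for the concept suffix tree and clustering of [7].

CONCEPT_TREE = {
    ("C1",):            ["gmail login", "gmail sign up"],
    ("C3", "C1"):       ["gmail attachment size limit"],
    ("C2", "C3", "C1"): ["export contacts from gmail"],
}

def map_to_concept(query):
    """Stand-in for the click-through clustering of [7]; here just a toy lookup."""
    return {"gmail": "C1", "email attachments": "C3", "contacts": "C2"}.get(query, "C?")

def suggest(session_queries, k=3):
    """Walk the tree using the session's concepts, most recent first,
    from the longest matching context down to the shortest."""
    concepts = tuple(map_to_concept(q) for q in reversed(session_queries))
    for cut in range(len(concepts), 0, -1):
        node = CONCEPT_TREE.get(concepts[:cut])
        if node:
            return node[:k]          # top queries stored at the matched state
    return []

print(suggest(["gmail", "email attachments"]))   # matches state ("C3", "C1")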
Concept Suffix Tree 
 Concept Discovery 
 Queries clustered using set of clicked URLs 
 Feature vector q_i: q_i[j] = norm(w_ij) if edge e_ij exists, 0 otherwise
 Each identified cluster is taken as a Concept 
 Concept Suffix Tree 
 Vertex: state after transition through a sequence of 
concepts (of queries) 
 Transition in a session 
 C2C3C1: transition Beginning → C1 → C3 → C2
Click-Through Bipartite 
Query Suggestion 
Context-aware Query Suggestion (Huanhuan et al. in [7])
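A small Python sketch of the concept-discovery step above: queries become normalized click vectors over URLs and near-identical vectors are grouped; the single-pass threshold clustering is a simplification of the clustering algorithm in [7], and the click data is made up.

import math

# Toy click-through data: query -> {clicked URL: click count}
CLICKS = {
    "gmail":       {"mail.google.com": 40, "en.wikipedia.org/wiki/Gmail": 5},
    "google mail": {"mail.google.com": 25},
    "imdb":        {"imdb.com": 50},
}

def click_vector(query):
    """Feature vector q_i: normalized edge weights to clicked URLs (0 if no edge)."""
    counts = CLICKS[query]
    norm = math.sqrt(sum(w * w for w in counts.values()))
    return {url: w / norm for url, w in counts.items()}

def cosine(u, v):
    return sum(u[k] * v[k] for k in u.keys() & v.keys())

def discover_concepts(threshold=0.5):
    """Single-pass clustering of the click-through bipartite; each cluster = one Concept."""
    concepts = []
    for q in CLICKS:
        vq = click_vector(q)
        for cluster in concepts:
            if any(cosine(vq, click_vector(member)) >= threshold for member in cluster):
                cluster.add(q)
                break
        else:
            concepts.append({q})
    return concepts

print(discover_concepts())   # [{'gmail', 'google mail'}, {'imdb'}]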
Query Suggestion 
Task-aware Query Suggestion (Allan et al. in [9])
 Why is task identification important?
 Considering Off-Task queries in the context adversely affects the quality of recommendation
 30% of sessions contain multiple tasks (Zhen et al. in [8])
 5% of sessions have interleaved tasks (Zhen et al. in [8])
 Identify similar previous queries as On-Task
 Consider only On-Task queries as context
Effect of On-Task and Off-Task queries
Query Suggestion 
Task-aware Query Suggestion (Allan et al. in [9])
 Measures to evaluate similarity between two queries 
 Lexical Score: captures similarity directly at the word level. Average of:
 Jaccard coefficient between the trigrams of the two queries: how many common trigrams?
 (1 - normalized Levenshtein edit distance), which shows closeness between the query strings
 Semantic Score: maximum of the following two
 s_wikipedia(q_i, q_j): cosine similarity of the tf-idf vectors of the Wikipedia documents retrieved for the two queries
 s_wiktionary(q_i, q_j): same as above, on Wiktionary entries
 Final Similarity(q_i, q_j) = α · Lexical Score + (1 - α) · Semantic Score (a sketch follows below)
 If Similarity(q_i, reference_q) is greater than a threshold → q_i is an On-Task query
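A Python sketch of this similarity measure; the trigram extraction and the use of difflib's ratio as a stand-in for (1 - normalized Levenshtein distance) are assumptions, and the semantic score is left as a stub since it requires Wikipedia/Wiktionary retrieval.

import difflib

def trigrams(q):
    q = q.lower()
    return {q[i:i + 3] for i in range(len(q) - 2)} or {q}

def lexical_score(qi, qj):
    """Average of trigram Jaccard coefficient and an edit-distance-based similarity."""
    ti, tj = trigrams(qi), trigrams(qj)
    jaccard = len(ti & tj) / len(ti | tj)
    # difflib's ratio lies in [0, 1]; used here in place of 1 - normalized Levenshtein distance.
    edit_sim = difflib.SequenceMatcher(None, qi.lower(), qj.lower()).ratio()
    return (jaccard + edit_sim) / 2

def semantic_score(qi, qj):
    """Stub: in [9] this is max(s_wikipedia, s_wiktionary), the cosine similarity of
    tf-idf vectors of documents retrieved for the two queries."""
    return 0.0

def similarity(qi, qj, alpha=0.5):
    return alpha * lexical_score(qi, qj) + (1 - alpha) * semantic_score(qi, qj)

def is_on_task(qi, reference_q, threshold=0.3):   # threshold chosen arbitrarily here
    return similarity(qi, reference_q) > threshold

print(round(lexical_score("iit roorkee fees", "iit roorkee hostel fee"), 2))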
Query Suggestion 
Personalization in Query Suggestion (Milad et.al in [2]) 
 On typing the character 'i':
 "Instagram" is more popular among females below 25
 "IMDb" is more popular among males aged 25-44
 Candidate queries are generated by a prior, general (non-personalized) method
 Personalization is done by re-ranking the candidate queries
 Features from the earlier global ranking (as feedback):
 Original position 
 Original score 
 Short History Features 
 3-Gram similarity with just previous query 
 Avg. 3-gram similarity with all previous queries in the session
Query Suggestion 
Personalization in Query Suggestion (Source: [2]) 
 Long History Features 
 No. of times candidate query issued in past 
 Avg. 3-gram similarity with all previous queries in the past 
 Demographic Features 
 Candidate query frequency over queries by same age group 
 Candidate query likelihood -- same age group 
 Candidate query frequency -- same gender group 
 Candidate query likelihood -- same gender group 
 Candidate query frequency -- same region group 
 Candidate query likelihood -- same region group
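An illustrative Python sketch of assembling such a feature vector for one candidate before re-ranking; the feature names, the n-gram similarity used, and the precomputed demographic statistics are assumptions for illustration, not the exact feature set or re-ranker of [2].

from dataclasses import dataclass

@dataclass
class Candidate:
    query: str
    original_position: int    # rank given by the general (non-personalized) suggester
    original_score: float

def ngram_sim(a, b, n=3):
    ga = {a[i:i + n] for i in range(len(a) - n + 1)}
    gb = {b[i:i + n] for i in range(len(b) - n + 1)}
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

def build_features(cand, session, history, group_freq, group_likelihood):
    """Feature dictionary for one candidate; a learned re-ranker would consume this."""
    return {
        # global-ranking feedback
        "original_position": cand.original_position,
        "original_score": cand.original_score,
        # short history (current session)
        "sim_prev_query": ngram_sim(cand.query, session[-1]) if session else 0.0,
        "sim_session_avg": (sum(ngram_sim(cand.query, q) for q in session) / len(session)
                            if session else 0.0),
        # long history (past queries of the same user)
        "times_issued_in_past": history.count(cand.query),
        "sim_history_avg": (sum(ngram_sim(cand.query, q) for q in history) / len(history)
                            if history else 0.0),
        # demographic statistics, assumed precomputed per age/gender/region group
        "group_frequency": group_freq,
        "group_likelihood": group_likelihood,
    }

cand = Candidate("instagram", original_position=3, original_score=0.42)
print(build_features(cand, session=["insta"], history=["instagram", "imdb"],
                     group_freq=0.08, group_likelihood=0.6))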
Query Expansion
 Adding more words (that should generate similar results) to tackle term mismatch
 Ex. "Tutorial lecture on ABC" → "Video Lecture on ABC"
 Expansion Tasks:
 Adding synonyms of words
 Morphological variants via stemming
 Naïve Approach (a minimal sketch follows below)
 Exhaustive lookup in a thesaurus
 Time-consuming
 Still misses terms that serve the same intent but are semantically far (e.g. "video" for "tutorial")
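A minimal sketch of this naive approach using NLTK's WordNet interface (assumes the WordNet corpus has already been downloaded); it illustrates both the lookup and the kind of miss described above.

from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet') once

def naive_expansion(query):
    """Exhaustive thesaurus lookup: collect every WordNet synonym of every query term."""
    terms = set()
    for word in query.lower().split():
        for synset in wn.synsets(word):
            for lemma in synset.lemma_names():
                terms.add(lemma.replace("_", " "))
    return terms

# Finds near-synonyms such as "lecture" ~ "talk", but will never propose "video"
# for "tutorial" -- the semantically distant, same-intent terms mentioned above.
print(naive_expansion("tutorial lecture"))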
Query Expansion 
Path-Constrained Random Walk (Jianfeng et al. in [11])
 Exploits search logs to identify terms that lead to similar end results
 Search log data of <Query, Document> clicks 
 Graph Representation 
 Node Q: seed query 
Nodes Q’: queries in search log 
Nodes D: documents 
Nodes W: words that occur in queries and documents 
 Word nodes are the candidate expansion terms 
 Edges have scoring function 
 Represents probability of transition from start node 
to end node 
Search Log as Graph
Query Expansion 
Path-Constrained Random Walk (Jianfeng et al. in [11])
 Probability of using w as an expansion word?
 Product of edge probabilities along paths starting at node Q and ending at w (a toy sketch follows below)
 The top-probability words obtained from the random walk are picked as expansion terms
Search Log as Graph
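A toy Python sketch of a walk along one fixed relation path Q → Q' → D → W over such a graph; real PCRW learns weights to combine many such paths, and the edge probabilities below are made up for illustration.

from collections import defaultdict

# Toy search-log graph: for each relation on the path, edge -> transition probability.
EDGES = {
    ("Q", "Q'"): {("tutorial abc", "video abc"): 0.6, ("tutorial abc", "abc lecture"): 0.4},
    ("Q'", "D"): {("video abc", "doc1"): 1.0, ("abc lecture", "doc2"): 1.0},
    ("D", "W"):  {("doc1", "video"): 0.7, ("doc1", "abc"): 0.3, ("doc2", "lecture"): 0.8},
}

PATH = [("Q", "Q'"), ("Q'", "D"), ("D", "W")]   # one relation path: Q -> Q' -> D -> W

def walk(seed_query):
    """Push probability mass from the seed query along the path; the mass reaching
    each word node is the product of edge probabilities along its paths."""
    mass = {seed_query: 1.0}
    for relation in PATH:
        nxt = defaultdict(float)
        for (src, dst), p in EDGES[relation].items():
            if src in mass:
                nxt[dst] += mass[src] * p
        mass = dict(nxt)
    return mass

scores = walk("tutorial abc")
print(sorted(scores.items(), key=lambda kv: -kv[1]))
# [('video', 0.42), ('lecture', 0.32), ('abc', 0.18)] -> top words become expansion terms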
Query Classification
 Classifying a given query into a predefined intent class
 Ex. michael jordan berkley: academic
 The precise intent is given by the sequence of nodes from root to leaf of the taxonomy
 More challenging than document classification
 Short length
 Keyword representation makes queries more ambiguous
 Ex. query "brazil germany"
 Older basic techniques
 Consider the single query alone → statistical techniques like 2-gram/3-gram inference
Example Taxonomy (Source: [6])
Query Classification 
Context-aware Query Classification (Huanhuan et al. in [6])
 Resolving ambiguity using context
 Previous queries ∈ sports, then "Michael Jordan" → sports (basketball player)
 Previous queries ∈ academic, then "Michael Jordan" → academic (ML professor)
 Use of CRF (because training and prediction are over sequences)
 Local Features
 Query Terms: each q_t supports a target category
 Pseudo Feedback:
 q_t (with concept c_t) is submitted to an external web directory
 How many of the top M results have the concept c_t?
 Implicit Feedback:
 Instead of the top M results, only the clicked documents are taken
Query Classification 
Context-aware Query Classification (Huanhuan et al. in [6])
 Contextual Features (sketched below)
 Direct association between adjacent labels
 Number of occurrences of the adjacent label pair <c_{t-1}, c_t>
 Higher weight → higher probability of transiting from c_{t-1} to c_t
 Taxonomy-based association between adjacent labels
 Given a pair of adjacent labels <c_{t-1}, c_t> at level n
 n-1 taxonomy-based association features between c_{t-1} and c_t are considered
 e.g. Computer/Software is related to Computer/Hardware: they match at the (n-1)th level → Computer
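A small Python sketch of the two contextual feature ideas above; the '/'-separated taxonomy labels and the toy sessions are illustrative, not the data or exact features of [6].

from collections import Counter

def direct_association(labeled_sessions):
    """Count adjacent label pairs <c_{t-1}, c_t> over labeled sessions; a frequent pair
    suggests a higher probability of transiting from c_{t-1} to c_t."""
    pairs = Counter()
    for labels in labeled_sessions:
        pairs.update(zip(labels, labels[1:]))
    return pairs

def taxonomy_association(c_prev, c_t):
    """Binary features: do the two labels match when truncated to each higher level?
    Labels are '/'-separated taxonomy paths, e.g. 'Computer/Software'."""
    a, b = c_prev.split("/"), c_t.split("/")
    return [int(a[:level] == b[:level]) for level in range(1, min(len(a), len(b)))]

sessions = [["Sports/Basketball", "Sports/Basketball", "Sports/Football"]]
print(direct_association(sessions))
print(taxonomy_association("Computer/Software", "Computer/Hardware"))  # [1]: same parent "Computer"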
Semantic Tagging
 Identifies the semantic concepts of a word or phrase
 [michael jordan: PersonName] [berkley: Location]: academic
 Useful only if phrases in documents are also tagged
 Shallow Parsing Methods (a minimal sketch follows below)
 Part-of-Speech tags: e.g. clubbing consecutive nouns for Named Entity Recognition
 Disadvantage: sentence-level long segments can't be identified
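A minimal sketch of the shallow-parsing heuristic using NLTK's POS tagger (tokenizer and tagger models must be downloaded first); the chunking rule is the simple "club consecutive nouns" idea above, not the hierarchical method of [5].

import nltk   # assumes nltk.download('punkt') and nltk.download('averaged_perceptron_tagger')

def noun_chunks(text):
    """Club consecutive nouns/proper nouns (NN, NNS, NNP, NNPS) into candidate entities."""
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    chunks, current = [], []
    for word, tag in tagged:
        if tag.startswith("NN"):
            current.append(word)
        elif current:
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks

print(noun_chunks("show me a funny movie starring Johnny Depp"))
# e.g. ['movie', 'Johnny Depp'] -- but sentence-level long segments (like a whole
# <Plot> description) cannot be recovered by such local chunking.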
Semantic Tagging
 Hierarchical Parsing Structures (Jingjing Liu et al. in [5])
 Trained a semi-Markov CRF on segments
 Features 
 Syntactic Features 
(Figure: parse tree of the sentence, with a segment labeled <Plot>)
Semantic Tagging
 Semantic Dependency Features
 Leverage information about dependencies among different segments
 Ex. "show me a funny movie starring Johnny and featuring Caribbean Pirates"
 'Featuring' takes the arguments "funny movie" and "Caribbean Pirates"
 Long-distance semantic dependency between the object "movie" and the attribute <Plot>
Conclusion & Future Work
 End-to-end discussion of the Query Understanding Module tasks
 Semantic understanding of queries for intent detection has a lot of scope
 Use of NL (grammatically correct) queries is rising
 Understanding at the structure level
 User community detection for its application in Query Suggestion
 Based on search behavior
 Community/topic-specific temporal trending of search queries
References 
[1] Makoto P. Kato, Takehiro Yamamoto, Hiroaki Ohshima and Katsumi Tanaka, "Cognitive 
Search Intents Hidden Behind Queries: A User Study on Query Formulations," in WWW 
Companion, Seoul, Korea, 2014. 
[2] Milad Shokouhi, "Learning to Personalize Query Auto-Completion," in SIGIR, Dublin, 
Ireland, 2013. 
[3] Weotta, "Deep Search," 10 June 2014. [Online]. Available:
http://streamhacker.com/2014/06/10/deepsearch/. [Accessed 6 August 2014].
[4] W. Bruce Croft, Michael Bendersky, Hang Li and Gu Xu, "Query Understanding and 
Representation," SIGIR Forum, vol. 44, no. 2, pp. 48-53, 2010. 
[5] Jingjing Liu, Panupong Pasupat, Yining Wang, Scott Cyphers and Jim Glass, "Query 
Understanding Enhanced by Hierarchical Parsing Structures," in ASRU, 2013. 
[6] Huanhuan Cao, Derek Hao Hu, Dou Shen and Daxin Jiang, "Context-Aware Query 
Classification," in SIGIR, Boston, Massachusetts, USA, 2009.
References (Continued…) 
[7] Huanhuan Cao, Daxin Jiang, Jian Pei, Qi He, Zhen Liao, Enhong Chen, Hang Li, 
"Context-Aware Query Suggestion by Mining Click-Through and Session Data," in 
KDD, Las Vegas, Nevada, USA, 2008. 
[8] Zhen Liao, Yang Song, Li-wei He and Yalou Huang, "Evaluating the Effectiveness of 
Search Task Trails," in WWW, Lyon, France, 2012. 
[9] Henry Feild and James Allan, "Task-Aware Query Recommendation," in SIGIR,
Dublin, Ireland, 2013. 
[10] Jiafeng Guo, Gu Xu, Hang Li and Xueqi Cheng, "A Unified and Discriminative 
Model for Query Refinement," in SIGIR, Singapore, 2008.
[11] Jianfeng Gao, Gu Xu and Jinxi Xu, "Query Expansion Using Path-Constrained 
Random Walks," in SIGIR, Dublin, Ireland, 2013.