◯Atsushi Keyaki†, Jun Miyazaki†
†: Tokyo Institute of Technology,
Japan
Part-of-speech Tagging for Web Search Queries using a Large-scale Web Corpus
SAC2017 IAR
Objective
•  Accurate part-of-speech (POS) tagging for Web queries
o POS tags are beneficial for accurate IR
•  Different search strategies per POS tag [1]
•  Identifying unnecessary data with POS tags [2]
o Example
•  Query: “discovery channel”
•  Doc: “Victim’s discovery is broadcast by the channel”
[1] Crestani et al.: "Short Queries, Natural Language and Spoken Document Retrieval: Experiments at Glasgow University", TREC-6, 1998.
[2] Chowdhury and McCabe: "Improving Information Retrieval Systems using Part of Speech Tagging", Univ. of Maryland, 1993.
POS tag mismatch may cause false positives:
in the query, “discovery channel” is a proper noun (a TV program);
in the doc, “discovery” and “channel” are common nouns
Difficulty in query POS tagging
•  Characteristics of Web queries
o  Length is short (composed of a few words)
o  Capitalization is missing
o  Word order is fairly free
•  Solution of related work [3][4]
o  Utilizing the results of sentence-level morphological analysis
•  Sentences are based on natural language grammar
•  Results of sentence-level morphological analysis are accurate
•  The most frequently assigned POS tag is employed
Difficult to correctly identify POS tags with existing morphological analysis tools (developed for natural language)
Sentence: “We stayed at Rif Carlton.” (pronoun / verb / particle / proper noun)
Query: “rif carlton” (proper noun)
[3] Bendersky et al.: "Structural Annotation of Search Queries Using Pseudo Relevance Feedback", CIKM2010.
[4] K. Ganchev et al.: "Using Search-Logs to Improve Query Tagging", ACL2012.
Our approach
•  Related studies
o Use sentence-level morphological analysis of
•  Search results [3]
•  Snippets from search logs [4]
o Consider just the frequency of assigned POS tags
•  Our approach
o Takes global statistics from a large corpus into account
•  Easily available; considers the long tail
o Considers co-occurrence of query terms
April 5, 2017, SAC2017 IAR
[3] Bendersky et al.: "Structural Annotation of Search Queries Using Pseudo Relevance Feedback", CIKM2010.
[4] K. Ganchev et al.: "Using Search-Logs to Improve Query Tagging", ACL2012.
Related studies use only a small amount of highly relevant information
User feedback/search logs are not always available
Preliminary investigation
•  Morphological analysis of Web queries
o Queries
•  TREC Web track topics (200 queries from 2009-2012)
o  Oracle POS tags are annotated by three assessors
o  Referring to description (information need)
o Morphological analysis tool
•  Stanford Log-linear Part-Of-Speech Tagger [5]
o Model
•  Default model
•  Caseless model
o  Does not consider capitalization information during training
o  Tries to solve the “capitalization is missing” problem
[5] Toutanova et al.: "Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network", NAACL 2003.
High agreement (Kappa: 0.98)
Summary  of  error  analysis
•  Default model
o  Only half of query terms were assigned correct POS tags
o  Almost all of proper nouns were NOT identified
•  72% of proper nouns are mistakenly assigned as common nouns
•  Error: “obama”, “india”, “ritz carlton”, “discovery channel”
•  Caseless model
o  Around 75% of query terms were assigned correct POS
tags
o  Many proper nouns were identified
•  Common nouns are mistakenly identified as proper nouns
•  Errors caused by partial grammatical rules
o  “lower heart rate”: “lower” (a verb here) is mistakenly tagged as an adjective (adjectives come before common nouns)
o  “gs pay rate”: “pay” (a common noun here) is mistakenly tagged as a verb (verbs come after a subject)
Proposed POS tagging
•  Summary of the error analysis
o  Proper nouns/common nouns cannot be identified
•  Problem 1: Capitalization is missing
o  Grammatical rules are mistakenly applied
•  Problem 2: Word order is fairly free
•  Related studies
o  Only a small amount of highly relevant information is used
•  Problem 3: User feedback and search logs are not always available
•  Approach
o  Sol-P1: Sentence-level morphological analysis
o  Sol-P2: A POS tagging method not based on word order
o  Sol-P3: A large-scale Web corpus (easily available)
o  Building the term-POS database (TPDB)
•  Morphological analysis is applied offline
Processing flow
Offline: sentences from the large-scale Web corpus (S1: tA tB tC, S2: tA tC tD, S3: tC tE tA tF, S4: tB tD) are morphologically analyzed into term/POS sequences (S1: tA/P1 tB/P2 tC/P3, S2: tA/P1 tC/P4 tD/P5, S3: tC/P3 tE/P1 tA/P2 tF/P1, S4: tB/P2 tD/P3) and inserted into the TPDB.
Online: for the query “tA tC”, the TPDB entries containing the query terms (S1, S2, S3) are retrieved and passed to the scoring method.
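The offline/online flow above can be sketched with a minimal in-memory stand-in for the TPDB. The dict-based index and the names (`tagged`, `tpdb_index`, `retrieve`) are illustrative assumptions, not the paper's implementation:

```python
from collections import defaultdict

# Offline: term/POS-tagged sentences produced by morphological analysis
# (data taken from the processing-flow figure).
tagged = {
    "S1": [("tA", "P1"), ("tB", "P2"), ("tC", "P3")],
    "S2": [("tA", "P1"), ("tC", "P4"), ("tD", "P5")],
    "S3": [("tC", "P3"), ("tE", "P1"), ("tA", "P2"), ("tF", "P1")],
    "S4": [("tB", "P2"), ("tD", "P3")],
}

# TPDB stand-in: inverted index from term to ids of sentences containing it.
tpdb_index = defaultdict(set)
for sid, pairs in tagged.items():
    for term, _pos in pairs:
        tpdb_index[term].add(sid)

def retrieve(query_terms):
    """Online: fetch every TPDB entry that contains at least one query term."""
    hits = set()
    for t in query_terms:
        hits |= tpdb_index.get(t, set())
    return {sid: tagged[sid] for sid in sorted(hits)}

entries = retrieve({"tA", "tC"})  # S1, S2, S3 match; S4 contains neither term
```

As in the figure, the query “tA tC” retrieves S1, S2, and S3; S4 is never touched online because indexing happened offline.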
Scoring for POS tagging
•  Design principle
o  Frequently appearing POS tags in the corpus are assigned to queries
o  POS tags of a sentence are emphasized when the sentence contains
more kinds of query terms
•  Co-occurrence of query terms is a useful clue
•  Steps of scoring
o  Retrieving entries which contain query terms from the TPDB
o  Breaking the query down into pairs of query terms
•  Query: “tA tB tC” gives pairs {tA tB}, {tA tC}, {tB tC}
o  Counting entries per term-POS pair for each query term pair
•  e.g., pair {tA tB} (freq. = number of entries containing both term-POS pairs, e.g., tA/P1 and tB/P2):
tA/P1 tB/P2: freq. 5, normalized freq. 0.33 (5/15)
tA/P1 tB/P3: freq. 3, normalized freq. 0.20 (3/15)
tA/P2 tB/P4: freq. 7, normalized freq. 0.47 (7/15)
o  Scoring with the three proposed methods
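The pair-breakdown and normalization steps can be sketched as follows, reproducing the slide's numbers for the pair {tA tB}. The raw counts are copied from the slide; the dict layout is an illustrative assumption:

```python
from itertools import combinations

query = ["tA", "tB", "tC"]
# Break the query down into unordered pairs of query terms.
pairs = list(combinations(query, 2))  # {tA tB}, {tA tC}, {tB tC}

# Entry counts per term-POS pair for {tA tB}, as on the slide:
# 5 entries contain both tA/P1 and tB/P2, 3 contain tA/P1 and tB/P3, etc.
freq = {
    (("tA", "P1"), ("tB", "P2")): 5,
    (("tA", "P1"), ("tB", "P3")): 3,
    (("tA", "P2"), ("tB", "P4")): 7,
}

# Normalized frequency: each count divided by the pair's total (15 here).
total = sum(freq.values())
normalized = {k: round(v / total, 2) for k, v in freq.items()}
# 5/15 = 0.33, 3/15 = 0.20, 7/15 = 0.47, matching the slide's table
```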
Three proposed methods
•  MaxFreq
o  The most frequently appearing POS tag (highest freq.) is assigned
•  MostLikelihood
o  The POS tag with the highest normalized freq. is assigned
o  MaxFreq may be affected by frequently appearing terms
•  AllCombi
o  The POS tag with the highest sum of term-POS frequencies is assigned
o  MaxFreq and MostLikelihood focus only on the POS tag with the highest frequency/normalized frequency
o  More diversified context, including the long tail, can be considered
Worked example for query “tA tB tC” (freq., normalized freq.):
Pair tA:tB: tA/P1 tB/P2 (5, 0.33); tA/P1 tB/P3 (3, 0.20); tA/P2 tB/P4 (7, 0.47)
Pair tA:tC: tA/P1 tC/P2 (3, 0.43); tA/P3 tC/P3 (4, 0.57)
Pair tB:tC: tB/P1 tC/P2 (5, 0.5); tB/P2 tC/P2 (5, 0.5)
For tA: MaxFreq assigns tA/P2 (freq. 7), MostLikelihood assigns tA/P3 (normalized freq. 0.57), AllCombi assigns tA/P1 (sum 5 + 3 + 3 = 11)
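The three methods can be reimplemented from their descriptions and checked against the worked example above. This is a minimal sketch, not the authors' code; tie-breaking is unspecified on the slides, and here `max` simply keeps the first row it sees:

```python
from collections import defaultdict

# Per-pair statistics for query "tA tB tC", copied from the slide:
# each row is ((term1, pos1), (term2, pos2), freq, normalized_freq).
stats = [
    (("tA", "P1"), ("tB", "P2"), 5, 0.33),
    (("tA", "P1"), ("tB", "P3"), 3, 0.20),
    (("tA", "P2"), ("tB", "P4"), 7, 0.47),
    (("tA", "P1"), ("tC", "P2"), 3, 0.43),
    (("tA", "P3"), ("tC", "P3"), 4, 0.57),
    (("tB", "P1"), ("tC", "P2"), 5, 0.5),
    (("tB", "P2"), ("tC", "P2"), 5, 0.5),
]

def rows_for(term):
    """All (pos, freq, normalized_freq) observations of `term` across pairs."""
    out = []
    for a, b, f, n in stats:
        for t, p in (a, b):
            if t == term:
                out.append((p, f, n))
    return out

def max_freq(term):
    # MaxFreq: POS tag of the single highest-frequency row.
    return max(rows_for(term), key=lambda r: r[1])[0]

def most_likelihood(term):
    # MostLikelihood: POS tag of the highest normalized frequency.
    return max(rows_for(term), key=lambda r: r[2])[0]

def all_combi(term):
    # AllCombi: sum raw frequency per POS tag, take the largest sum.
    sums = defaultdict(int)
    for p, f, _n in rows_for(term):
        sums[p] += f
    return max(sums, key=sums.get)

# For tA: MaxFreq -> P2 (freq 7), MostLikelihood -> P3 (0.57),
# AllCombi -> P1 (5 + 3 + 3 = 11), matching the slides.
```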
Experiment
•  Datasets
o  TREC Web track topics
•  200 queries from 2009-2012
o  MS-251
•  Microsoft search log used in related studies [3][4]
•  Large-scale Web corpus
o  ClueWeb09 Category B
•  50 million Web documents
•  Evaluation methods
o  Proposed methods: MaxFreq, MostLikelihood, AllCombi
o  Existing methods: Stanford, Caseless, SingleFreq
[3] Bendersky et al.: "Structural Annotation of Search Queries Using Pseudo Relevance Feedback", CIKM2010.
[4] K. Ganchev et al.: "Using Search-Logs to Improve Query Tagging", ACL2012.
SingleFreq: the most frequently appearing POS tag is assigned
MS-251 evaluation details are skipped because the trend is the same
POS-tagged Web track topics
•  AllCombi: the highest for all query terms, common nouns, and proper nouns
o  Good at judging nouns
o  Considering more diversified context is useful
•  Global statistics from a large-scale Web corpus are useful
•  MaxFreq and MostLikelihood: the highest for common nouns, verbs, and adjectives
•  Every proposed method significantly outperformed Caseless (sign test)
Precision per POS tag (sign test vs. Caseless):
Method          All query terms  Common noun  Proper noun  Verb   Adjective  Sign test
MaxFreq         .814             .825         .833         .769   .647       p < 0.05
MostLikelihood  .814             .825         .833         .769   .647       p < 0.05
AllCombi        .821             .825         .860         .714   .629       p < 0.01
Caseless        .763             .789         .751         .733   .690
SingleFreq      .702             .775         .670         .533   .581
Stanford        .547             .550         1.0          .722   .451
Effect of the proposed method
•  AllCombi correctly identified many query terms
•  Some errors caused by partial grammatical rules still remain
•  Negative effects of the proposed method
o  “president” in the corpus is often identified as a proper noun
•  Need to normalize term weights
Example queries compared (Stanford vs. AllCombi): “obama”, “india”, “rif carlton”, “lower heart rate”, “gs pay rate”, “president united states”
Conclusion
•  POS tagging for Web queries
o  Results of sentence-level morphological analysis
o  Large-scale Web corpus
o  Proposed three scoring methods
•  Experiments
o  Considering more diversified context is useful
o  The best proposed method differs by POS tag
o  Outperformed existing tools and prior studies
•  Future work
o  Combination of proposed methods may improve accuracy
o  Database schema design for fast POS tagging
Default model
POS tags         Precision  Recall
Common noun      .550       .985
Proper noun      1.0        .010
Verb             .722       .867
Adjective        .451       .958
All query terms  .547       .547
•  Nearly half of query terms
were assigned correct POS tags
•  Almost all of proper nouns
were not identified
o  72% of proper nouns are
mistakenly assigned as common
nouns
o  Error: “obama”, “india”, “ritz
carlton”, “discovery channel”
•  Errors caused by partial grammatical rules
o  “lower heart rate”: “lower” (a verb here) is mistakenly tagged as an adjective (adjectives come before common nouns)
o  “gs pay rate”: “pay” (a common noun here) is mistakenly tagged as a verb (verbs come after a subject)
Caseless model
•  Precision and recall improved overall
•  Many proper nouns were identified
o  31% of proper nouns are mistakenly assigned as common nouns
o  Precision is decreased
•  Harm of partial grammatical rules still exists
o  “discovery channel store”: the common noun “store” is mistakenly tagged as a proper noun
POS tags         Precision  Recall
Common noun      .789       .769
Proper noun      .751       .640
Verb             .733       .733
Adjective        .690       .833
All query terms  .763       .763
MS-251
•  The trend among the proposed methods is the same
o The ratio of POS tags affected the ranking
•  AllCombi: good at judging nouns
•  MaxFreq, MostLikelihood: good at judging verbs and adjectives
o The proposed methods are better than [4]
Precision:
MaxFreq: .890
MostLikelihood: .895
AllCombi: .893
Best method in [4]: .858
[4] K. Ganchev et al.: "Using Search-Logs to Improve Query Tagging", ACL2012.

More Related Content

PPT
Information extraction for Free Text
PDF
Grosof haley-talk-semtech2013-ver6-10-13
PDF
Applications of Word Vectors in Text Retrieval and Classification
KEY
The Semantic Web meets the Code of Federal Regulations
PDF
NLP for Everyday People
PPT
QALL-ME: Ontology and Semantic Web
PPTX
Enriching the semantic web tutorial session 1
PPTX
A Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech Tagging
Information extraction for Free Text
Grosof haley-talk-semtech2013-ver6-10-13
Applications of Word Vectors in Text Retrieval and Classification
The Semantic Web meets the Code of Federal Regulations
NLP for Everyday People
QALL-ME: Ontology and Semantic Web
Enriching the semantic web tutorial session 1
A Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech Tagging

What's hot (17)

PPTX
Info 2402 irt-chapter_4
ODP
Information Extraction from the Web - Algorithms and Tools
PDF
Linguistic markup and transclusion processing in XML documents
PDF
Netflix Global Search - Lucene Revolution
PDF
Bio ontologies and semantic technologies
PDF
Deep Natural Language Processing for Search and Recommender Systems
PDF
Bio ontologies and semantic technologies
PPTX
Deep natural language processing in search systems
PPTX
2017 biological databases_part1_vupload
PDF
Neural Architectures for Named Entity Recognition
PPTX
KIT Graduiertenkolloquium 11.05.2016
PDF
PyGotham NY 2017: Natural Language Processing from Scratch
PDF
Phd tesis olga giraldo 10mayo
PDF
Ting-Hao (Kenneth) Huang - 2015 - ACBiMA: Advanced Chinese Bi-Character Word ...
PPTX
NAMED ENTITY RECOGNITION
PDF
master_thesis_greciano_v2
PPTX
PhD Comprehensive exam of Masud Rahman
Info 2402 irt-chapter_4
Information Extraction from the Web - Algorithms and Tools
Linguistic markup and transclusion processing in XML documents
Netflix Global Search - Lucene Revolution
Bio ontologies and semantic technologies
Deep Natural Language Processing for Search and Recommender Systems
Bio ontologies and semantic technologies
Deep natural language processing in search systems
2017 biological databases_part1_vupload
Neural Architectures for Named Entity Recognition
KIT Graduiertenkolloquium 11.05.2016
PyGotham NY 2017: Natural Language Processing from Scratch
Phd tesis olga giraldo 10mayo
Ting-Hao (Kenneth) Huang - 2015 - ACBiMA: Advanced Chinese Bi-Character Word ...
NAMED ENTITY RECOGNITION
master_thesis_greciano_v2
PhD Comprehensive exam of Masud Rahman
Ad

Similar to Part-of-speech Tagging for Web Search Queries Using a Large-scale Web Corpus (20)

PDF
Natural Language Processing using Java
PDF
Applications of Large Language Models in Materials Discovery and Design
PDF
Improved chemical text mining of patents using infinite dictionaries, transla...
PDF
NLP Data Cleansing Based on Linguistic Ontology Constraints
PPTX
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
ODP
SIGIR 2011
PDF
Introduction of semantic technology for SAS programmers
PPTX
Spoken Content Retrieval
PPTX
stemming and tokanization in corpus.pptx
PDF
Towards a Quality Assessment of Web Corpora for Language Technology Applications
DOCX
UNDERSTAND SHORTTEXTS BY HARVESTING & ANALYZING SEMANTIKNOWLEDGE
PDF
Natural Language Processing, Techniques, Current Trends and Applications in I...
PPTX
C:\Fakepath\Learning Through Conversation
PPTX
EKAW 2016 - TechMiner: Extracting Technologies from Academic Publications
PPT
RFS Search Lang Spec
PDF
Aspects of NLP Practice
PPTX
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...
PPTX
Semantic Technologies and Programmatic Access to Semantic Data
PDF
The Nature of Information
Natural Language Processing using Java
Applications of Large Language Models in Materials Discovery and Design
Improved chemical text mining of patents using infinite dictionaries, transla...
NLP Data Cleansing Based on Linguistic Ontology Constraints
Haystack 2018 - Algorithmic Extraction of Keywords Concepts and Vocabularies
SIGIR 2011
Introduction of semantic technology for SAS programmers
Spoken Content Retrieval
stemming and tokanization in corpus.pptx
Towards a Quality Assessment of Web Corpora for Language Technology Applications
UNDERSTAND SHORTTEXTS BY HARVESTING & ANALYZING SEMANTIKNOWLEDGE
Natural Language Processing, Techniques, Current Trends and Applications in I...
C:\Fakepath\Learning Through Conversation
EKAW 2016 - TechMiner: Extracting Technologies from Academic Publications
RFS Search Lang Spec
Aspects of NLP Practice
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente...
Semantic Technologies and Programmatic Access to Semantic Data
The Nature of Information
Ad

Recently uploaded (20)

PPTX
nose tajweed for the arabic alphabets for the responsive
PPTX
Sustainable Forest Management ..SFM.pptx
PPTX
FINAL TEST 3C_OCTAVIA RAMADHANI SANTOSO-1.pptx
PPTX
BIOLOGY TISSUE PPT CLASS 9 PROJECT PUBLIC
PDF
Yusen Logistics Group Sustainability Report 2024.pdf
DOCX
ENGLISH PROJECT FOR BINOD BIHARI MAHTO KOYLANCHAL UNIVERSITY
PPTX
2025-08-10 Joseph 02 (shared slides).pptx
PPT
The Effect of Human Resource Management Practice on Organizational Performanc...
PPTX
fundraisepro pitch deck elegant and modern
PPTX
PHIL.-ASTRONOMY-AND-NAVIGATION of ..pptx
PPTX
3RD-Q 2022_EMPLOYEE RELATION - Copy.pptx
PPTX
Project and change Managment: short video sequences for IBA
PDF
Tunisia's Founding Father(s) Pitch-Deck 2022.pdf
PDF
PM Narendra Modi's speech from Red Fort on 79th Independence Day.pdf
PPTX
AcademyNaturalLanguageProcessing-EN-ILT-M02-Introduction.pptx
PPTX
Tablets And Capsule Preformulation Of Paracetamol
PPTX
Impressionism_PostImpressionism_Presentation.pptx
PPTX
ART-APP-REPORT-FINctrwxsg f fuy L-na.pptx
PDF
Presentation1 [Autosaved].pdf diagnosiss
PPTX
ANICK 6 BIRTHDAY....................................................
nose tajweed for the arabic alphabets for the responsive
Sustainable Forest Management ..SFM.pptx
FINAL TEST 3C_OCTAVIA RAMADHANI SANTOSO-1.pptx
BIOLOGY TISSUE PPT CLASS 9 PROJECT PUBLIC
Yusen Logistics Group Sustainability Report 2024.pdf
ENGLISH PROJECT FOR BINOD BIHARI MAHTO KOYLANCHAL UNIVERSITY
2025-08-10 Joseph 02 (shared slides).pptx
The Effect of Human Resource Management Practice on Organizational Performanc...
fundraisepro pitch deck elegant and modern
PHIL.-ASTRONOMY-AND-NAVIGATION of ..pptx
3RD-Q 2022_EMPLOYEE RELATION - Copy.pptx
Project and change Managment: short video sequences for IBA
Tunisia's Founding Father(s) Pitch-Deck 2022.pdf
PM Narendra Modi's speech from Red Fort on 79th Independence Day.pdf
AcademyNaturalLanguageProcessing-EN-ILT-M02-Introduction.pptx
Tablets And Capsule Preformulation Of Paracetamol
Impressionism_PostImpressionism_Presentation.pptx
ART-APP-REPORT-FINctrwxsg f fuy L-na.pptx
Presentation1 [Autosaved].pdf diagnosiss
ANICK 6 BIRTHDAY....................................................

Part-of-speech Tagging for Web Search Queries Using a Large-scale Web Corpus

  • 1. ◯Atsushi Keyaki†, Jun Miyazaki† †: Tokyo Institute of Technology, Japan Part-­‐‑of-­‐‑speech  Tagging  for   Web  Search  Queries  using  a   Large-­‐‑scale  Web  Corpus SAC2017  IAR
  • 2. Objective •  Accurate part-of-speech (POS) tagging to Web queries o POS tags are beneficial in accurate IR •  Different search strategy per POS tag [1] •  Identifying unnecessary data with POS tags [2] o Example •  Query: “discovery channel” •  Doc: “Victim’s discovery is broadcasted by the channel” 2 [1]  Crestani  et  al.:  “Short  Queries,  Natural  Language  and  Spoken  Document              Retrieval:  Experiments  at  Glasgow  University”,  TREC-­‐‑6,  1998. [2]  Chowdhury  and  Mccabe:  “Improving  Information  Retrieval  Systems  using            Part  of  Speech  Tagging”,  Univ.  of  Maryland,  1993. POS  tag  mismatch  may  cause  false  positive TV  program  (proper  nouns) common  noun common noun
  • 3. Difficulty  in  query  POS  tagging •  Characteristics of Web query o  Length is short (composed of a few words) o  Capitalization is missing o  Word order is fairly free •  Solution of related work [3][4] o  Utilizing the results of sentence-level morphological analysis •  Sentences are based on natural language grammar •  Results of sentence-level morphological analysis are accurate 3 Difficult  to  correctly  identify  POS  tags with  existing  morphological  analysis  tool [3]  Bendersky  et  al.:  "ʺStructural  Annotation  of  Search  Queries  Using  Pseudo              Relevance  Feedback"ʺ,  CIKM2010. [4]  K.  Ganchev  et  al.:  "ʺUsing  Search-­‐‑Logs  to  Improve  Query  Tagging"ʺ,  ACL2012. developed  for   natural  language Sentence:  “We        stayed        at              Rif  Carlton.” Query          :  “rif  carlton”
  • 4. Difficulty  in  query  POS  tagging •  Characteristics of Web query o  Length is short (composed of a few words) o  Capitalization is missing o  Word order is fairly free •  Solution of related work [3][4] o  Utilizing the results of sentence-level morphological analysis •  Sentences are based on natural language grammar •  Results of sentence-level morphological analysis are accurate 4 Difficult  to  correctly  identify  POS  tags with  existing  morphological  analysis  tool [3]  Bendersky  et  al.:  "ʺStructural  Annotation  of  Search  Queries  Using  Pseudo              Relevance  Feedback"ʺ,  CIKM2010. [4]  K.  Ganchev  et  al.:  "ʺUsing  Search-­‐‑Logs  to  Improve  Query  Tagging"ʺ,  ACL2012. developed  for   natural  language Sentence:  “We        stayed        at              Rif  Carlton.” pronoun  verb  particle    proper  noun Query          :  “rif  carlton”
  • 5. Difficulty  in  query  POS  tagging •  Characteristics of Web query o  Length is short (composed of a few words) o  Capitalization is missing o  Word order is fairly free •  Solution of related work [3][4] o  Utilizing the results of sentence-level morphological analysis •  Sentences are based on natural language grammar •  Results of sentence-level morphological analysis are accurate 5 Difficult  to  correctly  identify  POS  tags with  existing  morphological  analysis  tool [3]  Bendersky  et  al.:  "ʺStructural  Annotation  of  Search  Queries  Using  Pseudo              Relevance  Feedback"ʺ,  CIKM2010. [4]  K.  Ganchev  et  al.:  "ʺUsing  Search-­‐‑Logs  to  Improve  Query  Tagging"ʺ,  ACL2012. Sentence:  “We        stayed        at              Rif  Carlton.” pronoun  verb  particle    proper  noun proper  nounQuery          :  “rif  carlton” developed  for   natural  language
  • 6. Difficulty  in  query  POS  tagging •  Characteristics of Web query o  Length is short (composed of a few words) o  Capitalization is missing o  Word order is fairly free •  Solution of related work [3][4] o  Utilizing the results of sentence-level morphological analysis •  Sentences are based on natural language grammar •  Results of sentence-level morphological analysis are accurate 6 Difficult  to  correctly  identify  POS  tags with  existing  morphological  analysis  tool [3]  Bendersky  et  al.:  "ʺStructural  Annotation  of  Search  Queries  Using  Pseudo              Relevance  Feedback"ʺ,  CIKM2010. [4]  K.  Ganchev  et  al.:  "ʺUsing  Search-­‐‑Logs  to  Improve  Query  Tagging"ʺ,  ACL2012. Sentence:  “We        stayed        at              Rif  Carlton.” pronoun  verb  particle    proper  noun proper  nounQuery          :  “rif  carlton” developed  for   natural  language Frequently   assigned  POS  tag   is  employed
  • 7. Our  approach •  Related study o Using sentence-level morphological analysis of •  Search results [3] •  Snippet from search logs [4] o Considering just freq. of assigned POS tags •  Our approach o Taking account of global statistics from large corpus •  Easily available, considering long tail o Considering co-occurrence of query terms April 5, 2017SAC2017 IAR 7 [3]  Bendersky  et  al.:  "ʺStructural  Annotation  of  Search  Queries  Using  Pseudo              Relevance  Feedback"ʺ,  CIKM2010. [4]  K.  Ganchev  et  al.:  "ʺUsing  Search-­‐‑Logs  to  Improve  Query  Tagging"ʺ,  ACL2012. A  small  number  of  highly  relevant  information User  feedback/search  log  is  not  always  available
  • 8. Preliminary  investigation •  Morphological analysis to Web queries o Queries •  TREC Web track topics (200 queries from 2009-2012) o  Oracle POS tags are annotated by three assessors o  Referring to description (information need) o Morphological analysis tool •  Stanford Log-linear Part-Of-Speech Tagger [5] o Model •  Default model •  Caseless model o  Not consider capitalization information during training o  Try to solve “Capitalization is missing” problem April 5, 2017SAC2017 IAR 8 [5]  Toutanova  et  al.:  "ʺFeature-­‐‑Rich  Part-­‐‑of-­‐‑Speech  Tagging            with  a  Cyclic  Dependency  Network"ʺ,  NAACL  2003. High  agreement Kappa:  0.98
  • 9. Summary  of  error  analysis •  Default model o  Only half of query terms were assigned correct POS tags o  Almost all of proper nouns were NOT identified •  72% of proper nouns are mistakenly assigned as common nouns •  Error: “obama”, “india”, “ritz carlton”, “discovery channel” •  Caseless model o  Around 75% of query terms were assigned correct POS tags o  Many proper nouns were identified •  Common nouns are mistakenly identified as proper nouns •  Errors caused by a partial grammatical rule o  “lower heart rate” o  “gs pay rate” April 5, 2017SAC2017 IAR 9 verb adjective common  noun verb :  Adjectives  come  before  common  nouns :  Verbs  come  after  a  subject
  • 10. Proposed  POS  tagging •  Summary of the error analysis o  Proper nouns/common nouns cannot be identified •  Problem1: Capitalization is missing o  Grammatical rules are mistakenly applied •  Problem2: Word order is fairly free •  Related study o  A small num. of highly relevant information •  Problem3: User feedback and user log are not always available •  Approach o  Sol-P1: Sentence-level morphological analysis o  Sol-P2: Proposing a POS tagging not based on word order o  Sol-P3: Large-scale Web corpus (easily available) o  Building the term-POS database (TPDB) •  Morphological analysis are applied offline April 5, 2017SAC2017 IAR 10
  • 11. Processing  flow April 5, 2017SAC2017 IAR 11 Large-scale Web corpus S1 tA/P1 tB/P2 tC/P3tA tB tC tA tC tD tC tE tA tF tA/P1 tC/P4 tD/P5 tC/P3 tE/P1 tA/P2 tF/P1 tB tD tB/P2 tD/P3 Morphological analysis S2 S3 S4 S1 S2 S3 S4 TPDB tA/P1 tB/P2 tC/P3 tA/P1 tC/P4 tD/P5 tC/P3 tE/P1 tA/P2 tA/P1 S1 S2 S3 tA tC Query tA/P1 tC/P3 tA/P1 tC/P4 Scoring method Offline Online Insert
  • 12. Scoring  for  POS  tagging •  Design principle o  Frequently appearing POS tags in the corpus are assigned to queries o  POS tags of a sentence are emphasized when the sentence contains more kinds of query terms •  Co-occurrence of query terms is a useful clue •  Step of scoring o  Retrieving entries which contain query terms from TPDB o  Braking down into pairs of query terms •  Query: “tA tB tC” o  Counting entries per the term-POS pairs for each query term pair •  Query term pair: {tA tB} o  Scoring with three proposed methods April 5, 2017 12 {tA  tB}  {tA  tC}  {tB  tC} tA/P1 tB/P2 5 0.33 (5/15) tA/P1 tB/P3 3 0.20 (3/15) tA/P2 tB/P4 7 0.47 (7/15) freq. normalized freq. num.  of  entries   containing   tA/P1 and tB/P2
  • 13. Three  proposed  methods •  MaxFreq o  The most frequently appearing POS tag (highest freq.) is assigned •  MostLikelihood o  The highest normalized freq. is assigned o  MaxFreq may be affected by frequently appearing terms •  AllCombi o  POS tag of the highest sum of the term-POS frequency is assigned o  MaxFreq and MostLikelihood only focus on a POS tag with the highest frequency/normalized frequency o  More diversified context including long tail can be considered April 5, 2017SAC2017 IAR 13 Query: “tA tB tC” tA:tB tA/P1 tB/P2 5 0.33 tA/P1 tB/P3 3 0.20 tA/P2 tB/P4 7 0.47 tA:tC tA/P1 tC/P2 3 0.43 tA/P3 tC/P3 4 0.57 tB:tC tB/P1 tC/P2 5 0.5 tB/P2 tC/P2 5 0.5 freq. normalized freq.
• 14.–18. Three proposed methods (animation builds of slide 13) On the example above, MaxFreq assigns tA/P2 (highest freq., 7), MostLikelihood assigns tA/P3 (highest normalized freq., 0.57), and AllCombi assigns tA/P1 (highest summed freq., 5 + 3 + 3 = 11).
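A minimal sketch of the three methods, run on the example statistics from these slides. The data structure is an assumption: when tagging tA, only the pairs involving tA and tA's tag in each entry are needed.

```python
from collections import defaultdict

# Pair statistics for query "tA tB tC", copied from the slides' example.
# Each tuple: (POS of tA in the entry, freq., normalized freq. in the pair).
stats = {
    ("tA", "tB"): [("P1", 5, 0.33), ("P1", 3, 0.20), ("P2", 7, 0.47)],
    ("tA", "tC"): [("P1", 3, 0.43), ("P3", 4, 0.57)],
}

def max_freq(stats):
    """MaxFreq: the POS tag with the single highest raw frequency is assigned."""
    return max((e for es in stats.values() for e in es), key=lambda e: e[1])[0]

def most_likelihood(stats):
    """MostLikelihood: the POS tag with the highest normalized freq. is assigned."""
    return max((e for es in stats.values() for e in es), key=lambda e: e[2])[0]

def all_combi(stats):
    """AllCombi: sum raw frequencies per POS tag over all pairs; highest sum wins."""
    totals = defaultdict(int)
    for es in stats.values():
        for pos, freq, _ in es:
            totals[pos] += freq
    return max(totals, key=totals.get)

print(max_freq(stats))         # P2 (freq. 7)
print(most_likelihood(stats))  # P3 (normalized freq. 0.57)
print(all_combi(stats))        # P1 (5 + 3 + 3 = 11)
```

The three results match the tags highlighted in the slide builds, and show how AllCombi's summation lets the lower-frequency P1 entries outvote the single high-frequency P2 entry.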
• 19. Experiment
•  Datasets
o  TREC Web track topics
•  200 queries from 2009–2012
o  MS-251
•  Microsoft search log used in related studies [3][4]
•  Large-scale Web corpus
o  ClueWeb09 Category B
•  50 million Web documents
•  Evaluated methods
o  Proposed methods: MaxFreq, MostLikelihood, AllCombi
o  Existing methods: Stanford, Caseless, SingleFreq (the most frequently appearing POS tag is assigned)
o  Some results are skipped because the trend is the same
[3] Bendersky et al.: "Structural Annotation of Search Queries Using Pseudo Relevance Feedback", CIKM 2010.
[4] K. Ganchev et al.: "Using Search-Logs to Improve Query Tagging", ACL 2012.
• 20. POS-tagged Web track topics
•  AllCombi: the highest for all query terms, common noun, and proper noun
o  Good at judging nouns
o  Considering more diversified context is useful
•  Global statistics from a large-scale Web corpus are useful
•  MaxFreq and MostLikelihood: the highest for common noun, verb, and adjective
•  Every proposed method significantly outperformed Caseless (sign test)

Precision        All query terms  Common noun  Proper noun  Verb  Adjective  Sign test vs. Caseless
MaxFreq          .814             .825         .833         .769  .647       p < 0.05
MostLikelihood   .814             .825         .833         .769  .647       p < 0.05
AllCombi         .821             .825         .860         .714  .629       p < 0.01
Caseless         .763             .789         .751         .733  .690
SingleFreq       .702             .775         .670         .533  .581
Stanford         .547             .550         1.0          .722  .451
• 21. Effect of the proposed method
•  AllCombi correctly identified many query terms
•  Some errors caused by partial grammatical rules still remain
•  Negative effect of the proposed method
o  "president" in the corpus is often identified as a proper noun
•  Need to normalize term weights
[Table: per-query comparison of Stanford vs. AllCombi on the queries "obama", "india", "rif carlton", "lower heart rate", "gs pay rate", "president united states"; the correctness marks were lost in extraction.]
• 22. Conclusion
•  POS tagging for Web queries
o  Uses the results of sentence-level morphological analysis
o  Uses a large-scale Web corpus
o  Proposed three scoring methods
•  Experiments
o  Considering more diversified context is useful
o  The best proposed method differs by POS tag
o  Outperformed existing tools and prior studies
•  Future work
o  Combining the proposed methods may improve accuracy
o  Database schema design for fast POS tagging
• 23. Default model

POS tag          Precision  Recall
Common noun      .550       .985
Proper noun      1.0        .010
Verb             .722       .867
Adjective        .451       .958
All query terms  .547       .547

•  Nearly half of the query terms were assigned correct POS tags
•  Almost none of the proper nouns were identified
o  72% of proper nouns were mistakenly tagged as common nouns
o  Errors: "obama", "india", "ritz carlton", "discovery channel"
•  Errors caused by partial grammatical rules
o  "lower heart rate": "lower" (verb) mis-tagged as adjective — the rule "adjectives come before common nouns" was misapplied
o  "gs pay rate": "pay" (common noun) mis-tagged as verb — the rule "verbs come after a subject" was misapplied
• 24. Caseless model
•  Precision and recall improved overall
•  Many proper nouns were identified
o  31% of proper nouns are still mistakenly tagged as common nouns
o  Precision decreased
•  Harm of partial grammatical rules still exists
o  "discovery channel store": common noun assigned where a proper noun is correct

POS tag          Precision  Recall
Common noun      .789       .769
Proper noun      .751       .640
Verb             .733       .733
Adjective        .690       .833
All query terms  .763       .763
• 25. MS-251
•  The trend of the proposed methods is the same as for the Web track topics
o  The ratio of POS tags affects the ordering
•  AllCombi: good at judging nouns
•  MaxFreq, MostLikelihood: good at judging verbs and adjectives
o  The proposed methods are better than [4]

Precision
MaxFreq              .890
MostLikelihood       .895
AllCombi             .893
Best method in [4]   .858

[4] K. Ganchev et al.: "Using Search-Logs to Improve Query Tagging", ACL 2012.