Error analysis of Word Sense Disambiguation

Ruben Izquierdo
Marten Postma
Piek Vossen
Izquierdo, Postma and Vossen
VU Amsterdam

Motivation
• Word Sense Disambiguation is still an unsolved problem

Error Analysis
• Perform an error analysis of previous WSD evaluations to test our hypothesis:
  - Senseval-2: all-words task
  - Senseval-3: all-words task
  - SemEval-2007: all-words task (#17)
  - SemEval-2010: all-words WSD on a specific domain (#17)
  - SemEval-2013: multilingual all-words WSD and entity linking (#12)

Motivation
• Some “propagated” errors:
  - Errors on monosemous words
  - Errors caused by PoS tags
  - Multiwords and phrasal verbs
• Little attention has been paid to the real problem:
  - WSD is not 1 problem but N problems
• Our hypothesis:
  - In general, context is not modeled properly
  - Systems rely too much on the most frequent sense

Monosemous errors

Competition            Monosemous    Wrong   Examples
Senseval-2             499 (20.9%)   37.5%   gene.n (suppressor_gene.n), chance.a (chance.n), next.r (next.a)
Senseval-3             334 (16.6%)   44.1%   datum.n (data.n), making.n (make.v), out_of_sight (sight)
SemEval-2007           25 (5.5%)     11.1%   get_stuck.v, lack.v, write_about.v
SemEval-2010           31 (2.2%)     97.9%   tidal_zone.n, pine_marten.n, roe_deer.n, cordgrass.n
SemEval-2013 (lemmas)  348 (21.1%)   1.9%    private_enterprise, developing_country, narrow_margin
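The monosemous counts in the table can be computed from a gold standard and a system output roughly as follows. This is a minimal Python sketch, not the authors' scripts: the token format (lemma, gold sense, system sense), the toy sense inventory and the sense ids are hypothetical, and it assumes "Wrong" means the share of monosemous tokens that a system tagged incorrectly, counting unanswered tokens as wrong.

```python
# Minimal sketch: statistics on monosemous tokens for one system output.
# Assumption: a token is (lemma, gold_sense, system_sense); system_sense is None
# when the system gave no answer. The inventory below is a hypothetical toy example.
SENSE_INVENTORY = {
    "gene.n": ["gene.n.01"],
    "roe_deer.n": ["roe_deer.n.01"],
    "church.n": ["church.n.01", "church.n.02", "church.n.03"],
}


def monosemous_error_stats(tokens, inventory):
    """Return (number of monosemous tokens, their share of all tokens,
    share of monosemous tokens the system got wrong)."""
    mono = [t for t in tokens if len(inventory.get(t[0], [])) == 1]
    wrong = sum(1 for _, gold, pred in mono if pred != gold)
    share = len(mono) / len(tokens) if tokens else 0.0
    wrong_rate = wrong / len(mono) if mono else 0.0
    return len(mono), share, wrong_rate


TOKENS = [
    ("gene.n", "gene.n.01", "gene.n.01"),        # monosemous, correct
    ("roe_deer.n", "roe_deer.n.01", None),       # monosemous, unanswered -> wrong
    ("church.n", "church.n.02", "church.n.01"),  # polysemous, ignored here
    ("church.n", "church.n.01", "church.n.01"),  # polysemous, ignored here
]

print(monosemous_error_stats(TOKENS, SENSE_INVENTORY))  # (2, 0.5, 0.5)
```

With several participant systems, the function can be applied to each output and the wrong rates averaged; the slide does not say whether the table aggregates over systems in this way.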

Most Frequent Sense

• When the correct sense is NOT the most frequent sense (MFS), systems still mostly assign the MFS
  - Senseval-2: 799 tokens are not MFS; 84% of systems still assign the MFS
• Most “failed” words are due to the MFS bias:
  - Senseval-2, Senseval-3: say.v, find.v, take.v, have.v, cell.n, church.n
  - SemEval-2010: area.n, nature.n, connection.n, water.n, population.n
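The MFS bias can be measured by keeping only the tokens whose gold sense is not the MFS and counting how often a system still picks the MFS. A minimal sketch, assuming the MFS of each lemma is already known (for example the first-listed WordNet sense, a common convention) and using hypothetical sense ids; it is not necessarily how the 799 tokens and the 84% figure were obtained.

```python
# Minimal sketch of the MFS-bias measurement for one system output.
# Assumption: tokens are (lemma, gold_sense, system_sense) and `mfs` maps each
# lemma to its most frequent sense; all sense ids below are hypothetical.
def mfs_bias(tokens, mfs):
    """Return (number of non-MFS gold tokens, fraction of them tagged with the MFS)."""
    non_mfs = [(lemma, pred) for lemma, gold, pred in tokens if gold != mfs[lemma]]
    if not non_mfs:
        return 0, 0.0
    chose_mfs = sum(1 for lemma, pred in non_mfs if pred == mfs[lemma])
    return len(non_mfs), chose_mfs / len(non_mfs)


MFS = {"say.v": "say.v.01", "cell.n": "cell.n.01"}
TOKENS = [
    ("say.v", "say.v.03", "say.v.01"),     # gold is not the MFS, system picks the MFS
    ("say.v", "say.v.01", "say.v.01"),     # gold is the MFS: excluded
    ("cell.n", "cell.n.02", "cell.n.01"),  # gold is not the MFS, system picks the MFS
    ("cell.n", "cell.n.03", "cell.n.03"),  # gold is not the MFS, system is right
]

n, rate = mfs_bias(TOKENS, MFS)
print(n, round(rate, 2))  # 3 0.67
```

Concatenating the token lists of all participant systems gives a pooled rate over all system answers rather than a per-system one; which of the two the slide's percentage refers to is not stated.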

Analysis per PoS-tag

Analysis per polysemy class
[Figure: results per polysemy class (Low, Medium, High), grouped by number of senses]
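Grouping polysemous tokens into polysemy classes and computing an error rate per class can be sketched as follows. The class boundaries (Low up to 5 senses, Medium up to 14, High from 15 upwards) are hypothetical, since the slide does not state the exact cut-offs, and the lemmas and sense counts are made up.

```python
from collections import defaultdict


# Hypothetical cut-offs: the slide does not give the exact class boundaries.
def polysemy_class(n_senses, low_max=5, medium_max=14):
    if n_senses <= low_max:
        return "Low"
    if n_senses <= medium_max:
        return "Medium"
    return "High"


def error_rate_by_class(tokens, n_senses):
    """tokens: (lemma, gold_sense, system_sense); n_senses: lemma -> number of senses.
    Monosemous tokens are skipped, since they are analysed separately."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for lemma, gold, pred in tokens:
        senses = n_senses[lemma]
        if senses < 2:
            continue
        cls = polysemy_class(senses)
        totals[cls] += 1
        errors[cls] += int(pred != gold)
    return {cls: errors[cls] / totals[cls] for cls in totals}


# Made-up lemmas and sense counts, for illustration only.
N_SENSES = {"w1.n": 3, "w2.v": 9, "w3.v": 20, "w4.n": 1}
TOKENS = [
    ("w1.n", "s1", "s1"),  # Low, correct
    ("w1.n", "s2", "s1"),  # Low, wrong
    ("w2.v", "s4", "s1"),  # Medium, wrong
    ("w3.v", "s7", "s7"),  # High, correct
    ("w4.n", "s1", None),  # monosemous, skipped
]

print(error_rate_by_class(TOKENS, N_SENSES))  # {'Low': 0.5, 'Medium': 1.0, 'High': 0.0}
```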

Analysis per frequency class

Most difficult words

Expected vs. Observed difficulties
• Calculate per sentence:
  - The “expected” difficulty: average polysemy, sentence length, average word length
  - The “observed” difficulty: average error rate, from the real participant outputs
• We should expect:
  - harder sentences → higher error rate
  - easier sentences → lower error rate
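The expected-vs-observed comparison can be sketched per sentence: compute the three expected-difficulty features, compute the error rate averaged over participant systems, and measure how strongly the two agree. This is a minimal sketch under assumptions the slides do not state: Pearson correlation as the agreement measure, average polysemy as the one feature being correlated, and made-up sentences, lemmas and sense counts.

```python
from math import sqrt
from statistics import mean


def expected_difficulty(tokens, targets, n_senses):
    """Expected difficulty features of one sentence.
    tokens: the words of the sentence; targets: lemmas of the words to disambiguate."""
    return {
        "avg_polysemy": mean(n_senses[lemma] for lemma in targets),
        "sentence_length": len(tokens),
        "avg_word_length": mean(len(tok) for tok in tokens),
    }


def observed_difficulty(system_answers):
    """Observed difficulty: error rate averaged over systems.
    system_answers: one list of booleans per system (True = target tagged correctly)."""
    return mean(1 - sum(ans) / len(ans) for ans in system_answers)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up example: (tokens, target lemmas, correctness per system on those targets).
N_SENSES = {"w1.n": 1, "w2.v": 4, "w3.n": 8, "w4.v": 12}
SENTENCES = [
    (["It", "rained"], ["w1.n"], [[True], [True]]),
    (["They", "said", "so"], ["w2.v"], [[True], [False]]),
    (["Cells", "divide", "fast", "here"], ["w3.n", "w4.v"], [[False, True], [False, False]]),
]

expected = [expected_difficulty(t, g, N_SENSES)["avg_polysemy"] for t, g, _ in SENTENCES]
observed = [observed_difficulty(runs) for _, _, runs in SENTENCES]
r = pearson(expected, observed)
print(expected, observed, round(r, 2))  # [1, 4, 10] [0.0, 0.5, 0.75] 0.93
```

Sentence length and average word length can be correlated with `observed` in the same way; a high positive coefficient would match the expectation that harder sentences get higher error rates.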

• The context is (probably) not exploited properly
• Expected “easy” sentences SHOULD show low error rates
• Occurrences of the same word in different contexts have similar error rates
• The difficulty of a word depends more on its polysemy than on the context in which it appears
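The observation that occurrences of the same word have similar error rates across contexts can be checked by grouping per-occurrence error rates by lemma and looking at their spread. A minimal sketch; the input format (lemma, share of systems that mistagged that occurrence) and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def per_lemma_spread(occurrences):
    """occurrences: (lemma, error_rate) pairs, one per token in the corpus, where
    error_rate is the share of systems that mistagged that token.
    Returns lemma -> (mean error rate, population std. dev. across occurrences)."""
    by_lemma = defaultdict(list)
    for lemma, err in occurrences:
        by_lemma[lemma].append(err)
    return {lemma: (mean(errs), pstdev(errs)) for lemma, errs in by_lemma.items()}


# Hypothetical per-occurrence error rates.
OCCURRENCES = [("w1.n", 0.75), ("w1.n", 0.75), ("w2.v", 0.0), ("w2.v", 0.5)]
print(per_lemma_spread(OCCURRENCES))  # {'w1.n': (0.75, 0.0), 'w2.v': (0.25, 0.25)}
```

Spreads that are small within each lemma compared with the differences between lemmas would be consistent with the observation above; this sketch does not test statistical significance.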

WSD Corpora
http://github.com/rubenIzquierdo/wsd_corpora

System Outputs
https://github.com/rubenIzquierdo/sval_systems

Error analysis of Word Sense Disambiguation
Ruben Izquierdo
Marten Postma
Piek Vossen
ruben.izquierdobevia@vu.nl
http://github.com/rubenIzquierdo/wsd_corpora
http://github.com/rubenIzquierdo/sval_systems
