Recognising and Interpreting
Named Temporal Expressions
Matteo Brucato
Leon Derczynski
Hector Llorens
Kalina Bontcheva
Christian S. Jensen
How do we talk about times?
● Calendar
● Closed class of terms
– tomorrow | today | yesterday
– [next | last ] [ week | month | year]
– [1 - 31] [January – December]
● Really deterministic
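As a rough illustration of why this closed class looks deterministic, the three patterns above fit into a single regular expression. This is a minimal sketch, not the recogniser used in the talk; the names `TIMEX_RE` and `find_timexes` are made up for the example.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# One alternative per pattern listed on the slide.
TIMEX_RE = re.compile(
    r"\b(?:"
    r"tomorrow|today|yesterday"
    r"|(?:next|last)\s+(?:week|month|year)"
    rf"|(?:[12][0-9]|3[01]|[1-9])\s+(?:{MONTHS})"
    r")\b",
    re.IGNORECASE,
)


def find_timexes(text: str) -> list[str]:
    """Return the surface strings of all closed-class time expressions."""
    return [m.group(0) for m in TIMEX_RE.finditer(text)]


print(find_timexes("We met yesterday and will meet again next week, on 25 December."))
print(find_timexes("See you at Christmas."))  # a named expression slips through
```

The second call returns an empty list, which is the gap the rest of the talk addresses.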
Wow, it's super-deterministic!
Credit: Kevin Knight
… sometimes
● TempEval-2 timex recall: 66 – 88 %
● TempEval-2 normalisation: 55 – 85 %
● ~150 rules needed to get to 81% (Angeli & Uszkoreit '13)
● We can get the structured expressions OK
● But what about the rest?
Unstructured time mentions
– Christmas
– Michaelmas
– Halloween
– Easter
● Can we learn how to recognise these?
Time expression diversity
● Current corpora too small to hold much linguistic variation
● Note characteristic knee in distribution (cf. Montemurro)
Named Temporal Expressions
● New class of timexes
– Doesn't look like a timex
– Doesn't sound like a timex
– … is, in fact, a timex
How can we mine and extract NTEs?
● Expensive to annotate and hope they appear
● Prefer an automated approach
→ Let's mine Wikipedia!
● 432 English NTEs found
NTEs in Wikipedia
● Gives term and text description
● Problem: no good as a gazetteer, some entries are polysemous (e.g. Carnival)
● Problem: recall limited with gazetteers
● Solution: build statistical tagger
Building statistical NTE tagger
● Use list of NTEs to annotate sentences
– CoNLL format, I/O binary labels
● Only use monosemous expressions
● Visit linked data searching for expressions
● If many entities found, expression is polysemous
– SELECT DISTINCT ?r {?r rdfs:label "carnival"@en}
– Not monosemous
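The polysemy check can be sketched as follows. The query mirrors the one on the slide, with the `rdfs` prefix declared so that it is self-contained. Everything else is an assumption made for illustration: the function names, the threshold of one entity (the slide only says "many"), and the idea that the caller has already fetched the matching resource URIs from some SPARQL endpoint.

```python
def label_query(expression: str, lang: str = "en") -> str:
    """Build a SPARQL query listing every resource labelled `expression`."""
    escaped = expression.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        f'SELECT DISTINCT ?r WHERE {{ ?r rdfs:label "{escaped}"@{lang} }}'
    )


def is_monosemous(resources: list[str], max_entities: int = 1) -> bool:
    """Assumed rule: at most `max_entities` distinct resources share the label."""
    return len(set(resources)) <= max_entities


print(label_query("carnival"))
# Hypothetical result sets, standing in for an endpoint's answer.
print(is_monosemous(["ex:Groundhog_Day"]))
print(is_monosemous(["ex:Carnival", "ex:Carnival_(film)", "ex:Carnival_(band)"]))
```

Sending the query to an endpoint is deliberately left out, since the talk does not say which linked data source was visited.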
Building statistical NTE tagger
● If a sentence contains a monosemous NTE, also annotate any polysemous NTEs
● Assume that they will occur in temporal sense
“While it might not have the retail significance of Christmas, Halloween or Secretary's Day, Groundhog Day remains perhaps the weirdest American holiday.”
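The bootstrapping step above can be sketched as an I/O labeller: a monosemous NTE in the sentence licenses labelling the polysemous ones as well. This is a minimal sketch under stated assumptions. The helper names are hypothetical, matching is plain case-insensitive token comparison, and sentences with no monosemous NTE are simply skipped (the slides do not say what happens to them). The two term sets in the usage lines are illustrative, not taken from the mined list.

```python
def find_spans(tokens, phrases):
    """Return (start, end) token spans where any phrase occurs (case-insensitive)."""
    lowered = [t.lower() for t in tokens]
    spans = []
    for phrase in phrases:
        words = phrase.lower().split()
        for i in range(len(lowered) - len(words) + 1):
            if lowered[i:i + len(words)] == words:
                spans.append((i, i + len(words)))
    return spans


def annotate(tokens, monosemous, polysemous):
    """Label tokens I (inside an NTE) or O (outside), one pair per token.

    Polysemous NTEs are labelled only when the sentence also contains a
    monosemous NTE; otherwise None is returned (assumed: sentence unused).
    """
    anchor_spans = find_spans(tokens, monosemous)
    if not anchor_spans:
        return None
    labels = ["O"] * len(tokens)
    for start, end in anchor_spans + find_spans(tokens, polysemous):
        for i in range(start, end):
            labels[i] = "I"
    return list(zip(tokens, labels))


tokens = "Groundhog Day falls before Carnival .".split()
for token, label in annotate(tokens, {"Groundhog Day"}, {"Carnival"}):
    print(token, label, sep="\t")  # one token per line, CoNLL style
```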
NTE recognition results
● Baseline: gazetteer of timexes in existing resources
● 2:1 train:eval split, strict matching evaluation
● Also found new NTEs!
– European Cup
– Dayton Peace Agreement
How do we normalise NTEs?
● Target representation: TIMEX3
– January 2nd, 1980 → 1980-01-02
– Summer 2012 → 2012-SU
– now → PRESENT_REF
● Statistical learning won't manage
● Use dedicated tool, TIMEN
– Open normalisation toolkit
– Anyone can contribute
– SotA normalisation performance
– Takes a document with entity boundaries marked
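For reference, the three target values shown above can be produced as follows. This sketch only illustrates the TIMEX3 value format (ISO 8601 dates, TimeML season codes, and the `PRESENT_REF` token). It is not TIMEN's API, and the function names are hypothetical.

```python
from datetime import date

# TimeML season codes used in TIMEX3 values.
SEASONS = {"spring": "SP", "summer": "SU", "autumn": "FA", "fall": "FA", "winter": "WI"}

# Value for deictic expressions such as "now".
PRESENT_REF = "PRESENT_REF"


def date_value(d: date) -> str:
    """Fully specified calendar date as an ISO 8601 string."""
    return d.isoformat()


def season_value(season: str, year: int) -> str:
    """Season of a given year, e.g. 'Summer', 2012."""
    return f"{year}-{SEASONS[season.lower()]}"


print(date_value(date(1980, 1, 2)))
print(season_value("Summer", 2012))
print(PRESENT_REF)
```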
Using NTE descriptions
● We have semi-structured descriptions
– “six weeks after Easter”
– “last Friday in June”
– “end of week 17”
– “tenth day of Tishrei”
● How to convert these to rules?
NTE normalisation rule extraction
● Create simple parser to cover majority of NTEs
– “June 25th”
– “Last Sunday in March”
● Covers 70.3% of NTE descriptions
● Remainder of rules may be added manually
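A simple description parser of this kind might look like the sketch below. It handles only the two description shapes quoted above (a fixed month and day, and an ordinal weekday within a month) and returns `None` for anything else, which stands in for the remainder that needs manual rules. The names and the exact patterns are assumptions, not the parser from the talk.

```python
import calendar
import re
from datetime import date

MONTHS = {m: i for i, m in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}
WEEKDAYS = {d: i for i, d in enumerate(
    ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"])}
ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "last": -1}

FIXED = re.compile(r"^(\w+) (\d{1,2})(?:st|nd|rd|th)?$")   # "June 25th"
NTH = re.compile(r"^(\w+) (\w+) (?:in|of) (\w+)$")         # "last Sunday in March"


def resolve(description: str, year: int):
    """Resolve a description to a date in `year`, or None if unsupported."""
    text = description.strip().lower()
    m = FIXED.match(text)
    if m and m.group(1) in MONTHS:
        return date(year, MONTHS[m.group(1)], int(m.group(2)))
    m = NTH.match(text)
    if m and m.group(1) in ORDINALS and m.group(2) in WEEKDAYS and m.group(3) in MONTHS:
        month = MONTHS[m.group(3)]
        # All dates in the month that fall on the requested weekday.
        days = [d for d in calendar.Calendar().itermonthdates(year, month)
                if d.month == month and d.weekday() == WEEKDAYS[m.group(2)]]
        n = ORDINALS[m.group(1)]
        return days[n - 1] if n > 0 else days[-1]
    return None


print(resolve("June 25th", 2013))
print(resolve("Last Sunday in March", 2013))
print(resolve("six weeks after Easter", 2013))  # not covered by this sketch
```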
Normalisation + NTEs
● Evaluation
● Two corpora:
– SotA (TempEval-3)
– Purpose built to be hard to normalise (TimenEval)
● On TempEval-3 (restricted newswire): 0.7% error reduction
● On TimenEval (varied genre): 4.3% error reduction
Outstanding issues:
Spatial variation
● Labo[u]r Day
– May 1 in much of the world
– first Monday in May in Australia's QLD and NT
● Summer
– Official vs. informal
– North vs. south
Outstanding issues:
Easter
● Commonly used as an offset
● Non-trivial to determine
● “Computus”
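To show what "non-trivial" means here, the sketch below computes Western (Gregorian) Easter Sunday with the widely published anonymous Gregorian computus, then applies an offset of the kind seen in the NTE descriptions. Eastern Orthodox Easter follows a different rule and is not covered. The function name is chosen for the example, and nothing here is claimed to be how TIMEN handles Easter.

```python
from datetime import date, timedelta


def easter(year: int) -> date:
    """Gregorian Easter Sunday via the anonymous Gregorian algorithm."""
    a = year % 19                      # position in the 19-year Metonic cycle
    b, c = divmod(year, 100)           # century and year within century
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


print(easter(2013))
print(easter(2013) + timedelta(weeks=6))  # "six weeks after Easter"
```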
Outstanding issues:
Multiple calendars
● Gregorian (quite popular)
– Not particularly rational in the first place
● Lunar (China)
● Astrological
● Hebrew
● … and so on
Outstanding issues:
Forms of expression
● Orthographic variation:
– Martin Luther King Day
– MLK Day
● Regional variation:
– autumn
– fall
Resources provided
● Corpus of NTEs
● Rules integrated into TIMEN in next release
– around November 2013
Thank you for your time!
Do you have any questions?
