University of Helsinki                                                                                                    Department of Computer Science




          Utilizing Temporal Information in
            Topic Detection and Tracking
                                       Juha Makkonen and Helena Ahonen–Myka
                                           {jamakkon,hahonen}@cs.helsinki.fi


                            University of Helsinki – Department of Computer Science




Juha Makkonen and Helena Ahonen–Myka            Utilizing Temporal Information in Topic Detection and Tracking – p.1/15                     2003-08-1

    Outline
                         Introduction
                         Topic Detection and Tracking
                         Resolving temporal expressions
                            Recognition
                            Formalization
                            Comparison
                         Experiments
                         Future Work





    Introduction
                         Temporal expressions are often left unused, because
                            extracting them requires dedicated tools,
                            they have to be formalized before they are of any use, and
                            comparing formalizations is sometimes tricky.
                         Using temporal information is by no means a novel idea:
                            in AI, to form chronologies of events,
                            in question answering, to extract a fact,
                            in databases, diagnostic systems, dialogue systems, . . .
                         We want to measure the temporal similarity of two
                         documents.


    Topic Detection and Tracking
                         A TDT system monitors news broadcasts in order to
                             detect new, previously unreported events, and to
                             track the development of the detected events.
                         The focus is on news events: something non-trivial taking
                         place at a specific time and place.
                         A topic is understood as an event or an activity,
                         along with all related events and activities.
                         The monitored news stream is intrinsically
                         sensitive to time.




    Resolving Temporal Expressions
                         An expression can be
                             explicit: “the 19th of August 2003”,
                             implicit: “today”, “Tuesday afternoon”, or
                             vague: “since April”, “a couple of weeks ago”.
                         Resolving an expression depends on a point of reference:
                             “The winter of 1974 was cold. The next winter will be colder.”
                             “The winter of 1974 was cold. The next winter was colder.”
                         Deciding which winter “the next winter” refers to requires
                             the reference time or the utterance time, and
                             the tense of the relevant verb.
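A minimal sketch of how the point of reference and the tense could interact, assuming a winter is identified by the year in which it begins (taken here as 1 December) and assuming a simple rule: past tense anchors "the next winter" at the reference time set by the preceding sentence, other tenses anchor it at the utterance time. The names `resolve_indexical` and `resolve_next_winter` are hypothetical, and the rule is an illustration, not the authors' method.

```python
from datetime import date, timedelta

# Offsets of the indexicals from the utterance (document) date.
INDEXICAL_OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def resolve_indexical(word: str, utterance: date) -> date:
    """Resolve 'yesterday', 'today' or 'tomorrow' against the utterance time."""
    return utterance + timedelta(days=INDEXICAL_OFFSETS[word.lower()])


def resolve_next_winter(tense: str, reference_winter: int, utterance: date) -> int:
    """Return the year in which 'the next winter' begins.

    Hypothetical heuristic:
    - a winter is identified by the year it begins, starting on 1 December;
    - past tense anchors at the reference time (the winter just mentioned);
      any other tense anchors at the utterance time.
    """
    if tense == "past":
        return reference_winter + 1
    # First winter that begins after the utterance date.
    if utterance < date(utterance.year, 12, 1):
        return utterance.year
    return utterance.year + 1
```

With an utterance date of 19 August 2003, “The next winter was colder” resolves to the winter beginning in 1975, while “The next winter will be colder” resolves to the winter beginning in December 2003.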


    Recognition
                         The relevant terms are split into categories:

              category        terms
              baseterm        day, week, weekday, month, monthname, quarter, season, year, decade
              indexical       yesterday, today, tomorrow
              internal        beginning, end, early, late, middle
              determiner      this, last, next, previous, the
              temporal        in, on, by, during, after, until, since, before, later
              postmodifier    of, to
              numeral         one, two, . . .
              ordinal         first, second, . . .
              adverb          ago
              meta            throughout
              vague           some, few, several
              recurrence      every, per
              source          from
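The category table can be read as a lexicon that maps each word to its category. The sketch below assumes that the `monthname` and `weekday` classes are expanded into concrete names (tagged `baseterm`, as in the table), that the open-ended numeral and ordinal lists are cut to a small sample, and that digit tokens such as "15th" and "1919" are caught by regular expressions. The function `tag` is a hypothetical helper; being context-free, it would also tag the modal verb "may" as a month.

```python
import re

# Illustrative lexicon after the category table (assumption: class names such
# as monthname/weekday are expanded, open-ended lists are only sampled).
CATEGORIES = {
    "baseterm": ["day", "week", "month", "quarter", "season", "year", "decade",
                 "monday", "tuesday", "wednesday", "thursday", "friday",
                 "saturday", "sunday",
                 "january", "february", "march", "april", "may", "june",
                 "july", "august", "september", "october", "november",
                 "december"],
    "indexical": ["yesterday", "today", "tomorrow"],
    "internal": ["beginning", "end", "early", "late", "middle"],
    "determiner": ["this", "last", "next", "previous", "the"],
    "temporal": ["in", "on", "by", "during", "after", "until", "since",
                 "before", "later"],
    "postmodifier": ["of", "to"],
    "numeral": ["one", "two", "three"],
    "ordinal": ["first", "second", "third"],
    "adverb": ["ago"],
    "meta": ["throughout"],
    "vague": ["some", "few", "several"],
    "recurrence": ["every", "per"],
    "source": ["from"],
}

# Reverse index: word -> category (each word appears in one category).
LEXICON = {word: cat for cat, words in CATEGORIES.items() for word in words}

ORDINAL_DIGITS = re.compile(r"\d+(st|nd|rd|th)$")
NUMERAL_DIGITS = re.compile(r"\d+$")


def tag(text):
    """Return (token, category) pairs; category is None for other tokens."""
    tagged = []
    for token in re.findall(r"\w+", text):
        low = token.lower()
        if ORDINAL_DIGITS.match(low):
            cat = "ordinal"
        elif NUMERAL_DIGITS.match(low):
            cat = "numeral"
        else:
            cat = LEXICON.get(low)
        tagged.append((token, cat))
    return tagged
```

For example, `tag("until the end of June")` yields temporal, determiner, internal, postmodifier and baseterm tags, which is the kind of category sequence the automata operate on.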

    Recognition
                         The categories are used to build automata.

                         [Figure: automaton for temporal expressions; from the init state,
                         transitions are labelled with the categories temporal, determiner,
                         internal, ordinal, numeral, postmodifier, monthname and year.]

                         “The strike started on the 15th of May 1919. It lasted until
                         the end of June, although there was still turmoil in late
                         January next year.”
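A deterministic automaton over category tags can be sketched as a transition table. The table below is hypothetical: it is not the automaton from the figure, only a small one that accepts the three expressions of the example text (“on the 15th of May 1919”, “until the end of June”, “in late January next year”). Following the figure's state labels, it assumes separate `monthname` and `year` tags, with `baseterm` used for the word “year” itself.

```python
# Hypothetical transition table over category tags; it covers only the three
# example expressions and is not the authors' automaton.
TRANSITIONS = {
    ("init", "temporal"): "T",
    ("init", "determiner"): "D",
    ("init", "internal"): "I",
    ("init", "monthname"): "M",
    ("T", "determiner"): "D",
    ("T", "internal"): "I",
    ("T", "monthname"): "M",
    ("D", "ordinal"): "O",
    ("D", "internal"): "I",
    ("O", "postmodifier"): "P",
    ("I", "postmodifier"): "P",
    ("I", "monthname"): "M",
    ("P", "monthname"): "M",
    ("M", "year"): "Y",          # "May 1919"
    ("M", "determiner"): "MD",   # "January next ..."
    ("MD", "baseterm"): "Y",     # "... next year"
}
ACCEPTING = {"M", "Y"}


def accepts(tags):
    """Run the deterministic automaton over a sequence of category tags."""
    state = "init"
    for tag in tags:
        state = TRANSITIONS.get((state, tag))
        if state is None:  # no transition: reject the sequence
            return False
    return state in ACCEPTING
```

A recognizer would typically run such an automaton from every candidate start token and keep the longest accepted span; that scanning step is omitted here.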

    Formalization
                         We map the expressions onto a calendar consisting of
                             a time-line: points with a precedence relation,
                             a set of granularities (year, month, week, . . . );
                             note that March, Thursday and weekend are also granularities,

                             a set of conversion functions between granularities.
                         The expressions are mapped to intervals [t_start, t_end] of the
                         bottom granularity, which in our case is the day.
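A minimal sketch of the calendar, assuming Python's `datetime.date` as the day-level time-line and representing every granularity unit as an inclusive pair `(t_start, t_end)` of days. Only a few granularities (month, year, ISO week) and two day-to-coarser conversion functions are shown; the function names are hypothetical.

```python
import calendar
from datetime import date, timedelta

# Intervals are (t_start, t_end) pairs of days, both ends inclusive; the day
# is the bottom granularity.


def month_interval(year: int, month: int) -> tuple:
    """Days covered by one calendar month."""
    last_day = calendar.monthrange(year, month)[1]
    return (date(year, month, 1), date(year, month, last_day))


def year_interval(year: int) -> tuple:
    """Days covered by one calendar year."""
    return (date(year, 1, 1), date(year, 12, 31))


def iso_week_interval(year: int, week: int) -> tuple:
    """Days of an ISO week, Monday to Sunday (needs Python 3.8+)."""
    start = date.fromisocalendar(year, week, 1)
    return (start, start + timedelta(days=6))


# Conversion functions: the coarser-granularity unit that contains a given day.
def containing_month(d: date) -> tuple:
    return month_interval(d.year, d.month)


def containing_week(d: date) -> tuple:
    iso_year, iso_week, _ = d.isocalendar()
    return iso_week_interval(iso_year, iso_week)
```

Named granularities such as March or Thursday would be further functions of the same shape, each returning the day interval of one unit.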





    Formalization
                         The baseterm of the expression defines the interval.
                         The non-baseterms are interpreted as shift and span
                         functions that modify the start and end points:
                            shift: this, next, last, 3 weeks ago, etc.
                            span: until, before, after, from, etc.
                         The length of the interval is modified by internals:
                            in the beginning of the 1970s, late May, etc.
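The shift, span and internal operations can be sketched as functions on inclusive day intervals. This is an illustration under stated assumptions, not the authors' definitions: the names are hypothetical, span functions are anchored at a given day (for example the document date), and internals are assumed to select thirds of the interval (early/beginning = first third, middle = middle third, late/end = last third).

```python
import calendar
from datetime import date, timedelta


def month_interval(year, month):
    """Inclusive day interval (t_start, t_end) covering one calendar month."""
    return (date(year, month, 1), date(year, month, calendar.monthrange(year, month)[1]))


def shift_month(year, month, k):
    """Shift by k months: 'next month' is k=1, 'last month' is k=-1."""
    y, m0 = divmod(year * 12 + (month - 1) + k, 12)
    return month_interval(y, m0 + 1)


def shift_days(interval, k):
    """Move both endpoints by k days; '3 weeks ago' is k=-21."""
    start, end = interval
    return (start + timedelta(days=k), end + timedelta(days=k))


def span_until(anchor, interval):
    """'until X': from the anchor day to the end of X."""
    return (anchor, interval[1])


def span_from(interval, anchor):
    """'from X': from the start of X to the anchor day."""
    return (interval[0], anchor)


def internal(interval, part):
    """Shrink an interval; assumption: early/middle/late are thirds of it."""
    start, end = interval
    n = (end - start).days + 1
    if n < 3:
        return interval  # too short to split into thirds
    k = n // 3
    if part in ("early", "beginning"):
        return (start, start + timedelta(days=k - 1))
    if part in ("late", "end"):
        return (end - timedelta(days=k - 1), end)
    if part == "middle":
        return (start + timedelta(days=k), end - timedelta(days=k))
    raise ValueError(f"unknown internal: {part!r}")
```

Under these assumptions “late May” 1919 becomes 22 to 31 May, and “until the end of June”, anchored at 15 May 1919, becomes 15 May to 30 June 1919.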





    Comparison
                         We want to measure the temporal similarity of two
                         documents, i.e., how much their temporal references overlap.
                         When comparing the intervals of two documents, we
                            compare all intervals pairwise,
                            compute similarity = 2 · overlap / (length of interval 1 + length of interval 2),
                            take the average of the best matches for each interval.
                         The outcome measures how well the references of one
                         document cover those of the other.
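The comparison steps can be sketched as follows, assuming inclusive day intervals, lengths and overlaps counted in days, and reading “size of the intervals” as the sum of the two lengths, which makes the pairwise score a Dice-style ratio between 0 and 1. The function names are hypothetical.

```python
from datetime import date


def length(iv):
    """Number of days in an inclusive interval (t_start, t_end)."""
    return (iv[1] - iv[0]).days + 1


def overlap(a, b):
    """Number of days shared by two inclusive intervals (0 if disjoint)."""
    return max(0, (min(a[1], b[1]) - max(a[0], b[0])).days + 1)


def interval_similarity(a, b):
    """2 * overlap / (length(a) + length(b)), a value in [0, 1]."""
    return 2 * overlap(a, b) / (length(a) + length(b))


def document_similarity(doc_a, doc_b):
    """Average, over the intervals of doc_a, of each interval's best match in doc_b.

    The measure is directional: it says how well doc_b's references cover doc_a's.
    """
    if not doc_a or not doc_b:
        return 0.0
    best = [max(interval_similarity(a, b) for b in doc_b) for a in doc_a]
    return sum(best) / len(best)
```

For May 1919 (31 days) against 22 May to 30 June 1919 (40 days), the overlap is 10 days and the similarity is 2 · 10 / 71 ≈ 0.28. Averaging the two directions would give a symmetric variant if one is needed.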





    Experiments
                         Data: transcribed TV and radio broadcasts and online
                         news:
                             8595 documents from the TDT2 corpus,
                             2383 documents labeled with one of 35 events.
                         Temporal expression recognition was evaluated on 1417 sentences:

                             type         freq    recognition    canonization
                             simple        326       0.98            0.93
                             composite     209       0.85            0.66

                         Verbs such as to schedule, to plan and to expect proved difficult:
                         “The meeting was scheduled for Monday.” Which Monday?
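The “Which Monday?” problem can be made concrete. A bare weekday name has two nearest candidates around the document date, and a naive tense rule (past tense picks the past candidate) chooses the wrong one for “was scheduled”. The sketch below is hypothetical: the function names and the small set of forward-looking verbs are assumptions for illustration, not part of the original system.

```python
from datetime import date, timedelta

# Hypothetical list of verbs that point to the future even in the past tense.
FORWARD_VERBS = {"schedule", "plan", "expect"}


def weekday_candidates(document_date, weekday):
    """Nearest strictly earlier and strictly later dates with the given weekday (Monday = 0)."""
    back = (document_date.weekday() - weekday) % 7 or 7
    ahead = (weekday - document_date.weekday()) % 7 or 7
    return (document_date - timedelta(days=back),
            document_date + timedelta(days=ahead))


def resolve_weekday(document_date, weekday, past_tense, verb_lemma=None):
    """Naive tense rule, overridden for forward-looking verbs (assumption)."""
    past, future = weekday_candidates(document_date, weekday)
    if verb_lemma in FORWARD_VERBS:
        return future
    return past if past_tense else future
```

For a document dated Tuesday 19 August 2003, the candidates are 18 and 25 August; the plain tense rule picks 18 August for “was scheduled”, while the verb override picks 25 August.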

    Experiments
                         The distribution of temporal relations:

                             relation        same event: yes    same event: no
                             before              0.761              0.831
                             meets               0.001              0.000
                             overlaps            0.016              0.008
                             begins              0.010              0.006
                             falls within        0.168              0.122
                             finishes            0.010              0.008
                             exact               0.072              0.056
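The seven relations resemble Allen's interval relations with each relation folded together with its inverse (13 relations become 6 + “exact”). The sketch below classifies a pair of intervals under that reading; it assumes inclusive intervals of integer day numbers (a `datetime.date` can be converted with `toordinal()`) and assumes that intervals on adjacent days “meet”. The function names are hypothetical.

```python
def _forward_relation(a, b):
    """Relation of a to b if it is one of the seven named ones, else None.

    Intervals are inclusive (start, end) pairs of day numbers.
    """
    (a_s, a_e), (b_s, b_e) = a, b
    if (a_s, a_e) == (b_s, b_e):
        return "exact"
    if a_e + 1 < b_s:
        return "before"
    if a_e + 1 == b_s:
        return "meets"          # assumption: adjacent days count as meeting
    if a_s < b_s <= a_e < b_e:
        return "overlaps"
    if a_s == b_s and a_e < b_e:
        return "begins"
    if b_s < a_s and a_e < b_e:
        return "falls within"
    if b_s < a_s and a_e == b_e:
        return "finishes"
    return None


def temporal_relation(a, b):
    """Classify a pair of intervals, folding each relation with its inverse."""
    return _forward_relation(a, b) or _forward_relation(b, a)
```

Counting `temporal_relation` over all interval pairs of same-event and different-event document pairs would produce a distribution of the kind shown in the table.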

    Experiments
                         Temporal similarity is higher when documents discuss the
                         same event:

                             average of          same event    different event    ratio yes/no
                             sum of pairwise       0.0034          0.0023            1.4783
                             max of pairwise       0.0059          0.0040            1.4750

                         Finding the best match for each interval does not pay off.
                         More accurate formalization would help.
                         What is the meaning of “three years ago”?
                         How should informativeness be represented?
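One plausible reading of the two rows, offered as an assumption rather than the authors' exact definition: for a document pair, either sum the pairwise interval similarities or take their maximum, then average these scores over all same-event (or different-event) pairs. The sketch shows both aggregations on inclusive integer day intervals and recomputes the yes/no ratios from the reported averages. All names are hypothetical.

```python
from itertools import product


def interval_similarity(a, b):
    """2 * overlap / (len(a) + len(b)) for inclusive integer day intervals."""
    overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    return 2 * overlap / ((a[1] - a[0] + 1) + (b[1] - b[0] + 1))


def sum_pairwise(doc_a, doc_b):
    """Sum of the similarities of all interval pairs."""
    return sum(interval_similarity(a, b) for a, b in product(doc_a, doc_b))


def max_pairwise(doc_a, doc_b):
    """Best single interval pair; 0.0 if either document has no intervals."""
    return max((interval_similarity(a, b) for a, b in product(doc_a, doc_b)),
               default=0.0)


def yes_no_ratio(same_event, different_event):
    """Ratio of the same-event average to the different-event average."""
    return same_event / different_event
```

From the reported averages, 0.0034 / 0.0023 ≈ 1.4783 and 0.0059 / 0.0040 = 1.475, matching the last column.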

    Future Work
                         Improvement of composite-expression processing:
                            more work on the automata.
                         Introduction of vagueness:
                            an expression would be formalized as a probability
                            distribution on the time-line,
                            similarity could then be measured with, e.g., the Kullback-Leibler divergence.
                         Survey of the behaviour of temporal expressions:
                            how are the references distributed per medium?
                            how does the first story compare with the following ones?
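A minimal sketch of the proposed vague representation, under assumptions not in the slides: each expression becomes a distribution over the days of a common window, uniform on its interval and smoothed with a small weight `eps` elsewhere so that the divergence stays finite, and two expressions are compared with the Kullback-Leibler divergence D(p || q). Lower values mean more similar; the measure is asymmetric. The function names and the smoothing scheme are hypothetical.

```python
import math


def smoothed_uniform(interval, window, eps=1e-3):
    """Spread an expression's probability mass over the days of a window.

    Days inside the interval get weight 1, all other days get weight eps
    (assumed smoothing, so that the divergence below stays finite).
    """
    lo, hi = window
    weights = [1.0 if interval[0] <= day <= interval[1] else eps
               for day in range(lo, hi + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats; 0 when p == q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

A genuinely vague expression such as “a couple of weeks ago” could replace the flat weights with a peaked shape; the divergence computation stays the same.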




    The End



                                       Thank you



