Generating Event Storylines from Microblogs
CIKM’12
ABSTRACT
 We explore the problem of generating storylines from microblogs for user input queries.
 Given a query about an ongoing event, we propose to sketch the real-time storyline of the event with a two-level solution:
1. Propose a language model with dynamic pseudo relevance feedback to obtain relevant tweets
2. Generate storylines via graph optimization
INTRODUCTION
 Generating Event Storyline from Microblogs (GESM)
INTRODUCTION
 Differences between GESM and prior studies:
1. Prior studies work on well-edited facts; GESM works on short, noisy text
2. GESM provides personalized service
3. A two-level framework is necessary: at the low level, a retrieval model finds all relevant tweets along the timeline of the event; at the high level, the relevant tweets and their latent structure are summarized to produce a storyline.
INTRODUCTION
 Challenges
1. The dynamic and sparse nature of microblogs: how to match the underlying event expressed by a vague event query to potentially relevant tweets that may not contain any query terms
2. Numerous duplicate tweets, and direct and indirect retweets
INTRODUCTION
 Contributions
1. The task of generating event storylines from microblogs
2. A dynamic pseudo relevance feedback (DPRF) language model
3. Storyline generation is cast as a graph-based optimization problem and solved by approximation algorithms for minimum-weight dominating set and directed Steiner tree
THE FRAMEWORK OVERVIEW
 The generated storyline should be a graph structure
 Each node is labeled by a summary
 Each edge represents a causal relationship between two phases
 Offline layer
 Online layers
THE RETRIEVAL MODEL
 Preliminaries
 The original query is usually short and vague
 Query expansion
 In pseudo relevance feedback, suppose the few top-ranked documents d+ retrieved by the initial query Q build a relevance model θF; the new query can then be set to a linear combination of the original query Q and the relevance model θF
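The linear combination can be sketched as follows. This is a minimal illustration, not the paper's implementation: the interpolation weight `lam`, the whitespace tokenizer, the maximum-likelihood unigram models, and the sample texts are all assumptions made for the example.

```python
from collections import Counter

def unigram_model(texts):
    """Maximum-likelihood unigram model over lowercase whitespace tokens."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def expand_query(query, feedback_docs, lam=0.5):
    """Return (1 - lam) * P(w | Q) + lam * P(w | theta_F) for every word w.

    lam is a hypothetical interpolation weight; the slides do not give one.
    """
    theta_q = unigram_model([query])
    theta_f = unigram_model(feedback_docs)
    vocab = set(theta_q) | set(theta_f)
    return {w: (1 - lam) * theta_q.get(w, 0.0) + lam * theta_f.get(w, 0.0)
            for w in vocab}

# Made-up query and top-ranked tweets, for illustration only.
expanded = expand_query(
    "egypt protest",
    ["protest in tahrir square", "tahrir square crowd grows"],
)
```

Because both inputs are probability distributions, the expanded query model also sums to one, and words such as "tahrir" that never appear in the original query receive non-zero weight.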
THE RETRIEVAL MODEL
 Dynamic Pseudo Relevance Feedback
 K burst periods
 Assume that the prior probability of a relevant document d+ depends on the distance from its time td+ to the centroids of the burst periods, denoted Φ = {φ1, ..., φK}
 Three probability functions model the effective range of a burst period, its decay coefficient, and its skewness:
1. Mixture Gaussian Distribution
2. Local Power Distribution
3. Skewed Linear Distribution
THE RETRIEVAL MODEL
 Mixture Gaussian Distribution

 Local Power Distribution

 Skewed Linear Distribution
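The formulas for these three priors did not survive in the slide text, so the following is only a generic sketch of the first idea: a time-based document prior built as an equal-weight mixture of Gaussians centred on the burst centroids. The equal weights, the shared width `sigma`, and the missing normalisation are assumptions, and the paper's actual definition may differ.

```python
import math

def gaussian_mixture_prior(t, centroids, sigma=1.0):
    """Unnormalised prior for a document posted at time t.

    centroids are the burst centroids phi_1..phi_K; sigma is a hypothetical
    shared width controlling how far a burst's influence extends.
    """
    total = sum(math.exp(-((t - phi) ** 2) / (2 * sigma ** 2))
                for phi in centroids)
    return total / len(centroids)

# A tweet posted at a burst centroid gets a much larger prior than one
# posted halfway between two distant bursts.
near = gaussian_mixture_prior(3, [3, 10])
far = gaussian_mixture_prior(6.5, [3, 10])
```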
THE RETRIEVAL MODEL
 Burst Period Detection
 Query terms in a burst period should:
1. appear more frequently than usual
2. be continuously frequent around the time point.
 Detect the burst periods of the event by
1. for each query term, finding the time intervals of arbitrary length in which the query term is consistently frequent;
2. picking the time points within these intervals with the largest sum of frequencies over all query terms.
THE RETRIEVAL MODEL
 “Bursty score”
 Find the time interval Tw,j = <st, et, LS, RS> with the maximal cumulative burst score B(w, Tw,j)
 Compute the score H(q, t) of each query term q at each time point t
 Rank the time points by ∑q∈Q H(q, t) and choose the K highest-ranked time points φk.
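The final ranking step can be sketched as below. The per-term scores H(q, t) are taken as given; the dictionary layout, the tie-breaking rule (earlier time point first), and the sample numbers are assumptions for illustration.

```python
def top_k_time_points(h, k):
    """h maps each query term q to {time_point: H(q, t)}.

    Returns the k time points with the largest sum of H(q, t) over all
    query terms, breaking ties in favour of the earlier time point.
    """
    totals = {}
    for scores in h.values():
        for t, s in scores.items():
            totals[t] = totals.get(t, 0.0) + s
    ranked = sorted(totals, key=lambda t: (-totals[t], t))
    return ranked[:k]

# Made-up scores for two query terms over three time points.
h = {
    "egypt": {1: 0.1, 2: 0.9, 3: 0.4},
    "protest": {1: 0.2, 2: 0.7, 3: 0.8},
}
centroids = top_k_time_points(h, 2)
```

With these numbers the summed scores are 0.3, 1.6 and 1.2, so time points 2 and 3 are selected as burst centroids.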
STORYLINE GENERATION
1. Representative tweets
2. Depict the evolving structure of the event
3. An optimal connection
 A multi-view tweet graph is constructed
 A minimum dominating set on the tweet graph
 A minimum Steiner tree
STORYLINE GENERATION

 Three non-negative real parameters α, τ1, τ2, with τ1 < τ2.
 Define the undirected edges E: text similarity > α
 Define the directed arcs A: τ1 ≤ tj − ti ≤ τ2
 Vertex weight: w(vi) = 1 − score(Q, vi).
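A small sketch of how such a graph could be assembled from the three definitions. The slides do not say which text similarity is used, so Jaccard similarity over word sets is an assumption here, as are the function names and the sample tweets.

```python
def jaccard(a, b):
    """Jaccard similarity of two texts' word sets (an assumed similarity)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def build_tweet_graph(tweets, scores, alpha, tau1, tau2):
    """tweets: list of (timestamp, text); scores: score(Q, v_i) per tweet.

    Returns undirected similarity edges E, directed time arcs A,
    and vertex weights w(v_i) = 1 - score(Q, v_i).
    """
    n = len(tweets)
    edges = {(i, j) for i in range(n) for j in range(i + 1, n)
             if jaccard(tweets[i][1], tweets[j][1]) > alpha}
    arcs = {(i, j) for i in range(n) for j in range(n)
            if i != j and tau1 <= tweets[j][0] - tweets[i][0] <= tau2}
    weights = [1.0 - s for s in scores]
    return edges, arcs, weights

# Made-up tweets and relevance scores.
tweets = [
    (0, "protest begins in square"),
    (1, "protest grows in square"),
    (5, "president resigns"),
]
edges, arcs, weights = build_tweet_graph(
    tweets, [0.9, 0.8, 0.7], alpha=0.5, tau1=1, tau2=4)
```

The first two tweets share three of five distinct words, so they are linked by a similarity edge; the arcs connect tweets whose time gap falls within [τ1, τ2].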
STORYLINE GENERATION
 A subset S of the vertex set of an undirected graph is a dominating set if each vertex u is either in S or adjacent to a vertex in S.
STORYLINE GENERATION
 greedy algorithm
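The slide names a greedy algorithm without giving its steps. One common greedy heuristic for minimum-weight dominating set, shown here as an assumed stand-in rather than the authors' exact procedure, repeatedly picks the vertex with the lowest weight per newly dominated vertex.

```python
def greedy_dominating_set(n, edges, weights):
    """Greedy heuristic for a low-weight dominating set.

    n: number of vertices (0..n-1); edges: iterable of undirected (u, v);
    weights: vertex weights. Returns the chosen vertices in pick order.
    """
    # Closed neighbourhoods: each vertex dominates itself and its neighbours.
    adj = {v: {v} for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    uncovered = set(range(n))
    chosen = []
    while uncovered:
        candidates = [v for v in range(n) if adj[v] & uncovered]
        # Lowest cost per newly covered vertex wins.
        best = min(candidates,
                   key=lambda v: weights[v] / len(adj[v] & uncovered))
        chosen.append(best)
        uncovered -= adj[best]
    return chosen

# Made-up graph: a small star {0, 1, 2} plus a separate edge {3, 4}.
example_edges = [(0, 1), (0, 2), (3, 4)]
picked = greedy_dominating_set(5, example_edges, [0.2, 0.5, 0.5, 0.3, 0.1])
```

In this example vertex 4 is picked first (cost 0.05 per covered vertex), then vertex 0, which covers the remaining three vertices.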
STORYLINE GENERATION
 A Steiner tree of a graph G with respect to a vertex subset S is the edge-induced subtree of G that contains all the vertices of S and has the minimum total cost, where the cost is the total weight of the vertices.
EXPERIMENTS
 Data Set
EXPERIMENTS
 Tweet Retrieval
 49 queries
 Evaluation metrics:
 precision at top 30 tweets (P@30)
 mean average precision (MAP)
 precision at top 100 tweets (P@100)
 R-precision (R-PREC)
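For reference, the standard definitions of these metrics can be written as below; MAP is the mean of the per-query average precision over all queries. The function names and the toy ranking are illustrative, not taken from the paper.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def average_precision(ranked, relevant):
    """Mean of precision values at the rank of each relevant item found."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant items."""
    r = len(relevant)
    return precision_at_k(ranked, relevant, r) if r else 0.0

# Toy ranking: items "a" and "c" are retrieved, relevant item "f" is missed.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}
p2 = precision_at_k(ranked, relevant, 2)
ap = average_precision(ranked, relevant)
rp = r_precision(ranked, relevant)
```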
EXPERIMENTS
 Comparative Study
EXPERIMENTS
 Parameter Tuning
EXPERIMENTS
 Summarization Capability
EXPERIMENTS
 Parameter Tuning
EXPERIMENTS
 A User Study
CONCLUSION
 The proposed dynamic pseudo relevance feedback model
 Minimum-weight Steiner tree on a dominating set
 Extensive experiments
OMG, I Have to Tweet That! A Study of Factors that Influence Tweet Rates
Abstract
 Key limitation:
 it depends on people self-reporting their own behaviors and observations.
 A large-scale quantitative analysis of some of the factors that influence self-reporting bias.
 The daily variations in tweet rates about weather events
Introduction
 Treating social media as a signal to measure the relative real-world occurrence of events
 Critical challenge:
 the bias introduced by the self-reported nature of social media
 What is it about an event that makes it more or less “tweetable”?
 A first large-scale, quantitative analysis of some of the factors that influence self-reporting bias, comparing a year of tweets about weather events in cities across the United States and Canada to ground-truth knowledge about actual weather occurrences.
Introduction
 Three potential factors:
1. How extreme is the weather?
2. How expected is the weather given the time of year?
3. How much did the weather change?
Data Preparation
 Tweets posted between Jun 1, 2010 and Jun 30, 2011
 56 different metropolitan areas
 Historical weather data provided by the National Oceanic and Atmospheric Administration of the United States.
Identifying Weather-related Tweets
 Discovering the rate of weather-related tweets that occurred per day across metropolitan areas
1. Filter the full archive of tweets for tweets that contain at least one weather-related word from a list of 179 weather-related words and phrases
2. Build a classifier for weather-related tweets
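The first step, keyword filtering, might look like the sketch below. The four terms stand in for the 179-entry list, which is not reproduced in the slides, and the token-boundary matching rule is an assumption.

```python
import re

# Hypothetical sample; the actual list has 179 words and phrases.
WEATHER_TERMS = ["rain", "snow", "thunderstorm", "heat wave"]

def mentions_weather(tweet, terms=WEATHER_TERMS):
    """True if the tweet contains at least one term as a whole word or phrase."""
    tokens = re.findall(r"[a-z0-9']+", tweet.lower())
    padded = " " + " ".join(tokens) + " "
    return any(" " + term + " " in padded for term in terms)

candidates = [t for t in ["Heat wave again today!", "training day", "Snow!"]
              if mentions_weather(t)]
```

Matching on token boundaries keeps "training" from triggering on "rain", while multi-word phrases such as "heat wave" still match.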
 a simple classifier that estimates the probability of a tweet being weather related as
Identifying the Location of Tweets
 Geo-coded
 The textual user-provided location field in a user’s Twitter profile
 Normalize the textual, arbitrary user-provided location information into concrete geo-coded coordinates
1. Build a mapping from user-provided location fields to latitude-longitude coordinates.
2. Merge location fields with similar geo-mappings together to create clusters for roughly metropolitan-sized areas
Historical Weather Data
 Calculate daily summaries
 For each daily summary of weather data at a location:
 Expectation: how normal the observed weather is at a location
 Extremeness: how extreme the weather is on a particular day
 Change: how different the observed weather data is from previous days’ weather
Analysis and Results
 Tweet Rates and Weather Reports
Analysis and Results
 Linear Regression
 the relationship between a set of weather-derived features and the daily rate of weather-related tweets
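As a simplified illustration of this kind of analysis, the sketch below fits an ordinary least-squares line for a single feature and reports R², the share of variance in the tweet rate that the feature explains. The study uses a set of features; the single-feature restriction, the function name, and the numbers are assumptions for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Made-up data: a weather feature against a daily weather-tweet rate.
feature = [0, 1, 2, 3]
tweet_rate = [1, 3, 5, 7]
slope, intercept, r2 = fit_line(feature, tweet_rate)
```

Comparing R² across models built from different feature sets is one way to express statements such as "extremeness explains more of the variation than basic weather alone".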
Analysis and Results
 Correlating Basic Weather Data and Tweet Rates
Analysis and Results
 Correlating Expectation and Tweet Rates
 the expectation measure adds little information about likely tweet rates beyond what is already contained in basic weather data
 Correlating Extremeness and Tweet Rates
 extremeness can independently explain more of the variation in weather-related tweet rates than basic weather alone
 Correlating Delta Change and Tweet Rates
 there is little difference in the amount of information gained from building these delta-change models
 Combining Extremeness, Expectation, and Delta Change Models
Analysis and Results
 Per-Location Models
Discussion
 Additional Factors Likely to Affect Tweet Rates
 Sentiment
 Privacy concerns, embarrassment, and safety
 Population segments
 Mobile devices
 Time of day, day of week, holidays, and other effects of time
Conclusions
 The correlation between daily tweet rates and the expectation, extremeness, and change in observed weather.
 Global models
 Location-specific models
 Extremeness > change > expectation
