Generating event storylines from microblogs

Generating Event Storylines from Microblogs
CIKM’12

ABSTRACT
 We explore the problem of generating storylines from microblogs for user input queries.
 Given a query about an ongoing event, we propose to sketch the real-time storyline of the event with a two-level solution:
1. Propose a language model with dynamic pseudo relevance feedback to obtain relevant tweets.
2. Generate storylines via graph optimization.

INTRODUCTION
 Generating Event Storyline from Microblogs (GESM)

INTRODUCTION
 Differences between GESM and prior studies:
1. Prior studies work on well-edited facts; GESM works on short, noisy text.
2. GESM provides a personalized service.
3. A two-level framework is necessary: at the low level, a retrieval model finds all relevant tweets along the timeline of the event; at the high level, the relevant tweets and their latent structure are summarized to produce a storyline.

INTRODUCTION
 Challenges
1. The dynamic and sparse nature of microblogs: how to match the underlying event expressed by a vague event query to potentially relevant tweets that may not contain any query terms.
2. Numerous duplicate tweets, as well as direct and indirect retweets.

INTRODUCTION
 Contributions
1. The problem of generating event storylines from microblogs.
2. A dynamic pseudo relevance feedback (DPRF) language model.
3. Storyline generation posed as a graph-based optimization problem, solved by approximation algorithms for the minimum-weight dominating set and the directed Steiner tree.

THE FRAMEWORK OVERVIEW
 The generated storyline should be a graph structure
 Each node is labeled by a summary
 Each edge represents a causal relationship between two phases
 Offline layer
 Online layers

THE RETRIEVAL MODEL
 Preliminaries
 The original query is usually short and vague
 Query expansion
 In a pseudo relevance feedback manner, suppose the few top-ranked documents d+ retrieved by the initial query Q build a relevance model θF; we can then set the new query to be a linear combination of the original query Q and the relevance model θF
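
The linear combination described above can be sketched roughly as follows. This is only an illustration: the maximum-likelihood unigram estimator, the whitespace tokenizer, and the mixing weight `lam` are assumptions, since the slides do not say how the models are estimated or weighted.

```python
from collections import Counter


def unigram_model(texts):
    """Maximum-likelihood unigram model over whitespace tokens (assumed estimator)."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def expand_query(query, feedback_docs, lam=0.5):
    """Interpolate the query model with the feedback (relevance) model.

    lam is a hypothetical mixing weight: 0 keeps the original query,
    1 uses only the feedback model.
    """
    theta_q = unigram_model([query])
    theta_f = unigram_model(feedback_docs)
    vocab = set(theta_q) | set(theta_f)
    return {
        w: (1 - lam) * theta_q.get(w, 0.0) + lam * theta_f.get(w, 0.0)
        for w in vocab
    }


# Toy example: the feedback tweets introduce "tahrir" and "square".
expanded = expand_query(
    "egypt protest",
    ["protest in tahrir square", "tahrir square crowds grow"],
    lam=0.4,
)
```

Terms that never appear in the query, such as "tahrir", receive non-zero weight in the expanded model, which is how tweets without any query term can still be matched.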

THE RETRIEVAL MODEL
 Dynamic Pseudo Relevance Feedback
 K burst periods
 Assume that the prior probability of a relevant document d+ depends on the distance of its posting time td+ to the centroids of the burst periods, denoted Φ = {φ1, ···, φK}
 Three probability functions model the effective range of a burst period, the decay coefficient, and the skewness:
1. Mixture Gaussian Distribution
2. Local Power Distribution
3. Skewed Linear Distribution
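
To make the idea of a time-dependent prior concrete, here is a minimal sketch of the first option only, a Gaussian mixture centered on the burst centroids. Equal mixture weights and a single shared `sigma` are assumptions made for illustration; the slides do not give the exact parameterization, and the other two distributions are not attempted here.

```python
import math


def gaussian_mixture_prior(t, centroids, sigma=1.0):
    """Temporal prior for a document posted at time t.

    Assumes equal mixture weights and one shared standard deviation
    (illustrative choices, not taken from the slides).
    """
    if not centroids:
        raise ValueError("at least one burst centroid is required")
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    density = sum(
        norm * math.exp(-((t - c) ** 2) / (2.0 * sigma ** 2)) for c in centroids
    )
    return density / len(centroids)


# Hypothetical burst centroids at hours 10 and 30.
near = gaussian_mixture_prior(10.5, [10.0, 30.0], sigma=2.0)
far = gaussian_mixture_prior(20.0, [10.0, 30.0], sigma=2.0)
```

A document posted close to a burst centroid gets a larger prior than one posted between bursts, so feedback documents are drawn preferentially from burst periods.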

THE RETRIEVAL MODEL
 Mixture Gaussian Distribution

 Local Power Distribution

 Skewed Linear Distribution

THE RETRIEVAL MODEL
 Burst Period Detection
 Around a burst period, the query terms should:
1. appear more frequently than usual
2. be continuously frequent around the time point.
 Detect burst periods of the event by
1. for each query term, finding the time intervals of arbitrary length in which the query term appears constantly frequent;
2. picking the time points within these intervals with the largest sum of frequencies over all query terms.

THE RETRIEVAL MODEL
 “Bursty score”
 Find the time interval Tw,j = <st, et, LS, RS> with the maximal cumulative burst score B(w, Tw,j)
 Compute the score of each query term q at each time point
 Rank each time point t by ∑q∈Q H(q, t) and choose the K highest-ranked time points φk.
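
The two steps above (finding the interval with maximal cumulative score, then ranking time points by the summed per-term score) can be sketched as below. The slide's definitions of B and H are not preserved here, so the per-time-point scores are taken as given inputs; using "count minus mean count" as a burst score in the example is purely an assumption.

```python
def max_burst_interval(scores):
    """Return (start, end, total) of the contiguous interval with maximal sum.

    Uses Kadane's maximum-subarray scan over per-time-point burst scores.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    best_sum = cur_sum = scores[0]
    best_start = best_end = cur_start = 0
    for i in range(1, len(scores)):
        if cur_sum < 0:
            # A negative running total can only hurt: restart the interval here.
            cur_sum, cur_start = scores[i], i
        else:
            cur_sum += scores[i]
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, i
    return best_start, best_end, best_sum


def top_k_time_points(term_scores, k):
    """Rank time points by the sum of per-term scores and keep the top k.

    term_scores maps each query term to a list of scores, one per time point.
    """
    n = len(next(iter(term_scores.values())))
    totals = [sum(series[t] for series in term_scores.values()) for t in range(n)]
    return sorted(range(n), key=lambda t: totals[t], reverse=True)[:k]


# Assumed burst score: hourly count minus the mean count.
counts = [1, 1, 8, 9, 1]
mean_count = sum(counts) / len(counts)
interval = max_burst_interval([c - mean_count for c in counts])
peaks = top_k_time_points({"quake": [0, 3, 1], "tsunami": [1, 2, 5]}, k=2)
```

Subtracting a baseline before the scan matters: with raw non-negative counts the maximal-sum interval would always be the whole timeline.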

STORYLINE GENERATION
1. Representative tweets
2. Depict the evolving structure of the event
3. An optimistic connection
 A multi-view tweet graph is constructed
 A minimum dominating set on the tweet graph
 A minimum Steiner tree


 Three non-negative real parameters α, τ1, τ2, with τ1 < τ2.
 Define E (undirected edges): text similarity > α
 Define A (directed arcs): τ1 ≤ tj − ti ≤ τ2
 Vertex weight: w(vi) = 1 − score(Q, vi).
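
A rough sketch of how such a multi-view graph might be built from the three definitions above. Jaccard overlap of whitespace tokens stands in for the text similarity, and the `(text, timestamp, score)` tuple layout is hypothetical; neither is specified on the slide.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity (an assumed stand-in for text similarity)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def build_tweet_graph(tweets, alpha, tau1, tau2):
    """Build vertex weights, undirected edges E, and directed arcs A.

    tweets is a list of (text, timestamp, relevance_score) tuples, with
    relevance_score playing the role of score(Q, v).
    """
    weights = [1.0 - score for _, _, score in tweets]
    edges, arcs = set(), set()
    for i, (text_i, t_i, _) in enumerate(tweets):
        for j, (text_j, t_j, _) in enumerate(tweets):
            if i == j:
                continue
            # E: similar content, stored once per unordered pair.
            if i < j and jaccard(text_i, text_j) > alpha:
                edges.add((i, j))
            # A: tweet j follows tweet i within the allowed time window.
            if tau1 <= t_j - t_i <= tau2:
                arcs.add((i, j))
    return weights, edges, arcs


tweets = [
    ("quake hits city", 0, 0.9),
    ("quake hits the city", 1, 0.8),
    ("rescue teams arrive", 5, 0.7),
]
weights, edges, arcs = build_tweet_graph(tweets, alpha=0.5, tau1=2, tau2=6)
```

The all-pairs loop is quadratic in the number of tweets, which is acceptable for a sketch but would need indexing or blocking on a real tweet stream.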

 A subset S of the vertex set of an undirected graph is a dominating set if each vertex u is either in S or adjacent to a vertex in S.
 Greedy algorithm
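
One common greedy heuristic for the minimum-weight dominating set is sketched below: repeatedly pick the vertex with the lowest weight per newly dominated vertex. The slide only says "greedy algorithm", so this particular selection rule is an assumption rather than the paper's exact procedure.

```python
def greedy_dominating_set(weights, edges):
    """Greedy weighted dominating set on an undirected graph.

    weights[v] is the non-negative weight of vertex v; edges is an iterable
    of (u, v) pairs. Returns the chosen vertices in selection order.
    """
    n = len(weights)
    # Closed neighborhoods: a vertex dominates itself and its neighbors.
    closed = {v: {v} for v in range(n)}
    for u, v in edges:
        closed[u].add(v)
        closed[v].add(u)

    undominated = set(range(n))
    chosen = []
    while undominated:
        def cost(v):
            gain = len(closed[v] & undominated)
            return weights[v] / gain if gain else float("inf")

        best = min(range(n), key=cost)
        chosen.append(best)
        undominated -= closed[best]
    return chosen


# Star graph: the center dominates everything on its own.
star = greedy_dominating_set([1.0, 1.0, 1.0, 1.0], [(0, 1), (0, 2), (0, 3)])
# Path 0-1-2 with an expensive middle vertex.
path = greedy_dominating_set([1.0, 10.0, 1.0], [(0, 1), (1, 2)])
```

In the storyline setting the chosen vertices would be the representative tweets: low weight means high relevance to the query, and domination means every other tweet is similar to at least one representative.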

 A Steiner tree of a graph G with respect to a vertex subset S is the edge-induced subtree of G that contains all the vertices of S and has the minimum total cost, where the cost is the total weight of the vertices.
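
Finding a minimum Steiner tree is NP-hard, so approximations are used in practice. The sketch below is a deliberately simple shortest-path heuristic, not the approximation algorithm from the paper: it runs Dijkstra from an assumed root over node-weighted directed arcs and takes the union of the cheapest paths to each terminal. The `root` argument and the non-negative weights are assumptions.

```python
import heapq


def steiner_tree_heuristic(weights, arcs, root, terminals):
    """Union of node-weighted shortest paths from root to every terminal.

    Returns (tree_arcs, total_node_weight). The result is a valid tree that
    spans the terminals, but it is not guaranteed to be minimal.
    """
    successors = {}
    for u, v in arcs:
        successors.setdefault(u, []).append(v)

    # Path cost = sum of the weights of all vertices on the path.
    dist = {root: weights[root]}
    parent = {}
    heap = [(weights[root], root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v in successors.get(u, []):
            nd = d + weights[v]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))

    tree_arcs = set()
    for t in terminals:
        if t not in dist:
            raise ValueError(f"terminal {t} is unreachable from root {root}")
        while t != root:
            tree_arcs.add((parent[t], t))
            t = parent[t]

    nodes = {root} | {v for arc in tree_arcs for v in arc}
    return tree_arcs, sum(weights[v] for v in nodes)


tree, cost = steiner_tree_heuristic(
    weights=[0.1, 0.5, 0.2, 0.3],
    arcs=[(0, 1), (0, 2), (1, 3), (2, 3)],
    root=0,
    terminals=[3],
)
```

Because every path follows the same parent pointers, the union of paths is always a tree rooted at `root`; shared prefixes are counted once in the total cost.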

EXPERIMENTS
 Tweet Retrieval
 49 queries
 Evaluation metrics:
 precision at top 30 tweets (P@30)
 mean average precision (MAP)
 precision at top 100 tweets (P@100)
 R-precision (R-PREC)
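
For reference, the four metrics listed above follow their standard information-retrieval definitions and can be computed as in this short sketch. The function names and the list/set input format are illustrative choices.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant items."""
    r = len(relevant)
    return precision_at_k(ranked, relevant, r) if r else 0.0


def mean_average_precision(runs):
    """Average of average_precision over (ranked, relevant) pairs, one per query."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
p_at_2 = precision_at_k(ranked, relevant, 2)
ap = average_precision(ranked, relevant)
r_prec = r_precision(ranked, relevant)
```

P@30 and P@100 are `precision_at_k` with k set to 30 and 100; MAP averages `average_precision` over all 49 queries.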

EXPERIMENTS
 Comparative Study

EXPERIMENTS
 Parameter Tuning

EXPERIMENTS
 Summarization Capability

CONCLUSION
 The proposed dynamic pseudo relevance feedback model
 Minimum-weight Steiner tree on a dominating set
 Extensive experiments

OMG, I Have to Tweet That!
A Study of Factors that Influence Tweet Rates

Abstract
 Key limitation:
 it depends on people self-reporting their own behaviors and observations.
 A large-scale quantitative analysis of some of the factors that influence self-reporting bias.
 The daily variations in tweet rates about weather events

Introduction
 Treating social media as a signal to measure the relative real-world occurrence of events
 Critical challenge:
 the bias introduced by the self-reported nature of social media
 What is it about an event that makes it more or less “tweetable”?
 A first large-scale, quantitative analysis of some of the factors that influence self-reporting bias, comparing a year of tweets about weather events in cities across the United States and Canada to ground-truth knowledge about actual weather occurrences.

Introduction
 Three potential factors:
1. How extreme is the weather?
2. How expected is the weather given the time of year?
3. How much did the weather change?

Data Preparation
 Tweets between Jun 1, 2010 and Jun 30, 2011
 56 different metropolitan areas
 Historical weather data provided by the National Oceanic and Atmospheric Administration of the United States.

Identifying Weather-related Tweets
 Discovering the rate of weather-related tweets that occurred per day across metropolitan areas
1. Filter the full archive of tweets for tweets that contain at least one weather-related word from a list of 179 weather-related words and phrases
2. Build a classifier for weather-related tweets
 A simple classifier that estimates the probability of a tweet being weather-related

Identifying the Location of Tweets
 Geo-coded
 The textual user-provided location field in a user’s Twitter profile
 Normalize the textual, arbitrary user-provided location information into concrete geo-coded coordinates:
1. Build a mapping from user-provided location fields to latitude-longitude coordinates.
2. Merge location fields with similar geo-mappings together to create clusters for roughly metropolitan-sized areas.


Historical Weather Data
 Calculate daily summaries
 For each daily summary of weather data at a location:
 Expectation: how normal the observed weather is at a location
 Extremeness: how extreme the weather is on a particular day
 Change: how different the observed weather data is from previous days’ weather
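
The slides describe the three measures only qualitatively. The sketch below shows one plausible way to turn them into numbers for a single weather variable such as daily maximum temperature. Every formula here is an assumption: z-scores against same-season history for expectation, z-scores against all history for extremeness, and a one-day difference for change. The study's actual definitions may differ.

```python
from statistics import mean, pstdev


def z_score(value, sample):
    """Standard score of value relative to sample (0.0 if the sample is constant)."""
    spread = pstdev(sample)
    return (value - mean(sample)) / spread if spread else 0.0


def weather_features(today, previous_day, same_season_history, all_history):
    """Hypothetical feature definitions for one weather variable.

    expectation: deviation from what is normal for this time of year
                 (larger means less expected)
    extremeness: deviation from the location's overall climate
    change:      difference from the previous day's observation
    """
    return {
        "expectation": abs(z_score(today, same_season_history)),
        "extremeness": abs(z_score(today, all_history)),
        "change": today - previous_day,
    }


features = weather_features(
    today=30,
    previous_day=22,
    same_season_history=[20, 22, 24],
    all_history=[0, 10, 20, 30],
)
```

In this toy case the day is far outside the seasonal norm but only moderately extreme for the location overall, which is the kind of distinction the three measures are meant to separate.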

Analysis and Results
 Tweet Rates and Weather Reports
 Linear Regression
 the relationship between a set of weather-derived features and the daily rate of weather-related tweets
 Correlating Basic Weather Data and Tweet Rates
 Correlating Expectation and Tweet Rates
 the expectation measure adds little information about likely tweet rates beyond what is already contained in basic weather data
 Correlating Extremeness and Tweet Rates
 extremeness can independently explain more of the variation in weather-related tweet rates than basic weather alone
 Correlating Delta Change and Tweet Rates
 there is little difference in the amount of information gained from building these delta-change models
 Combining Extremeness, Expectation, and Delta Change Models
 Per-Location Models
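
Statements such as "explains more of the variation" refer to how well a linear regression fits, usually summarized by R². The sketch below fits an ordinary least-squares line for a single feature and computes R². The study regresses on a set of features, so the one-feature version and the toy numbers are simplifications for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot if ss_tot else 0.0


# Toy data: a weather feature per day and the matching daily tweet rate.
feature = [0.0, 1.0, 2.0, 3.0]
tweet_rate = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(feature, tweet_rate)
fit_quality = r_squared(feature, tweet_rate, slope, intercept)
```

Comparing R² between a model built on basic weather data and one built on, say, extremeness features is one way to quantify which factor carries more information about tweet rates.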


Discussion
 Additional Factors Likely to Affect Tweet Rates
 Sentiment
 Privacy concerns, embarrassment, and safety
 Population segments
 Mobile devices
 Time of day, day of week, holidays, and other effects of time

Conclusions
 The correlation between daily tweet rates and the expectation, extremeness, and change in observed weather.
 Global models
 Location-specific models
 Extremeness > change > expectation
