Event Summarization using Tweets
Deepayan Chakrabarti and Kunal Punera
Yahoo! Research
Abstract
• For some highly structured and recurring events, such as sports, it is better to use more sophisticated techniques to summarize the relevant tweets.
• A solution based on learning the underlying hidden state representation of the event via Hidden Markov Models.
Introduction
• One-shot events
• Events that have “structure” or are long-running
• (a) The most recent tweets could be repeating the same information about the event
• (b) Most users would be interested in a summary of the occurrences in the game so far.
Introduction
• Our goal: to extract a few tweets that best describe the chain of interesting occurrences in that event
• A two-step process:
1. Segment the event time-line
2. Pick key tweets to describe each segment
Introduction
• Challenges:
• Events are typically “bursty”
• Separate sub-events may not be temporally far apart
• Previous instances of similar events are available
• Tweets are noisy
• Strong empirical results
Characteristics of Sports Coverage in Tweets
• Some issues of this data:
1. Sub-events are marked by an increased frequency of tweets.
2. Boundaries of sub-events also result in a change in the vocabulary of tweets.
Algorithms
• Baseline: SUMMALLTEXT
• Associate with each tweet a vector of the TF-logIDF of its constituent words
• Cosine distance
• Select those tweets which are closest to all other tweets from the event.
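The baseline above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes whitespace tokenization, an IDF of log(N / df), and ranks tweets by their total cosine similarity to all other tweets. The function names are hypothetical.

```python
import math
from collections import Counter


def tf_logidf_vectors(tweets):
    """Map each tweet to a sparse {word: tf * log(N / df)} vector."""
    docs = [Counter(tweet.lower().split()) for tweet in tweets]
    n = len(docs)
    # Document frequency: number of tweets that contain each word.
    df = Counter(word for doc in docs for word in doc)
    return [{w: tf * math.log(n / df[w]) for w, tf in doc.items()} for doc in docs]


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def summ_all_text(tweets, k):
    """Return the k tweets with the highest total similarity to all others."""
    vectors = tf_logidf_vectors(tweets)
    scores = []
    for i, u in enumerate(vectors):
        # Every tweet is compared with every other tweet: quadratic cost.
        scores.append(sum(cosine_similarity(u, v)
                          for j, v in enumerate(vectors) if j != i))
    ranked = sorted(range(len(tweets)), key=lambda i: scores[i], reverse=True)
    return [tweets[i] for i in ranked[:k]]
```

The nested comparison makes the quadratic cost of this baseline visible, and a large cluster of near-identical tweets will dominate the ranking.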
Algorithms
• Several defects:
1. O(|Z|^2) computations
2. Heavily biased towards the most popular sub-event
Algorithms
• Baseline: SUMMTIMEINT
1. Split up the duration into equal-sized time intervals
2. Select the key tweets from each interval
• Two extra parameters:
1. A segmentation TS of the duration of the event into equal-time windows
2. The minimum activity threshold l
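A rough sketch of this baseline follows. It assumes tweets arrive as (timestamp, text) pairs and interprets the activity threshold as a minimum tweet count per window, which is one possible reading of the slide. The selector `pick_key_tweets` is a hypothetical plug-in point; any ranking (for instance a centrality ranking over the window) could be supplied.

```python
def summ_time_int(timed_tweets, num_windows, min_tweets, pick_key_tweets=None):
    """Split the event into equal-width windows and summarize each active one.

    timed_tweets: list of (timestamp, text) pairs.
    min_tweets:   windows with fewer tweets than this are skipped (assumed
                  interpretation of the minimum activity threshold).
    """
    if pick_key_tweets is None:
        # Placeholder selector: take the first tweet of the window.
        pick_key_tweets = lambda texts: texts[:1]
    if not timed_tweets:
        return []
    start = min(t for t, _ in timed_tweets)
    end = max(t for t, _ in timed_tweets)
    # Fall back to width 1.0 when all tweets share one timestamp.
    width = (end - start) / num_windows or 1.0
    windows = [[] for _ in range(num_windows)]
    for t, text in timed_tweets:
        # The final timestamp lands in the last window, not past it.
        idx = min(int((t - start) / width), num_windows - 1)
        windows[idx].append(text)
    summary = []
    for texts in windows:
        if len(texts) >= min_tweets:
            summary.extend(pick_key_tweets(texts))
    return summary
```

Because the window boundaries are fixed in advance, a burst that straddles a boundary is split, and two sub-events inside one window are merged.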
Algorithms
• Defects:
• Burstiness of tweet volume
• Multiple sub-events in the same burst
• “Cold start”
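Algorithms
• Our Approach: SUMMHMM
• BACKGROUND ON HMMS:
• N states labeled S1, …, SN
• A set of observation symbols v1, …, vM
• bi(k): the probability of observing symbol vk in state Si
• aij: the probability of a transition from state Si to state Sj
• πi: the probability that the initial state is Si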
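To make the notation concrete, here is a small sketch of a standard HMM with one symbol per time step. It stores π as a list, a as an N×N matrix and b as an N×M matrix, and computes the probability of an observation sequence with the forward algorithm. The function name and the toy numbers in the note below are illustrative only.

```python
def sequence_likelihood(pi, a, b, observations):
    """Forward algorithm: probability of the observation sequence.

    pi[i]   : probability that the initial state is S_i
    a[i][j] : probability of moving from S_i to S_j
    b[i][k] : probability of emitting symbol v_k in S_i
    observations: list of symbol indices k
    """
    n = len(pi)
    # alpha[i] = P(observations so far, current state = S_i)
    alpha = [pi[i] * b[i][observations[0]] for i in range(n)]
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[i] * a[i][j] for i in range(n)) * b[j][symbol]
            for j in range(n)
        ]
    return sum(alpha)
```

With pi = [0.6, 0.4], a = [[0.7, 0.3], [0.4, 0.6]] and b = [[0.9, 0.1], [0.2, 0.8]], the sequence [0, 1] has probability 0.209.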
Algorithms
• Each state: one class of sub-events
• The symbols: the words used in tweets
• The variation in symbol probabilities across different states: the different “language models” used by the Twitter users
• The transitions between states model the chain of sub-events over time
Algorithms
• Our Modifications
• OUTPUTS PER TIME-STEP: a multiset of symbols
• DETECTING BURSTS IN TWEET VOLUME
• COMBINING INFORMATION FROM MULTIPLE EVENTS
Algorithms
• Three sets of symbol probabilities:
• (1) θ(s), which is specific to each state but is the same for all events,
• (2) θ(sg), which is specific to a particular state for a particular game,
• (3) θ(bg), which is a background distribution of symbols over all states and games.
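The slides do not say how the three sets are combined. The sketch below assumes a fixed convex mixture with hypothetical weights, and scores the multiset of words seen in one time step by summing log-probabilities. Both function names and the default weights are illustrative assumptions.

```python
import math
from collections import Counter


def mixed_symbol_probability(word, theta_s, theta_sg, theta_bg,
                             weights=(0.5, 0.3, 0.2)):
    """Probability of a word under an assumed mixture of the three sets.

    theta_s  : state-specific distribution shared by all events
    theta_sg : distribution for this state in this particular game
    theta_bg : background distribution over all states and games
    weights  : hypothetical mixing weights, expected to sum to 1
    """
    w_s, w_sg, w_bg = weights
    return (w_s * theta_s.get(word, 0.0)
            + w_sg * theta_sg.get(word, 0.0)
            + w_bg * theta_bg.get(word, 0.0))


def log_emission(words, theta_s, theta_sg, theta_bg, weights=(0.5, 0.3, 0.2)):
    """Log-probability of the multiset of words observed in one time step."""
    total = 0.0
    for word, count in Counter(words).items():
        p = mixed_symbol_probability(word, theta_s, theta_sg, theta_bg, weights)
        if p == 0.0:
            return float("-inf")
        total += count * math.log(p)
    return total
```

The background component keeps common chatter from being attributed to any one state, while the game-specific component can absorb words such as player names that only matter in a single game.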
Algorithms
• Algorithm Summary
• Input: multiple events of the same type
• Learn the model parameters that best fit the data (EM algorithm)
• Find the optimal segmentation (standard Viterbi algorithm)
Algorithms
• Standard Viterbi algorithm
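For reference, a compact sketch of the textbook Viterbi algorithm is given here. It assumes a plain HMM with one symbol index per time step (the modified model in these slides emits a multiset of words per step instead), works in log space, and adds a hypothetical helper that turns the decoded state path into time segments.

```python
import math


def viterbi(pi, a, b, observations):
    """Most likely state sequence for a standard HMM (log-space Viterbi)."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n = len(pi)
    score = [log(pi[i]) + log(b[i][observations[0]]) for i in range(n)]
    back = []  # back[t][j]: best predecessor of state j at step t + 1
    for symbol in observations[1:]:
        prev, score, pointers = score, [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] + log(a[i][j]))
            pointers.append(best_i)
            score.append(prev[best_i] + log(a[best_i][j]) + log(b[j][symbol]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def segments(path):
    """Collapse a state path into (state, first_step, last_step) runs."""
    runs = []
    for t, state in enumerate(path):
        if runs and runs[-1][0] == state:
            runs[-1] = (state, runs[-1][1], t)
        else:
            runs.append((state, t, t))
    return runs
```

With pi = [0.6, 0.4], a = [[0.7, 0.3], [0.4, 0.6]], b = [[0.9, 0.1], [0.2, 0.8]] and observations [0, 0, 1, 1], the decoded path is [0, 0, 1, 1], which yields two segments.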
Experiments
• Experimental Setup
• Professional American Football
• Sep 12th, 2010 to Jan 24th, 2011
• Over 440K tweets over 150 games for an average of around 1760 tweets per game.
Experiments
• MANUAL GROUND TRUTH CONSTRUCTION
• Each output tweet was matched with the happenings in the game and labeled as Comment-Play, Comment-Game, or Comment-General.
Experiments
• Play-by-Play Performance
• RECALL
• PRECISION
• Summary Construction
• EVALUATION AT OPERATING POINT
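The slides name recall and precision without defining them. One plausible reading, assumed here, is that precision is the fraction of output tweets labeled Comment-Play and recall is the fraction of plays in the game that at least one output tweet describes. The function and the play identifiers are hypothetical.

```python
def play_precision_recall(labeled_output, all_plays):
    """Precision and recall of a summary against the plays of a game.

    labeled_output: list of (label, play_id) pairs for the output tweets;
                    play_id is None unless the label is "Comment-Play".
    all_plays:      identifiers of every play in the game.
    """
    if not labeled_output or not all_plays:
        return 0.0, 0.0
    covered = [pid for label, pid in labeled_output if label == "Comment-Play"]
    precision = len(covered) / len(labeled_output)
    # Several tweets about the same play count once for recall.
    recall = len(set(covered) & set(all_plays)) / len(set(all_plays))
    return precision, recall
```

Under this reading, a summary that repeats one play many times can score high precision but low recall.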
Conclusion
• We proposed an approach based on learning an underlying hidden state representation of an event.
Towards Twitter Context Summarization with User Influence Models
ABSTRACT
• Traditional summarization techniques only consider text information.
• We study how user influence models, which project user interaction information onto a Twitter context tree, can help Twitter context summarization within a supervised learning framework.
INTRODUCTION
• A Twitter context tree is defined as a tree structure of tweets which are connected by reply relationships; the root of a context tree is its original tweet.
• Two types of user influence models: the pair-wise user influence model and the global user influence model.
• Granger Causality influence model
• PageRank algorithm
TWITTER CONTEXT TREE ANALYSIS
• The temporal growth of the Twitter context tree
TWITTER CONTEXT TREE ANALYSIS
• Whether the tree structure can help the summarization task
USER INFLUENCE MODELS
• Granger Causality Influence Model
• A time series x is said to Granger-cause another time series y if and only if regressing for y in terms of past values of both y and x is statistically significantly more accurate than regressing for y in terms of past values of y only.
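The comparison of the two regressions can be sketched with ordinary least squares. The code below fits a restricted model (past values of y only) and a full model (past values of y and x) and returns the usual F statistic for the extra terms. It is an illustration under simple assumptions: the regressors must not be collinear, the full model must not fit perfectly, and turning the statistic into a significance decision (via the F distribution) is left out. Function names are hypothetical.

```python
def least_squares_rss(rows, targets):
    """Residual sum of squares of an OLS fit solved via the normal equations."""
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    coef = [m[i][k] / m[i][i] for i in range(k)]
    return sum((t - sum(c * v for c, v in zip(coef, r))) ** 2
               for r, t in zip(rows, targets))


def granger_f_statistic(x, y, lag=1):
    """F statistic for 'x Granger-causes y' using the given number of lags."""
    restricted, full, targets = [], [], []
    for t in range(lag, len(y)):
        past_y = [y[t - i] for i in range(1, lag + 1)]
        past_x = [x[t - i] for i in range(1, lag + 1)]
        restricted.append([1.0] + past_y)        # intercept + past y
        full.append([1.0] + past_y + past_x)     # intercept + past y + past x
        targets.append(y[t])
    rss_restricted = least_squares_rss(restricted, targets)
    rss_full = least_squares_rss(full, targets)
    dof = len(targets) - (2 * lag + 1)
    return ((rss_restricted - rss_full) / lag) / (rss_full / dof)
```

A large statistic means that adding the past of x reduced the prediction error for y far more than chance would explain; swapping the arguments tests the opposite direction.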
USER INFLUENCE MODELS
• Lasso-Granger method
• Lag(X, T) denotes the lagged version of data X;
• FullyConnectedFeatureGraph(X) denotes the fully connected graph defined over the features;
• Lasso(y, Xlag) denotes the set of temporal variables receiving a non-zero coefficient by the Lasso algorithm.
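The idea behind Lasso(y, Xlag) can be sketched as follows: regress the target series on lagged copies of every series with an L1 penalty, then report which other series keep a non-zero coefficient. This is a minimal illustration using plain coordinate descent on the objective 0.5·||y − Xw||² + penalty·||w||₁, with centered data and no intercept. The function names, the dictionary input format, and the penalty value are assumptions, not taken from the slides.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso(rows, targets, penalty, sweeps=100):
    """Coordinate descent for 0.5 * ||y - Xw||^2 + penalty * ||w||_1."""
    columns = [list(col) for col in zip(*rows)]
    weights = [0.0] * len(columns)
    residual = list(targets)  # y - Xw for w = 0
    for _ in range(sweeps):
        for j, column in enumerate(columns):
            norm = sum(v * v for v in column)
            if norm == 0.0:
                continue
            # Correlation of column j with the residual that ignores w_j.
            rho = sum(v * (res + weights[j] * v)
                      for v, res in zip(column, residual))
            new_w = soft_threshold(rho, penalty) / norm
            residual = [res + (weights[j] - new_w) * v
                        for v, res in zip(column, residual)]
            weights[j] = new_w
    return weights


def lasso_granger(series, target, lag, penalty):
    """Names of series whose lagged values get a non-zero Lasso coefficient."""
    names = sorted(series)
    length = len(series[target])
    rows = [[series[name][t - i] for name in names for i in range(1, lag + 1)]
            for t in range(lag, length)]
    targets = [series[target][t] for t in range(lag, length)]
    # Center columns and targets so that no intercept is needed.
    means = [sum(col) / len(rows) for col in zip(*rows)]
    rows = [[v - m for v, m in zip(row, means)] for row in rows]
    mean_y = sum(targets) / len(targets)
    targets = [v - mean_y for v in targets]
    weights = lasso(rows, targets, penalty)
    return {names[idx // lag] for idx, w in enumerate(weights)
            if w != 0.0 and names[idx // lag] != target}
```

The penalty controls sparsity: larger values drop more series, so in practice it would be tuned rather than fixed.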
USER INFLUENCE MODELS
• PageRank Influence Model
• Each user u has a directed edge to each user v if u has replied to or retweeted one of v’s tweets; together these edges form a global user graph G.
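A small power-iteration sketch shows how PageRank scores could be computed on such a graph. It assumes the graph is given as a list of (u, v) pairs meaning that u replied to or retweeted v, uses the conventional damping factor of 0.85, and spreads the rank of users with no outgoing edges evenly over all users. These choices are illustrative, not taken from the slides.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """PageRank by power iteration over a list of directed (u, v) edges."""
    nodes = sorted({user for edge in edges for user in edge})
    out_links = {user: [] for user in nodes}
    for u, v in edges:
        out_links[u].append(v)
    n = len(nodes)
    rank = {user: 1.0 / n for user in nodes}
    for _ in range(iterations):
        # Rank held by users without outgoing edges is shared by everyone.
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        new_rank = {user: (1.0 - damping) / n + damping * dangling / n
                    for user in nodes}
        for u in nodes:
            for v in out_links[u]:
                new_rank[v] += damping * rank[u] / len(out_links[u])
        rank = new_rank
    return rank
```

Users who are replied to or retweeted by many others, or by highly ranked others, end up with the largest scores.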
SUMMARIZATION METHOD
• Text-based Signals
• TFIDF
SUMMARIZATION METHOD
• Popularity Signals
• Number of replies, number of retweets, and number of followers for a given tweet’s author.
SUMMARIZATION METHOD
• Temporal Signals
1. Fit the ages of tweets in a context tree to an exponential distribution.
2. For each tweet, compute its temporal signal as the likelihood of sampling its age from the fitted exponential distribution.
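The two steps can be sketched directly. The maximum-likelihood rate of an exponential distribution is the reciprocal of the sample mean, and the signal is the density at each tweet's age. The sketch assumes ages are non-negative numbers (for example seconds since the root tweet, which is an assumption about how age is measured) with a positive mean; the function name is hypothetical.

```python
import math


def temporal_signals(ages):
    """Exponential-density signal for each tweet age in one context tree."""
    # Step 1: maximum-likelihood fit, rate = 1 / mean age.
    rate = len(ages) / sum(ages)
    # Step 2: density of the fitted distribution at each tweet's age.
    return [rate * math.exp(-rate * age) for age in ages]
```

For ages [1.0, 2.0, 3.0] the fitted rate is 0.5, so earlier tweets receive higher signals than later ones.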
Supervised Learning Framework
• Gradient Boosted Decision Tree (GBDT) algorithm
EDITORIAL DATA SET
• 10 Twitter context trees from March 7th to March 20th, 2011
• 4 are initiated by Lady Gaga
• 6 are initiated by Justin Bieber
1. Read the root tweet
2. Scan through all candidate tweets
3. Select 5 to 10 tweets
EXPERIMENTS
• Evaluation Metrics
Methods for Comparison
• Centroid
• SimToRoot
• Linear
• Mead
• LexRank
• SVD
• ContentOnly
• ContentAttribute
• AllNoGranger
• All
Experimental Results
• Overall Comparison
CONCLUSION
• User influence information is very helpful to generate a high quality summary for each Twitter context tree.
• All signals are converted into features, and we cast Twitter context summarization into a supervised learning problem.
