+
Effective Event Identification in
Social Media
2014/4/27(Mon.)
Chang Wei-Yuan @ MakeLab Lab Meeting
Fotis Psallidas, Hila Becker
DEB’13
+
Outline
n Introduction
n Method
n Known-Event Identification
n Unknown-Event Identification
n Improving Identification Effectiveness
n Experimental Evaluation
n Conclusion
n Thought
2
+
Introduction
• Online social media extensively distribute content related to real-world events.
  • Event: something that occurs at a certain time in a certain place
Goal: identifying events and their associated social media documents
4
+
Introduction
nGeneral approach: group similar
documents via clustering
n Each cluster corresponds to one event and
its associated social media documents
5
+
Introduction
nChallenges
n Uneven data quality
n Highly heterogeneous
n Dynamic data stream of event information
n Number of events unknown
6
+
Event Identification
nKnown-Event Identification
nUnknown-Event Identification
nImproving Identification Effectiveness
7
+
Known-Event Identification
8
+
Known-Event Identification
nSocial media content related to known
events
n reside in multiple social media sites, each
contributing different information
nTo retrieve cross-site social media
documents for same event
n miss many relevant event documents
9
+
Known-Event Identification
nIn the first step, using the known event
properties to achieve high-precision
results.
nIn the second step, using term extraction
and frequency analysis to improve
recall.
10
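The two steps on this slide can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the event fields (`title`, `venue`), the stopword list, the sample documents, and all function names are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list for the example only.
STOPWORDS = {"the", "a", "an", "at", "in", "of", "for", "and", "to", "is", "on"}

def tokenize(text):
    """Lowercase, split on non-alphanumerics, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def precision_query(event):
    """Step 1: build a strict query from known event properties (assumed fields)."""
    return set(tokenize(event["title"])) | set(tokenize(event["venue"]))

def matches_all(doc, terms):
    """A document matches only if it contains every query term."""
    return terms <= set(tokenize(doc))

def recall_terms(seed_docs, known_terms, top_k=1):
    """Step 2: most frequent terms in the seed results that the query lacks."""
    counts = Counter(
        t for d in seed_docs for t in set(tokenize(d)) if t not in known_terms
    )
    return [t for t, _ in counts.most_common(top_k)]

event = {"title": "Jazz Night", "venue": "Blue Hall"}
docs = [
    "Jazz Night at Blue Hall tonight, the saxophone quartet is on fire",
    "Blue Hall jazz night: saxophone solo was amazing",
    "what a saxophone quartet, best gig this year",
    "traffic is terrible today",
]

query = precision_query(event)
seed = [d for d in docs if matches_all(d, query)]    # high precision, low recall
extra = recall_terms(seed, query)                    # frequent terms from the seed set
expanded = [d for d in docs
            if d not in seed and any(t in tokenize(d) for t in extra)]
```

The strict query finds only the documents that name the event outright; the frequent term mined from those results then pulls in a related document that never mentions the title or venue.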
+
Unknown-Event Identification
• A Twitter stream may contain many tweets related to an event, interleaved
  • with messages related to other events
  • with messages unrelated to events
11
+
Unknown-Event Identification
nThe proposed online clustering
framework
n leverages the multiple features to decide
when two social media documents
correspond to the same event
12
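One common way to realize online clustering is a single pass over the stream with a similarity threshold. The sketch below assumes that approach; the slides do not specify the algorithm. It uses only a bag-of-words text feature, and the threshold value and function names are illustrative.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def online_cluster(docs, threshold=0.3):
    """Assign each incoming document to its most similar cluster, or open a new one."""
    centroids = []   # one aggregated term-count Counter per cluster
    labels = []
    for text in docs:
        vec = Counter(text.lower().split())
        scores = [cosine(vec, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            centroids[best].update(vec)      # fold the document into the cluster
            labels.append(best)
        else:
            centroids.append(vec)            # no cluster is close enough: new event
            labels.append(len(centroids) - 1)
    return labels

stream = [
    "jazz night blue hall",
    "blue hall jazz tonight",
    "marathon finish line downtown",
    "downtown marathon runners finish",
]
labels = online_cluster(stream)
```

Because the number of events is unknown in advance, the number of clusters is not fixed here; a new cluster is opened whenever no existing one is similar enough.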
Social Media Document Clustering Framework
[Figure: social media documents, document feature representation, event clusters]
13
Ensemble Algorithm
nThe proposed online clustering framework
n deployed ensemble learning methods to learn and
associate each feature with a weight and a
threshold that capture the importance of the
features
14
Ensemble Algorithm
[Figure: per-feature clusterers C_title, C_tags, C_time with weights W_title, W_tags, W_time (learned in a training step) feed the consensus function f(C, W), which combines the ensemble similarities into the ensemble clustering solution]
15
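A consensus function of the form f(C, W) can be read as a weighted combination of per-feature similarities. The sketch below assumes a normalized weighted sum; the per-feature similarity functions, the 24-hour time window, the document fields, and the weight values are all hypothetical, and the per-feature thresholds mentioned on the previous slide are left out for brevity.

```python
def jaccard(a, b):
    """Set overlap between two token or tag collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def time_similarity(t1, t2, window_hours=24.0):
    """1.0 for identical timestamps, falling linearly to 0.0 at the window edge."""
    return max(0.0, 1.0 - abs(t1 - t2) / window_hours)

def consensus(doc_a, doc_b, weights):
    """Weighted average of per-feature similarities (title, tags, time)."""
    sims = {
        "title": jaccard(doc_a["title"].lower().split(),
                         doc_b["title"].lower().split()),
        "tags": jaccard(doc_a["tags"], doc_b["tags"]),
        "time": time_similarity(doc_a["hour"], doc_b["hour"]),
    }
    total = sum(weights.values())
    return sum(weights[f] * sims[f] for f in sims) / total

# Illustrative weights; in the slides these are learned in a training step.
weights = {"title": 0.5, "tags": 0.3, "time": 0.2}
a = {"title": "Jazz Night", "tags": ["jazz", "live"], "hour": 20.0}
b = {"title": "jazz night live", "tags": ["jazz"], "hour": 22.0}
score = consensus(a, b, weights)
```

A feature with a larger weight moves the combined score more, which is how the learned weights express feature importance.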
Event Classification
nThe proposed online clustering framework
n deployed event classification to distinguish
between event-related clusters and non-event
ones
16
Event Classification
[Figure: the ensemble clustering solution is passed to event classification, which labels each cluster as related to an event or unrelated to events]
17
+
Improving Identification
Effectiveness
nHow events behave over time have a
significant impact on the effectiveness
of the document clustering procedure?
nHow to refine the clustering procedure
to benefit from these factors is a
challenging task?
18
+
URLs
nURLs in event-related social streams
are ubiquitous. Individuals use them to
share meaningful event-related external
content.
19
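One way shared links could feed the clustering step is as an extra similarity feature: two documents that point to the same external page are likely to concern the same event. The slide does not say how URLs are used, so the sketch below is an assumption throughout, including the regular expression and the choice of Jaccard overlap.

```python
import re

# Simplified URL pattern for illustration; real streams need more robust parsing.
URL_RE = re.compile(r"https?://\S+")

def extract_urls(text):
    """Return the set of URLs in a message, trimming trailing punctuation."""
    return {u.rstrip(".,;:!?)") for u in URL_RE.findall(text)}

def url_similarity(text_a, text_b):
    """Jaccard overlap of the URL sets of two messages (0.0 when neither has a URL)."""
    a, b = extract_urls(text_a), extract_urls(text_b)
    return len(a & b) / len(a | b) if a | b else 0.0

same = url_similarity("Live stream: http://example.com/live.",
                      "watch http://example.com/live now")
none = url_similarity("no links here", "none here either")
```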
+
Bursty Vocabulary
nThe social media content related to an
event tends to revolve around a central
topic.
n this central topic is expressed by a set of
terms that is significantly more frequent
n span a wide time range exhibit a different set
of these bursty terms at different points of
their lifetime.
20
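A simple way to find bursty terms is to compare each term's rate in a recent window against its rate in older background content. The sketch below assumes that ratio test; the slides do not give the actual burst criterion. The `ratio` and `min_count` values and the smoothing are illustrative choices.

```python
from collections import Counter

def bursty_terms(window_docs, background_docs, ratio=3.0, min_count=2):
    """Terms whose rate in the current window is at least `ratio` times the background rate."""
    win = Counter(t for d in window_docs for t in d.lower().split())
    bg = Counter(t for d in background_docs for t in d.lower().split())
    win_total = sum(win.values()) or 1
    bg_total = sum(bg.values()) or 1
    result = set()
    for term, count in win.items():
        if count < min_count:
            continue                                  # too rare to call a burst
        win_rate = count / win_total
        bg_rate = (bg[term] + 1) / (bg_total + 1)     # simple smoothing for unseen terms
        if win_rate / bg_rate >= ratio:
            result.add(term)
    return result

background = ["good morning everyone", "coffee time this morning", "morning traffic again"]
window = ["earthquake downtown this morning", "earthquake felt downtown", "big earthquake"]
bursts = bursty_terms(window, background)
```

Recomputing the window over time would yield a different set of terms at different points of a long-running event's lifetime.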
+
Time Decay
na time decay function to the clustering
framework
n penalizes clusters that have been inactive for
a long time.
n re-triggers events that have been inactive for
some time if the similarity score without the
time-decay factor is strong enough.
22
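The two behaviors above can be sketched with an exponential decay applied to the document-cluster similarity. The exponential form, the half-life, and both thresholds are assumptions chosen for illustration; the slides do not state the decay function that was used.

```python
import math

def decayed_similarity(similarity, hours_inactive, half_life_hours=24.0):
    """Halve the similarity for every `half_life_hours` the cluster has been inactive."""
    return similarity * math.exp(-math.log(2) * hours_inactive / half_life_hours)

def should_join(similarity, hours_inactive, threshold=0.5, retrigger_threshold=0.9):
    """Join if the decayed score passes, or re-trigger on a very strong raw score."""
    if decayed_similarity(similarity, hours_inactive) >= threshold:
        return True
    # Re-trigger: the cluster is stale, but the undecayed similarity is strong enough.
    return similarity >= retrigger_threshold

fresh = should_join(0.8, hours_inactive=0.0)      # active cluster, decent match
stale = should_join(0.8, hours_inactive=24.0)     # same match, cluster inactive for a day
revived = should_join(0.95, hours_inactive=72.0)  # stale cluster, very strong match
```

The separate re-trigger threshold keeps a long-dormant event reachable without letting moderately similar documents attach to stale clusters.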
+
Experimental Evaluation
nData
n Upcoming dataset
n 273,842 multi-featured Flickr photos that
correspond to 9,613 real-world events from
the Upcoming event.
nthe BurstyV + TimeDec technique
obtained the highest quality results.
23
+
Conclusion
nThis article discussed the event
identification task under two different
scenarios, known- and.
nWe showed how to identify event
content effectively
n how we can exploit rich features of the social
media documents
n revealing temporal patterns of the relevant
content
24
+
Thought
25
+
Thanks for listening.
2014 / 4 / 27(Mon.) @ MakeLab Group Meeting
v123582@gmail.com

Editor's Notes

  • #2: Fotis Psallidas: Columbia University; Hila Becker: Google, Inc.
  • #9: For instance, YouTube might contain videos for the Super Bowl event, whereas Twitter users might discuss the event by sharing short text messages, or tweets. Such highly specific queries tend to retrieve event-related documents with high precision but with low recall.
  • #15: We can cluster our document collection according to the variety of feature representations discussed; each would have its own