@MikeMayer
What is Twitter?
• Twitter is categorized as a microblogging service.
• Twitter users post short blurbs of text, 140 characters or fewer, called tweets.
• With URL shorteners and services tailored for Twitter, a lot of information can be conveyed in that small space.
• Twitter is very free-form, yet ways to categorize tweets have emerged (hashtags).
[Figure: Fusion Search]

How is Twitter useful as a sensor?
• Twitter users often report their status, whether or not it is of interest to others.
• This means that the public timeline is full of noise.
• The timeline is updated in real time, faster than a blog and faster than a “static” document.
• Tweets are faster than traditional news, and users pick from a buffet of other users to customize their news.
• However, if tweets are carefully selected, a great deal of useful information can be found.
• Tweets carry a great deal of metadata.

JSON representation of a single Tweet
[Figure: JSON of a single tweet]
Source: http://www.readwriteweb.com/archives/this_is_what_a_tweet_looks_like.php

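The tweet JSON on this slide was an image and did not survive. As a minimal sketch of why tweets work as sensor readings, the Python below pulls the time and location metadata out of a tweet payload. The payload is a hypothetical, heavily abbreviated example; the field names (`created_at`, `text`, `coordinates` as a GeoJSON point with longitude first, `user.location`) follow Twitter's classic v1.1 tweet object and should be checked against the API version actually in use.

```python
import json
from datetime import datetime

# Hypothetical, abbreviated payload in the style of Twitter's classic (v1.1) tweet object.
RAW_TWEET = """
{
  "created_at": "Wed Aug 27 13:08:45 +0000 2008",
  "text": "Earthquake!",
  "coordinates": {"type": "Point", "coordinates": [139.69, 35.69]},
  "user": {"screen_name": "example_user", "location": "Tokyo, Japan"}
}
"""

def sensor_reading(raw: str) -> dict:
    """Extract the time and location metadata that make a tweet usable as a sensor reading."""
    tweet = json.loads(raw)
    # created_at looks like "Wed Aug 27 13:08:45 +0000 2008"
    created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    coords = tweet.get("coordinates")
    # GeoJSON points are [longitude, latitude]; swap them to (lat, lon)
    latlon = (coords["coordinates"][1], coords["coordinates"][0]) if coords else None
    return {
        "time": created,
        "text": tweet["text"],
        "latlon": latlon,
        # free-text profile location, a fallback when no GPS point is attached
        "profile_location": tweet.get("user", {}).get("location"),
    }

reading = sensor_reading(RAW_TWEET)
print(reading["time"].isoformat(), reading["latlon"])
```

Only a small fraction of tweets carry exact coordinates, which is why the profile location is kept as a fallback.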

“Each Twitter user is a sensor and each Tweet is sensory information”
• Of course, context must be considered (more on that soon).
• A bag-of-words approach isn’t good enough for detecting earthquakes:
  - “My dryer is shaking like crazy”
  - “Didn’t they used to have a ride at carnivals called Earthquake?”
• The paper proposes a machine-learning approach to determining the context.
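To make the bag-of-words problem concrete, here is a minimal sketch of a detector that only checks whether a query word appears in a tweet. The query words `earthquake` and `shaking` are an assumption chosen for illustration; with them, both irrelevant examples from the slide are flagged as earthquake reports, exactly like a genuine one.

```python
import re

# Hypothetical query words; a pure bag-of-words check ignores everything else in the tweet.
QUERY_WORDS = {"earthquake", "shaking"}

def bag_of_words_detector(text: str) -> bool:
    """Flag a tweet as an earthquake report if it contains any query word."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return bool(words & QUERY_WORDS)

tweets = [
    "Earthquake! Everything is shaking",
    "My dryer is shaking like crazy",
    "Didn't they used to have a ride at carnivals called Earthquake?",
]
for t in tweets:
    print(bag_of_words_detector(t), t)
```

All three tweets print `True`, which is the false-positive problem a context-aware classifier has to solve.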


Event Detection
• The primary focus of the paper is to find a means of detecting events using so-called social sensors.
• Events are “arbitrary classifications of space/time regions”.
• Targeted events include natural occurrences (weather, earthquakes, etc.) and human-made ones (traffic jams, crime, etc.).




Semantic Analysis for Tweets
• As noted before, a bag of words is simply not good enough.
• To detect target events, the authors use an SVM (support vector machine), a widely used machine-learning algorithm.
• They extract three groups of features from each tweet:
  A. Statistical features (number of words…)
  B. Keyword features
  C. Word context features (words around a “query word”)
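A minimal sketch of the three feature groups, assuming (as a simplification, not necessarily the paper's exact definitions) that A is the word count plus the query word's position, B is the set of words in the tweet, and C is the pair of words immediately before and after the query word. The function names are hypothetical.

```python
import re

def tokenize(text):
    """Lower-case the tweet and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def extract_features(text, query):
    """Hypothetical A/B/C feature extraction for one tweet and one query word."""
    words = tokenize(text)
    pos = words.index(query) if query in words else -1  # -1 if the query word is absent
    before = words[pos - 1] if pos > 0 else None
    after = words[pos + 1] if 0 <= pos < len(words) - 1 else None
    return {
        # A. statistical features: tweet length and where the query word sits
        "A_num_words": len(words),
        "A_query_pos": pos,
        # B. keyword features: the (sorted) set of words in the tweet
        "B_keywords": sorted(set(words)),
        # C. word-context features: the words immediately before and after the query word
        "C_context": (before, after),
    }

print(extract_features("Earthquake right now!", "earthquake"))
print(extract_features("My dryer is shaking like crazy", "shaking"))
```

Before training, B and C would still need to be turned into numeric vectors (for example, one-hot encodings over a vocabulary); that step is omitted here.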



Support Vector Machine
• Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. In simple words, given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.¹
• Very mathematical; in short, an SVM finds the boundary that separates the two classes with the widest possible margin.

1. http://en.wikipedia.org/w/index.php?title=Special:Cite&page=Support_vector_machine&id=361629294
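The paper relies on an SVM; in practice one would normally reach for an existing library such as LIBSVM or scikit-learn. To show what training actually does without extra dependencies, here is a minimal sketch of a linear soft-margin SVM trained with Pegasos-style stochastic subgradient descent on the hinge loss. Pegasos is a swapped-in training method chosen for brevity, not necessarily what the authors used, and the toy data is made up.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Train a linear soft-margin SVM with Pegasos-style stochastic subgradient descent.

    X: list of feature vectors; y: labels in {-1, +1}.
    A constant 1.0 is appended to each vector so the bias is learned as an
    ordinary (and therefore regularized) weight.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # step on the L2 penalty
            if margin < 1.0:  # hinge loss is active: move w toward this example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Classify x as +1 or -1 by the sign of the decision function."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

# Made-up, already-centred toy features for two classes
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # → [1, 1, 1, -1, -1, -1]
```

The `lam` parameter trades margin width against training errors: larger values give a simpler, more heavily regularized boundary.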

Tweets as sensory values
• Assumption 1: “Each twitter user is regarded as a sensor…”
  - Twitter has over 100 million users.¹
  - That is enough sensors to make up for the ones not operating correctly (asleep, tweeting gibberish, busy doing something else…).
• Assumption 2: “Each tweet is associated with a time and location…”
  - Location is the most fundamental requirement for using tweets as sensor readings.

1. http://economictimes.indiatimes.com/infotech/internet/Twitter-snags-over-100-million-users-eyes-money-making/articleshow/5808927.cms

Modeling
Temporal Model
• Every tweet has a created_at field.
• Using probability, the paper describes a way to estimate the probability of an event occurring.
Spatial Model
• Tweets considered in this system require geolocation information.
• The spatial model is far more complicated.
• It needs to consider time, and the delay as an event spreads (e.g., an earthquake).
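The core probabilistic idea of the temporal model can be sketched in a few lines. Assuming each tweet classified as positive is a false alarm with some probability `p_false`, independently of every other tweet, the chance that n positive tweets are all false alarms is `p_false ** n`, so the chance that the event really happened is `1 - p_false ** n`. This is a simplified sketch, not the paper's full temporal model, and the value 0.35 is purely illustrative.

```python
def event_probability(n_positive: int, p_false: float) -> float:
    """Probability that an event occurred, given n tweets classified as positive.

    Assumes each positive tweet is independently a false alarm with
    probability p_false, so all n are false alarms with probability p_false ** n.
    """
    if not 0.0 <= p_false <= 1.0:
        raise ValueError("p_false must be a probability")
    return 1.0 - p_false ** n_positive

# Illustrative (hypothetical) false-alarm rate
for n in (1, 3, 10):
    print(n, round(event_probability(n, 0.35), 4))
```

The probability climbs toward 1 quickly as positive tweets accumulate, but only because of the independence assumption, which retweets undermine.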


Spatial Model Continued
Kalman Filters
• The paper describes an application of Kalman filters to model two cases:
  1. Location estimate of an earthquake center
  2. Trajectory estimation of a typhoon
Particle Filters
• Use the geographic distribution of Twitter users.
• Generate a set of coordinates and sort them by weight.
• Resample and generate a new set; predict, weigh, and measure the new set; then iterate until convergence.
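A minimal sketch of a sequential importance-resampling (SIR) particle filter that estimates a stationary event location from the positions of incoming tweets. Assumptions, all hypothetical: positions are already projected onto a flat local grid in kilometres, a tweet's position is Gaussian around the event with spread `obs_sigma`, and the initial particles are spread uniformly over a search `area` rather than drawn from the geographic distribution of Twitter users as the slide describes. Particles are weighted and resampled rather than explicitly sorted.

```python
import math
import random

def particle_filter_location(observations, area, n_particles=2000,
                             obs_sigma=1.0, jitter=0.05, seed=0):
    """Estimate a stationary event location from tweet positions with an SIR particle filter.

    observations: (x, y) tweet positions in arrival order, e.g. km on a local grid.
    area: ((x_min, x_max), (y_min, y_max)) region in which the event may lie.
    obs_sigma: assumed spread of tweet positions around the event (Gaussian).
    jitter: size of the random move applied in the predict step.
    """
    rng = random.Random(seed)
    (x0, x1), (y0, y1) = area
    # Generate: scatter candidate event locations uniformly over the area
    particles = [(rng.uniform(x0, x1), rng.uniform(y0, y1)) for _ in range(n_particles)]
    for ox, oy in observations:  # one filter step per incoming tweet
        # Predict: perturb each particle slightly
        particles = [(px + rng.gauss(0.0, jitter), py + rng.gauss(0.0, jitter))
                     for px, py in particles]
        # Weigh / measure: Gaussian log-likelihood of this tweet's position
        log_w = [-((ox - px) ** 2 + (oy - py) ** 2) / (2 * obs_sigma ** 2)
                 for px, py in particles]
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]  # shift by the max to avoid underflow
        # Resample: draw a new particle set in proportion to the weights
        particles = rng.choices(particles, weights=weights, k=n_particles)
    # Estimate: mean of the final particle cloud
    return (sum(p[0] for p in particles) / n_particles,
            sum(p[1] for p in particles) / n_particles)

tweets_km = [(9.5, 5.2), (10.4, 4.8), (10.1, 5.5), (9.8, 4.6), (10.2, 5.1)]
print(particle_filter_location(tweets_km, area=((0, 20), (0, 10))))
```

Each incoming tweet is used once (predict, weigh, resample); because resampling discards low-weight particles, the cloud collapses around the likely location as more tweets arrive.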

Twitter problems that affect statistical analysis
• Sensors are not independent of each other.
• One user can see another user’s tweets and then re-post or retweet them.
• Some of the algorithms described earlier would be more accurate if the sensors were independent.
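One common mitigation for dependent sensors (an assumption here, not something the slides attribute to the authors) is to discard retweets and count each user at most once before counting positive tweets. The tweet shape is hypothetical: a nested `user` with an `id`, and a `retweeted_status` key on native retweets, as in Twitter's classic tweet object; manual retweets are caught by the informal "RT @" prefix.

```python
def independent_reports(tweets):
    """Keep at most one original (non-retweet) report per user."""
    seen_users = set()
    kept = []
    for tweet in tweets:
        if "retweeted_status" in tweet or tweet["text"].startswith("RT @"):
            continue  # a copy of someone else's reading, not a new observation
        uid = tweet["user"]["id"]
        if uid in seen_users:
            continue  # one sensor, one vote
        seen_users.add(uid)
        kept.append(tweet)
    return kept

tweets = [
    {"user": {"id": 1}, "text": "Earthquake!"},
    {"user": {"id": 2}, "text": "RT @a: Earthquake!"},
    {"user": {"id": 3}, "text": "Earthquake!", "retweeted_status": {}},
    {"user": {"id": 1}, "text": "Still shaking"},
    {"user": {"id": 4}, "text": "Whoa, shaking here"},
]
print(len(independent_reports(tweets)))  # → 2
```

This only removes explicit copies; a user who tweets after reading someone else's tweet is still not an independent sensor.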




Experimentation and Evaluations
• Finally, the authors describe their experimental methodology and evaluate their findings.

First, their algorithm:

1. Given a set of query terms G for a target event
2. Issue a query every s seconds and obtain tweets T
3. For each tweet, obtain the features A, B, and C described earlier
4. Calculate the probability of occurrence using the SVM
5. For each tweet, estimate its location from the coordinates given, or by querying Google Maps with the user’s registered location
6. Calculate the estimated distance from the tweet to the event
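Steps 5 and 6 can be sketched as follows, under stated assumptions: tweets are dicts with an optional `latlon` pair and a free-text `profile_location`, and the Google Maps lookup is replaced by a hypothetical hard-coded table (no real geocoding API is called). Distance uses the haversine great-circle formula with a mean Earth radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Hypothetical stand-in for geocoding a user's registered profile location.
PROFILE_GEOCODE = {"Tokyo, Japan": (35.68, 139.69), "Osaka": (34.69, 135.50)}

def tweet_location(tweet):
    """Step 5: use attached coordinates if present, else look up the profile location."""
    if tweet.get("latlon"):
        return tweet["latlon"]
    return PROFILE_GEOCODE.get(tweet.get("profile_location"))

def haversine_km(a, b):
    """Step 6: great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

event = (35.68, 139.69)
tweet = {"text": "Earthquake!", "profile_location": "Osaka"}
print(round(haversine_km(tweet_location(tweet), event)))
```

Profile locations are coarse (often just a city), so distances computed this way are only as accurate as the geocoded profile text.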

Semantic Analysis Evaluation
• It turns out that the most important part of a tweet is neither the context of the words (C) nor the content (B); it is the statistical property (A).
• During an event, users are surprised and send very short messages:
  - “Earthquake!”




Spatial Estimation Evaluation
• The Kalman filter did a poor job of filtering out noise when determining the probable location of the event.
• It was difficult to locate events in sparsely populated areas, as well as events surrounded by water.
• In a naïve and straightforward way, they note that the more sensors there are, the more accurately an event can be positioned.
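The slides do not give the paper's Kalman filter formulation, so the following is only a minimal sketch: a scalar Kalman filter with a constant-position model, applied separately to the x and y coordinates of tweet positions (on a hypothetical local grid in kilometres). With a stationary target and Gaussian noise, the estimate's variance falls roughly as `meas_var / n`, one way to see why more sensors give more accurate positioning. A Kalman filter assumes linear dynamics and Gaussian noise; a particle filter drops those assumptions, which is relevant to discussion question 9.

```python
def kalman_1d(measurements, meas_var, init_est=0.0, init_var=1e6, process_var=0.0):
    """Scalar Kalman filter with a constant-position model.

    process_var = 0 treats the target as stationary (an epicenter);
    process_var > 0 lets it drift between measurements (a moving typhoon).
    Returns the final (estimate, variance).
    """
    est, var = init_est, init_var
    for z in measurements:
        var += process_var             # predict: uncertainty grows if the target can move
        gain = var / (var + meas_var)  # Kalman gain: how much to trust the new reading
        est += gain * (z - est)        # update: move toward the measurement
        var *= 1.0 - gain              # uncertainty shrinks with every reading
    return est, var

def kalman_location(points, meas_var=1.0):
    """Filter x and y independently (assumption: the axes are uncorrelated)."""
    xs, ys = zip(*points)
    x, var_x = kalman_1d(xs, meas_var)
    y, var_y = kalman_1d(ys, meas_var)
    return (x, y), (var_x, var_y)

tweets_km = [(9.5, 5.2), (10.4, 4.8), (10.1, 5.5), (9.8, 4.6), (10.2, 5.1)]
for n in (1, 5):
    (x, y), (vx, _) = kalman_location(tweets_km[:n])
    print(n, round(x, 2), round(y, 2), round(vx, 3))
```

With a very uncertain starting guess and no process noise, the estimate converges to the average tweet position, so a few far-off tweets pull it just as strongly as a dense cluster of accurate ones.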




Conclusions 1
• I’ve thought for months that using Twitter as a sensor was an interesting idea.
• The first thing my mom does when there is an earthquake is run to her laptop and tweet “EARTHQUAKE #socal”.
• This paper was too mathematical for me to fully grasp in the short time given.




Conclusions 2
I found this fascinating:

The fastest accurate detection of an event took 19 seconds.

The accuracy they achieved was very impressive.


Discussion Time
• Questions?
• Otherwise… on to the required points…




Discussion 1
1. What is the paper about?
   - Using Twitter (tweets) as a sensor
2. What is the major contribution?
   - Showing that accurate event detection from tweets is possible
3. What did you like best?
   - That the paper actually ended with positive results
4. What are the weaknesses (according to you)?
   - Generally they accomplished what they set out to do, but the scope was very limited (Japan). The approach could also have been applied to many more types of events.

Discussion 2
1. What is the difference between a document, a blog, and a micro-blog in the context of search systems?
2. Tweets are considered to represent real-time information. Is that right? What are the implications for news?
3. What is a target event? How are tweets related to it?
4. What is the goal of the system discussed in this paper? Do you think it succeeds?
5. Describe a particle filter. What does it do in general? How is it used in this paper?


Discussion 3
6. What is a support vector machine? Why is it needed in this system?
7. Human sensors are an increasingly popular concept. Why do you think this is important? Give three examples where this could be effective.
8. Discuss the system. How does it help? What are the critical steps in this algorithm?
9. This paper discusses the Kalman filter and the particle filter. What is the difference between the two? Do we need both, or just one? If you were developing an application to detect the location of an accident from tweets, which one would you use?
10. How has this paper changed your ideas about Twitter?

Thank You.
• Follow me on Twitter if you want…
• Personal: @MikeMayer
• Public: @MikeMayerDev




Earthquake shakes twitter users real-time event detection by social sensors
