Towards Deep Learning from Twitter for
Improved Tsunami Alerts and Advisories
L. I. Lumb¹ & J. R. Freemantle²
¹York University & ²Independent
NH14A-03, 2017 AGU Fall Meeting
New Orleans, LA; December 11, 2017
Outline
● Motivation
● Previous Work
○ Text Classification
● Current Work
○ Natural Language Processing via Word Embeddings
○ Reanalysis of 2 Event Pairs
● Discussion
Geist, E.L., Titov, V.V., and Synolakis, C.E., 2006, Tsunami: wave of change: Scientific
American, v. 294, p. 56-63
Data extracted from Twitter via a Perl script that targets #earthquake
Lumb & Freemantle,
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
Key Takeaways of “earthquake” Spam Classification
● Twitter metadata (handles, hashtags and URLs) contributes equally to Twitter
data (unstructured text that comprises the body of a Tweet) in constructing
feature vectors - i.e., the semantic value of Twitter metadata is ignored
● Curation of training data is extremely important (e.g., for accuracy), but also
extremely time consuming, as this supervised learning is a manual process
● “earthquake” can be used in different contexts (e.g., geophysics vs. movies
vs. politics …) and have ‘subtly’ different meanings
Lumb & Freemantle,
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
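The first takeaway turns on the difference between Twitter metadata and the body text of a Tweet. A minimal sketch of keeping the two apart before any feature vector is built is given here. The regular expressions, the function name split_tweet and the example Tweet are assumptions made for illustration; they are not taken from the Perl or Spark code used in the original work.

```python
import re

# Patterns for the three kinds of Twitter metadata named in the slide.
URL_RE = re.compile(r"https?://\S+")
HANDLE_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def split_tweet(text):
    """Separate a Tweet into metadata (handles, hashtags, URLs) and body words."""
    urls = URL_RE.findall(text)
    without_urls = URL_RE.sub(" ", text)
    handles = HANDLE_RE.findall(without_urls)
    hashtags = HASHTAG_RE.findall(without_urls)
    # Whatever remains after removing metadata is treated as the body.
    body = HASHTAG_RE.sub(" ", HANDLE_RE.sub(" ", without_urls))
    words = re.findall(r"[a-z0-9']+", body.lower())
    return {"handles": handles, "hashtags": hashtags, "urls": urls, "words": words}


# Hypothetical Tweet, invented for illustration only.
example = "Strong #earthquake felt downtown, details via @quake_watch http://example.org/a1"
parts = split_tweet(example)
```

With the parts separated, metadata and body words could be weighted differently rather than contributing equally.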
Word Vectors
https://adriancolyer.files.wordpress.com/2016/04/word2vec-distributed-representation.png?w=600
"... a word is characterized by the company it keeps ..."
Firth (1957)
Firth, J.R. (1957). "A synopsis of linguistic theory 1930-1955". Studies in Linguistic
Analysis. Oxford: Philological Society: 1–32. Reprinted in F.R. Palmer, ed. (1968).
Selected Papers of J.R. Firth 1952-1959. London: Longman.
“earthquake” and its ‘closest’ 20 words
Lumb & Freemantle,
HPCS 2017, http://2017.hpcs.ca/ (accepted).
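One common way to find the ‘closest’ words to a target is to rank every other word by the cosine similarity of its vector to the target's vector. The sketch below assumes that approach; the slide does not state how closeness was measured. The 3-d vectors and the helper names are invented for illustration, whereas the pre-trained vectors used in this work are 50-d.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense, non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def closest_words(target, vectors, k=20):
    """Return the k (word, similarity) pairs most similar to the target word."""
    scores = [
        (word, cosine(vectors[target], vec))
        for word, vec in vectors.items()
        if word != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy 3-d vectors, invented for illustration only.
toy_vectors = {
    "earthquake": [1.0, 0.0, 0.0],
    "tsunami": [0.8, 0.6, 0.0],
    "movie": [0.0, 0.0, 1.0],
    "quake": [0.9, 0.1, 0.0],
}
neighbours = closest_words("earthquake", toy_vectors, k=2)
```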
Word-Vector Workflow: NLP via GloVe + PyTorch
http://pytorch.org
https://nlp.stanford.edu/projects/glove/
Lumb & Freemantle,
HPCS 2017, http://2017.hpcs.ca/ (accepted).
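Pre-trained GloVe vectors are distributed as plain text, one word per line followed by its vector components. As a rough sketch of the first step of such a workflow, the loader below reads that format into a dictionary. The function name load_glove and the two 3-d sample lines are assumptions for illustration; the loading code used in the original work is not shown in the slides.

```python
import io


def load_glove(handle, dim):
    """Parse GloVe's text format: a word followed by `dim` floats on each line."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip lines that do not match the expected dimension
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Two invented 3-d lines standing in for a real 50-d vector file.
sample = io.StringIO("earthquake 0.1 -0.2 0.3\ntsunami 0.0 0.5 -0.5\n")
vectors = load_glove(sample, dim=3)
```

For a downloaded file, the same function could be passed an open file handle with dim=50; the resulting lists could then be converted to PyTorch tensors if GPU computation is wanted.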
Preliminary Results: “earthquake” Cosine Similarities

Pre-Trained Vectors   Hammy Tweets   Spammy Tweets
GloVe 6B              0.1182         0.0097481
Twitter 27B           -0.033930      -0.064906

GloVe 6B = Wikipedia 2014 + Gigaword 5, 6B tokens, 400K vocab, uncased, 50d
Twitter 27B = 2B tweets, 27B tokens, 1.2M vocab, uncased, 50d
Lumb & Freemantle, HPCS 2017, http://2017.hpcs.ca/ (accepted).
Event Pairs Selected for Reanalysis
Tohoku
05:46 UTC, 11 March 2011
29 km, ~9 Mw earthquake & tsunami
Miyagi
14:32 UTC, 7 April 2011
49 km, 7.1 Mw earthquake only
Chiapas
04:49 UTC, 8 September 2017
50 km, 8.2 Mw earthquake & tsunami
Central Mexico
18:14 UTC, 19 September 2017
51 km, 7.1 Mw earthquake only
Curated according to start time ONLY
Re-analysis Results: “earthquake” Cosine Similarities

Pre-Trained Vectors   Tohoku 3/11/2011   Miyagi 4/7/2011
GloVe 6B              -0.2289            0.06455
Twitter 27B           -0.05655           -0.03156
# tweets / # words    1374 / 715         146 / 328

Pre-Trained Vectors   Chiapas 9/8/2017   Central Mexico 9/19/2017
GloVe 6B              -0.1306            -0.01169
Twitter 27B           0.1050             0.1273
# tweets / # words    304 / 468          415 / 759

GloVe 6B = Wikipedia 2014 + Gigaword 5, 6B tokens, 400K vocab, uncased, 50d
Twitter 27B = 2B tweets, 27B tokens, 1.2M vocab, uncased, 50d
“earthquake-tsunami” Similarity (vector size = 50)

Vectors / Corpus   Similarity
GloVe 6B           0.8255
Twitter 27B        0.009244
Tohoku             0.7161
Miyagi             -0.2540
Chiapas            0.3156
Central Mexico     -0.001964
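To make the idea of an intra-corpus “earthquake-tsunami” similarity concrete, the sketch below derives word vectors from a single small corpus and compares the two words within it. It deliberately swaps in raw co-occurrence counts for the learned 50-d embeddings used in this work, so it illustrates the principle only and will not reproduce the values above. The window size and the three Tweets are invented.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(tweets, window=2):
    """Count, for each word, the words seen within `window` positions of it."""
    counts = defaultdict(Counter)
    for tweet in tweets:
        words = tweet.lower().split()
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][words[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Invented Tweets for illustration only.
toy_corpus = [
    "huge earthquake hits coast",
    "huge tsunami hits coast",
    "earthquake movie tonight",
]
vecs = cooccurrence_vectors(toy_corpus)
similarity = cosine(vecs["earthquake"], vecs["tsunami"])
```

In this toy corpus the two words share most of their neighbours, so the similarity is high; a corpus in which “tsunami” rarely appears alongside the words that surround “earthquake” would give a value near zero.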
Discussion
● Embedded word vectors superior to text classification in isolating
geophysically relevant content
○ Embeddings convey significantly enhanced semantic value over bland features
○ Unsupervised learning replaces manually intensive requirement for close supervision
● Using NLP via embedded word vectors
○ Closest word and inter-corpora cosine similarities prove inconclusive in isolation
○ Intra-corpora cosine similarities (e.g., “earthquake-tsunami”) appear more promising in
isolating tsunami-producing earthquakes
○ Word-vector analogies require additional consideration
● Steps towards operationalization
○ Enable shift from offline reanalysis to online, real-time streaming
○ Focus efforts on the time interval between the earthquake and (potential) arrival of the tsunami
● Applicable in other disaster scenarios - e.g., hurricanes, wildfires, ...
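The second operationalization step, focusing on the interval between the earthquake and the potential arrival of the tsunami, amounts to a time-window filter on incoming Tweets. A minimal sketch follows. The origin time is the Chiapas value listed in the event-pair slide; the 30-minute lead time, the function name and the Tweet timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def in_warning_window(tweet_time, origin_time, lead_time):
    """True if the Tweet falls between the origin time and origin + lead_time."""
    return origin_time <= tweet_time <= origin_time + lead_time


# Origin time of the Chiapas event as listed in the event-pair slide.
origin = datetime(2017, 9, 8, 4, 49, tzinfo=timezone.utc)
# Hypothetical lead time; actual tsunami arrival times depend on location.
lead = timedelta(minutes=30)

# Invented Tweet timestamps for illustration only.
tweets = [
    (datetime(2017, 9, 8, 4, 40, tzinfo=timezone.utc), "before the event"),
    (datetime(2017, 9, 8, 5, 5, tzinfo=timezone.utc), "inside the window"),
    (datetime(2017, 9, 8, 6, 0, tzinfo=timezone.utc), "after the window"),
]
selected = [text for when, text in tweets if in_warning_window(when, origin, lead)]
```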
Tsunami Advisories
if ( EARTHQUAKE ) then { TSUNAMI }
if ( Mw > 8.0 and TRENCH and DISPLACEMENT and DEEP WATER ) then { TSUNAMI }
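The two rules above can be written as simple predicates to show how much narrower the second one is. This is only a sketch: the function and parameter names are invented, and the boolean flags stand in for TRENCH, DISPLACEMENT and DEEP WATER without saying how each would be determined in practice.

```python
def naive_advisory(earthquake):
    """First rule: any earthquake triggers a tsunami advisory."""
    return bool(earthquake)


def refined_advisory(mw, trench, displacement, deep_water):
    """Second rule: magnitude above 8.0 and all three physical conditions hold."""
    return bool(mw > 8.0 and trench and displacement and deep_water)


# Illustrative inputs only; the flags are not observations of any real event.
large_event = refined_advisory(8.2, True, True, True)
moderate_event = refined_advisory(7.1, True, True, True)
```

Under the first rule both inputs would raise an advisory; under the second only the larger one does.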
Q&A
L. I. Lumb¹ & J. R. Freemantle²
¹ianlumb@yorku.ca & ²james.freemantle@rogers.com
Additional Content
Motivation
● Non-deterministic cause
○ Uncertainty inherent in any attempt to predict earthquakes
■ In situ measurements may reduce uncertainty
● Lead times
○ Availability of actionable observations
○ Communication of situation - advisories, warnings, etc.
● Cause-effect relationship
○ Energy transfer - inputs ... coupling ... outputs
■ ‘Geometry’ - bathymetry and topography
○ Other factors - e.g., tides
● Established effect
○ Far-field estimates of tsunami propagation (pre-computed) and coastal inundation (real-time)
have proven to be extremely accurate ... requires
● Distributed array of deep-ocean tsunami detection buoys + forecasting model
After Karau et al., Learning Spark, O’Reilly, 2015
“earthquake” Spam Classification via Apache Spark
The Opportunity for Semantics
● A feature vector is a feature vector - it is devoid of semantics
● Ignores inherent, overall credibility of a Tweet - e.g., as quantified by
TweetCred
● Twitter metadata (handles, hashtags and URLs) contributes equally to Twitter
data (unstructured text that comprises the body of a Tweet) in constructing
feature vectors - i.e., the semantic value of Twitter metadata is also ignored
by Deep Learning
● The W3C’s Resource Description Framework (RDF) facilitates the
representation of metadata and thus exposes semantics
● The W3C’s Web Ontology Language (OWL) accounts for domain specifics -
disambiguates use of overloaded terms (e.g., “earthquake”) in different
contexts (e.g., geophysics vs. movies vs. …)
● Deep Learning in combination with RDF/OWL semantics has the potential to
produce learned models with knowledge represented
PyTorch
http://pytorch.org/about/
● Python package that provides
○ Tensor computation – strong GPU acceleration, efficient memory usage
■ Integrated with NVIDIA CuDNN and NCCL libraries
○ Deep Neural Networks built on a tape-based autograd system
● Can leverage numpy, scipy and Cython as needed
● Available tutorials include Natural Language Processing (NLP)
Big Data’s 6Vs
http://credit.pvamu.edu/MCBDA2016/Slides/Day2_Lumb_MCBDA1_Twitter_Tsunami.pdf
