RIP Boris Strugatski
Science Fiction will never be the same
Implicit Sentiment Mining
     (do you tweet like Hamas?)

          Maksim Tsvetovat
           Jacqueline Kazil
        Alexander Kouznetsov
My book
Twitter predicts stock market
Sentiment Mining, old-school

• Start with a corpus of words that have sentiment
  orientation (bad/good):
     • “awesome” : +1
     • “horrible”: -1
     • “donut” : 0 (neutral)

• Compute sentiment of a text by averaging all
  words in text
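A minimal sketch of this averaging approach; the three-word lexicon below is illustrative, not a real corpus:

```python
# Old-school sentiment: average the polarity of known words over all tokens.
LEXICON = {"awesome": 1, "horrible": -1, "donut": 0}  # toy corpus

def naive_sentiment(text):
    """Average sentiment of a text; words outside the lexicon count as 0."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(LEXICON.get(w, 0) for w in words) / len(words)
```

Note how mixed texts wash out toward zero under averaging, which is exactly the weakness the next slides illustrate.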
…however…
• This doesn’t quite work (not reliably, at least).

• Human emotions are actually quite complex




• … Anyone surprised?
We do things like this:



“This restaurant would deserve highest praise if
      you were a cockroach” (a real Yelp review ;-)
We do things like this:



  “This is only a flesh wound!”
We do things like this:



“This concert was f**ing awesome!”
We do things like this:



“My car just got rear-ended! F**ing awesome!”
We do things like this:



“A rape is a gift from God” (he lost! Good ;-)
To sum up…

• Ambiguity is rampant

• Context matters

• Homonyms are everywhere

• Neutral words become charged as discourse
 changes, charged words lose their meaning
More Sentiment Analysis

• We can parse text using POS (part-of-speech)
  identification

• This helps with homonyms and some
  ambiguity
More Sentiment Analysis

• Create rules with amplifier words and inverter
  words:
   – “This concert (np) was (v) f**ing (AMP) awesome (+1)” = +2

   – “But the opening act (np) was (v) not (INV) great (+1)” = -1

   – “My car (np) got (v) rear-ended (v)! F**ing (AMP)
     awesome (+1)” = +2??
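A hedged sketch of these rules: AMP doubles the score of the next sentiment word, INV flips its sign. The word lists and weights are assumptions for illustration, not the deck's actual rule set:

```python
# Illustrative rule tables (assumptions, not the authors' lexicon).
SCORES = {"awesome": 1, "great": 1}
AMPLIFIERS = {"f**ing"}   # AMP: double the next sentiment word
INVERTERS = {"not"}       # INV: flip the next sentiment word

def rule_sentiment(tokens):
    """Score a token list with amplifier/inverter rules."""
    total, mult = 0, 1
    for tok in tokens:
        t = tok.lower()
        if t in AMPLIFIERS:
            mult *= 2
        elif t in INVERTERS:
            mult *= -1
        elif t in SCORES:
            total += mult * SCORES[t]
            mult = 1  # modifiers apply only to the next sentiment word
    return total
```

As the slide's last example shows, these rules still mis-score the rear-ended car at +2: context is beyond them.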
To do this properly…
• Valence (good vs. bad)

• Relevance (me vs. others)

• Immediacy (now/later)

• Certainty (definitely/maybe)
•   … and about 9 more, less significant dimensions


        Samsonovich, A., Ascoli, G.: Cognitive map dimensions of the human value
        system extracted from the natural language. In: Goertzel, B. (ed.), Advances in
        Artificial General Intelligence (Proc. 2006 AGIRI Workshop), IOS Press,
        pp. 111–124 (2007).
This is hard



• But worth it?
  Michelle de Haaff (2010), Sentiment Analysis, Hard But Worth It!, CustomerThink
Sentiment, Gangnam Style!
Hypothesis


• Support for a political candidate, party, brand,
  country, etc. can be detected by observing
  indirect indicators of sentiment in text
Mirroring – unconscious copying
  of words or body language




 Fay, W. H.; Coleman, R. O. (1977). "A human sound transducer/reproducer: Temporal
 capabilities of a profoundly echolalic child". Brain and Language 4 (3): 396–402.
Marker words
• All speakers have some words and
  expressions in common (e.g.
  conservative, liberal, party designation,
  etc)
• However, everyone has a set of
  trademark words and expressions that
  make them unique.
GOP Presidential Candidates
Israel vs. Hamas on Twitter
Observing Mirroring

• We detect marker words and expressions in
 social media speech and compute sentiment
 by observing and counting mirrored phrases
The research question


• Is media biased towards Israel or Hamas in
  the current conflict?

• What is the slant of various media sources?
Data harvest
• Get Twitter feeds for:
   – @IDFSpokesperson
   – @AlQuassam
   – Twitter feeds for CNN, BBC, CNBC, NPR, Al-Jazeera,
     FOX News – all filtered to only include articles on
     Israel and Gaza

• (more text == more reliable results)
Fast Computational Linguistics
Text Cleaning

• Tweet text is dirty (RT, VIA, #this and @that, ROFL, etc)

• Use a stoplist to produce a stripped-down tweet

import string

stoplist_str="""
a
a's
able
About
...
z
zero
rt
via
"""

stoplist = [w.strip() for w in stoplist_str.split('\n') if w != '']
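Applying the stoplist might look like this; a tiny illustrative stoplist stands in for the full one above:

```python
import string

stoplist = {"a", "able", "about", "rt", "via"}  # illustrative subset

def clean_tweet(text):
    """Lowercase, strip punctuation (#, @, etc.), and drop stoplist words."""
    table = str.maketrans("", "", string.punctuation)
    tokens = text.lower().translate(table).split()
    return [t for t in tokens if t not in stoplist]
```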
Language ID

• Language identification is pretty easy…

• Every language has a characteristic
  distribution of trigrams (3-letter sequences)
  – E.g., English is heavy on the “the” trigram

• Use open-source library “guess-language”
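The trigram idea can be sketched directly. A real system (such as guess-language) trains profiles on large corpora; the two tiny profiles below are only stand-ins:

```python
from collections import Counter

def trigrams(text):
    """Count 3-letter sequences, padding with spaces at the ends."""
    t = " " + text.lower() + " "
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

# Tiny illustrative profiles; real profiles come from large corpora.
PROFILES = {
    "en": trigrams("the quick brown fox jumps over the lazy dog the end"),
    "de": trigrams("der schnelle braune fuchs springt ueber den faulen hund"),
}

def guess_lang(text):
    """Pick the profile sharing the most trigram mass with the text."""
    tg = trigrams(text)
    return max(PROFILES, key=lambda lang: sum((tg & PROFILES[lang]).values()))
```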
Stemming
• Stemming identifies root of a word, stripping
  away:
  – Suffixes, prefixes, verb tense, etc

• “stemmer”, “stemming”, “stemmed” → “stem”
• “go”, “going”, “gone” → “go”
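A toy suffix-stripper in the spirit of Porter stemming; real code would use, e.g., NLTK's PorterStemmer, and the suffix list and length checks here are simplifications:

```python
def toy_stem(word):
    """Strip one common suffix, then undouble a trailing consonant."""
    w = word.lower()
    for suf in ("ing", "ed", "er", "s"):
        if w.endswith(suf) and len(w) - len(suf) >= 2:
            w = w[: -len(suf)]
            break
    if len(w) >= 4 and w[-1] == w[-2]:  # "stemm" -> "stem"
        w = w[:-1]
    return w
```

Irregular forms like “gone” → “go” need a real stemmer or lemmatizer, not suffix rules.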
Term Networks
• Output of the cleaning step is a term
   vector
• Union of term vectors is a term network
• 2-mode network linking speakers with
   bigrams
• 2-mode network linking locations with
   bigrams
• Edge weight = number of occurrences
   of edge bigram/location or
   candidate/location
Build a larger net

• Periodically purge single co-occurrences
  – Edge weights are power-law distributed
  – Single co-occurrences account for ~ 90% of data

• Periodically discount and purge old co-occurrences
  – Discourse changes; data should reflect it.
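Both purges can be one pass over the edge list; the decay and threshold values below are assumptions:

```python
def prune(network, decay=0.5, threshold=1.0):
    """Discount every edge weight, then drop edges at or below the threshold.
    This removes single co-occurrences (the bulk of the data) and lets
    stale edges from old discourse fade out over repeated calls."""
    for edge in list(network):
        network[edge] *= decay
        if network[edge] <= threshold:
            del network[edge]
```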
Israel vs. Hamas on Twitter
Israel, Hamas and Media
Metrics computation

• Extract ego-networks for IDF and HAMAS
• Extract ego-networks for media organizations
• Compute Hamming distance H(c,l)
   – Cardinality of the intersection set between the two networks
   – Or… how much does CNN mirror Hamas? What about FOX?

• Normalize to percentage of support
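As a sketch, with each ego-network reduced to a set of bigrams (the sets in the test are illustrative stand-ins for real ego-networks):

```python
def support_shares(media_terms, idf_terms, hamas_terms):
    """Overlap of a media outlet's bigram set with each side's set,
    normalized so the two shares sum to 1."""
    idf_overlap = len(media_terms & idf_terms)
    hamas_overlap = len(media_terms & hamas_terms)
    total = idf_overlap + hamas_overlap
    if total == 0:
        return (0.5, 0.5)  # no evidence either way
    return (idf_overlap / total, hamas_overlap / total)
```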
Aggregate & Normalize


• Aggregate speech
  differences and
  similarities by
  media source
• Normalize values
Media Sources, Hamas and IDF
              IDF           Hamas
  NPR         0.579395354   0.420604646
  AlJazeera   0.530344094   0.469655906
  CNN         0.585616438   0.414383562
  BBC         0.537492158   0.462507842
  FOX         0.49329523    0.50670477
  CNBC        0.601137576   0.398862424
Ron Paul, Romney, Gingrich, Santorum
         March 2012 (based on Twitter Support)
[Bar chart: Twitter support shares by state (MT, MN, UT, MD, ID, IA, IL, AR, AK, PA, LA, HI, SD, KY, KS, OK, GA, CO, RI, NE, NC, NJ, WY, WV, WA); x-axis from 0 to 1.2]
Conclusions

• This works pretty well! ;-)

• However – it only works in
  aggregates, especially on Twitter.

• More text == better accuracy.
Conclusions

• The algorithm is cheap:
  – O(n) for words on ingest – real-time on a stream

  – O(n^2) for storage (pruning helps a lot)

• Storage can go to Redis
  – make use of built-in set operations
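Redis sets map directly onto the intersection metric: each ego-network is stored as a set and SINTER does the overlap server-side. A sketch that mimics the two commands in plain Python so it runs without a server; the key names are made up, and the commented lines show the assumed redis-py calls:

```python
# With redis-py (assumed API), the same computation would be:
# r = redis.Redis()
# r.sadd("terms:cnn", *cnn_bigrams)        # store an ego-network as a set
# r.sadd("terms:hamas", *hamas_bigrams)
# overlap = r.sinter("terms:cnn", "terms:hamas")

store = {}  # in-memory stand-in for Redis

def sadd(key, *members):
    """Add members to the set stored at key (like Redis SADD)."""
    store.setdefault(key, set()).update(members)

def sinter(*keys):
    """Intersect the sets stored at the given keys (like Redis SINTER)."""
    sets = [store.get(k, set()) for k in keys]
    return set.intersection(*sets) if sets else set()
```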
Implicit Sentiment Mining in Twitter Streams
