Tweeting for Hillary
Li Meng, Matt Beaulieu, ML Tlachac, Yousef Fadila
DS 501 : Introduction To Data Science – Case Study 1: Collecting Data from Twitter
https://github.com/yousef-fadila/casestudy1/blob/master/CaseStudy1.ipynb
“The more compelling campaign
is a direct result of better data
collection, analysis and smart
decision making”
-PromptCloud
Motivation
Social media is a source of political news and a place where political
discussion starts
Interpreting election-related data would give a campaign manager
live feedback on how their candidate's actions are likely to
affect polling
This lets the campaign gain an advantage by reacting to a
changing political climate
The Data
Pulled about 15.5K tweets from the
Twitter Streaming API
Filtered on:
Language: en
Tweets mentioning @HillaryClinton
Can then process hashtags,
mentions, and relevant words, to
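The two filters above can be sketched on already-collected tweet records. This is a minimal illustration, not the notebook's code: it assumes each tweet is a dict shaped like the Twitter v1.1 tweet JSON (`lang`, `entities.user_mentions`), and the helper name `mentions_account` and the sample records are made up for the example.

```python
# Hypothetical helper: keep English tweets that mention a given account.
# Field names ("lang", "entities", "user_mentions", "screen_name") follow
# the Twitter v1.1 tweet JSON; the sample data below is invented.

def mentions_account(tweet, screen_name="HillaryClinton"):
    """Return True if the tweet is tagged English and mentions screen_name."""
    if tweet.get("lang") != "en":
        return False
    mentions = tweet.get("entities", {}).get("user_mentions", [])
    wanted = screen_name.lower()
    return any(m.get("screen_name", "").lower() == wanted for m in mentions)


sample_tweets = [
    {"lang": "en", "text": "@HillaryClinton hello",
     "entities": {"user_mentions": [{"screen_name": "HillaryClinton"}]}},
    {"lang": "es", "text": "@HillaryClinton hola",
     "entities": {"user_mentions": [{"screen_name": "HillaryClinton"}]}},
    {"lang": "en", "text": "no mention here",
     "entities": {"user_mentions": []}},
]

kept = [t for t in sample_tweets if mentions_account(t)]
print(len(kept), "of", len(sample_tweets), "tweets kept")
```

Matching on the `entities` field rather than on the raw text avoids false hits on strings that merely contain the account name.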
Most Frequent Words
Appearances Word
1240 trump
915 hillary
113 benghazi
346 cant
142 didnt
252 doesnt
146 poorest
117 trumps
130 wont
259 pneumonia
87 footing
192 liar
232 donors
541 dont
45 dnc
Appearances Word
245 thats
91 isnt
41 tweet
63 ive
85 nypd
142 systematically
66 whats
68 cough
61 hypocrisy
32 dishonesty
103 crooked
40 theres
47 stamina
66 unfit
30 scum
Types of Frequent Words
1. Opponent: trump, trumps
2. Criticism: unfit, liar, hypocrisy
3. Topics: bodyguards, benghazi, poorest, blackmail, pneumonia,
audiobooks
4. Patterns: cant, doesnt, didnt, wont, dont, isnt
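A word-frequency table like the one above can be produced with a simple counter. The sketch below is an assumption about the general approach, not the authors' exact preprocessing: the tiny stop-word list, the `tokenize` helper, and the sample texts are all hypothetical. Stripping apostrophes before splitting would be consistent with entries such as "cant" and "dont" in the table.

```python
import re
from collections import Counter

# Assumed stop-word list, kept tiny for illustration only.
STOP_WORDS = {"the", "a", "is", "to", "and", "of", "in", "for", "rt"}


def tokenize(text):
    """Lowercase, drop URLs, @mentions and #hashtags, strip apostrophes, split into words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text)
    text = text.replace("'", "").replace("\u2019", "")  # "can't" -> "cant"
    return [w for w in re.findall(r"[a-z]+", text) if w not in STOP_WORDS]


def word_counts(texts):
    """Count word appearances across an iterable of tweet texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


sample = [
    "Trump can't stop",
    "Hillary can't lose to Trump",
    "RT @someone: don't #vote http://t.co/x",
]
counts = word_counts(sample)
print(counts.most_common(3))
```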
Popular Tweets
Entity Popularity
Popular Mentions with @HillaryClinton
Screen Name Mentions
HillaryClinton 15421
RealDonaldTrump 2718
FoxNews 1532
POTUS 503
CNN 481
politico 283
timkaine 263
FLOTUS 245
MSNBC 244
USAneedsTRUMP 235
Popular #hashtags with @HillaryClinton
Hashtag Count
#MAGA 385
#ImWithHer 351
#SpecialReport 209
#NeverHillary 178
#DNCLeak 177
#HispanicHeritageMonth 163
#tcot 156
#Trump 149
#TrumpPence16 125
#HillaryHealth 102
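Mention and hashtag tallies of this kind can be computed directly from each tweet's entity metadata. The following is a sketch under the assumption that tweets are dicts in the Twitter v1.1 JSON shape (`entities.user_mentions[].screen_name`, `entities.hashtags[].text`); the function name `entity_counts` and the sample records are hypothetical.

```python
from collections import Counter


def entity_counts(tweets):
    """Count @mentions and #hashtags across tweets using their "entities" field."""
    mentions, hashtags = Counter(), Counter()
    for tweet in tweets:
        entities = tweet.get("entities", {})
        mentions.update(m["screen_name"] for m in entities.get("user_mentions", []))
        hashtags.update("#" + h["text"] for h in entities.get("hashtags", []))
    return mentions, hashtags


# Invented sample records, only to show the expected input shape.
sample_tweets = [
    {"entities": {"user_mentions": [{"screen_name": "HillaryClinton"}],
                  "hashtags": [{"text": "ImWithHer"}]}},
    {"entities": {"user_mentions": [{"screen_name": "HillaryClinton"},
                                    {"screen_name": "CNN"}],
                  "hashtags": [{"text": "MAGA"}, {"text": "ImWithHer"}]}},
    {"entities": {}},
]

mentions, hashtags = entity_counts(sample_tweets)
print(mentions.most_common(2))
print(hashtags.most_common(2))
```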
Hillary’s Friends
ID Screen Name
571202103 Medium
21337440 ChildDefender
23449384 amberdiscko
128790234 Samynemir
1656913327 sarajacobs89
325886383 SammyKoppelman
802430450 Natasha_S_Law
729761993461248000 ktvibbs
115740215 SarahAudelo
34782406 Lincoln_Ross
3044781131 HillaryforAR
113298560 GunaRockYa
15972271 CdotDukes
582037089 MiguelAyala312
734768872625188864 AndrewBatesNC
41021335 TroyClair
4736170399 BrianZuzenak
150885854 SarahPeckVA
231673 yianni
125083946 GillDrummond
● Communication Directors
● Charities
● Media Websites
● United States Senators
● etc.
Sentiment Analysis
Using Python’s NLTK text classifier, we classified each tweet as “Positive”,
“Negative”, or “Neutral”.
This could give an idea of how Twitter users felt about Hillary Clinton
Positive Neutral Negative
Geographic Analysis
Using the “positivity” of each tweet, we formed a ratio of positive to
negative tweets and compared it to national polling data, to see how
tweet hashtags related to polling data, if at all.
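One way to form such a ratio per region is sketched below. Several things here are assumptions rather than details from the slides: grouping by a state code (suggested only by the conclusions' mention of separating data by state), using the positive share positive / (positive + negative) instead of a raw positive-to-negative ratio so that regions with no negative tweets do not divide by zero, and the function name and sample data.

```python
from collections import defaultdict


def positive_share_by_group(labeled_tweets):
    """labeled_tweets: iterable of (group, label) pairs, with label one of
    "Positive", "Negative", "Neutral".
    Returns {group: positive / (positive + negative)}.
    Neutral tweets are ignored; groups with no polar tweets are omitted."""
    tallies = defaultdict(lambda: {"Positive": 0, "Negative": 0})
    for group, label in labeled_tweets:
        if label in ("Positive", "Negative"):
            tallies[group][label] += 1
    return {
        group: t["Positive"] / (t["Positive"] + t["Negative"])
        for group, t in tallies.items()
    }


# Invented sample: (state, sentiment label) pairs.
sample = [
    ("MA", "Positive"), ("MA", "Positive"), ("MA", "Negative"),
    ("TX", "Negative"), ("TX", "Neutral"),
    ("OH", "Neutral"),
]
shares = positive_share_by_group(sample)
print(shares)
```

The resulting shares could then be set beside a polling figure for the same region; with only a handful of tweets per group, such a comparison would be very noisy.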
Sentiment Analysis on Text
Hashtags in Positive Tweets Count
#HispanicHeritageMonth 118
#ImWithHer 107
#MAGA 72
#tcot 65
#Democrats 50
#RedNationRising 46
#WakeUpAmerica 43
#NeverHillary 32
#HillaryClinton 31
Hashtags in Negative Tweets Count
#ImWithHer 74
#LatinosWithTrump 51
#AmericansUnitedForTrump 49
#MAGA 42
#NeverHillary 39
#CrookedHillary 38
● Broke down the most popular hashtags in
positive and negative tweets
● Some hashtags, in either table, seemed out
of place
● This could be part of the source of error in the
sentiment classification
Sentiment analysis on Hashtags
Manually identify positive and negative hashtags, then use them to
find the popular words in tweets containing those hashtags, in order
to re-train the NLTK algorithm
Positive Hashtags include...
● NeverTrump
● Hillary2016
● StrongerTogether
● Vote
● UnitedBlue
Negative Hashtags include...
● MAGA
● NeverHillary
● CrookedHillary
● LatinosWithTrump
● AmericansUnitedForTrump
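The hashtag-based labeling step can be sketched as follows. This is an illustrative reading of the approach, not the notebook's code: the tag sets use only a few of the hashtags listed above, tweets carrying tags from both sets are left unlabeled (an assumption), and the helper names and sample tweets are hypothetical.

```python
from collections import Counter

# A few of the hashtags listed above, lowercased for matching.
POSITIVE_TAGS = {"hillary2016", "strongertogether", "unitedblue"}
NEGATIVE_TAGS = {"maga", "neverhillary", "crookedhillary"}


def label_by_hashtag(hashtags):
    """Return "Positive", "Negative", or None if no listed tag (or a conflict)."""
    tags = {h.lower().lstrip("#") for h in hashtags}
    pos, neg = tags & POSITIVE_TAGS, tags & NEGATIVE_TAGS
    if pos and not neg:
        return "Positive"
    if neg and not pos:
        return "Negative"
    return None


def top_words_by_label(tweets, n=10):
    """tweets: iterable of (text, hashtags). Returns {label: [(word, count), ...]}."""
    counts = {"Positive": Counter(), "Negative": Counter()}
    for text, hashtags in tweets:
        label = label_by_hashtag(hashtags)
        if label is None:
            continue
        # Keep purely alphabetic tokens; this drops "#tag" and "@user" tokens.
        counts[label].update(w for w in text.lower().split() if w.isalpha())
    return {label: c.most_common(n) for label, c in counts.items()}


# Invented sample tweets as (text, hashtags) pairs.
sample = [
    ("she is ready #StrongerTogether", ["StrongerTogether"]),
    ("ready to vote #Hillary2016", ["Hillary2016"]),
    ("liar liar #MAGA", ["MAGA"]),
    ("mixed #MAGA #UnitedBlue", ["MAGA", "UnitedBlue"]),
    ("no tags", []),
]
top = top_words_by_label(sample)
print(top)
```

The labeled tweets or their top words could then serve as training input for a classifier; `nltk.NaiveBayesClassifier.train` is one option, though the slides do not say which NLTK classifier was used.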
Conclusions
Word frequency analysis revealed tweets relevant to Clinton, and issues that
she could consider addressing, or at least know are being talked about.
Judging tweets by positive or negative sentiment gave mixed results.
Training the positive and negative classifier on positive or negative hashtags
proved more insightful.
Ultimately, 15.5K tweets is not enough data, especially when separated by
state.
Twitter has great potential to be useful to campaigns.
Thank You
Questions?
Source code and Charts: https://github.com/yousef-fadila/casestudy1/blob/master/CaseStudy1.ipynb

