DETECTING A HACKED TWEET
with Machine Learning and Artificial Intelligence
Sponsored by
Kory Becker 2015
http://primaryobjects.com/cms/article158
http://linkedin.com/in/korybecker
http://twitter.com/primaryobjects
APRIL 23, 2013 1:15PM
143 POINT DROP
ALL YOUR DATA ARE BELONG TO US
  • Accord.NET SVM: tried Gaussian (96%), then linear (97%) kernel
  • Extract tweets with TweetSharp
  • Create document corpus (6,054 tweets)
  • Create vocabulary (2,225 words)
  • Digitize corpus
  • Porter stemmer (“talking” => “talk”, “explosion” => “explos”)
  • Term Frequency Inverse Document Frequency (TF*IDF)
  • Word existence
  • Vector size = vocabulary size | Matrix = double[6054][2225]
ACCURACY
100% TRAINING
97.38% CV
96.23% TEST
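
Each figure above is the share of tweets classified correctly in one of the three sets. As a rough illustration of how a training score can sit above a cross-validation score, here is a minimal Python sketch. It is not the Accord.NET code from the project: a perceptron stands in for the linear-kernel SVM as a simpler linear classifier, and the four-word vocabulary, the toy rows, and the labels (+1 = AP, -1 = not AP) are all invented for the example.

```python
def train_perceptron(rows, labels, epochs=100):
    """Learn weights and a bias for a linear decision rule; labels are +1 or -1."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for row, label in zip(rows, labels):
            score = sum(w * x for w, x in zip(weights, row)) + bias
            if label * score <= 0:  # misclassified, or exactly on the boundary
                weights = [w + label * x for w, x in zip(weights, row)]
                bias += label
                mistakes += 1
        if mistakes == 0:  # every training row is now classified correctly
            break
    return weights, bias


def predict(model, row):
    """Return +1 or -1 depending on which side of the boundary the row falls."""
    weights, bias = model
    score = sum(w * x for w, x in zip(weights, row)) + bias
    return 1 if score > 0 else -1


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(1 for row, label in zip(rows, labels) if predict(model, row) == label)
    return correct / len(rows)


# Hypothetical word-existence rows over the vocabulary
# ["break", "explos", "report", "weather"].
train_rows = [[1, 0, 1, 0], [1, 1, 1, 0], [0, 1, 0, 1], [0, 0, 0, 1]]
train_labels = [1, 1, -1, -1]
cv_rows = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
cv_labels = [1, -1, 1, -1]

model = train_perceptron(train_rows, train_labels)
print(f"training accuracy: {accuracy(model, train_rows, train_labels):.2%}")  # 100.00%
print(f"CV accuracy:       {accuracy(model, cv_rows, cv_labels):.2%}")        # 75.00%
```

The model fits every row it trained on but misses one unseen row, which is the same kind of gap the slide shows between the training and CV figures.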
CONCLUSION
Kory Becker
http://linkedin.com/in/korybecker
http://twitter.com/primaryobjects
Detecting a Hacked Tweet with Machine Learning
http://primaryobjects.com/CMS/Article158
An Intelligent Approach to Image Classification By Color
http://primaryobjects.com/CMS/Article154
Self-Programming Artificial Intelligence
http://primaryobjects.com/CMS/Article149

Detecting a Hacked Tweet with Machine Learning (5 Minute Presentation)

Editor's Notes

  • #2: 1. Introduction My name is Kory Becker. I'm a Software Architect at The Associated Press. I develop web applications by day, and have a fascination with artificial intelligence. If you like, you can follow the (short) slides for this talk at slideshare.net/korybecker.
  • #3: 2. What? On April 23, 2013, the stock market experienced one of its biggest flash crashes of the year: the Dow Jones Industrial Average fell 143 points (about 1%) in a matter of minutes. Unlike the 2012 stock market blip, this one wasn't caused by an individual trade, but by a single tweet from The Associated Press account on Twitter. The tweet, of course, wasn't written by AP, but by an impostor (the Syrian Electronic Army claimed responsibility) who had temporarily gained control of the account. Could a computer program have detected the tweet as hacked? The tweet was "Breaking: Two Explosions in the White House and Barack Obama is injured". There are a couple of specific characteristics about this text. The term "Breaking" has the wrong casing for AP, which would usually write it in all capitals. The combination of "White House" + "and" + "Barack Obama" is rare. Maybe a computer could pick up on this? So, what did we do?
  • #4: 3. How? The idea was to write a program using artificial intelligence: specifically, a machine learning algorithm with supervised learning. The computer is given a list of tweets and told whether each tweet is real or fake. It can then learn common terms in each category and (hopefully) figure out how to detect the hacked tweet. Using the Accord.NET machine learning library, I started by implementing a support vector machine (SVM) with a Gaussian kernel. SVMs work with different kernels, and a Gaussian kernel can fit data points with a variety of non-linear shapes (round, curvy, etc.). I extracted tweets using the TweetSharp library, then created a document corpus of about 6,000 tweets and a vocabulary of about 2,000 words. The documents were digitized by tokenizing the tweets, running the Porter stemmer to reduce words to their stems, and then creating a bag-of-words model. Each tweet's unique terms were added to the vocabulary. Then, you loop through each tweet and check it against each word in the vocabulary. If the word appears in the tweet, you mark a 1 at that word's position in the tweet's array. If it doesn't, you mark a 0. You end up with an array of 1's and 0's for each tweet, as long as the vocabulary. This is perfect for training a machine learning program. To train and test the accuracy, the tweets were split into training, cross validation, and test sets. The computer uses the training set to learn which tweets it classifies right or wrong and fine-tune its model. It then runs against the cross validation set to see how it does on tweets that it hasn't trained on, and the test set is held back for a final check. So, what were the results?
  • #5: 4. Result? The Gaussian kernel did pretty well. It scored 99.7% accuracy on the training set and 96% on the cross validation set. The SVM was then switched to a linear kernel, which bumped the accuracy up to 100% on training and 97% on cross validation. OK, but did it detect the hacked tweet? The initial training set contained random tweets from AP and non-AP Twitter accounts. The model correctly classified AP tweets, but failed on the particular hacked tweet. I fed the training set additional tweets from searches such as "-from:AP obama" and "-from:AP breaking" (tweets on those topics that were not sent by AP), so it had knowledge of the actual topic. And what do you know, it worked!
  • #6: 5. Conclusion There are a lot more details in this project, including some cool learning curve charts and examples of tweets being classified. You can read my full article at http://www.primaryobjects.com/cms/article158 (the top link in the last slide). There are some code samples for setting up the SVM and you can even download the test set results. If you're curious about artificial intelligence, I also have some other interesting articles, including Self-Programming Artificial Intelligence (the last link in the slide), where a computer program uses genetic algorithms to successfully write its own computer programs. Scary stuff! In conclusion, my name is Kory Becker. Feel free to chat if you have any questions or connect online via @primaryobjects on Twitter or Kory Becker on LinkedIn. Thanks.
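
The digitizing and splitting steps from the "3. How?" note (tokenize, stem, build a vocabulary, mark word existence, split into three sets) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the project's .NET code: `crude_stem` is a hypothetical stand-in that strips a single suffix and is far simpler than the real Porter stemmer, and the 60/20/20 split and fixed shuffle seed are assumed, since the talk does not give the proportions.

```python
import random
import re

SUFFIXES = ("ing", "ion", "es", "s")  # assumed: a tiny subset of what a real stemmer handles


def crude_stem(word):
    """Hypothetical stand-in for the Porter stemmer: strip one common suffix."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stems(tweet):
    """Tokenize a tweet into lowercase words and return the set of their stems."""
    return {crude_stem(token) for token in re.findall(r"[a-z]+", tweet.lower())}


def build_vocabulary(tweets):
    """Collect every unique stem in the corpus, sorted so positions are stable."""
    return sorted(set().union(*(stems(tweet) for tweet in tweets)))


def digitize(tweet, vocabulary):
    """Word existence: 1.0 where the vocabulary word occurs in the tweet, else 0.0."""
    present = stems(tweet)
    return [1.0 if word in present else 0.0 for word in vocabulary]


def split_dataset(rows, train_pct=60, cv_pct=20, seed=0):
    """Shuffle, then slice into training, cross-validation and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    train_end = len(shuffled) * train_pct // 100
    cv_end = train_end + len(shuffled) * cv_pct // 100
    return shuffled[:train_end], shuffled[train_end:cv_end], shuffled[cv_end:]


tweets = [
    "Breaking: Two Explosions in the White House",
    "Talking about the weather",
]
vocabulary = build_vocabulary(tweets)
matrix = [digitize(tweet, vocabulary) for tweet in tweets]  # one row per tweet
print(vocabulary)
print(matrix)
```

Run over the full corpus, the same loop would produce one row per tweet and one column per vocabulary word, which is the double[6054][2225] shape given on the slide.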