1

Why Twitter Is All The Rage:
A Data Miner's Perspective
Matthew A. Russell
O'Reilly Webcast
15 Oct 2013
2

Hello, My Name Is ... Matthew
Educated as a Computer Scientist
CTO @ Digital Reasoning Systems
Data mining; machine learning
Author @ O'Reilly Media
5 published books on technology
Principal @ Zaffra
Selective boutique consulting
3

Transforming Curiosity Into Insight
An open source software (OSS) project
http://bit.ly/MiningTheSocialWeb2E
A book
http://bit.ly/135dHfs
Accessible to (virtually) everyone
Virtual machine with turn-key coding templates for data science experiments
Think of the book as "premium" support for the OSS project
4

Overview
Background
Twitter as a data science platform
Politics, influence, world events
Data science tools for mining Twitter
Q&A
5

Background
6

Data Science

Data => Actionable information
Highly interdisciplinary
Nascent
Necessary

http://wikipedia.org/wiki/Data_science
7

Digital Signal Explosion
A model for the world: signals and sinks
Growth in data exhaust is accelerating
Digital fingerprints
"Software is eating the world"
Data mining opportunities galore...
8

Digital Data Stats
100 terabytes of data uploaded daily to Facebook.
Brands and organizations on Facebook receive 34,722 Likes every minute of the day.
According to Twitter’s own research in early 2012, it sees roughly 175 million tweets every day.
30 billion pieces of content shared on Facebook every month.
Data production will be 44 times greater in 2020 than it was in 2009.
According to estimates, the volume of business data worldwide, across all companies, doubles every 1.2 years.
See http://wikibon.org/blog/big-data-statistics
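
A quick back-of-the-envelope check helps put these figures in perspective. The sketch below uses only the numbers quoted on this slide; the helper names are illustrative, and the implied annual growth rates assume smooth compound growth, which the source does not state.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(daily_count):
    """Convert a daily count into an average per-second rate."""
    return daily_count / SECONDS_PER_DAY


def annual_growth_from_multiple(multiple, years):
    """Annual growth rate implied by growing `multiple`-fold over `years`.

    Assumption: growth compounds evenly from year to year.
    """
    return multiple ** (1.0 / years) - 1.0


def annual_growth_from_doubling(doubling_years):
    """Annual growth rate implied by a fixed doubling time (same assumption)."""
    return 2 ** (1.0 / doubling_years) - 1.0


# 175 million tweets per day, expressed per second (a little over 2,000).
tweets_per_second = per_second(175_000_000)

# 34,722 Likes per minute, expressed per day (just under 50 million).
likes_per_day = 34_722 * 60 * 24

# 44x more data production in 2020 than in 2009.
data_growth = annual_growth_from_multiple(44, 2020 - 2009)

# Business data doubling every 1.2 years.
business_growth = annual_growth_from_doubling(1.2)

print(f"Tweets per second: {tweets_per_second:,.0f}")
print(f"Likes per day: {likes_per_day:,}")
print(f"Implied yearly growth in data production: {data_growth:.0%}")
print(f"Implied yearly growth in business data: {business_growth:.0%}")
```

Converting everything to a common unit (per second, or per year) makes figures from different sources easier to compare.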
9

Social Media Is All the Rage
World population: ~7B people
Facebook: 1.15B users
Twitter: 500M users
Google+: 343M users
LinkedIn: 238M users
~200M+ blogs (conservative estimate)
10

Why Does Social Media Matter?
It's the frontier for predictive analytics
Understanding world events
Swaying political elections
Modeling human behavior
Analyzing sentiment
Making intelligent recommendations
11

Twitter Is All the Rage
It satisfies fundamental human desires
We want to be heard
We want to satisfy our curiosity
We want it easy
We want it now
Accessible, rich, and (mostly) "open" data
RESTful APIs and JSON responses
Great proving ground for predictive analytics
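
To make "RESTful APIs and JSON responses" concrete, here is a minimal sketch that builds a search request URL and parses a JSON response. It assumes the v1.1 search endpoint that was current around the time of this talk; real requests also required OAuth credentials, which are omitted. The sample response is hand-written for illustration and is not real Twitter data.

```python
import json
from urllib.parse import urlencode

# Assumption: the v1.1 search endpoint in use around the time of this talk.
SEARCH_ENDPOINT = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_url(query, count=100):
    """Build the URL for a search request (authentication not shown)."""
    return SEARCH_ENDPOINT + "?" + urlencode({"q": query, "count": count})


# Hand-written stand-in for a response body; not real data.
SAMPLE_RESPONSE = """
{"statuses": [
    {"text": "Example tweet about #Syria",
     "user": {"screen_name": "example_user"}},
    {"text": "Another example",
     "user": {"screen_name": "another_user"}}
 ],
 "search_metadata": {"count": 2}}
"""


def screen_names(response_body):
    """Parse a JSON response body and list the authors of the statuses."""
    payload = json.loads(response_body)
    return [status["user"]["screen_name"] for status in payload["statuses"]]


url = build_search_url("#Syria")
names = screen_names(SAMPLE_RESPONSE)
print(url)
print(names)
```

Because the response is plain JSON, it maps directly onto Python dictionaries and lists with no custom parsing.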
12

Twitter's Network Dynamics
500M curious users
100M curious users actively engaging
Real-time communication
Short, sweet, ... and fast
Asymmetric Following Model
An interest graph
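
The asymmetric following model can be sketched as a directed graph: following someone does not require them to follow back. The class below is a hypothetical illustration, not code from the talk, and the edges are invented for the example.

```python
from collections import defaultdict


class FollowGraph:
    """Directed graph in which an edge a -> b means "a follows b"."""

    def __init__(self):
        self._following = defaultdict(set)
        self._followers = defaultdict(set)

    def follow(self, follower, followee):
        """Record a one-way relationship; no reciprocal edge is created."""
        self._following[follower].add(followee)
        self._followers[followee].add(follower)

    def following(self, account):
        """Accounts that `account` follows (its interests)."""
        return set(self._following.get(account, ()))

    def followers(self, account):
        """Accounts that follow `account` (its audience)."""
        return set(self._followers.get(account, ()))

    def is_mutual(self, a, b):
        """True only when both accounts follow each other."""
        return b in self.following(a) and a in self.following(b)


# Invented edges for illustration.
graph = FollowGraph()
graph.follow("Roberto", "U2")        # interest: U2 need not follow back
graph.follow("Ana", "U2")
graph.follow("Roberto", "Mercedes")  # a mutual, friendship-like link
graph.follow("Mercedes", "Roberto")

print(graph.followers("U2"))
print(graph.is_mutual("Roberto", "U2"))
print(graph.is_mutual("Roberto", "Mercedes"))
```

A symmetric social network would only ever contain mutual links; the one-way edges are what turn the structure into an interest graph.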
13

Twitter as a data science platform
14

What's in a Tweet?
140 Characters ...
... Plus ~5KB of metadata!
Authorship
Time & location
Tweet "entities"
Replying, retweeting, favoriting, etc.
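
The metadata categories above map onto fields of the tweet's JSON representation. The sketch below pulls a few of them out of a hand-written sample that follows the v1.1 tweet format as I understand it; the values are invented and `summarize_tweet` is a hypothetical helper.

```python
# Hand-written sample shaped like a v1.1 tweet; values are invented.
SAMPLE_TWEET = {
    "text": "RT @example_user: Learning to mine #Twitter data",
    "created_at": "Tue Oct 15 17:00:00 +0000 2013",
    "user": {"screen_name": "another_user", "followers_count": 42},
    "coordinates": None,
    "retweet_count": 3,
    "favorite_count": 1,
    "in_reply_to_screen_name": None,
    "entities": {
        "hashtags": [{"text": "Twitter"}],
        "user_mentions": [{"screen_name": "example_user"}],
        "urls": [{"expanded_url": "http://example.com/"}],
    },
}


def summarize_tweet(tweet):
    """Extract authorship, time and location, entities, and interaction counts."""
    entities = tweet.get("entities", {})
    return {
        "author": tweet["user"]["screen_name"],
        "created_at": tweet["created_at"],
        "has_location": tweet.get("coordinates") is not None,
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "mentions": [m["screen_name"] for m in entities.get("user_mentions", [])],
        "urls": [u["expanded_url"] for u in entities.get("urls", [])],
        "retweet_count": tweet.get("retweet_count", 0),
        "favorite_count": tweet.get("favorite_count", 0),
        "is_reply": tweet.get("in_reply_to_screen_name") is not None,
    }


summary = summarize_tweet(SAMPLE_TWEET)
for field, value in summary.items():
    print(f"{field}: {value}")
```

Reading hashtags, mentions, and URLs from the entities field avoids having to re-parse them out of the 140 characters of text.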
15

Twitter and Facebook Compared
Twitter                        | Facebook
Account Types: "Anything"      | Account Types: People & Pages
"Following" Relationships      | Mutual Connections
Favorites                      | "Likes"
Retweets                       | "Shares"
Replies                        | "Comments"
(Almost) No Privacy Controls   | Extensive Privacy Controls
16

Social Network Mechanics

[Diagram: graph with nodes Roberto, Mercedes, Jorge, Nina, Ana]
17

Interest Graph Mechanics
[Diagram: graph with nodes U2, Juan Luis Guerra, Roberto, Mercedes, Ana, Jorge, Nina]
18

A (Social) Interest Graph
[Diagram: graph with nodes U2, Juan Luis Guerra, Roberto, Mercedes, Ana, Jorge, Nina]
19

A (Political) Interest Graph
[Diagram: graph with nodes Johnny Araya, Rodolfo Hernández, Roberto, Mercedes, Ana, Jorge, Nina]
20

Costa Rican Presidential Candidates

@ElDoctor2014

@Johnny_Araya
21

~3 Months on Twitter
Candidate                            | Aug 2013 | Sept 2013 | % Change
Johnny Araya                         | 14,573   | 15,506    | 6.40%
Otto Guevara Guth                    | 114      | 159       | 39.47%
José María Villalta Florez-Estrada   | 8,160    | 8,990     | 10.17%
Dr. Rodolfo Hernández                | 745      | 858       | 15.17%
Luis Guillermo Solís Rivera          | 1,192    | 1,487     | 24.75%
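
The "% Change" column follows directly from the two monthly counts. The sketch below recomputes it from the slide's figures; `percent_change` is an illustrative helper, and the pairing of the last two rows with Hernández and Solís is inferred from the slide layout.

```python
# (Aug 2013, Sept 2013) counts as shown on the slide.
# Assumption: the last two rows belong to Hernández and Solís, in that order.
COUNTS = {
    "Johnny Araya": (14573, 15506),
    "Otto Guevara Guth": (114, 159),
    "José María Villalta Florez-Estrada": (8160, 8990),
    "Dr. Rodolfo Hernández": (745, 858),
    "Luis Guillermo Solís Rivera": (1192, 1487),
}


def percent_change(before, after):
    """Relative change from `before` to `after`, as a percentage."""
    return (after - before) / before * 100.0


changes = {
    name: round(percent_change(before, after), 2)
    for name, (before, after) in COUNTS.items()
}

for name, change in changes.items():
    print(f"{name}: {change:.2f}%")
```

Note that the largest percentage gains belong to the smallest accounts: Guevara's 39.47% is a gain of 45, while Araya's 6.40% is a gain of 933.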
22

Who are Candidates Following?
23

What are Candidates Tweeting?
24

Potential Influence
25

Potential Twitter Influence
                      | Araya  | Hernández
Followers             | ~14k   | ~750
Theoretical Reach     | ~40M   | ~550k
Reach (10)            | 490    | 673
Reach (100)           | 289    | 702
Reach (1000)          | 2782   | X
Reach (10,000)        | 2832   | X
"Suspect" Followers   | 3,246  | 94

See also http://wp.me/p3QiJd-2a
26

Considerations for Measuring Influence
Spam bot accounts that effectively are zombies and can’t be harnessed for any utility at all
Inactive or abandoned accounts that can’t influence or be influenced since they are not in use
Accounts that follow so many other accounts that the likelihood of getting noticed (and thus influencing) is practically zero
The network effects of retweets by accounts that are active and can be influenced to spread a message
See also http://wp.me/p3QiJd-2a
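
One way to act on these considerations is to screen followers before summing their audiences. The sketch below is an assumption-laden illustration, not the method behind the numbers in this talk: the thresholds are arbitrary, the `Follower` record is hypothetical, and "theoretical reach" is taken here to mean the sum of the followers' own follower counts.

```python
from dataclasses import dataclass


@dataclass
class Follower:
    """Hypothetical record of one follower account."""
    screen_name: str
    followers_count: int        # size of this follower's own audience
    friends_count: int          # number of accounts this follower follows
    statuses_count: int         # total tweets ever posted
    days_since_last_tweet: int


def is_suspect(follower, max_friends=10_000, max_idle_days=180):
    """Flag accounts unlikely to see or spread a message.

    The thresholds are arbitrary placeholders, not values from the talk.
    """
    if follower.statuses_count == 0:
        return True   # never tweeted: likely a bot or abandoned account
    if follower.days_since_last_tweet > max_idle_days:
        return True   # inactive: cannot influence or be influenced
    if follower.friends_count > max_friends:
        return True   # follows too many accounts to notice any one of them
    return False


def theoretical_reach(followers):
    """Assumed definition: sum of every follower's own follower count."""
    return sum(f.followers_count for f in followers)


def adjusted_reach(followers):
    """Same sum, restricted to followers that pass the screen."""
    return sum(f.followers_count for f in followers if not is_suspect(f))


# Invented sample data.
sample = [
    Follower("active_reader", 300, 150, 1200, 2),
    Follower("egg_account", 0, 40, 0, 900),
    Follower("dormant", 120, 80, 45, 400),
    Follower("follows_everyone", 5000, 25000, 9000, 1),
]

suspects = [f.screen_name for f in sample if is_suspect(f)]
print("Suspect followers:", suspects)
print("Theoretical reach:", theoretical_reach(sample))
print("Adjusted reach:", adjusted_reach(sample))
```

Modeling the fourth consideration, the network effect of retweets, would require observed retweet data rather than profile fields alone, so it is left out of this sketch.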
27

Social Media Popularity: Araya vs Hernández

[Figure: two charts, "Twitter Popularity" and "Facebook Popularity", each showing Araya % and Hernández %]
28

Realtime Analysis: #Syria

Monitor Twitter's firehose for realtime data using filters such as #Syria
Keep in mind the sheer volume of data can be considerable
Analysis at MiningTheSocialWeb.com
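
Because the volume can be considerable, it helps to process the stream lazily, one tweet at a time, rather than collecting everything in memory. The sketch below filters a simulated stream by hashtag and tallies authors; connecting to Twitter's streaming API is not shown, and the sample tweets and function names are invented for illustration.

```python
from collections import Counter
from itertools import islice


def matches_hashtag(tweet, hashtag):
    """True if the tweet's entities include the hashtag (case-insensitive)."""
    wanted = hashtag.lstrip("#").lower()
    tags = tweet.get("entities", {}).get("hashtags", [])
    return any(tag["text"].lower() == wanted for tag in tags)


def filter_stream(stream, hashtag):
    """Lazily yield matching tweets so memory use stays flat."""
    for tweet in stream:
        if matches_hashtag(tweet, hashtag):
            yield tweet


def count_authors(matches, limit):
    """Tally authors over at most `limit` matching tweets."""
    counts = Counter()
    for tweet in islice(matches, limit):
        counts[tweet["user"]["screen_name"]] += 1
    return counts


def simulated_stream():
    """Stand-in for a live connection; yields invented tweets."""
    yield {"user": {"screen_name": "reporter_a"},
           "entities": {"hashtags": [{"text": "Syria"}]}}
    yield {"user": {"screen_name": "fan_b"},
           "entities": {"hashtags": [{"text": "U2"}]}}
    yield {"user": {"screen_name": "reporter_a"},
           "entities": {"hashtags": [{"text": "syria"}, {"text": "news"}]}}
    yield {"user": {"screen_name": "analyst_c"},
           "entities": {"hashtags": [{"text": "SYRIA"}]}}
    yield {"user": {"screen_name": "quiet_d"},
           "entities": {"hashtags": []}}


authors = count_authors(filter_stream(simulated_stream(), "#Syria"), limit=1000)
print(authors.most_common())
```

Capping the number of tweets with `limit` gives a simple way to bound a collection run when the stream itself never ends.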
29

#Syria: Who?

See http://wp.me/p3QiJd-1I
30

#Syria: Who?

See http://wp.me/p3QiJd-1I
31

#Syria: Who?

See http://wp.me/p3QiJd-1I
32

#Syria: What?

See http://wp.me/p3QiJd-1I
33

#Syria: What?

See http://wp.me/p3QiJd-1I
34

#Syria: Where?

See http://wp.me/p3QiJd-1I
35

#Syria: When?

See http://wp.me/p3QiJd-1I
36

#Syria: Why?

That's for you (as the data scientist) to decide
Quantitative automation can amplify human intelligence
Qualitative analysis still requires human intelligence
37

Data science tools for mining Twitter
38

MTSW Virtual Machine Experience
Goal: Make it easy to transform curiosity into insight
Vagrant-based virtual machine
Virtualbox or AWS
IPython Notebook User Experience
Point-and-click GUI
100+ turn-key examples and templates
Social web mining for the masses
39

Social Media Analysis Framework
A memorable four step process to guide data science experiments:
Aspire
Acquire
Analyze
Summarize
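
The four steps can be sketched as a small pipeline, one function per step. This is one possible reading of the framework rather than code from the talk; the question, the sample data, and the function bodies are all invented for illustration.

```python
from collections import Counter


def aspire():
    """Step 1: state the question that motivates the experiment."""
    return "Which hashtags co-occur most often with #Syria?"


def acquire():
    """Step 2: collect the data (invented sample; no API call is made)."""
    return [
        {"hashtags": ["Syria", "UN"]},
        {"hashtags": ["syria", "news"]},
        {"hashtags": ["Syria", "UN", "news"]},
        {"hashtags": ["Syria", "UN"]},
    ]


def analyze(tweets, exclude="syria"):
    """Step 3: count co-occurring hashtags, ignoring the one we filtered on."""
    counts = Counter()
    for tweet in tweets:
        for tag in tweet["hashtags"]:
            tag = tag.lower()
            if tag != exclude:
                counts[tag] += 1
    return counts


def summarize(question, counts, top_n=3):
    """Step 4: present the result in a form a person can act on."""
    lines = [question]
    for tag, n in counts.most_common(top_n):
        lines.append(f"  #{tag}: {n}")
    return "\n".join(lines)


question = aspire()
counts = analyze(acquire())
report = summarize(question, counts)
print(report)
```

Keeping the steps separate makes it easy to swap the data source in `acquire` or the metric in `analyze` without touching the rest of the experiment.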
40
41
42
43
44

Free Resources
Mining the Social Web 2E Chapter 1 (Chimera)
http://bit.ly/13XgNWR
Source Code (GitHub)
http://bit.ly/MiningTheSocialWeb2E
http://bit.ly/1fVf5ej (numbered examples)
Screencasts (Vimeo)
http://bit.ly/mtsw2e-screencasts
http://MiningTheSocialWeb.com
45

Q&A
