Five Steps to Search and Store Tweets by Keywords
• Created by The Curiosity Bits Blog (curiositybits.com)
• With support from Dr. Gregory D. Saxton (http://social-metrics.org/)
The output you will get…
Let’s say I want to study Twitter discussions of the missing Malaysian airliner
MH370. I plan to gather all tweets that include the keywords MH370 or
Malaysian.
Each tweet comes with a rich set of metadata. Here is a breakdown of each field:
Field                       Definition
tweet_id                    The unique identifier of the tweet
inserted_date               When the tweet was downloaded into your database
language                    The language of the tweet
retweeted_status            Whether the tweet is a retweet
content                     The text of the tweet
from_user_screen_name       The screen name of the tweet's sender
from_user_followers_count   The number of followers the sender has
from_user_friends_count     The number of users the sender follows
from_user_listed_count      The number of public lists that include the sender
from_user_statuses_count    The number of tweets (including retweets) the sender has posted
from_user_description       The sender's profile bio
from_user_location          The sender's location, as given in the profile
from_user_created_at        When the sender's Twitter account was created
retweet_count               How many times the tweet has been retweeted
entities_urls               The URLs included in the tweet
entities_urls_count         The number of URLs included in the tweet
entities_hashtags           The hashtags included in the tweet
entities_hashtags_count     The number of hashtags in the tweet
entities_mentions           The screen names mentioned in the tweet
in_reply_to_screen_name     The screen name of the user the sender is replying to
in_reply_to_status_id       The ID of the tweet being replied to (if the tweet is a reply)
entities_expanded_urls      The full URLs behind shortened links
json_output                 The ENTIRE metadata in JSON format, including metadata not parsed into columns
entities_media_count        NA
media_expanded_url          NA
media_url                   NA
media_type                  NA
video_link                  NA
photo_link                  NA
twitpic                     NA
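The downloaded script fills these columns for you; the sketch below only illustrates how such columns could be pulled out of a raw tweet. It assumes each tweet arrives as a Twitter API v1.1 tweet object parsed into a Python dict (field names such as `id_str`, `user`, `entities` and `in_reply_to_status_id_str` come from that format). The function name `flatten_tweet` is hypothetical, and joining lists with ", " is an illustrative choice, not necessarily what the script does. `inserted_date` is left out because it would be set when the row is written.

```python
import json

def flatten_tweet(tweet):
    """Map a v1.1 tweet object (a dict) onto the columns listed above (hypothetical helper)."""
    user = tweet.get("user", {})
    entities = tweet.get("entities", {})
    urls = entities.get("urls", [])
    hashtags = [h["text"] for h in entities.get("hashtags", [])]
    mentions = [m["screen_name"] for m in entities.get("user_mentions", [])]
    return {
        "tweet_id": tweet["id_str"],
        "language": tweet.get("lang"),
        # A retweet embeds the original tweet under the retweeted_status key.
        "retweeted_status": "retweeted_status" in tweet,
        "content": tweet.get("text"),
        "from_user_screen_name": user.get("screen_name"),
        "from_user_followers_count": user.get("followers_count"),
        "from_user_friends_count": user.get("friends_count"),
        "from_user_listed_count": user.get("listed_count"),
        "from_user_statuses_count": user.get("statuses_count"),
        "from_user_description": user.get("description"),
        "from_user_location": user.get("location"),
        "from_user_created_at": user.get("created_at"),
        "retweet_count": tweet.get("retweet_count"),
        "entities_urls": ", ".join(u["url"] for u in urls),
        "entities_urls_count": len(urls),
        "entities_hashtags": ", ".join(hashtags),
        "entities_hashtags_count": len(hashtags),
        "entities_mentions": ", ".join(mentions),
        "in_reply_to_screen_name": tweet.get("in_reply_to_screen_name"),
        "in_reply_to_status_id": tweet.get("in_reply_to_status_id_str"),
        # expanded_url holds the full link behind a t.co short URL.
        "entities_expanded_urls": ", ".join(u["expanded_url"] for u in urls if u.get("expanded_url")),
        "json_output": json.dumps(tweet),
    }

# A made-up, heavily trimmed tweet used only to exercise the function.
sample = {
    "id_str": "1",
    "lang": "en",
    "text": "Praying for #MH370 @someone http://t.co/x",
    "retweet_count": 3,
    "user": {"screen_name": "example_user", "followers_count": 10},
    "entities": {
        "hashtags": [{"text": "MH370"}],
        "urls": [{"url": "http://t.co/x", "expanded_url": "http://example.com/story"}],
        "user_mentions": [{"screen_name": "someone"}],
    },
}
row = flatten_tweet(sample)
print(row["entities_hashtags"], row["entities_urls_count"], row["retweeted_status"])
# prints: MH370 1 False
```

Keeping the full tweet in `json_output` mirrors the table's last populated column: anything not mapped to its own column can still be recovered later.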
Step 1: Checklist
• Do you know how to install the necessary Python libraries? If not, please review pg. 8 of http://curiositybits.com/python-for-mining-the-social-web/python-tutorial-mining-twitter-user-profile/
• Do you know how to browse and edit an SQLite database with SQLite Database Browser? If not, please review pp. 10-14 of http://curiositybits.com/python-for-mining-the-social-web/python-tutorial-mining-twitter-user-profile/
Download the code:
https://drive.google.com/file/d/0Bwwg6GLCW_IPdm1mcHNXeU85Nkk/edit?usp=sharing
Have you installed these necessary Python libraries?
Most importantly, you need to install a Twitter mining library called Twython (https://twython.readthedocs.org/en/latest/index.html).
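To check which libraries are already available before running the script, a quick check like the following can help. It assumes Python 3 (`importlib.util.find_spec` does not exist in Python 2). `twython` comes from this tutorial, while `sqlite3` and `json` are standard-library modules listed only as examples, since the slide's full list is not reproduced here; `check_libraries` is a hypothetical helper.

```python
import importlib.util

# twython is the library this tutorial relies on; sqlite3 and json are
# standard-library modules included only as examples of other imports.
REQUIRED = ["twython", "sqlite3", "json"]

def check_libraries(names):
    """Map each top-level module name to True if Python can find it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

for name, found in check_libraries(REQUIRED).items():
    status = "installed" if found else "MISSING (try: pip install %s)" % name
    print(name, status)
```

Anything reported as missing can usually be installed with pip, for example `pip install twython`.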
Step 2: enter the search terms
You can enter multiple search terms, separated by commas. Note that the last search term must also be followed by a comma.
You can enter non-English search terms, but make sure the Python script starts with the following block of code:
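The slide does not show the script's own variable, so this is a sketch under assumptions: the terms live in a Python tuple named `SEARCH_TERMS` (a hypothetical name), and they are combined into a single query with the Search API's `OR` operator, which may differ from how the downloaded script issues its requests. It also shows a PEP 263 source-encoding declaration, which Python 2 requires before non-ASCII text (here the Chinese term 马航, short for Malaysia Airlines) appears in a file, and one plausible reason for the trailing comma: with a single term, only the comma makes Python build a tuple rather than a plain string.

```python
# -*- coding: utf-8 -*-
# The line above is a PEP 263 encoding declaration. Python 2 needs it before any
# non-ASCII text appears in the file; Python 3 already assumes UTF-8 source.

# Hypothetical variable name; the downloaded script may use a different one.
SEARCH_TERMS = ("MH370", "Malaysian", "马航",)

def build_query(terms):
    """Join terms with the Search API's OR operator (hypothetical helper)."""
    return " OR ".join(terms)

query = build_query(SEARCH_TERMS)  # "MH370 OR Malaysian OR 马航"

# Why the trailing comma matters when there is only one term:
one_term = ("MH370",)  # a tuple holding one string
no_comma = ("MH370")   # just the string "MH370"; looping over it yields single characters
```

If the script loops over its search terms, forgetting the comma in the one-term case would make it search for "M", "H", "3", and so on.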
Step 3: enter your API keys
Enter each of the following keys inside the quotation marks in the script:
• API key
• API secret
• Access token
• Access token secret
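A minimal sketch of how the four credentials fit together, assuming the variable names shown (hypothetical; the script's own names may differ). `missing_credentials` is an illustrative helper that flags any key left blank. `make_client` passes the keys to Twython's constructor, whose `app_key`, `app_secret`, `oauth_token` and `oauth_token_secret` arguments correspond to the API key, API secret, access token and access token secret.

```python
# Hypothetical variable names; paste each value between the quotation marks.
API_KEY = ""
API_SECRET = ""
ACCESS_TOKEN = ""
ACCESS_TOKEN_SECRET = ""

credentials = {
    "API_KEY": API_KEY,
    "API_SECRET": API_SECRET,
    "ACCESS_TOKEN": ACCESS_TOKEN,
    "ACCESS_TOKEN_SECRET": ACCESS_TOKEN_SECRET,
}

def missing_credentials(creds):
    """Return the names of any credentials that are still empty (hypothetical helper)."""
    return [name for name, value in creds.items() if not value.strip()]

def make_client(creds):
    """Create an authenticated Twython client; requires `pip install twython`."""
    from twython import Twython  # imported here so the check above runs without Twython
    return Twython(
        app_key=creds["API_KEY"],
        app_secret=creds["API_SECRET"],
        oauth_token=creds["ACCESS_TOKEN"],
        oauth_token_secret=creds["ACCESS_TOKEN_SECRET"],
    )

missing = missing_credentials(credentials)
if missing:
    print("Still empty:", ", ".join(missing))
```

Checking for blanks first gives a clearer message than the authentication error the API would otherwise return.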
• Set up your API keys - 1
First, go to https://dev.twitter.com/ and sign in with your Twitter account. Then go to the My Applications page and create an application.
• Set up your API keys - 2
Name: enter any name that makes sense to you.
Description: enter any text that makes sense to you.
Website: enter any legitimate URL. Here, I used the URL of my institution.
Callback URL: as above, enter any legitimate URL. Here, I used the URL of my institution.
Step 4: change the parameters
result_type is a parameter defined in the Twitter API documentation. Here we set it to recent; you can also set it to mixed or popular.
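A sketch of the request parameters, assuming the script ultimately calls Twython's `search` method, which wraps the v1.1 `GET search/tweets` endpoint and accepts these parameters as keyword arguments. `build_search_params` is a hypothetical helper: it rejects `result_type` values other than recent, mixed and popular, and caps `count` at 100, the per-request maximum documented for that endpoint.

```python
VALID_RESULT_TYPES = {"recent", "mixed", "popular"}

def build_search_params(query, result_type="recent", count=100):
    """Build keyword arguments for a GET search/tweets request (hypothetical helper)."""
    if result_type not in VALID_RESULT_TYPES:
        raise ValueError("result_type must be one of %s" % sorted(VALID_RESULT_TYPES))
    # The v1.1 search endpoint returns at most 100 tweets per request.
    return {"q": query, "result_type": result_type, "count": min(count, 100)}

params = build_search_params("MH370 OR Malaysian")
# With an authenticated Twython client `twitter`: results = twitter.search(**params)
```

Validating the value locally catches a typo such as "latest" before any request is sent.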
Here is a list of parameters you can tweak or add: https://dev.twitter.com/docs/api/1.1/get/search/tweets
For example, to limit the search to tweets in Chinese, add lang='zh'.
As another example, to limit the search to tweets sent before April 1, 2014, add until='2014-04-01'.
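Optional filters such as `lang` and `until` can be layered on top of the base parameters. This sketch assumes the parameters are kept in a dict, as they would be for a Twython `search(**params)` call. `add_filters` is a hypothetical helper that adds each filter only when it is given and checks that `until` uses the YYYY-MM-DD format the API expects.

```python
from datetime import datetime

def add_filters(params, lang=None, until=None):
    """Return a copy of params with optional lang and until filters (hypothetical helper)."""
    filtered = dict(params)  # leave the caller's dict unchanged
    if lang:
        filtered["lang"] = lang  # ISO 639-1 language code, e.g. "zh" for Chinese
    if until:
        # The API expects YYYY-MM-DD; strptime raises ValueError for any other format.
        datetime.strptime(until, "%Y-%m-%d")
        filtered["until"] = until
    return filtered

base = {"q": "MH370 OR Malaysian", "result_type": "recent"}
params = add_filters(base, lang="zh", until="2014-04-01")
```

Copying the dict keeps the base query reusable, so the same search can be run with and without filters.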
Step 5: set up the SQLite database
• If you type in just a file name, the database will be saved in the folder you run the Python script from (usually the script's own folder). You can also use a full file path, such as sqlite:///C:/xxxx/xxx/MH370.sqlite.
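The sqlite:/// prefix looks like the database-URL format used by SQLAlchemy-based tools; Python's built-in `sqlite3` module takes a plain file path instead. This sketch uses `sqlite3` (not necessarily what the script uses) with a hypothetical four-column subset of the table above, and the helper names are also hypothetical. Making `tweet_id` the primary key and inserting with `INSERT OR IGNORE` means repeated runs do not store the same tweet twice.

```python
import sqlite3

def url_to_path(url):
    """Turn a URL such as 'sqlite:///C:/data/MH370.sqlite' into a plain file path."""
    prefix = "sqlite:///"
    return url[len(prefix):] if url.startswith(prefix) else url

def open_db(path):
    """Open (or create) the database and make sure the tweets table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               tweet_id TEXT PRIMARY KEY,
               inserted_date TEXT DEFAULT CURRENT_TIMESTAMP,
               content TEXT,
               from_user_screen_name TEXT
           )"""
    )
    return conn

def save_tweets(conn, rows):
    """Insert rows (dicts), skipping tweet_ids already stored; return how many were added."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (tweet_id, content, from_user_screen_name) "
        "VALUES (:tweet_id, :content, :from_user_screen_name)",
        rows,
    )
    conn.commit()
    return conn.total_changes - before

# ":memory:" keeps this demo in RAM; pass url_to_path("sqlite:///MH370.sqlite")
# or a full file path instead to write a real database file.
conn = open_db(":memory:")
```

`inserted_date` is filled automatically by SQLite's `CURRENT_TIMESTAMP` default, matching the column described at the start.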
Hit RUN!
Are we getting all the tweets?
If you run the script once or twice a day, you should capture the tweets posted that day as well as tweets a few days old. Keep in mind that the standard Search API searches a sample of recent tweets rather than a complete index.
But historical tweets are EXPENSIVE! Tweets older than a week can be purchased through http://gnip.com/
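Because the standard Search API only reaches back about a week, daily or twice-daily runs will overlap. One way to request only newer tweets is the search endpoint's `since_id` parameter, which returns results with an ID greater than the one given. This sketch assumes a `tweets` table whose `tweet_id` column is stored as text (a hypothetical schema); `latest_tweet_id` and `incremental_params` are hypothetical helpers, and the IDs in the demo are made up.

```python
import sqlite3

def latest_tweet_id(conn):
    """Return the largest stored tweet_id as an int, or None if nothing is stored yet."""
    # CAST avoids comparing IDs as text, where "9" would sort above "10".
    row = conn.execute("SELECT MAX(CAST(tweet_id AS INTEGER)) FROM tweets").fetchone()
    return row[0]

def incremental_params(conn, params):
    """Return a copy of params with since_id set to the newest stored tweet, if any."""
    updated = dict(params)
    newest = latest_tweet_id(conn)
    if newest is not None:
        updated["since_id"] = newest
    return updated

# Demo with an in-memory table holding two made-up tweet IDs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (tweet_id TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO tweets VALUES (?)",
                 [("449000000000000001",), ("449000000000000123",)])
params = incremental_params(conn, {"q": "MH370 OR Malaysian"})
print(params["since_id"])  # 449000000000000123
```

On the very first run the table is empty, so no `since_id` is added and the search covers everything the API still returns.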
