Created by The Curiosity Bits Blog (curiositybits.com)
Download the Python code used in the tutorial
Code provided by Dr. Gregory D. Saxton
Mining Twitter User Profile on
Python
1
Prerequisite
Setting up API keys: pg.4-6
Installing necessary Python libraries: pg.7-8
Creating a list of Twitter screen-names: pg.9
Setting up a SQLite Database to store Twitter data: pg.10-14
But if you are a Python newbie, let’s start with the
very basics.
2
We assume you are a Python newbie, so let’s start with the
very basics.
• Choosing the right Python platform: Python is a programming
language, and you can use different software packages to write, edit,
and run Python code. We chose Anaconda, which is free to
download; the Python version used here is 2.7.
• Once you install Anaconda, you can play around with Python code in
Spyder, the editor bundled with Anaconda.
3
Setting up API keys
• We need keys to get Twitter data through the Twitter API
(https://dev.twitter.com/). You need: API Key, API Secret, Access token,
Access token secret.
• First, go to https://dev.twitter.com/ and sign in to your Twitter account. Go
to the My applications page to create an application.
4
Enter any name that makes sense to
you
Enter any text that makes sense to
you
You can enter any legitimate URL; here, I put in
the URL of my institution.
Same as above: you can enter any legitimate URL;
here, I put in the URL of my institution.
Setting up API keys
5
• After creating the app, go to the API Keys page, scroll down to the
bottom, and click Create my access token. Wait a few minutes
and refresh the page, and you will see all your keys!
Setting up API keys
You need the API Key, API Secret, Access token, and Access token secret.
6
Installing necessary Python libraries
Think of Python libraries as the apps running on your operating
system. To use our code, you need the following libraries:
• simplejson (https://pypi.python.org/pypi/simplejson)
• sqlite3 (http://sqlite.org/), which ships with Python and needs no separate install
• SQLAlchemy (http://www.sqlalchemy.org/)
• Twython
(https://twython.readthedocs.org/en/latest/index.html)
7
Installing necessary Python libraries
To install the libraries, go to the Start menu, type in CMD, and run the CMD file as
administrator. Once you are in CMD, type pip install followed by the
name of the Python library. For example, to install Twython, type pip install
twython and press Enter. Use this procedure for each library that is not already bundled with Python.
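To confirm that the installs worked, a small check like the one below can help. It is only a sketch: it assumes each library's import name matches the lowercase name listed above, and the function name missing_modules is made up for illustration.

```python
import importlib

# Import names of the libraries the tutorial lists (assumed to match the
# names given on the slide; sqlite3 comes with Python itself).
REQUIRED = ["simplejson", "sqlite3", "sqlalchemy", "twython"]


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Still to install: " + ", ".join(missing))
    else:
        print("All required libraries can be imported.")
```

Anything reported as still to install can be added with pip install as described above.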
8
Creating a list of Twitter screen-names
• Our Python code gathers profile information for multiple
Twitter users. So, first let’s create a list of users. The list should be in
.csv format and contain three columns (matching the
configuration in our Python code). Specifically, it looks like this:
The first column lists sequential
numbers.
The second column lists the Twitter
screen-names you are interested
in.
For the third column, I entered 1
throughout, but you can leave
it blank.
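If you would rather build the list with code than by hand, the following sketch writes the same three-column layout. It is written for Python 3 (the tutorial itself uses 2.7), it writes no header row (an assumption, since the slide does not say whether one is expected), and the screen-names in the usage note are placeholders.

```python
import csv


def write_accounts_csv(path, screen_names, user_type="1"):
    """Write one row per account: sequence number, screen-name, user type."""
    # newline="" stops the csv module from adding blank lines on Windows.
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for number, name in enumerate(screen_names, start=1):
            writer.writerow([number, name, user_type])
```

For example, write_accounts_csv("accounts.csv", ["example_user_a", "example_user_b"]) produces a two-row file with 1 in the third column.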
9
Setting up a SQLite Database to store Twitter data
You need storage for the incoming data from the Twitter API. That
is what databases are for. We use SQLite, a lightweight relational
database engine that keeps an entire database in a single file.
SQL is the query language used to work with relational databases.
Python talks to SQLite through the sqlite3 module listed in the
previous step. On top of that, you can
download a database browser to view and edit the database
much like an Excel file.
Go to http://sqlitebrowser.sourceforge.net/ and download
SQLite Database Browser. It allows you to view and edit
SQLite databases.
10
Setting up a SQLite Database to store Twitter data
Once you have the files downloaded, run the following file.
11
Setting up a SQLite Database to store Twitter data
Now, we need to import the Twitter users list into a SQLite database. To do that,
create a new database. Remember the database file name because we need to
write it into the Python code.
SQLite does not enforce a file extension; this tutorial uses .sqlite. To prevent complications later,
add the extension .sqlite when you save a file in SQLite Database Browser.
12
Go to File-Import-Table From CSV File and import the
.csv file you saved. Name the imported table
accounts. This table name corresponds to the
one used in the Python code. After you click
Create, the CSV list is loaded into the
database, and you can browse it under Browse
Data. Lastly, remember to save the database.
Setting up a SQLite Database to store Twitter data
Stay on the database file you just created.
13
Setting up a SQLite Database to store Twitter data
Now, we need to modify the imported table.
Go to Edit-Modify Tables, then use Edit field
to change the column names. To match our
Python code, name the first column rowid
with Field Type Integer, the second column
screen_name with Field Type String, and the
third user_type, also String. In the end, the
database table is defined as shown in the screenshot.
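The same import can be done from Python instead of the database browser. This is a sketch for Python 3 using the built-in sqlite3 module, not the tutorial's own code. It assumes the first column is meant to be called rowid, uses SQLite's TEXT type where the browser says String, and the function name load_accounts is hypothetical.

```python
import csv
import sqlite3


def load_accounts(db_path, csv_path):
    """Create the accounts table and fill it from the three-column CSV.

    Returns the number of rows in the table afterwards.
    """
    with open(csv_path, newline="") as handle:
        # Pad short rows so a blank third column still yields three values.
        rows = [(row + ["", ""])[:3] for row in csv.reader(handle) if row]
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS accounts ("
            "rowid INTEGER, screen_name TEXT, user_type TEXT)"
        )
        conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", rows)
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
    finally:
        conn.close()
```

Running it twice on the same database adds the rows twice, so start from a fresh database file when experimenting.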
14
Now, moving on to the actual Python code…
Download the Python code, and open it in Anaconda
15
There are only a few places you need to change, but let’s
walk through the code first…
The first block of code imports the necessary Python libraries.
Make sure you have
installed all these
necessary libraries.
16
The second block is where you need to enter the keys we obtained at the
beginning. Just copy and paste each key inside the quotation marks.
API Key
API secret
Access token
Access token secret
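As a rough illustration of what such a block can look like, the sketch below stores the four keys as strings and hands them to Twython. The variable names and the helper functions are assumptions made for this example; the downloaded code may name things differently.

```python
# Placeholder values: replace each string with the key from your own app page.
APP_KEY = "YOUR_API_KEY"
APP_SECRET = "YOUR_API_SECRET"
OAUTH_TOKEN = "YOUR_ACCESS_TOKEN"
OAUTH_TOKEN_SECRET = "YOUR_ACCESS_TOKEN_SECRET"


def unfilled_keys(keys):
    """Return the names of keys that are empty or still placeholders."""
    return sorted(
        name for name, value in keys.items()
        if not value.strip() or value.startswith("YOUR_")
    )


def make_client(keys):
    """Build a Twython client from the four keys.

    twython is imported here so the check above can run even before the
    library is installed.
    """
    from twython import Twython
    return Twython(keys["APP_KEY"], keys["APP_SECRET"],
                   keys["OAUTH_TOKEN"], keys["OAUTH_TOKEN_SECRET"])
```

Calling unfilled_keys before make_client catches a forgotten paste early, instead of as an authentication error later.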
17
The third block is where we define the columns of the SQLite database. For now, we do not
need to edit anything here.
18
The fourth block is where we ask the Python code to get Twitter user profile
information based on the list of users already saved in the SQLite database. Here, you will
see that the table name and the column names correspond to the ones we previously
saved in SQLite.
19
The fifth block is where we make a specific request through the Twitter API to
get data:
Here, we ask Python to
get the most recent status
from each listed user. The
returned status also carries the
user’s profile
information. We will
discuss what profile
information is available
later on.
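A request of this kind can be sketched with Twython's get_user_timeline method, asking for a single status and reading the user part of the reply. The function name fetch_profile is hypothetical, and this is an illustration of the idea rather than the tutorial's own code.

```python
def fetch_profile(client, screen_name):
    """Ask for the user's most recent tweet and return its 'user' part.

    `client` is expected to be a twython.Twython instance; any object with
    a compatible get_user_timeline method also works.
    """
    statuses = client.get_user_timeline(screen_name=screen_name, count=1)
    if not statuses:
        return None  # the account has no visible tweets
    return statuses[0]["user"]
```

Because the client is passed in, the function can be tried out with a stand-in object before any real keys are used.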
20
The raw output from the Twitter API is in JSON format. JSON is a standardized text format for
structured data. Now we need to map the information in the JSON to the
table in the database. Notice that each column in the database represents a Twitter
output variable.
e.g. A Twitter user’s profile description is
stored as description under user in the
JSON. This line of code maps the
profile description in the JSON to the
database column named
from_user_description.
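The mapping idea can be sketched as follows. Note the swap: the tutorial's code uses SQLAlchemy, while this example uses the built-in sqlite3 module to stay short. The field list, the function names, and the sample JSON in the usage note are all made up for illustration.

```python
import json
import sqlite3

# Fields found under "user" in the status JSON. Each one is copied into a
# column with the from_user_ prefix used by the tutorial's column names.
PROFILE_FIELDS = ["screen_name", "description", "location",
                  "followers_count", "friends_count"]


def status_to_row(status):
    """Flatten the 'user' part of one status into column/value pairs."""
    user = status.get("user", {})
    return {"from_user_" + field: user.get(field) for field in PROFILE_FIELDS}


def save_row(conn, table, row):
    """Insert one row, creating the table on first use.

    The table and column names are placed directly into the SQL text, so
    they must come from your own code, never from downloaded data.
    """
    columns = sorted(row)
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute("CREATE TABLE IF NOT EXISTS {} ({})".format(table, column_list))
    conn.execute(
        "INSERT INTO {} ({}) VALUES ({})".format(table, column_list, placeholders),
        [row[column] for column in columns],
    )
```

For instance, status_to_row(json.loads('{"user": {"description": "A bio"}}')) gives a dictionary in which from_user_description is "A bio" and the fields missing from the JSON are None.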
21
You need to change the file path and file name here
(RECOMMENDED).
If the Python file and your SQLite database are in the
same folder, just paste your database name here.
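If the code connects through SQLAlchemy (an assumption; check the line shown on the slide), the database location is usually given as a URL that starts with sqlite:/// followed by the file path. The helper below, whose name is hypothetical, builds such a URL from a file name so that a wrong working folder is less likely to matter.

```python
import os


def sqlite_url(db_filename):
    """Return a SQLAlchemy-style SQLite URL for the given database file.

    The file name is turned into an absolute path first, so the URL does
    not depend on which folder the script is started from later.
    """
    return "sqlite:///" + os.path.abspath(db_filename)
```

A URL built this way could then be passed to SQLAlchemy's create_engine function.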
22
Now, you are ready to run the code. Go to Run, and choose Execute in a new dedicated
Python interpreter. The first option, Execute in current Python or IPython interpreter,
does not work on my end, but it may work on your computer.
23
Now, look at the right-side bar in Anaconda.
Oops, looks like I am getting error messages!
ERRORS!!
Don’t panic! It’s likely you will hit roadblocks
when you run Python code, so it is important
to learn to debug.
This error most likely appears because I saved the
Python file in a folder that is not a default
Python folder.
But what is a default Python folder?
24
A simple way to find out your default
Python folder is:
• On a Windows machine, open the Start menu, right-click Computer,
and choose Properties.
25
The folders listed
here are your
default Python
folders.
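You can also ask Python itself where it is working and where it looks for code. The sketch below prints both; whether these match the folders the slide calls default is an assumption, and the function name python_folders is made up.

```python
import os
import sys


def python_folders():
    """Return the current working folder and Python's module search path."""
    return {
        "working_directory": os.getcwd(),
        "search_path": list(sys.path),
    }


if __name__ == "__main__":
    info = python_folders()
    print("Working folder: " + info["working_directory"])
    for folder in info["search_path"]:
        print("Search path entry: " + folder)
```

A database file given by name only is looked up relative to the working folder, which is worth comparing against where you actually saved it.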
26
In my case, C:\Anaconda\Lib\site-packages is my default Python folder. So I moved the
Python code there, edited the file path in the code, and ran it. Here you go, the code is
running and is getting what we want! If you check the database file, you will see that a
new table named typhoon has been created (you can change the table name in the Python
code), and it includes the listed users’ recent tweets and profile information.
27
Oops! Error again!
The Twitter API has a rate limit.
With the version of the Twitter API used in our
Python code, you can get roughly 300 users per
15 minutes. Once you hit the limit, you
will see the error message shown in the
screenshot.
There are two ways to deal with the
restriction:
1. wait 15 minutes before the next run;
2. create multiple Twitter apps and get
multiple keys. Once you use up the quota
in one run, paste in a new key to start a
new run!
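The first option, waiting and trying again, can be automated. The helper below is a generic sketch, not part of the tutorial's code: call_with_wait and its parameters are hypothetical names, and the 15-minute default comes from the window described above. With Twython, the exception class to pass in would typically be TwythonRateLimitError.

```python
import time


def call_with_wait(func, rate_limit_error, wait_seconds=15 * 60,
                   max_tries=3, sleep=time.sleep):
    """Call func(); on a rate-limit error, wait and try again.

    Gives up and re-raises the error after max_tries failed attempts.
    """
    for attempt in range(max_tries):
        try:
            return func()
        except rate_limit_error:
            if attempt == max_tries - 1:
                raise
            sleep(wait_seconds)
```

The sleep parameter exists so the waiting can be replaced by a stand-in when trying the helper out.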
28
If you put 0 here, the code starts with the user listed in the first row.
Because we will hit the rate limit, you will need to run the code multiple times
to finish crawling all users on the list. Make sure to change the starting
row number each time!
For example, if the first run gets user (0) through user (150) and then hits the rate
limit, put 151 in the second run to resume with the next user, the one listed on
the 152nd row.
29
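The bookkeeping for the starting row can be sketched like this. The function processes the list from a given zero-based position and, if the rate limit is hit, reports the position to use for the next run. The names crawl_from, fetch, and rate_limit_error are hypothetical.

```python
def crawl_from(accounts, start, fetch, rate_limit_error):
    """Process accounts[start:] in order.

    Returns (results, next_start). next_start is the zero-based position of
    the first account that was NOT processed, or None when the list is done.
    """
    results = []
    for index in range(start, len(accounts)):
        try:
            results.append(fetch(accounts[index]))
        except rate_limit_error:
            return results, index
    return results, None
```

If a run returns 151 as next_start, that is the number to enter as the starting row for the following run.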
A list of Twitter output variables
Go to SQLite Database Browser and select the table typhoon (again, this is the name we
gave in the Python code). You will see the output variables across the columns.
30
A list of Twitter output variables
Some key variables related to the user profile:
• from_user_screen_name: the user’s Twitter screen-name
• from_user_followers_count: how many people are following the user
• from_user_friends_count: how many people this user is following
• from_user_listed_count: how many times the user is listed in other users’ public
lists
• from_user_favourites_count: how many tweets this user has favorited (liked)
• from_user_statuses_count: how many tweets the user has sent
• from_user_description: the user’s profile bio
• from_user_location: the location given in the user’s profile
• from_user_created_at: when the account was created
31
A list of Twitter output variables
Use File-Export-Table as CSV to export the data in .csv format. Make sure to
add the .csv file extension to the file name.
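The export can also be done from Python. The sketch below (Python 3, built-in modules only) writes every column of a table to a CSV file with a header row. The function name export_table is made up, and the table name in the usage note is the one from this tutorial.

```python
import csv
import sqlite3


def export_table(db_path, table, csv_path):
    """Write all rows of `table` to csv_path and return the row count.

    The table name is placed directly into the SQL text, so it must come
    from your own code, never from downloaded data.
    """
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute("SELECT * FROM {}".format(table))
        header = [column[0] for column in cursor.description]
        rows = cursor.fetchall()
    finally:
        conn.close()
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

For example, export_table("tweets.sqlite", "typhoon", "typhoon.csv") would write the crawled profiles to typhoon.csv, assuming the database file is named tweets.sqlite.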
32
Please send your questions and comments to
weiaixu [at] buffalo dot edu
33

Curiosity Bits Tutorial: Mining Twitter User Profile on Python V2
