Harvesting Data from Twitter: Hands-on Experience
Dr. Nora alTwairesh, Ms. Tarfa alBuhairi, Ms. Mawaheb alTuwaijri, and Ms. Afnan alMoammar
Content
• Introduction to the Twitter API
• Ready-to-use tools (no programming)
• Comparison between R and Python
• R
• Python
WHY TWITTER?!
Why Twitter
• Twitter has become a mass information hub that can be
used to study the evolution of any issue: a
"revolutionary machine".
• Research disciplines that study Twitter data span
computer science, information science,
communications, business, economics, education,
medicine, political science, and sociology.
• Recent studies show that 60% of daily Arabic tweets
are from Saudi Arabia (Mubarak and Darwish, 2014).
Hamdy Mubarak and Kareem Darwish. 2014. Using Twitter to collect a multi-dialectal corpus of Arabic. ANLP 2014:1.
Twitter API
• Tweets posted in the last 7 days can be accessed for free, within a
certain rate limit.
• Tweets older than 7 days are considered historical tweets and
must be purchased through third-party providers.
• The Twitter API provides three interfaces for tweet collection:
the Streaming API, the REST API, and the Search API.
Streaming API
• The Streaming API provides real-time tweets in a continuous,
push-based fashion.
• With the Streaming API, requested tweets flow constantly as
they are posted on Twitter. They are delivered at three bandwidth
levels: "spritzer" (1%), "gardenhose" (10%), and "firehose" (100%)
of all tweets posted on Twitter.
• A regular user who wants to collect tweets is granted spritzer
access.
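A streamed connection carries more than tweets, so a consumer usually filters messages as they arrive. The following is a minimal sketch, assuming each message has already been decoded into a dictionary and that tweets carry a `text` field while other messages (for example deletion notices) do not. The function name and the sample messages are hypothetical and only illustrate the idea.

```python
def extract_tweet_texts(messages):
    """Yield the text of every tweet found in an iterable of decoded messages."""
    for message in messages:
        # Assumption: only real tweets have a "text" key; skip everything else.
        if isinstance(message, dict) and "text" in message:
            yield message["text"]


# Hypothetical sample standing in for a live stream.
sample = [
    {"id": 1, "text": "first tweet"},
    {"delete": {"status": {"id": 1}}},
    {"id": 2, "text": "second tweet"},
]

for text in extract_tweet_texts(sample):
    print(text)
```

Because the function accepts any iterable, the same loop could be pointed at a live stream iterator once a connection is available.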
REST API
• The REST API was designed specifically for programmatic access
to read and write Twitter data.
• It provides a large set of methods for building third-party
applications that interact with Twitter.
• Access to the REST API is also rate-limited: the limit is 150
requests per hour.
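One simple way to stay under an hourly limit is to space requests evenly. The sketch below assumes the 150 requests per hour quoted above, which works out to one request every 24 seconds. The names `min_interval_seconds`, `paced_calls`, and the `send` callable are hypothetical placeholders for whatever actually issues the request.

```python
import time

REQUESTS_PER_HOUR = 150  # limit quoted on the slide


def min_interval_seconds(limit_per_hour=REQUESTS_PER_HOUR):
    """Smallest gap between requests that keeps us within the hourly limit."""
    return 3600.0 / limit_per_hour


def paced_calls(requests, send, sleep=time.sleep, limit_per_hour=REQUESTS_PER_HOUR):
    """Call send(request) for each request, pausing between consecutive calls."""
    interval = min_interval_seconds(limit_per_hour)
    results = []
    for index, request in enumerate(requests):
        if index > 0:
            sleep(interval)  # wait before every call except the first
        results.append(send(request))
    return results


print(min_interval_seconds())
```

Passing `sleep` as a parameter keeps the pacing logic easy to try out without actually waiting.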
Search API
• Like the REST API, the Search API is pull-based. It replicates
the search functionality provided on the Twitter website. However,
retrieved tweets are restricted to the past 7 days.
• The Search API is not appropriate for high-throughput, real-time data
acquisition. As such, Twitter Inc. discourages its use and plans to
discontinue it in the future.
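Pull-based collection means the client asks for one page of results at a time and decides when to ask again. The sketch below assumes results come back newest first and that an upper bound on the tweet id (called `max_id` here) can be passed to request older tweets. `fetch_page` is a hypothetical callable standing in for the real search request, and `fake_fetch` exists only to make the example runnable offline.

```python
def collect_tweets(fetch_page, query, max_pages=5):
    """Pull pages of results until none are left or max_pages is reached."""
    collected = []
    max_id = None  # no upper bound on the first request
    for _ in range(max_pages):
        page = fetch_page(query, max_id)
        if not page:
            break
        collected.extend(page)
        # Ask only for tweets strictly older than the oldest one seen so far.
        max_id = min(tweet["id"] for tweet in page) - 1
    return collected


def fake_fetch(query, max_id, _ids=(50, 40, 30, 20, 10)):
    """Hypothetical stand-in for a search call: returns at most 2 tweets per page."""
    visible = [i for i in _ids if max_id is None or i <= max_id]
    return [{"id": i, "text": query} for i in visible[:2]]


tweets = collect_tweets(fake_fetch, "#example")
print([tweet["id"] for tweet in tweets])
```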
Create a Twitter App
• To access the Twitter API you need to create a Twitter app.
Follow this simple tutorial to do so:
https://iag.me/socialmedia/how-to-create-a-twitter-app-in-8-easy-steps/
• You will use the following OAuth settings in both R and Python:
• Consumer Key
• Consumer Secret
• OAuth Access Token
• OAuth Access Token Secret
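The four values above are secrets, so it is safer to keep them out of the script itself. The following is a minimal sketch that reads them from environment variables. The variable names and both function names are assumptions made for illustration. `make_client` uses the `Twitter` and `OAuth` classes from the Python `twitter` package and is only defined here, not called, because it needs that package and real credentials.

```python
import os

# Hypothetical environment variable names, one per OAuth setting listed above.
REQUIRED_KEYS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return the four OAuth settings, or raise KeyError naming the missing ones."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}


def make_client(credentials):
    """Build an API client with the `twitter` package (imported lazily)."""
    from twitter import OAuth, Twitter  # requires: pip install twitter

    auth = OAuth(
        credentials["TWITTER_ACCESS_TOKEN"],
        credentials["TWITTER_ACCESS_TOKEN_SECRET"],
        credentials["TWITTER_CONSUMER_KEY"],
        credentials["TWITTER_CONSUMER_SECRET"],
    )
    return Twitter(auth=auth)
```

A typical use would be `client = make_client(load_credentials())` after exporting the four variables in the terminal.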
Tools to Collect Tweets
• NodeXL: https://nodexl.codeplex.com/
• Tweet Archivist: https://www.tweetarchivist.com/
• Twitter Archiving Google Spreadsheet (TAGS):
https://tags.hawksey.info/
Harvesting Data from Twitter Workshop: Hands-on Experience
What is R?
• Created by Ross Ihaka and Robert Gentleman.
Why R?
• Statistics
• Machine learning
• Data analysis
• Also: a programming language
R allows you to integrate with code in other languages:
• C++
• Java
• Python
Fastest-growing language
• https://www.r-bloggers.com/r-is-the-fastest-growing-language-on-stackoverflow/
Examples
Now...
Open your laptop, please.
Steps to install R
1: Install R:
• Windows: https://cran.r-project.org/bin/windows/base/
• macOS: http://cran.r-project.org/bin/macosx/
2: Install RStudio (after installing R):
• https://www.rstudio.com/products/rstudio/download3/
3: Install these packages (see the user manual):
• streamR, ROAuth, RJSONIO, RTextTools, e1071, SparseM
User manual:
• http://www.devchakraborty.com/RunningRJafroc.pdf
R packages list:
• https://cran.r-project.org/web/packages/available_packages_by_date.html
Developing packages with RStudio:
• https://support.rstudio.com/hc/en-us/articles/200486488?version=0.99.903&mode=desktop
• https://cran.r-project.org/doc/manuals/R-exts.html
Useful URLs
• https://www.r-bloggers.com
• https://www.r-bloggers.com/how-to-learn-r-2/
• http://www.slideshare.net/ChiuYW/r-language-tutorial
• https://www.rwaq.org/courses/introduction-r-programming
• https://www.researchgate.net/publication/288485806_Hybrid_Sentiment_Analyser_for_Arabic_Tweets_using_R
Python
• Two versions: 2.7 and 3.x
• Twitter packages: twitter, tweepy
• IDE: Anaconda (IPython notebook / Jupyter)
Installing Python
• Install Anaconda from
https://www.continuum.io/downloads
and choose the Python 2.7 version (only for this tutorial).
• Install the twitter package: from the command line
(terminal), type: pip install twitter
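After running the install command, it can help to confirm that the package is importable from the same Python that the notebook uses. The sketch below is one way to check. The helper name `is_installed` is hypothetical, and the code relies only on the standard `importlib` module.

```python
import importlib


def is_installed(module_name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


print("twitter installed: %s" % is_installed("twitter"))
```

If the check reports False, the package was probably installed into a different Python environment than the one running the script.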
Comparison between R and Python
• https://www.datacamp.com/community/tutorials/r-or-python-for-data-analysis#gs.GuXGfAc
• http://blog.udacity.com/2015/01/python-vs-r-learn-first.html
• http://www.dataschool.io/python-or-r-for-data-science/
Contact Us
ASA Research Group
Twitter: @ASA__IU
Email: asa@imamu.edu.sa
Website: http://asa.imamu.edu.sa/
IWAN Research Group
Twitter: @IWAN_RG
Email: iwan@ksu.edu.sa
Website: http://iwan.ksu.edu.sa
Thank you,
See you later …
THE END ..


Editor's Notes

  • #8: Gardenhose access is granted on special request from Twitter Inc. and the firehose access is granted to third-party business partners of Twitter Inc. which are considered third-party data providers.
  • #9: An example of these applications is the inclusion of a Tweet share button on some websites that allows the reader of this website to share the link of the website on Twitter by posting it as a tweet; this is an example of writing Twitter data. An example of reading Twitter data is when websites display tweets of a certain hashtag or user account in a widget on their website’s pages.
  • #10: With the Search API you can send only 180 requests in every 15-minute window. With a maximum of 100 tweets per request, this means you can mine 4 x 180 x 100 = 72,000 tweets per hour.
  • #14: The project was conceived in 1992, with an initial version released in 1994 and a stable beta version in 2000.
  • #15: R is the leading tool for statistics, data analysis, and machine learning.
  • #16: R is the leading tool for statistics, data analysis, and machine learning.
  • #17: R allows you to integrate with other languages (C/C++, Java, Python)
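The throughput estimate in note #10 can be reproduced with a few lines of arithmetic. This is a small sketch using only the figures from that note (15-minute windows, 180 requests per window, 100 tweets per request). The constant and function names are hypothetical.

```python
WINDOW_MINUTES = 15        # length of one rate-limit window
REQUESTS_PER_WINDOW = 180  # requests allowed per window
TWEETS_PER_REQUEST = 100   # maximum tweets returned per request


def max_tweets_per_hour(window_minutes=WINDOW_MINUTES,
                        requests_per_window=REQUESTS_PER_WINDOW,
                        tweets_per_request=TWEETS_PER_REQUEST):
    """Upper bound on tweets collected in one hour under the given limits."""
    windows_per_hour = 60 // window_minutes
    return windows_per_hour * requests_per_window * tweets_per_request


print(max_tweets_per_hour())
```

This is an upper bound: it assumes every request returns a full page of tweets.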