Data Extraction tools
#SMARTdatasprint, 2018
Cristian Ruiz
@CristianCJRuiz
M.A. Communication Sciences
SMART Research Member
iNova Media Lab – NOVA/FCSH
Agenda
Data Extraction Tools
1. SMART Goals.
2. API relevance and debate.
3. Diversity of extraction tools.
4. How to use Netlytic.
5. How to use DMI: YouTube Data Tools.
6. How to use Netvizz.
7. Tool output.
SMART goals
Social Media Methods
1. What for:
1. Social Sciences.
2. Communication Sciences.
3. Medical Sciences.
4. Geography.
5. Culture.
6. Political Sciences.
And many more.
2. They involve a set of data processes:
1. Extraction
2. Visualization
3. Analysis
4. Critique
Sakaki, T., Okazaki, M. & Matsuo, Y. - Earthquake Shakes Twitter Users (2010)
Tremayne, M. - Anatomy of Protest in the Digital Era (2014)
Burgess, J. et al. - Platform Studies (2017)
Lampos, V., De Bie, T. & Cristianini, N. - Flu Detector (2010)
Smith, R. & Sanderson, J. - I’m Going to Instagram It! (2014)
Del Vicario et al. - The Anatomy of Brexit Debate on Facebook (2016)
Data
Where to find data?
Social media API (Application Programming Interface)
The part of a platform's software that provides a specific library of functions to external applications.
In Social Media Methods
A data extraction tool interacts with the platform API to retrieve the queried data.
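The exchange between an extraction tool and a platform API can be sketched as a loop that requests one page of results at a time and follows a continuation cursor until no further page is announced. This is a minimal sketch, assuming a hypothetical `fetch_page` function that stands in for a real API call; real APIs differ in parameter names, authentication and rate limits.

```python
from typing import Callable, Optional

# A page of results: the items plus a cursor for the next page (None = last page).
Page = tuple[list[dict], Optional[str]]

def extract(query: str,
            fetch_page: Callable[[str, Optional[str]], Page],
            max_pages: int = 100) -> list[dict]:
    """Collect all items for `query` by following pagination cursors."""
    items: list[dict] = []
    cursor: Optional[str] = None
    for _ in range(max_pages):          # hard stop in case pagination never ends
        page_items, cursor = fetch_page(query, cursor)
        items.extend(page_items)
        if cursor is None:              # the API reports no further pages
            break
    return items

# Hypothetical in-memory "API", used only for illustration.
_FAKE_PAGES = {
    None: ([{"id": 1}, {"id": 2}], "p2"),
    "p2": ([{"id": 3}], None),
}

def fake_fetch_page(query: str, cursor: Optional[str]) -> Page:
    return _FAKE_PAGES[cursor]

if __name__ == "__main__":
    print(extract("#SmartDataSprint", fake_fetch_page))
```

The `max_pages` cap is a design choice for safety: it keeps a misbehaving or very large query from running indefinitely.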
API debate: who decides what is public?
Limitations of Social Media Methods
What digital objects are available for data extraction?
What media content can be part of my analysis?
How far back in time can data be retrieved?
What are the standard output files? (Omena, 2016)
Questions to keep in mind when developing an extraction tool.
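The question "how far back in time can data be retrieved?" can be made concrete by clipping a requested date range to the window an API serves. A minimal sketch; the 7-day window in the example is an assumed value for illustration, not a documented limit of any particular platform.

```python
from datetime import date, timedelta
from typing import Optional

def retrievable_range(start: date, end: date, today: date,
                      window_days: int) -> Optional[tuple[date, date]]:
    """Return the part of [start, end] inside the last `window_days` days, or None."""
    earliest = today - timedelta(days=window_days)
    clipped_start = max(start, earliest)
    clipped_end = min(end, today)
    if clipped_start > clipped_end:
        return None          # the whole request falls outside the retrievable window
    return clipped_start, clipped_end

# Assumed example: an API that only serves the last 7 days.
print(retrievable_range(date(2018, 1, 1), date(2018, 1, 20),
                        today=date(2018, 1, 15), window_days=7))
# (datetime.date(2018, 1, 8), datetime.date(2018, 1, 15))
```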
Tools
For social media platforms
DMI Tools: https://wiki.digitalmethods.net/Dmi/ToolDatabase
Netvizz: https://apps.facebook.com/netvizz/
NodeXL: http://nodexl.codeplex.com/
MédiaLab Tools: http://www.medialab.sciences-po.fr/tools/
SocioViz: http://socioviz.net/SNA/eu/sna/login.jsp
And many more: http://socialmediadata.wikidot.com/start
Image from: https://netlytic.org
Tools
Netlytic
Data sources:
Twitter
Facebook
Instagram
YouTube
Twitter collection requires linking your Twitter account.
Instagram collection requires linking your Instagram account.
Start by creating an account:
https://netlytic.org
Datasets
Account types: https://netlytic.org/home/?page_id=10851
What a dataset contains depends on the data source: https://netlytic.org/home/?page_id=10851
Data source: Twitter
First, link the tool to your Twitter account.
Then:
Name your dataset.
Enter one or more keywords, hashtags or @usernames of interest (and the language).
To combine more than one search term (keywords, hashtags, etc.), use the operators and / or.
E.g.: #SmartDataSprint and Data Sprint.
Finally:
Click the “Preview” button to see your extracted data.
Now you have a dataset that you can export as .CSV or work with inside Netlytic!
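To see what combining terms with and / or does to a collection, the logic can be mimicked locally on texts you have already collected. This sketch is not Netlytic's query parser; it assumes, for illustration only, that "and" binds tighter than "or" and that matching is a case-insensitive substring test.

```python
def matches(text: str, query: str) -> bool:
    """True if `text` satisfies a query like '#a and b or c' ('and' binds tighter)."""
    text = text.lower()
    for or_group in query.lower().split(" or "):
        terms = [t.strip() for t in or_group.split(" and ")]
        if all(term in text for term in terms):
            return True      # one satisfied OR-branch is enough
    return False

posts = [
    "Starting the #SmartDataSprint today: a data sprint on social media methods",
    "#SmartDataSprint registration is open",
    "Another data sprint elsewhere",
]
query = "#SmartDataSprint and Data Sprint"
print([p for p in posts if matches(p, query)])
```

Only the first post matches: the second lacks the phrase "data sprint" and the third lacks the hashtag, which shows how "and" narrows a query while "or" would widen it.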
Data source: Facebook
Name your dataset.
Groups (the group ID is required)
Pages (the page URL is required)
Retrieved data: posts and post comments, but not replies to comments.
An e-mail will be sent once data collection is complete (or check the status of your dataset).
Now you have a dataset that you can export as .CSV or work with inside Netlytic!
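The Facebook collection scope (posts and their comments, but not replies to comments) amounts to cutting a nested thread after the first level of comments. The sketch illustrates that scope on a hypothetical nested structure; the field names are assumptions, not Netlytic's export format.

```python
def flatten_thread(posts: list[dict]) -> list[dict]:
    """Keep posts and their top-level comments; drop replies to comments."""
    rows = []
    for post in posts:
        rows.append({"type": "post", "id": post["id"], "text": post["text"]})
        for comment in post.get("comments", []):
            rows.append({"type": "comment", "id": comment["id"],
                         "parent": post["id"], "text": comment["text"]})
            # comment.get("replies") is deliberately ignored: replies are out of scope
    return rows

thread = [{
    "id": "p1", "text": "Post",
    "comments": [{"id": "c1", "text": "Comment",
                  "replies": [{"id": "r1", "text": "Reply"}]}],
}]
for row in flatten_thread(thread):
    print(row["type"], row["id"])
# post p1
# comment c1
```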
Data source: Instagram
First, link the tool to your Instagram account.
Then:
Name your dataset.
Query by keyword (hashtag) or query by location.
An e-mail will be sent once data collection is complete (or check the status of your dataset).
Now you have a dataset that you can export as .CSV or work with inside Netlytic!
Data source: YouTube
Name your dataset.
Enter the YouTube video ID.
The tool retrieves the video's comments.
Now you have a dataset that you can export as .CSV or work with inside Netlytic!
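Since the YouTube source asks for a video ID rather than a full link, it helps to know where the ID sits in a URL: in the `v` query parameter of a `youtube.com/watch` link, or in the path of a `youtu.be` short link. A minimal sketch covering only those two URL forms (embed links and other variants are not handled); the ID in the example is a placeholder.

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs

def video_id(url: str) -> Optional[str]:
    """Return the video ID from a youtube.com/watch or youtu.be URL, else None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host in ("youtube.com", "m.youtube.com") and parsed.path == "/watch":
        return parse_qs(parsed.query).get("v", [None])[0]
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    return None

print(video_id("https://www.youtube.com/watch?v=abcDEF12345&t=42s"))  # abcDEF12345
print(video_id("https://youtu.be/abcDEF12345"))                        # abcDEF12345
```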
Where to find your datasets
Status:
In progress / Complete / One-time collection
Data source
Name of dataset
Subset
Share with a colleague
Edit
Download
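Once a dataset has been downloaded as .CSV, it can be inspected with a few lines of Python, for instance to count the most active authors. The column name `author` is an assumption; check the header row of your own export and adjust it.

```python
import csv
import io
from collections import Counter

def top_authors(csv_file, column: str = "author", n: int = 3) -> list[tuple[str, int]]:
    """Count rows per value of `column` and return the n most frequent."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row[column] for row in reader if row.get(column))
    return counts.most_common(n)

# Stand-in for open("my_dataset.csv", newline="", encoding="utf-8")
sample = io.StringIO(
    "author,description\n"
    "ana,#SmartDataSprint day 1\n"
    "rui,Data sprint notes\n"
    "ana,#SmartDataSprint day 2\n"
)
print(top_authors(sample))  # [('ana', 2), ('rui', 1)]
```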
Tools
DMI: YouTube Data Tools
• A collection of simple tools for extracting data from the YouTube platform via the YouTube API v3.
• Created by Bernhard Rieder as part of the Digital Methods Initiative.
Output
Gephi Files: .gdf
Tab Files: .tab
Getting a channel ID
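Tools that work on YouTube channels ask for a channel ID. For links of the form `youtube.com/channel/<ID>` the ID can be read directly from the path (channel IDs usually begin with `UC`); custom or handle URLs do not contain it and need a lookup instead. The sketch then shows how such an ID would appear in a YouTube Data API v3 `channels` request. The helper names are hypothetical, `API_KEY` is a placeholder, and the channel ID is made up.

```python
from typing import Optional
from urllib.parse import urlencode, urlparse

API_BASE = "https://www.googleapis.com/youtube/v3/channels"

def channel_id(url: str) -> Optional[str]:
    """Return the ID from a youtube.com/channel/<ID> URL, else None."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "channel":
        return parts[1]
    return None   # e.g. /user/... or /@handle URLs: the ID must be looked up

def channel_request_url(cid: str, api_key: str) -> str:
    """Build a channels.list request for basic metadata and statistics."""
    params = {"part": "snippet,statistics", "id": cid, "key": api_key}
    return API_BASE + "?" + urlencode(params)

cid = channel_id("https://www.youtube.com/channel/UC0123456789abcdefghijkl")
print(cid)  # UC0123456789abcdefghijkl
print(channel_request_url(cid, "API_KEY"))
```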
Tools
Netvizz
Output
Tab Files: .tab
Gephi Files: .gdf
Tab-separated files: .tsv
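Tab files are plain tab-separated text, so they can be read with the same tools as CSV by switching the delimiter. A minimal sketch; the column names in the sample are hypothetical, not Netvizz's actual headers.

```python
import csv
import io

def read_tab(tab_file) -> list[dict]:
    """Read a tab-separated file into a list of dicts keyed by the header row."""
    return list(csv.DictReader(tab_file, delimiter="\t"))

# Stand-in for open("export.tab", newline="", encoding="utf-8")
sample = io.StringIO("post_id\ttype\tlikes\n1\tphoto\t12\n2\tlink\t3\n")
rows = read_tab(sample)
print(sum(int(r["likes"]) for r in rows))  # 15
```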
Output
.gdf; .tab; .csv; etc.
The extraction normally produces a .gdf, .tab or .csv file.
Depending on the research and analysis, these files must then be imported into specific software.
E.g.:
Analysis: network analysis
Software: Gephi (.gdf)
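Gephi's .gdf (GUESS) format is plain text: a `nodedef>` header followed by one line per node, then an `edgedef>` header followed by one line per edge. Writing a minimal file by hand shows the shape of what extraction tools produce. This sketch covers only basic columns (real exports usually carry more node and edge attributes), and it does not escape commas or quotes inside labels.

```python
def to_gdf(nodes: dict[str, str], edges: list[tuple[str, str, int]]) -> str:
    """Serialize nodes {id: label} and weighted edges (source, target, weight) as GDF text."""
    lines = ["nodedef>name VARCHAR,label VARCHAR"]
    lines += [f"{node_id},{label}" for node_id, label in nodes.items()]
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR,weight DOUBLE")
    lines += [f"{src},{dst},{w}" for src, dst, w in edges]
    return "\n".join(lines) + "\n"

gdf = to_gdf({"a": "Ana", "r": "Rui"}, [("a", "r", 2)])
print(gdf, end="")
# nodedef>name VARCHAR,label VARCHAR
# a,Ana
# r,Rui
# edgedef>node1 VARCHAR,node2 VARCHAR,weight DOUBLE
# a,r,2
```

The text can then be saved to a file such as `network.gdf` and opened in Gephi.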
What to do with the output?
Import your files into analysis software!
Etc.
Image from: https://netlytic.org
Image from: http://santuan.github.io/stn/open-source/
Image from: http://gephi.org/
Then
1. Social media data (collected and stored by the social media platform).
2. Data supplied by the API (depends on each platform's policies).
3. The extraction tool requires a query (keywords, hashtags, locations, etc.).
4. The extraction tool retrieves a dataset.
5. The extraction tool creates an output file (.gdf, .tab, .csv, etc.).
6. Now visualize it! (Gephi, NodeXL, DMI tools, etc.)
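Turning a retrieved dataset into something Gephi or NodeXL can visualize often means building a network from it. As one common example (not the only option), this sketch links hashtags that appear in the same post and weights each link by the number of posts they share. The hashtag pattern `#\w+` is a simplifying assumption.

```python
import re
from collections import Counter
from itertools import combinations

HASHTAG = re.compile(r"#\w+")

def cohashtag_edges(posts: list[str]) -> Counter:
    """Count, for each pair of hashtags, the number of posts in which both appear."""
    edges: Counter = Counter()
    for text in posts:
        tags = sorted({t.lower() for t in HASHTAG.findall(text)})  # unique, stable order
        for pair in combinations(tags, 2):
            edges[pair] += 1
    return edges

posts = [
    "#SmartDataSprint #DigitalMethods day 1",
    "#smartdatasprint #Gephi #DigitalMethods",
    "#Gephi tutorial",
]
for (a, b), w in sorted(cohashtag_edges(posts).items()):
    print(a, b, w)
# #digitalmethods #gephi 1
# #digitalmethods #smartdatasprint 2
# #gephi #smartdatasprint 1
```

The weighted pairs can then be written out as an edge list (.csv) or a .gdf file and opened in a visualization tool.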