Acquiring Data
Data Science for Beginners, Session 3
Session 3: your 5-7 things
• Finding development data
• Data filetypes
• Using an API
• PDF scrapers
• Web scrapers
• Getting data ready for science
Finding development data
Data
• Data files (CSV, Excel, JSON, XML...)
• Databases (SQLite, MySQL, Oracle, PostgreSQL...)
• APIs
• Report tables (tables on websites, in pdf reports...)
• Text (reports and other documents…)
• Maps and GIS data (openstreetmap, shapefiles, NASA earth images...)
• Images (satellite images, drone footage, pictures, videos…)
Data Sources
• data warehouses and catalogues
• open government data
• NGO websites
• web searches
• online documents, images, maps etc
• people you know who might have data
Creating your own data: People
Creating your own data: Sensors
Be cynical about your data
• Is the data relevant to your problem?
• Where did this data come from?
– Who collected it?
– Why? What for?
– Do they have biases that might show up in the data?
• Are there holes in the data (demographic, geographical, political etc)?
• Do you have supporting data? Is it *really* from a different source?
Data filetypes
Some Data Types
• Structured data:
– Tables (e.g. CSVs, Excel tables)
– Relational and nested data (e.g. JSON, XML, SQLite)
• Unstructured data:
– Free-text (e.g. Tweets, webpages etc)
• Maps and images:
– Vector data (e.g. shapefiles)
– Raster data (e.g. GeoTIFFs)
– Images
CSVs
• Comma-separated values
• Lots of commas
• Sometimes tab-separated (TSVs)
• Most applications read CSVs
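Reading a CSV needs nothing beyond Python's standard library. A minimal sketch, with made-up example data (in practice you'd pass an open file instead of a string):

```python
import csv
import io

# Hypothetical CSV data; in practice you would use open("myfile.csv") instead
csv_text = "country,year,rural_pct\nKenya,2015,74.2\nBrazil,2015,14.3\n"

# DictReader maps each data row to a dict keyed by the header line
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)
print(rows[0]["country"])  # → Kenya
```

Note that csv reads every value as a string; converting "74.2" to a number is up to you.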
JSON
• JavaScript Object Notation
• Lots of braces { }
• Structured, i.e. not always row-by-column
• Many APIs output JSON
• Not all applications read JSON
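Loading JSON is also a one-liner with the standard library. A sketch with an invented, API-style response (note the nesting: this isn't row-by-column data):

```python
import json

# Invented API-style response: nested lists and objects, not rows and columns
json_text = '{"indicator": "SP.RUR.TOTL.ZS", "data": [{"country": "Kenya", "value": 74.2}]}'

# json.loads turns the text into Python dicts and lists
record = json.loads(json_text)
print(record["data"][0]["value"])  # → 74.2
```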
XML
• eXtensible Markup Language
• Lots of brackets < >
• Structured, i.e. not always row-by-column
• Some applications read XML
• HTML is closely related to XML (XHTML is HTML expressed as XML)
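Python's standard library can parse XML too. A minimal sketch, using an invented snippet:

```python
import xml.etree.ElementTree as ET

# Invented XML snippet for illustration
xml_text = "<countries><country name='Kenya'><value>74.2</value></country></countries>"

root = ET.fromstring(xml_text)
# Walk the tree: attributes via .get(), element text via .text
names = [c.get("name") for c in root.findall("country")]
values = [float(c.find("value").text) for c in root.findall("country")]
print(names, values)  # → ['Kenya'] [74.2]
```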
Using an API
APIs
• “Application Programming Interface”
• A way for one computer application to ask
another one for a service
–Usually “give me this data”
–Sometimes “add this to your datasets”
RESTful APIs
http://api.worldbank.org/countries/all/indicators/SP.RUR.TOTL.ZS?date=2000:2015&format=csv
• Base URL: api.worldbank.org
• What you’re asking for:
countries/all/indicators/SP.RUR.TOTL.ZS
• Details: date=2000:2015, format=csv
Using curl on the command line:
curl -X GET <URL>
Do this: try these URLs
• http://api.worldbank.org/countries/all/indicators/SP.RUR.TOTL.ZS?date=2000:2015&format=csv
• http://api.worldbank.org/countries/all/indicators/SP.RUR.TOTL.ZS?date=2000:2015&format=json
• http://api.worldbank.org/countries/all/indicators/SP.RUR.TOTL.ZS?date=2000:2015&format=xml
the Python Requests library
import requests
import json

worldbank_url = "http://api.worldbank.org/countries/all/indicators/SP.RUR.TOTL.ZS?date=2000:2015&format=json"
r = requests.get(worldbank_url)
jsondata = json.loads(r.text)
print(jsondata[1])
Request errors
Check r.status_code:
• 200: okay
• 400: bad request
• 401: unauthorised
• 404: page not found
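In code, check r.status_code before trusting a response. A minimal sketch (the helper name and fallback message are invented for illustration):

```python
# Status codes from the slide above; the helper name is hypothetical
STATUS_MESSAGES = {
    200: "okay",
    400: "bad request",
    401: "unauthorised",
    404: "page not found",
}

def describe_status(code):
    # Fall back to showing the raw code for anything not listed
    return STATUS_MESSAGES.get(code, "other ({})".format(code))

# e.g. after r = requests.get(url):
#     if r.status_code != 200:
#         print(describe_status(r.status_code))
```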
Requests with a password
import requests
r = requests.get('https://api.github.com/user',
                 auth=('yourgithubname', 'yourgithubpassword'))
dataset = r.text
• Note: GitHub now requires a personal access token in place of your account password for API authentication
PDF Scrapers
Scraping
• Data in files and webpages that’s easy for
humans to read, but difficult for machines
• Don’t scrape unless you have to
–Small dataset: type it in!
–Larger dataset: Look for datasets and APIs online
Development data is often in PDFs
Some PDFs can be Scraped
• Open the PDF file in Acrobat
• Can you cut-and-paste text in the file?
–Yes: use a PDF scraper
–No: the text is an image; you’ll need OCR first
PDF Table Scrapers
• Cut and paste to Excel
• Tabula: free, open source, offline
• Pdftables: not free, online
• CometDocs: free, online
Web Scrapers
Web Scraping
Design First!
What do you need to scrape?
● Which data values
● From which formats (html table, excel, pdf etc)
Do you need to maintain this?
● Is dataset regularly updated, or is once enough?
● How will you make updated data available to other people?
● Who could edit your code next year (if needed)?
Using Google Spreadsheets
• Open a Google spreadsheet
• Put this into cell A1:
=importHtml("http://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population", "table", 2)
Web scraping in Python
● Webpage-grabbing libraries:
o requests
o mechanize
o cookielib (http.cookiejar in Python 3)
● Element-finding libraries:
o beautifulsoup
Unpicking HTML with Python
url = "https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population"

import requests
from bs4 import BeautifulSoup

html = requests.get(url)
bsObj = BeautifulSoup(html.text, "html.parser")
tables = bsObj.find_all('table')
tables[0].find("th")
Getting data ready for science
Changing Data Formats
• Conversion websites
• Code:
import pandas as pd
df = pd.read_json("myfilename1.json")
df.to_csv("myfilename2.csv")
Normalising data
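A common meaning of normalising here is reshaping a wide table (one column per year) into a long one (one row per observation). A pure-Python sketch with invented figures:

```python
# Invented wide-format rows: one column per year
wide = [
    {"country": "Kenya", "2014": 74.8, "2015": 74.2},
    {"country": "Brazil", "2014": 14.5, "2015": 14.3},
]

# "Melt" into long (normalised) form: one row per country-year pair
long_rows = [
    {"country": row["country"], "year": year, "value": row[year]}
    for row in wide
    for year in ("2014", "2015")
]
print(len(long_rows))  # → 4
```

pandas does the same reshaping in one call with pd.melt.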
Books
• "Web Scraping with Python: Collecting Data from the
Modern Web", O'Reilly
Exercises
Prepare for next week
• Install Tableau
–See install instructions file
Prepare data
• Use your problem statement to look for datasets - what do
you need to answer your questions?
• If you can, convert your data into normalised CSV files
• Think about your data gaps - how can you fill them?
Editor's Notes
  • #2: Today we’re looking at the types of data that are hiding online, and how to bring them out of hiding and into your data science code.
  • #3: So let’s begin. Here are the 6 things we’ll talk about today.
  • #4: Your first problem is finding the data to help answer your questions.
  • #5: A quick recap: these are some of the places where you can find data. Some of them are harder to process than others, but they all contain data.
  • #6: And here are some places to find them - there’s a longer list in the references folder.
  • #7: Development data isn’t always easy to obtain: you might have to create your own, by asking people to contribute information to you through crowdsourcing, in-person surveys, mobile surveys etc.
  • #8: You might also need to generate data for your problem by using sensors.
  • #9: Selection bias = non-random selection of individuals. One example of this is pothole reporting: potholes are more generally reported in more-affluent areas, by people who have both the smartphone apps and the time and energy to report. Missing data = data that you don’t have. You need to be aware of this, and take account of it. If you need more persuading, read about Wald and the bullethole problem.
  • #10: There are many datafile types - here’s a guide to some of them.
  • #11: Tables typically have rows and columns; relational data is typically hierarchical, e.g. can’t be easily converted into row-column form.
  • #12: CSVs are the workhorse of datatypes: almost every data application can read them in.
  • #13: Converting JSON to CSV: use a conversion website (e.g. http://www.convertcsv.com/json-to-csv.htm), or write some Python code.
  • #14: Converting XML to CSV: use a conversion website (e.g. http://www.convertcsv.com/xml-to-csv.htm), or write code.
  • #15: One way to obtain data is through an application programming interface (API).
  • #16: More about open APIs: https://en.wikipedia.org/wiki/Open_API
  • #17: REST = Representational State Transfer: a human-readable way to ask APIs for information. At the top is a RESTful URL (web address); you can type this directly into an internet browser to get a datafile. This address has 3 parts: the base URL, api.worldbank.org; a description of what you’re looking for (in this case, the total rural population for all countries in the world); and some more details, including filters (only data between 2000 and 2015) and data formats. Try this address, and try “&format=json” instead of “&format=csv” at the end.
  • #20: The Python requests library is useful for calling APIs from a Python program (e.g. so you can then use or save the information returned from them). If anything goes wrong, try r.status_code. You’re maybe wondering how to get this JSON data into a file. Here’s the code for that:
import json
fout = open('mynewdata.json', 'w')
json.dump(jsondata, fout)
  • #21: See https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
  • #24: Here are places to look first: the website that data’s in, for file copies of the data; the website that data’s in, for an API (http://api.theirsitename.com/, http://theirsitename.com/api, Google “site:theirsitename.com api”); related sites, for file copies and APIs; community warehouses (scraperwiki.com, datahub.io etc.), for other peoples’ scrapers.
  • #25: Big PDFs. And we’ll need to get the data out of them. This is where PDF scrapers come in.
  • #29: Web scraping is the process of extracting data from webpages. If you open a webpage (e.g. https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_population) and click on “view source”, you’ll see the view that a computer has of that page. This is where the data is hiding…
  • #31: The pattern for this is: =importHtml("your-weburl", "table", yourtablenumber). More: www.mulinblog.com/basic-web-scraping-data-visualization-using-google-spreadsheets/
  • #32: You’ve already used the Requests library to grab data from the web. Mechanize and cookielib help with forms, logins and cookies.
  • #34: Your exercises were all built into the class. But if you want more…
  • #35: Most data science and visualisation programs can read CSV data, so if you can easily convert data to that, good. There are websites that will convert to csv; you can also do this by reading data in one format, and writing it out in another. The Pandas library is very helpful for reading in one format, and writing in another, if the data is row-column.
  • #36: We’ll cover data cleaning later, but if you want to try next week’s visualisation techniques on your own data, it will need to at least be normalised. Here’s what we mean by this (and Tableau has a tool for doing this: see http://kb.tableau.com/articles/knowledgebase/denormalize-data).