DATA GATHERING WITH
THE WEB OBSERVATORY
EUGENE SIOW & XIN WANG
29 JANUARY 2016
WHAT IS THE WEB OBSERVATORY?
SEARCH + ACCESS
DATA | WO | APP
OPEN
PRIVATE
NoSQL
SQL
STREAMS
LINKED DATA
JS
PYTHON
NODE
GATHERING DATA WITH SCRAPING
transform( DATA ON WEBPAGES ) → DATA I CAN USE
THE PROCESS OF SCRAPING
INVESTIGATE THE STRUCTURE OF THE PAGE
CHECK IF THERE IS AN API (APPLICATION PROGRAMMING INTERFACE)
USE CHROME’S INSPECTOR OR FIREBUG
EXTRACT, TRANSFORM, LOAD
WHAT IS YOUR DESIRED END FORMAT?
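The extract, transform, load steps above can be sketched in Python using only the standard library. This is a minimal illustration, not the method used in the hands-on resources: the sample HTML table, the field names `city` and `temp_c`, and the helper names `extract`, `transform` and `load` are all assumptions made for the example. In practice the HTML would come from a downloaded page whose structure was first checked in the browser inspector.

```python
import csv
import io
import json
from html.parser import HTMLParser

# Hypothetical page fragment standing in for a downloaded web page.
SAMPLE_HTML = """
<table id="readings">
  <tr><th>City</th><th>Temp (C)</th></tr>
  <tr><td>Southampton</td><td> 7.5 </td></tr>
  <tr><td>London</td><td>6</td></tr>
</table>
"""


class TableExtractor(HTMLParser):
    """Extract: collect the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract(html):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def transform(rows):
    """Transform: skip the header row and convert temperatures to floats."""
    _header, *body = rows
    return [{"city": city, "temp_c": float(temp)} for city, temp in body]


def load(records, fmt="json"):
    """Load: serialise the records to the desired end format."""
    if fmt == "json":
        return json.dumps(records, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=["city", "temp_c"])
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


records = transform(extract(SAMPLE_HTML))
print(load(records, "json"))
```

Choosing the end format first (here JSON or CSV) keeps the transform step small: it only has to produce the fields that the target format needs.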
HANDS-ON RESOURCES
codepen.io/xgfd/pen/wMyQWb 
github.com/eugenesiow/datathon2016/wiki
DATA-DRIVEN APPS USING THE WO
DATA GATHERING
webobservatory.soton.ac.uk
THE SOTON WEB OBSERVATORY
BACKGROUNDS FROM THE HUBBLE SPACE TELESCOPE