The Fourth Paradigm - Deltares Data Science Day, 31 October 2014
$$\rho \frac{Dv}{Dt} = -\nabla p + \nabla \cdot \mathbf{T} + \mathbf{f}$$
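For readers less familiar with the notation, the left-hand side of the momentum balance on this slide uses the material derivative. Its standard expansion is given below. Reading T as a stress tensor and f as a body force per unit volume is an interpretation, since the slide does not label its symbols.

```latex
\frac{Dv}{Dt} = \frac{\partial v}{\partial t} + (v \cdot \nabla)\, v
```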
Data 
Acquisition & modelling 
Collaboration and visualisation 
Analysis & data mining 
Dissemination & sharing 
Archiving and preserving 
fourthparadigm.org 
Data-intensive Research
X-Info 
• Data ingest 
• Managing a petabyte 
• Common schema 
• How to organize it 
• How to reorganize it 
• How to share with others 
• Query and Vis tools 
• Building and executing models 
• Integrating data and Literature 
• Documenting experiments 
• Curation and long-term preservation 
The Generic Problems 
Experiments & 
Instruments 
Simulations 
Literature 
Other Archives 
facts 
facts 
facts 
facts 
Questions 
Answers
All Scientific Data Online 
•Many disciplines overlap and use data from other sciences. 
•Internet can unify all literature and data 
•Go from literature to computation to data back to literature. 
•Information at your fingertips – 
For everyone, everywhere 
•Increase Scientific Information 
Velocity 
•Huge increase in Science Productivity 
(From Jim Gray’s last talk) 
Literature 
Derived and recombined data 
Raw data
Gartner: http://t.co/Co3EK1ERfN
Manual Measurement 
Automated Measurement 
Sample Collection 
Historical Photographs 
Counting 
Ubiquitous 
Motes 
Aircraft Surveys 
Model Output 
Typing
Monitoring 
Collation 
Quality assurance 
Aggregation 
Analysis 
Reporting 
Forecasting 
Distribution 
Done poorly, but a few notable counter-examples 
Done poorly to moderately, not easy to find 
Sometimes done well, generally discoverable and available, but could be improved 
Integration 
(I. Zaslavsky & CSIRO, BOM, WMO)
Web search: 
“open weather data azure”
Water depth map of London (~130 km²). Storm event of 60 minutes with a 100-year return period. http://www.ncl.ac.uk/ceser/researchprogramme/informatics/citycaturbanfloodmodel/
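The return period quoted in the caption can be read as an annual exceedance probability of one over the return period. A minimal sketch of that arithmetic follows, assuming independent years (a common simplification, not something the slide states). The function name and the 30-year horizon are illustrative only.

```python
def exceedance_probability(return_period_years: float, horizon_years: int) -> float:
    """Probability of at least one exceedance within the horizon.

    Assumes each year is independent with annual probability 1 / T.
    """
    if return_period_years < 1:
        raise ValueError("return period must be at least one year")
    annual_p = 1.0 / return_period_years
    # Complement of "no exceedance in any of the years".
    return 1.0 - (1.0 - annual_p) ** horizon_years


if __name__ == "__main__":
    p = exceedance_probability(100, 30)
    print(f"Chance of at least one 100-year event in 30 years: {p:.1%}")
```

Under this assumption a "100-year" storm is far from a once-a-century certainty: over a 30-year horizon the chance of seeing at least one is roughly one in four.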
http://www.fetchclimate.org/
Parker MacCready: Univ. of Washington 
Rob Fatland, Wenming Ye, Nels Oscar: Microsoft Research
Numerical model of 3-D ocean currents and water properties 
•salinity, 
•temperature, 
•biogeochemistry 
Relies on external data sources: 
•Bathymetry 
•Wind and heating 
•Open-ocean boundary conditions (BCs) 
•Tides 
•Rivers
Model Validation 
Comparisons are done to an extensive suite of in-situ observations 
•sea surface height 
12 NOAA tide gauges 
•salinity and temperature 
over 2000 CTD casts from ECOHAB, RISE, DOE, NANOOS, Hood Canal, IOS, King County, and NOAA 
•velocity and moored S,T 
7 coastal ADCP / CTD moorings from the ECOHAB and RISE projects, 2 moorings from IOS
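Comparisons of this kind are usually summarised with a few skill statistics. The slides do not say which metrics the authors use, so the following is only a generic sketch of bias, RMSE, and Pearson correlation for paired model and observed values. The function name and the sample numbers are made up for illustration.

```python
import math
from typing import Sequence


def skill_metrics(model: Sequence[float], observed: Sequence[float]) -> dict:
    """Return bias, RMSE and Pearson correlation for paired samples."""
    if len(model) != len(observed) or len(model) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(model)
    diffs = [m - o for m, o in zip(model, observed)]
    bias = sum(diffs) / n                              # mean model-minus-obs error
    rmse = math.sqrt(sum(d * d for d in diffs) / n)    # root-mean-square error
    mean_m = sum(model) / n
    mean_o = sum(observed) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, observed))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_m == 0 or var_o == 0:
        corr = float("nan")                            # undefined for constant series
    else:
        corr = cov / math.sqrt(var_m * var_o)
    return {"bias": bias, "rmse": rmse, "corr": corr}


if __name__ == "__main__":
    # Made-up temperatures (deg C), purely to exercise the function.
    modelled = [10.2, 11.0, 12.1, 12.9]
    measured = [10.0, 11.0, 12.0, 13.0]
    print(skill_metrics(modelled, measured))
```

In practice the model output would first be interpolated to each tide gauge, CTD cast, or mooring location and time before the paired series are compared.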
Interactive 3-D Model Visualization using WorldWideTelescope, Narwhal and Layerscape 
www.layerscape.
EH4 32 m 
Figure from SA Siedlecki, UW/JISAO; Observations from Connolly et al., 2010 
Validation: Dissolved Oxygen & Temperature
LiveOcean: System Architecture 
HPC 
Linux, 150 cores 
Forecast 
NetCDF files 
LiveOcean 
Server 
•Post Processing 
•Pre-make .png "views" 
•Archive NetCDF files 
•API for web sites 
•Admin.js 
•Client.js 
Blob Storage: 
Forecast Copy 
Science User 
Python 
Azure Table: 
Log Info 
Admin Website 
Client Website 
http://mappable.azurewebsites.net/liveocean/ 
Rivers 
USGS 
Atmosphere 
UW WRF 
Ocean 
HYCOM
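The post-processing step that pre-makes .png views suggests that images are rendered ahead of time, so the client website only has to fetch static files. A minimal sketch of how such a catalogue of views might be enumerated is shown below. The variable names, depth levels, forecast hours, and path layout are all hypothetical and do not come from the slides.

```python
from datetime import datetime, timedelta
from itertools import product
from typing import Iterator, Sequence


def view_paths(
    forecast_start: datetime,
    hours: Sequence[int],
    variables: Sequence[str],
    depths_m: Sequence[int],
) -> Iterator[str]:
    """Yield one relative .png path per (time, variable, depth) view.

    The layout <run date>/<variable>/<depth>m/<valid time>.png is an
    assumption made for this sketch, not the LiveOcean layout.
    """
    run = forecast_start.strftime("%Y%m%d")
    for hour, var, depth in product(hours, variables, depths_m):
        valid = forecast_start + timedelta(hours=hour)
        yield f"{run}/{var}/{depth}m/{valid:%Y%m%dT%H}.png"


if __name__ == "__main__":
    start = datetime(2014, 10, 31)
    for path in view_paths(start, [0, 24], ["salinity", "temperature"], [0]):
        print(path)
```

Enumerating the views up front keeps rendering on the server side and lets the same list drive both the image generation and the upload to storage.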
http://mappable.azurewebsites.net/liveocean
Cloud 
Big data 
Aggregation 
Machine 
Learning 
Analytics
The Cloud 
democratizes 
access to scale & 
economies of scale
Commodity at Scale
http://azure.microsoft.com/
http://github.com/windowsazure
Research Cloud Ecosystem
www.azure4research.com
Use laptops & desktop computers 
Overwhelmed by data 
Finding analysis ever more difficult; sharing even harder 
www.azure4research.com