Sharing Big Data - Bob Jones
Sharing big data
15 June 2017
Bob Jones
CERN
Bob.Jones <at> cern.ch
Helix Nebula – The Science Cloud (Grant Agreement 687614) is a Pre-Commercial Procurement Action funded by the H2020 Framework Programme.
Accelerating Science and Innovation
Data in High-Energy Physics
Based on DPHEP Study Group (2009). Data Preservation in High Energy Physics. http://arxiv.org/abs/0912.0255
Patricia Herterich
EPFL & SDSC visit 2017-03-24
CERN Open Data Portal
• 2015: 40 TB of 2010 data
• 2016: 320 TB of 2011 data
• Curation and release of:
  • Simulated data (MC)
  • Trigger information
  • Configuration files
http://github.com/cernopendata
Barend Mons, Leiden University Medical Center
In the FAIR Data approach, data should be:
• Findable – Easy to find by both humans and computer systems, based on a mandatory metadata description that allows the discovery of interesting datasets
• Accessible – Stored for the long term so that they can be easily accessed and/or downloaded under well-defined license and access conditions (Open Access when possible), whether at the level of the metadata or at the level of the actual data content
• Interoperable – Ready to be combined with other datasets by humans as well as computer systems
• Reusable – Ready to be used for future research and to be processed further using computational methods
https://www.dtls.nl/fair-data/
Peter Doorn, Director DANS
https://www.force11.org/group/fairgroup/fairprinciples
27/06/2017
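To make the Findable and Accessible criteria a little more concrete, here is a minimal Python sketch, assuming a repository that refuses to publish a dataset until a fixed set of metadata fields is filled in, and that offers a naive keyword search over that metadata. The class `DatasetRecord`, the field names, `MANDATORY_FIELDS` and the helpers are hypothetical and chosen purely for illustration; they do not reproduce the schema or API of Zenodo, the CERN Open Data Portal or any metadata standard.

```python
# Illustrative sketch only: DatasetRecord, its field names and the helper
# functions are hypothetical, not taken from any real repository or standard.
from dataclasses import dataclass, field

# Fields this hypothetical repository treats as mandatory: descriptive
# metadata (Findable) plus license and access conditions (Accessible).
MANDATORY_FIELDS = ("identifier", "title", "creators", "description", "license", "access")


@dataclass
class DatasetRecord:
    identifier: str = ""      # persistent identifier (e.g. a DOI in practice)
    title: str = ""
    creators: list[str] = field(default_factory=list)
    description: str = ""
    keywords: list[str] = field(default_factory=list)
    license: str = ""         # well-defined license for reuse
    access: str = ""          # access condition, e.g. "open" or "restricted"


def missing_mandatory_fields(record: DatasetRecord) -> list[str]:
    """Return the mandatory fields that are empty, in MANDATORY_FIELDS order."""
    return [name for name in MANDATORY_FIELDS if not getattr(record, name)]


def find(records: list[DatasetRecord], keyword: str) -> list[DatasetRecord]:
    """Naive discovery: match the keyword against title, description and keywords."""
    kw = keyword.lower()
    return [
        r for r in records
        if kw in r.title.lower()
        or kw in r.description.lower()
        or any(kw == k.lower() for k in r.keywords)
    ]


if __name__ == "__main__":
    draft = DatasetRecord(title="Dimuon events", description="Example collision events")
    print(missing_mandatory_fields(draft))  # ['identifier', 'creators', 'license', 'access']
```

The point of the sketch is that discovery (`find`) only works as well as the metadata it searches, which is why the FAIR description makes that metadata mandatory rather than optional.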
The Hybrid Cloud Model
Brings together
• research organisations,
• data providers,
• publicly funded e-infrastructures,
• commercial cloud service providers
in a hybrid cloud with procurement and governance approaches suitable for the dynamic cloud market
[Diagram label: In-house]
Data Commons is a platform that fosters the development of a digital ecosystem
Treats the products of research (data, software, methods, papers, training materials, etc.) as digital assets (objects)
Digital objects need to conform to the FAIR principles
- Findable, Accessible, Interoperable, Reusable
Digital objects exist in a shared virtual space (initially)
- Find, deposit, manage, share and reuse digital assets
Enables interactions between producers and consumers of digital assets
Gives currency to digital assets and the people who develop and support them
Philip E. Bourne, Ph.D. FACMI
Associate Director for Data Science
National Institutes of Health, USA
Data Commons Pilot – connecting the pieces
Co-location of large and/or highly utilized NIH-funded data on the cloud, plus commonly used tools for analyzing and sharing digital objects, to create an interoperable resource for the research community.
Investigators will be able to collaborate and share digital objects within this environment and connect with others.
Impact
Biggest issuer of DOIs for software in the world
Reference material for publications: F1000, Wiley, eLife, PLoS, Elsevier, Nature, etc.
Recommended by the EC and national programmes
https://www.zenodo.org/
Summary
Sharing big data needs technology, processes & organisation, and people
The FAIR principles represent best practice: Findable, Accessible, Interoperable, Reusable
Research communities around the world are developing science commons to accelerate the sharing of digital assets
