EOSC-hub receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 777536.
eosc-hub.eu
@EOSC_eu
Baptiste Grenier / Enol Fernández
EGI Foundation
Open Data analysis with EOSC-hub services
Dissemination level: Public
Thanks to the EOSC-hub distributed team!
Onedata and DataHub: Lukasz Dutka, Lukasz Opiola, Bartosz Kryza, Michal Orzechowski
EGI FedCloud provider: Boris Parak, Miroslav Ruda, Zdenek Sustr
EGI Check-in: Nicolas Liampotis
B2HANDLE: Kyriakos Ginis
B2FIND: Tobias Weigel, Claudia Martens
What do we want to do?
• Several of the use cases in EOSC-hub will enable scientific end-users to perform data analysis experiments on large volumes of data by exploiting a PID-enabled, server-side, parallel approach.
• Users expect easy-to-use interfaces, such as Jupyter Notebooks, for interacting with the system.
• Results should be reusable and follow the FAIR guidelines: Findability, Accessibility, Interoperability, and Reusability.
How?
● Analysis
  ○ Notebooks / JupyterLab
  ○ FedCloud resources
● Data management
  ○ DataHub / Onedata
    ■ Space
    ■ Onezone
    ■ Oneprovider
    ■ Oneclient
● AAI (OIDC)
  ○ Check-in
● PID management
  ○ B2HANDLE
  ○ Handle.net
● Cataloguing and discovery
  ○ B2FIND
Lessons Learned
● Integrating multiple services from the EOSC-hub catalogue to build a new solution is worth the effort
  ○ Self-service APIs let you combine services without overhead, although some steps still cannot be automated
  ○ Support channels with providers are lifesavers while prototyping
● The setup still needs to be validated for production with a real research community
● Aim for a completely integrated solution that people can reuse
  ○ Provide Python modules for easy interaction with services
  ○ Expand the EGI Notebooks service
  ○ Ensure that all required operations can be done using API calls
Enabling reproducibility with Notebooks
[Diagram. Elements: your laptop, your GitHub repository, EGI Notebooks services, MyBinder.org, Zenodo, data repository, journal paper, DOI, fellow researchers. Labelled steps: create repository; upload ipynb file; add requirements.txt; execute; re-execute; obtain GitHub project reference; provide GitHub project reference; discover notebook (use DOI).]
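The re-execution step through MyBinder.org can be illustrated with a minimal sketch that builds a Binder launch link for a GitHub repository. The repository owner, repository name and notebook name used here are hypothetical placeholders, and the sketch assumes the public mybinder.org `v2/gh` URL scheme with the `labpath` query parameter; it is not taken from the slides.

```python
from typing import Optional
from urllib.parse import quote


def binder_url(owner: str, repo: str, ref: str = "HEAD",
               notebook: Optional[str] = None) -> str:
    """Build a mybinder.org launch URL for a GitHub repository.

    owner/repo/ref identify the repository and the branch, tag or commit.
    notebook, if given, is a path inside the repository to open on launch.
    """
    url = "https://mybinder.org/v2/gh/{}/{}/{}".format(
        quote(owner, safe=""), quote(repo, safe=""), quote(ref, safe="")
    )
    if notebook:
        # Assumption: open the notebook in JupyterLab through 'labpath'.
        url += "?labpath=" + quote(notebook)
    return url


# Hypothetical repository and notebook names, for illustration only.
example = binder_url("example-user", "example-repo", "main", "analysis.ipynb")
print(example)
```

Binder builds the environment from files such as requirements.txt in the repository, which is why the workflow adds that file next to the ipynb file.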
An Open Science story we aim for…
[Diagram. Elements: your laptop, your GitHub repository, EGI Notebooks and Binder service, Zenodo, data repository, journal paper, DOI, fellow researchers, distributed big data (DataHub, B2DROP, etc.). Labelled steps: create repository; upload ipynb file; add requirements.txt; execute; obtain GitHub project reference; provide GitHub project reference; discover notebook (use DOI).]
Links
- Onedata
▪ https://onedata.org
- EGI DataHub
▪ https://datahub.egi.eu - http://egi-datahub.readthedocs.io/
- EGI Notebooks
▪ https://www.egi.eu/services/notebooks/ - https://notebooks.egi.eu/
- EGI Check-in
▪ https://www.egi.eu/services/check-in/ - https://wiki.egi.eu/wiki/AAI
- B2FIND
▪ https://eudat.eu/services/b2find - http://eudat7-ingest.dkrz.de/
- B2HANDLE
▪ https://eudat.eu/services/b2handle - https://hdl.grnet.gr:8001/api/handles
- Binder
▪ https://mybinder.org
Thank you for your attention!
Questions?
Contact
Enol Fernández - enol.fernandez@egi.eu
Baptiste Grenier - baptiste.grenier@egi.eu
This material by Parties of the EOSC-hub Consortium is licensed under a Creative Commons Attribution 4.0 International License.
Demonstration flow
1. Authenticating to DataHub using Check-in: https://datahub.egi.eu
   a. Showing content of space
2. Authenticating to Notebooks using Check-in: https://cs3.fedcloud-tf.fedcloud.eu
   a. Showing content of mounted space
   b. Running wind cast analysis notebook
   c. Running PID registration notebook to share and publish notebooks directory
3. B2FIND cataloguing (data collected on a regular basis): http://eudat7-ingest.dkrz.de/dataset?groups=egidatahub
4. OAI-PMH metadata in DataHub: http://datahub.egi.eu/oai_pmh?verb=ListRecords&metadataPrefix=oai_dc
5. PID in Handle.net registry: http://hdl.handle.net/
6. PID pointing to shared data publicly accessible in Onedata
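The OAI-PMH step of the flow can be sketched in Python. The sketch below rebuilds the ListRecords request URL shown in the demonstration and parses a small response offline with the standard library. The sample XML record, its identifier and its title are invented for illustration and are not output from DataHub; only the endpoint, verb and metadata prefix come from the slide.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Endpoint as given in the demonstration flow.
OAI_ENDPOINT = "http://datahub.egi.eu/oai_pmh"

# Standard OAI-PMH 2.0 and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(endpoint, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return endpoint + "?" + query


def extract_records(xml_text):
    """Return the identifier and Dublin Core titles of each record."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        titles = [t.text for t in record.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    return records


# Hypothetical response fragment, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:record-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical wind dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url(OAI_ENDPOINT))
print(extract_records(SAMPLE))
```

A harvester such as B2FIND would issue this kind of request on a schedule and follow resumption tokens for large result sets, which this sketch leaves out.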
DataHub/Onedata: login with Check-in (OIDC)
Check-in: IdP selection and authentication
IdP: information release consent
Check-in: entitlements forwarded to the service
DataHub: displaying spaces and providers
DataHub: user space content
Notebooks: login with Check-in (OIDC)
Notebooks: JupyterHub environment
Notebooks: Onedata space mounted locally
Notebooks: wind casting using public dataset
Notebooks: publishing data with PID using APIs
Notebooks: sharing directory, minting PID
B2FIND: discovery of harvested OAI-PMH metadata
B2FIND: displaying an entry
DataHub: displaying OAI-PMH metadata
Handle.net: the PID in the registry
DataHub: the published dataset, from the PID
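The last two steps, going from a PID in the Handle.net registry to the published dataset, can be sketched as follows. The sketch assumes the JSON format returned by the Handle.net proxy REST interface (`/api/handles/<handle>`), and it parses a hand-written sample response instead of calling the network. The handle and the target URL are hypothetical and do not refer to a real dataset.

```python
import json

# Public Handle.net proxy; the /api/handles/ path is its REST interface.
HANDLE_PROXY = "https://hdl.handle.net"


def handle_api_url(handle):
    """Return the proxy REST URL that describes a handle."""
    return "{}/api/handles/{}".format(HANDLE_PROXY, handle)


def target_url(response_text):
    """Extract the URL a handle points to, or None if there is none."""
    payload = json.loads(response_text)
    # responseCode 1 means the handle was resolved successfully.
    if payload.get("responseCode") != 1:
        return None
    for value in payload.get("values", []):
        if value.get("type") == "URL":
            return value["data"]["value"]
    return None


# Hypothetical handle and response, for illustration only.
SAMPLE_HANDLE = "20.500.12345/example-dataset"
SAMPLE_RESPONSE = json.dumps({
    "responseCode": 1,
    "handle": SAMPLE_HANDLE,
    "values": [
        {
            "index": 1,
            "type": "URL",
            "data": {"format": "string",
                     "value": "https://example.org/share/abc"},
        }
    ],
})

print(handle_api_url(SAMPLE_HANDLE))
print(target_url(SAMPLE_RESPONSE))
```

In the demonstrated setup the URL value would be the public Onedata share, so anyone holding the PID can reach the data without knowing where it is stored.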
