eXtreme DataCloud is co-funded by the Horizon2020
Framework Program – Grant Agreement 777367
Copyright © Members of the XDC Collaboration, 2017-2020
Data Management for extreme scale computing
CTA USE CASE
Review
Gillardo Frédéric
gillardo@lapp.in2p3.fr
11/04/2019
CTA Data Workflow
[Diagram. Labels: Canary Island (Online Process); Chile (Online Process); DL0/Bulk Archive; Calib; Reconst; DL3 / High level data; Scientist Interface]
Use Case Goals
• Objectives:
  • Archive PBs of data using an ingest method
  • Index files using the metadata contained in the file headers
  • Query files using metadata parameters
• XDC Services Requirements:
  • Onedata:
    • Metadata management
    • QoS (policy definition)
[Diagram. Labels: Producer; Ingest (DL0); Preprocessing; Metadata database; Data management; Archive; Consumer; Query (Metadata); List of filenames; Retrieve (fileName); files]
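The ingest and query path named in the goals above can be sketched as a toy in-memory index. This is only an illustration under stated assumptions: the class `MetadataIndex`, the file names, and the metadata keys (`site`, `data_level`, `run`) are hypothetical and do not come from the slides or from the Onedata API.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataIndex:
    """Toy in-memory stand-in for the metadata database (hypothetical)."""

    records: dict[str, dict[str, object]] = field(default_factory=dict)

    def ingest(self, file_name: str, header: dict[str, object]) -> None:
        # Preprocessing step: store a copy of the header metadata under the file name.
        self.records[file_name] = dict(header)

    def query(self, **criteria: object) -> list[str]:
        # Return the sorted names of files whose metadata matches every criterion.
        return sorted(
            name
            for name, meta in self.records.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        )


# Hypothetical files and header values, for illustration only.
index = MetadataIndex()
index.ingest("run_0001.h5", {"site": "north", "data_level": "DL0", "run": 1})
index.ingest("run_0002.h5", {"site": "south", "data_level": "DL0", "run": 2})
index.ingest("run_0003.h5", {"site": "north", "data_level": "DL3", "run": 3})

matches = index.query(site="north", data_level="DL0")
print(matches)
```

The consumer side would then pass each returned file name to the retrieve step; a real deployment would replace the dictionary with the metadata service used by the archive.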
User Stories
As an “Archive Manager”, I can configure the system to extract metadata during the preprocessing operation.
Related with:
As an “Archive Manager”, I can define rules based on metadata to archive files on tape only, or on disk and on tape, at ingest time.
As an “Archive Manager”, I can define rules based on metadata to duplicate files archived on tape onto low-latency storage so that they can be retrieved quickly.
As an “Archive Manager”, I can define rules based on metadata to create replicas in a specific data center at ingest time.
As an “Archive Manager”, I can define rules based on metadata to prevent deletion of files.
As an “Archive Manager”, I can delete files based on metadata values.
As an “Archive User”, I can make a query based on metadata parameters to get a list of logical file names.
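The metadata-based rules in these stories could be modelled roughly as follows. This is a minimal sketch, not the Onedata QoS syntax: the `Rule` type, the storage class names (`"disk"`, `"tape"`), the `data_level` key, and the first-match-wins policy are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable

Metadata = dict[str, object]


@dataclass(frozen=True)
class Rule:
    """Hypothetical placement rule evaluated at ingest time."""

    name: str
    matches: Callable[[Metadata], bool]
    targets: tuple[str, ...]  # storage classes the file is written to


# Hypothetical rule set; the first matching rule wins.
RULES = [
    Rule(
        "high-level data on disk and tape",
        lambda m: m.get("data_level") == "DL3",
        ("disk", "tape"),
    ),
    Rule("default: tape only", lambda m: True, ("tape",)),
]


def placement(meta: Metadata, rules: list[Rule] = RULES) -> tuple[str, ...]:
    # Walk the rules in order and return the targets of the first match.
    for rule in rules:
        if rule.matches(meta):
            return rule.targets
    raise LookupError("no rule matched")


print(placement({"data_level": "DL3"}))
print(placement({"data_level": "DL0"}))
```

Keeping the rules as data rather than code paths means an archive manager could add a replica or deletion-protection rule without touching the ingest logic.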
Onedata installation at LAPP and CCIN2P3
[Diagram. Labels, in slide order: POSIX 40TB; LAPP-PROVIDER; XDC-DEMO1; S3 4TB; CC-PROVIDER-02; POSIX 6TB]
Use Case Architecture
[Diagram. Labels, in slide order: Query by Metadata; Restquery; Query by LFM; File System; preprocessing; CTA HDF5; Extraction Metadata; ingestFile; Generator]
Conclusion
What has been demoed:
• Metadata index creation using the REST API
• Deployment of the “preprocessing” service using Docker Compose
• Ingestion of HDF5 files
• Query of a File ID using the HDF5 header
• File retrieval using the CDMI interface and the File ID
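The last step, retrieving a file by its File ID over CDMI, might be prepared along these lines. This is a sketch under assumptions: the `cdmi_objectid` addressing and the `X-CDMI-Specification-Version` header follow the CDMI specification, while the `/cdmi` path prefix, the `X-Auth-Token` header, the host name, the token, and the File ID value are hypothetical and depend on the actual deployment. The function only builds the request; it does not contact any server.

```python
from urllib.parse import quote


def cdmi_request_by_id(host: str, file_id: str, token: str) -> tuple[str, dict[str, str]]:
    """Build the URL and headers for a CDMI read addressed by object ID."""
    # Percent-encode the ID so it is safe to place in a URL path segment.
    url = f"https://{host}/cdmi/cdmi_objectid/{quote(file_id, safe='')}"
    headers = {
        # Version header defined by the CDMI specification.
        "X-CDMI-Specification-Version": "1.1.1",
        # Assumed authentication header; check the provider's documentation.
        "X-Auth-Token": token,
    }
    return url, headers


# Hypothetical host, File ID and token, for illustration only.
url, headers = cdmi_request_by_id("provider.example.org", "0000ABCD", "secret-token")
print(url)
```

The returned pair can be handed to any HTTP client to issue the GET request.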

XDC demo: CTA
