How to expose research data in EOSC
The “EOSC Dataset Minimum Information” (EDMI) approach
EUDAT conference, January 22-25
Rafael C Jimenez
ELIXIR & EOSCpilot
Data resources in life science
1800 databases
• Diverse
• Plentiful
• Dispersed
Source: MY Galperin, GR Cochrane. Nucleic Acids Research annual Database Issue and the NAR online Molecular Biology Database Collection in 2012. Nucleic Acids Research, 2011.
Challenges
[Diagram: several data resources, each offering datasets through its own interface, used by users + services]
Many, different & variable …
• Sustainability
• Findability
• Accessibility
• Consistency
• Interoperability
• Completeness
• Redundancy
• Reusability
• Integration
• Replication
• Compute
• …
Pilot action for EOSC
To demonstrate how to ensure availability of scientific data and data-analysis services through a cloud infrastructure and design a stakeholder-driven governance framework
Lorenza Saracco and Carmela Asero | European Commission, DG Research & Innovation | Pisa, 14-15 September 2017
[Diagram: EOSC, comprising data, services, access and governance]
EOSCpilot data interoperability
To demonstrate how to ensure availability of scientific data and data-analysis services to users and services through a cloud infrastructure and design a stakeholder-driven governance framework
Scope
The work of this working group and its recommendations
Defined by 13 guiding principles
Reuse: Leverage the rich legacy of Research Infrastructures
• Making data FAIR is the responsibility of the Research Infrastructures and their data repositories
• We must rely on research infrastructure data catalogues
• We must support an ecosystem of catalogues
• We should provide metadata quality recommendations to feed back to RIs
Least: The least possible metadata for the most benefit
• Findability should come first
• Common and minimum metadata
• Focus on common data types: datasets and data repositories
• Flexible metadata models to embrace domain specifics
• Service requirements and operational metadata as first-class citizens
Practical: Sustainable and pragmatic delivery
• Engage EOSC demonstrator data repositories
• Propose methods to expose metadata
• Simple to implement, easy to sustain
• Deliver guidelines and demonstrators
Results from RDA-Bluebridge and OpenFair2017 workshops
Reuse: Leverage the rich legacy of Research Infrastructures
1. Making data FAIR is the responsibility of the Research Infrastructures and their data resources
2. We must rely on research infrastructure data resources and data catalogues
3. We must support an ecosystem of catalogues
4. We should provide recommendations to improve metadata quality
Some catalogues in life sciences
Catalogued resource types: Databases, Datasets, Publications, Tools, Training, Events, Ontologies, Standards, Samples
Catalogues named on the slide: OmicsDI, Identifiers.org, Bio.tools, Biojs, Biocontainers, PSICQUIC, STM
Ecosystem of catalogues
[Diagram: catalogues arranged by scope (Generic, Research, Life Sciences, Omics sciences), with Google, re3data and OmicsDI as examples; axis labelled "- / + metadata richness"; arrows labelled "reuse"]
Ecosystem of catalogues
[Diagram: catalogues ranging from specific to generic, each exposing EDMI and importing metadata from one another; axis labelled "- / + metadata submission and curation"]
Kimmo Koski, Managing Director, CSC – IT Center for Science & EUDAT Coordinator
Least: The least possible metadata for the most benefit
5. Findability and accessibility should come first
6. Common and minimum metadata for finding and accessing data
7. Reuse existing metadata models
8. Focus on common data types: datasets and data resources
9. Service requirements and operational metadata as first-class citizens
[Diagram: Schema A, Schema B, Schema C, Schema D]
• Common and minimum metadata for finding and accessing data
• Not aiming to be descriptive
• Reuse existing metadata models
• Crosswalk across existing models
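A crosswalk of this kind can be sketched in a few lines of Python. The EDMI property names come from the slides. The schema.org and DCAT/Dublin Core terms are real vocabulary terms, but their pairing with EDMI properties here is an assumption made for illustration, not the crosswalk published by the working group.

```python
# Hypothetical crosswalk: EDMI property -> term in an existing model.
# The pairings are assumptions for illustration, not an official mapping.
CROSSWALK = {
    "schema.org": {
        "name": "name",
        "description": "description",
        "identifier": "identifier",
        "url": "url",
        "creator": "creator",
        "dateCreated": "dateCreated",
        "license": "license",
        "dateModified": "dateModified",
    },
    "dcat": {
        "name": "dct:title",
        "description": "dct:description",
        "identifier": "dct:identifier",
        "url": "dcat:landingPage",
        "creator": "dct:creator",
        "dateCreated": "dct:created",
        "license": "dct:license",
        "dateModified": "dct:modified",
    },
}


def to_edmi(record: dict, model: str) -> dict:
    """Translate a record expressed in `model` into EDMI property names.

    Fields without a crosswalk entry are dropped, in line with the aim
    of being minimal rather than descriptive.
    """
    mapping = CROSSWALK[model]
    return {
        edmi_name: record[source_name]
        for edmi_name, source_name in mapping.items()
        if source_name in record
    }


# Made-up DCAT-style record used only for this sketch.
dcat_record = {
    "dct:title": "Example proteomics dataset",
    "dcat:landingPage": "https://example.org/datasets/1",
    "dct:spatial": "Europe",  # no EDMI counterpart, so it is dropped
}
edmi_record = to_edmi(dcat_record, "dcat")
print(edmi_record)
```

Keeping the mapping as plain data means a catalogue would only need to add one dictionary per supported model, with no change to its own schema.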
Finding and accessing datasets and data resources
[Diagram: Data resource 1..* Dataset 1..* Records]
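One reading of the cardinalities in the diagram is that a data resource holds one or more datasets and a dataset holds one or more records. The sketch below encodes that reading with Python dataclasses. The class and field names are hypothetical and are not part of EDMI.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    identifier: str


@dataclass
class Dataset:
    name: str
    records: List[Record] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the assumed "1..*" cardinality between dataset and records.
        if not self.records:
            raise ValueError("a dataset needs at least one record (1..*)")


@dataclass
class DataResource:
    name: str
    datasets: List[Dataset] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the assumed "1..*" cardinality between resource and datasets.
        if not self.datasets:
            raise ValueError("a data resource needs at least one dataset (1..*)")


# Made-up example values.
resource = DataResource(
    name="Example resource",
    datasets=[Dataset(name="Study 1", records=[Record("R1"), Record("R2")])],
)
record_count = sum(len(d.records) for d in resource.datasets)
print(record_count)
```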
Scientific & Services metadata requirements for finding and accessing datasets
[Diagram labels: Scientific, Functional, Researchers; File, Operational, Services]
Practical: Sustainable and pragmatic delivery
10. Engage EOSC demonstrator data repositories
11. Propose methods to expose metadata
12. Simple to implement, easy to sustain
13. Deliver guidelines and demonstrators
EDMI guidelines
• EOSC Dataset Minimum Information Guidelines
• 12 metadata properties
• Facilitating finding and accessing datasets
• Not a data model, but a crosswalk to existing dataset models
• Focus on scientific and service requirements
• Compatible with existing data models and interfaces
• Provides pointers to the dataset records and references to their data standards and access interfaces
• Not mandatory, but a minimum, with the goal of improving the status quo
• One way to measure FAIRness of datasets and data resources
EDMI metadata properties

Levels: M/F = Minimum Functional, M/O = Minimum Operational, R/F = Recommended Functional, R/O = Recommended Operational, O/F = Optional Functional, O/O = Optional Operational
(The "Marks" column lists the "yes" entries per property; their column positions are not preserved in this transcript.)

Property | Description | Marks
MINIMUM
name | A descriptive name for the dataset | yes
description | A short summary describing a dataset | yes
identifier | The identifier property represents any kind of identifier for any kind of dataset | yes
url | The location of a page describing the dataset | yes, yes
creator | The creator/author of this dataset | yes, yes
dateCreated | The date on which the dataset was created | yes, yes
license | A license under which the dataset is distributed | yes, yes
dataStandard | The standard in which the content of the dataset is represented | yes, yes
dateModified | The date on which the dataset was most recently modified | yes
structure | The description of the structure of the dataset | yes
accessUrl | The link to download the dataset | yes
accessInterface | The type of interface to present the dataset | yes
RECOMMENDED
includedIn | A dataset or data catalog which contains the dataset | yes, yes
measurementTechnique | A technique or technology used in a dataset corresponding to the method used for measuring the corresponding variables | yes
keywords | Keywords or tags used to describe the dataset | yes
variablesMeasured | The variables that are measured in the dataset | yes
format | The format in which the content of the dataset is encoded to present the information, typically a MIME format | yes
scientificType | Scientific domain or type of the information provided in the dataset | yes
includes | A dataset or data catalog contained in the dataset | yes
contentType | Type of content provided in the dataset based on its origin and type of processes (raw, processed, summarised) | yes
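The twelve minimum properties lend themselves to a simple completeness check. The sketch below is one possible way to do it, assuming a dataset description is held as a plain dictionary keyed by EDMI property name. The function names and the coverage score are illustrative and are not a metric defined by the guideline.

```python
# The 12 minimum property names as listed on the slide.
MINIMUM_PROPERTIES = (
    "name", "description", "identifier", "url", "creator", "dateCreated",
    "license", "dataStandard", "dateModified", "structure", "accessUrl",
    "accessInterface",
)


def missing_minimum(record: dict) -> list:
    """Return the minimum properties that are absent or empty in `record`."""
    return [prop for prop in MINIMUM_PROPERTIES if not record.get(prop)]


def coverage(record: dict) -> float:
    """Fraction of the minimum properties that the record provides."""
    return 1 - len(missing_minimum(record)) / len(MINIMUM_PROPERTIES)


# Made-up record used only for this sketch.
record = {
    "name": "Example dataset",
    "description": "A made-up record used only for this sketch",
    "identifier": "EX:0001",
    "url": "https://example.org/datasets/EX0001",
    "creator": "Example Lab",
    "dateCreated": "2018-01-22",
    "license": "CC-BY-4.0",
    "dataStandard": "",  # present but empty, so it counts as missing
    "accessUrl": "https://example.org/download/EX0001",
}
print(missing_minimum(record))
print(round(coverage(record), 2))
```

A catalogue could run a check like this at import time and report the missing properties back to the data resource, which fits the stated goal of improving the status quo without making the properties mandatory.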
Simple architecture, simple adoption
[Diagram: Data resources (data access, metadata access), Dataset metadata catalogues (metadata index, metadata discovery) and Services, connected through EDMI]
EDMI: EOSC Datasets Minimum Information Metadata Guideline
Metadata catalogues
[Diagram, shown on two consecutive slides. Layers: e-infrastructure services (Compute, Storage, Transfer, …); RI data (data resources, datasets, data entries) grouped under Scientific domain A, B and C; metadata catalogues, including a generic metadata catalogue]
Demonstrators
• Evaluate findability and accessibility of datasets via EDMI functional and operational metadata
• Discovery of compliant data resources and metadata catalogues
• Research schemas for exposing dataset metadata
• Description and guidelines per metadata property in collaboration with RDA MIG
Evaluate findability and accessibility of datasets via EDMI functional and operational metadata
[Diagram: the same architecture (data resources, dataset metadata catalogues, services, EDMI), with OmicsDI labelled]
EDMI: EOSC Datasets Minimum Information Guideline
Discovery of compliant data resources and metadata catalogues
[Diagram: the same architecture, extended with data resources metadata catalogues (metadata index, metadata discovery) next to the dataset metadata catalogues; OmicsDI labelled]
EDMI: EOSC Datasets Minimum Information Metadata Guideline
Research schemas for exposing dataset metadata
Structured data markup for web pages
Formats: RDFa, JSON-LD, Microdata
With markup:
<div itemscope itemtype="http://schema.org/Recipe">
  <h1 itemprop="name">Classic potato salad</h1>
  <div itemprop="nutrition" itemscope
       itemtype="http://schema.org/NutritionInformation">
    Nutrition facts:
    <span itemprop="calories">144 kcal</span>,
  </div>
  Ingredients:
  - <span itemprop="recipeIngredient">800g small new potato</span>
  - <span itemprop="recipeIngredient">3 shallot</span>
  . . .
</div>
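The same kind of markup can describe a dataset. JSON-LD is one of the three formats listed on the slide, and the sketch below shows one way a data resource might generate it for a dataset landing page. It uses the schema.org `Dataset` and `DataDownload` types. Mapping the EDMI `accessUrl` to `distribution.contentUrl` is an assumption made for this example, and the record values are made up.

```python
import json


def dataset_jsonld(record: dict) -> str:
    """Build a schema.org Dataset description as a JSON-LD string."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": record["name"],
        "description": record["description"],
        "identifier": record["identifier"],
        "url": record["url"],
        "license": record["license"],
        # Assumption: the EDMI accessUrl is expressed as a download distribution.
        "distribution": {
            "@type": "DataDownload",
            "contentUrl": record["accessUrl"],
        },
    }
    return json.dumps(doc, indent=2)


def script_tag(jsonld: str) -> str:
    """Wrap JSON-LD in the script element used to embed it in a web page."""
    # Escape "</" so a value can never close the script element early;
    # "\/" is a valid JSON escape for "/".
    safe = jsonld.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + safe + "\n</script>"


# Made-up record used only for this sketch.
record = {
    "name": "Example dataset",
    "description": "A made-up record used only for this sketch",
    "identifier": "EX:0001",
    "url": "https://example.org/datasets/EX0001",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "accessUrl": "https://example.org/download/EX0001",
}
markup = script_tag(dataset_jsonld(record))
print(markup)
```

Unlike Microdata, the JSON-LD block sits in one place in the page, so it can be generated from the catalogue record without touching the visible HTML.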
Research schemas
A community initiative built on top of Schema.org to improve findability and accessibility in research
[Diagram: a major data resource and a small data resource both expose Research schemas markup, which is read by a registry, a search engine and a data aggregator]
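On the consuming side, a registry, search engine or data aggregator needs to read the markup back out of the page. The sketch below shows one minimal way to collect JSON-LD blocks with the Python standard library. The class name and the sample page are hypothetical, and a real harvester would also need to handle Microdata, RDFa and malformed input.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.documents.append(json.loads("".join(self._buffer)))


# Made-up landing page used only for this sketch.
page = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset",
 "name": "Example dataset", "url": "https://example.org/datasets/EX0001"}
</script>
</head><body><h1>Example dataset</h1></body></html>
"""

extractor = JsonLdExtractor()
extractor.feed(page)
datasets = [
    doc for doc in extractor.documents
    if isinstance(doc, dict) and doc.get("@type") == "Dataset"
]
print(datasets[0]["name"])
```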
Description and guidelines per metadata property in collaboration with RDA MIG
Recommendations and guidelines per metadata property. Generic and domain specific, with examples.
How to expose research data in EOSC
Structure
Access interface
Access URL
Date modified
EOSCpilot data interoperability - summary
Goal: Demonstrate how to ensure availability of scientific data to users and services through a cloud infrastructure
Plan (timeline: 2017, 2018)
• Principles: scope and direction
• Recommendations and architecture: define and validate
• Data interoperability demonstrators: adoption and testing (data resources discovery, Research schemas markup, RDA metadata elements)
[Diagram label: FAIR]
Thanks for your attention
Questions?
EOSCpilot data interoperability EC Report Dec 2017
https://drive.google.com/file/d/1UmxB4YLp_LJmjcpZnWHqlu-52gcU7RON/view?usp=sharing
Data access interfaces
• Graphical User Interface (GUI)
• FTP access
• Database access
• Application Programming Interface (API)
• Web Services
Users: Researchers, Developers
[Diagram: workspace combining clouds, data, tools, workflows, identities and containers]
