RO-Crate: A framework for packaging research
products into FAIR Research Objects
Carole Goble
The University of Manchester
ELIXIR-UK
https://orcid.org/0000-0003-1219-2137
@caroleannegoble
This work is licensed under a
Creative Commons Attribution 4.0 International License
Stian Soiland-Reyes
The University of Manchester
BioExcel Centre of Excellence
The University of Amsterdam
https://orcid.org/0000-0001-9842-9718
@soilandreyes
RDA IG Data Fabric, FAIR Digital Object, 2021-02-25
tl;dr
Web standards-based metadata framework for bundling resources with their
context into citable reproducible packages
Machine actionable Metadata + Identifiers + Web protocols => FAIR
• What and Why
• Examples
• How and Tools
• Alignment with FDOF
Bechhofer et al (2013) Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004
Bechhofer et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/
https://www.researchobject.org/
Many Objects are the Outcomes of Research
Each object has its own metadata and repositories
All are first class citizens and are required to make research FAIR+R
De-contextualised
Static, Fragmented
Lost Semantic linking
Rebuild to be reproducible
Contextualised
Active, Unified
Semantic linking
Buried in a PDF
figure
Scattered Reporting and Reading
integrated view
over fragmented
resources using
PIDs and metadata
Encapsulated content and references to external resources
The RO package has its own metadata, can be registered and deposited in its own
right, unpackaged and accessed, activated and reproduced if appropriate
self-describing, chiefly metadata, objects
RO Metadata file Structured metadata about the RO and content
files
links to web
resources
RO Content
Archive file format / packaging system
BagIt, zip
OCFL, Git
type
id
description
datePublished
…
directories
license
author organisation
self-describing, chiefly metadata, objects
RO Metadata file Structured metadata about the RO and content
image file
links to web
resources
RO Content
Archive file format / packaging system
directory of data
type, id
description
datePublished
creator
size
format …
https://zenodo.org/record/3541888
https://github.com/o/script
type
id
description
datePublished
…
license
author organisation
Linked Data
approach
Self-describing, chiefly metadata, objects
Strict Structure, Open ended content
How do we describe the metadata?
• PIDs + JSON-LD + Schema.org descriptors
• Opinionated profile of schema.org
• Linked Data by Stealth: JSON with gradual path to
extensibility with LD – e.g. ad-hoc terms
• Example-driven documentation
How can I add additional metadata?
• Schema.org, domain ontologies
How do I define a checklist of what is
expected to be in a type of RO?
• RO-Crate Profiles
http://www.researchobject.org/ro-crate/
Standard Web
Mark-up
https://w3id.org/ro/crate/1.1
JSON with a flat list of:
- Data Entities (e.g. files, dirs, DBs)
- Contextual Entities (e.g. people)
Objects reference each other by @id
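A minimal sketch of the flat-list idea, written in Python for illustration. The file name `results.csv` and the person `#alice` are invented for this example and do not come from the slides; only the overall shape (one `@graph` list, entities pointing at each other by `@id`) follows the description above.

```python
import json

# Hypothetical crate content, made up for this sketch.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
         "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
         "about": {"@id": "./"}},
        {"@id": "./", "@type": "Dataset",
         "hasPart": [{"@id": "results.csv"}],
         "author": {"@id": "#alice"}},
        # Data entity: a file inside the crate
        {"@id": "results.csv", "@type": "File", "name": "Survey results"},
        # Contextual entity: a person
        {"@id": "#alice", "@type": "Person", "name": "Alice Example"},
    ],
}

def index_by_id(crate):
    """Build a lookup table from @id to entity; the graph is flat, so one pass is enough."""
    return {entity["@id"]: entity for entity in crate["@graph"]}

entities = index_by_id(crate)
root = entities["./"]

# Follow the {"@id": ...} references from the root dataset.
part_names = [entities[ref["@id"]]["name"] for ref in root["hasPart"]]
author_name = entities[root["author"]["@id"]]["name"]

print(part_names, author_name)
serialised = json.dumps(crate, indent=2)  # what would be written to ro-crate-metadata.json
```

Because nothing is nested, a consumer that only understands plain JSON can still navigate the crate with a dictionary lookup, which is arguably the "Linked Data by Stealth" point made earlier.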
Summary: RO-Crate in a nutshell
Practical lightweight approach to packaging research data
entities (any object) with metadata
Aggregate files and/or any URI-addressable content, with
contextual information to aid decisions about re-use: Who
What When Where Why How.
Web Native Machine readable. Human readable. Search
engine friendly. Familiar.
Extensible and Incremental: add additional metadata; nested
and typed by their profile.
Open Community effort
http://www.researchobject.org/ro-crate/
Tooling
https://www.researchobject.org/ro-crate/implementations.html
User facing
Infrastructure facing
Software libraries
https://www.npmjs.com/package/ro-crate
https://github.com/ResearchObject/ro-crate-ruby
https://pypi.org/project/rocrate/
https://uts-eresearch.github.io/describo/
Cultural Heritage: A data curation service for
endangered languages: 500,000 files in
28,624 items and 574 collections
long term preservation and
accessibility of research data objects
https://arkisto-platform.github.io/
[Marco La Rosa, Peter Sefton]
Scalable verified
collections of references
Processing big genomic & clinical data
distributed over multiple locations
NIH Data Commons
[Chard, et al 2016]
https://doi.org/10.1109/BigData.2016.7840618
minids
Retain and archive processed datasets
Reference and transfer large data on demand
Controlled access to sensitive data
[Kesselman, Foster]
13 EU Life Science Research Infrastructures
Data and Method Thematic Commons
Sharing data, tools and workflows in
the cloud
Workflow + data + provenance interchange, stewardship,
recording dependencies -> portability &
reuse/reproducibility
https://www.eosc-life.eu/
https://workflowhub.eu/workflows/98
RO-Crate as interchange format
https://workflowhub.eu/
Profile
Same conceptual workflow;
multiple executable flavours
for different workflow engines
and specific use-cases
Embedding into Infrastructures & Standards
EGI-ACE data spaces for
Earth Science researchers
FAIR and GDPR compliant data
storage and sharing fabric Science
Mesh using Cloud Services for
Synchronization and Sharing
exchange between
genomics platform and
repository.
standardise and share
analyses generated
from genome
sequencing.
IEEE 2791-2020
Supporting the Research Life Cycle
Exchange &
Import/Export
Report & Archive
Share & Access
Reuse & Reproduce
Living Objects
Describe the objects
as they are being
created using open
standards and tools
Release the objects
as they are created,
updated and used
Figure: RDMKit, https://rdmkit.elixir-europe.org/
Science 2.0 Repositories: Time for a Change in Scholarly Communication Assante, Candela, Castelli, Manghi, Pagano, D-Lib 2015
Machine-processable
Standards
Low tech
Graceful degradation
Commodity tooling
Incremental
Multi-platform
Technology Independent
Keep it
simple
EXAMPLES
Developer
friendliness
Just enough complexity / standards
• sufficient extra benefits from what already
exists…without compromising developer entry-level
experience so they do their own thing
Just Enough Linked Data Just In Time
• simplifications instead of generalizations
Retain Linked Data benefits
• querying, graph stores, vocabularies, clickable URIs
Plus the developer needs
• documentation, examples, libraries, tools,
community …
Limited flexibility frees up developers
Familiarity is important for uptake
From FAIR data to FAIR Digital Objects
FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units:
https://doi.org/10.3390/publications8020021
https://op.europa.eu/en/publication-detail/-/publication/d787ea54-6a87-11eb-aeb5-01aa75ed71a1/language-en/format-PDF/source-190308283
FAIR Digital Objects
Actionable knowledge unit
Digital butterfly – digital twins
Bags of references
courtesy Dimitris Koureas
Coordinator DiSSCo EU Research
Infrastructure
Specimen object image
courtesy of Alex Hardisty
[Hardisty et al, 2020]
Specimen Data Refinery
Workflows to Digitise Natural History Specimens
FAIR Digital Objects -> Packaged + Actionable
FAIR Digital Object
Framework
Open Digital Specimen
Workflow Infrastructure
RO-Crate
+ Same conceptual workflow;
multiple executable flavours
+
Adapted from
https://doi.org/10.3390/publications8020021
FAIR Digital Object
Adapted from
https://doi.org/10.3390/publications8020021
RO-Crate as FAIR Digital Object
https://w3id.org/
Data Entity
Contextual Entity
RO-Crate
metadata file
RO-Crate
http://schema.org/Dataset
RO-Crate model as UML
schema.org/hasPart
schema.org/about
(describes)
https://www.researchobject.org/ro-crate/1.1/structure.html
A valid RO-Crate JSON-LD graph MUST describe:
1. The RO-Crate Metadata File Descriptor
2. The Root Data Entity
3. Zero or more Data Entities
4. Zero or more Contextual Entities
"conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
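The first two requirements in the list above can be checked mechanically. The following Python sketch is illustrative only and is not the specification's validation algorithm: the function name `find_root` is hypothetical, and it assumes the descriptor uses the fixed `@id` `ro-crate-metadata.json` with a single `conformsTo` object, as in the snippet above.

```python
RO_CRATE_PREFIX = "https://w3id.org/ro/crate/"

def find_root(graph):
    """Return (descriptor, root_entity) for a flat @graph list, or raise ValueError.

    Checks only that the metadata file descriptor and the root data entity
    are described; data and contextual entities may be absent ("zero or more").
    """
    by_id = {entity.get("@id"): entity for entity in graph}

    descriptor = by_id.get("ro-crate-metadata.json")
    if descriptor is None:
        raise ValueError("missing RO-Crate Metadata File Descriptor")

    conforms = descriptor.get("conformsTo") or {}
    conforms_id = conforms.get("@id", "") if isinstance(conforms, dict) else ""
    if not conforms_id.startswith(RO_CRATE_PREFIX):
        raise ValueError("descriptor does not declare conformsTo an RO-Crate version")

    # The descriptor points at the root data entity via "about".
    root_id = (descriptor.get("about") or {}).get("@id")
    root = by_id.get(root_id)
    if root is None:
        raise ValueError("descriptor 'about' does not point to a described entity")

    types = root.get("@type", [])
    if isinstance(types, str):
        types = [types]
    if "Dataset" not in types:
        raise ValueError("root data entity is not typed as Dataset")
    return descriptor, root

example_graph = [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
     "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
     "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset", "name": "Example crate"},
]
descriptor, root = find_root(example_graph)
print(root["name"])
```

Locating the root through the descriptor's `about` reference, rather than assuming it is always `./`, keeps the check usable for crates whose root has an absolute identifier.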
[Figure: UML class diagram of the RO-Crate model. It shows the RO-Crate Root Data Entity «schema.org/Dataset», the RO-Crate Metadata File «schema.org/CreativeWork», Data Entities and Contextual Entities. Types appearing in the diagram: «schema.org/Organization», «schema.org/Place», «schema.org/IndividualProduct», «schema.org/Person», «bioschemas.org/ComputationalWorkflow», «schema.org/MediaObject», «schema.org/Dataset», «schema.org/ImageObject», «schema.org/CreativeWork», «schema.org/Thing». Relation labels: subClassOf (is-a), schema.org/mentions, (describes).]
JSON-LD preamble
{ "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
RO-Crate metadata file descriptor
    { "@type": "CreativeWork",
      "@id": "ro-crate-metadata.json",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
      "about": { "@id": "./" }
    },
RO-Crate root dataset (Collection)
..aggregates Data entities
..described w/ contextual entities
    { "@id": "./",
      "identifier": "https://doi.org/10.5281/zenodo.1009240",
      "@type": "Dataset",
      "hasPart": [
        { "@id": "cp7glop.ai" },
        { "@id": "lots_of_little_files/" },
        { "@id": "communities-2018.csv" },
        { "@id": "https://doi.org/10.4225/59/59672c09f4a4b" },
        { "@id": "SciDataCon Presentations/AAA_Pilot_Project_Abstract.html" }
      ],
      "author": { "@id": "https://orcid.org/0000-0002-8367-6908" },
      "publisher": { "@id": "https://ror.org/03f0f6041" },
      "citation": { "@id": "https://doi.org/10.1109/TCYB.2014.2386282"},
      "name": "Presentation of user survey 2018"
    },
Flat list of metadata per entity
    { "@id": "cp7glop.ai",
      "@type": "File",
      "name": "Diagram showing trend to increase",
      …
    },
    …
  ]
}
"hasPart": [
{ "@id": "cp7glop.ai" },
{ "@id": "lots_of_little_files/" },
{ "@id": "communities-2018.csv" },
{ "@id": "https://doi.org/10.4225/59/59672c09f4a4b" },
{ "@id": "SciDataCon-Presentations/AAA_Pilot_Abstract.html"}
],
Data and Contextual entities
described within RO-Crate Metadata File
Base vocabulary & types: schema.org
Cross-references to further contextual entities
RO-Crate principle:
Reuse existing PIDs and URLs
..but always describe entities which lack a
human-readable resolution
Metadata
{
"@id": "https://orcid.org/0000-0002-8367-6908",
"@type": "Person",
"affiliation": { "@id": "https://ror.org/03f0f6041" },
"name": "J. Xuan"
}
{
"@id": "https://ror.org/03f0f6041",
"@type": "Organization",
"name": "University of Technology Sydney",
"url": "https://www.uts.edu.au/"
}
{
"@id": "figure.png",
"@type": ["File", "ImageObject"],
"name": "XXL-CT-scan of an XXL Tyrannosaurus rex skull",
"identifier": "https://doi.org/10.5281/zenodo.3479743",
"author": {"@id": "https://orcid.org/0000-0002-8367-6908"},
"encodingFormat": "image/png"
}
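The cross-references in the three entities above can be followed with a small lookup. This Python sketch reuses the values shown on the slide; the helper name `resolve` is hypothetical, and its fallback (returning the bare reference when the crate does not describe the target) is one possible design choice, not something the slides prescribe.

```python
graph = [
    {"@id": "https://orcid.org/0000-0002-8367-6908", "@type": "Person",
     "affiliation": {"@id": "https://ror.org/03f0f6041"},
     "name": "J. Xuan"},
    {"@id": "https://ror.org/03f0f6041", "@type": "Organization",
     "name": "University of Technology Sydney",
     "url": "https://www.uts.edu.au/"},
    {"@id": "figure.png", "@type": ["File", "ImageObject"],
     "name": "XXL-CT-scan of an XXL Tyrannosaurus rex skull",
     "author": {"@id": "https://orcid.org/0000-0002-8367-6908"},
     "encodingFormat": "image/png"},
]

by_id = {entity["@id"]: entity for entity in graph}

def resolve(by_id, ref):
    """Return the entity a {"@id": ...} reference points to.

    If the crate does not describe that entity, return the reference itself,
    so callers still have the PID or URL to show or click.
    """
    return by_id.get(ref["@id"], ref)

figure = by_id["figure.png"]
author = resolve(by_id, figure["author"])            # data entity -> contextual entity
organisation = resolve(by_id, author["affiliation"])  # contextual -> contextual

print(f'{figure["name"]}: {author["name"]}, {organisation["name"]}')
```

The fallback reflects the principle stated above: existing PIDs are reused, and a description is added inside the crate when the identifier alone would not give a reader enough to go on.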
No need to put all the metadata in RO-Crate!
https://biocompute-objects.github.io/bco-ro-crate/
• The specification provides details on how to extend RO-Crate formally or informally
• However, in many cases a domain-specific format for metadata may already exist
  • It can co-exist as a side-car metadata file within the Research Object
• Example: BioCompute Object has its own JSON format, typed & indicated from RO-Crate
Persistent IDs
Gradual ascent towards FAIR
Base line: Relative paths from RO-Crate Metadata File
Use cases: Describing dataset on desktop
Ad-hoc web-hosting (e.g. GitHub pages)
Institutional archives (e.g. Oxford Common File Layout)
Reuse existing PIDs and URLs
Use cases: Large data, not a file (e.g. database), reference datasets
Cite/reference existing resources (e.g. via identifiers.org)
Distinguish and crosslink contextual entities
Make paths absolute, using location-independent PIDs
(UUIDs, Naming Things with Hashes, ARCP)
Use cases: Found RO-Crate “in the wild”, ZIP archives, workflow outputs
Assign PIDs to RO-Crate (and its entities)
Use cases: Long-term availability, citations, permalinks
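As an illustration of the "make paths absolute" step, the sketch below mints a UUID-based ARCP base URI and resolves relative `@id` values against it. The helper names `arcp_base` and `absolute_id` are hypothetical; the `arcp://uuid,<uuid>/` form follows the ARCP URI scheme as I understand it, and plain string concatenation is used because Python's `urljoin` does not treat the `arcp` scheme as relative-capable.

```python
import uuid
from urllib.parse import quote, urlsplit

def arcp_base(crate_uuid: str) -> str:
    """Location-independent base URI for a crate identified by a UUID."""
    return f"arcp://uuid,{crate_uuid}/"

def absolute_id(base: str, entity_id: str) -> str:
    """Turn a crate-relative @id into an absolute one; leave existing PIDs/URLs alone."""
    if urlsplit(entity_id).scheme:      # already absolute, e.g. a DOI or ORCID URL
        return entity_id
    if entity_id.startswith("#"):       # local fragment identifier
        return base + entity_id
    path = entity_id[2:] if entity_id.startswith("./") else entity_id
    return base + quote(path)           # percent-encode spaces etc., keep "/"

base = arcp_base(str(uuid.uuid4()))     # a fresh UUID for a crate found "in the wild"
for entity_id in ["./", "cp7glop.ai", "lots_of_little_files/",
                  "SciDataCon Presentations/AAA_Pilot_Project_Abstract.html",
                  "https://doi.org/10.4225/59/59672c09f4a4b"]:
    print(absolute_id(base, entity_id))
```

A random UUID only distinguishes this copy of the crate; assigning a registered PID, the last step in the list above, is still needed for long-term citation.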
Bit Sequences
Base line: Files on disk within RO-Crate Root folder + URLs
Downloadable Web resources listed w/ content size and access time.
Packaging/archiving RO-Crate Root folder:
• BagIt (RFC 8493): Manifest of all (local) files and their checksums
Use case: Ensure all files are transferred/archived
• BDBag: Include external files by ARK/MinID PIDs
Use case: “thin” RO-Crate, Big Data, shared immutable files
• OCFL: Archival storage with revision tracking, infrequent changes
• Git LFS: Tracked frequent changes, collaborative editing
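To make the BagIt point concrete, here is a minimal Python sketch that computes the lines of a `manifest-sha256.txt` for the files of a crate placed under a bag's `data/` payload directory. It is an assumption-laden illustration, not a complete BagIt implementation: the function name `sha256_manifest` is hypothetical, and the sketch omits `bagit.txt`, tag manifests and the path-escaping rules of RFC 8493.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_manifest(payload_dir):
    """Return BagIt-style manifest lines: '<sha256 hex digest> data/<relative path>'."""
    root = Path(payload_dir)
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        relative = path.relative_to(root).as_posix()
        lines.append(f"{digest} data/{relative}")
    return lines

# Usage: a throwaway crate folder with a metadata file and one data file.
with tempfile.TemporaryDirectory() as tmp:
    crate_root = Path(tmp)
    (crate_root / "ro-crate-metadata.json").write_text('{"@graph": []}', encoding="utf-8")
    (crate_root / "communities-2018.csv").write_text("id,name\n", encoding="utf-8")
    manifest_lines = sha256_manifest(crate_root)

print("\n".join(manifest_lines))
```

A receiver can recompute the digests after transfer and compare them with the manifest, which is the "ensure all files are transferred/archived" use case named above.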
Repository
Base line: zip/tar download
• e.g. “supplementary data”, GitHub release
Export from web platforms
• e.g. workflowhub.eu, Galaxy workflow system
Deposit in general/institutional
repositories
• e.g. Zenodo, Mendeley Data
Deposit in domain-specific
repositories
• e.g. GBIF
Deposit individual files
• e.g. ARK, S3, B2SHARE
Deposit metadata
• e.g. WikiData, nanopubs
What can FDO learn from RO-Crate?
Not everything is known beforehand
• Existing PID metadata not always relevant
• Allow contextualized metadata
• Types not always known in advance
• Allow “casting” or reinterpretation
• Operations not always known in advance
• Allow open-ended generation and use
Use wheels already invented
• Fit into researchers’ existing working practices and
familiar technologies.
• Aim for gradual improvements.
• Reuse existing technology
• .. But only when not too complex
• Reuse existing PID infrastructure,
including URLs
• Human consumers recognize and click hyperlinks
• Build on existing metadata standards
• Which are both simple and extensible
Data and Contextual entities are equally
important
Remember people … especially Developers
• Keep human consumption in focus
• Ensure metadata is easily rendered and edited
• Provide Best Practice guidance
• Firm, but not too restrictive
• Developer producer and consumer friendly
• Rather than academically elegant
What can RO-Crate learn from FDO?
Provide stronger guidance on PID and availability
• Recommend deposit infrastructure for end-users and framework developers
• Tools to assist – e.g. generate Zenodo DataCite metadata from RO-Crate
Provide stronger typing of RO-Crates and their content
• Profiles as first approach
Expose potential operations on an RO-Crate
• Build general RO-Crate services, e.g. index
Turtles all the way down!
• Document better how RO-Crates can/should be nested
• How to choose granularity of RO-Crates?
• Tools to liberate/reuse an entity from a single RO-Crate
RO-Crate as part of the FDO ecosystem
A bag of references + metadata
Framework for actionable objects
RO-Crates enable active operations for FDOs
FDO offers additional infrastructure and
practice
A bag of references + metadata
A metadata framework for FDO
Developer friendly practical and web-native
implementations
Community
Infrastructure applications
The Synthesys+ Project gives a concrete
use case for practically joining the two up.
https://about.workflowhub.eu/
Bi-weekly calls &
Slack
Bi-weekly calls &
GitHub
Join us!
https://www.researchobject.org/ro-crate/
Acknowledgements
• RO-Crate Community
https://www.researchobject.org/ro-crate/community
• Workflow Hub Club
https://about.workflowhub.eu/acknowledgements/
• Funding:
  • H2020-INFRAEOSC-2018-2 824087
  • H2020-INFRAEDI-2018-1 823830
  • H2020-INFRAIA-2017-1 730976
  • H2020-INFRADEV-2019-2 871118
  • H2020-INFRAIA-2018-1 823827
FAIR Principles for Digital Research Objects
FAIR all the way down
Unbounded FAIR
Distributed FAIR
Living FAIR
Analogous to FAIR Software
FAIR RO-Crate is a practical start
Fig from EOSC Interoperability Framework

More Related Content

PPT
RDF and OWL
PDF
Enterprise Knowledge Graph
PPTX
The future of FAIR
PPTX
ADF Mapping Data Flows Training Slides V1
PDF
Modern Data Challenges require Modern Graph Technology
PPTX
PPTX
Controlled vocabularies and ontologies in Dataverse data repository
 
PDF
[XConf Brasil 2020] Data mesh
RDF and OWL
Enterprise Knowledge Graph
The future of FAIR
ADF Mapping Data Flows Training Slides V1
Modern Data Challenges require Modern Graph Technology
Controlled vocabularies and ontologies in Dataverse data repository
 
[XConf Brasil 2020] Data mesh

What's hot (20)

PPTX
The semantic web
 
PDF
Debunking some “RDF vs. Property Graph” Alternative Facts
PPTX
Introduction to Open Science and EOSC
PDF
Developing a Knowledge Graph of your Competency, Skills, and Knowledge at NASA
PDF
Introduction to Knowledge Graphs for Information Architects.pdf
PDF
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
PDF
Data Quality
PDF
Connected Intelligence
PPTX
FAIRy stories: the FAIR Data principles in theory and in practice
PDF
Information Storage and Retrieval : A Case Study
PDF
Framework for understanding quantum computing use cases from a multidisciplin...
PDF
Getting Started with Knowledge Graphs
PPTX
AI, Knowledge Representation and Graph Databases -
 Key Trends in Data Science
PDF
Knowledge Graphs as a Pillar to AI
PDF
Neo4j Webinar: Graphs in banking
PDF
Vector Databases - A Technical Primer.pdf
PPTX
Data Lake Overview
PPTX
Intro to Data Management Plans
PPT
Web ontology language (owl)
PDF
CS6010 Social Network Analysis Unit III
The semantic web
 
Debunking some “RDF vs. Property Graph” Alternative Facts
Introduction to Open Science and EOSC
Developing a Knowledge Graph of your Competency, Skills, and Knowledge at NASA
Introduction to Knowledge Graphs for Information Architects.pdf
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Data Quality
Connected Intelligence
FAIRy stories: the FAIR Data principles in theory and in practice
Information Storage and Retrieval : A Case Study
Framework for understanding quantum computing use cases from a multidisciplin...
Getting Started with Knowledge Graphs
AI, Knowledge Representation and Graph Databases -
 Key Trends in Data Science
Knowledge Graphs as a Pillar to AI
Neo4j Webinar: Graphs in banking
Vector Databases - A Technical Primer.pdf
Data Lake Overview
Intro to Data Management Plans
Web ontology language (owl)
CS6010 Social Network Analysis Unit III
Ad

Similar to RO-Crate: A framework for packaging research products into FAIR Research Objects (20)

PPTX
RO-Crate: packaging metadata love notes into FAIR Digital Objects
PPTX
FAIR Workflows and Research Objects get a Workout
PPTX
The swings and roundabouts of a decade of fun and games with Research Objects
PPTX
Research Objects: more than the sum of the parts
PPTX
FDO as building block for digitization technology stacks
PPTX
FAIRer Research
PPTX
The Research Object Initiative: Frameworks and Use Cases
PDF
Research Shared: researchobject.org
PPTX
FAIR play?
PDF
Introduction to Research Objects - Collaboartions Workshop 2015, Oxford
PDF
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
PPTX
FAIRy Stories
PDF
A Clean Slate?
PPTX
IBC FAIR Data Prototype Implementation slideshow
PPTX
Research Object Community Update
PPTX
Achieving FAIR from a repository perspective
PPTX
Metadata modeling principles for life.pptx
ODP
2011 03-provenance-workshop-edingurgh
PPTX
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
PPTX
Scientific data management from the lab to the web
RO-Crate: packaging metadata love notes into FAIR Digital Objects
FAIR Workflows and Research Objects get a Workout
The swings and roundabouts of a decade of fun and games with Research Objects
Research Objects: more than the sum of the parts
FDO as building block for digitization technology stacks
FAIRer Research
The Research Object Initiative: Frameworks and Use Cases
Research Shared: researchobject.org
FAIR play?
Introduction to Research Objects - Collaboartions Workshop 2015, Oxford
FAIR Data Prototype - Interoperability and FAIRness through a novel combinati...
FAIRy Stories
A Clean Slate?
IBC FAIR Data Prototype Implementation slideshow
Research Object Community Update
Achieving FAIR from a repository perspective
Metadata modeling principles for life.pptx
2011 03-provenance-workshop-edingurgh
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Scientific data management from the lab to the web
Ad

More from Carole Goble (20)

PPTX
The ELIXIR FAIR Knowledge Ecosystem for practical know-how: RDMkit and FAIRCo...
PPTX
Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science, a Digital Research...
PPTX
Research Software Sustainability takes a Village
PPTX
Title: Love, Money, Fame, Nudge: Enabling Data-intensive BioScience through D...
PPTX
FAIR Computational Workflows
PPTX
Open Research: Manchester leading and learning
PPTX
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
PPTX
FAIR Computational Workflows
PPTX
FAIR Computational Workflows
PPTX
EOSC-Life Workflow Collaboratory
PPTX
FAIR Computational Workflows
PPTX
FAIR Data Bridging from researcher data management to ELIXIR archives in the...
PPTX
FAIR Computational Workflows
PPTX
How are we Faring with FAIR? (and what FAIR is not)
PPTX
What is Reproducibility? The R* brouhaha and how Research Objects can help
PPTX
FAIR History and the Future
PPTX
ELIXIR UK Node presentation to the ELIXIR Board
PPTX
FAIRy stories: tales from building the FAIR Research Commons
PPTX
Let’s go on a FAIR safari!
PPTX
Reproducible Research: how could Research Objects help
The ELIXIR FAIR Knowledge Ecosystem for practical know-how: RDMkit and FAIRCo...
Can’t Pay, Won’t Pay, Don’t Pay: Delivering open science, a Digital Research...
Research Software Sustainability takes a Village
Title: Love, Money, Fame, Nudge: Enabling Data-intensive BioScience through D...
FAIR Computational Workflows
Open Research: Manchester leading and learning
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...
FAIR Computational Workflows
FAIR Computational Workflows
EOSC-Life Workflow Collaboratory
FAIR Computational Workflows
FAIR Data Bridging from researcher data management to ELIXIR archives in the...
FAIR Computational Workflows
How are we Faring with FAIR? (and what FAIR is not)
What is Reproducibility? The R* brouhaha and how Research Objects can help
FAIR History and the Future
ELIXIR UK Node presentation to the ELIXIR Board
FAIRy stories: tales from building the FAIR Research Commons
Let’s go on a FAIR safari!
Reproducible Research: how could Research Objects help

Recently uploaded (20)

PDF
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
PDF
Sciences of Europe No 170 (2025)
PDF
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
PPT
protein biochemistry.ppt for university classes
PDF
Phytochemical Investigation of Miliusa longipes.pdf
PPTX
ognitive-behavioral therapy, mindfulness-based approaches, coping skills trai...
PPTX
Taita Taveta Laboratory Technician Workshop Presentation.pptx
PDF
AlphaEarth Foundations and the Satellite Embedding dataset
PPTX
ANEMIA WITH LEUKOPENIA MDS 07_25.pptx htggtftgt fredrctvg
PPTX
Cell Membrane: Structure, Composition & Functions
PPT
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
PPTX
Vitamins & Minerals: Complete Guide to Functions, Food Sources, Deficiency Si...
PPTX
Comparative Structure of Integument in Vertebrates.pptx
DOCX
Q1_LE_Mathematics 8_Lesson 5_Week 5.docx
PPTX
Derivatives of integument scales, beaks, horns,.pptx
PPTX
Introduction to Fisheries Biotechnology_Lesson 1.pptx
PDF
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
PDF
. Radiology Case Scenariosssssssssssssss
PPT
Chemical bonding and molecular structure
PPTX
microscope-Lecturecjchchchchcuvuvhc.pptx
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
Sciences of Europe No 170 (2025)
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
protein biochemistry.ppt for university classes
Phytochemical Investigation of Miliusa longipes.pdf
ognitive-behavioral therapy, mindfulness-based approaches, coping skills trai...
Taita Taveta Laboratory Technician Workshop Presentation.pptx
AlphaEarth Foundations and the Satellite Embedding dataset
ANEMIA WITH LEUKOPENIA MDS 07_25.pptx htggtftgt fredrctvg
Cell Membrane: Structure, Composition & Functions
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
Vitamins & Minerals: Complete Guide to Functions, Food Sources, Deficiency Si...
Comparative Structure of Integument in Vertebrates.pptx
Q1_LE_Mathematics 8_Lesson 5_Week 5.docx
Derivatives of integument scales, beaks, horns,.pptx
Introduction to Fisheries Biotechnology_Lesson 1.pptx
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
. Radiology Case Scenariosssssssssssssss
Chemical bonding and molecular structure
microscope-Lecturecjchchchchcuvuvhc.pptx

RO-Crate: A framework for packaging research products into FAIR Research Objects

  • 1. RO-Crate: A framework for packaging research products into FAIR Research Objects Carole Goble The University of Manchester ELIXIR-UK https://orcid.org/0000-0003-1219-2137 @caroleannegoble This work is licensed under a Creative Commons Attribution 4.0 International License Stian Soiland-Reyes The University of Manchester BioExcel Centre of Excellence The University of Amsterdam https://orcid.org/0000-0001-9842-9718 @soilandreyes RDA IG Data Fabric, FAIR Digital Object, 2021-02-25
  • 2. tl;dr Web standards-based metadata framework for bundling resources with their context into citable reproducible packages Machine actionable Metadata + Identifiers + Web protocols => FAIR  What andWhy  Examples  How andTools  Alignment with FDOF Bechhofer et al (2013)Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004 Bechhofer et al (2010) Research Objects:Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/
  • 4. Many Objects are the Outcomes of Research Each object has its own metadata and repositories All are first class citizens and are required to make research FAIR+R
  • 5. De-contextualised Static, Fragmented Lost Semantic linking Rebuild to be reproducible Contextualised Active, Unified Semantic linking Buried in a PDF figure Scattered Reporting and Reading
  • 6. integrated view over fragmented resources using PIDs and metadata Encapsulated content and references to external resources The RO package has its own metadata, can be registered and deposited in its own right, unpackaged and accessed, activated and reproduced if appropriate
  • 7. self-describing, chiefly metadata, objects RO Metadata file Structured metadata about the RO and content files links to web resources RO Content Archive file format / packaging system BagIt, zip OCFL, Git type id description datePublished … directories license author organisation
  • 8. self-describing, chiefly metadata, objects RO Metadata file Structured metadata about the RO and content image file links to web resources RO Content Archive file format / packaging system directory of data type, id description datePublished creator size format … https://zenodo.org/record/3541888 https://github.com/o/script type id description datePublished … license author organisation Linked Data approach
  • 9. Self-describing, chiefly metadata, objects Strict Structure, Open ended content How do we describe the metadata? • PIDs + JSON-LD + Schema.org descriptors • Opinionated profile of schema.org • Linked Data by Stealth: JSON with gradual path to extensibility with LD – e.g. ad-hoc terms • Example-driven documentation How can I add additional metadata? • Schema.org, domain ontologies How do I define a checklist of what is expected to be in a type of RO? • RO-Crate Profiles http://www.researchobject.org/ro-crate/ Standard Web Mark-up
  • 10. https://w3id.org/ro/crate/1.1 JSON with a flat list of: - Data Entities (e.g. files, dirs, DBs) - Contextual Entities (e.g. people) Objects reference each other by @id
  • 11. Summary: RO-Crate in a nutshell Practical lightweight approach to packaging research data entities (any object) with metadata Aggregate files and/or any URI-addressable content, with contextual information to aid decisions about re-use: Who What When Where Why How. Web Native Machine readable. Human readable. Search engine friendly. Familiar. Extensible and Incremental: add additional metadata; nested and typed by their profile. Open Community effort http://www.researchobject.org/ro-crate/
  • 12. Tooling https://www.researchobject.org/ro-crate/implementations.html User facing Infrastructure facing Software libraries https://www.npmjs.com/package/ro-crate https://github.com/ResearchObject/ro-crate-ruby https://pypi.org/project/rocrate/ https://uts-eresearch.github.io/describo/
  • 13. Cultural Heritage: A data curation service for endangered languages: 500,000 files in 28,624 items and 574 collections long term preservation and accessibility of research data objects https://arkisto-platform.github.io/ [Marco La Rosa, Peter Sefton]
  • 14. Scalable verified collections of references Processing big genomic & clinical data distributed over multiple locations NIH Data Commons [Chard, et al 2016] https://doi.org/10.1109/BigData.2016.7840618 minids Retain and archive processed datasets Reference and transfer large data on demand Controlled access to sensitive data [Kesselman, Foster] minids minids
  • 15. 13 EU Life Science Research Infrastructures Data and Method Thematic Commons Sharing data, tools and workflows in the cloud Workflow + data + provenance interchange, stewardship, recording dependencies -> portability & reuse/reproducibility https://www.eosc-life.eu/
  • 17. https://workflowhub.eu/ Profile Same conceptual workflow; multiple executable flavours for different workflow engines and specific use-cases
  • 18. Embedding into Infrastructures & Standards EGI-ACE data spaces for Earth Science researchers FAIR and GDPR compliant data storage and sharing fabric Science Mesh using Cloud Services for Synchronization and Sharing exchange between genomics platform and repository. standardise and share analyses generated from genome sequencing. IEEE 2791-2020
  • 19. Supporting the Research Life Cycle Exchange & Import/Export Report & Archive Share & Access Reuse & Reproduce Living Objects Describe the objects as they are being created using open standards and tools Release the objects as they are created, updated and used Figure: RDMKit, https://rdmkit.elixir-europe.org/ Science 2.0 Repositories: Time for a Change in Scholarly Communication Assante, Candela, Castelli, Manghi, Pagano, D-Lib 2015
  • 20. Machine-processable Standards Low tech Graceful degradation Commodity tooling Incremental Multi-platform Technology Independent Keep it simple E X A M P L E S Developer friendliness Just enough complexity / standards • sufficient extra benefits from what already exists…without compromising developer entry-level experience so they do their own thing Just Enough Linked Data Just In Time • simplifications instead of generalizations Retain Linked Data benefits • querying, graph stores, vocabularies, clickable URIs) Plus the developer needs • documentation, examples, libraries, tools, community … Limited flexibility frees up developers Familiarity is important for uptake
  • 21. From FAIR data to FAIR Digital Objects FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units: https://doi.org/10.3390/publications8020021 https://op.europa.eu/en/publication-detail/-/publication/d787ea54-6a87-11eb-aeb5-01aa75ed71a1/language-en/format-PDF/source-190308283
  • 22. FAIR Digital Objects Actionable knowledge unit Digital butterfly – digital twins Bags of references courtesy Dimitris Koureas Coordinator DiSSCo EU Research Infrastructure Specimen object image courtesy of Alex Hardisty [Hardisty et al, 2020]
  • 23. Specimen Data Refinery Workflows to Digitise Natural History Specimens FAIR Digital Objects -> Packaged + Actionable FAIR Digital Object Framework Open Digital Specimen Workflow Infrastructure RO-Crate + Same conceptual workflow; multiple executable flavours +
  • 25. Adapted from https://doi.org/10.3390/publications8020021 RO-Crate as FAIR Digital Object https://w3id.org/ Data Entity Contextual Entity RO-Crate metadata file RO-Crate http://schema.org/Dataset
  • 26. RO-Crate model as UML, https://www.researchobject.org/ro-crate/1.1/structure.html A valid RO-Crate JSON-LD graph MUST describe: 1. The RO-Crate Metadata File Descriptor 2. The Root Data Entity 3. Zero or more Data Entities 4. Zero or more Contextual Entities. Descriptor property: "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"}. Diagram labels: relations schema.org/hasPart, schema.org/about (describes), schema.org/mentions, subClassOf (is-a); classes RO-Crate Root Data Entity «schema.org/Dataset», RO-Crate Metadata File «schema.org/CreativeWork», Data Entities, Contextual Entities, «schema.org/Thing», «schema.org/Organization», «schema.org/Place», «schema.org/IndividualProduct», «schema.org/Person», «bioschemas.org/ComputationalWorkflow», «schema.org/MediaObject», «schema.org/Dataset», «schema.org/ImageObject», «schema.org/CreativeWork»
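The four-part requirement on slide 26 can be checked mechanically. The following is a minimal Python sketch, not the specification's own algorithm: the function name `find_root` is hypothetical, and it assumes the RO-Crate 1.1 convention that the descriptor has the `@id` `ro-crate-metadata.json` and that `conformsTo` is a single object rather than a list.

```python
def find_root(graph):
    """Return (descriptor, root) from an RO-Crate 1.1 @graph, or raise ValueError."""
    # Index the flat list of entities by their @id.
    by_id = {entity["@id"]: entity for entity in graph if "@id" in entity}

    # 1. The RO-Crate Metadata File Descriptor (assumed @id, see lead-in).
    descriptor = by_id.get("ro-crate-metadata.json")
    if descriptor is None:
        raise ValueError("no RO-Crate Metadata File Descriptor")
    conforms_to = descriptor.get("conformsTo", {})
    if not conforms_to.get("@id", "").startswith("https://w3id.org/ro/crate/"):
        raise ValueError("descriptor does not declare an RO-Crate version")

    # 2. The Root Data Entity, reached through the descriptor's "about".
    root = by_id.get(descriptor.get("about", {}).get("@id"))
    if root is None:
        raise ValueError("descriptor's 'about' does not point at a described entity")
    types = root.get("@type", [])
    if "Dataset" not in (types if isinstance(types, list) else [types]):
        raise ValueError("Root Data Entity is not a Dataset")

    # 3. and 4. Data and Contextual Entities are optional ("zero or more").
    return descriptor, root


graph = [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
     "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
     "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset", "name": "Presentation of user survey 2018"},
]
descriptor, root = find_root(graph)
print(root["name"])
```

The check stops at the structural minimum; property-level rules (name, license, datePublished) are a separate concern.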
  • 27. Flat list of metadata per entity. JSON-LD preamble: { "@context": "https://w3id.org/ro/crate/1.1/context", "@graph": [ RO-Crate metadata file descriptor: { "@type": "CreativeWork", "@id": "ro-crate-metadata.json", "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"}, "about": { "@id": "./" } } RO-Crate root dataset (Collection) ..aggregates: { "@id": "./", "identifier": "https://doi.org/10.5281/zenodo.1009240", "@type": "Dataset", "hasPart": [ { "@id": "cp7glop.ai" }, { "@id": "lots_of_little_files/" }, { "@id": "communities-2018.csv" }, { "@id": "https://doi.org/10.4225/59/59672c09f4a4b" }, { "@id": "SciDataCon Presentations/AAA_Pilot_Project_Abstract.html" } ], "author": { "@id": "https://orcid.org/0000-0002-8367-6908" }, "publisher": { "@id": "https://ror.org/03f0f6041" }, "citation": { "@id": "https://doi.org/10.1109/TCYB.2014.2386282"}, "name": "Presentation of user survey 2018" }, Data entities ..described w/ contextual entities: { "@id": "cp7glop.ai", "@type": "File", "name": "Diagram showing trend to increase", … }, … Slide callout: "hasPart": [ { "@id": "cp7glop.ai" }, { "@id": "lots_of_little_files/" }, { "@id": "communities-2018.csv" }, { "@id": "https://doi.org/10.4225/59/59672c09f4a4b" }, { "@id": "SciDataCon-Presentations/AAA_Pilot_Abstract.html"} ],
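The layout on slide 27 (preamble, descriptor, root dataset, then data entities in one flat list) can be produced with the standard library alone. This is a rough sketch under stated assumptions: `make_crate` is a hypothetical helper, not part of any RO-Crate library, and it describes every listed path as a plain `File`.

```python
import json


def make_crate(name, files):
    """Build a minimal RO-Crate 1.1 metadata document as a Python dict."""
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        # The root only references its parts; each part is described separately.
        "hasPart": [{"@id": path} for path in files],
    }
    data_entities = [{"@id": path, "@type": "File"} for path in files]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, *data_entities],
    }


crate = make_crate("Presentation of user survey 2018",
                   ["cp7glop.ai", "communities-2018.csv"])
text = json.dumps(crate, indent=2)
print(text)
```

Writing `text` to a file named `ro-crate-metadata.json` in the folder that holds the listed files would turn that folder into the RO-Crate Root.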
  • 28. Data and Contextual entities described within RO-Crate Metadata File Base vocabulary & types: schema.org Cross-references to further contextual entities RO-Crate principle: Reuse existing PIDs and URLs ..but always describe entities which lack a human-readable resolution Metadata { "@id": "https://orcid.org/0000-0002-8367-6908", "@type": "Person", "affiliation": { "@id": "https://ror.org/03f0f6041" }, "name": "J. Xuan" } { "@id": "https://ror.org/03f0f6041", "@type": "Organization", "name": "University of Technology Sydney", "url": "https://www.uts.edu.au/" } { "@id": "figure.png", "@type": ["File", "ImageObject"], "name": "XXL-CT-scan of an XXL Tyrannosaurus rex skull", "identifier": "https://doi.org/10.5281/zenodo.3479743", "author": {"@id": "https://orcid.org/0000-0002-8367-6908"}, "encodingFormat": "image/png" }
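Because the graph is flat, following a cross-reference is a dictionary lookup on `@id`. The sketch below reuses the three entities from slide 28; the helper name `resolve` is an assumption made for illustration, and a reference with no description in the file is simply returned as it stands.

```python
graph = [
    {"@id": "https://orcid.org/0000-0002-8367-6908", "@type": "Person",
     "affiliation": {"@id": "https://ror.org/03f0f6041"}, "name": "J. Xuan"},
    {"@id": "https://ror.org/03f0f6041", "@type": "Organization",
     "name": "University of Technology Sydney", "url": "https://www.uts.edu.au/"},
    {"@id": "figure.png", "@type": ["File", "ImageObject"],
     "author": {"@id": "https://orcid.org/0000-0002-8367-6908"},
     "encodingFormat": "image/png"},
]
by_id = {entity["@id"]: entity for entity in graph}


def resolve(entity, prop, index):
    """Return the entities (or literal values) that `prop` points at."""
    value = entity.get(prop)
    if value is None:
        return []
    values = value if isinstance(value, list) else [value]
    resolved = []
    for item in values:
        if isinstance(item, dict) and "@id" in item:
            # Use the full description if the crate has one, else keep the bare reference.
            resolved.append(index.get(item["@id"], item))
        else:
            resolved.append(item)
    return resolved


author = resolve(by_id["figure.png"], "author", by_id)[0]
organization = resolve(author, "affiliation", by_id)[0]
print(author["name"], "-", organization["name"])
```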
  • 29. No need to put all the metadata in RO-Crate! https://biocompute-objects.github.io/bco-ro-crate/ • Specification provides details on how to extend RO-Crate formally or informally • However, in many cases a domain-specific format for metadata may already exist • Co-exist as a side-car metadata file within the Research Object • Example: BioCompute Object has its own JSON format, typed & indicated from RO-Crate
  • 30. Persistent IDs Gradual ascent towards FAIR Base line: Relative paths from RO-Crate Metadata File Use cases: Describing dataset on desktop Ad-hoc web-hosting (e.g. GitHub pages) Institutional archives (e.g. Oxford Common File Layout) Reuse existing PIDs and URLs Use cases: Large data, not a file (e.g. database), reference datasets Cite/reference existing resources (e.g. via identifiers.org) Distinguish and crosslink contextual entities Make paths absolute, using location-independent PIDs (UUIDs, Naming Things with Hashes, ARCP) Use cases: Found RO-Crate “in the wild”, ZIP archives, workflow outputs Assign PIDs to RO-Crate (and its entities) Use cases: Long-term availability, citations, permalinks
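The third step on slide 30, making paths absolute with a location-independent identifier, might look like the sketch below. It assumes the ARCP form `arcp://uuid,<UUID>/` as the base and that every relative `@id` is already a valid relative URI reference; the function names are hypothetical.

```python
import uuid
from urllib.parse import urlsplit


def arcp_base(crate_uuid=None):
    """Build an ARCP base URI of the form arcp://uuid,<UUID>/ for one crate."""
    return f"arcp://uuid,{crate_uuid or uuid.uuid4()}/"


def to_absolute(identifier, base):
    """Leave absolute URIs alone; anchor relative paths at the base."""
    if urlsplit(identifier).scheme:
        return identifier
    path = identifier[2:] if identifier.startswith("./") else identifier
    return base + path


def absolutize(node, base):
    """Rewrite every @id in a nested JSON-LD structure, returning a copy."""
    if isinstance(node, dict):
        return {key: to_absolute(value, base)
                if key == "@id" and isinstance(value, str)
                else absolutize(value, base)
                for key, value in node.items()}
    if isinstance(node, list):
        return [absolutize(item, base) for item in node]
    return node


base = arcp_base("32a423d6-52ab-47e3-a9cd-54f418a48571")
root = {"@id": "./", "@type": "Dataset",
        "hasPart": [{"@id": "cp7glop.ai"},
                    {"@id": "https://doi.org/10.4225/59/59672c09f4a4b"}]}
absolute = absolutize(root, base)
print(absolute["@id"])
print(absolute["hasPart"][0]["@id"])
```

Existing PIDs such as the DOI pass through untouched, which matches the "reuse existing PIDs and URLs" step on the same slide.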
  • 31. Bit Sequences Base line: Files on disk within RO-Crate Root folder + URLs Downloadable Web resources listed w/ content size and access time. Packaging/archiving RO-Crate Root folder: • BagIt (RFC 8493): Manifest of all (local) files and their checksums Use case: Ensure all files are transferred/archived • BDBag: Include external files by ARK/MinID PIDs Use case: “thin” RO-Crate, Big Data, shared immutable files • OCFL: Archival storage with revision tracking, infrequent changes • Git LFS: Tracked frequent changes, collaborative editing
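To illustrate the BagIt option on slide 31, the sketch below computes the lines of a `manifest-sha256.txt` payload manifest: one checksum and one `data/`-relative path per file. It assumes the RO-Crate Root is placed in the bag's `data/` payload directory; `sha256_manifest` is a hypothetical helper, and a real bag also needs `bagit.txt` and is better produced with a dedicated BagIt tool.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_manifest(bag_dir):
    """Return manifest-sha256.txt lines for every file under <bag_dir>/data."""
    bag = Path(bag_dir)
    lines = []
    for path in sorted((bag / "data").rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            # Checksum, whitespace, then the path relative to the bag root.
            lines.append(f"{digest}  {path.relative_to(bag).as_posix()}")
    return lines


# Demonstration on a throwaway bag holding only an empty metadata file.
with tempfile.TemporaryDirectory() as tmp:
    payload = Path(tmp) / "data"
    payload.mkdir()
    (payload / "ro-crate-metadata.json").write_text("{}", encoding="utf-8")
    manifest = sha256_manifest(tmp)

print("\n".join(manifest))
```

Comparing a freshly computed manifest with the stored one after transfer is what gives the "ensure all files are transferred/archived" guarantee.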
  • 32. Repository Base line: zip/tar download • e.g. “supplementary data”, GitHub release Export from web platforms • e.g. workflowhub.eu, Galaxy workflow system Deposit in general/institutional repositories • e.g. Zenodo, Mendeley Data Deposit in domain-specific repositories • e.g. GBIF Deposit individual files • e.g. ARK, S3, B2SHARE Deposit metadata • e.g. WikiData, nanopubs
  • 33. What can FDO learn from RO-Crate? Not everything is known beforehand • Existing PID metadata not always relevant • Allow contextualized metadata • Types not always known in advance • Allow “casting” or reinterpretation • Operations not always known in advance • Allow open-ended generation and use Use wheels already invented • Fit into researchers’ existing working practices and familiar technologies. • Aim for gradual improvements. • Reuse existing technology • .. But only when not too complex • Reuse existing PID infrastructure, including URLs • Human consumers recognize and click hyperlinks • Build on existing metadata standards • Which are both simple and extendable Data and Contextual entities are equally important Remember people … especially Developers • Keep human consumption in focus • Ensure metadata is easily rendered and edited • Provide Best Practice guidance • Firm, but not too restrictive • Developer producer and consumer friendly • Rather than academically elegant
  • 34. What can RO-Crate learn from FDO? Provide stronger guidance on PID and availability • Recommend deposit infrastructure for end-users and framework developers • Tools to assist – e.g. generate Zenodo DataCite metadata from RO-Crate Provide stronger typing of RO-Crates and their content • Profiles as first approach Expose potential operations on an RO-Crate • Build general RO-Crate services, e.g. index Turtles all the way down! • Document better how RO-Crates can/should be nested • How to choose granularity of RO-Crates? • Tools to liberate/reuse an entity from a single RO-Crate
  • 35. RO-Crate as part of the FDO ecosystem A bag of references + metadata Framework for actionable objects RO-Crates enable active operations for FDOs FDO offers additional infrastructure and practice A bag of references + metadata A metadata framework for FDO Developer friendly practical and web-native implementations Community Infrastructure applications Synthesys+ Project gives a concrete use case for practically joining the two up.
  • 36. https://about.workflowhub.eu/ Bi-weekly calls & Slack Bi-weekly calls & GitHub Join us! https://www.researchobject.org/ro-crate/
  • 37. Acknowledgements • RO-Crate Community https://www.researchobject.org/ro-crate/community • Workflow Hub Club https://about.workflowhub.eu/acknowledgements/ • Funding: • H2020-INFRAEOSC-2018-2 824087 • H2020-INFRAEDI-2018-1 823830 • H2020-INFRAIA-2017-1 730976 • H2020-INFRADEV-2019-2 871118 • H2020-INFRAIA-2018-1 823827
  • 38. FAIR Principles for Digital Research Objects FAIR all the way down Unbounded FAIR Distributed FAIR Living FAIR Analogous to FAIR Software FAIR RO-Crate is a practical start Fig from EOSC Interoperability Framework

Editor's Notes

  • #7: Overcome fragmentation
  • #8: Identifiers to locate things. Organisation to structure & link things. Annotations about the things. @type: MUST be Dataset. @id: MUST end with / and SHOULD be the string ./ name: SHOULD identify the dataset to humans well enough to disambiguate it from other RO-Crates. description: SHOULD further elaborate on the name to provide a summary of the context in which the dataset is important. datePublished: MUST be a string in ISO 8601 date format and SHOULD be specified to at least the precision of a day, MAY be a timestamp down to the millisecond. license: SHOULD link to a Contextual Entity in the RO-Crate Metadata File with a name and description. MAY have a URI (e.g. for Creative Commons or Open Source licenses). MAY, if necessary, be a textual description of how the RO-Crate may be used. This document specifies a method, known as RO-Crate (Research Object Crate), of organizing file-based data with associated metadata, using linked data principles, in both human and machine readable formats, with the ability to include additional domain-specific metadata. The core of RO-Crate is a JSON-LD file, the RO-Crate Metadata File, named ro-crate-metadata.json. This file contains structured metadata about the dataset as a whole (the Root Data Entity) and, optionally, about some or all of its files. This provides a simple way to, for example, assert the authors (e.g. people, organizations) of the RO-Crate or one of its files, or to capture more complex provenance for files, such as how they were created using software and equipment. While providing the formal specification for RO-Crate, this document also aims to be a practical guide for software authors to create tools for generating and consuming research data packages, with explanation by examples. At the basic level, an RO-Crate is a collection of files and resources represented as a Schema.org Dataset, that together form a meaningful unit for the purposes of communication, citation, distribution, preservation, etc.
The RO-Crate Metadata File describes the RO-Crate, and MUST be stored in the RO-Crate Root. While RO-Crate is well catered for describing a Dataset as files and relevant metadata that are contained by the RO-Crate in the sense of living within the same root directory, RO-Crates can also reference external resources which are stored or accessed separately, via absolute URIs. This is particularly recommended where some resources cannot be co-hosted for practical or legal reasons, or if the RO-Crate itself is primarily web-based. It is important to note that the RO-Crate Metadata File is not an exhaustive manifest or inventory, that is, it does not necessarily list or describe all files in the package. Rather it is focused on providing sufficient amount of metadata to understand and use the content, and is designed to be compatible with existing and future approaches that do have full inventories / manifest and integrity checks, e.g. by using checksums, such as BagIt and Oxford Common File Layout OCFL Objects. The intention is that RO-Crates can work well with a variety of archive file formats, e.g. tar, zip, etc., and approaches to capturing file manifests and file fixity, such as BagIt, OCFL and git (see also appendix Combining with other packaging schemes). An RO-Crate can also be hosted on the web or mainly refer to web resources, although extra care to ensure persistence and consistency should be taken for archiving such RO-Crates.
  • #9: Data might
  • #10: More flexible than XML Schemas. More straightforward than OWL. Linked Data “exotics” there for when the time is right, if needed by the right people. Portland Common Data Model for digital library & repository content. Metadata is a flat JSON file: descriptions of the Objects, descriptions of the RO-Crate, descriptions relating the Objects. Human and machine readable formats, additional domain-specific metadata.
  • #15: PARADISEC and The University of Melbourne University of Technology Sydney PARADISEC (the Pacific And Regional Archive for Digital Sources in Endangered Cultures) has operated a data curation service for endangered languages for 18 years. Currently consisting of 500,000 files in 28,624 items and 574 collections, there is over 90TB of data under management. The repository, whilst still functional and fit for purpose, is showing signs of its age. Using the Arkisto Platform, the Modern PARADISEC project is updating the catalog whilst also enabling a path for smaller regional repositories to exist and to easily move their content across collections and institutions.
  • #16: NIH Data Commons: transferring and archiving very large HTS datasets in a location-independent way; keep the context of data content together when it's scattered. Scalability
  • #20: ReproZip can automatically pack your research along with all necessary data files, libraries, environment variables and options into a self-contained bundle. https://www.reprozip.org/ One pressure is towards bundling objects within objects. There’s nothing in DRS for a structured manifest within a bundle. RO-Crate as interchange format and integration with Zenodo, using Describo. Data Cubes for efficient and scalable structured data access and discovery. Research Objects as the mechanism to manage scientific research activities and connect associated resources. RO snapshot/releases published to EOSC repositories and scholarly communication platforms (e.g., B2Share, Zenodo)
  • #23: We started in theory, but we work in practice ROs sit in the middle
  • #24: To be FAIR each digital object type has its own metadata requirements, and may have its own repositories and registries. https://op.europa.eu/en/publication-detail/-/publication/d787ea54-6a87-11eb-aeb5-01aa75ed71a1/language-en/format-PDF/source-190308283 https://www.eoscsecretariat.eu/sites/default/files/eosc-interoperability-framework-v1.0.pdf Schwardmann (2020), Digital Objects – FAIR Digital Objects: Which Services Are Required? Data Science Journal. EOSC Interoperability Framework Draft (2020). Hardisty A, et al (2020) Conceptual design blueprint for the DiSSCo digitization infrastructure. RIO 6: e54280. DONA Digital Object Architecture, Digital Object Interface Protocol (2018). https://fairdigitalobjectframework.org/
  • #25: [Hardisty A, et al (2020) Conceptual design blueprint for the DiSSCo digitization infrastructure. RIO 6: e54280.] How to make this practical!
  • #27: So let’s do a quick reminder about what are the key components of FAIR Digital Objects: Multiple levels of Digital objects: - A Persistent Identifier to locate the DO - A set of Operations to use the DO - Metadata about the DO – itself a DO - All of these represented as bit sequences that can be transferred or stored in repository - Finally a Collection is a type of DO that aggregates other DOs and of course has its own metadata
  • #28: For RO-Crate, how do we approach FDO? We have implementations for each of these concepts: The RO-Crate dataset itself is of course what we can call a Collection. It aggregates a set of data entities, but also contextual entities. I’ll come back to those. We describe the RO-Crate and the entities in a metadata file. We can assign persistent identifiers to the RO-Crate and parts of it. We can store the bit sequences, ideally with something like BagIt, to ensure completeness. And deposit all of this in regular repositories like Zenodo or GitHub.
  • #39: RO-Crate recommends how to structure metadata. • FDO RO-Crate profile. What are the operations on RO-Crate metadata? Re-visit Wf4Ever Research Object APIs? (basically CRUD) RO-Crate as a Linked Data Platform (LDP) Container? Registration of RO-Crate types/profiles. How can RO-Crate users use FDO to resolve PID/storage challenges?