Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and
DANS Research Data Repositories
Slava Tykhonov; Jerry de Vries; Eko Indarto; Andrea Scharnhorst; Femmy Admiraal; Mike
Priddy (DANS-KNAW)
Presentation at ISKO Knowledge Organisation Research Observatory
24 Nov 2021
RESEARCH REPOSITORIES AND DATAVERSE: NEGOTIATING METADATA, VOCABULARIES
AND DOMAIN NEEDS
Content
- Introduction
- Who we are
- CMDI in the ‘wild’ - CLARIN data collections in EASY
- Creating FAIR metadata and semantic services - case of CMDI Pipeline
- Extract-Transform-Load (CMDI) metadata into Dataverse
- Workflow for linking external concepts to (CMDI) metadata values to make metadata FAIR
- Lessons learned and pointers
Introduction
DANS - Royal Netherlands Academy of Arts and
Sciences - research data expertise center - long-term
preservation - archive (www.easy.dans.knaw;
Dataverse.nl; Narcis.nl)
CLARIAH.nl Large Scale Infrastructure Project for
Humanities (CLARIN+DARIAH)
DARIAH: Digital Research Infrastructure for the Arts and
Humanities
CLARIN: Common Language Resources and
Technology Infrastructure
CMDI: Component MetaData Infrastructure: a
framework to describe and reuse metadata blueprints
SKOSMOS: Open source web-based SKOS browser
and publishing tool
Challenge
How to make the datasets from the CLARIN
community in the long-term archive discoverable in
the CLARIN infrastructure?
How to search ‘in data’? = How to achieve richer
indexing of metadata?
The problem - on a generic level
Vision: Semantic interoperability on the infrastructure level
We envision a situation where thousands of Dataverse instances (due to EOSC) on the
web can be searched simultaneously for data.
The old dream of Federated search/Universal catalogue can only be realised if:
(1) Crosswalks, i.e. mappings across different metadata schemes, are implemented
(2) In metadata schemes we look for ways to enrich indexes with values from controlled
vocabularies
Standard response = standardisation and harmonisation = repository software, certain
metadata standards, or certain controlled vocabularies
New response = explore agile solutions (Proof of Concept) which can be implemented by
different communities (even smaller ones), so we keep variety and still enable integration.
The problem - on a concrete level
CMDI ‘in the wild’
Conceptual approach: Semantic interoperability on the
infrastructure level - building common solutions for everyone
Dataverse Semantic API in release 5.6: https://github.com/IQSS/dataverse/releases/tag/v5.6
“Dataset metadata can be retrieved, set, and updated using a new, flatter JSON-LD format -
following the format of an OAI-ORE export (RDA-conformant Bags), allowing for easier transfer of
metadata to/from other systems (i.e. without needing to know Dataverse's metadata block and field
storage architecture). This new API also allows for the update of terms metadata”.
External controlled vocabulary support is being developed by DANS in the SSHOC project and is
already integrated in the Dataverse core in release 5.7.
Proposal: https://docs.google.com/document/d/1txdcFuxskRx_tLsDQ7KKLFTMR_r9IBhorDu3V_r445w/
Interfaces: http://github.com/gdcc/dataverse-external-vocab-support
Integrations: Wikidata, ORCID, MeSH, Skosmos vocabularies
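To make the idea of a "flatter JSON-LD format" more concrete, the following is a minimal sketch of what such a payload could look like. The field selection, the context mapping to Dublin Core Terms and the helper name `build_flat_jsonld` are assumptions made for illustration; this is not the exact document that Dataverse emits or accepts.

```python
import json

# Assumed mapping from short field names to vocabulary term URIs.
CONTEXT = {
    "title": "http://purl.org/dc/terms/title",
    "language": "http://purl.org/dc/terms/language",
}


def build_flat_jsonld(fields: dict) -> dict:
    """Return a flat JSON-LD document: one key per field plus an @context."""
    unknown = set(fields) - set(CONTEXT)
    if unknown:
        raise KeyError(f"no context mapping for: {sorted(unknown)}")
    doc = dict(fields)
    # Only declare the terms that are actually used in this document.
    doc["@context"] = {name: CONTEXT[name] for name in fields}
    return doc


doc = build_flat_jsonld({"title": "Example CMDI collection", "language": "Dutch"})
print(json.dumps(doc, indent=2))
```

The point of the flat shape is that a client only needs the field names and their term URIs, not knowledge of how a repository groups fields into metadata blocks.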
CMDI Pipeline
- Backbone of our pipeline: Extract-Transform-Load (CMDI) metadata into Dataverse
- One block relevant for semantic services: Mapping across metadata standards
- Another block: Look-up for values in controlled vocabulary registers - enrich indexing
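The Extract-Transform-Load backbone can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the sample record is a namespace-free stand-in for a real CMDI file, the `CROSSWALK` table and target field names are hypothetical, and the load step appends to a list instead of calling a repository API.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free stand-in for a CMDI record (real records are richer).
SAMPLE = """
<CMD>
  <Components>
    <Collection>
      <Title>Example corpus</Title>
      <Language>Dutch</Language>
      <Language>English</Language>
    </Collection>
  </Components>
</CMD>
"""

# Hypothetical crosswalk: CMDI element path -> target field name.
CROSSWALK = {
    "Components/Collection/Title": "title",
    "Components/Collection/Language": "language",
}


def extract(xml_text):
    """Extract: parse the XML text into an element tree root."""
    return ET.fromstring(xml_text)


def transform(root):
    """Transform: collect the values of every mapped path into a flat record."""
    record = {}
    for path, field in CROSSWALK.items():
        values = [el.text.strip() for el in root.findall(path) if el.text]
        if values:
            record[field] = values
    return record


def load(record, store):
    """Load: append to an in-memory list (stand-in for a repository deposit)."""
    store.append(record)
    return store


store = load(transform(extract(SAMPLE)), [])
print(store)
```

Keeping the crosswalk as data rather than code is what lets a community adjust the mapping without touching the pipeline itself.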
SEMAF: A Proposal for a Flexible Semantic Mapping Framework
Proposal: https://zenodo.org/record/4651421#.YT9lyC8RpZI
POC: https://github.com/Dans-labs/semaf-poc
Coming close to the implementation
1. Use Data Catalog Vocabulary (DCAT) mappings for CMDI metadata fields
2. Use Simple Knowledge Organization System (SKOS) to model thesaurus-like
resources with simple skos:broader, skos:narrower and skos:related
properties
3. Load CMDI properties and attributes and build a Knowledge Graph out of all
elements
4. Enrich the Knowledge Graph with concept URIs from various controlled
vocabularies, for example Skosmos-hosted vocabularies or Wikidata
5. Use different data-serialization formats suitable for integration with
different systems. For example, JSON-LD for Dataverse, Turtle for Jena
Fuseki, RDF for LoD frameworks
Complexity of CMDI is unfolding
After the implementation
● Complexity in CMDI becomes more
visible
● Identify core concepts which can be
mapped to standard bibliographic
schemes such as DCAT (red box)
● Possibility to match values of CMDI
concepts to other controlled
vocabularies (green box)
How does it look when implemented in Dataverse?
Every field can be linked to the appropriate controlled vocabularies in a FAIR way!
Greater vision:
Dataverse metadata schemas ingested into a Knowledge Graph
Compound keyword field with SKOS
We use SKOS relationships to keep the
hierarchy and relationships between
metadata fields
Other Dataverse schemas: https://github.com/Dans-labs/semaf-client/tree/cmdi/schema
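As a rough illustration of keeping a field hierarchy with SKOS, the sketch below links the child fields of a compound keyword field to their parent with skos:broader. The child field names follow the Dataverse citation block as far as we know; the `FIELD_NS` namespace and both helper functions are hypothetical and not taken from the schemas linked above.

```python
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"
FIELD_NS = "https://example.org/field/"  # placeholder namespace, an assumption

# Compound field -> child fields.
COMPOUND_FIELDS = {
    "keyword": ["keywordValue", "keywordVocabulary", "keywordVocabularyURI"],
}


def schema_to_triples(compound_fields):
    """Emit one skos:broader triple from each child field to its parent."""
    triples = []
    for parent, children in compound_fields.items():
        for child in children:
            triples.append((FIELD_NS + child, SKOS_BROADER, FIELD_NS + parent))
    return triples


def parent_of(field, triples):
    """Return the parent field name, or None for a top-level field."""
    for subject, predicate, obj in triples:
        if subject == FIELD_NS + field and predicate == SKOS_BROADER:
            return obj[len(FIELD_NS):]
    return None


triples = schema_to_triples(COMPOUND_FIELDS)
print(parent_of("keywordValue", triples))
```

Because the hierarchy lives in the graph, a consumer can reconstruct the compound structure without knowing the repository's internal schema format.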
Once in a Knowledge graph: what can we do?
The pipeline managed to establish some relationships
to Wikidata concepts and automatically updated the
dataset with new concept URIs!
The example of automatic enrichment with Wikidata
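The enrichment step can be pictured as a lookup from a metadata value to a concept URI. The sketch below is an assumption-laden illustration: it uses a small stub table in place of a live vocabulary service, and the function names and the `conceptURI` key are chosen for this example only.

```python
from typing import Callable, Dict, List, Optional

# Stub lookup table standing in for a live vocabulary service (illustrative only).
STUB_CONCEPTS = {
    "english": "http://www.wikidata.org/entity/Q1860",
    "dutch": "http://www.wikidata.org/entity/Q7411",
}


def stub_lookup(label: str) -> Optional[str]:
    """Return a concept URI for a label, or None when nothing matches."""
    return STUB_CONCEPTS.get(label.strip().lower())


def enrich(values: List[str],
           lookup: Callable[[str], Optional[str]]) -> List[Dict[str, str]]:
    """Attach a concept URI to each value where the lookup finds one.

    Values without a match are kept unchanged, so no metadata is lost.
    """
    enriched = []
    for value in values:
        entry = {"value": value}
        uri = lookup(value)
        if uri is not None:
            entry["conceptURI"] = uri
        enriched.append(entry)
    return enriched


result = enrich(["Dutch", "English", "Unknown language"], stub_lookup)
for entry in result:
    print(entry)
```

Passing the lookup in as a function keeps the enrichment logic independent of any one vocabulary provider, so a Skosmos-hosted vocabulary or another service could be swapped in.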
Lessons learned (I)
Scientific communities and archives have different perspectives on standardisation and semantic services.
In research, formalisation (including KOS, ontologies, any ‘model’) is a heuristic device, agile to new research
questions, and so intrinsically ‘not interoperable’. In other words, there is a difference between research needs
and information needs.
Lessons learned (II)
- We provided a solution for our CMDI problem - by creating a CLARIN
compatible Dataverse solution, which via an API can be harvested by the
CLARIN search service; we also created another perspective on the CMDI
‘challenge’
- We used Dataverse as a platform due to its open, active community;
- Some of the examples we showed you are results of a ‘Vision Lab’ - proofs of
concept - funded in projects such as SSHOC, CLARIAH, EOSC
- The results are envisioned to be implemented locally.
- But, in principle, the solutions are platform agnostic.
Lessons learned (III)
- In the future, repositories might become nodes in a large searchable
knowledge graph and semantic links might enable pathways for
contextual/semantic search.
- Part of this future will be automatically supported semantic enrichment at the
local instantiations (automatic indexing in a net instead of in an index)
- Problem: keeping provenance and authority (trust) - governance between those
(micro-service) providers needs to be organised. What can we learn from
history?
References - pointers
CMDI exploration tool: DANS CMDI converter (GitHub)
CMDI properties frequency: VLO top profiles
CMDI core metadata proposal: Core metadata components design for use cases
DANS CMDI metadata generator: CMDI metadata model published as TSV files
Converter can extract and show the hierarchy of all fields
CLARIAH compliant Dataverse Docker module: Dataverse Docker with CMDI metadata schema
Core metadata components design guidelines: Guidelines link
Semantic Gateway as plugin app: Dataverse gateway, Semantic Gateway API
Dataverse metadata schema ingested into Graph:
https://github.com/Dans-labs/semaf-client/tree/cmdi/schema
References - pointers
Dataverse 5.7: https://github.com/IQSS/dataverse/releases/tag/v5.7
Semantic Gateway: https://github.com/Dans-labs/semantic-gateway
SSHOC task 5.2: http://github.com/SSHOC
SEMAF client: https://github.com/Dans-labs/semaf-client
CMDI data model and namespaces: M. Windhouwer, E. Indarto, D. Broeder. CMD2RDF: Building a Bridge from
CLARIN to Linked Open Data
Flexible Metadata Schemes for Research Data Repositories. / de Vries, Jerry; Tykhonov, Vyacheslav;
Scharnhorst, Andrea; Admiraal, Femmy; Indarto, Eko; Priddy, Mike.
2021. Abstract from CLARIN Annual Conference 2021.
https://www.clarin.eu/content/programme-clarin-annual-conference-2021
Questions?
Slava Tykhonov <vyacheslav.tykhonov@dans.knaw.nl>
Andrea Scharnhorst <andrea.scharnhorst@dans.knaw.nl>

Editor's Notes

  • #7: We have far more than 29 datasets with CMDI information in them, but those additional metadata are added as files that are not indexed, and so not searchable; we use Dublin Core-like metadata, and the CMDI metadata are often in XML form
  • #11: If there is a lot to be said about SEMAF slide 9 can be a bullet on another slide - 10? Mike
  • #15: And the clue is also to give power to the communities to create their own specific scheme, they only need to find the connector to the higher level