Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and
DANS Research Data Repositories
Slava Tykhonov; Jerry de Vries; Eko Indarto; Andrea Scharnhorst; Femmy Admiraal; Mike
Priddy (DANS-KNAW)
Presentation at ISKO Knowledge Organisation Research Observatory
24 Nov 2021
RESEARCH REPOSITORIES AND DATAVERSE: NEGOTIATING METADATA, VOCABULARIES
AND DOMAIN NEEDS
Content
- Introduction
- Who we are
- CMDI in the ‘wild’ - CLARIN data collections in EASY
- Creating FAIR metadata and semantic services - case of CMDI Pipeline
- Extract-Transform-Load (CMDI) metadata into Dataverse
- Workflow for linking external concepts to (CMDI) metadata values to make metadata FAIR
- Lessons learned and pointers
Introduction
DANS - Royal Netherlands Academy of Arts and
Sciences - research data expertise center - long-term
preservation - archive (www.easy.dans.knaw;
Dataverse.nl; Narcis.nl)
CLARIAH.nl Large Scale Infrastructure Project for
Humanities (CLARIN+DARIAH)
DARIAH: Digital Research Infrastructure for the Arts and
Humanities
CLARIN: Common Language Resources and
Technology Infrastructure
CMDI: Component MetaData Infrastructure: a framework
to describe and reuse metadata blueprints
SKOSMOS: Open source web-based SKOS browser
and publishing tool
Challenge
How to make the datasets from the CLARIN
community in the long-term archive discoverable in
the CLARIN infrastructure?
How to search ‘in data’? = How to achieve richer
indexing of metadata?
The problem - on a generic level
Vision: Semantic interoperability on the infrastructure level
We envision a situation where thousands of Dataverse instances (due to EOSC) on the
web can be searched simultaneously for data.
The old dream of Federated search/Universal catalogue can only be realised if:
(1) Crosswalks (mappings across different metadata schemes) are implemented
(2) In metadata schemes we seek ways to enrich indexes with values from controlled
vocabularies
Standard response = standardisation and harmonisation = repository software, certain
metadata standards, or certain controlled vocabularies
New response = explore agile solutions (Proof of Concept) which can be implemented by
different communities (even smaller ones), so we keep variety and still enable integration.
The problem - on a concrete level
CMDI ‘in the wild’
Conceptual approach: Semantic interoperability on the
infrastructure level - building common solutions for everyone
Dataverse Semantic API in release 5.6: https://github.com/IQSS/dataverse/releases/tag/v5.6
“Dataset metadata can be retrieved, set, and updated using a new, flatter JSON-LD format -
following the format of an OAI-ORE export (RDA-conformant Bags), allowing for easier transfer of
metadata to/from other systems (i.e. without needing to know Dataverse's metadata block and field
storage architecture). This new API also allows for the update of terms metadata.”
Support for external controlled vocabularies is being developed by DANS in the SSHOC project and
is already integrated into the Dataverse core as of release 5.7.
Proposal: https://docs.google.com/document/d/1txdcFuxskRx_tLsDQ7KKLFTMR_r9IBhorDu3V_r445w/
Interfaces: http://github.com/gdcc/dataverse-external-vocab-support
Integrations: Wikidata, ORCID, MeSH, Skosmos vocabularies
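To make the idea of a "flatter JSON-LD format" more concrete, here is a minimal Python sketch that builds a small flat metadata document with concept URIs attached to keywords. The key names, the choice of prefixes in `@context`, and the example.org concept URI are illustrative assumptions; they are not the exact keys that Dataverse emits.

```python
import json


def build_flat_jsonld(title, keywords):
    """Return a flat JSON-LD dict. Keys and context are illustrative assumptions."""
    return {
        "@context": {
            "dcterms": "http://purl.org/dc/terms/",
            "skos": "http://www.w3.org/2004/02/skos/core#",
        },
        "dcterms:title": title,
        # Each keyword carries a label and, where one is known, a concept URI.
        "dcterms:subject": [
            {"skos:prefLabel": label, "@id": uri} if uri else {"skos:prefLabel": label}
            for label, uri in keywords
        ],
    }


doc = build_flat_jsonld(
    "Example corpus",
    [
        ("linguistics", "https://example.org/concept/linguistics"),  # hypothetical URI
        ("dialect", None),  # no concept URI known yet
    ],
)
print(json.dumps(doc, indent=2))
```

The point of the flat shape is that a client only needs property names and values, not knowledge of how the repository groups fields internally.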
CMDI Pipeline
- Backbone of our pipeline: Extract-Transform-Load (CMDI) metadata into Dataverse
- One block relevant for semantic services: Mapping across metadata standards
- Another block: Look-up for values in controlled vocabulary registers - enrich indexing
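The Extract-Transform-Load backbone can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the DANS pipeline itself: the sample record is a simplified, non-namespaced stand-in for CMDI, and the crosswalk table and target field names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical CMDI-like record; real CMDI profiles are namespaced and far richer.
SAMPLE = """<CMD><Components><Corpus>
  <Name>Example corpus</Name>
  <Language>Dutch</Language>
  <Modality>spoken</Modality>
</Corpus></Components></CMD>"""

# Hypothetical crosswalk: source element path -> target field name.
CROSSWALK = {
    "Components/Corpus/Name": "title",
    "Components/Corpus/Language": "language",
}


def extract(xml_text):
    """Extract: map the path of every leaf element to its text value."""
    root = ET.fromstring(xml_text)
    leaves = {}

    def walk(node, path):
        children = list(node)
        if not children and node.text and node.text.strip():
            leaves[path] = node.text.strip()
        for child in children:
            walk(child, f"{path}/{child.tag}" if path else child.tag)

    walk(root, "")
    return leaves


def transform(leaves, crosswalk):
    """Transform: rename mapped fields and keep unmapped ones aside for review."""
    mapped = {crosswalk[p]: v for p, v in leaves.items() if p in crosswalk}
    unmapped = {p: v for p, v in leaves.items() if p not in crosswalk}
    return mapped, unmapped


def load(mapped):
    """Load: only builds a payload here; a real loader would call the repository API."""
    return {"fields": mapped}


mapped, unmapped = transform(extract(SAMPLE), CROSSWALK)
payload = load(mapped)
print(payload)
print("not covered by the crosswalk:", unmapped)
```

Keeping the unmapped fields visible, rather than dropping them silently, is one way to see how much of a flexible schema a given crosswalk actually covers.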
SEMAF: A Proposal for a Flexible Semantic Mapping Framework
Proposal: https://zenodo.org/record/4651421#.YT9lyC8RpZI
POC: https://github.com/Dans-labs/semaf-poc
Coming close to the implementation
1. Use Data Catalog Vocabulary (DCAT) mappings for CMDI metadata fields
2. Use the Simple Knowledge Organization System (SKOS) to model thesaurus-like
resources with simple skos:broader, skos:narrower and skos:related
properties
3. Load CMDI properties and attributes and build a Knowledge Graph out of all
elements
4. Enrich the Knowledge Graph with concept URIs from various controlled
vocabularies, such as those hosted in Skosmos or Wikidata
5. Use different data-serialization formats suited to integration with
different systems: for example, JSON-LD for Dataverse, Turtle for Jena
Fuseki, RDF for LOD frameworks
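Steps 1, 3, 4 and 5 above can be sketched with the Python standard library alone. The field-to-property table, the concept lookup table and all example.org URIs are hypothetical; a real implementation would more likely use an RDF library and a vocabulary service. N-Triples is used for output because it is a line-based subset of Turtle.

```python
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"

# Hypothetical mapping from CMDI-derived field names to DCAT / DCTERMS properties.
FIELD_TO_PROPERTY = {
    "title": DCT + "title",
    "language": DCT + "language",
    "keyword": DCAT + "keyword",
}

# Hypothetical lookup table standing in for a controlled-vocabulary service.
CONCEPTS = {"Dutch": "https://example.org/vocab/languages/nld"}


def build_graph(dataset_uri, fields):
    """Steps 1 and 3: turn mapped fields into (subject, predicate, object) triples."""
    return [
        (dataset_uri, FIELD_TO_PROPERTY[name], value)
        for name, value in fields.items()
        if name in FIELD_TO_PROPERTY
    ]


def enrich(triples, concepts):
    """Step 4: replace literal values with concept URIs where a match is known."""
    return [(s, p, concepts.get(o, o)) for s, p, o in triples]


def to_ntriples(triples):
    """Step 5: serialise as N-Triples; objects that look like URLs become URI nodes."""
    lines = []
    for s, p, o in triples:
        if o.startswith(("http://", "https://")):
            obj = f"<{o}>"
        else:
            obj = '"' + o.replace("\\", "\\\\").replace('"', '\\"') + '"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


graph = enrich(
    build_graph(
        "https://example.org/dataset/1",
        {"title": "Example corpus", "language": "Dutch"},
    ),
    CONCEPTS,
)
ntriples = to_ntriples(graph)
print(ntriples)
```

Once the metadata is held as triples, the choice of output format becomes a final serialisation step rather than a property of the pipeline.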
Complexity of CMDI is unfolding
After the implementation
● Complexity in CMDI becomes more
visible
● Identify core concepts which can be
mapped to standard bibliographic
schemes such as DCAT (red box)
● Possibility to match values of CMDI
concepts to other controlled
vocabularies (green box)
How does it look when implemented in Dataverse?
Every field can be linked to the appropriate controlled vocabularies in a FAIR way!
Greater vision:
Dataverse metadata schemas ingested into a Knowledge Graph
Compound keyword field with SKOS
We use SKOS relationships to keep the
hierarchy and relationships between
metadata fields
Other Dataverse schemas: https://github.com/Dans-labs/semaf-client/tree/cmdi/schema
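As an illustration of keeping a field hierarchy with SKOS, the sketch below turns a parent/child description of a compound keyword field into skos:broader and skos:narrower triples. The base namespace is hypothetical, and the subfield names are used here only as an example of a compound field.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "https://example.org/fields/"  # hypothetical namespace for field concepts

# Example compound field: parent field -> its child fields (illustrative).
HIERARCHY = {
    "keyword": ["keywordValue", "keywordVocabulary", "keywordVocabularyURI"],
}


def skos_triples(hierarchy):
    """Express each parent/child pair in both directions."""
    triples = []
    for parent, children in hierarchy.items():
        for child in children:
            triples.append((BASE + child, SKOS + "broader", BASE + parent))
            triples.append((BASE + parent, SKOS + "narrower", BASE + child))
    return triples


field_graph = skos_triples(HIERARCHY)
for triple in field_graph:
    print(triple)
```

Stating both directions explicitly means a consumer can walk the hierarchy either way without needing inference support.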
Once in a Knowledge graph: what can we do?
The pipeline managed to establish some relationships
to Wikidata concepts and automatically updated the
dataset with new concept URIs!
The example of automatic enrichment with Wikidata
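A rough sketch of what such automatic enrichment could look like is given below. It is not the pipeline's actual code. It uses the public Wikidata `wbsearchentities` API action and naively takes the first hit; the output keys `value` and `conceptURI`, the User-Agent string and the stub lookup are assumptions for illustration. The search function is passed in as a parameter so the matching logic can be tried offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def search_url(term, language="en"):
    """Build a wbsearchentities request URL for one search term."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "format": "json",
        "limit": 1,
    }
    return WIKIDATA_API + "?" + urlencode(params)


def wikidata_search(term):
    """Network call: return the concept URI of the first hit, or None."""
    # The User-Agent value is a placeholder; use one that identifies your client.
    request = Request(search_url(term), headers={"User-Agent": "enrichment-sketch/0.1"})
    with urlopen(request, timeout=10) as response:
        hits = json.load(response).get("search", [])
    return hits[0]["concepturi"] if hits else None


def enrich_keywords(keywords, search):
    """Attach a concept URI to each keyword value when the search finds one."""
    enriched = []
    for value in keywords:
        entry = {"value": value}
        uri = search(value)
        if uri:
            entry["conceptURI"] = uri
        enriched.append(entry)
    return enriched


# Offline demonstration with a stub lookup instead of the live API.
stub = {"linguistics": "https://example.org/entity/linguistics"}.get
result = enrich_keywords(["linguistics", "unmatched term"], stub)
print(result)
# For live lookups: enrich_keywords(["linguistics"], wikidata_search)
```

Taking the first hit is the weakest part of this sketch: a label match does not guarantee the right concept, so matches made this way would normally need review.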
Lessons learned (I)
Scientific communities and archives have different perspectives on standardisation and semantic services.
In research, formalisation (including KOS, ontologies, any ‘model’) is a heuristic device, agile to new research
questions, and so intrinsically ‘not interoperable’. In other words, there is a difference between research needs
and information needs.
Lessons learned (II)
- We provided a solution for our CMDI problem by creating a CLARIN-compatible
Dataverse solution, which can be harvested via an API by the CLARIN search
service; we also created another perspective on the CMDI ‘challenge’
- We used Dataverse as a platform because of its open, active community
- The examples we showed you are some of the results of a ‘Vision Lab’ (proofs of
concept) funded in projects such as SSHOC, CLARIAH and EOSC
- The results are envisioned to be implemented locally.
- But in principle the solutions are platform agnostic.
Lessons learned (III)
- In the future, repositories might become nodes in a large searchable
knowledge graph and semantic links might enable pathways for
contextual/semantic search.
- Part of this future will be automatically supported semantic enrichment at the
local instantiations (automatic indexing in a net instead of in an index)
- Problem: keeping provenance and authority (trust); governance between those
(micro-service) providers needs to be organised. What can we learn from
history?
References - pointers
CMDI exploration tool DANS CMDI converter github
CMDI properties frequency VLO top profiles
CMDI core metadata proposal Core metadata components design for use cases
DANS CMDI metadata generator CMDI metadata model published as TSV files
Convertor can extract and show the hierarchy of all fields
CLARIAH compliant Dataverse Docker module Dataverse Docker with CMDI metadata schema
Core metadata components design guidelines Guidelines link
Semantic Gateway as plugin app Dataverse gateway Semantic Gateway API
Dataverse metadata schema ingested into Graph
https://github.com/Dans-labs/semaf-client/tree/cmdi/schema
References - pointers
Dataverse 5.7 https://github.com/IQSS/dataverse/releases/tag/v5.7
Semantic Gateway: https://github.com/Dans-labs/semantic-gateway
SSHOC task 5.2 http://github.com/SSHOC
SEMAF client https://github.com/Dans-labs/semaf-client
CMDI data model and namespaces: M. Windhouwer, E. Indarto, D. Broeder. CMD2RDF: Building a Bridge from
CLARIN to Linked Open Data
Flexible Metadata Schemes for Research Data Repositories. / de Vries, Jerry; Tykhonov, Vyacheslav;
Scharnhorst, Andrea; Admiraal, Femmy; Indarto, Eko; Priddy, Mike.
2021. Abstract from CLARIN Annual Conference 2021.
https://www.clarin.eu/content/programme-clarin-annual-conference-2021
Questions?
Slava Tykhonov <vyacheslav.tykhonov@dans.knaw.nl>
Andrea Scharnhorst <andrea.scharnhorst@dans.knaw.nl>
