Enrichment of DDI support in the Dataverse data repository
Slava Tykhonov, Marion Wittenberg (DANS-KNAW)
EDDI 2019, Tampere, Finland, December 3, 2019
Creative Commons Attribution 4.0 International (CC BY 4.0)
SSHOC objective and deliverables
Objective
Development of a research data repository service on EOSC for SSH institutions that currently lack such a facility for their designated communities
Deliverables
After 38 months: data repository service running on EOSC
After 40 months: report on the principles of governance and sustainability of the data repository service
Development process
The Dataverse SSHOC project has two parallel development tracks:
● The core development team is modifying and extending the core functionality of Dataverse.
● The application development team will create new tools or integrate existing ones, which will be published on the Dataverse App Store website.
Our goal is to build a distributed and mature data infrastructure based on sustainable microservices.
Services in European Open Science Cloud (EOSC)
● EOSC requires maturity level 8 (at least)
● software must be of the highest quality to be accepted as a service
● clear and transparent evaluation of services is essential
● evidence of technical maturity is the key to success
● a limited warranty makes it possible to stop services once they are out of warranty
Applications maturity level
Every software package should follow the same CESSDA Maturity Model to be accepted as a service.
https://zenodo.org/record/2591055#.XKR6ny2B2u5
Must have: Kubernetes (k8s) infrastructure with upstream Docker images, a warranty statement, documentation, unit tests, Selenium tests and a Jenkins pipeline.
Dataverse external applications that are mature enough and deployed as cloud services can be connected to any Dataverse repository using an API token.
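As a rough illustration of connecting through an API token, the Python sketch below prepares (but does not send) an authenticated request to the Dataverse native API. It assumes the documented `X-Dataverse-key` header and the `GET /api/dataverses/{id}/contents` endpoint; the function name, server URL and collection alias are placeholders chosen for this example.

```python
import urllib.request


def build_contents_request(base_url: str, dataverse_alias: str, api_token: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated GET request for a collection's contents."""
    # Native API endpoint that lists the datasets and sub-collections of a Dataverse collection
    url = f"{base_url.rstrip('/')}/api/dataverses/{dataverse_alias}/contents"
    # Dataverse reads the user's API token from the X-Dataverse-key header
    return urllib.request.Request(url, headers={"X-Dataverse-key": api_token}, method="GET")


if __name__ == "__main__":
    # Placeholder values; pass the request to urllib.request.urlopen() to actually send it
    request = build_contents_request("https://demo.dataverse.org", "root", "YOUR-API-TOKEN")
    print(request.full_url)
```

Because the token travels in a header rather than being configured on the server, the same externally hosted service can talk to any Dataverse instance where the user has an account.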
Dataverse App Store
We’re building different services out of tools!
Data preview: DDI Explorer, Spreadsheet/CSV, PDF, Text files, HTML, Images, video rendering, audio, JSON, GeoJSON/Shapefiles/Map, XML
Interoperability: external controlled vocabularies (CESSDA CV Manager)
Data processing: NESSTAR DDI migration tool
Linked Data: RDF compliance including SPARQL endpoint
Federated login: eduGAIN, PIONIER ID
DDI Converter tool
It usually takes a lot of effort and time to migrate metadata and data from a repository such as NESSTAR or DSpace to another repository.
The main idea of the DDI Converter is to separate the mappings from the conversion process, so that metadata specialists can create them independently of the DDI migration pipeline.
DDI Converter comes with a Docker infrastructure, so it can be deployed as an image on Kubernetes or other cloud platforms. You don’t need any development capacity to use it: just create the mappings and the tool will do the rest!
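The separation of mappings from conversion can be sketched in a few lines of Python. The real DDI Converter uses XSLT mappings; this sketch swaps in a simpler, hypothetical JSON mapping (Dataverse field name to an ElementTree path in a DDI-Codebook 2.5 document) purely to show the idea that the engine knows nothing about individual fields. The field choices and sample document are illustrative assumptions.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical mapping a metadata specialist could maintain on their own:
# Dataverse citation field -> path inside a DDI-Codebook 2.5 <codeBook> element.
MAPPING_JSON = """
{
  "title": "ddi:stdyDscr/ddi:citation/ddi:titlStmt/ddi:titl",
  "productionDate": "ddi:stdyDscr/ddi:citation/ddi:prodStmt/ddi:prodDate"
}
"""

NAMESPACES = {"ddi": "ddi:codebook:2_5"}

SAMPLE_DDI = """<codeBook xmlns="ddi:codebook:2_5">
  <stdyDscr>
    <citation>
      <titlStmt><titl>Example Survey 2019</titl></titlStmt>
      <prodStmt><prodDate>2019</prodDate></prodStmt>
    </citation>
  </stdyDscr>
</codeBook>"""


def convert(ddi_xml: str, mapping: dict[str, str]) -> dict[str, str]:
    """Generic engine: applies whatever mapping it is given, field by field."""
    root = ET.fromstring(ddi_xml)  # the <codeBook> root element
    record = {}
    for field, path in mapping.items():
        element = root.find(path, NAMESPACES)
        # Skip fields that are absent or empty in this particular DDI file
        if element is not None and element.text:
            record[field] = element.text.strip()
    return record


if __name__ == "__main__":
    print(convert(SAMPLE_DDI, json.loads(MAPPING_JSON)))
```

Changing which DDI elements feed which Dataverse fields then means editing the mapping file only, not the conversion code.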
Dataverse Metadata Crosswalk
Source: https://docs.google.com/spreadsheets/d/10Luzti7svVTVKTA-px27oq3RxCUM-QbiTkm8iMd5C54/edit#gid=0
Why XSLT mappings?
● XSLT (a W3C Recommendation since 1999) is a language designed primarily for transforming XML documents into other self-describing documents.
● The DDI community already uses XSLT to map metadata from one format to another and has collected many mappings that can be reused.
● XSLT mappings for different DDI standards can be managed in the same GitHub repository.
● Knowledge of XSLT is currently a common job requirement for metadata specialists.
DDI Converter in a nutshell
● Developed in Python 3 as a Flask application using the pyDataverse module (AUSSDA)
● DDI Converter uses XSLT mappings stored on GitHub
● all CESSDA DDI transformations are also supported: https://github.com/MetadataTransform/ddi-xslt
● the Swagger framework allows the tool to be used both as a manual deposit form and as a microservice built into the migration pipeline
● a Docker image deployed locally or in the cloud can connect DDI Converter to any Dataverse instance via its API
● you can migrate your data even if the Dataverse instance is maintained by someone else: just copy the API token from your Dataverse account, put it into DDI Converter, and it will do the job for you!
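To make the deposit step concrete, here is a minimal Python sketch of what a converter might send to Dataverse once the metadata has been mapped: a dataset in Dataverse's native JSON format, posted to `POST /api/dataverses/{alias}/datasets` with the API token header. This is not the DDI Converter's own code (which uses pyDataverse); the function names are hypothetical, and a real deposit must also include the other required citation fields (author, contact, description, subject), which are omitted here.

```python
import json
import urllib.request


def citation_field(type_name: str, value: str) -> dict:
    """A single-valued primitive field in Dataverse's native dataset JSON."""
    return {"typeName": type_name, "multiple": False, "typeClass": "primitive", "value": value}


def build_create_dataset_request(base_url: str, parent_alias: str, api_token: str,
                                 fields: dict[str, str]) -> urllib.request.Request:
    """Prepare (but do not send) a request creating a dataset in a given (sub)dataverse."""
    body = {
        "datasetVersion": {
            "metadataBlocks": {
                "citation": {"fields": [citation_field(k, v) for k, v in fields.items()]}
            }
        }
    }
    url = f"{base_url.rstrip('/')}/api/dataverses/{parent_alias}/datasets"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Dataverse-key": api_token, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder server, alias and token; only primitive fields are handled in this sketch
    req = build_create_dataset_request("https://demo.dataverse.org", "sshoc", "YOUR-API-TOKEN",
                                       {"title": "Example Survey 2019"})
    print(req.get_method(), req.full_url)
```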
Using Swagger as dataset deposit form
Import steps:
1. Open the Swagger page
2. Upload a DDI file
3. Select an XSLT mapping from GitHub
4. Copy the API token from your user page in Dataverse
5. Choose the subdataverse where the dataset should go
6. Start the migration process with one click
7. Check the result in Dataverse
Interested?
https://github.com/IQSS/dataverse-ddi-converter-tool
What’s next? DDI Explorer as a service
DDI Explorer is a Dataverse application developed by Scholars Portal (dataverse.scholarsportal.info).
The Dataverse SSHOC project has integrated it into a Docker image and incorporated it into the Kubernetes infrastructure (Dataverse-docker module).
DDI Explorer will be delivered as a cloud service that can be connected to any Dataverse instance!
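One way a cloud-hosted tool such as DDI Explorer can be attached to a Dataverse instance is through Dataverse's external tools mechanism, where an administrator registers a JSON manifest. The Python sketch below builds such a manifest; the field names and the `{fileId}`, `{siteUrl}` and `{apiToken}` placeholders follow the external tools format described in the Dataverse guides (check the exact fields for your Dataverse version), while the tool URL and description are hypothetical.

```python
import json


def explore_tool_manifest(display_name: str, tool_url: str, content_type: str) -> dict:
    """Build an external tool manifest for a file-level 'explore' tool."""
    return {
        "displayName": display_name,
        "description": "Explore the variables of a tabular file.",  # illustrative text
        "scope": "file",
        "types": ["explore"],
        "toolUrl": tool_url,
        "contentType": content_type,
        "toolParameters": {
            # Dataverse substitutes these placeholders when it launches the tool
            "queryParameters": [
                {"fileid": "{fileId}"},
                {"siteUrl": "{siteUrl}"},
                {"key": "{apiToken}"},
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical deployment URL; ingested tabular files use this content type
    manifest = explore_tool_manifest("DDI Explorer", "https://ddi-explorer.example.org/",
                                     "text/tab-separated-values")
    print(json.dumps(manifest, indent=2))
```

The resulting JSON would then typically be registered by a superuser with `POST /api/admin/externalTools`, after which the tool appears next to matching files.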
Spreadsheet previewer
This tool was contributed by the Dataverse SSHOC project and integrated by Harvard IQSS in Dataverse 4.18.
It provides a web interface for browsing and viewing data directly, without downloading it.
A spreadsheet viewer can increase the chances of finding the right data and of getting cited: more FAIRness!
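How a previewer can show data "without download" is easiest to see from the URL it is launched with. The actual previewers are browser-based pages; this Python sketch only illustrates the URL logic, assuming Dataverse opens the tool with `fileid` and `siteUrl` query parameters and that the previewer then reads the file through the Data Access API (`/api/access/datafile/{id}`). The previewer URL in the usage is hypothetical, and restricted files would additionally need an API token.

```python
from urllib.parse import parse_qs, urlparse


def data_access_url(launch_url: str) -> str:
    """Derive the Data Access API URL from the URL Dataverse used to open the previewer."""
    params = parse_qs(urlparse(launch_url).query)
    file_id = params["fileid"][0]
    site_url = params["siteUrl"][0].rstrip("/")
    # The previewer fetches the file content itself and renders it in the browser
    return f"{site_url}/api/access/datafile/{file_id}"


if __name__ == "__main__":
    launch = ("https://previewers.example.org/SpreadsheetPreview.html"
              "?fileid=42&siteUrl=https://demo.dataverse.org")
    print(data_access_url(launch))
```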
Partners
CLARIN/UiT
DARIAH/PSNC
DARIAH/SUB
E-RIHS/CNR
CESSDA/DANS-KNAW (lead)
Join our community
https://www.sshopencloud.eu
info@sshopencloud.eu
@SSHOpenCloud
in/sshopencloud