This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
SSHOC
Setting up Dataverse repository for research data
Slava Tykhonov, Senior Information Scientist
DANS-KNAW, The Royal Netherlands Academy of Arts and Sciences
LIBSENSE webinar
10 March 2021
About me: DANS-KNAW projects (2016-2021)
● CLARIAH+ (ongoing)
● EOSC Synergy (ongoing)
● SSHOC Dataverse (ongoing)
● CESSDA DataverseEU 2018
● Time Machine Europe Supervisor at DANS-KNAW
● PARTHENOS Horizon 2020
● CESSDA PID (Persistent Identifiers) Horizon 2020
● CLARIAH
● RDA (Research Data Alliance) PITTS Horizon 2020
● CESSDA SaW H2020-EU.1.4.1.1 Horizon 2020
Source: LinkedIn
About DANS-KNAW
DANS is the Dutch national centre of expertise and repository for research data.
We help researchers make their data available for reuse. This allows researchers
to use the data for new research and makes published research verifiable and
reproducible. With more than 150,000 datasets and a staff of 60, DANS is one of
the leading repositories in Europe.
Three pillars of the DANS 2021-2025 programme “Focus on FAIR”:
• Centre of expertise for FAIR research data
• Versatile data repository: DANS Data Stations
• Active collaborator
What is Dataverse?
● Open-source project developed by the Institute for Quantitative Social Science (IQSS) at Harvard University and published on GitHub
● Mature product with a long history (since 2006)
● Dynamic and experienced development team working in an Agile environment (community calls every two weeks)
● Clear vision and understanding of research communities' requirements, with a public roadmap
● A strong community behind Dataverse helps improve its core functionality and develop it further
● Dataverse has been selected as data repository infrastructure in countries across all continents
● Well-developed architecture with rich API endpoints for building application layers around Dataverse
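As a small illustration of building on the API, here is a minimal Python sketch that builds a request URL for the Dataverse Search API (`/api/search`) and extracts dataset titles and persistent identifiers from its JSON response. The installation URL and the sample response are made up so the sketch runs offline; the response shape (`data.items` with `name`, `type` and `global_id`) follows the Search API as documented in the Dataverse guides, but check it against the version you run.

```python
import json
from urllib.parse import urlencode

# Hypothetical installation URL; replace with your own Dataverse instance.
BASE_URL = "https://demo.dataverse.example"


def search_url(base_url: str, query: str, kind: str = "dataset", per_page: int = 10) -> str:
    """Build a Search API URL restricted to one object type (dataverse, dataset or file)."""
    params = {"q": query, "type": kind, "per_page": per_page}
    return f"{base_url}/api/search?{urlencode(params)}"


def dataset_hits(response_text: str) -> list[tuple[str, str]]:
    """Return (title, persistent id) pairs for the dataset hits in a Search API response."""
    payload = json.loads(response_text)
    items = payload.get("data", {}).get("items", [])
    return [(item["name"], item["global_id"]) for item in items if item.get("type") == "dataset"]


# A trimmed, made-up response in the documented shape.
SAMPLE = json.dumps({
    "status": "OK",
    "data": {
        "total_count": 2,
        "items": [
            {"name": "Survey on research data", "type": "dataset", "global_id": "doi:10.5072/FK2/ABC123"},
            {"name": "codebook.pdf", "type": "file", "file_id": "42"},
        ],
    },
})

if __name__ == "__main__":
    print(search_url(BASE_URL, "research data"))
    print(dataset_hits(SAMPLE))
```

Against a live installation you would fetch the URL with an HTTP client (for example `urllib.request.urlopen`) and pass the response body to `dataset_hits`.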
Federated Dataverse data repositories worldwide
Source: Mercè Crosas, “Harvard Data Commons”
DANS Dataverse 3.x migration (2016)
Basic DataverseNL services:
• Federated login for institutions in the Netherlands
• Persistent Identifier services (DOI and Handle)
• Integration with archival systems
Applications:
• Visualisations on modern and historical world maps
• Data API and Geo API services for data projects
• Panel dataset constructor
• Time series plots
• Treemaps
• Pie charts and other chart visualisations
• Descriptive statistics tools
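Dataverse writes both kinds of persistent identifier in the same `protocol:authority/identifier` form (for example `doi:10.5072/FK2/ABC123`). The Python sketch below splits such an identifier and builds its public resolver URL and a Dataverse native API URL for the dataset. The installation URL and the example identifiers are hypothetical; `doi.org` and `hdl.handle.net` are the standard public resolvers, and `/api/datasets/:persistentId/?persistentId=` is the native API pattern for looking up a dataset by its PID.

```python
from urllib.parse import quote

# Public resolvers for the two PID schemes mentioned above.
RESOLVERS = {"doi": "https://doi.org/", "hdl": "https://hdl.handle.net/"}


def split_pid(pid: str) -> tuple[str, str, str]:
    """Split 'protocol:authority/identifier' into its three parts."""
    protocol, sep, rest = pid.partition(":")
    if not sep or protocol not in RESOLVERS or "/" not in rest:
        raise ValueError(f"unsupported persistent identifier: {pid!r}")
    authority, _, identifier = rest.partition("/")
    return protocol, authority, identifier


def resolver_url(pid: str) -> str:
    """URL at which the global resolver redirects to the dataset landing page."""
    protocol, authority, identifier = split_pid(pid)
    return f"{RESOLVERS[protocol]}{authority}/{identifier}"


def native_api_url(base_url: str, pid: str) -> str:
    """Native API URL for fetching a dataset's metadata by persistent identifier."""
    split_pid(pid)  # validate before building the URL
    return f"{base_url}/api/datasets/:persistentId/?persistentId={quote(pid, safe=':/')}"
```

The same PID therefore works both for human visitors (via the resolver) and for applications talking to the repository directly.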
DataverseNL collaborative data network
Source: https://dataverse.nl
DataverseNL partners
Cooperative Model DataverseNL
• DANS provides and manages the system and storage, and organizes the meetings (back office)
• Research institutions run their own RDM support for their researchers/end users (front office)
• Every partner institution is responsible for its own data
• Shared costs (service membership + storage)
• An Advisory Board of partner representatives decides on general policy
• An administrators' committee discusses technical and functional issues
• Cooperation agreement, Service Level Agreement, processor agreement (GDPR), General Terms of Use (end users)
Source: Marion Wittenberg, DataverseNL
DANS Data Stations - Future Data Services
Dataverse is API-based and a key framework for Open Innovation!
Major challenges in providing services for researchers
● Maintenance: who will be in charge after the project has finished?
● Infrastructure: how to install and run tools for researchers?
● Interoperability: how to enable data exchange between different systems and services?
Further concerns: software updates and bug fixing, licences, technical staff training, legal aspects, and so on.
Dataverse Installation Manual
https://guides.dataverse.org/en/latest/installation/index.html
Dataverse Docker module (CESSDA Dataverse, 2018)
Source: https://github.com/IQSS/dataverse-docker
Dataverse Kubernetes
Project maintained by Oliver Bertuch (FZ Jülich) and available on the Global Dataverse Community Consortium (GDCC) GitHub
Supports the Google Cloud, Amazon AWS and Microsoft Azure platforms
Open source; community pull requests are welcome
http://github.com/IQSS/dataverse-kubernetes
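For readers new to Kubernetes: what a project like this ships is essentially a set of manifests. The Python sketch below generates a minimal `apps/v1` Deployment for a single Dataverse container as JSON, which `kubectl apply -f` accepts as well as YAML. The image name, labels and replica count are placeholders, not the values used by dataverse-kubernetes, and a real installation also needs PostgreSQL, Solr, persistent volumes and configuration, all omitted here.

```python
import json


def dataverse_deployment(image: str, replicas: int = 1, name: str = "dataverse") -> dict:
    """Return a minimal Kubernetes Deployment manifest (apps/v1) for one container."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels, or the API server rejects it.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": 8080}],  # default HTTP port of Payara
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Placeholder image reference; substitute the image you actually build or pull.
    print(json.dumps(dataverse_deployment("registry.example.org/dataverse:latest"), indent=2))
```

Generating manifests from code is one option; the community project itself maintains ready-made manifests, which is usually the better starting point.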
SSHOC project task 5.2: Hosting and sharing data repositories
• Makes use of the Dataverse software
• 4 ERICs: DARIAH, CLARIN, EHRI and CESSDA
• Building mature infrastructure based on the requirements of the communities involved
• Developing external applications integrated with Dataverse (Dataverse Store)
• Investigating sustainable governance models
• Training service providers and institutes to use Dataverse as a service
Data Commons is essential for integrations
Source: Merce Crosas, “Harvard Data Commons”
FAIR Dataverse
Source: Mercè Crosas, “FAIR principles and beyond: implementation in Dataverse”
Our goals for increasing Dataverse interoperability
Provide a custom FAIR metadata schema for European research communities:
● CESSDA metadata (Consortium of European Social Science Data Archives)
● Component MetaData Infrastructure (CMDI) metadata from the CLARIN linguistics community
Connect metadata to ontologies and controlled vocabularies (CVs):
● link metadata fields to common ontologies (Dublin Core, DCAT)
● define semantic relationships between (new) metadata fields (SKOS)
● select available external controlled vocabularies for specific fields
● provide multilingual access to controlled vocabularies
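To make "linking metadata fields to common ontologies" concrete, the Python sketch below maps a few Dataverse citation-block field names to Dublin Core and DCAT property URIs and re-keys a flat metadata record with those URIs. The mapping table is an illustrative assumption, not the mapping used by SSHOC or Dataverse; only the two namespace URIs are the official ones.

```python
DCTERMS = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"

# Illustrative mapping (assumption): Dataverse citation field name -> ontology property URI.
FIELD_TO_PROPERTY = {
    "title": DCTERMS + "title",
    "author": DCTERMS + "creator",
    "dsDescription": DCTERMS + "description",
    "subject": DCTERMS + "subject",
    "keyword": DCAT + "keyword",
}


def to_linked_keys(record: dict[str, object]) -> tuple[dict[str, object], list[str]]:
    """Re-key a metadata record by property URI; also return the fields that have no mapping."""
    linked: dict[str, object] = {}
    unmapped: list[str] = []
    for field_name, value in record.items():
        prop = FIELD_TO_PROPERTY.get(field_name)
        if prop is None:
            unmapped.append(field_name)
        else:
            linked[prop] = value
    return linked, unmapped
```

The unmapped fields are exactly where new semantic relationships (for example SKOS mappings) would still need to be defined.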
SKOSMOS framework to discover ontologies
● SKOSMOS is developed in Europe by the National Library of Finland (NLF)
● active global user community
● search and browsing interface for SKOS concepts
● support for multilingual vocabularies
● used for different purposes: publishing vocabularies, building discovery systems, vocabulary visualization
External CV support: a metadata field can be linked to many ontologies
Switching the language in Dataverse also switches the language of the vocabulary terms!
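A simplified model of what a language switch does: each SKOS concept carries `skos:prefLabel` values tagged by language, and the interface shows the label in the selected language, falling back to a default language when no translation exists. The concept URI and labels below are made up, and the fallback rule is an assumption for illustration, not Dataverse's exact behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str
    pref_labels: dict[str, str] = field(default_factory=dict)  # language tag -> skos:prefLabel


def display_label(concept: Concept, lang: str, fallback: str = "en") -> str:
    """Return the prefLabel in the requested language, else in the fallback language, else the URI."""
    if lang in concept.pref_labels:
        return concept.pref_labels[lang]
    if fallback in concept.pref_labels:
        return concept.pref_labels[fallback]
    return concept.uri


SURVEY = Concept(
    uri="https://vocab.example.org/mode/survey",  # hypothetical concept URI
    pref_labels={"en": "Survey", "nl": "Enquête", "de": "Umfrage"},
)
```

Because the stored value is the concept URI rather than a label, the same metadata record can be displayed in any language the vocabulary supports.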
Dataverse App Store
Let’s build different services out of tools!
Data preview: DDI Explorer, spreadsheets/CSV, PDF, text files, HTML, images, video, audio, JSON, GeoJSON/Shapefiles/maps, XML
Interoperability: external controlled vocabularies (CESSDA CV Manager)
Data processing: NESSTAR DDI migration tool
Linked Data: RDF compliance (FAIR Data Point)
Federated login as a service (OAuth/Shibboleth in the same installation)
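Tools like these are typically plugged into Dataverse as external tools, each described by a JSON manifest and registered through the admin API (`/api/admin/externalTools`). The Python sketch below assembles a manifest for a hypothetical CSV previewer and checks that the expected fields are present. The field names follow the external tools manifest format described in the Dataverse guides as we understand it; the tool URL is hypothetical and the required-field list is our assumption, so verify both against the documentation for your Dataverse version.

```python
import json

# Keys we treat as required (assumption; check the Dataverse guides for your version).
REQUIRED = ("displayName", "description", "scope", "types", "toolUrl", "toolParameters")


def preview_manifest(tool_url: str, content_type: str) -> dict:
    """Build an external-tool manifest for a file previewer of one content type."""
    return {
        "displayName": "Spreadsheet Previewer",
        "description": "Preview CSV files in the browser.",
        "scope": "file",
        "types": ["preview"],
        "toolUrl": tool_url,
        "contentType": content_type,
        # Dataverse substitutes the reserved words in braces when it launches the tool.
        "toolParameters": {
            "queryParameters": [
                {"fileid": "{fileId}"},
                {"siteUrl": "{siteUrl}"},
            ]
        },
    }


def missing_fields(manifest: dict) -> list[str]:
    """List required keys that are absent; file-scoped tools also need a contentType."""
    missing = [key for key in REQUIRED if key not in manifest]
    if manifest.get("scope") == "file" and "contentType" not in manifest:
        missing.append("contentType")
    return missing


if __name__ == "__main__":
    manifest = preview_manifest("https://tools.example.org/spreadsheet.html", "text/csv")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and POSTed to the admin endpoint (for example with `curl --upload-file`), which is how an administrator would add one tool from the app store to an installation.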
Dataverse Spreadsheet Previewer
Dataverse and CLARIN tools integration
Make Data Count metrics
Make Data Count is part of a broader Research Data Alliance (RDA) Data Usage Metrics
Working Group which helped to produce a specification called the COUNTER Code of
Practice for Research Data.
The following metrics can be downloaded directly from the DataCite hub for datasets hosted
by Dataverse installations:
● Total Views for a Dataset
● Unique Views for a Dataset
● Total Downloads for a Dataset
● Unique Downloads for a Dataset
● Citations for a Dataset (via Crossref)
The Dataverse Metrics API is a powerful data source for BI tools used to monitor the data landscape.
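The difference between total and unique counts is easy to miss. Following the COUNTER idea that a "unique" metric counts a dataset at most once per user session, this Python sketch derives both numbers from a list of view events. It is a simplification for illustration only: real COUNTER processing also filters double clicks and robots, the event format is hypothetical, and the output keys are merely chosen to resemble Make Data Count metric names.

```python
from collections import defaultdict
from typing import NamedTuple


class ViewEvent(NamedTuple):
    dataset_pid: str
    session_id: str  # hypothetical session key (e.g. derived from user, IP and time window)


def view_metrics(events: list[ViewEvent]) -> dict[str, dict[str, int]]:
    """Return {pid: {"viewsTotal": n, "viewsUnique": m}} for every dataset in the events."""
    totals: dict[str, int] = defaultdict(int)
    sessions: dict[str, set[str]] = defaultdict(set)
    for event in events:
        totals[event.dataset_pid] += 1                   # every view counts towards the total
        sessions[event.dataset_pid].add(event.session_id)  # repeat views in a session collapse
    return {
        pid: {"viewsTotal": totals[pid], "viewsUnique": len(sessions[pid])}
        for pid in totals
    }
```

The same total/unique split applies to downloads.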
Dataverse Metrics from 30+ repositories
Source: Metrics
Multilingual support
Dataverse SSHOC will run Weblate as a service for translating the user interface, the metadata schemas and Solr.
We have developed an experimental but adjustable pipeline for multilingual support that downloads and synchronizes all translations available in the Dataverse Consortium GitHub and gives translators easy access to keep all properties files up to date.
Dataverse localization with Weblate
● service to connect files to Weblate in order to
translate them in a structured way
● several options for project visibility: accept
translations by the crowd, or only give access
to a select group of translators.
● Weblate indicates untranslated strings,
strings with failing checks, and strings that
need approval.
● when new strings are added with an upgrade
of Dataverse, Weblate can indicate which
strings are new and untranslated.
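Dataverse's interface strings live in Java `.properties` bundles (an English `Bundle.properties` plus language-specific counterparts, for example `Bundle_fr.properties`). The Python sketch below compares a source bundle with a translation and reports the keys that are missing or empty, the same "new and untranslated" signal Weblate gives after an upgrade. The parser is deliberately minimal (`key=value` or `key: value` lines and `#`/`!` comments, but no line continuations or escapes), and the bundle contents in the usage are made up.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Minimal .properties parser: key=value or key: value, '#'/'!' comments, no continuations."""
    entries: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        # Split on whichever separator ('=' or ':') appears first.
        positions = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not positions:
            entries[line] = ""
            continue
        cut = min(positions)
        entries[line[:cut].strip()] = line[cut + 1:].strip()
    return entries


def untranslated_keys(source_text: str, translation_text: str) -> list[str]:
    """Keys present in the source bundle but missing or empty in the translation, sorted."""
    source = parse_properties(source_text)
    target = parse_properties(translation_text)
    return sorted(key for key in source if not target.get(key))


if __name__ == "__main__":
    english = "# UI strings\nsearch=Search\nlogin=Log In\nnewFeature=New feature\n"
    french = "search=Rechercher\nlogin=\n"
    print(untranslated_keys(english, french))
```

For production use, a full `.properties` parser (or Weblate's own format support) should replace the minimal one here.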
GUI translation with Weblate as a service
Source: SSHOC Weblate
Services in the European Open Science Cloud (EOSC)
● EOSC requires at least maturity level 8
● software must be of the highest quality to be accepted as a service
● clear and transparent evaluation of services is essential
● evidence of technical maturity is the key to success
● a limited warranty makes it possible to stop services that are out of warranty
Applications maturity level
Every software package should follow the same CESSDA Maturity Model to
be accepted as a service.
Must have: Kubernetes (k8s) infrastructure with upstream Docker images, a warranty statement, documentation, unit tests, Selenium tests and a Jenkins pipeline.
A running demonstration service makes it possible to set up a connection to your own Dataverse installation.
CI/CD pipeline with SQAaaS (S)
[Figure: CI/CD pipeline diagram. git push → webhook → Jenkins pipeline (Jenkinsfile): git clone → Run SQA (S) → create Docker image → push to GCP container registry → Kubernetes deployment. Steps marked (S) are performed by SQAaaS.]
1. Developer pushes code to GitHub
2. Jenkins receives a notification (build trigger)
3. Jenkins clones the workspace
4. (S) Runs SQA tests and a FAIRness check
5. (S) Issues a digital badge according to the results
6. (S) SQAaaS API triggers the appropriate workflow
7. Creates a Docker image if the checks succeed
8. Pushes the new Docker image to the container registry
9. Updates the Kubernetes deployment
Source: EOSC Synergy project
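The numbered steps of this pipeline can be modelled as an ordered sequence of stages that stops at the first failure, which is what makes building and deploying the image conditional on the SQA checks. The Python sketch below is a toy model only: the stage functions are hypothetical stand-ins for Jenkins stages and SQAaaS calls, and no real Jenkins, SQAaaS or registry API is used.

```python
from typing import Callable

Stage = tuple[str, Callable[[dict], bool]]


def run_pipeline(stages: list[Stage], context: dict) -> list[str]:
    """Run stages in order, stopping at the first one that returns False; return the stages that passed."""
    completed = []
    for name, step in stages:
        if not step(context):
            context["failed_at"] = name
            break
        completed.append(name)
    return completed


# Hypothetical stand-ins for the real steps; each returns True on success.
def clone(ctx: dict) -> bool:
    ctx["workspace"] = f"checkout of {ctx['commit']}"
    return True


def sqa_checks(ctx: dict) -> bool:
    ctx["badge"] = "passed" if ctx["tests_pass"] else "failed"  # badge reflects the results
    return ctx["tests_pass"]


def build_image(ctx: dict) -> bool:
    ctx["image"] = f"registry.example.org/app:{ctx['commit']}"
    return True


def push_image(ctx: dict) -> bool:
    return "image" in ctx


def update_deployment(ctx: dict) -> bool:
    ctx["deployed"] = ctx["image"]
    return True


STAGES: list[Stage] = [
    ("clone", clone),
    ("sqa", sqa_checks),
    ("build", build_image),
    ("push", push_image),
    ("deploy", update_deployment),
]
```

A commit whose tests fail still receives a badge, but never reaches the registry or the Kubernetes deployment.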
Who will benefit from the SSHOC Dataverse project?
• (SSH) institutes and researchers will be offered a Dataverse installation in the cloud
• (SSH) institutes will be offered a Dataverse “archive in a box” solution for their own purposes
• Many of the features developed in SSHOC will also benefit other Dataverse installations and communities
All developments will be available to Dataverse community members!
Questions?
Slava Tykhonov (DANS-KNAW)
Senior Information Scientist
vyacheslav.tykhonov@dans.knaw.nl
Co-Chair:
Dataverse Working Group (WG) on Controlled Vocabularies and Ontologies
Dataverse WG on Registries
dataverse.org

More Related Content

PPTX
External controlled vocabularies support in Dataverse
 
PPTX
Technical integration of data repositories status and challenges
 
PPTX
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
 
PPTX
Controlled vocabularies and ontologies in Dataverse data repository
 
PPTX
CLARIN CMDI support in Dataverse
 
PPTX
Building COVID-19 Knowledge Graph at CoronaWhy
 
PPTX
5 years of Dataverse evolution
 
PPTX
The world of Docker and Kubernetes
 
External controlled vocabularies support in Dataverse
 
Technical integration of data repositories status and challenges
 
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
 
Controlled vocabularies and ontologies in Dataverse data repository
 
CLARIN CMDI support in Dataverse
 
Building COVID-19 Knowledge Graph at CoronaWhy
 
5 years of Dataverse evolution
 
The world of Docker and Kubernetes
 

What's hot (20)

PPTX
Ontologies, controlled vocabularies and Dataverse
 
PPTX
Fighting COVID-19 with Artificial Intelligence
 
PPTX
External CV support in Dataverse 5.7
 
PPTX
Metaverse for Dataverse
 
PPTX
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
 
PPTX
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
 
PPTX
Running Dataverse repository in the European Open Science Cloud (EOSC)
 
PPTX
Building COVID-19 Museum as Open Science Project
 
PPTX
Building an electronic repository and archives on Dataverse in the European O...
 
PPTX
Integration of WORSICA’s thematic service in EOSC, Service QA and Dataverse
 
PPTX
Dataverse SSHOC enrichment of DDI support at EDDI'19 2
 
PPTX
Flexible metadata schemes for research data repositories - Clarin Conference...
PPTX
SSHOC Dataverse in the European Open Science Cloud
 
PPTX
Dataverse in the European Open Science Cloud
 
PDF
Dataverse opportunities
 
PPT
PPTX
LOD2 Webinar Series: 3rd relase of the Stack
PPT
LOD2 Webinar Series: D2R and Sparqlify
PDF
LOD2 Webinar Series Classification and Quality Analysis with DL Learner and ORE
Ontologies, controlled vocabularies and Dataverse
 
Fighting COVID-19 with Artificial Intelligence
 
External CV support in Dataverse 5.7
 
Metaverse for Dataverse
 
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
 
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
 
Running Dataverse repository in the European Open Science Cloud (EOSC)
 
Building COVID-19 Museum as Open Science Project
 
Building an electronic repository and archives on Dataverse in the European O...
 
Integration of WORSICA’s thematic service in EOSC, Service QA and Dataverse
 
Dataverse SSHOC enrichment of DDI support at EDDI'19 2
 
Flexible metadata schemes for research data repositories - Clarin Conference...
SSHOC Dataverse in the European Open Science Cloud
 
Dataverse in the European Open Science Cloud
 
Dataverse opportunities
 
LOD2 Webinar Series: 3rd relase of the Stack
LOD2 Webinar Series: D2R and Sparqlify
LOD2 Webinar Series Classification and Quality Analysis with DL Learner and ORE
Ad

Similar to Setting up Dataverse repository for research data (20)

PPTX
Dataverse repository for research data in the COVID-19 Museum
 
PPTX
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
PDF
DataCite and its Members: Connecting Research and Identifying Knowledge
PPTX
Decentralised identifiers and knowledge graphs
 
PPTX
FAIR Dataverse
 
PDF
FAIR Data Management and FAIR Data Sharing
PDF
Dataverse hpdm symposium
PDF
Data Analysis in Dataverse & Visualization of Datasets on Historical Maps by ...
PDF
Data analysis in dataverse & visualization of datasets on historical maps
 
PPTX
DataverseEU: Building Multilingual infrastructure for the Social Sciences in...
 
PDF
Dataverse, Cloud Dataverse, and DataTags
PPTX
Research methods group accelarating impact by sharing data
PDF
Managing, Sharing and Curating Your Research Data in a Digital Environment
PPTX
DataverseNL as structured data hub
 
PDF
Dataverse as a FAIR Data Repository (Mercè Crosas)
PPTX
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015
PPTX
Global Research Data Initiatives
PDF
Dealing with Data Diversity in a Smart City Data Hub
PDF
FAIR data_ Superior data visibility and reuse without warehousing.pdf
PPTX
Research data management: DMP & repository
Dataverse repository for research data in the COVID-19 Museum
 
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
DataCite and its Members: Connecting Research and Identifying Knowledge
Decentralised identifiers and knowledge graphs
 
FAIR Dataverse
 
FAIR Data Management and FAIR Data Sharing
Dataverse hpdm symposium
Data Analysis in Dataverse & Visualization of Datasets on Historical Maps by ...
Data analysis in dataverse & visualization of datasets on historical maps
 
DataverseEU: Building Multilingual infrastructure for the Social Sciences in...
 
Dataverse, Cloud Dataverse, and DataTags
Research methods group accelarating impact by sharing data
Managing, Sharing and Curating Your Research Data in a Digital Environment
DataverseNL as structured data hub
 
Dataverse as a FAIR Data Repository (Mercè Crosas)
Data FAIRport Prototype & Demo - Presentation to Elsevier, Jul 10, 2015
Global Research Data Initiatives
Dealing with Data Diversity in a Smart City Data Hub
FAIR data_ Superior data visibility and reuse without warehousing.pdf
Research data management: DMP & repository
Ad

More from vty (7)

PPTX
Decentralisation and knowledge graphs
 
PPTX
Decentralised identifiers for CLARIAH infrastructure
 
PPTX
CLARIN CMDI use case and flexible metadata schemes
 
PPTX
Flexible metadata schemes for research data repositories - CLARIN Conference'21
 
PPTX
Data standardization process for social sciences and humanities
 
PPTX
Development in Dataverse SSHOC project
 
PPTX
DataverseEU as multilingual repository
 
Decentralisation and knowledge graphs
 
Decentralised identifiers for CLARIAH infrastructure
 
CLARIN CMDI use case and flexible metadata schemes
 
Flexible metadata schemes for research data repositories - CLARIN Conference'21
 
Data standardization process for social sciences and humanities
 
Development in Dataverse SSHOC project
 
DataverseEU as multilingual repository
 

Recently uploaded (20)

PDF
Biophysics 2.pdffffffffffffffffffffffffff
PDF
AlphaEarth Foundations and the Satellite Embedding dataset
PDF
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
PPTX
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
PPTX
2. Earth - The Living Planet earth and life
PDF
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
PPTX
Classification Systems_TAXONOMY_SCIENCE8.pptx
PDF
bbec55_b34400a7914c42429908233dbd381773.pdf
PPTX
cpcsea ppt.pptxssssssssssssssjjdjdndndddd
PDF
The scientific heritage No 166 (166) (2025)
PPTX
ECG_Course_Presentation د.محمد صقران ppt
PDF
CAPERS-LRD-z9:AGas-enshroudedLittleRedDotHostingaBroad-lineActive GalacticNuc...
PPTX
famous lake in india and its disturibution and importance
PDF
Unveiling a 36 billion solar mass black hole at the centre of the Cosmic Hors...
PPT
protein biochemistry.ppt for university classes
PPTX
GEN. BIO 1 - CELL TYPES & CELL MODIFICATIONS
PPTX
Derivatives of integument scales, beaks, horns,.pptx
PPTX
Introduction to Cardiovascular system_structure and functions-1
PPTX
ognitive-behavioral therapy, mindfulness-based approaches, coping skills trai...
PDF
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
Biophysics 2.pdffffffffffffffffffffffffff
AlphaEarth Foundations and the Satellite Embedding dataset
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
2. Earth - The Living Planet earth and life
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
Classification Systems_TAXONOMY_SCIENCE8.pptx
bbec55_b34400a7914c42429908233dbd381773.pdf
cpcsea ppt.pptxssssssssssssssjjdjdndndddd
The scientific heritage No 166 (166) (2025)
ECG_Course_Presentation د.محمد صقران ppt
CAPERS-LRD-z9:AGas-enshroudedLittleRedDotHostingaBroad-lineActive GalacticNuc...
famous lake in india and its disturibution and importance
Unveiling a 36 billion solar mass black hole at the centre of the Cosmic Hors...
protein biochemistry.ppt for university classes
GEN. BIO 1 - CELL TYPES & CELL MODIFICATIONS
Derivatives of integument scales, beaks, horns,.pptx
Introduction to Cardiovascular system_structure and functions-1
ognitive-behavioral therapy, mindfulness-based approaches, coping skills trai...
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS

Setting up Dataverse repository for research data

  • 1. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782 SSHOC Setting up Dataverse repository for research data Slava Tykhonov, Senior Information Scientist DANS-KNAW, The Royal Netherlands Academy of Arts and Sciences LIBSENSE webinar 10 March 2021
  • 2. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782 About me: DANS-KNAW projects (2016-2021) ● CLARIAH+ (ongoing) ● EOSC Synergy (ongoing) ● SSHOC Dataverse (ongoing) ● CESSDA DataverseEU 2018 ● Time Machine Europe Supervisor at DANS-KNAW ● PARTHENOS Horizon 2020 ● CESSDA PID (Personal Identifiers) Horizon 2020 ● CLARIAH ● RDA (Research Data Alliance) PITTS Horizon 2020 ● CESSDA SaW H2020-EU.1.4.1.1 Horizon 2020 2 Source: LinkedIn
  • 3. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782 About DANS-KNAW DANS is the Dutch national centre of expertise and repository for research data. We help researchers make their data available for reuse. This allows researchers to use the data for new research and makes published research verifiable and reproducible. With more than 150,000 datasets and a staff of 60, DANS is one of the leading repositories in Europe. Three pillars of DANS 2021-2025 programme: ‘Focus on FAIR” •Centre of expertise for FAIR research data •Versatile data repository: DANS data stations •Active collaborator
  • 4. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782 What is Dataverse? ● Open source project developed by IQSS of Harvard University and published on github ● Great product with very long history (from 2006) ● Very dynamic and experienced development team working in the Agile environment (community call scheduled once in two weeks) ● Clear vision and understanding of research communities requirements, public roadmap ● Strong community behind of Dataverse is helping to improve the basic functionality and develop it further ● Dataverse has been selected as a data repository infrastructure by countries from all continents ● Well developed architecture with rich API endpoints to build application layers around Dataverse
  • 5. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782 Federated Dataverse data repositories worldwide Source: Merce Crosas, Harvard Data Commons
  • 6. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782 DANS Dataverse 3.x migration (2016) Basic DataverseNL services: • Federated login for Netherlands institutions • Persistent Identifier Services (DOI and handle) • Integration with archival systems Applications: • Modern and historical world maps visualisations • Data API and Geo API services for projects with data • Panel datasets constructor • Time series plot • Treemaps • Pie and chart visualizations • Descriptive statistics tools
  • 7. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782 DataverseNL collaborative data network Source: https://dataverse.nl
  • 8. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782 DataverseNL partners
  • 9. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782 Cooperative Model DataverseNL DANS provides and manages the system & storage, organizes the meetings (back-office) •Research institutions run their own RDM-support for their researchers /end users (front-office) •Every partner institute is responsible for their own data •Shared costs (service membership + storage) •Advisory Board consisting of partner representatives decides on the general policy •Administrators committee to discuss technical and functional issues •Cooperation agreement, Service Level Agreement, Processor agreement (GDPR), General Terms of use (end users) Source: Marion Wittenberg, DataverseNL
  • 10. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782 DANS Data Stations - Future Data Services Dataverse is API based and a key framework for Open Innovation!
  • 11. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782 Major challenges to provide services for researchers ● Maintenance concerns - who will be in charge after project is finished? ● Infrastructure problems - how to install and run tools for researchers? ● Various Interoperability issues - how to leverage data exchange between different systems and services Software updates and bug fixing, licences, technical staff training, legal aspects and so on...
  • 12. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782 Dataverse Installation Manual https://guides.dataverse.org/en/latest/installation/index.html
  • 13. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782 Dataverse Docker module (CESSDA Dataverse, 2018) Source: https://github.com/IQSS/dataverse-docker
  • 14. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782 Dataverse Kubernetes Project maintained by Oliver Bertuch (FZ Julich) and available in Global Dataverse Community Consortium github (GDCC) Google Cloud, Amazon AWS, Microsoft Azure platforms supported Open Source, community pull requests are welcome http://github.com/IQSS/dataverse-kubernetes
  • 15. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782
  • 16. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782 SSHOC project task 5.2 Hosting and sharing data repositories • Makes use of Dataverse software • 4 ERICs: DARIAH, CLARIN, EHRIS and CESSDA • Building mature infrastructure based on requirements of involved communities • Developing external applications integrated with Dataverse (Dataverse Store) • Investigating sustainable governance models • Training Service Providers and institutes how to use Dataverse as a service
  • 17. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782 Data Commons is essential for integrations Source: Merce Crosas, “Harvard Data Commons”
  • 18. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782 FAIR Dataverse Source: Mercè Crosas, “FAIR principles and beyond: implementation in Dataverse”
  • 19. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782 Our goals to increase Dataverse interoperability Provide a custom FAIR metadata schema for European research communities: ● CESSDA metadata (Consortium of European Social Science Data Archives) ● Component MetaData Infrastructure (CMDI) metadata from CLARIN linguistics community Connect metadata to ontologies and CVs: ● link metadata fields to common ontologies (Dublin Core, DCAT) ● define semantic relationships between (new) metadata fields (SKOS) ● select available external controlled vocabularies for the specific fields ● provide multilingual access to controlled vocabularies
  • 20. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782 SKOSMOS framework to discover ontologies 20 ● SKOSMOS is developed in Europe by the National Library of Finland (NLF) ● active global user community ● search and browsing interface for SKOS concept ● multilingual vocabularies support ● used for different use cases (publish vocabularies, build discovery systems, vocabulary visualization)
  • 21. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782 External CV support: metadata field could be linked to many ontologies Language switch in Dataverse will change the language terms!
  • 22. This project is funded from the EU Horizon 2020 Research and Innovation Programme (2014-2020) under Grant Agreement No. 823782 Dataverse App Store Let’s build different services out of tools! Data preview: DDI Explorer, Spreadsheet/CSV, PDF, Text files, HTML, Images, video render, audio, JSON, GeoJSON/Shapefiles/Map, XML Interoperability: external controlled vocabularies (CESSDA CV Manager) Data processing: NESSTAR DDI migration tool Linked Data: RDF compliance (FAIR Data Point) Federated login as a service (OAuth/Shibboleth in the same installation)
  • 23. Dataverse Spreadsheet Previewer
  • 24. Dataverse and CLARIN tools integration
  • 25. Make Data Count metrics. Make Data Count is part of a broader Research Data Alliance (RDA) Data Usage Metrics Working Group, which helped to produce a specification called the COUNTER Code of Practice for Research Data. The following metrics can be downloaded directly from the DataCite hub for datasets hosted by Dataverse installations: ● Total Views for a Dataset ● Unique Views for a Dataset ● Total Downloads for a Dataset ● Unique Downloads for a Dataset ● Citations for a Dataset (via Crossref). The Dataverse Metrics API is a powerful data source for the BI tools used to monitor the data landscape.
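Per-dataset metrics can also be read from a Dataverse installation's own API. The sketch assumes the Make Data Count endpoints described in the Dataverse Native API guide, of the form `/api/datasets/:persistentId/makeDataCount/{metric}?persistentId=...`, with the metric names `viewsTotal`, `viewsUnique`, `downloadsTotal`, `downloadsUnique` and `citations`, plus an optional month suffix (assumed here to be `YYYY-MM`). The endpoints only return data when Make Data Count is enabled on the installation; the server and DOI used below are placeholders.

```python
from urllib.parse import urlencode

# Slide labels mapped to the metric names assumed by the Dataverse Native API.
MDC_METRICS = {
    "Total Views": "viewsTotal",
    "Unique Views": "viewsUnique",
    "Total Downloads": "downloadsTotal",
    "Unique Downloads": "downloadsUnique",
    "Citations": "citations",
}


def mdc_url(server, pid, metric, month=None):
    """URL for one Make Data Count metric of a dataset, optionally for one month."""
    if metric not in MDC_METRICS.values():
        raise ValueError(f"unknown metric: {metric}")
    path = f"/api/datasets/:persistentId/makeDataCount/{metric}"
    if month:
        path += f"/{month}"
    return server.rstrip("/") + path + "?" + urlencode({"persistentId": pid})


def urls_for_dataset(server, pid):
    """All five metric URLs for one dataset, keyed by the slide's labels."""
    return {label: mdc_url(server, pid, name) for label, name in MDC_METRICS.items()}
```

Feeding these URLs into a scheduled job is one simple way to populate the BI dashboards mentioned on the slide.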
  • 26. Dataverse metrics from 30+ repositories. Source: Metrics
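A cross-repository overview like this one can be assembled by polling each installation's Metrics API and summing the results. The sketch assumes endpoints such as `/api/info/metrics/datasets` (with an optional `/toMonth/YYYY-MM` suffix) that return `{"status": "OK", "data": {"count": N}}`; the server names are hypothetical, and fetching is left out so the aggregation logic stays testable.

```python
def metrics_url(server, metric="datasets", to_month=None):
    """URL of a Dataverse Metrics API endpoint, e.g. /api/info/metrics/datasets."""
    path = f"/api/info/metrics/{metric}"
    if to_month:
        path += f"/toMonth/{to_month}"  # cumulative count up to the given month
    return server.rstrip("/") + path


def aggregate_counts(responses):
    """Sum the 'count' values of parsed responses keyed by server.

    Returns (total, failed_servers); a server counts as failed when its
    response is not OK or lacks a numeric count.
    """
    total, failed = 0, []
    for server, body in responses.items():
        data = body.get("data") if body.get("status") == "OK" else None
        if isinstance(data, dict) and "count" in data:
            total += data["count"]
        else:
            failed.append(server)
    return total, failed
```

Reporting failed servers separately, rather than silently skipping them, keeps the landscape totals honest when one installation is down or runs an older version.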
  • 27. Multilingual support. SSHOC Dataverse will run Weblate as a service for translating the user interface, the metadata schema and SOLR. We’ve developed an experimental but adjustable pipeline for multilingual support that downloads and synchronizes all translations available in the Dataverse Consortium GitHub repository and gives translators easy access for keeping all properties files up to date.
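The synchronization step of such a pipeline boils down to comparing each language file against the English source bundle. The sketch assumes Java-style `.properties` files (Dataverse's English strings live in `Bundle.properties`, and language packs add files such as `Bundle_fr.properties`). Its parser is deliberately minimal and ignores line continuations and unicode escapes, so it is not a substitute for the parsing Weblate does.

```python
def parse_properties(text):
    """Minimal .properties parser: key=value or key: value, '#'/'!' comments.

    Not handled: line continuations and unicode escape sequences.
    """
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        # Split at the first '=' or ':', whichever comes first.
        positions = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not positions:
            entries[line] = ""
            continue
        sep = min(positions)
        entries[line[:sep].strip()] = line[sep + 1:].strip()
    return entries


def synchronize(base, translation):
    """Align a translation with the source bundle.

    Returns (synced, missing, obsolete): translated entries in source order,
    source keys with no translation yet, and translated keys the source dropped.
    """
    synced = {k: translation[k] for k in base if k in translation}
    missing = [k for k in base if k not in translation]
    obsolete = [k for k in translation if k not in base]
    return synced, missing, obsolete
```

Dropping obsolete keys and listing missing ones is what lets translators see exactly which properties still need work after a sync.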
  • 28. Dataverse localization with Weblate ● a service that connects files to Weblate so that they can be translated in a structured way ● several options for project visibility: accept translations from the crowd, or give access only to a select group of translators ● Weblate indicates untranslated strings, strings with failing checks, and strings that need approval ● when new strings are added with a Dataverse upgrade, Weblate can indicate which strings are new and untranslated
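The upgrade checks listed on this slide can be approximated with simple set logic over three parsed bundles: the previous English source, the new English source and one translation. This sketch takes already-parsed dicts and only illustrates the idea; Weblate's own checks are richer (placeholders, punctuation, approval state). Flagging strings identical to the source is a heuristic, since some terms such as "OK" legitimately stay the same.

```python
def review_strings(old_base, new_base, translation):
    """Classify source keys after an upgrade.

    new:            keys that appear in the new source but not the old one
    untranslated:   keys with no (or an empty) translation
    same_as_source: translated text identical to the source, possibly untranslated
    """
    report = {"new": [], "untranslated": [], "same_as_source": []}
    for key, source in new_base.items():
        if key not in old_base:
            report["new"].append(key)
        if not translation.get(key):
            report["untranslated"].append(key)
        elif translation[key] == source:
            report["same_as_source"].append(key)
    return report
```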
  • 29. GUI translation with Weblate as a service. Source: SSHOC Weblate
  • 30. Services in the European Open Science Cloud (EOSC) ● EOSC requires at least maturity level 8 ● software must be of the highest quality to be accepted as a service ● clear and transparent evaluation of services is essential ● evidence of technical maturity is the key to success ● a limited warranty makes it possible to stop services that are out of warranty
  • 31. Application maturity levels. Every software package should follow the same CESSDA Maturity Model to be accepted as a service. Must have: Kubernetes (k8s) infrastructure with upstream Docker images, a warranty statement, documentation, unit tests, Selenium tests and a Jenkins pipeline. A running demonstration service makes it possible to connect to your own Dataverse installation.
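A team preparing a tool for assessment could run a quick self-check of its repository before asking for review. The mapping below from "must have" items to files or directories (`Dockerfile`, `Jenkinsfile`, `k8s/`, `WARRANTY.md`, ...) is a hypothetical convention, not part of the CESSDA Maturity Model itself, and the presence of a file says nothing about its quality.

```python
from pathlib import Path

# Hypothetical convention: requirement -> files/dirs that would evidence it.
CHECKLIST = {
    "Docker image": ["Dockerfile"],
    "Kubernetes deployment": ["k8s", "helm"],
    "Jenkins pipeline": ["Jenkinsfile"],
    "Documentation": ["README.md", "docs"],
    "Warranty statement": ["WARRANTY.md"],
    "Unit tests": ["tests", "src/test/java"],
    "Selenium tests": ["selenium", "src/test/selenium"],
}


def repo_paths(root):
    """All files and directories under root, as POSIX paths relative to root."""
    base = Path(root)
    return {p.relative_to(base).as_posix() for p in base.rglob("*")}


def missing_requirements(paths, checklist=CHECKLIST):
    """Requirements for which none of the candidate paths exist."""
    return [
        requirement
        for requirement, candidates in checklist.items()
        if not any(candidate in paths for candidate in candidates)
    ]
```

Running `print(missing_requirements(repo_paths(".")))` from a repository root lists the items still to be addressed.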
  • 32. CI/CD pipeline with SQAaaS (S) [Diagram: git push to GitHub → webhook → Jenkins pipeline (Jenkinsfile): git clone, run SQA (S), create Docker image, push to GCP container registry, Kubernetes deployment] 1. Developer pushes code to GitHub 2. Jenkins receives a notification (build trigger) 3. Jenkins clones the workspace 4. (S) Runs SQA tests and a FAIRness check 5. (S) Issues a digital badge according to the results 6. (S) SQAaaS API triggers the appropriate workflow 7. Creates a Docker image on success 8. Pushes the new Docker image to the container registry 9. Updates the Kubernetes deployment. Source: EOSC Synergy project
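The essential control flow of this pipeline is a gate: the SQA verdict and the badge are always produced, but the Docker image is built, pushed and deployed only if the checks pass. The sketch below models that with a simple fail-fast stage runner; the stage names and the `tests_ok`/`fair_ok` inputs are hypothetical, and the webhook and SQAaaS API calls are left out.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, object]
Stage = Tuple[str, Callable[[Context], bool]]


def clone(ctx: Context) -> bool:
    return True  # step 3: Jenkins clones the workspace


def run_sqa(ctx: Context) -> bool:
    # Step 4: record the verdict; the stage itself completes either way.
    ctx["sqa_passed"] = bool(ctx["tests_ok"]) and bool(ctx["fair_ok"])
    return True


def issue_badge(ctx: Context) -> bool:
    # Step 5: the badge reflects the result (simplified to awarded / none).
    ctx["badge"] = "awarded" if ctx["sqa_passed"] else "none"
    return True


def build_image(ctx: Context) -> bool:
    return bool(ctx["sqa_passed"])  # step 7: build the image only on success


def push_image(ctx: Context) -> bool:
    return True  # step 8: push to the container registry


def deploy(ctx: Context) -> bool:
    return True  # step 9: update the Kubernetes deployment


STAGES: List[Stage] = [
    ("clone", clone),
    ("sqa", run_sqa),
    ("badge", issue_badge),
    ("build_image", build_image),
    ("push_image", push_image),
    ("deploy", deploy),
]


def run_pipeline(stages: List[Stage], ctx: Context) -> Tuple[List[str], bool]:
    """Run stages in order and stop at the first one that returns False."""
    completed: List[str] = []
    for name, action in stages:
        if not action(ctx):
            return completed, False
        completed.append(name)
    return completed, True
```

Separating "the SQA stage ran" from "the SQA checks passed" is what allows a badge to be issued for a failing build while still blocking its deployment.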
  • 33. Who is going to benefit from the SSHOC Dataverse project? • (SSH) institutes and researchers will be offered a Dataverse installation in the cloud • (SSH) institutes will be offered a Dataverse "archive in a box" solution for their own purposes • Many of the features to be developed in SSHOC will also benefit other Dataverse installations and communities. All developments will be available to Dataverse community members!
  • 34. Questions? Slava Tykhonov (DANS-KNAW), Senior Information Scientist, vyacheslav.tykhonov@dans.knaw.nl. Co-Chair: Dataverse Working Group (WG) on Controlled Vocabularies and Ontologies, Dataverse WG on Registries. dataverse.org