dans.knaw.nl
DANS is an institute of KNAW and NWO
Building an electronic repository and archives on Dataverse
in the European Open Science Cloud
Vyacheslav Tykhonov
Senior Information Scientist
Data Archiving and Networked Services
(DANS-KNAW, Netherlands)
XVIII International Scientific and Practical conference
"BUILDING OF INFORMATION SOCIETY: RESOURCES AND TECHNOLOGIES"
September 19, 2019 in Kyiv
About me
• born in Kyiv in 1979
• studied at the National Technical University of Ukraine – Kyiv Polytechnic Institute (MSc, 2002)
• worked for international search engine companies and media monitoring agencies (1999-2010)
• joined the Royal Netherlands Academy of Arts and Sciences (KNAW) in 2011
• Senior Data Scientist at DANS-KNAW since 2016
• currently leading the technical development of DataverseEU cloud efforts in SSHOC Dataverse and other projects
DANS-KNAW core services
Why Dataverse?
• Open source project developed by IQSS at Harvard University and published on GitHub
• Mature product with a long history (since 2006)
• Very dynamic and experienced development team working in an Agile environment (community call every two weeks)
• Clear vision and understanding of research communities' requirements, public roadmap
• A strong community behind Dataverse helps to improve the basic functionality and develop it further
• Dataverse has been selected as data repository infrastructure by countries on all continents
• Well-developed architecture with rich API endpoints for building application layers around Dataverse
Dataverse and API economy
Dataverse is a data repository platform with 4 API endpoints:
- Native API
- SWORD API
- Search API
- Data Access API
An API token is the key to connecting Dataverse with an unlimited number of tools developed by different research communities and to integrating it with other repositories.
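As a minimal sketch of how a tool might talk to one of these endpoints, the Python below builds a Search API request and, if a token is supplied, passes it in the `X-Dataverse-key` header. The function names `build_search_url` and `search`, the default parameters, and the example query are assumptions made for illustration; they are not part of the original slides.

```python
import json
import urllib.parse
import urllib.request


def build_search_url(base_url, query, item_type="dataset", per_page=10):
    """Return a Search API URL for the given installation and query."""
    params = urllib.parse.urlencode(
        {"q": query, "type": item_type, "per_page": per_page}
    )
    return f"{base_url.rstrip('/')}/api/search?{params}"


def search(base_url, query, api_token=None):
    """Run the search and return the decoded JSON response."""
    # The token is only sent when provided; it travels in a request header.
    headers = {"X-Dataverse-key": api_token} if api_token else {}
    request = urllib.request.Request(
        build_search_url(base_url, query), headers=headers
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Building the URL needs no network access.
example_url = build_search_url("https://dataverse.nl", "survey")
```

Calling `search("https://dataverse.nl", "survey")` would perform the actual HTTP request. Separating URL construction from the network call keeps the sketch easy to test offline.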
DataverseNL as a shared service
Datasets container for Leiden University
DataverseNL as collaboration platform
• DataverseNL is a shared service provided by the participating institutions
and DANS. DANS performs back office tasks, including server and software
maintenance and administrative support.
• The participating institutions are responsible for managing the deposited data and the content. Every institution has its own data manager.
• User-friendly: users at participating institutions simply log in and DataverseNL is ready for use.
• Reliable and safe: in cooperation with the participating institutions and
universities, standard procedures have been established which ensure
sound data management. Data are stored in the Netherlands.
• Accessible: the service can be accessed online, from anywhere and at any
time. Just open dataverse.nl!
Dataset submission form
Published dataset in Ukrainian
SSHOC DataverseEU project
SSHOC stands for Social Sciences and Humanities Open Cloud.
The goal of the SSHOC Dataverse project (CESSDA, DARIAH and CLARIN) is to create a reliable, production-ready open source data infrastructure that everybody can install and reuse for their own needs and requirements.
We are developing a multilingual web interface, localizing metadata fields, and have developed a data standardization technique based on APIs for the CESSDA CVs, Topic Classification and CESSDA CV Manager services.
DataverseEU countries:
• Hungary (TARKI)
• Sweden (SND)
• Slovenia (ADP)
• Germany (GESIS)
• France (Sciences Po)
• Austria (AUSSDA)
• United Kingdom (UKDA)
• Italy (UniData)
• Belgium (SODA)
• Latvia (LSZDA)
• Netherlands (DANS-KNAW)
Development process
The SSHOC Dataverse project has two parallel development tracks:
• The core development team is working on modifying and extending the Dataverse core functionality.
• The application development team will create new tools or integrate existing ones, which will be published on the Dataverse App Store website.
Our goal is to build a distributed and mature data infrastructure based on sustainable microservices.
Maturity evaluation of DataverseEU services
• the testing process should comply with the CESSDA services maturity model: https://zenodo.org/record/2591055#.XKR6ny2B2u5
• every change to Dataverse functionality should come with a unit test; changes to external functionality should get Selenium scenarios
• the service should score as high as possible on the CESSDA maturity model
Services in European Open Science Cloud (EOSC)
• EOSC requires maturity level 8 (at least)
• we need the highest software quality to be accepted as a service
• clear and transparent evaluation of services is essential
• evidence of technical maturity is the key to success
• a limited warranty will make it possible to stop out-of-warranty services
Research data management
The data standardization process plays a key role in the data management plan of any organization, but the current situation in research data management is very complex:
• too much data chaos in datasets
• no data transparency
• sometimes no standards available
• no provenance information attached to data
• homonyms, synonyms, generalizations, specializations, spelling variations and mistakes, and language versions all complicate keyword-based search and retrieval of information
Controlled vocabulary and thesaurus
• Linked data is one step forward (or actually backward in the right direction) in solving some standardization problems.
• When shared controlled vocabularies (CVs) are created and maintained by experts in various domains, digital items can be annotated with them and easily retrieved by other experts from the same domain without being a librarian. This gives a clear indication of which vocabulary is good enough and shared by a critical mass.
• A thesaurus is a semantic network of unique concepts, including relationships between synonyms, broader and narrower (parent/child) contexts, and other related concepts. A thesaurus provides the hierarchy for controlled vocabularies.
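The thesaurus structure described above can be sketched in Python as a set of concepts with synonyms and a broader (parent) link. The `Concept` and `Thesaurus` classes and the sample labels are hypothetical and only illustrate the idea; they are not taken from the CESSDA vocabularies or from Dataverse.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Concept:
    pref_label: str                                      # preferred label
    alt_labels: Set[str] = field(default_factory=set)    # synonyms
    broader: Optional[str] = None                        # parent concept


class Thesaurus:
    def __init__(self):
        self.concepts: Dict[str, Concept] = {}

    def add(self, pref_label, alt_labels=(), broader=None):
        synonyms = {label.lower() for label in alt_labels}
        self.concepts[pref_label] = Concept(pref_label, synonyms, broader)

    def resolve(self, term):
        """Map a free-text term (preferred label or synonym) to a concept."""
        term = term.lower()
        for concept in self.concepts.values():
            if term == concept.pref_label.lower() or term in concept.alt_labels:
                return concept.pref_label
        return None

    def expand(self, term):
        """Return the concept for `term` plus all narrower concepts."""
        root = self.resolve(term)
        if root is None:
            return set()
        found, queue = {root}, [root]
        while queue:
            parent = queue.pop()
            for concept in self.concepts.values():
                if concept.broader == parent and concept.pref_label not in found:
                    found.add(concept.pref_label)
                    queue.append(concept.pref_label)
        return found


# Hypothetical sample vocabulary.
thesaurus = Thesaurus()
thesaurus.add("employment")
thesaurus.add("unemployment", alt_labels=["joblessness"], broader="employment")
thesaurus.add("youth unemployment", broader="unemployment")
expanded = thesaurus.expand("Joblessness")
```

Here a search for the synonym "Joblessness" resolves to the concept "unemployment" and also picks up its narrower concept, which is the kind of retrieval that plain keyword matching misses.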
CESSDA CV Service
External controlled vocabularies in Dataverse
Standardized metadata in Dataverse
Weblate as a multilingual support service
Managing translations with Weblate
Questions?
Contact me:
Slava Tykhonov
vyacheslav.tykhonov@dans.knaw.nl
https://www.linkedin.com/in/vyacheslavtikhonov/
https://twitter.com/4tykhonov
Watch the SSHOC Dataverse presentation at Harvard University:
https://www.youtube.com/watch?v=vAPpKuDQUDY
Try now!
https://dataverse.harvard.edu and https://dataverse.nl
http://dataverse.org.ua (Ukrainian portal)
http://github.com/IQSS/dataverse (application source code)
http://github.com/IQSS/dataverse-docker (Cloud release for Kubernetes)
