Can I access and use this data?
FAIR into practice.
Hella Hollander, Head Data Archive DANS
• Quality (trustworthiness) of data repositories
• Quality (fitness for use) of datasets
• FAIR into practice
• Europeana and re-use of Cultural Heritage Data
What will I present?
Fit for Purpose?
Data Archiving and Networked Services
• Established in 2005
Predecessors dating back to 1964 (Steinmetz Foundation)
• Institute of the Royal Netherlands Academy of
Arts and Sciences (KNAW)
• Co-founded by the Netherlands Organization
for Scientific Research (NWO)
• Objective: permanent preservation of, and
enabling access to scientific research data
Institute of Dutch Academy and Research Funding Organisation (KNAW & NWO) since 2005
First predecessor dates back to 1964 (Steinmetz Foundation), Historical Data Archive 1989
Mission: promote and provide permanent access to digital research resources
DANS is about keeping data FAIR
DANS key services
• DataverseNL: to support data storage during research until 10 years after
• NARCIS: portal aggregating research information and institutional repositories
• EASY: certified long-term archive
https://dans.knaw.nl
DANS and DSA
• 2005: DANS to promote and provide permanent access to
digital research resources
• Formulate quality guidelines for digital repositories including
DANS
• 2006: 5 basic principles as basis for 16 DSA guidelines
• 2009: international DSA Board
• Almost 70 seals acquired around the globe, but with a focus
on Europe
The Certification Pyramid
ISO 16363:2012 - Audit and certification of trustworthy digital repositories
http://www.iso16363.org/
DIN 31644 standard “Criteria for trustworthy digital archives”
http://www.langzeitarchivierung.de
http://www.datasealofapproval.org/
https://www.icsu-wds.org/
DSA and WDS: lookalikes
Commonalities:
• Lightweight, self-assessment, community review
Complementarity:
• Geographical spread
• Disciplinary spread
Partnership
Goals:
• Realizing efficiencies
• Simplifying assessment options
• Stimulating more certifications
• Increasing impact on the community
Outcomes:
• Common catalogue of requirements for core repository
assessment
• Common procedures for assessment
• Shared testbed for assessment
New common requirements: CoreTrustSeal
18 requirements:
• Context (1)
• Organizational infrastructure (6)
• Digital object management (8)
• Technology (2)
• Additional information and
applicant feedback (1)
Requirements (indirectly) dealing with data quality
R2. The repository maintains all applicable licenses covering data access and use and
monitors compliance.
R3. The repository has a continuity plan to ensure ongoing access to and preservation
of its holdings.
R4. The repository ensures, to the extent possible, that data are created, curated,
accessed, and used in compliance with disciplinary and ethical norms.
R7. The repository guarantees the integrity and authenticity of the data.
Requirements (indirectly) dealing with data quality
R8. The repository accepts data and metadata based on defined criteria to
ensure relevance and understandability for data users.
R10. The repository assumes responsibility for long-term preservation and
manages this function in a planned and documented way.
R11. The repository has appropriate expertise to address technical data and
metadata quality and ensures that sufficient information is available for end
users to make quality-related evaluations.
R13. The repository enables users to discover the data and refer to them in a
persistent way through proper citation.
R14. The repository enables reuse of the data over time, ensuring that
appropriate metadata are available to support the understanding and use of the
data.
Resemblance DSA – FAIR principles
DSA Principles (for data repositories) | FAIR Principles (for data sets)
data can be found on the internet | Findable
data are accessible | Accessible
data are in a usable format | Interoperable
data are reliable | Reusable
data can be referred to (citable) |
The resemblance is not perfect:
• usable format (DSA) is an aspect of interoperability (FAIR)
• FAIR explicitly addresses machine readability
• etc.
A certified TDR already offers a baseline data quality level
Implementing FAIR Principles
See: http://datafairport.org/fair-principles-living-document-menu and
https://www.force11.org/group/fairgroup/fairprinciples
FAIR Data Principles
In the FAIR Data approach, data should be:
Findable – Easy to find by both humans and computer systems and based on
mandatory description of the metadata that allow the discovery of interesting
datasets;
Accessible – Stored for long term such that they can be easily accessed and/or
downloaded with well-defined license and access conditions (Open Access when
possible), whether at the level of metadata, or at the level of the actual data
content;
Interoperable – Ready to be combined with other datasets by humans as well as
computer systems;
Reusable – Ready to be used for future research and to be processed further
using computational methods.
Accessible: Implementing FAIR
Examples:
• (Meta)data should be as open as possible and as closed as necessary
• Protected data and personal data must be available through a controlled and
documented procedure. Information that needs to be protected, for example for
privacy reasons, should not be part of the publicly accessible (meta)data but should
be recorded as part of the documentation of the resource in restricted contexts.
• To be fully accessible, research data should be retrievable via (free) exchange protocols.
• Maintain the integrity and quality of data. This is a general principle that emerged in particular from the interviews with historians. It refers to the necessity to maintain the richness and the context of the data created and collected over time.
Combine and operationalize: DSA & FAIR
• Growing demand for quality criteria for
research datasets and ways to assess their
fitness for use
• Combine the principles of core repository
certification and FAIR
• Use the principles as quality criteria:
• Core certification – digital repositories
• FAIR principles – research data (sets)
• Operationalize the principles as an
instrument to assess FAIRness of existing
datasets in certified TDRs
Different implementations of FAIR
Requirements for new data
creation
Establishing the profile for existing data
Transformation tools to make
data FAIR (Go-FAIR initiative)
FAIR badge scheme
• Proxy for data “quality” or “fitness
for (re-)use”
• Prevent interactions among
dimensions to ease scoring
• Consider Reusability as the
resultant of the other three:
– the average FAIRness as an indicator
of data quality
– (F+A+I)/3=R
• Manual and automatic scoring
[Example badge: F, A, I and R scores; 2 user reviews; 1 archivist assessment; 24 downloads]
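The averaging rule (F+A+I)/3=R can be sketched in a few lines. This is only an illustration, not the tool's actual implementation: the class name `FairScores` is hypothetical, and the 1 to 5 range is an assumption borrowed from the five-level scales used elsewhere in this deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FairScores:
    """Scores for one dataset (hypothetical helper; 1-5 range is an assumption)."""
    findable: int
    accessible: int
    interoperable: int

    def __post_init__(self) -> None:
        # Reject scores outside the assumed five-level scale.
        for name in ("findable", "accessible", "interoperable"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def reusable(self) -> float:
        # R is not scored separately: it is the mean of F, A and I.
        return (self.findable + self.accessible + self.interoperable) / 3


scores = FairScores(findable=4, accessible=5, interoperable=3)
print(f"R = {scores.reusable:.1f}")  # prints "R = 4.0"
```

Because R is derived, a dataset can only raise its Reusable score by improving one of the other three dimensions.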
Some unresolved FAIR complications:
1. Dependencies among dimensions, difficulty of measuring the criteria,
no rank order from “low” to “high” FAIRness, grouping of criteria
under dimensions is disputable
2. Do we need or want additional dimensions, principles or criteria?
• Is “openness” a separate dimension, not included in FAIR?
• Is it desirable/possible to say something about “substantive” data quality, such as
the accuracy/precision or correctness of the data?
• What about the long-term access? For how long does data remain FAIR?
• Should data security be included?
3. Several FAIR criteria can be solved at the level of the repository
4. Do we need separate FAIR criteria for different disciplines?
• e.g. machine-actionable data are more important in some fields than in others;
note that data accessibility by machines is partly defined by technical specs (A1),
partly by licenses (R1.1)
First we attempted to operationalise R – Reusable as well… but we changed our mind
Reusable – is it a separate dimension? Partly subjective: it depends on what you want to use the data for!
Idea for operationalization | Solution
R1. plurality of accurate and relevant attributes ≈ F2: “data are described with rich metadata” | → F
R1.1. clear and accessible data usage license | → A
R1.2. provenance (for replication and reuse) | → F
R1.3. meet domain-relevant community standards | → I
Data is in a TDR – unsustained data will not remain usable | Aspect of Repository → Data Seal of Approval
Explication on how data was or can be used is available | → F
Data is automatically usable by machines | → I
Findable (defined by metadata (PID included) and documentation)
1. No PID and no metadata/documentation
2. PID without metadata or with insufficient metadata
3. Sufficient/limited metadata without PID
4. PID with sufficient metadata
5. Extensive metadata and rich additional documentation available
Accessible (defined by presence of user license)
1. Neither metadata nor data are accessible
2. Metadata are accessible but data are not accessible (no clear terms of reuse in license)
3. User restrictions apply (e.g. privacy, commercial interests, embargo period)
4. Public access (after registration)
5. Open access, unrestricted
Interoperable (defined by data format)
1. Proprietary (privately owned), non-open format data
2. Proprietary format, accepted by Certified Trustworthy Data Repository
3. Non-proprietary, open format = ‘preferred format’
4. In addition to the preferred format, data are standardised using a standard vocabulary format (for the research field to which the data pertain)
5. Data additionally linked to other data to provide context
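As an illustration of how such a scale could be applied, here is a minimal sketch for the Findable dimension only. The names `Metadata` and `findable_level` are hypothetical, and two choices are assumptions rather than part of the scale: any metadata without a PID is treated as level 3, and level 5 is taken to require a PID.

```python
from enum import Enum


class Metadata(Enum):
    """Coarse metadata quality classes (hypothetical labels)."""
    NONE = 0
    INSUFFICIENT = 1
    SUFFICIENT = 2
    RICH = 3


def findable_level(has_pid: bool, metadata: Metadata) -> int:
    """Map a dataset onto the five Findable levels listed above."""
    if not has_pid:
        # Levels 1 and 3 are the only ones without a PID.
        # Assumption: any metadata at all, without a PID, counts as level 3.
        return 1 if metadata is Metadata.NONE else 3
    if metadata in (Metadata.NONE, Metadata.INSUFFICIENT):
        return 2  # PID without or with insufficient metadata
    if metadata is Metadata.SUFFICIENT:
        return 4  # PID with sufficient metadata
    return 5      # assumption: rich metadata together with a PID


print(findable_level(True, Metadata.SUFFICIENT))  # prints 4
```

Deciding whether metadata are "sufficient" or "rich" remains a human judgement; the function only encodes the ranking once that judgement has been made.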
Creating a FAIR data assessment tool
Using an online questionnaire system
Prototype:
https://www.surveymonkey.com/r/fairdat
Website FAIRDAT
• To contain FAIR data assessments from any repository or website, linking to the location of the data set via (persistent) identifier
• The repository can show the resultant badge, linking back to the FAIRDAT website
[Example badge: F, A, I and R scores; 2 user reviews; 1 archivist assessment; 24 downloads]
Neutral, Independent
Analogous to DSA website
Display FAIR badges in any repository (Zenodo,
Dataverse, Mendeley Data, figshare, B2SAFE, …)
Can FAIR Data Assessment be automatic?
Criterion | Automatic? (Y/N/Semi) | Subjective? (Y/N/Semi) | Comments
F1 No PID / No Metadata | Y | N | Solved by Repository
F2 PID / Insuff. Metadata | S | S | Insufficient metadata is subjective
F3 No PID / Suff. Metadata | S | S | Sufficient metadata is subjective
F4 PID / Sufficient Metadata | S | S | Sufficient metadata is subjective
F5 PID / Rich Metadata | S | S | Rich metadata is subjective
A1 No License / No Access | Y | N | Solved by Repository
A2 Metadata Accessible | Y | N | Solved by Repository
A3 User Restrictions | Y | N | Solved by Repository
A4 Public Access | Y | N | Solved by Repository
A5 Open Access | Y | N | Solved by Repository
I1 Proprietary Format | S | N | Depends on list of proprietary formats
I2 Accepted Format | S | S | Depends on list of accepted formats
I3 Archival Format | S | S | Depends on list of archival formats
I4 + Harmonized | N | S | Depends on domain vocabularies
I5 + Linked | S | N | Depends on semantic methods used
Optional: qualitative assessment / data review
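The table can also be read as data. The sketch below simply encodes its "Automatic?" and "Subjective?" columns and lists the criteria a repository could score without human input; the variable names are hypothetical and nothing here goes beyond the table itself.

```python
# (automatic, subjective) per criterion, copied from the table: Y = yes, N = no, S = semi
CRITERIA = {
    "F1": ("Y", "N"), "F2": ("S", "S"), "F3": ("S", "S"), "F4": ("S", "S"), "F5": ("S", "S"),
    "A1": ("Y", "N"), "A2": ("Y", "N"), "A3": ("Y", "N"), "A4": ("Y", "N"), "A5": ("Y", "N"),
    "I1": ("S", "N"), "I2": ("S", "S"), "I3": ("S", "S"), "I4": ("N", "S"), "I5": ("S", "N"),
}

# Criteria that are fully automatic and not subjective.
fully_automatic = [
    name for name, (automatic, subjective) in CRITERIA.items()
    if automatic == "Y" and subjective == "N"
]

# Criteria that need at least some human judgement.
needs_review = [name for name in CRITERIA if name not in fully_automatic]

print(fully_automatic)  # prints ['F1', 'A1', 'A2', 'A3', 'A4', 'A5']
```

Read this way, the whole Accessible dimension can be scored by the repository itself, while Findable and Interoperable mostly depend on agreed lists or a reviewer.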
Open and FAIR Data in Trusted Data Repositories
Data does not only need to be Open
Data must also be FAIR
– Findable, Accessible, Interoperable, Reusable
– And must remain so, and therefore should be preserved in a DSA-certified Trusted Digital Repository
Perfect Couple
FAIR principles for data quality
DSA criteria for quality of TDR
A minimal set of community-agreed guiding principles to make data more easily findable, accessible, appropriately integrated and re-usable, and adequately citable.
• A perfect couple for quality assessment of research data and trustworthy data
repositories
• Ideally: a DSA certified archive will contain FAIR data
Europeana
CARARE: Can I use this data? FAIR into practice
Example in Europeana of NO Re-use
What does it mean: In copyright
DANS: Open Access registered users
Comparison Europeana and DANS
Europeana | DANS
Public domain | CC0
CC licenses | CC-BY is very close to “Open Access Registered Users”
In copyright | All categories except CC0
Orphan Work | Formally not existing
Unknown | Exists
No Copyright NonCommercial Use only | Not applicable
Europeana
In Copyright:
This work is protected by copyright and/or related rights.
Access/re-use: You are free to use this work in any way that is
permitted by the copyright and related rights legislation that
applies to your use. For other use you need to obtain
permission from the rights holder(s).
DANS
Open Access for registered users:
The objects/data are, without further restrictions, only made
available to all registered EASY users. Any existing copyrights
and/or database rights are respected.
Access/re-use: You are free to use this work in any way that is allowed by the copyright and related rights legislation that applies to your use, but only after user registration. A registered user is permitted to cite the data in a limited way in publications. For other use you need to obtain permission from the rights holder(s).
Can I access and Use it?
Clarification of the DANS licence agreement:
It is allowed to:
• copy a dataset for your own use
• cite from the dataset to a limited degree in publications, with a bibliographic reference to the dataset
It is not allowed to:
• distribute the dataset, i.e. (re)publish the dataset as a whole
Can I access and use this data?
CARARE: Can I use this data? FAIR into practice
Thank you for listening!
Hella.hollander@dans.knaw.nl
www.dans.knaw.nl
http://www.dtls.nl/go-fair/
https://eudat.eu/events/webinar/fair-data-in-trustworthy-data-repositories-webinar

CARARE: Can I use this data? FAIR into practice

  • 1. Can I access and use this data? FAIR into practice. Hella Hollander, Head Data Archive DANS
  • 2. • Quality (trustworthiness) of data repositories • Quality (fitness for use) of datasets • FAIR into practice • Europeana and re-use of Cultural Heritage Data What I will present? Fit for Purpose?
  • 3. Data Archiving and Networked Services • Established in 2005 Predecessors dating back to 1964 (Steinmetz Foundation) • Institute of the Royal Netherlands Academy of Arts and Sciences (KNAW) • Co-founded by the Netherlands Organization for Scientific Research (NWO) • Objective: permanent preservation of, and enabling access to scientific research data
  • 4. Institute of Dutch Academy and Research Funding Organisation (KNAW & NWO) since 2005 First predecessor dates back to 1964 (Steinmetz Foundation), Historical Data Archive 1989 Mission: promote and provide permanent access to digital research resources DANS is about keeping data FAIR
  • 5. DataverseNL to support data storage during research until 10 years after NARCIS Portal aggregating research information and institutional repositories EASY Certified Long-term Archive DANS key services https://dans.knaw.nl
  • 6. DANS and DSA • 2005: DANS to promote and provide permanent access to digital research resources • Formulate quality guidelines for digital repositories including DANS • 2006: 5 basic principles as basis for 16 DSA guidelines • 2009: international DSA Board • Almost 70 seals acquired around the globe, but with a focus on Europe
  • 7. The Certification Pyramid ISO 16363:2012 - Audit and certification of trustworthy digital repositories http://www.iso16363.org/ DIN 31644 standard “Criteria for trustworthy digital archives” http://www.langzeitarchivierung.de http://www.datasealofapproval.org/ https://www.icsu-wds.org/
  • 8. DSA and WDS: look-a-likes Communalities: • Lightweight, self assessment, community review Complementarity: • Geographical spread • Disciplinary spread
  • 9. Partnership Goals: • Realizing efficiencies • Simplifying assessment options • Stimulating more certifications • Increasing impact on the community Outcomes: • Common catalogue of requirements for core repository assessment • Common procedures for assessment • Shared testbed for assessment
  • 10. New common requirements: CoreTrustSeal 18 requirements: • Context (1) • Organizational infrastructure (6) • Digital object management (8) • Technology (2) • Additional information and applicant feedback (1)
  • 11. Requirements (indirectly) dealing with data quality R2. The repository maintains all applicable licenses covering data access and use and monitors compliance. R3. The repository has a continuity plan to ensure ongoing access to and preservation of its holdings. R4. The repository ensures, to the extent possible, that data are created, curated, accessed, and used in compliance with disciplinary and ethical norms. R7. The repository guarantees the integrity and authenticity of the data.
  • 12. Requirements (indirectly) dealing with data quality R8. The repository accepts data and metadata based on defined criteria to ensure relevance and understandability for data users. R10. The repository assumes responsibility for long-term preservation and manages this function in a planned and documented way. R11. The repository has appropriate expertise to address technical data and metadata quality and ensures that sufficient information is available for end users to make quality-related evaluations. R13. The repository enables users to discover the data and refer to them in a persistent way through proper citation. R14. The repository enables reuse of the data over time, ensuring that appropriate metadata are available to support the understanding and use of the data.
  • 13. Resemblance DSA – FAIR principles DSA Principles (for data repositories) FAIR Principles (for data sets) data can be found on the internet Findable data are accessible Accessible data are in a usable format Interoperable data are reliable Reusable data can be referred to (citable) The resemblance is not perfect: • usable format (DSA) is an aspect of interoperability (FAIR) • FAIR explicitly addresses machine readability • etc. A certified TDR already offers a baseline data quality level
  • 14. Implementing FAIR Principles See: http://datafairport.org/fair-principles-living-document-menu and https://www.force11.org/group/fairgroup/fairprinciples
  • 15. FAIR Data Principles In the FAIR Data approach, data should be: Findable – Easy to find by both humans and computer systems and based on mandatory description of the metadata that allow the discovery of interesting datasets; Accessible – Stored for long term such that they can be easily accessed and/or downloaded with well-defined license and access conditions (Open Access when possible), whether at the level of metadata, or at the level of the actual data content; Interoperable – Ready to be combined with other datasets by humans as well as computer systems; Reusable – Ready to be used for future research and to be processed further using computational methods.
  • 16. Accessible: Implementing FAIR Examples: • (Meta)data should be open as possible and closed as necessary • Protected data and personal data must be available through a controlled and documented procedure. Information that needs to be protected, for example for privacy reasons, should not be part of the publicly accessible (meta)data but should be recorded as part of the documentation of the resource in restricted contexts. • In order to be fully accessible, research data should be fully accessible via (free) exchange protocols. • Maintain the integrity and quality of data. This is a general principle, that emerged in particular from the interviews with historians. It refers to the necessity to maintain the richness and the context of the data created and collected during time 16
  • 17. Combine and operationalize: DSA & FAIR • Growing demand for quality criteria for research datasets and ways to assess their fitness for use • Combine the principles of core repository certification and FAIR • Use the principles as quality criteria: • Core certification – digital repositories • FAIR principles – research data (sets) • Operationalize the principles as an instrument to assess FAIRness of existing datasets in certified TDRs
  • 18. Different implementations of FAIR Requirements for new data creation Establishing the profile for existing data Transformation tools to make data FAIR (Go-FAIR initiative)
  • 19. FAIR badge scheme • Proxy for data “quality” or “fitness for (re-)use” • Prevent interactions among dimensions to ease scoring • Consider Reusability as the resultant of the other three: – the average FAIRness as an indicator of data quality – (F+A+I)/3=R • Manual and automatic scoring F A I R 2 User Reviews 1 Archivist Assessment 24 Downloads
  • 20. Some unresolved FAIR complications: 1. Dependencies among dimensions, difficulty to measure the criteria, no rank order from “low” to “high” FAIRness, grouping of criteria under dimensions is disputable 2. Do we need or want additional dimensions, principles or criteria? • Is “openness” a separate dimension, not included in FAIR? • Is it desirable/possible to say something about “substantive” data quality, such as the accuracy/precision or correctness of the data? • What about the long-term access? For how long does data remain FAIR? • Should data security be included? 3. Several FAIR criteria can be solved at the level of the repository 4. Do we need separate FAIR criteria for different disciplines? • e.g. machine actionable data are more important in some fields than in other; note that data accessibility by machines is partly defined by technical specs (A1), partly by licenses (R1.1)
  • 21. First we attempted to operationalise R – Reusable as well… but we changed our mind Reusable – is it a separate dimension? Partly subjective: it depends on what you want to use the data for! Idea for operationalization Solution R1. plurality of accurate and relevant attributes ≈ F2: “data are described with rich metadata”  F R1.1. clear and accessible data usage license  A R1.2. provenance (for replication and reuse)  F R1.3. meet domain-relevant community standards  I Data is in a TDR – unsustained data will not remain usable Aspect of Repository  Data Seal of Approval Explication on how data was or can be used is available  F Data is automatically usable by machines  I
  • 22. • Findable (defined by metadata (PID included) and documentation): 1. No PID nor metadata/documentation 2. PID without or with insufficient metadata 3. Sufficient/limited metadata without PID 4. PID with sufficient metadata 5. Extensive metadata and rich additional documentation available • Accessible (defined by presence of user license): 1. Neither metadata nor data are accessible 2. Metadata are accessible but data are not accessible (no clear terms of reuse in license) 3. User restrictions apply (e.g. privacy, commercial interests, embargo period) 4. Public access (after registration) 5. Open access unrestricted • Interoperable (defined by data format): 1. Proprietary (privately owned), non-open format data 2. Proprietary format, accepted by Certified Trustworthy Data Repository 3. Non-proprietary, open format = ‘preferred format’ 4. As well as being in the preferred format, data is standardised using a standard vocabulary format (for the research field to which the data pertain) 5. Data additionally linked to other data to provide context
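As an illustration of how one of these scales could be applied, the Findable levels can be sketched as a small Python function. The function name `findable_level` and the four metadata labels are hypothetical. Two readings are assumptions: any metadata without a PID is placed at level 3 (reading “sufficient/limited” broadly), and level 5 is taken to require a PID, following the “F5 PID / Rich Metadata” row on slide 26.

```python
METADATA_LABELS = ("none", "insufficient", "sufficient", "rich")


def findable_level(has_pid: bool, metadata: str) -> int:
    """Map PID presence and a metadata label to a Findable level (1-5)."""
    if metadata not in METADATA_LABELS:
        raise ValueError(f"metadata must be one of {METADATA_LABELS}")
    if not has_pid:
        # Level 1: nothing at all; level 3: some metadata but no PID (assumption)
        return 1 if metadata == "none" else 3
    if metadata in ("none", "insufficient"):
        return 2  # PID without or with insufficient metadata
    # With a PID: sufficient metadata is level 4, rich metadata is level 5
    return 4 if metadata == "sufficient" else 5


if __name__ == "__main__":
    print(findable_level(True, "sufficient"))  # 4
```

Deciding whether metadata counts as “sufficient” or “rich” remains a human judgement; the sketch only encodes the ordering once that judgement has been made.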
  • 23. Creating a FAIR data assessment tool Using an online questionnaire system Prototype: https://www.surveymonkey.com/r/fairdat
  • 24. Website FAIRDAT • To contain FAIR data assessments from any repository or website, linking to the location of the data set via (persistent) identifier • The repository can show the resultant badge, linking back to the FAIRDAT website • Neutral, independent • Analogous to DSA website [Badge illustration: F, A, I and R scores; 2 User Reviews; 1 Archivist Assessment; 24 Downloads]
  • 25. Display FAIR badges in any repository (Zenodo, Dataverse, Mendeley Data, figshare, B2SAFE, …)
  • 26. Can FAIR Data Assessment be automatic?
      Criterion | Automatic? (Y/N/Semi) | Subjective? (Y/N/Semi) | Comments
      F1 No PID / No Metadata | Y | N | Solved by Repository
      F2 PID / Insuff. Metadata | S | S | Insufficient metadata is subjective
      F3 No PID / Suff. Metadata | S | S | Sufficient metadata is subjective
      F4 PID / Sufficient Metadata | S | S | Sufficient metadata is subjective
      F5 PID / Rich Metadata | S | S | Rich metadata is subjective
      A1 No License / No Access | Y | N | Solved by Repository
      A2 Metadata Accessible | Y | N | Solved by Repository
      A3 User Restrictions | Y | N | Solved by Repository
      A4 Public Access | Y | N | Solved by Repository
      A5 Open Access | Y | N | Solved by Repository
      I1 Proprietary Format | S | N | Depends on list of proprietary formats
      I2 Accepted Format | S | S | Depends on list of accepted formats
      I3 Archival Format | S | S | Depends on list of archival formats
      I4 + Harmonized | N | S | Depends on domain vocabularies
      I5 + Linked | S | N | Depends on semantic methods used
      Optional: qualitative assessment / data review
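The table on this slide can be read as data. The following Python sketch copies the Automatic and Subjective columns as given and picks out the criteria that a repository could score without human input. The names `CRITERIA` and `fully_automatic` are hypothetical, and treating “Automatic = Y and Subjective = N” as the test for full automation is an assumption.

```python
# (criterion, automatic, subjective) with Y = yes, N = no, S = semi
CRITERIA = [
    ("F1", "Y", "N"), ("F2", "S", "S"), ("F3", "S", "S"),
    ("F4", "S", "S"), ("F5", "S", "S"),
    ("A1", "Y", "N"), ("A2", "Y", "N"), ("A3", "Y", "N"),
    ("A4", "Y", "N"), ("A5", "Y", "N"),
    ("I1", "S", "N"), ("I2", "S", "S"), ("I3", "S", "S"),
    ("I4", "N", "S"), ("I5", "S", "N"),
]


def fully_automatic(criteria):
    """Return the criteria marked automatic (Y) and not subjective (N)."""
    return [name for name, automatic, subjective in criteria
            if automatic == "Y" and subjective == "N"]


if __name__ == "__main__":
    print(fully_automatic(CRITERIA))
    # ['F1', 'A1', 'A2', 'A3', 'A4', 'A5']
```

On this reading, the whole Accessible scale and the lowest Findable level can be scored by the repository itself, while the remaining Findable and Interoperable levels need reference lists or human review.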
  • 27. Open and FAIR Data in Trusted Data Repositories • Data does not only need to be Open • Data must also be FAIR – Findable, Accessible, Interoperable, Reusable – and must remain so, and therefore should be preserved in a DSA Certified Trusted Digital Repository
  • 28. Perfect Couple FAIR principles for data quality DSA criteria for quality of TDR minimal set of community agreed guiding principles to make data more easily findable, accessible, appropriately integrated and re-usable, and adequately citable. • A perfect couple for quality assessment of research data and trustworthy data repositories • Ideally: a DSA certified archive will contain FAIR data
  • 31. Example in Europeana of NO Re-use
  • 32. What does it mean: In copyright
  • 33. DANS: Open Access registered users
  • 34. Comparison Europeana and DANS
      Europeana | DANS
      Public domain | CC0
      CC licenses | CC-BY is very close to “Open Access Registered Users”
      In copyright | All categories except CC0
      Orphan Work | Formally not existing
      Unknown | Exists
      No Copyright NonCommercial Use only | Not applicable
  • 35. Europeana In Copyright: This work is protected by copyright and/or related rights. Access/re-use: You are free to use this work in any way that is permitted by the copyright and related rights legislation that applies to your use. For other use you need to obtain permission from the rights holder(s).
  • 36. DANS Open Access for registered users: The objects/data are made available, without further restrictions, only to registered EASY users. Any existing copyrights and/or database rights are respected. Access/re-use: You are free to use this work in any way that is allowed by the copyright and related rights legislation that applies to your use, but only after user registration. A registered user is permitted to cite the data in a limited way in publications. For other use you need to obtain permission from the rights holder(s).
  • 37. Can I access and use it? Clarification of the DANS licence agreement. It is allowed to: • copy a dataset for your own use • cite from the dataset to a limited degree in publications, with a bibliographic reference to the dataset. It is not allowed to: • distribute the dataset, i.e. (re-)publish the dataset as a whole
  • 38. Can I access and use this data?
  • 40. Thank you for listening! Hella.hollander@dans.knaw.nl www.dans.knaw.nl http://www.dtls.nl/go-fair/ https://eudat.eu/events/webinar/fair-data-in-trustworthy-data-repositories-webinar