Towards metrics to assess and
encourage FAIRness
Michel Dumontier, Ph.D.
Distinguished Professor of Data Science
@micheldumontier::FAIR@Elixir:2017-03-23
Principles that apply to all digital resources
and their metadata.
software, images, data, repositories, web services
http://www.nature.com/articles/sdata201618
Horizon 2020: Data Management Plan
Section 2. FAIR data
1. Making data findable, including provisions for
metadata (5 questions)
2. Making data openly accessible (10 questions)
3. Making data interoperable (4 questions)
4. Increase data re-use (through clarifying
licenses - 4 questions)
Additional sections:
1. Data summary (6 questions, 5 of which also
cover aspects of FAIRness)
2. Allocation of resources (4 questions)
3. Data security (2 questions)
4. Ethical aspects (2 questions)
5. Other issues (2 questions)
Total of 23 + 16 = 39 questions!!
https://goo.gl/Strjua
Hypothesis
Improving the FAIRness of a digital
resource will increase its discovery and
reuse.
Fundamental Questions
• What do we mean by FAIRness?
• In what ways can we assess the FAIRness of a digital
resource?
• To what degree can we automate this assessment?
• Must we treat each type of digital resource differently?
• Who will use the metrics? The producers, the funders, or
the users?
• Can one resource be more FAIR than another? Will/should
this impact funding decisions?
• Should only one organization define these metrics? Or can
anybody make their own metrics? What happens if a
digital resource scores well against one set of metrics, but
not another?
What is FAIRness?
FAIRness reflects the extent to which a digital
resource addresses the FAIR principles as per the
expectations defined by a community of
stakeholders.
What is a metric?
• A metric is a standard of measurement.
• It must provide a clear definition of what is being
measured and why one wants to measure it.
• It must describe the process by which you
obtain a valid measurement result, so that it
can be reproduced by others. It needs to
specify what a valid result is.
Example of a FAIRness Metric
F1: (meta)data are assigned a globally unique and persistent
identifier
Aspect: Identifier Persistence
Rationale: An identifier can be used to find, access, and reuse a
resource. As such, it must remain available to users for as long as
possible; otherwise we cannot perform those functions with
the identifier in hand.
Relevant FAIR Principles: F, A, I, R
Metric: Availability of a data management plan that includes a section
dealing with continuity and contingencies related to the persistence of
identifiers. The value of the metric is true or false.
Procedure: Check that the URL in the resource metadata points to
a data management plan with a continuity section. The document should
follow a community standard; failing that, a basic structure should be recommended.
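A metric described this way can be written down as a small record, which makes the true/false result reproducible by others. The following is only a sketch: the `FairMetric` class, the `evaluate` hook, and the metadata keys `data_management_plan`, `url` and `has_continuity_section` are hypothetical names chosen for illustration, not part of any published FAIR metrics schema.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FairMetric:
    """One FAIRness metric, mirroring the fields used on the slide."""
    principle: str
    aspect: str
    rationale: str
    metric: str
    procedure: str
    # Hypothetical hook: applies the procedure to a resource's metadata
    # and returns the true/false value of the metric.
    evaluate: Callable[[dict], bool]


def has_continuity_plan(metadata: dict) -> bool:
    """Assumed procedure: the metadata links to a data management plan
    and states that the plan contains a continuity section."""
    plan = metadata.get("data_management_plan", {})
    return bool(plan.get("url")) and plan.get("has_continuity_section") is True


identifier_persistence = FairMetric(
    principle="F1",
    aspect="Identifier Persistence",
    rationale="Identifiers must stay available for as long as possible.",
    metric="Availability of a data management plan with a continuity section.",
    procedure="Check that the metadata URL points to such a plan.",
    evaluate=has_continuity_plan,
)
```

Calling `identifier_persistence.evaluate(metadata)` then yields the boolean value of the metric for one resource. In practice a person would still have to read the plan to confirm the continuity section exists.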
NIH Commons Framework Working Group on
FAIR Metrics
Aim: To identify and prototype methods to
assess the FAIRness of a digital resource.
– Identify and include initial stakeholders
– Develop and discuss potential metrics
– Explore ways in which to report and assess
metrics.
Current Thinking:
FAIRness Index
• A FAIRness Index is a collection of metrics that
are aligned to the FAIR principles and can be
consistently and transparently evaluated.
• A community, composed of clearly defined
stakeholders (researchers, publishers, users,
etc.), may define its own FAIRness Index
that expresses what makes a digital resource
ideally or maximally FAIR.
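As a rough illustration of this idea, an index can be modelled as a named collection of yes/no metrics that is applied to a resource in the same way every time. Everything here is an assumption made for the sketch: the two example communities, the metric names, the metadata keys, and the placeholder identifier are invented, and real indexes would contain far richer metrics.

```python
from typing import Callable, Dict

Metric = Callable[[dict], bool]


def evaluate_index(index: Dict[str, Metric], resource: dict) -> Dict[str, bool]:
    """Apply every metric in the index to the resource, in a fixed order."""
    return {name: index[name](resource) for name in sorted(index)}


# Two hypothetical communities with different expectations.
repository_index: Dict[str, Metric] = {
    "has_identifier": lambda r: bool(r.get("identifier")),
    "has_license": lambda r: bool(r.get("license")),
}
tooling_index: Dict[str, Metric] = {
    "has_identifier": lambda r: bool(r.get("identifier")),
    "has_api_docs": lambda r: bool(r.get("api_documentation")),
}

# Placeholder resource description, not a real dataset.
resource = {"identifier": "doi:10.1234/example", "license": "CC-BY-4.0"}
repository_report = evaluate_index(repository_index, resource)
tooling_report = evaluate_index(tooling_index, resource)
```

The same resource satisfies every metric in the first index but not in the second, which is exactly the situation raised under the fundamental questions: a resource can score well against one set of metrics and poorly against another.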
Stakeholders
People worried about
– Findability
– Accessibility
– Interoperability
– Reuse
– Provenance
– Licensing
– Citation
– Value
People who are
- Potential users
- Resource creators
- Academics
- Publishers
- Industry
- The public
- Funding agencies
Ways we can gather information to
assess FAIRness
A) Self assessment
B) Self-appointed FAIR Assessment Team
C) Automated assessment
D) Crowdsourcing
E) All of the above
Findable metrics
• Is there structured metadata describing the resource?
– Check for embedded metadata as microdata or linked data
– Check for hyperlinked documents with standardized formats: HCLS dataset
description/DCAT, schema.org annotations, etc.
• Are entries identified with a persistent identifier?
– Is there a DOI with scholarly publications?
– Is there a permanent URL for each item (without query parameters)?
– Is there a resource type specified, and does it use a well-known vocabulary such
as EDAM, identifiers.org, etc.?
• Can the resource be found in a recognized repository?
– E.g. a database in Biosharing
– E.g. a tool in Elixir bio.tools
– E.g. gene expression data in GEO
• Can the resource be found with a web search engine?
– At what rank does the resource appear when using the identifier or title in a
web search?
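Two of the Findable checks lend themselves to automation. The sketch below, using only the Python standard library, looks for embedded JSON-LD in a page and tests whether an item URL is free of query parameters. The function names are hypothetical, JSON-LD is used as one example of embedded linked data (microdata would need a separate check), and a query-free URL is only a weak proxy for permanence.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urlparse


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is not counted as structured metadata


def has_embedded_jsonld(html: str) -> bool:
    """True if the page embeds at least one parseable JSON-LD block."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return len(parser.documents) > 0


def is_query_free_url(url: str) -> bool:
    """True for http(s) URLs that carry no query parameters."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc) and not parts.query
```

Checks such as search-engine rank or presence in a recognized repository depend on external services and are left out of this sketch.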
Example FAIR Metrics
Accessible metrics
• Are the (meta)data accessible by permanent URL?
• Can you obtain the resource in a standardized language (e.g. HTML, XML, JSON, JSON-LD)?
• Are the data downloadable in bulk or in part with an application programming interface
(API)? Is the API documented using Swagger or smartAPI, or does it follow the Hydra protocol?
Interoperable metrics
• Are the (meta)data described with a community vocabulary?
• Are the data and metadata linked to other datasets, vocabularies and ontologies?
• Are the data and metadata expressed in universal languages (e.g. XML, JSON, JSON-LD,
RDF/XML)?
Reusable metrics
• Is there a license specified? Is it a standardized license? Is it linked to in the resource
metadata?
• Is it clear how the work should be cited? See the FORCE11 Data Citation Implementation
Pilot and bioCADDIE Working Group 5.
• Is there any indication of reuse beyond its original context and original creators?
• Is there any indication of access through published statistics?
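Some of the Reusable questions reduce to simple lookups once the metadata is in hand. The sketch below answers three of them for a single metadata record. The `license` and `citation` keys and the `reusable_checks` function are hypothetical, and the allowlist of two Creative Commons URLs is purely illustrative; a community would maintain its own list of standardized licenses.

```python
# Illustrative allowlist only; not an authoritative set of standard licenses.
STANDARD_LICENSES = {
    "https://creativecommons.org/licenses/by/4.0/",
    "https://creativecommons.org/publicdomain/zero/1.0/",
}


def reusable_checks(metadata: dict) -> dict:
    """Answer three yes/no Reusable questions for one metadata record."""
    license_url = metadata.get("license")
    return {
        "license_specified": license_url is not None,
        "license_standardized": license_url in STANDARD_LICENSES,
        "citation_given": bool(metadata.get("citation")),
    }
```

Indications of reuse and access statistics are harder to automate, since they usually live outside the resource's own metadata.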
A first attempt!
• IDCC17 Practice Paper “Are the FAIR Data
Principles fair?” by Alastair Dunning,
Madeleine de Smaele, Jasmin Böhmer
• Web interfaces, help pages and metadata
records of over 40 data repositories were
examined to score each
repository against the FAIR principles
• 2 months
Data: http://dx.doi.org/10.4121/uuid:5146dd06-98e4-426c-9ae5-dc8fa65c549f
Paper: https://zenodo.org/record/321423#.WNFNrTvytm8
37 repositories
Scoring the resources
Overall Evaluation
Summary of Study
• Offers an initial larger scale assessment
• Issues
– confusion about what is meant by each principle,
clarified after the study through discussion
– Fully manual effort, but AFAIK inter-annotator
agreement not established
– not easy to scale, can we automate it?
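One way to address the inter-annotator agreement gap would be to have two people score the same repositories independently and compute an agreement statistic. The sketch below uses Cohen's kappa, which is a common choice but is not named in the study or on the slide; the function name and the label values are illustrative.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Fraction of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

A kappa near 1 indicates agreement well beyond chance, while a value near 0 suggests the principles are being interpreted differently by each assessor.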
Metrics for Digital Repositories
• Data Seal of Approval
– 6 core requirements
– 16 criteria
• DIN31644: Information and documentation -
Criteria for trustworthy digital archives
– 10 core requirements
– 34 criteria
• ISO16363: Audit and certification of trustworthy
digital repositories
– 100+ criteria
DSA
The data can be found on the Internet
The data are accessible (clear rights
and licences)
The data are in a usable format
The data are reliable
The data are identified in a unique and
persistent way so that they can be
referred to
DSA 16 requirements
1. mission to provide access to and preserve data
2. licenses covering data access and use and monitors compliance.
3. continuity plan
4. ensures that data created/used in compliance with norms.
5. adequate funding and qualified staff through clear governance
6. mechanism(s) for expert guidance and feedback
7. guarantees the integrity and authenticity of the data
8. accepts data and metadata to ensure relevance and understandability
9. applies documented processes in archival
10. responsibility for preservation that is documented.
11. expertise to address data and metadata quality
12. Archiving according to defined workflows.
13. enables discovery and citation.
14. enables reuse with appropriate metadata.
15. infrastructure
16. infrastructure
https://www.datasealofapproval.org
Data Seal of Approval
• Complete a self-assessment in the DSA online tool. The
online tool takes you through the
16 requirements and provides you with
support.
• Once you have completed your self-assessment,
you can submit it for peer review.
• Score data on each FAIR dimension (e.g. from
1 to 5)
• Total score of FAIRness as an indicator of data
quality
• Scoring can only be partly automatic, not all
principles can be established objectively:
– scoring at ingest by data archivists of TDR
– after reuse by data users (community review)
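The arithmetic behind such a total score can be sketched briefly. The code below assumes one integer score from 1 to 5 for each of the four dimensions and a plain sum as the total; the proposal itself does not specify how the dimensions are combined, so the unweighted sum and the `total_fairness` name are assumptions.

```python
from typing import Dict

DIMENSIONS = ("F", "A", "I", "R")


def total_fairness(scores: Dict[str, int]) -> int:
    """Sum the per-dimension scores after checking each is an integer from 1 to 5."""
    for dim in DIMENSIONS:
        value = scores.get(dim)
        if not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"dimension {dim} needs a score from 1 to 5, got {value!r}")
    return sum(scores[dim] for dim in DIMENSIONS)


# Hypothetical scores assigned by a data archivist at ingest.
ingest_scores = {"F": 5, "A": 3, "I": 2, "R": 4}
ingest_total = total_fairness(ingest_scores)
```

With these example scores the total is 5 + 3 + 2 + 4 = 14 out of a possible 20. A later community review could produce a second set of scores for the same dataset, which would then need its own rule for reconciliation.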
From: https://dans.knaw.nl/nl/actueel/PresentationP.D..pdf
DANS FAIR metrics proposal
@micheldumontier::BD2K Metadata WG:16-10-2015
http://www.w3.org/TR/hcls-dataset/
http://hw-swel.github.io/Validata/
VALIDATA DEMO
@gray_alasdair www.macs.hw.ac.uk/~ajg33
RDF constraint validation tool
Configurable to any profile
Declarative reusable schema description
Shape Expression (ShEx) constraints
Open source JavaScript implementation
michel.dumontier@maastrichtuniversity.nl
Website: http://dumontierlab.com
Presentations: http://slideshare.com/micheldumontier
• Early stages of thinking about FAIR metrics and FAIR
indexes
• Lots of opportunities to explore different models
• Send me an email if you’re interested in
collaborating or participating in the working group
METRICS
