FAIR EXPLAINED
Webinar FAIR Italy – February 27 2019
Luiz Bonino
FAIR PRINCIPLES: MANY WAYS TO LOOK AT THEM
FAIR PRINCIPLES
Findable:
F1. (meta)data are assigned a globally unique and persistent
identifier;
F2. data are described with rich metadata;
F3. metadata clearly and explicitly include the identifier of the
data it describes;
F4. (meta)data are registered or indexed in a searchable
resource;
Accessible:
A1. (meta)data are retrievable by their identifier using a
standardized communications protocol;
A1.1 the protocol is open, free, and universally
implementable;
A1.2. the protocol allows for an authentication and
authorization procedure, where necessary;
A2. metadata are accessible, even when the data are no longer
available;
Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly
applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles;
I3. (meta)data include qualified references to other
(meta)data;
Reusable:
R1. (meta)data are richly described with a plurality of accurate and
relevant attributes;
R1.1. (meta)data are released with a clear and accessible data
usage license;
R1.2. (meta)data are associated with detailed provenance;
R1.3. (meta)data meet domain-relevant community
standards;
https://www.nature.com/articles/sdata201618
FAIR DATA PRINCIPLES - METADATA
Findable:
F1. metadata are assigned a globally unique and persistent
identifier;
F2. data are described with rich metadata;
F3. metadata clearly and explicitly include the identifier of the
data it describes;
F4. metadata are registered or indexed in a searchable
resource;
Accessible:
A1. metadata are retrievable by their identifier using a
standardized communications protocol;
A1.1 the protocol is open, free, and universally
implementable;
A1.2. the protocol allows for an authentication and
authorization procedure, where necessary;
A2. metadata are accessible, even when the data are no longer
available;
Interoperable:
I1. metadata use a formal, accessible, shared, and broadly
applicable language for knowledge representation.
I2. metadata use vocabularies that follow FAIR principles;
I3. metadata include qualified references to other metadata;
Reusable:
R1. metadata are richly described with a plurality of accurate and
relevant attributes;
R1.1. metadata are released with a clear and accessible data
usage license;
R1.2. metadata are associated with detailed provenance;
R1.3. metadata meet domain-relevant community standards;
https://www.nature.com/articles/sdata201618
FAIR DATA PRINCIPLES – DATA/DIGITAL RESOURCES
Findable:
F1. data are assigned a globally unique and persistent
identifier;
F2. data are described with rich metadata;
F3. metadata clearly and explicitly include the identifier of the
data it describes;
F4. metadata are registered or indexed in a searchable
resource;
Accessible:
A1. metadata are retrievable by their identifier using a
standardized communications protocol;
A1.1 the protocol is open, free, and universally
implementable;
A1.2. the protocol allows for an authentication and
authorization procedure, where necessary;
A2. metadata are accessible, even when the data are no longer
available;
Interoperable:
I1. metadata use a formal, accessible, shared, and broadly
applicable language for knowledge representation.
I2. metadata use vocabularies that follow FAIR principles;
I3. metadata include qualified references to other (meta)data;
Reusable:
R1. metadata are richly described with a plurality of accurate and
relevant attributes;
R1.1. metadata are released with a clear and accessible data
usage license;
R1.2. metadata are associated with detailed provenance;
R1.3. metadata meet domain-relevant community standards;
https://www.nature.com/articles/sdata201618
FAIR DATA PRINCIPLES – SUPPORTING ELEMENTS
Findable:
F1. (meta)data are assigned a globally unique and
persistent identifier;
F2. data are described with rich metadata;
F3. metadata clearly and explicitly include the identifier
of the data it describes;
F4. (meta)data are registered or indexed in a searchable
resource;
Accessible:
A1. (meta)data are retrievable by their identifier using a
standardized communications protocol;
A1.1. the protocol is open, free, and universally
implementable;
A1.2. the protocol allows for an authentication and
authorization procedure, where necessary;
A2. metadata are accessible, even when the data are no
longer available;
Interoperable:
I1. (meta)data use a formal, accessible, shared, and
broadly applicable language for knowledge
representation;
I2. (meta)data use vocabularies that follow FAIR
principles;
I3. (meta)data include qualified references to other
(meta)data;
Reusable:
R1. (meta)data are richly described with a plurality of
accurate and relevant attributes;
R1.1. (meta)data are released with a clear and
accessible data usage license;
R1.2. (meta)data are associated with detailed
provenance;
R1.3. (meta)data meet domain-relevant community
standards;
https://www.nature.com/articles/sdata201618
REPOSITORIES SUPPORTING USERS TO ACHIEVE FAIR
Findable:
F1. (meta)data are assigned a globally unique and persistent
identifier;
F2. data are described with rich metadata;
F3. metadata clearly and explicitly include the identifier of the
data it describes;
F4. (meta)data are registered or indexed in a searchable
resource;
Accessible:
A1. (meta)data are retrievable by their identifier using a
standardized communications protocol;
A1.1 the protocol is open, free, and universally
implementable;
A1.2. the protocol allows for an authentication and
authorization procedure, where necessary;
A2. metadata are accessible, even when the data are no longer
available;
Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly
applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles;
I3. (meta)data include qualified references to other
(meta)data;
Reusable:
R1. (meta)data are richly described with a plurality of accurate and
relevant attributes;
R1.1. (meta)data are released with a clear and accessible data
usage license;
R1.2. (meta)data are associated with detailed provenance;
R1.3. (meta)data meet domain-relevant community
standards;
FAIR PRINCIPLES DETAILED
FAIR PRINCIPLES
• What does it mean?
  • We need an identification mechanism, e.g., PID, PURL, DOI, …
  • This mechanism needs to guarantee global uniqueness of the issued identifier, i.e., every time a given identifier is called, the same resource it points to is retrieved
  • This mechanism needs to guarantee persistency of the issued identifier, i.e., what happens when the identifier scheme is changed?
• What do we need to fulfill this principle?
  • How to describe the used identification mechanism?
  • How to properly identify the identifier service? I.e., what is the commonly agreed vocabulary that can represent that a given piece of information is the identifier of a digital resource?
  • What is the uniqueness policy?
    • How to represent the policy in a computer-actionable way?
    • What is the required content of the policy, e.g., uniqueness mechanism?
  • What is the persistency policy?
    • How to represent the persistency policy in a computer-actionable way?
    • What is the required content of the policy, e.g., persistency over updates of the mechanism?
  • What is resolved by sending a request to the identifier: the actual digital resource, its metadata, etc.? I.e., what is the protocol for getting the actual digital resource from its identifier?
F1. (meta)data are assigned a globally unique and persistent identifier
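As one possible illustration of an identification mechanism, the sketch below normalises a DOI into a resolvable URL. It assumes DOIs are the chosen scheme and that https://doi.org acts as the resolver; the function name `doi_to_url` and the accepted prefixes are illustrative choices, not something the slides prescribe.

```python
import re

# Assumption: a DOI is "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common ways a DOI is written in practice (illustrative list, not exhaustive).
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(identifier: str) -> str:
    """Normalise a DOI written in a common form into a resolvable HTTPS URL."""
    doi = identifier.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]  # drop the prefix, keep the DOI's own casing
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {identifier!r}")
    return "https://doi.org/" + doi


print(doi_to_url("doi:10.1038/sdata.2016.18"))
```

Whatever form the identifier arrives in, the same resolvable URL comes out, which is the practical side of "every time a given identifier is called, the same resource is retrieved". Persistency itself is a policy of the identifier service and cannot be shown in code like this.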
FAIR PRINCIPLES
• What does it mean?
  • If we don’t have the identifier, the digital resource should be described with rich enough metadata that we can find it through the combination of the items in this metadata
• What do we need to fulfill this principle?
  • For different types of digital resources, what are the minimal metadata elements that provide this richness?
  • How to describe the metadata in a commonly agreed and computer-actionable way? By using a common way to represent metadata, tools can be made that are able to interpret metadata from any kind of digital resource.
F2. data are described with rich metadata
FAIR PRINCIPLES
• What does it mean?
  • The discovery of a digital resource should be possible from its metadata. For this to happen, the metadata must explicitly contain the identifier for the digital resource it describes.
• What do we need to fulfill this principle?
  • How to differentiate the information about the digital resource’s identifier from the information about its metadata identifier? I.e., a metadata record contains two identifiers: its own (the metadata record’s) and that of the data it describes. What is the vocabulary that contains concepts to describe a metadata identifier and the identifier of the digital resource it describes?
F3. metadata clearly and explicitly include the identifier of the data it
describes
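The two-identifier idea can be sketched as follows. The key names `metadata_id` and `describes`, as well as the example.org identifiers, are hypothetical placeholders; a real record would take these terms from a commonly agreed vocabulary, which is exactly the open question the slide raises.

```python
# Hypothetical metadata record: key names are illustrative, not a standard vocabulary.
record = {
    "metadata_id": "https://example.org/metadata/42",  # identifier of the record itself
    "describes": "https://example.org/dataset/42",     # identifier of the described data
    "title": "Example gene expression dataset",
}


def check_f3(rec: dict) -> bool:
    """True when the record names both itself and the data it describes,
    and the two identifiers are distinct."""
    own = rec.get("metadata_id")
    target = rec.get("describes")
    return bool(own) and bool(target) and own != target


print(check_f3(record))
```

Keeping the two identifiers under different keys is what lets a machine tell "this is the record" apart from "this is what the record is about".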
FAIR PRINCIPLES
• What does it mean?
  • Most people use a search engine to initiate a search for a particular digital resource. If the resource or its metadata are not indexed in a searchable resource, the capability for an individual to find it is substantially reduced.
• What do we need to fulfill this principle?
  • For the data part, a full indexing is equivalent to allowing complete and direct querying on the data, which may not be feasible every time. An intermediate step would be to select a number of relevant parts of the data to be highlighted by their metadata, which would be indexed. E.g., in a dataset describing gene information, it may be relevant to allow the indexing of the unique genes that the dataset has information about.
  • Search engines benefit from common interfaces (or at least interfaces that are described in a commonly agreed way) to allow the harvesting of the elements (metadata and/or data) to be indexed.
  • A common representation format for the metadata also improves the possibility of different searchable resources to index the metadata records.
F4. (meta)data are registered or indexed in a searchable resource
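The intermediate step described above (indexing only the unique genes a dataset covers) could look roughly like this. The dataset identifiers, gene rows and the function name `build_gene_index` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical datasets: identifier -> rows of (gene symbol, measured value).
datasets = {
    "https://example.org/dataset/1": [("BRCA1", 0.8), ("TP53", 1.2), ("BRCA1", 0.5)],
    "https://example.org/dataset/2": [("TP53", 0.3)],
}


def build_gene_index(data: dict) -> dict:
    """Map each unique gene symbol to the datasets that contain information about it."""
    index = defaultdict(set)
    for dataset_id, rows in data.items():
        for gene, _value in rows:
            index[gene].add(dataset_id)  # the set removes repeated genes per dataset
    return {gene: sorted(ids) for gene, ids in index.items()}


index = build_gene_index(datasets)
print(index["TP53"])
```

Only the gene symbols are exposed to the index, not the measured values, so a searchable resource can answer "which datasets mention TP53?" without full access to the data.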
FAIR PRINCIPLES
• What does it mean?
  • In order to access a digital resource, the requestor needs to be able to implement the used communication protocol. Therefore, this protocol should be open, free and universally implementable. Moreover, the protocol should also describe whether authentication and authorization mechanisms are required.
• What do we need to fulfill this principle?
  • How to describe the communication (accessibility) protocol?
  • What are the elements required in the description of the communication (accessibility) protocol, including the authentication and authorization procedure?
  • How to demonstrate that the protocol is open, free and universally implementable?
A1. (meta)data are retrievable by their identifier using a standardized
communications protocol;
A1.1 the protocol is open, free, and universally implementable;
A1.2. the protocol allows for an authentication and authorization
procedure, where necessary;
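A minimal sketch of what retrieval by identifier might look like, assuming HTTP(S) as the standardized protocol and a bearer token as one possible authorization scheme. The URL, the `text/turtle` media type and the function name `build_metadata_request` are assumptions for illustration; the request is only built, not sent.

```python
from typing import Optional
from urllib.request import Request


def build_metadata_request(
    identifier_url: str,
    media_type: str = "text/turtle",
    token: Optional[str] = None,
) -> Request:
    """Build (but do not send) an HTTP GET request for the resource behind an identifier.

    The Accept header asks for a machine-readable representation; the
    Authorization header is added only where access control is necessary (A1.2).
    """
    headers = {"Accept": media_type}
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"
    return Request(identifier_url, headers=headers, method="GET")


req = build_metadata_request("https://example.org/dataset/42")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing the result to `urllib.request.urlopen` would perform the actual retrieval. The point of the sketch is that everything needed (method, identifier, requested format, optional credentials) is expressed in an open protocol that any client can implement.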
FAIR PRINCIPLES
• What does it mean?
  • Cross-references to third-party FAIR data and metadata will naturally degrade over time. Therefore, it is important for FAIR providers to continue to provide descriptors of what the data was, to assist in the continued interpretation of those third-party data.
• What do we need to fulfill this principle?
  • How to guarantee long-term persistency of the metadata?
  • How to describe that the data (digital resource) referred to by the metadata are no longer accessible? Is it necessary to state why?
  • How to harmonize the persistency of the metadata with the GDPR’s “right to be forgotten”?
A2. metadata are accessible, even when the data are no longer
available
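One way to picture this is a "tombstone" record: the metadata stays retrievable and states that the data are gone. The field names (`data_available`, `unavailable_since`, `unavailable_reason`) are hypothetical; how to express this in an agreed vocabulary is one of the open questions listed above.

```python
from datetime import date


def mark_data_unavailable(record: dict, reason: str, when: date) -> dict:
    """Return a copy of a metadata record flagged as describing data that no longer exist.

    The descriptive fields are kept so the record can still be found and interpreted.
    """
    tombstone = dict(record)  # shallow copy; the original record is left untouched
    tombstone["data_available"] = False
    tombstone["unavailable_since"] = when.isoformat()
    tombstone["unavailable_reason"] = reason
    return tombstone


original = {
    "metadata_id": "https://example.org/metadata/42",
    "describes": "https://example.org/dataset/42",
    "title": "Example gene expression dataset",
}
tombstone = mark_data_unavailable(original, "withdrawn by the data owner", date(2019, 2, 27))
print(tombstone["data_available"], tombstone["unavailable_since"])
```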
FAIR PRINCIPLES
• What does it mean?
  • The digital resource is described using a formal, accessible, shared and broadly applicable language
• What do we need to fulfill this principle?
  • How to state which language is used to represent the digital object?
  • How to provide this information for the metadata? In a meta-metadata?
  • How to demonstrate the formality (BNF), accessibility (resolution of the language description document), shareability and broad applicability of the language (IANA media-type?)?
I1. (meta)data use a formal, accessible, shared, and broadly applicable
language for knowledge representation
FAIR PRINCIPLES
• What does it mean?
  • The metadata values and qualified relations should themselves be FAIR
• What do we need to fulfill this principle?
  • State which vocabularies are used
  • What is the minimal FAIRness for these vocabularies to be considered to follow FAIR principles?
I2. (meta)data use vocabularies that follow FAIR principles
FAIR PRINCIPLES
• What does it mean?
  • Relationships within digital resources, and between local and third-party data, have explicit and “useful” semantic meaning
• What do we need to fulfill this principle?
  • Qualify (provide proper semantics for) the references to other digital resources
  • As per I2, these references (and their qualifiers) should also be FAIR
I3. (meta)data include qualified references to other (meta)data
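The difference between a qualified and an unqualified reference can be sketched with plain triples. `rdfs:seeAlso` and `prov:wasDerivedFrom` are real terms from the RDF Schema and PROV-O vocabularies; treating `rdfs:seeAlso` as "too generic to count as qualified" is an assumption made for this example, as are the example.org identifiers.

```python
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
PROV_WAS_DERIVED_FROM = "http://www.w3.org/ns/prov#wasDerivedFrom"

# Assumption: predicates that only say "these two things are somehow related".
GENERIC_PREDICATES = {RDFS_SEE_ALSO}

triples = [
    # Unqualified: says nothing about how the two datasets relate.
    ("https://example.org/dataset/42", RDFS_SEE_ALSO, "https://example.org/dataset/7"),
    # Qualified: states that dataset 42 was derived from dataset 7.
    ("https://example.org/dataset/42", PROV_WAS_DERIVED_FROM, "https://example.org/dataset/7"),
]


def is_qualified(triple: tuple) -> bool:
    """True when the reference's predicate carries a specific meaning."""
    _subject, predicate, _obj = triple
    return predicate not in GENERIC_PREDICATES


print([is_qualified(t) for t in triples])
```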
FAIR PRINCIPLES
• What does it mean?
  • Digital resources should state who has which rights under which circumstances (license) and what their provenance is, and should use relevant standards adopted by the community in which the resource has been created/used
• What do we need to fulfill this principle?
  • State the usage license:
    • What representation format can be used for a computer-actionable license description?
    • What are the required concerns that should be present in this description (rights, conditions, …)?
R1. (meta)data are richly described with a plurality of accurate and
relevant attributes;
R1.1. (meta)data are released with a clear and accessible data usage
license;
R1.2. (meta)data are associated with detailed provenance;
R1.3. (meta)data meet domain-relevant community standards;
FAIR PRINCIPLES
• What do we need to fulfill this principle?
  • State the digital resource’s provenance information:
    • What is the core provenance information?
    • What is the community-specific provenance information?
    • How to represent provenance? Which vocabularies to use?
  • State the relevant community standards used by the digital resource (certification):
    • How to describe which standards are used?
    • How to describe compliance to these standards?
    • How to demonstrate that the standards are accepted by a given community?
R1. (meta)data are richly described with a plurality of accurate and
relevant attributes;
R1.1. (meta)data are released with a clear and accessible data usage
license;
R1.2. (meta)data are associated with detailed provenance;
R1.3. (meta)data meet domain-relevant community standards;
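A rough sketch of how the three R1 sub-principles might be checked on a metadata record. The field names `license`, `provenance` and `conforms_to` are hypothetical stand-ins for whatever a community profile would require; the check only tests presence, not quality.

```python
# Hypothetical field names, one per sub-principle (R1.1, R1.2, R1.3).
REQUIRED_FOR_REUSE = ("license", "provenance", "conforms_to")


def missing_reuse_fields(record: dict) -> list:
    """List the reuse-related fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FOR_REUSE if not record.get(field)]


record = {
    # A license given as a URL can be followed by a machine as well as a human.
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "provenance": {"creator": "Example Lab", "created": "2019-02-27"},
    # "conforms_to" is deliberately left out to show a failing check.
}
print(missing_reuse_fields(record))
```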
FAIRIFICATION PROCESS
CURRENT SITUATION
Retrieve non-FAIR data
Analyse and prepare datasets
Combine with other data
Query combined data
FAIRIFICATION WORKFLOW
Retrieve non-FAIR data
  Download/locate file
  Identify API call
  Identify data access protocol
FAIRIFICATION WORKFLOW
Retrieve non-FAIR data
Analyse and prepare datasets
  Standard format (XML, RDF, relational DB API, VCF, DICOM, etc.)?
  What is the content?
  Column/field names?
  Relations?
  Data domain and range?
  Understand the data
  Data munging
  …
FAIRIFICATION WORKFLOW
Retrieve non-FAIR data
Analyse and prepare datasets
Define semantic model
  What are the concepts involved?
  Relations among concepts?
  Existing vocabularies for the concepts and instances?
Interoperability, Reusability
FAIRIFICATION WORKFLOW
Retrieve non-FAIR data
Analyse and prepare datasets
Define semantic model
Make data linkable
  Apply the semantic model on the original data to make it linkable
Interoperability, Reusability
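Applying a semantic model to the original data can be pictured as turning table rows into subject-predicate-object statements. The model below (column name to predicate IRI), the base IRI and the function name `rows_to_triples` are invented for illustration; a real model would reuse existing vocabularies where they exist.

```python
# Hypothetical semantic model: source column -> predicate IRI.
SEMANTIC_MODEL = {
    "gene": "https://example.org/vocab/geneSymbol",
    "expression": "https://example.org/vocab/expressionLevel",
}
BASE_IRI = "https://example.org/record/"  # assumed namespace for row identifiers


def rows_to_triples(rows: list, model: dict = SEMANTIC_MODEL, base: str = BASE_IRI) -> list:
    """Turn tabular rows into (subject, predicate, object) triples using the model."""
    triples = []
    for number, row in enumerate(rows, start=1):
        subject = f"{base}{number}"  # every row gets its own identifier
        for column, value in row.items():
            predicate = model.get(column)
            if predicate is not None:  # columns outside the model are skipped
                triples.append((subject, predicate, value))
    return triples


rows = [{"gene": "TP53", "expression": 1.2, "note": "free text"}]
for triple in rows_to_triples(rows):
    print(triple)
```

Once each row and each column meaning has its own identifier, other datasets can point at them, which is what makes the data linkable.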
FAIRIFICATION WORKFLOW
Retrieve non-FAIR data
Analyse and prepare datasets
Define semantic model
Make data linkable
Assign license
  Who can access/reuse the data?
  Under which conditions?
Accessibility, Reusability
FAIRIFICATION WORKFLOW
Retrieve non-FAIR data
Analyse and prepare datasets
Define semantic model
Make data linkable
Assign license
Define metadata
  Authors
  Version
  Distributions
  Provenance
  …
Findability, Accessibility, Interoperability, Reusability
FAIRIFICATION WORKFLOW
Retrieve non-FAIR data
Analyse and prepare datasets
Define semantic model
Make data linkable
Assign license
Define metadata
Deploy FAIR data resource
  How to make the data and metadata available in a FAIR way?
Findability, Accessibility, Interoperability, Reusability
FAIRIFICATION WORKFLOW
Retrieve non-FAIR data
Analyse and prepare datasets
Define semantic model
Make data linkable
Assign license
Define metadata
Deploy FAIR data resource
Combine with other FAIR data
Query combined data
FAIRNESS ASSESSMENT CHALLENGES
WHY TO ASSESS?
• Because everybody is talking about FAIR and my resources should be seen as FAIR, whatever this means?
• To satisfy funders’ requirements?
• To serve as a guideline for achieving higher levels of interoperability and reuse with clarity on the concrete benefits (help improve)?
WHAT TO ASSESS?
• Metadata and data?
• Only metadata?
• Only data?
• What do you mean by data?
  • In the FAIR principles, data refers to a variety of different resources, e.g., “traditional” data, services, software, APIs, vocabularies, ontologies, articles, etc.
HOW TO ASSESS?
• Manual
  • Takes advantage of human-understandable artifacts, which are currently prevalent
  • May lead to subjective assessments and, therefore, harder to compare resources
  • Harder to scale
  • Harder to evaluate FAIR for machines, which is the main goal of the FAIR principles
• Automatic
  • Requires more rigor on the assessed resources
  • More likely to produce objective assessments
  • Easier to scale
  • Able to check if machines can, in fact, “work” with the (meta)data
HOW TO “READ” THE ASSESSMENTS?
• Need for a scoring system
  • One score for the 4 aspects of FAIR? Does not seem useful.
  • One score per aspect (F, A, I and R)?
  • One score per principle? What about the sub-principles?
  • Is there a hierarchy among the principles? Is there an order of precedence? Or different weights?
  • Is there an acceptable minimal FAIR level? Should it be across domains and applications or domain/community-dependent?
  • Do we use a pass/fail approach or introduce intermediary compliance levels in each/some evaluation?
• Need for a visual representation of the scores
  • To facilitate quick perception of the FAIRness level, a visual representation of the FAIR scores is required, e.g., stars, bars, etc.
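To make one of these options concrete, the sketch below takes pass/fail results per principle, aggregates them into one score per aspect and renders each score as a simple bar. Equal weights, pass/fail outcomes and the five-character bar are all assumptions chosen for brevity; the slides leave these questions open.

```python
from collections import defaultdict


def score_per_aspect(results: dict) -> dict:
    """Aggregate pass/fail results per principle (e.g. 'F1', 'A1.2') into one
    score per aspect: the fraction of that aspect's assessed principles that passed."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for principle, ok in results.items():
        aspect = principle[0]  # first letter identifies the aspect: F, A, I or R
        total[aspect] += 1
        passed[aspect] += int(ok)
    return {aspect: passed[aspect] / total[aspect] for aspect in "FAIR" if total[aspect]}


def bar(fraction: float, width: int = 5) -> str:
    """Render a score between 0 and 1 as a fixed-width text bar."""
    filled = round(fraction * width)
    return "#" * filled + "." * (width - filled)


# Hypothetical assessment outcome for one resource.
results = {"F1": True, "F2": True, "F3": False, "F4": True,
           "A1": True, "A2": False, "I1": False, "R1.1": True}
for aspect, score in score_per_aspect(results).items():
    print(aspect, bar(score), f"{score:.2f}")
```

Even this tiny version shows the design questions: sub-principles such as R1.1 count the same as top-level principles here, and changing that would require exactly the hierarchy or weights the slide asks about.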
GENERAL CHALLENGES
• Clarify that nobody has been asked to be 100% FAIR. Many times a lower FAIRness level is perfectly adequate.
• How to deal with the conflicting forces that, on one side, want to push the communities towards a better (and FAIRer) data landscape and, on the other side, want to preserve the status quo (existing “kingdoms”) while labeling themselves FAIR?
• Who will define the assessment criteria?
• Who will execute the assessments based on the defined criteria?
• Should we have a unique set of assessment criteria? Or a core set for general comparison and domain-specific sets on top of the core for the specific needs of a given domain/application?
OTHER CHALLENGES
• FAIR should be used as a guideline for achieving higher levels of interoperability and reuse with clarity on the concrete benefits.
• Improvements on FAIRness can be done incrementally, e.g., first deal with metadata then go for data.
• We need a combination of (FAIR) infrastructure and (FAIR) community practices.
• How to deal with other aspects not covered by the FAIR principles, e.g., openness, quality, …?
Q&A – CONTACT INFO
Luiz Bonino
International Technology Coordinator – GO FAIR
Associate Professor BioSemantics – LUMC
E-mail: luiz.bonino@go-fair.org
Skype: luizolavobonino
Web: www.go-fair.org

More Related Content

PPTX
Achieving FAIR from a repository perspective
PPTX
My repository is FAIR!!! What does it mean?
PPTX
PPTX
PPTX
PDF
An ecosystem to support FAIR data
PPTX
DTL Integrator's meeting
PPTX
Why institutions need to raise their capabilities to support FAIR
Achieving FAIR from a repository perspective
My repository is FAIR!!! What does it mean?
An ecosystem to support FAIR data
DTL Integrator's meeting
Why institutions need to raise their capabilities to support FAIR

What's hot (8)

PDF
Towards FAIR principles for research software @ FAIR Software Session, Nation...
PPTX
FAIR Ecosystem - Health RI 2017
PPTX
Making data FAIR using InterMine
PPTX
Metadata
PPTX
Linked Data and Users in Library - Does the library communicate efficiently?
PDF
Linked Data for the Masses: The approach and the Software
PPT
香港六合彩
PDF
Modified query roles based access
Towards FAIR principles for research software @ FAIR Software Session, Nation...
FAIR Ecosystem - Health RI 2017
Making data FAIR using InterMine
Metadata
Linked Data and Users in Library - Does the library communicate efficiently?
Linked Data for the Masses: The approach and the Software
香港六合彩
Modified query roles based access
Ad

Similar to FAIR Explained (20)

PPTX
PPTX
Towards cross-domain interoperation in the internet of FAIR data and services
PDF
04 findable imming
PPTX
VODAN Africa IN.pptx
PPTX
Increasing the Reputation of your Published Data on the Web
PDF
FAIR-Principles-and-ELN.pdf
PDF
Dataverse as a FAIR Data Repository (Mercè Crosas)
PPTX
Fair traits data 20180517
PPTX
Metadata modeling principles for life.pptx
PPTX
Towards metrics to assess and encourage FAIRness
PPTX
Science in the open, what does it take?
PPTX
Kr slides fair astronomy 20181019
PPTX
FAIR data
PPTX
Fair data vs 5 star open data final
PPTX
Fair data principles for AOASG
PDF
DataCite and its Members: Connecting Research and Identifying Knowledge
PDF
Origins of FAIR webinar
PPTX
CARARE: Can I use this data? FAIR into practice
PDF
FAIRness through a novel combination of Web technologies
PDF
FAIR Data Knowledge Graphs–from Theory to Practice
Towards cross-domain interoperation in the internet of FAIR data and services
04 findable imming
VODAN Africa IN.pptx
Increasing the Reputation of your Published Data on the Web
FAIR-Principles-and-ELN.pdf
Dataverse as a FAIR Data Repository (Mercè Crosas)
Fair traits data 20180517
Metadata modeling principles for life.pptx
Towards metrics to assess and encourage FAIRness
Science in the open, what does it take?
Kr slides fair astronomy 20181019
FAIR data
Fair data vs 5 star open data final
Fair data principles for AOASG
DataCite and its Members: Connecting Research and Identifying Knowledge
Origins of FAIR webinar
CARARE: Can I use this data? FAIR into practice
FAIRness through a novel combination of Web technologies
FAIR Data Knowledge Graphs–from Theory to Practice
Ad

More from Luiz Olavo Bonino da Silva Santos (7)

PPTX
Estruturas de apoio ao acesso aberto
PPTX
Ciência aberto, diretrizes FAIR, etapas de viabilização e horizontes
PPTX
Panorama global de gestão de dados de pesquisa e a iniciativa GO FAIR
PPTX
Ciência aberta e dados FAIR
PDF
Mendeley Data FAIR hackathon
PPTX
DTL Partners Event - FAIR Data Tech overview - Day 1
PPTX
Estruturas de apoio ao acesso aberto
Ciência aberto, diretrizes FAIR, etapas de viabilização e horizontes
Panorama global de gestão de dados de pesquisa e a iniciativa GO FAIR
Ciência aberta e dados FAIR
Mendeley Data FAIR hackathon
DTL Partners Event - FAIR Data Tech overview - Day 1

Recently uploaded (20)

PDF
A novel scalable deep ensemble learning framework for big data classification...
PDF
STKI Israel Market Study 2025 version august
PPTX
Group 1 Presentation -Planning and Decision Making .pptx
PDF
Enhancing emotion recognition model for a student engagement use case through...
PPTX
Final SEM Unit 1 for mit wpu at pune .pptx
PDF
Zenith AI: Advanced Artificial Intelligence
PPTX
OMC Textile Division Presentation 2021.pptx
PDF
August Patch Tuesday
PDF
ENT215_Completing-a-large-scale-migration-and-modernization-with-AWS.pdf
PDF
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
PDF
Hybrid model detection and classification of lung cancer
PDF
Hindi spoken digit analysis for native and non-native speakers
PDF
A comparative study of natural language inference in Swahili using monolingua...
PPTX
MicrosoftCybserSecurityReferenceArchitecture-April-2025.pptx
PPTX
Programs and apps: productivity, graphics, security and other tools
PDF
gpt5_lecture_notes_comprehensive_20250812015547.pdf
PDF
NewMind AI Weekly Chronicles – August ’25 Week III
PDF
Transform Your ITIL® 4 & ITSM Strategy with AI in 2025.pdf
PDF
How ambidextrous entrepreneurial leaders react to the artificial intelligence...
PDF
Univ-Connecticut-ChatGPT-Presentaion.pdf
A novel scalable deep ensemble learning framework for big data classification...
STKI Israel Market Study 2025 version august
Group 1 Presentation -Planning and Decision Making .pptx
Enhancing emotion recognition model for a student engagement use case through...
Final SEM Unit 1 for mit wpu at pune .pptx
Zenith AI: Advanced Artificial Intelligence
OMC Textile Division Presentation 2021.pptx
August Patch Tuesday
ENT215_Completing-a-large-scale-migration-and-modernization-with-AWS.pdf
Video forgery: An extensive analysis of inter-and intra-frame manipulation al...
Hybrid model detection and classification of lung cancer
Hindi spoken digit analysis for native and non-native speakers
A comparative study of natural language inference in Swahili using monolingua...
MicrosoftCybserSecurityReferenceArchitecture-April-2025.pptx
Programs and apps: productivity, graphics, security and other tools
gpt5_lecture_notes_comprehensive_20250812015547.pdf
NewMind AI Weekly Chronicles – August ’25 Week III
Transform Your ITIL® 4 & ITSM Strategy with AI in 2025.pdf
How ambidextrous entrepreneurial leaders react to the artificial intelligence...
Univ-Connecticut-ChatGPT-Presentaion.pdf

FAIR Explained

  • 1. FAIR EXPLAINED Webinar FAIR Italy – February 27 2019 Luiz Bonino
  • 2. FAIR PRINCIPLES: MANY WAYS TO LOOK AT THEM
  • 3. FAIR PRINCIPLES Findable: F1. (meta)data are assigned a globally unique and persistent identifier; F2. data are described with rich metadata; F3. metadata clearly and explicitly include the identifier of the data it describes; F4. (meta)data are registered or indexed in a searchable resource; Accessible: A1. (meta)data are retrievable by their identifier using a standardized communications protocol; A1.1 the protocol is open, free, and universally implementable; A1.2. the protocol allows for an authentication and authorization procedure, where necessary; A2. metadata are accessible, even when the data are no longer available; Interoperable: I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. I2. (meta)data use vocabularies that follow FAIR principles; I3. (meta)data include qualified references to other (meta)data; Reusable: R1. (meta)data are richly described with a plurality of accurate and relevant attributes; R1.1. (meta)data are released with a clear and accessible data usage license; R1.2. (meta)data are associated with detailed provenance; R1.3. (meta)data meet domain-relevant community standards; https://www.nature.com/articles/sdata201618
  • 4. FAIR DATA PRINCIPLES - METADATA Findable: F1. metadata are assigned a globally unique and persistent identifier; F2. data are described with rich metadata; F3. metadata clearly and explicitly include the identifier of the data it describes; F4. metadata are registered or indexed in a searchable resource; Accessible: A1. metadata are retrievable by their identifier using a standardized communications protocol; A1.1 the protocol is open, free, and universally implementable; A1.2. the protocol allows for an authentication and authorization procedure, where necessary; A2. metadata are accessible, even when the data are no longer available; Interoperable: I1. metadata use a formal, accessible, shared, and broadly applicable language for knowledge representation. I2. metadata use vocabularies that follow FAIR principles; I3. metadata include qualified references to other metadata; Reusable: R1. metadata are richly described with a plurality of accurate and relevant attributes; R1.1. metadata are released with a clear and accessible data usage license; R1.2. metadata are associated with detailed provenance; R1.3. metadata meet domain-relevant community standards; https://www.nature.com/articles/sdata201618
  • 5. FAIR DATA PRINCIPLES – DATA/DIGITAL RESOURCES Findable: F1. data are assigned a globally unique and persistent identifier; F2. data are described with rich metadata; F3. metadata clearly and explicitly include the identifier of the data it describes; F4. metadata are registered or indexed in a searchable resource; Accessible: A1. metadata are retrievable by their identifier using a standardized communications protocol; A1.1 the protocol is open, free, and universally implementable; A1.2. the protocol allows for an authentication and authorization procedure, where necessary; A2. metadata are accessible, even when the data are no longer available; Interoperable: I1. metadata use a formal, accessible, shared, and broadly applicable language for knowledge representation. I2. metadata use vocabularies that follow FAIR principles; I3. metadata include qualified references to other (meta)data; Reusable: R1. metadata are richly described with a plurality of accurate and relevant attributes; R1.1. metadata are released with a clear and accessible data usage license; R1.2. metadata are associated with detailed provenance; R1.3. metadata meet domain-relevant community standards; https://www.nature.com/articles/sdata201618
  • 6. FAIR DATA PRINCIPLES – SUPPORTING ELEMENTS Findable: F1. (meta)data are assigned a globally unique and persistent identifier; F2. data are described with rich metadata; F3. metadata clearly and explicitly include the identifier of the data it describes; F4. (meta)data are registered or indexed in a searchable resource; Accessible: A1. (meta)data are retrievable by their identifier using a standardized communications protocol; A1.1. the protocol is open, free, and universally implementable; A1.2. the protocol allows for an authentication and authorization procedure, where necessary; A2. metadata are accessible, even when the data are no longer available; Interoperable: I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation; I2. (meta)data use vocabularies that follow FAIR principles; I3. (meta)data include qualified references to other (meta)data; Reusable: R1. (meta)data are richly described with a plurality of accurate and relevant attributes; R1.1. (meta)data are released with a clear and accessible data usage license; R1.2. (meta)data are associated with detailed provenance; R1.3. (meta)data meet domain-relevant community standards; https://www.nature.com/articles/sdata201618
  • 7. REPOSITORIES SUPPORTING USERS TO ACHIEVE FAIR Findable: F1. (meta)data are assigned a globally unique and persistent identifier; F2. data are described with rich metadata; F3. metadata clearly and explicitly include the identifier of the data it describes; F4. (meta)data are registered or indexed in a searchable resource; Accessible: A1. (meta)data are retrievable by their identifier using a standardized communications protocol; A1.1 the protocol is open, free, and universally implementable; A1.2. the protocol allows for an authentication and authorization procedure, where necessary; A2. metadata are accessible, even when the data are no longer available; Interoperable: I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. I2. (meta)data use vocabularies that follow FAIR principles; I3. (meta)data include qualified references to other (meta)data; Reusable: R1. (meta)data are richly described with a plurality of accurate and relevant attributes; R1.1. (meta)data are released with a clear and accessible data usage license; R1.2. (meta)data are associated with detailed provenance; R1.3. (meta)data meet domain-relevant community standards;
  • 9. FAIR PRINCIPLES: F1. (meta)data are assigned a globally unique and persistent identifier
      • What does it mean?
          • We need an identification mechanism, e.g., PID, PURL, DOI, …
          • This mechanism needs to guarantee global uniqueness of the issued identifier, i.e., every time a given identifier is resolved, the same resource it points to is retrieved
          • This mechanism needs to guarantee persistency of the issued identifier, i.e., what happens when the identifier scheme is changed?
      • What do we need to fulfill this principle?
          • How to describe the used identification mechanism?
          • How to properly identify the identifier service? I.e., what is the commonly agreed vocabulary that can represent that a given piece of information is the identifier of a digital resource?
          • What is the uniqueness policy?
              • How to represent the policy in a computer-actionable way?
              • What is the required content of the policy, e.g., uniqueness mechanism?
          • What is the persistency policy?
              • How to represent the persistency policy in a computer-actionable way?
              • What is the required content of the policy, e.g., persistency over updates of the mechanism?
          • What is resolved by sending a request to the identifier: the actual digital resource, its metadata, etc.? I.e., what is the protocol for getting the actual digital resource from its identifier?
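One of the F1 questions is how to get from an identifier to the resource it identifies. As a minimal sketch, assuming DOIs are the chosen mechanism, the snippet below turns a DOI into a URL on the public doi.org resolver. The function name `doi_to_url` and the deliberately loose syntax check are illustrative assumptions, not part of any standard; the example DOI is that of the FAIR principles article cited in these slides.

```python
import re
from urllib.parse import quote

# Loose, illustrative pattern (an assumption, not the official DOI grammar):
# "10.", a numeric registrant code, "/", and a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Build the resolver URL for a DOI, rejecting strings that do not look like one."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the "/" between prefix and suffix.
    return "https://doi.org/" + quote(doi, safe="/")


print(doi_to_url("10.1038/sdata.2016.18"))
```

Resolving the URL is left out on purpose: what comes back (the resource, a landing page, or metadata) is exactly the policy question the slide raises.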
  • 10. FAIR PRINCIPLES: F2. data are described with rich metadata
      • What does it mean?
          • If we don’t have the identifier, the digital resource should be described with rich enough metadata that we can find it through the combination of the items in this metadata
      • What do we need to fulfill this principle?
          • For different types of digital resources, what are the minimal metadata elements that provide this richness?
          • How to describe the metadata in a commonly agreed and computer-actionable way? By using a common way to represent metadata, tools can be made that are able to interpret metadata from any kind of digital resource.
  • 11. FAIR PRINCIPLES: F3. metadata clearly and explicitly include the identifier of the data it describes
      • What does it mean?
          • The discovery of a digital resource should be possible from its metadata. For this to happen, the metadata must explicitly contain the identifier of the digital resource it describes.
      • What do we need to fulfill this principle?
          • How to differentiate the information about the digital resource’s identifier from the information about its metadata identifier? I.e., a metadata record contains two identifiers: its own (the metadata record) and that of the data it describes. What is the vocabulary that contains concepts to describe a metadata identifier and the identifier of the digital resource it describes?
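One possible answer to the F3 vocabulary question, offered here as an assumption rather than as the presenter's recommendation, is W3C DCAT: a `dcat:CatalogRecord` carries its own identifier and points to the described dataset through `foaf:primaryTopic`. The sketch below writes such a record as a JSON-LD-style Python dictionary. All `example.org` identifiers and the helper `identifiers` are hypothetical.

```python
import json

# A metadata record with two distinct identifiers (all example.org URLs are made up).
record = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
        "foaf": "http://xmlns.com/foaf/0.1/",
    },
    "@id": "https://example.org/metadata/record-1",       # identifier of the metadata record
    "@type": "dcat:CatalogRecord",
    "foaf:primaryTopic": {                                 # identifier of the data it describes
        "@id": "https://example.org/dataset/genes-1"
    },
    "dct:title": "Metadata record for an example gene dataset",
}


def identifiers(rec: dict) -> tuple:
    """Return (metadata identifier, data identifier) from a record shaped as above."""
    return rec["@id"], rec["foaf:primaryTopic"]["@id"]


metadata_id, data_id = identifiers(record)
print(json.dumps({"metadata": metadata_id, "data": data_id}, indent=2))
```

Keeping the two identifiers under different, well-defined properties is what lets a machine tell "this record" apart from "the thing this record is about".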
  • 12. FAIR PRINCIPLES: F4. (meta)data are registered or indexed in a searchable resource
      • What does it mean?
          • Most people use a search engine to initiate a search for a particular digital resource. If the resource or its metadata are not indexed in a searchable resource, the capability for an individual to find it is substantially reduced.
      • What do we need to fulfill this principle?
          • For the data part, a full indexing is equivalent to allowing complete and direct querying on the data, which may not be feasible every time. An intermediate step would be to select a number of relevant parts of the data to be highlighted by their metadata, which would be indexed. E.g., in a dataset describing gene information, it may be relevant to allow the indexing of the unique genes that the dataset has information about.
          • Search engines benefit from common interfaces (or at least interfaces that are described in a commonly agreed way) to allow the harvesting of the elements (metadata and/or data) to be indexed.
          • A common representation format for the metadata also improves the possibility of different searchable resources to index the metadata records.
  • 13. FAIR PRINCIPLES: A1. (meta)data are retrievable by their identifier using a standardized communications protocol; A1.1 the protocol is open, free, and universally implementable; A1.2. the protocol allows for an authentication and authorization procedure, where necessary
      • What does it mean?
          • In order to access a digital resource, the requestor needs to be able to implement the used communication protocol. Therefore, this protocol should be open, free and universally implementable. Moreover, the protocol should also describe whether authentication and authorization mechanisms are required.
      • What do we need to fulfill this principle?
          • How to describe the communication (accessibility) protocol?
          • What are the elements required in the description of the communication (accessibility) protocol, including the authentication and authorization procedure?
          • How to demonstrate that the protocol is open, free and universally implementable?
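HTTP is a common example of a protocol that is open, free and universally implementable, and that supports authentication where necessary. As a minimal sketch under that assumption, the snippet below builds (but does not send) a request for the metadata behind an identifier, asking for a machine-readable format via the `Accept` header and optionally attaching a bearer token. The function name, the `example.org` URL and the choice of `text/turtle` are illustrative assumptions.

```python
from typing import Optional
from urllib.request import Request


def build_metadata_request(identifier_url: str,
                           media_type: str = "text/turtle",
                           token: Optional[str] = None) -> Request:
    """Prepare an HTTP GET for the (meta)data behind an identifier.

    The Accept header asks for a specific representation (content negotiation);
    the Authorization header is only added when a token is supplied (A1.2).
    """
    headers = {"Accept": media_type}
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"
    return Request(identifier_url, headers=headers, method="GET")


# Hypothetical identifier; nothing is sent over the network here.
req = build_metadata_request("https://example.org/dataset/genes-1")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the retrieval; whether the server honours the requested media type depends on the repository.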
  • 14. FAIR PRINCIPLES: A2. metadata are accessible, even when the data are no longer available
      • What does it mean?
          • Cross-references to third parties’ FAIR data and metadata will naturally degrade over time. Therefore, it is important for FAIR providers to continue to provide descriptors of what the data was, to assist in the continued interpretation of those third-party data.
      • What do we need to fulfill this principle?
          • How to guarantee long-term persistency of the metadata?
          • How to describe that the data (digital resource) referred to by the metadata are no longer accessible? Is it necessary to state why?
          • How to harmonize the persistency of the metadata with the GDPR’s “right to be forgotten”?
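One way to picture A2 is a metadata record that outlives its data and says so explicitly. The sketch below is a hypothetical data structure, not a standard: the class `MetadataRecord`, its field names and the helper `withdraw_data` are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class MetadataRecord:
    metadata_id: str                           # identifier of the metadata record itself
    data_id: str                               # identifier of the described data
    title: str
    data_available: bool = True
    unavailable_reason: Optional[str] = None   # optional answer to "is it necessary to state why?"


def withdraw_data(record: MetadataRecord, reason: str) -> MetadataRecord:
    """Return a copy marking the data as gone; the descriptive metadata is kept."""
    return replace(record, data_available=False, unavailable_reason=reason)


record = MetadataRecord(
    metadata_id="https://example.org/metadata/record-1",   # hypothetical identifiers
    data_id="https://example.org/dataset/genes-1",
    title="Example gene dataset",
)
tombstone = withdraw_data(record, "withdrawn by the data provider")
print(tombstone)
```

The identifiers and title survive the withdrawal, so references from third parties can still be interpreted even though the data can no longer be retrieved.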
  • 16. FAIR PRINCIPLES: I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
      • What does it mean?
          • The digital resource is described using a formal, accessible, shared and broadly applicable language
      • What do we need to fulfill this principle?
          • How to state the language used to represent the digital object?
          • How to provide this information for the metadata? In meta-metadata?
          • How to demonstrate the formality (BNF), accessibility (resolution of the language description document), shareability and broad applicability of the language (IANA media-type?)?
  • 18. FAIR PRINCIPLES: I2. (meta)data use vocabularies that follow FAIR principles
      • What does it mean?
          • The metadata values and qualified relations should themselves be FAIR
      • What do we need to fulfill this principle?
          • State which vocabularies are used
          • What is the minimal FAIRness for these vocabularies to be considered to follow FAIR principles?
  • 19. FAIR PRINCIPLES: I3. (meta)data include qualified references to other (meta)data
      • What does it mean?
          • Relationships within digital resources, and between local and third-party data, have explicit and “useful” semantic meaning
      • What do we need to fulfill this principle?
          • Qualify (provide proper semantics for) the references to other digital resources
          • As per I2, these references (and their qualifiers) should also be FAIR
  • 20. FAIR PRINCIPLES: R1. (meta)data are richly described with a plurality of accurate and relevant attributes; R1.1. (meta)data are released with a clear and accessible data usage license; R1.2. (meta)data are associated with detailed provenance; R1.3. (meta)data meet domain-relevant community standards
      • What does it mean?
          • Digital resources should state who has which rights under which circumstances (license) and what their provenance is, and should use relevant standards adopted by the community in which the resource has been created/used
      • What do we need to fulfill this principle?
          • State the usage license:
              • What representation format can be used for a computer-actionable license description?
              • What are the required concerns that should be present in this description (rights, conditions, …)?
  • 21. FAIR PRINCIPLES: R1. (meta)data are richly described with a plurality of accurate and relevant attributes; R1.1. (meta)data are released with a clear and accessible data usage license; R1.2. (meta)data are associated with detailed provenance; R1.3. (meta)data meet domain-relevant community standards
      • What do we need to fulfill this principle?
          • State the digital resource’s provenance information:
              • What is the core provenance information?
              • What is the community-specific provenance information?
              • How to represent provenance? Which vocabularies to use?
          • State the relevant community standards used by the digital resource (certification):
              • How to describe which standards are used?
              • How to describe compliance with these standards?
              • How to demonstrate that the standards are accepted by a given community?
  • 26. FAIRIFICATION WORKFLOW
      • Steps shown: Retrieve non-FAIR data; Analyse and prepare datasets
      • Analyse and prepare datasets: Standard format (XML, RDF, relational DB API, VCF, DICOM, etc.)? What is the content? Column/field names? Relations? Data domain and range? Understand the data. Data munging, …
  • 27. FAIRIFICATION WORKFLOW
      • Steps shown: Retrieve non-FAIR data; Analyse and prepare datasets; Define semantic model
      • Define semantic model: What are the concepts involved? Relations among concepts? Existing vocabularies for the concepts and instances?
      • FAIR aspects: Interoperability, Reusability
  • 28. FAIRIFICATION WORKFLOW
      • Steps shown: Retrieve non-FAIR data; Analyse and prepare datasets; Define semantic model; Make data linkable
      • Make data linkable: Apply the semantic model to the original data to make it linkable
      • FAIR aspects: Interoperability, Reusability
  • 29. FAIRIFICATION WORKFLOW
      • Steps shown: Retrieve non-FAIR data; Analyse and prepare datasets; Define semantic model; Make data linkable; Assign license
      • Assign license: Who can access/reuse the data? Under which conditions?
      • FAIR aspects: Accessibility, Reusability
  • 31. FAIRIFICATION WORKFLOW
      • Steps shown: Retrieve non-FAIR data; Analyse and prepare datasets; Define semantic model; Make data linkable; Assign license; Define metadata; Deploy FAIR data resource
      • Deploy FAIR data resource: How to make the data and metadata available in a FAIR way?
      • FAIR aspects: Findability, Accessibility, Interoperability, Reusability
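The "Define semantic model" and "Make data linkable" steps can be illustrated with a small sketch: a mapping from column names to predicates (the semantic model) is applied to tabular data to produce RDF statements in N-Triples syntax. Everything specific here is an assumption for illustration: the sample CSV, the `example.org` namespaces, the `chromosome` predicate and the function name `rows_to_ntriples`. Only `rdfs:label` is a real vocabulary term.

```python
import csv
import io

# Hypothetical semantic model: which predicate each column maps to.
BASE = "https://example.org/gene/"                     # made-up instance namespace
MODEL = {
    "symbol": "http://www.w3.org/2000/01/rdf-schema#label",
    "chromosome": "https://example.org/vocab/chromosome",   # made-up predicate
}


def escape(value: str) -> str:
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_ntriples(csv_text: str) -> list:
    """Apply MODEL to every CSV row, giving each row a URI built from its gene_id."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row['gene_id']}>"
        for column, predicate in MODEL.items():
            literal = escape(row[column])
            triples.append(f'{subject} <{predicate}> "{literal}" .')
    return triples


sample = "gene_id,symbol,chromosome\n672,BRCA1,17\n"
for line in rows_to_ntriples(sample):
    print(line)
```

Once every row has a URI and every column a shared predicate, other datasets can point at the same identifiers, which is what makes the data linkable.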
  • 34. WHY TO ASSESS?
      • Because everybody is talking about FAIR and my resources should be seen as FAIR, whatever this means?
      • To satisfy funders’ requirements?
      • To serve as a guideline for achieving higher levels of interoperability and reuse with clarity on the concrete benefits (help improve)?
  • 35. WHAT TO ASSESS?
      • Metadata and data?
      • Only metadata?
      • Only data?
      • What do you mean by data?
          • In the FAIR principles, data refers to a variety of different resources, e.g., “traditional” data, services, software, APIs, vocabularies, ontologies, articles, etc.
  • 36. HOW TO ASSESS?
      • Manual
          • Takes advantage of human understandable artifacts, which are currently prevalent
          • May lead to subjective assessments and, therefore, harder to compare resources
          • Harder to scale
          • Harder to evaluate FAIR for machines, which is the main goal of the FAIR principles
      • Automatic
          • Requires more rigor on the assessed resources
          • More likely to produce objective assessments
          • Easier to scale
          • Able to check if machines can, in fact, “work” with the (meta)data
  • 37. HOW TO “READ” THE ASSESSMENTS?
      • Need for a scoring system
          • One score for the 4 aspects of FAIR? Does not seem useful.
          • One score per aspect (F, A, I and R)?
          • One score per principle? What about the sub-principles?
          • Is there a hierarchy among the principles? Is there an order of precedence? Or different weights?
          • Is there an acceptable minimal FAIR level? Should it be across domains and applications or domain/community-dependent?
          • Do we use a pass/fail approach or introduce intermediary compliance levels in each/some evaluation?
      • Need for a visual representation of the scores
          • To facilitate quick perception of the FAIRness level, a visual representation of the FAIR scores is required, e.g., stars, bars, etc.
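To make one of these options concrete, the sketch below assumes a pass/fail result per principle, equal weights, and one score per aspect (F, A, I, R), rendered as stars. These are exactly the choices the slide leaves open, so the scheme, the function names and the sample results are illustrative assumptions rather than an agreed FAIR metric.

```python
from collections import defaultdict


def score_per_aspect(results: dict) -> dict:
    """Fraction of passed principles per aspect, with equal weights.

    `results` maps a principle label such as "F1" or "A1.1" to True/False;
    the first letter of the label is taken as the aspect.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for principle, ok in results.items():
        aspect = principle[0]
        total[aspect] += 1
        passed[aspect] += int(ok)
    return {a: passed[a] / total[a] for a in "FAIR" if a in total}


def stars(score: float, max_stars: int = 5) -> str:
    """Render a 0..1 score as filled and empty stars (rounded to whole stars)."""
    filled = round(score * max_stars)
    return "★" * filled + "☆" * (max_stars - filled)


# Hypothetical assessment outcome for one resource.
results = {"F1": True, "F2": True, "F3": False, "F4": True,
           "A1": True, "A2": True,
           "I1": True, "I2": False, "I3": False,
           "R1": True}
for aspect, score in score_per_aspect(results).items():
    print(aspect, f"{score:.2f}", stars(score))
```

Swapping the equal weights for per-principle weights, or the booleans for graded compliance levels, would change only `score_per_aspect`; the presentation layer stays the same.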
  • 38. GENERAL CHALLENGES
      • Clarify that nobody has been asked to be 100% FAIR. Many times a lower FAIRness level is perfectly adequate.
      • How to deal with the conflicting forces that, on one side, want to push the communities towards a better (and FAIRer) data landscape and, on the other side, want to preserve the status quo (existing “kingdoms”) while labeling themselves FAIR?
      • Who will define the assessment criteria?
      • Who will execute the assessments based on the defined criteria?
      • Should we have a unique set of assessment criteria? Or a core set for general comparison and domain-specific sets on top of the core for the specific needs of a given domain/application?
  • 39. OTHER CHALLENGES
      • FAIR should be used as a guideline for achieving higher levels of interoperability and reuse with clarity on the concrete benefits.
      • Improvements on FAIRness can be done incrementally, e.g., first deal with metadata then go for data.
      • We need a combination of (FAIR) infrastructure and (FAIR) community practices.
      • How to deal with other aspects not covered by the FAIR principles, e.g., openness, quality, …?
  • 40. Q&A – CONTACT INFO Luiz Bonino International Technology Coordinator – GO FAIR Associate Professor BioSemantics – LUMC E-mail: luiz.bonino@go-fair.org Skype: luizolavobonino Web: www.go-fair.org