Semantic challenges in sharing dataset metadata and creating federated dataset catalogs
The example of the CIARD RING
Valeria Pesce (Global Forum on Agricultural Research and Innovation)
Linked Open Data in Agriculture
MACS-G20 Workshop in Berlin, September 27th–28th, 2017
Semantics involved in describing datasets
[Diagram: the layers of semantics involved in describing a dataset]
Dataset: Name, Owner, Type of data, Topic(s), Data standards used, Data structure, Place of collection, Date of collection, Distribution(s), […]
Dataset structure: Dimensions, Attributes, Measures, Value lists
Distribution: Protocol, URL, Format, Size
• “Description vocabularies” (metadata vocabularies or ontologies):
- for describing datasets, e.g. DCAT or DataCube
- for describing data structures, e.g. DataCube or STAT-DCAT
- for describing distributions, e.g. DCAT or VOID
- schemas for describing geospatial entities, e.g. GML
• “Value vocabularies” / Knowledge Organization Systems (KOS):
- Authority data: data of type Organization, e.g. VIAF
- KOS / thesaurus: concepts suitable for organizing by Topic, e.g. AGROVOC
- KOS / classification: concepts describing Types of data
No universally agreed model or vocabulary!
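To make the layers above concrete, here is a minimal plain-Python sketch that emits N-Triples for one dataset and one distribution using DCAT and DCMI Terms property URIs. It is only an illustration under stated assumptions: the example.org URIs, the title and the file URL are hypothetical, and the IANA media type URI pattern is an assumed convention; the AGROVOC and Geopolitical Ontology URIs are the ones used in the queries later in this deck. Nothing here validates against DCAT-AP.

```python
# Namespaces of the description vocabularies used below
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Serialize an IRI for N-Triples."""
    return f"<{value}>"


def literal(value):
    """Serialize a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe_dataset(dataset, title, theme, spatial, distribution, download_url, media_type):
    """Return N-Triples lines describing a dataset and one of its distributions."""
    triples = [
        (iri(dataset), iri(RDF_TYPE), iri(DCAT + "Dataset")),
        (iri(dataset), iri(DCT + "title"), literal(title)),
        # Values taken from value vocabularies are given as URIs, not strings
        (iri(dataset), iri(DCAT + "theme"), iri(theme)),
        (iri(dataset), iri(DCT + "spatial"), iri(spatial)),
        (iri(dataset), iri(DCAT + "distribution"), iri(distribution)),
        (iri(distribution), iri(RDF_TYPE), iri(DCAT + "Distribution")),
        (iri(distribution), iri(DCAT + "downloadURL"), iri(download_url)),
        (iri(distribution), iri(DCAT + "mediaType"), iri(media_type)),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = describe_dataset(
    dataset="http://example.org/dataset/1",             # hypothetical
    title="Livestock census, Uganda",                    # hypothetical
    theme="http://aims.fao.org/aos/agrovoc/c_4397",      # AGROVOC "Livestock"
    spatial="http://aims.fao.org/aos/geopolitical.owl#Uganda",
    distribution="http://example.org/dataset/1/csv",     # hypothetical
    download_url="http://example.org/files/census.csv",  # hypothetical
    media_type="http://www.iana.org/assignments/media-types/text/csv",  # assumed pattern
)
print("\n".join(lines))
```

The dataset and distribution are separate resources linked by dcat:distribution, which mirrors the split between the dataset layer and the distribution layer in the diagram.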
Semantics involved in describing datasets
[Same diagram, highlighting the dataset resource: metadata vocabulary or ontology for describing datasets, e.g. DCAT or DataCube]
Semantics involved in describing datasets
[Same diagram, highlighting the dataset structure: metadata vocabulary or ontology for describing data structures, e.g. DataCube or STAT-DCAT]
Semantics involved in describing datasets
[Same diagram, highlighting dataset serialization]
Semantics needed to describe datasets
[Same diagram, highlighting reference value vocabularies]
Semantics of the values
• Standardization of the values, e.g. for “thematic
coverage” or “dimensions” of datasets, “format”
or “protocol used” of distributions etc.
• The value should be standardized, possibly a URI
• The value should be part of an authority list /
code list
RDF dataset vocabularies normally treat these values as resources
identifiable by URIs, BUT…
a) Often strings are used
b) Often a local concept URI is used
c) THERE AREN’T AGREED KOSs FOR EVERYTHING!
[Diagram: value vocabularies by kind of metadata]
Authority data (publisher metadata), examples:
- VIAF registry
- Library of Congress
- ORCID
KOS / thesaurus (thematic metadata), examples:
- AGROVOC
- CABI thesaurus
Code list (dimensions), examples:
- ICASA variables
- CF conventions RDF
Authority data (geographic metadata), examples:
- GeoNames
- FAO Geopolitical Ontology
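Cases a) and b) above can be told apart mechanically when harvesting. The following is a minimal sketch, assuming a flat record of field/value pairs and a hand-maintained list of namespaces of shared KOSs; the namespace prefixes for AGROVOC and the Geopolitical Ontology are taken from the URIs used later in this deck, while the local format URI is hypothetical. It is a heuristic illustration, not how any particular catalog classifies values.

```python
from urllib.parse import urlparse

# Namespaces of some shared KOSs (matched by prefix only)
KNOWN_KOS = {
    "http://aims.fao.org/aos/agrovoc/": "AGROVOC",
    "http://aims.fao.org/aos/geopolitical.owl#": "FAO Geopolitical Ontology",
}


def is_uri(value):
    """Heuristic: an http(s) scheme plus a host counts as a URI."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def classify(value):
    """Return 'string', 'local URI', or the name of the shared KOS the URI belongs to."""
    if not is_uri(value):
        return "string"  # case a): a plain string
    for namespace, name in KNOWN_KOS.items():
        if value.startswith(namespace):
            return name
    return "local URI"  # case b): a URI, but not from a shared KOS


record = {
    "dcat:theme": "http://aims.fao.org/aos/agrovoc/c_4397",
    "dct:spatial": "Uganda",
    "dct:format": "http://example.org/catalog/format/csv",  # hypothetical local URI
}
report = {field: classify(value) for field, value in record.items()}
print(report)
```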
Examples of relevant value vocabularies
• Domain
• Agricultural concepts, topics: AGROVOC > GACS (or agreed subsets)
• Crop names: AGROVOC, Crop Ontology
• Soil types: USDA Soil Taxonomy, INSPIRE Registry
• Dimensions / variables: ICASA variables ( RDF?), CF conventions RDF
• Cross-domain
• Authority lists of organizations, projects: VIAF, CERIF, ORCID?
• Geospatial / geopolitical data: GeoNames, FAO Geopolitical Ontology
• Data formats / data standards? AgriSemantics Map of Standards
• File formats: IANA types ( RDF?), W3C formats
• Agreed list of types of data?
• Units of measure?
• Authority list of licenses (OpenDefinition list?)
Not available for everything we would need!
The CIARD RING
The CIARD RING is a federated and curated catalog of agri-food datasets
and data services
http://ring.ciard.net
• a primary catalog (providers can catalog individual data services and
datasets directly in the RING) exposing all metadata as RDF
• a federated catalog (it harvests dataset metadata from other catalogs)
Federated catalogs so far: [diagram, including Dataverse; labels: catalog, datasets, services]
Semantics in the RING dataset hub
• Dataset description vocabularies
The RING uses a combination of the DCAT-AP model, the VOID
vocabulary and the DataCube vocabulary; a “RING DCAT profile” will be published
• Value vocabularies
• Domains: local RING Domains SKOS, based on FAO and USDA top-level
classifications of domains
• Types of data: local RING “Types of data” SKOS, aligned with GODAN Ag Sector
Package types of data
• Topics: AGROVOC
• Countries: FAO Geopolitical Ontology
• Data formats / data standards: AgriSemantics Map of data standards
• File formats: “mapped” to IANA types and W3C formats when applicable
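Mapping free-text file formats to IANA types “when applicable” can be sketched as a normalize-then-lookup step. This is a minimal illustration, assuming a small hand-made synonym table (hypothetical, not the RING's) and the commonly used base URI http://www.iana.org/assignments/media-types/ for media type URIs; the media type strings themselves are registered IANA types.

```python
# Assumed base URI for IANA media types (a pattern used in many DCAT-AP catalogs)
IANA_BASE = "http://www.iana.org/assignments/media-types/"

# Hypothetical synonym table: normalized harvested strings -> IANA media type
FORMAT_SYNONYMS = {
    "csv": "text/csv",
    "text/csv": "text/csv",
    "json": "application/json",
    "application/json": "application/json",
    "pdf": "application/pdf",
    "application/pdf": "application/pdf",
    "xls": "application/vnd.ms-excel",
    "xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "zip": "application/zip",
}


def map_format(raw):
    """Map a free-text format string to an IANA media type URI, or None if unknown."""
    key = raw.strip().lower().lstrip(".")
    media_type = FORMAT_SYNONYMS.get(key)
    return IANA_BASE + media_type if media_type else None


for raw in ["CSV", ".xlsx", "application/JSON", "Stata"]:
    print(raw, "->", map_format(raw))
```

Returning None rather than guessing leaves unmatched values (such as "Stata" here) to be kept as local concepts.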
Examples of semantics in federated datasets - 1
• IFPRI dataset in Datahub (DCAT RDF)
- dcat:keyword: strings
- dct:format: string
- dcat:mediaType: string (IANA syntax)
Examples of semantics in federated datasets - 2
• IFPRI dataset in Dataverse (OAI-PMH XML response)
- Description metadata: no published vocabulary
- Geographic scope: string
- Local keywords
Examples of semantics in federated datasets – 3a
• Eurostat dataset in the EU Data Portal (DCAT RDF) (1)
- dct:subject: URI of a EUROVOC thesaurus concept
- concept URI from an EU KOS
- additional property for the dataset type
- concept URI from the EU KOS of licenses
RING: enriching and linking semantics
• Match or partial match with a synonym in the local KOS >> becomes a RING resource of type skos:Concept and dc:FileFormat with a local URI
• Becomes dct:conformsTo as in DCAT, with a resource of type dc:Standard linked to the URI of the same standard in the AgriSemantics Map of Data Standards
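The “match or partial match with a synonym in the local KOS” step can be sketched as two passes over preferred labels and synonyms. This is a minimal sketch under stated assumptions: the local KOS, its URIs and its synonyms are hypothetical, and “partial” is taken to mean that one normalized string contains the other; the RING's actual matching rules are not described in this deck.

```python
# Hypothetical local KOS: local concept URI -> (preferred label, synonyms)
LOCAL_KOS = {
    "http://example.org/ring/format/csv": ("CSV", ["comma-separated values", "text/csv"]),
    "http://example.org/ring/format/excel": ("Excel", ["xls", "xlsx", "ms excel"]),
}


def normalize(text):
    """Lower-case and collapse whitespace so labels compare consistently."""
    return " ".join(text.lower().split())


def labels(entry):
    """All normalized labels (preferred label plus synonyms) of a concept."""
    pref, synonyms = entry
    return [normalize(label) for label in [pref, *synonyms]]


def match_concept(raw):
    """Return (concept URI, "exact" or "partial"), or (None, None) if nothing matches."""
    needle = normalize(raw)
    # Pass 1: exact match on the preferred label or any synonym
    for uri, entry in LOCAL_KOS.items():
        if needle in labels(entry):
            return uri, "exact"
    # Pass 2: partial match, one string contained in the other
    for uri, entry in LOCAL_KOS.items():
        if any(label in needle or needle in label for label in labels(entry)):
            return uri, "partial"
    return None, None


for raw in ["text/CSV", "MS Excel 2010", "GeoTIFF"]:
    print(raw, "->", match_concept(raw))
```

Substring matching over-matches easily with short labels such as "xls", which is one reason a curated catalog would review partial matches before accepting them.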
RING: linking semantics
• “Uganda” matches a local concept of type skos:Concept and dc:Location, mapped to the URI of the Uganda country in the FAO Geopolitical Ontology
• “Women”: narrower of the local concept “Socio-economic data”, mapped as closeMatch to the URI of the “socioeconomic development” concept in AGROVOC
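One way to use such links is to collect, for a local concept, its external mappings plus those inherited from broader concepts. The sketch below is an assumption-laden illustration: the local concept identifiers are hypothetical, the AGROVOC URI is a deliberate placeholder (not the real URI of “socioeconomic development”), it reads the slide as “Socio-economic data” carrying the closeMatch, and inheriting a broader concept's closeMatch is a design choice made here, not documented RING behaviour.

```python
UGANDA = "http://aims.fao.org/aos/geopolitical.owl#Uganda"
# Placeholder, NOT the real AGROVOC URI of "socioeconomic development"
AGROVOC_SOCIOECONOMIC = "http://example.org/agrovoc-placeholder/socioeconomic-development"

# Hypothetical local concepts: label, skos:broader, and (relation, external URI) mappings
LOCAL_CONCEPTS = {
    "ring:uganda": {"label": "Uganda", "broader": None,
                    "mappings": [("owl:sameAs", UGANDA)]},
    "ring:socioeconomic": {"label": "Socio-economic data", "broader": None,
                           "mappings": [("skos:closeMatch", AGROVOC_SOCIOECONOMIC)]},
    "ring:women": {"label": "Women", "broader": "ring:socioeconomic", "mappings": []},
}


def external_links(concept, walk_broader=True):
    """Collect (relation, external URI, concept it was found on), optionally up skos:broader."""
    links = []
    seen = set()  # guard against cycles in a badly formed hierarchy
    while concept is not None and concept not in seen:
        seen.add(concept)
        entry = LOCAL_CONCEPTS[concept]
        links.extend((rel, uri, concept) for rel, uri in entry["mappings"])
        concept = entry["broader"] if walk_broader else None
    return links


print(external_links("ring:women"))
```

Each link records which concept it came from, so an application can tell an inherited closeMatch (looser, since skos:closeMatch is not transitive) from a direct mapping.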
Queries can leverage LOD mappings - 1
Example: to get all datasets with geographic coverage of “Uganda” using the
Geopolitical Ontology URI for “Uganda”
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
DESCRIBE ?dataset ?distro WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dcat:distribution ?distro .
  ?dataset dc:spatial ?place .
  ?place owl:sameAs <http://aims.fao.org/aos/geopolitical.owl#Uganda> .
}
URI of “Uganda” in the
Geopolitical Ontology
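The join in the WHERE clause above can be mirrored over an in-memory set of triples, which also shows why plain strings defeat this kind of query. A minimal sketch, assuming a tiny hypothetical harvested graph with prefixed names standing in for full URIs (same prefixes as the query); it returns (dataset, distribution) pairs rather than the full resource descriptions a SPARQL DESCRIBE would return.

```python
UGANDA = "http://aims.fao.org/aos/geopolitical.owl#Uganda"

# Hypothetical harvested graph; prefixed names stand in for full URIs
TRIPLES = {
    ("ex:ds1", "rdf:type", "dcat:Dataset"),
    ("ex:ds1", "dcat:distribution", "ex:ds1-csv"),
    ("ex:ds1", "dc:spatial", "ring:uganda"),
    ("ring:uganda", "owl:sameAs", UGANDA),
    ("ex:ds2", "rdf:type", "dcat:Dataset"),
    ("ex:ds2", "dcat:distribution", "ex:ds2-xml"),
    ("ex:ds2", "dc:spatial", "Uganda"),  # plain string: no mapping to follow
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in TRIPLES if s == subject and p == predicate}


def datasets_by_mapped_value(prop, target):
    """Datasets whose value for prop is owl:sameAs target, paired with their distributions."""
    datasets = sorted(s for (s, p, o) in TRIPLES if p == "rdf:type" and o == "dcat:Dataset")
    results = []
    for ds in datasets:
        if any(target in objects(value, "owl:sameAs") for value in objects(ds, prop)):
            results.extend((ds, distro) for distro in sorted(objects(ds, "dcat:distribution")))
    return results


print(datasets_by_mapped_value("dc:spatial", UGANDA))
```

Only ex:ds1 is found: ex:ds2 also covers Uganda, but its string value has no mapping to the Geopolitical Ontology URI.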
Queries can leverage LOD mappings - 2
Example: to get all datasets on topic “Livestock” using the AGROVOC URI for “Livestock”
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
DESCRIBE ?dataset ?distro WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dcat:distribution ?distro .
  ?dataset dcat:theme ?topic .
  ?topic owl:sameAs <http://aims.fao.org/aos/agrovoc/c_4397> .
}
URI of the “Livestock” concept in
the AGROVOC thesaurus
Queries can leverage LOD mappings - 3
Example: to get all datasets complying with the INSPIRE specification for soil
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
DESCRIBE ?dataset ?distro WHERE {
  ?dataset rdf:type dcat:Dataset .
  ?dataset dcat:distribution ?distro .
  ?distro dc:conformsTo ?standard .
  ?standard owl:sameAs <http://vest.agrisemantics.org/node/19915> .
}
URI that identifies the
INSPIRE specification for Soil
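The three example queries share one shape: a property on either the dataset or the distribution, whose value is owl:sameAs an external URI. A minimal sketch of a query builder for that shape follows; the function name and the ?value variable are hypothetical, and only the prefixes the queries actually use are declared.

```python
# Prefixes used by the example queries (the unused rdfs/skos are omitted)
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "owl": "http://www.w3.org/2002/07/owl#",
}


def build_query(holder, prop, target_uri):
    """Build a DESCRIBE query shaped like the three examples.

    holder: "?dataset" or "?distro", the node that carries the property
    prop: prefixed property name, e.g. "dc:spatial", "dcat:theme", "dc:conformsTo"
    target_uri: external URI that the local value must be owl:sameAs
    """
    if holder not in ("?dataset", "?distro"):
        raise ValueError("holder must be ?dataset or ?distro")
    lines = [f"PREFIX {p}: <{ns}>" for p, ns in PREFIXES.items()]
    lines += [
        "DESCRIBE ?dataset ?distro WHERE {",
        "  ?dataset rdf:type dcat:Dataset .",
        "  ?dataset dcat:distribution ?distro .",
        f"  {holder} {prop} ?value .",
        f"  ?value owl:sameAs <{target_uri}> .",
        "}",
    ]
    return "\n".join(lines)


print(build_query("?distro", "dc:conformsTo", "http://vest.agrisemantics.org/node/19915"))
```

Generating the prefix block from one table also avoids the easy mistake of using owl:sameAs without declaring the owl: prefix.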
Conclusions
• The major semantic challenges when integrating (meta)data arise from the
lack of use of common value vocabularies, not so much from the use of
different description vocabularies / schemas / formats.
• In most cases the lack of good semantics in the (meta)data at the level of
value vocabularies is not due to ill will or lack of awareness, but to the
constraints posed by most dataset management tools.
• The machine-readable layer and the SPARQL endpoint of the RING are not
meant for end users: we expect developers to use this layer to build
added-value services for end users on top of the featured datasets.
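For developers, sending one of the queries above to a SPARQL endpoint is a plain HTTP request under the SPARQL 1.1 Protocol. A minimal stdlib sketch, assuming a hypothetical endpoint URL (substitute the real one) and Turtle as the requested result format; it builds the GET request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL; substitute the catalog's actual SPARQL endpoint
ENDPOINT = "http://example.org/sparql"


def sparql_get_request(endpoint, query, accept="text/turtle"):
    """Build (without sending) a SPARQL 1.1 Protocol query request using HTTP GET.

    DESCRIBE returns an RDF graph, so an RDF serialization is requested via Accept.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": accept}, method="GET")


req = sparql_get_request(ENDPOINT, "DESCRIBE <http://example.org/x>")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req).read()
```

For long queries that exceed URL length limits, the same protocol also allows POST with the query as the request body.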
Relevant vocabularies, catalog tools, catalogs
• DCAT: http://www.w3.org/TR/vocab-dcat/
• DCAT-AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/home
• STAT-DCAT: https://joinup.ec.europa.eu/asset/stat_dcat_application_profile/home
• DataCube: http://purl.org/linked-data/cube#
• VOID: http://rdfs.org/ns/void-guide
• DDI-RDF Discovery Vocabulary: http://rdf-vocabulary.ddialliance.org/discovery.html
• VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology/
• CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables/
• CKAN: http://ckan.org/
• Dataverse: http://dataverse.org/
• Datahub: http://datahub.io/
• DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture
• Re3data: http://www.re3data.org
• OpenAIRE: https://www.openaire.eu/
• CIARD RING: http://ring.ciard.info
Thank you
Semantic challenges in sharing dataset metadata
and creating federated dataset catalogs.
The example of the CIARD RING
Valeria Pesce (GFAR)
valeria.pesce@fao.org
Linked Open Data in Agriculture
MACS-G20 Workshop in Berlin, September 27th–28th, 2017
