SEMANTiCS 2018 - 14th International Conference on Semantic Systems
September 2018
CC BY-SA
Outline
● Introduction and Motivation
● Our context for investigating the Web of Data
● Requirements for metadata aggregation
● Our pilot case study
• Dataset descriptions for metadata aggregation
● Results and conclusions
● Ongoing and future work
Czech Republic, PD
1887, Uměleckoprůmyslové museum v Praze
Preissig, Vojtech
Coloured etchings
Introduction
Metadata aggregation in the
cultural heritage domain
History
● Thousands of digital libraries exist, maintained by different organizations
• hindering the discoverability, sharing and reuse of the cultural resources
● Federated search has been applied in some cases
• ...but it did not scale, and had serious usability limitations
● Standard protocols for metadata aggregation gained more attention,
mainly driven by preprint repositories
• With OAI-PMH as the dominating standard
● Metadata aggregation centralizes discovery of and access to resources,
further promoting their use
What kinds of technologies are we considering?
● The technological landscape and aggregation scale evolve…
…and aggregation networks need to operate more efficiently to remain
sustainable
• What are the successors of OAI-PMH?
● Focus on adopting technologies that:
• present low barriers to adoption by data providers
• are used by cultural heritage institutions for other purposes
  • Search engine optimization
  • Linked data
  • Social web technologies
Cristallisation ou Mouvement du
temps, René Bord
1987, Bibliothèque Municipale De Lyon,
public domain
Our context for investigating
the Web of Data
Motivation for studying the Web of Data
● Cultural heritage institutions are increasingly seeking wider
interoperability on the Web
● We have identified linked data and Schema.org as options for innovating
metadata aggregation. Use of these technologies may:
• reuse expertise and deployed technology within the organizations
• motivate the participation in aggregation networks
Our earlier experiments
● Does crawling the Web (of documents) allow aggregators to reach
structured metadata?
● Can cultural resources metadata be expressed in Schema.org?
Reaching structured data
● We crawled 50,000 landing pages of
resources from Europeana data
providers
● We collected statistics about:
• Usage of HTML5 meta tags,
RDFa/RDFa lite, content
negotiation/RDF
• Data model in use: Dublin Core
and Schema.org
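The kind of per-page check behind such statistics can be sketched as follows. This is a minimal illustration, not the crawler used in the study: the class name `StructuredDataProbe`, the function `probe`, the signal labels and the sample page are all assumptions made up for this example, and only three coarse signals are detected (Dublin Core `<meta>` tags, RDFa attributes, and a Schema.org reference inside them).

```python
from html.parser import HTMLParser


class StructuredDataProbe(HTMLParser):
    """Collects coarse signals of structured metadata in one HTML page."""

    def __init__(self):
        super().__init__()
        self.signals = set()

    def handle_starttag(self, tag, attrs):
        # Attribute names arrive lower-cased; valueless attributes have value None.
        attrs = {name: (value or "") for name, value in attrs}
        # Dublin Core in <meta> tags, e.g. name="DC.title" or name="dcterms.creator".
        name = attrs.get("name", "").lower()
        if tag == "meta" and name.startswith(("dc.", "dcterms.")):
            self.signals.add("dublin-core-meta")
        # RDFa / RDFa Lite: look for the vocab or typeof attributes on any element.
        if "vocab" in attrs or "typeof" in attrs:
            self.signals.add("rdfa")
            # Schema.org referenced through those RDFa attributes.
            if "schema.org" in attrs.get("vocab", "") + attrs.get("typeof", ""):
                self.signals.add("schema.org")


def probe(html: str) -> set:
    """Return the set of signals found in an HTML document."""
    parser = StructuredDataProbe()
    parser.feed(html)
    return parser.signals


# Hypothetical landing page, made up for illustration.
SAMPLE = """
<html><head>
<meta name="DC.title" content="Example etching">
</head><body vocab="http://schema.org/" typeof="VisualArtwork">
<span property="name">Example etching</span>
</body></html>
"""

print(sorted(probe(SAMPLE)))
```

Running `probe` over every crawled page and counting the signals would give usage figures of the kind listed above; content negotiation has to be tested separately, with an HTTP request.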
Reaching structured data automatically is
currently difficult
● Despite the numerous activities in cultural heritage for making linked
data available, reaching that data through automated means is difficult
• No wide support of HTTP content negotiation
• ...and the structured metadata within the HTML pages is very limited
or non-existent
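As a reminder of what support for HTTP content negotiation means in practice, here is a minimal sketch, assuming a client that asks for an RDF serialization through the `Accept` header and then inspects the `Content-Type` of the response. The helper names, the list of media types and the `example.org` URI are assumptions for illustration; no request is actually sent.

```python
import urllib.request

# RDF serializations the client is willing to accept (an illustrative selection).
RDF_MEDIA_TYPES = (
    "text/turtle",
    "application/rdf+xml",
    "application/ld+json",
    "application/n-triples",
)


def build_rdf_request(resource_uri: str) -> urllib.request.Request:
    """Prepare a GET request that asks the server for an RDF serialization."""
    return urllib.request.Request(
        resource_uri, headers={"Accept": ", ".join(RDF_MEDIA_TYPES)}
    )


def is_rdf_response(content_type_header: str) -> bool:
    """True if the Content-Type header names one of the RDF media types."""
    media_type = content_type_header.split(";")[0].strip().lower()
    return media_type in RDF_MEDIA_TYPES


# Hypothetical resource URI; passing the request to urllib.request.urlopen would send it.
request = build_rdf_request("http://example.org/resource/1")
print(request.get_header("Accept"))
print(is_rdf_response("text/turtle; charset=utf-8"))
print(is_rdf_response("text/html"))
```

A server without content negotiation typically answers such a request with `text/html`, which is the situation the crawl encountered for most landing pages.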
Representing cultural resources in Schema.org
● Can we find in Schema.org all the core modelling constructs of the EDM?
• The resource that institutions provide metadata about
schema:CreativeWork, schema:VisualArtwork, schema:Book,
schema:Painting, schema:Sculpture, and schema:Product
• The digital versions of the resource
schema:MediaObject, schema:ImageObject, schema:VideoObject,
schema:AudioObject
• The contextual classes
schema:Person, schema:Place and schema:Organization
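To make the three groups of classes concrete, the sketch below builds a small Schema.org description as JSON-LD. It is an illustration under stated assumptions: every URI, name and value is invented, and the properties chosen (`creator`, `contentLocation`, `provider`, `associatedMedia`) are one plausible way to connect the classes, not a pattern prescribed by the study.

```python
import json

# Hypothetical record: every URI, name and value below is made up for illustration.
record = {
    "@context": "http://schema.org/",
    "@id": "http://example.org/object/1",
    # The resource that the institution provides metadata about.
    "@type": "VisualArtwork",
    "name": "Example etching",
    # Contextual classes: person, place and organization.
    "creator": {"@type": "Person", "name": "Example Artist"},
    "contentLocation": {"@type": "Place", "name": "Example City"},
    "provider": {"@type": "Organization", "name": "Example Museum"},
    # A digital version of the resource.
    "associatedMedia": {
        "@type": "ImageObject",
        "contentUrl": "http://example.org/object/1.jpg",
        "encodingFormat": "image/jpeg",
    },
}


def types_used(node) -> set:
    """Collect every @type value found in a nested JSON-LD structure."""
    found = set()
    if isinstance(node, dict):
        declared = node.get("@type", [])
        found.update([declared] if isinstance(declared, str) else declared)
        for value in node.values():
            found |= types_used(value)
    elif isinstance(node, list):
        for item in node:
            found |= types_used(item)
    return found


print(json.dumps(record, indent=2))
print(sorted(types_used(record)))
```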
Representing cultural resources in Schema.org
● So all the core required modelling constructs are available, but...
• … are cultural heritage institutions using them?
• … can the very specific information needs of EDM/Europeana be
achieved?
• … do the original Schema.org patterns need to be further
specified/restricted for metadata aggregation?
Representing cultural resources in Schema.org
● To evaluate, we used datasets where the same resources had metadata
available in both Schema.org and EDM
• Schema.org (at the data provider)
• EDM (at the Digital Public Library of America)
Schema.org is suitable for describing cultural
heritage resources
● The data providers prepared Schema.org metadata for Internet
discovery, not specifically for cultural heritage aggregation
● ...despite this fact, the EDM metadata derived from Schema.org was
close to fully suitable for aggregation by Europeana
...additional conclusions
● Specific semantics for aggregation and controlled vocabularies
• Data providers may need to provide some specific properties that are
not used for web search engines
● It may allow the description of additional types and characteristics not
(yet) implemented in EDM
• an opportunity to progressively improve the available data
● It will require recommendations and/or specifications regarding how
data providers should provide their Schema.org metadata
References of the studies
Freire N., Martins B., Calado P. (2018) Availability of cultural heritage structured metadata in the World Wide Web. In: Proceedings of the 22nd International Conference on Electronic Publishing (ELPUB 2018). Available online: https://elpub.episciences.org/4608
Freire N., Charles V., Isaac A. (2018) Evaluation of Schema.org for Aggregation of Cultural Heritage Metadata. In: Gangemi A. et al. (eds) The Semantic Web. ESWC 2018. Lecture Notes in Computer Science, vol 10843. Springer, Cham. doi:10.1007/978-3-319-93417-4_15. Preprint available online: https://2018.eswc-conferences.org/wp-content/uploads/2018/02/ESWC2018_paper_224.pdf
Cristallisation ou Mouvement du
temps, René Bord
1987, Bibliothèque Municipale De Lyon,
public domain
Requirements for metadata
aggregation
Constraints of our context
● The solution must fulfil the same functional requirements as the current
aggregation solution of Europeana:
• which is based on OAI-PMH and the Europeana Data Model
● The automation of metadata aggregation should be 100% based on Web
of Data technology
Requirements
● R1 - Data providers must be able to provide a linked data (LD) resource describing their dataset.
● R2 - All data transmissions between data providers and Europeana must be built
on standard technologies of the Web of data
● R3 - Data providers must be able to provide machine readable licensing of their
metadata
• at the dataset level and also at the individual metadata record level
● R4 - Data providers must provide a machine-readable specification of how the
dataset can be downloaded or harvested by Europeana
• Two mechanisms may be used: RDF data dumps; or listings of the URI of the
resources that are part of the dataset.
● R5 - EDM-compliant metadata must be made available by the data provider.
Alternatively, Schema.org metadata may be used, as long as, after conversion to
EDM, it complies with the EDM schema requirements.
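How a harvester might act on R3 and R4 can be sketched as follows. This is a hedged illustration only: the simplified dictionary keys (`license`, `data_dumps`, `resource_listing`), the function name `plan_harvest` and the preference for dumps over listings are assumptions of this example, not part of the Europeana specification.

```python
def plan_harvest(description: dict) -> tuple:
    """Return (mechanism, locations) for a simplified dataset description.

    The dictionary keys are hypothetical stand-ins for what a real
    dataset description would express in RDF.
    """
    # R3: licensing must be machine readable at the dataset level.
    if not description.get("license"):
        raise ValueError("R3: the dataset description must state a license")
    # R4, first mechanism: RDF data dumps.
    dumps = description.get("data_dumps", [])
    if dumps:
        return ("rdf-dump", list(dumps))
    # R4, second mechanism: a listing of the URIs of the dataset's resources.
    listing = description.get("resource_listing")
    if listing:
        return ("uri-listing", [listing])
    raise ValueError("R4: no data dump or resource listing declared")


# Hypothetical dataset descriptions, made up for illustration.
with_dump = {
    "license": "http://creativecommons.org/publicdomain/zero/1.0/",
    "data_dumps": ["http://example.org/dumps/dataset-1.nt.gz"],
}
with_listing = {
    "license": "http://creativecommons.org/publicdomain/zero/1.0/",
    "resource_listing": "http://example.org/dataset/1/resources",
}

print(plan_harvest(with_dump))
print(plan_harvest(with_listing))
```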
Our Europeana pilot case study
● We addressed these requirements through a pilot case study within the
Europeana network:
• Including partners for the three roles of the Europeana network:
  • Data provider - The National Library of the Netherlands
  • Intermediary aggregator - The Dutch Digital Heritage Network
  • Central aggregator - Europeana Foundation
● We analyzed standards, created guidelines, developed software systems and
defined data conversions
● Our paper presents the results on the description of datasets and their
distributions
...applying vocabularies such as DCAT, VoID and Schema.org
Sidenote:
Linked Data activities from the National Library of the Netherlands
Cristallisation ou Mouvement du
temps, René Bord
1987, Bibliothèque Municipale De Lyon,
public domain
Dataset descriptions for
metadata aggregation
Analyzed vocabularies for datasets
Three requirements of metadata aggregation can be fulfilled by dataset descriptions.
We identified and analyzed six candidates:

Vocabulary | Req. R1: Provide an RDF resource | Req. R3: Licensing | Req. R4: Distribution of the dataset
VoID - Vocabulary of Interlinked Datasets | yes | yes | yes
DCAT – Data Catalogue Vocabulary | yes | yes | partial
Schema.org | yes | yes | partial
EDM Datasets Profile | yes | no | no
ADMS - Asset Description Metadata Schema | N/A - specific for semantic assets such as schemas, generic data models and vocabularies
RDF Data Cube Vocabulary | N/A - specific for multi-dimensional data
Three of the vocabularies were suitable
● VoID - Vocabulary of Interlinked Datasets
“VoID is an RDF Schema vocabulary for expressing metadata about RDF
datasets. It is intended as a bridge between the publishers and users of RDF
data, with applications ranging from data discovery to cataloging and archiving of
datasets.”
● DCAT – Data Catalogue Vocabulary
“DCAT is an RDF vocabulary designed to facilitate interoperability between data
catalogs published on the Web. Publishers increase discoverability and enable
applications easily to consume metadata from multiple catalogs. It further
enables decentralized publishing of catalogs and facilitates federated dataset
search across sites.”
● Schema.org
The Schema.org vocabulary defines classes representing datasets and their
distributions
Supporting the usage of the three vocabularies
Active communities in this area have provided alignments between the vocabularies:
Vocabularies aligned, with a description of each alignment:
● DCAT and Schema.org
From the W3C Dataset Exchange Working Group (DXWG).
DCAT v1.1 (draft stage) includes the alignment of DCAT with Schema.org
https://github.com/w3c/dxwg/blob/gh-pages/dcat/rdf/schema.ttl
● DCAT, ADMS and VoID
The original work that led to the creation of the Schema.org Dataset class.
From a collaboration around the DCAT, ADMS and VoID vocabularies.
http://www.w3.org/wiki/WebSchemas/Datasets
● DCAT and Schema.org
From the W3C Spatial Data on the Web Working Group, which also included the
alignment with ISO 19115 - ‘Geographic information - Metadata’.
https://webgate.ec.europa.eu/CITnet/stash/projects/ODCKAN/repos/dcat-ap-to-schema.org/browse
Example of a dataset description
A case from the National Library of the Netherlands, using both Schema.org and VoID
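The slide image is not reproduced here. As a stand-in, the following is a made-up sketch (not the National Library's actual description) of how Schema.org and VoID can be combined in one JSON-LD dataset description. All `example.org` URIs, the dataset name and the license are assumptions; the helper `dump_locations` is hypothetical.

```python
import json

# Hypothetical dataset description mixing Schema.org and VoID terms.
dataset = {
    "@context": {
        "schema": "http://schema.org/",
        "void": "http://rdfs.org/ns/void#",
    },
    "@id": "http://example.org/dataset/1",
    "@type": ["schema:Dataset", "void:Dataset"],
    "schema:name": "Example collection",
    # Machine-readable licensing at the dataset level.
    "schema:license": {"@id": "http://creativecommons.org/publicdomain/zero/1.0/"},
    # Schema.org view of the downloadable distribution.
    "schema:distribution": {
        "@type": "schema:DataDownload",
        "schema:contentUrl": {"@id": "http://example.org/dumps/dataset-1.nt.gz"},
        "schema:encodingFormat": "application/n-triples",
    },
    # VoID view of the same RDF dump.
    "void:dataDump": {"@id": "http://example.org/dumps/dataset-1.nt.gz"},
}


def dump_locations(description: dict) -> set:
    """Collect download URLs declared through either vocabulary."""
    urls = set()
    dump = description.get("void:dataDump")
    if dump:
        urls.add(dump["@id"])
    distribution = description.get("schema:distribution")
    if distribution:
        urls.add(distribution["schema:contentUrl"]["@id"])
    return urls


print(json.dumps(dataset, indent=2))
print(dump_locations(dataset))
```

Stating the dump through both vocabularies is redundant by design in this sketch: an aggregator that understands only one of them can still locate the data.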
Example of a dataset description
A made-up example of a DCAT description of a dataset with a downloadable distribution.
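The slide image is not reproduced here. A comparable made-up description can be sketched by generating Turtle from a small template; the function name `dcat_description`, all `example.org` URIs, the title and the license are assumptions for illustration. The license is attached to the distribution, which is where DCAT defines `dct:license`.

```python
def dcat_description(dataset_uri, title, license_uri, download_url, media_type):
    """Render a minimal DCAT dataset description as Turtle.

    Naive templating: the title and media type must not contain double quotes.
    """
    return f"""@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<{dataset_uri}> a dcat:Dataset ;
    dct:title "{title}" ;
    dcat:distribution [
        a dcat:Distribution ;
        dct:license <{license_uri}> ;
        dcat:downloadURL <{download_url}> ;
        dcat:mediaType "{media_type}"
    ] .
"""


# Hypothetical values, made up for illustration.
turtle = dcat_description(
    dataset_uri="http://example.org/dataset/1",
    title="Example collection",
    license_uri="http://creativecommons.org/publicdomain/zero/1.0/",
    download_url="http://example.org/dumps/dataset-1.ttl",
    media_type="text/turtle",
)
print(turtle)
```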
Conclusions on describing datasets
● We have put into practice a solution with low implementation barriers
that uses existing expertise in CH institutions.
● Although the pilot was conducted with just three partners, we believe it
provides strong support for further research
● At present, the complete workflow of metadata aggregation based on the
Web of Data has been tested in the pilot
Ongoing and future work
France, Public Domain
Agence Rol. Agence photographique,
Bibliothèque nationale de France
Chat "regardant" à travers une longue-vue et
autre chat perché dessus
Ongoing and future work
● The pilot is continuing.
Currently it is addressing other requirements of the aggregation
workflow
● It is also being extended to museum resources
• taking its first steps (a cooperation with project CEMEC)
Ongoing and future work
Development of a prototype
supporting all activities of the
metadata aggregation
workflow
Thank you for your attention
nuno.freire@tecnico.ulisboa.pt
Netherlands, Public Domain
1660 - 1625, Rijksmuseum
Anonymous
Arrival of a Portuguese ship
Acknowledgments
Valentine Charles from Europeana Foundation
Fundação para a Ciência e a Tecnologia (FCT): UID/CEC/50021/2013
European Commission contract number 30-CE-0885387/00-80.


Aggregation of cultural heritage datasets through the Web of Data
