Evaluating Data Quality in Europeana:
Metrics for Multilinguality
Péter Király (1), Juliane Stiller (2), Valentine Charles (3), Werner Bailer (4), Nuno Freire (5)
(1) Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen
(2) Berlin School of Library and Information Science, Humboldt-Universität zu Berlin
(3) Europeana Foundation, The Hague
(4) Joanneum Research Forschungsgesellschaft mbH, Graz
(5) INESC-ID, Lisbon
MTSR 2018 - Track on Cultural Collections and Applications, Limassol, Oct. 24, 2018
Nummertjes by Fabio (CC BY-NC 2.0)
Agenda
1. Europeana
2. Multilingual Information in Europeana’s Metadata
3. Multilinguality as a Facet of Quality Dimensions
4. Results
5. Demo
Europeana - Platform for Cultural Heritage Material
www.europeana.eu
Europeana - Facts
○ Books, newspapers, letters, paintings, photographs, radio shows, films, etc.
○ Text, images, video, audio, sounds, 3D
○ Over 58 million objects
○ > 50 languages
Multilingual Information in Europeana’s Metadata
English cultural heritage object:
<dc:language>en</dc:language>
German metadata
Multilinguality on Field Level
Literal, literal with language tag:
<#record> a ore:Proxy ;
    dc:subject "Ballet", "Opera"@en .
Europeana Dereferencing:
<#record> a ore:Proxy ; edm:europeanaProxy true ;
    dc:subject <http://data.europeana.eu/concept/base/264> .
<http://data.europeana.eu/concept/base/264> a skos:Concept ;
    skos:prefLabel "Ballett"@no, "बैले"@hi, "Ballett"@de, "Балет"@be, "Балет"@ru,
        "Balé"@pt, "Балет"@bg, "Baletas"@lt, "Balet"@hr, "Balets"@lv .
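To make the difference between a plain literal and a literal with a language tag concrete, the following Python sketch pulls quoted literals and their optional tags out of a Turtle-like fragment. The regular expression and the name `extract_literals` are assumptions made for this illustration. It is not a full Turtle parser and is not taken from Europeana's tooling.

```python
import re

# Matches "text" optionally followed by @lang, e.g. "Opera"@en.
# Simplification: no escaped quotes, datatypes, or multi-line literals.
LITERAL = re.compile(r'"([^"]*)"(?:@([A-Za-z]+(?:-[A-Za-z0-9]+)*))?')

def extract_literals(turtle_fragment):
    """Return (text, language or None) pairs for each quoted literal."""
    return [(m.group(1), m.group(2)) for m in LITERAL.finditer(turtle_fragment)]

fragment = '<#record> a ore:Proxy ; dc:subject "Ballet", "Opera"@en .'
literals = extract_literals(fragment)

# Only literals that carry a language tag can be counted per language.
tagged = [lit for lit in literals if lit[1] is not None]
```

Here "Ballet" comes back with no language, while "Opera" is attributed to `en`, so only the latter can be counted as a language-tagged literal.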
Processes Contributing to Multilinguality
Data from Provider:
○ dc:subject "subject"@en
○ dc:creator <http://vocab.getty.edu/...>
○ dc:type <http://voc.example./…>
○ dc:subject <http://dbpedia.org/aSubjectID>
○ dc:subject "Subject"
Data added by Europeana (dereferencing step):
○ dc:creator: new labels in different languages
○ dc:subject: new labels in different languages
Quantifiable: "term"@language annotation
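The dereferencing step can be pictured as a lookup that swaps a vocabulary URI for the labels that vocabulary publishes. In this Python sketch, the `VOCABULARY` dictionary is a hypothetical in-memory stand-in for a remote vocabulary service, and `dereference` is an illustrative helper, not Europeana's implementation. The labels are a subset of the Ballet example.

```python
# Hypothetical in-memory stand-in for a remote vocabulary service.
VOCABULARY = {
    "http://data.europeana.eu/concept/base/264": {
        "de": "Ballett",
        "pt": "Balé",
        "lt": "Baletas",
        "hr": "Balet",
    },
}

def dereference(values):
    """Expand known URIs into (label, language) pairs; keep other values as-is."""
    expanded = []
    for value in values:
        labels = VOCABULARY.get(value)
        if labels is None:
            # Untagged literal or a URI we cannot resolve: no language gained.
            expanded.append((value, None))
        else:
            expanded.extend((label, lang) for lang, label in labels.items())
    return expanded

provider_subjects = ["Subject", "http://data.europeana.eu/concept/base/264"]
enriched = dereference(provider_subjects)
```

The untagged literal stays untagged, whereas the single URI contributes one language-tagged label per language in the vocabulary. This is why dereferencing raises the measurable multilinguality of a record.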
Quantify Multilinguality of Data to:
○ Establish a sense of the multilingual reach of Europeana, including the distribution of languages
○ Identify the impact of different workflows / processes on the multilinguality of data
○ Take measures to improve multilinguality in data
○ Devise strategies for underrepresented languages
What Could be Measured?
○ Number of (distinct) languages in the metadata
○ Number of language-tagged literals
○ Tagged literals per language
○ Existence of language information fields such as dc:language
○ Consistency and conformity of language information
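The first three counts in this list can be sketched in a few lines of Python. The record layout (field name mapped to a list of text/language pairs) and the name `language_counts` are assumptions made for this example only.

```python
from collections import Counter

# Hypothetical record: field name -> list of (text, language tag or None).
record = {
    "dc:title": [("Swan Lake", "en"), ("Schwanensee", "de")],
    "dc:subject": [("Ballet", None), ("Opera", "en")],
    "dc:language": [("en", None)],
}

def language_counts(record):
    """Count language-tagged literals per language across all fields."""
    return Counter(
        lang
        for literals in record.values()
        for _, lang in literals
        if lang is not None
    )

per_language = language_counts(record)        # tagged literals per language
tagged_literals = sum(per_language.values())  # number of language-tagged literals
distinct_languages = len(per_language)        # number of distinct languages
```

Untagged literals such as "Ballet" are ignored here because their language cannot be determined from the metadata alone.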
Multilinguality as a Facet of Quality Dimensions
Completeness
○ This dimension:
    ○ expresses the number (fraction) of fields present in a dataset
    ○ identifies non-empty values in a record or (sub-)collection
○ Multilingual completeness is captured by:
    ○ presence of a value in dc:language
    ○ share of fields with language tags among all available fields
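A minimal Python sketch of the two completeness measures follows. It assumes a hypothetical record layout (field name mapped to text/language pairs) and counts every non-empty field, including dc:language itself, in the denominator. The slides do not specify that detail, so treat it as an assumption.

```python
# Hypothetical record: field name -> list of (text, language tag or None).
record = {
    "dc:title": [("Swan Lake", "en")],
    "dc:subject": [("Ballet", None)],
    "dc:creator": [("Tchaikovsky", None)],
    "dc:language": [("en", None)],
}

def has_language_field(record):
    """True if dc:language is present with at least one non-empty value."""
    return any(text.strip() for text, _ in record.get("dc:language", []))

def tagged_field_share(record):
    """Share of non-empty fields that carry at least one language tag."""
    fields = [literals for literals in record.values() if literals]
    if not fields:
        return 0.0
    tagged = sum(1 for literals in fields if any(lang for _, lang in literals))
    return tagged / len(fields)

language_present = has_language_field(record)
share = tagged_field_share(record)
```

In this record only dc:title carries a language tag, so one of four fields is counted as multilingual.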
Consistency
○ Describes the logical coherence of metadata
○ Assesses the variety of language values in the dc:language field: how many distinct values are there?
○ Contributes to features such as language-based facets
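Counting distinct notations is straightforward once the dc:language values of a collection are gathered in one place. The values in this Python sketch are invented for illustration and do not come from Europeana data.

```python
from collections import Counter

# Hypothetical dc:language values collected from several records.
language_values = ["en", "ENG", "English", "en", "de", "en-uk", "de"]

def notation_counts(values):
    """Count how often each distinct notation occurs (case-sensitive)."""
    return Counter(values)

counts = notation_counts(language_values)
distinct_notations = len(counts)
```

The comparison is deliberately case-sensitive, so "ENG" and "en" count as different notations. A language facet built on these raw values would show them as separate entries.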
Conformity
○ Describes conformity to a given standard such as ISO 639-2
○ Example: English is expressed as English, ENG, en, en-uk, …
○ Measure: share of values that comply or do not comply
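The share of compliant values could be computed along the following lines. `ISO_639_2_SUBSET` holds only a handful of three-letter codes for illustration (the full standard is much larger), and the strict, case-sensitive match is an assumption, not a rule stated in the slides.

```python
# Small illustrative subset of ISO 639-2 codes; the real list is much longer.
ISO_639_2_SUBSET = {"eng", "deu", "ger", "fra", "fre", "por", "lit", "hrv", "lav"}

def conformity_share(values, allowed=ISO_639_2_SUBSET):
    """Fraction of values that exactly match an allowed code."""
    if not values:
        return 0.0
    compliant = sum(1 for value in values if value in allowed)
    return compliant / len(values)

values = ["English", "ENG", "en", "en-uk", "eng", "deu"]
share = conformity_share(values)
```

Only "eng" and "deu" match here. A more lenient variant could lowercase values before matching, which would also accept "ENG".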
Accessibility
○ Access to information and data across languages
○ Distribution of linguistic information in metadata
○ Quantified through language tags
○ The more language tags, the higher the multilingual reach
Dimensions, Criteria & Measures
Dimension | Criteria | Measure
Completeness | Presence or absence of values in fields relating to the language of the object or the metadata | Share of multilingual fields to overall fields; presence or absence of dc:language field
Consistency | Variance in language notation | Distinct language notations
Conformity | Compliance to ISO 639-2 | Share of values that comply
Accessibility | Accessibility across languages expressed through language tags | Number of distinct languages; number of languages / number of tagged literals; tagged literals per language
Results
Data processing workflow
Stage | Technologies | Output
ingestion | OAI-PMH, Europeana API, Hadoop, NoSQL | json
measuring | Spark, Hadoop, Java, Apache Solr | csv
statistical analysis | Spark, R | json, png
web interface | PHP, D3.js, highchart.js, NoSQL | html, svg
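As a rough picture of the measuring stage, the Python sketch below reads one JSON record per line and writes one CSV row of measurements per record. It assumes that ingestion emits JSON and measuring emits CSV, as the format labels on the slide suggest. The record layout, column names, and `measure_to_csv` are hypothetical, and the slide lists Java and Spark for this stage, so this is only a small stand-in.

```python
import csv
import io
import json

def measure_to_csv(json_lines):
    """Turn JSON records into CSV rows: id, tagged literals, distinct languages."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "tagged_literals", "distinct_languages"])
    for line in json_lines:
        record = json.loads(line)
        # Collect the language tag of every literal that has one.
        langs = [
            literal["lang"]
            for literals in record["fields"].values()
            for literal in literals
            if literal.get("lang")
        ]
        writer.writerow([record["id"], len(langs), len(set(langs))])
    return out.getvalue()

# Hypothetical ingested record, serialized as one JSON line.
sample = [
    json.dumps({
        "id": "r1",
        "fields": {"dc:subject": [{"value": "Ballet"}, {"value": "Opera", "lang": "en"}]},
    }),
]
csv_text = measure_to_csv(sample)
```

A flat CSV of per-record numbers is convenient input for the statistical analysis stage, which only needs the measurements and not the original metadata.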
DEMO
Questions
★ Contact
valentine.charles@europeana.eu
juliane.stiller@ibi.hu-berlin.de
werner.bailer@joanneum.at
peter.kiraly@gwdg.de
nfreire@gmail.com
★ Metadata Quality Assurance Framework
http://144.76.218.178/europeana-qa
★ Europeana Data Quality Committee
https://pro.europeana.eu/project/data-quality-committee
