Aggregation workflow
Cécile Devarenne
Operations Officer
Metadata training, Europeana Sounds project
Athens, 23rd/24th of October 2014
Content
• Europeana's aggregation team
• Europeana Publication Policy
• Aggregation workflow
• Submission deadlines
• Ingestion processes and tools
• Acceptance criteria and Europeana validation of data
• Guidance and help – Europeana pro
• Future plans for aggregation workflow
Europeana’s aggregation team:
who are we?
Europeana’s aggregation team
• Partner relationships, business development, administration
• Henning Scholz, Joris Pekel, Gina Van der Linden
• Technical support
• Operations officers: content@europeana.eu
• Data support, feedback and ingestion of your collections into
Europeana portal and API
Europeana Publication Policy
Clear criteria for accepting or declining metadata for publication, and for
taking down legacy metadata from the Europeana database
•Ingestion workflow (deadlines, timelines, prioritization)
•Content scope (what is a digital object? what content does Europeana
aggregate?)
•Technical validation of metadata quality (expected values)
•Metadata licensing (CC0)
•Rights Statements for digital objects
• All digital objects with valid edm:rights chosen from
http://pro.europeana.eu/web/guest/available-rights-statements
• Public Domain material labelled with the Public Domain mark in
edm:rights
• edm:rights & dc:rights not in contradiction
Aggregation workflow and
submission deadlines: how does it
work?
Submission of data: preliminary steps
for your project
• (1) Data Exchange Agreement to Europeana (DEA)
• The Europeana Sounds project needs to submit a signed Data
Exchange Agreement for each contributing data provider
• The Europeana Data Exchange Agreement establishes the terms
under which Europeana can make use of the previews and descriptive
metadata provided by cultural institutions
• More information: http://pro.europeana.eu/ensuring-permissions-for-aggregators
• (2) Data contribution form
• One form for the whole project
• General information on data to be submitted to Europeana
• Schedule of data delivery: ingestion planning
• (3) Submission of data samples and feedback taken into account
Submission of data: (4) publication cycles
• Operations officers work on a monthly cycle
• Data is submitted as datasets: a dataset is a coherent batch of records;
for the Europeana Sounds project, probably one dataset for each of your
data providers
• A dataset takes on average 40 minutes to process
• Around 200 datasets are processed by the Operations officers for each
cycle of publication
• Datasets go through a full flow of operations before they are production
ready
• Datasets need to be submitted on time in order for this production cycle to
work
• Datasets are submitted by the technical/content coordinators of your
project
• The earlier you submit datasets the more feedback we can give!
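As a rough back-of-the-envelope check on the figures above (around 200 datasets per cycle, 40 minutes each on average), the processing load per cycle can be estimated as below. This is only a sketch: it assumes the 40 minutes is pure processing time and that datasets are handled one after another, neither of which is stated on the slide.

```python
# Rough estimate of the processing load per monthly publication cycle.
# The two figures come from the slide; sequential processing is an assumption.
DATASETS_PER_CYCLE = 200
MINUTES_PER_DATASET = 40

total_minutes = DATASETS_PER_CYCLE * MINUTES_PER_DATASET
total_hours = total_minutes / 60

print(f"{total_minutes} minutes, about {total_hours:.0f} hours per cycle")
```

Under these assumptions the load is roughly 133 hours per cycle, which helps explain why on-time submission matters.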
Submission of data: new provider timeline
Submission of data: regular ingestion
cycle timeline
Ingestion processes and tools:
what happens to your data when
submitted to Europeana?
Europeana’s set of ingestion tools
• Unified Ingestion Manager (UIM): orchestrator of data flows triggered in
various tools and plugins
• SugarCRM (Customer Relationship Management): reference entries for
datasets and organisations
• REPOX: harvester to get the collections uploaded into Europeana
• Europeana’s instance of Mint (Metadata INTeroperability): mapping and
editing tool for ingested datasets
• Data plugins
• Itemization, Europeana identifiers generation
• Dereferencing
• Enrichment
• Redirects
• Extraction of hierarchies
• Thumbnails caching
Europeana ingestion data flows
Steps to get data ingested
From the moment your data is submitted:
• Checks on raw xml (Browser)
• Prior to harvesting
• Identification of key issues
• Creation/update of dataset information, checks on validity of the supplied
harvesting information (SugarCRM - REPOX)
• Harvesting (REPOX)
• Mapping/editing of datasets (Europeana Mint)
• Mapping tool for all datasets
• Adapted for Europeana in order to process multiple formats (EDM,
ESE, any metadata standard with provided XSLT)
• Drag and drop appropriate elements
• Quality checks and data cleaning if necessary
• Transformation and validation of records according to EDM schema
and schematron rules
• EDM Internal data: Europeana ready material
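To make the harvesting step more concrete, here is a minimal sketch of how an OAI-PMH ListRecords request is built and how the resumption token for the next page is read from a response. It illustrates the protocol only, not how REPOX works internally. The function names, the base URL, the "edm" metadata prefix and the set name are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url: str, metadata_prefix: str,
                     set_spec: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it replaces metadataPrefix and set on follow-up requests.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def next_token(response_xml: str) -> Optional[str]:
    """Return the resumption token of a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


# Example with a hypothetical endpoint and set name.
first_page = list_records_url("https://example.org/oai", "edm", set_spec="sounds")
```

A harvester would fetch `first_page`, pass the response to `next_token`, and repeat with the returned token until it gets `None`.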
Steps to get data ingested
• Operations on data following transformation:
• Itemization and creation/management of Europeana identifiers for
permalinks to your records in Europeana
• Extraction of hierarchies for datasets including EDM hierarchies
• Thumbnails caching
• Enrichments of data:
• From links to ontologies exposed as linked data, generation of
additional contextual data (dereferencing)
• From analysis of the provided data, automated semantic
enrichment (Europeana enrichment)
• If necessary (when a change of identifiers was communicated to
Europeana), creation of redirections between previous and newly
generated identifiers
• Data ready! Monthly deployment to the Europeana portal and API
Acceptance criteria: how exactly is
the Publication Policy
implemented?
Acceptance criteria
• Data Exchange Agreement to Europeana
• Datasets submitted via OAI-PMH protocol, FTP or file
• Metadata are accepted for publication after the feedback of the
Europeana Operations Officers
• EDM schema and guidelines
• Rights labeling
• Datasets are prioritized for publication if edm:rights in the majority of
the dataset's records is PDM, CC0, CC BY or CC BY-SA
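The prioritization rule can be sketched as follows. This is an illustration, not Europeana's implementation: the function name is hypothetical, "majority" is assumed to mean more than half of the records, and rights statements are matched by their Creative Commons URI prefix regardless of version.

```python
# URI prefixes of the rights statements named on the slide.
OPEN_RIGHTS_PREFIXES = (
    "http://creativecommons.org/publicdomain/mark/",  # PDM
    "http://creativecommons.org/publicdomain/zero/",  # CC0
    "http://creativecommons.org/licenses/by/",        # CC BY
    "http://creativecommons.org/licenses/by-sa/",     # CC BY-SA
)


def is_prioritized(rights_values):
    """Return True if more than half of the edm:rights values are open.

    rights_values: one edm:rights URI per record (assumed input shape).
    """
    if not rights_values:
        return False
    open_count = sum(
        1 for value in rights_values if value.startswith(OPEN_RIGHTS_PREFIXES)
    )
    return open_count > len(rights_values) / 2


sample = [
    "http://creativecommons.org/publicdomain/zero/1.0/",
    "http://creativecommons.org/licenses/by-sa/4.0/",
    "http://creativecommons.org/licenses/by-nc/4.0/",
]
print(is_prioritized(sample))
```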
Automatic validation:
• Validation according to the EDM schema
• Validation of the mandatory properties
• Unique identifiers within a dataset
• Metadata records that don’t meet this validation are invalidated or
discarded
• Providers can fix issues first and resubmit or let Europeana ingest the
records that are valid, and fix the invalid records at a later stage
• Validation of urls for thumbnail creation (ImageMagick)
Europeana validation
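The uniqueness check and the split into valid and invalid records could look roughly like this. It is a sketch under assumptions: records are plain dictionaries with a hypothetical "id" key, and every record sharing a duplicated identifier is treated as invalid, which the slide does not specify.

```python
from collections import Counter


def split_by_unique_id(records):
    """Separate records with a unique identifier from those with a duplicate.

    records: list of dicts with an "id" key (assumed input shape).
    Returns (valid, invalid).
    """
    counts = Counter(record["id"] for record in records)
    valid = [r for r in records if counts[r["id"]] == 1]
    invalid = [r for r in records if counts[r["id"]] > 1]
    return valid, invalid


valid, invalid = split_by_unique_id([{"id": "a"}, {"id": "b"}, {"id": "a"}])
```

The valid records can go ahead to ingestion while the invalid ones are returned to the provider for a later fix, matching the two options described above.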
Mandatory properties
Applicable class   Mandatory properties (or alternatives)
Aggregation        edm:dataProvider
Aggregation        edm:isShownAt or edm:isShownBy
Aggregation        edm:provider
Aggregation        edm:rights
Aggregation        edm:aggregatedCHO
Aggregation        edm:ugc (when applicable)
ProvidedCHO        dc:title or dc:description
ProvidedCHO        dc:language for text objects
ProvidedCHO        dc:subject or dc:type or dc:coverage or dcterms:spatial
ProvidedCHO        edm:type
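The mandatory properties listed above can be checked with a few lines of code. The sketch below is not the Europeana validator: it assumes each class is available as a plain dictionary of property names to values, the function name is hypothetical, and edm:ugc is left out because the slide only requires it "when applicable".

```python
# Each tuple is a group of alternatives: at least one must be present.
AGGREGATION_REQUIRED = [
    ("edm:dataProvider",),
    ("edm:isShownAt", "edm:isShownBy"),
    ("edm:provider",),
    ("edm:rights",),
    ("edm:aggregatedCHO",),
]
PROVIDED_CHO_REQUIRED = [
    ("dc:title", "dc:description"),
    ("dc:subject", "dc:type", "dc:coverage", "dcterms:spatial"),
    ("edm:type",),
]


def missing_properties(aggregation, provided_cho):
    """List the mandatory property groups that have no value."""
    missing = []
    for group in AGGREGATION_REQUIRED:
        if not any(aggregation.get(prop) for prop in group):
            missing.append(" or ".join(group))
    for group in PROVIDED_CHO_REQUIRED:
        if not any(provided_cho.get(prop) for prop in group):
            missing.append(" or ".join(group))
    # dc:language is mandatory only for text objects.
    if provided_cho.get("edm:type") == "TEXT" and not provided_cho.get("dc:language"):
        missing.append("dc:language")
    return missing


aggregation = {
    "edm:dataProvider": "Example Sound Archive",
    "edm:isShownAt": "http://example.org/item/1",
    "edm:provider": "Europeana Sounds",
    "edm:rights": "http://creativecommons.org/publicdomain/mark/1.0/",
    "edm:aggregatedCHO": "#item1",
}
cho = {"dc:title": "Field recording", "dc:type": "sound", "edm:type": "SOUND"}
print(missing_properties(aggregation, cho))
```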
Validation by the Operations officers:
• Feedback is according to the EDM schema and guidelines
• Checks on the connections between the EDM classes and the general
structure of the data
• Correct use of vocabularies, recommendations to include geolocations
• Checks on the types of values: literals vs resources (e.g. a thumbnail
always needs to be a valid URL)
• Checks on links to digital representations of the objects; if direct links to
a file, check that they are of reasonable size
• Provision of thumbnails highly recommended
• Feedback on (near) duplicate records
• Feedback on rights statements in edm:rights and dc:rights
• Feedback on any other metadata quality related matters (duplication of
properties, encoding in the data, wrongly mapped properties, etc.)
Europeana validation
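The literal-versus-resource check can be approximated with a purely syntactic test, as in the sketch below. The function name is hypothetical, and the check only confirms that a value has the shape of an http(s) URL; it does not confirm that the link resolves or that the file is of reasonable size.

```python
from urllib.parse import urlparse


def looks_like_url(value: str) -> bool:
    """Return True if the value is shaped like an http(s) URL."""
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(looks_like_url("http://example.org/thumb.jpg"))
print(looks_like_url("thumbnail of a drum"))
```

A value that fails this test in a property expecting a resource, such as a thumbnail link, is most likely a literal mapped to the wrong property.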
• The data is represented as both sides expect
• Users can search and retrieve rich content
• Developers can make the best use of the API
• Objects are clicked through and re-used from the Europeana portal
Happy ingestion :-)
Guidance and help
• Europeana Professional: http://pro.europeana.eu/provide-data
• Content inbox, for all ingestion and metadata related matters:
content@europeana.eu
Questions?
Future plans for aggregation
workflow
• Future plans to open up part of the Europeana ingestion workflow to
providers
• Providers can log in to the Europeana ingestion suite and identify the
aggregator/project they work for
• Providers can select the datasets they want to update, or add new
datasets
• Providers can upload their data (OAI-PMH and FTP protocols)
• Providers can map their data to EDM, or edit data that is already EDM
• Providers can validate the data against the EDM schema and preview
it prior to submission
• Other processes being considered for refactoring: semantic validation, link
checking, thumbnail caching, enrichment
Future plans for aggregation workflow
• Benefits for providers:
• Possibility to map to EDM
• Validation according to the EDM schema (with schematron rules we
implemented)
• Preview before publication
• Self service, less dependent on Europeana, saving time (you can do
many steps yourself, and you spot errors earlier)
• Benefits for Europeana:
• Operations scaled up: the number of projects and aggregators, and
therefore of datasets, has grown exponentially in recent years
• More focus on EDM modeling and metadata related questions
• Ingestion process transparent and more connected to the process on
the aggregators' side
Thank you!
Cécile Devarenne
cecile.devarenne@europeana.eu or content@europeana.eu
