RTÉ Content Discovery Project
Christophe Debruyne
c.debruyne@ria.ie
christophe.debruyne@insight-centre.org
MDN Workshop -- 4th of June 2014
Outline
• Context
• Goal and Challenges of the RTÉ Content Discovery Project
• Tasks and Data Annotation
• EBU Core – Identification of problems
• Addressing the issues
• Using the ontology
• Conclusions and Recommendations
Context
RTÉ, Ireland's National Television and Radio Broadcaster
• Documents
• Television
• Radio
• Stills
National trusted digital repository for Ireland's social and cultural data.
Linking and preserving data held by Irish Institutions with central internet access point.
• Standards
• Cataloguing
• Archiving
• Preservation
Centre for Data Analytics
• Insight @ NUIG = DERI
• Semantic Technologies
• Linked Data
• Data Analytics Platform
Goal of the RTÉ Content Discovery Project
• Discover implicit knowledge
• across the different archives
• and the Web of Data
• To facilitate internal workflows (e.g., search)
• For wider reuse and repackaging of RTÉ’s information
• Challenges
• Heterogeneous databases
• Different guidelines and practices
• Legacy data (from previous systems)
• …
“Linking Open Data cloud diagram, by R. Cyganiak and A. Jentzsch. http://lod-cloud.net/”
Part of a wider ambition …
OUTCOMES FOR RTÉ
RTÉ Content Discovery
• In this presentation we focus on the Television and Radio archives
• The Television and Radio archives
• Are maintained on two different instances of the same system
• A system that is EBU Core “compatible”
• Different content, different guidelines, …
Three main tasks
• Annotate the data.
• Using relevant standards, ontologies and vocabularies.
• Resource Description Framework (RDF).
• Obtain an integrated view of the different archives by
creating links between the RDF representations of RTÉ’s
archival assets across the different archives.
• Apply advanced methods for discovering related data for a
given subject in external sources such as the Linked Data
Cloud.
Data annotation
Relational Database (Television, Radio) → D2RQ → RDF Dump → Triplestore
• Map symbols of the database to predicates (relations and concepts) in chosen ontologies / vocabularies
• Use D2RQ to generate an RDF dump
• Store the RDF dump in an adequate triple store (Jena TDB)
Which ontologies?
• Dublin Core, DC Terms
• FOAF
• EBU Core OWL
• …
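The annotation steps above can be illustrated with a minimal Python sketch. It is not D2RQ and does not use its mapping language: it relies only on the standard library, with an in-memory SQLite table standing in for an archive database, to show the idea of mapping columns to predicates and emitting an RDF dump as N-Triples. The table name, column names, and the example.org base URI are hypothetical; the two Dublin Core Terms predicate URIs are real.

```python
import sqlite3
from typing import List

# Hypothetical base URI for archive assets (an assumption, not RTÉ's real scheme).
BASE = "http://example.org/archive/tv/"

# Mapping from (hypothetical) column names to predicates in a chosen vocabulary.
COLUMN_TO_PREDICATE = {
    "title": "http://purl.org/dc/terms/title",
    "description": "http://purl.org/dc/terms/description",
}


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(conn: sqlite3.Connection) -> List[str]:
    """Emit one triple per mapped, non-NULL column of each 'programme' row."""
    triples = []
    cursor = conn.execute("SELECT id, title, description FROM programme ORDER BY id")
    columns = [c[0] for c in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{record['id']}>"
        for column, predicate in COLUMN_TO_PREDICATE.items():
            value = record[column]
            if value is None:  # skip NULLs rather than emit empty literals
                continue
            triples.append(f'{subject} <{predicate}> "{escape_literal(value)}" .')
    return triples


def demo() -> List[str]:
    """Build a tiny stand-in database and return its RDF dump."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE programme (id INTEGER PRIMARY KEY, title TEXT, description TEXT)"
    )
    conn.execute("INSERT INTO programme VALUES (?, ?, ?)", (1, "Evening News", None))
    conn.execute(
        "INSERT INTO programme VALUES (?, ?, ?)", (2, 'The "Late" Show', "Chat show")
    )
    return rows_to_ntriples(conn)


if __name__ == "__main__":
    print("\n".join(demo()))
```

In the real pipeline the resulting dump would then be loaded into the triple store; that step is left out here.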
EBU Core OWL
• The RTÉ Content Discovery platform will rely on Semantic
Web technologies to reason. Ontologies will therefore need
to be correct.
• But … while adopting the EBU Core OWL ontology, several
problems were identified.
• We contacted EBU to resolve these issues.
• We provide an overview of some of these problems.
Problems
• (1) Forgotten concept unions
• The property ebucore:description has multiple domain axioms.
<rdfs:domain rdf:resource="&ebu;BusinessObject"/>
<rdfs:domain rdf:resource="&ebu;MediaResource"/>
• Multiple domain axioms are interpreted as an intersection, not a union, so the wrong implicit information can unintentionally be inferred.
• (2a) Property unsatisfiability – via class axioms
<owl:Class rdf:about="&ebu;BusinessObject">
… <owl:disjointWith rdf:resource="&ebu;Resource"/> …
</owl:Class>
• Because of (1) and (2a), the property description could not be
used
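To make the interaction between (1) and (2a) concrete, here is a minimal Python sketch of the inference involved. It is a toy checker, not an OWL reasoner. It assumes that MediaResource is a subclass of Resource; the slide does not show that axiom, but some such link is needed for the clash. The individual ex:prog1 is hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Axioms quoted on the slide.
DOMAINS: Dict[str, List[str]] = {
    "description": ["BusinessObject", "MediaResource"],
}
DISJOINT: Set[FrozenSet[str]] = {frozenset({"BusinessObject", "Resource"})}

# ASSUMPTION (not shown on the slide): MediaResource is a kind of Resource.
SUBCLASS_OF: Dict[str, str] = {"MediaResource": "Resource"}


def with_superclasses(cls: str) -> Set[str]:
    """Return cls together with all of its (transitive) superclasses."""
    result = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.add(cls)
    return result


def inferred_types(triples: List[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Apply every domain axiom: the subject gets ALL declared domains."""
    types: Dict[str, Set[str]] = {}
    for subject, prop, _ in triples:
        for domain in DOMAINS.get(prop, []):
            types.setdefault(subject, set()).update(with_superclasses(domain))
    return types


def clashes(types: Dict[str, Set[str]]) -> List[Tuple[str, Tuple[str, ...]]]:
    """List individuals that end up in two classes declared disjoint."""
    found = []
    for individual, classes in sorted(types.items()):
        for pair in DISJOINT:
            if pair <= classes:
                found.append((individual, tuple(sorted(pair))))
    return found


def demo():
    # Hypothetical data: one asset with a description.
    data = [("ex:prog1", "description", "A news bulletin")]
    types = inferred_types(data)
    return types, clashes(types)
```

Under these assumptions, any asset with a description is inferred to be both a BusinessObject and a Resource, which the disjointness axiom forbids. Declaring the domain as a union of the two classes would avoid this, because the subject would only need to belong to one of them.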
Problems
• (2b) Property unsatisfiability – role hierarchies and datatypes
• Duration has the range xsd:string
• The subproperties of duration have other ranges (e.g., double in
the case of duration in edit units)
• Because each subproperty also inherits the range of the
superproperty, every value of such a subproperty must be
a string and a double at the same time. This type conflict
results in a contradiction.
• With (2a) and (2b) we identified 40 properties that lead to
problems.
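The range conflict in (2b) can be sketched in a similar way. This toy Python check collects the ranges a property inherits through its superproperties and flags properties whose values would have to belong to more than one datatype. The name durationEditUnits is a placeholder for "duration in edit units", not necessarily the identifier used in the ontology. Treating two different datatype names as conflicting is a simplification: it holds for xsd:string and xsd:double, whose value spaces do not overlap, but not for every pair of datatypes.

```python
from typing import Dict, List, Optional, Set

# Range axioms as described on the slide.
RANGES: Dict[str, str] = {
    "duration": "xsd:string",
    "durationEditUnits": "xsd:double",  # placeholder name, see lead-in
}
SUBPROPERTY_OF: Dict[str, str] = {"durationEditUnits": "duration"}


def effective_ranges(prop: str) -> Set[str]:
    """Collect the property's own range plus ranges inherited from superproperties."""
    ranges: Set[str] = set()
    current: Optional[str] = prop
    while current is not None:
        if current in RANGES:
            ranges.add(RANGES[current])
        current = SUBPROPERTY_OF.get(current)
    return ranges


def unsatisfiable_properties() -> List[str]:
    """Flag properties whose values must lie in more than one datatype.

    Simplification: any two distinct datatype names count as a conflict.
    """
    return sorted(p for p in RANGES if len(effective_ranges(p)) > 1)


if __name__ == "__main__":
    print(unsatisfiable_properties())
```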
Problems
• (3) Inconsistencies between formal and informal definitions
• BusinessObject is defined as: "An image, a document, an annotation
[…], a tag […], or an audiovisual media resource […]. Other types of
BusinessObjects may be defined as subclasses."
• Resource is defined as: "A manifestation of a BusinessObject." and is
disjoint with BusinessObject, meaning no individual can be an
instance of both BusinessObject and Resource at the same time.
• The domain of title is BusinessObject, yet its definition is:
"Specifies the title or name given to the resource. […]"
Problems
• (4) User readable labels
• Many different properties have the same human-readable label,
which could confuse the end user – e.g., when generating an
interface.
• E.g., there were 11 properties with the label “Name”
• Some properties had empty labels
• (5) Roles – Loss of context
• Agents were related to Business Objects (BO)
• Agents were related to a Role
• But … a role did not relate to agents in relationship with a BO
• This led to a loss of context.
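The loss of context in (5) can be shown with a small Python sketch. The agent, role, and programme names are invented for illustration, and the Contribution record is only one possible way to keep the three pieces together; it is not necessarily the fix adopted in EBU Core.

```python
from typing import List, NamedTuple, Set, Tuple

# Binary modelling as described on the slide: two separate relations.
AGENT_TO_BO: Set[Tuple[str, str]] = {("alice", "news_1999"), ("alice", "quiz_2003")}
AGENT_TO_ROLE: Set[Tuple[str, str]] = {("alice", "presenter"), ("alice", "producer")}


def possible_roles(agent: str, bo: str) -> List[str]:
    """With binary relations only, every role of the agent is a candidate for every BO."""
    if (agent, bo) not in AGENT_TO_BO:
        return []
    return sorted(role for a, role in AGENT_TO_ROLE if a == agent)


class Contribution(NamedTuple):
    """One record tying agent, role and business object together (hypothetical)."""
    agent: str
    role: str
    bo: str


CONTRIBUTIONS = [
    Contribution("alice", "presenter", "news_1999"),
    Contribution("alice", "producer", "quiz_2003"),
]


def roles_in_context(agent: str, bo: str) -> List[str]:
    """With the three-way record, the role played in a specific BO is recoverable."""
    return sorted(c.role for c in CONTRIBUTIONS if c.agent == agent and c.bo == bo)


if __name__ == "__main__":
    print(possible_roles("alice", "news_1999"))    # ambiguous: both roles
    print(roles_in_context("alice", "news_1999"))  # only the role in that BO
```

With only the two binary relations, nothing says which role the agent played in which Business Object; the three-way record restores that link.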
Addressing the issues
• Problems were addressed over email.
• The discussions are “lost”, traces are only known to us …
• The ontology-engineering activities of EBU Core should adopt
appropriate methods and tools for collaboration.
• Participation of others
• Traceability (!)
• The ontology is still being developed as we go along, and we
have been able to make (parts of it) work…
Using the ontology
Conclusions and Recommendations
• RTÉ Archives aims at a wider reuse and repackaging of their
archival content on digital platforms through the innovative
use of Semantic and Linked Data technologies.
• We adopted the EBU Core OWL ontology for annotating the
television and radio archives, yet identified some issues.
• We collaborated on resolving those issues together with EBU
• However, we feel that appropriate collaborative methods and
tools should be adopted to facilitate the ontology-engineering
process and – more importantly – enable others
to participate AND have visible traceability of the decisions.
References
• D2RQ, http://d2rq.org/
• Digital Repository of Ireland, http://www.dri.ie/
• Insight, http://www.insight-centre.org/
• Jena TDB, http://jena.apache.org/documentation/tdb/
• RTÉ Archives, http://www.rte.ie/archives

