Serving DBpedia with DOLCE
More than Just Adding a Cherry on Top
[Figure: DBpedia Extraction Framework, Mappings Wiki, DOLCE]
Heiko Paulheim and Aldo Gangemi
10/13/15 Heiko Paulheim and Aldo Gangemi 2
DOLCE Mappings: Served Since DBpedia 2014
More than Just a Cherry on Top
• DOLCE adds a layer of formalization
– high-level axioms
– additional domain and range restrictions
– fundamental disjointness (e.g., physical object vs. social object)
• Enriches the DBpedia ontology
• Can be used for consistency checking
More than Just a Cherry on Top
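[Figure: example inconsistency. DBpedia instances: Tim Berners-Lee award Royal Society; Royal Society is an Organisation. DBpedia ontology: award has range Award. DOLCE ontology: Description, Social Person, Social Agent; Description disjoint with Social Agent. DBpedia classes are linked to DOLCE classes via subclass of and equivalent class axioms.]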
DBpedia in a Nutshell
• Raw extraction from Wikipedia infoboxes
• Infobox types and keys mapped to ontology
– Crowdsourcing process (aka Mappings Wiki)
– 735 classes
– ~2,800 properties
• Almost no disjointness
– only 24 disjointness axioms
– many of those are corner cases
• MovingWalkway vs. Person
DOLCE in a Nutshell
• A top-level ontology
• Defines top-level classes and relations
• Includes a rich axiomatization
DOLCE in a Nutshell
• Original DOLCE ontologies were too heavyweight
– thus: hardly used on the Semantic Web
– remember: a little semantics goes a long way!
• DOLCE-Zero
– Simplified version
– Contains both DOLCE and D&S (Descriptions and Situations)
– D&S introduces some high-level design patterns
Systematic vs. Individual Errors in DBpedia
• Systematic errors
– occur frequently and follow a pattern
– e.g., organizations are frequently used as objects of the award relation
• are likely to have a common root cause
– wrong mapping from infobox to ontology
– error in the extraction code
– ...
[Figure: the Tim Berners-Lee award Royal Society example shown earlier, where the Organisation Royal Society is used as an object of award, whose range Award is a Description, which is disjoint with Social Agent]
Overall Workflow
• Overall workflow: for each statement
– add the statement plus all subject/object types to the ontology
– check consistency
– present inconsistent statements and explanations for inspection
[Figure: workflow from RDF Graph to Statements + Types, through Reasoning, to Inconsistent Statements + explanations, which are presented to the User]
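The per-statement check can be sketched in Python under strong simplifying assumptions. The schema below is a hypothetical toy reading of the Tim Berners-Lee example (which axioms link dbo:Award and dbo:Organisation to the DOLCE classes is assumed), only subclass, domain/range and disjointness axioms are handled, and an "explanation" is just the clashing pair of classes rather than the set of axioms an OWL reasoner would return. This is an illustration, not the authors' implementation.

```python
from itertools import combinations

# Hypothetical toy schema: one possible reading of the Tim Berners-Lee example.
# Equivalences are treated as one-way subclass links, which suffices here.
SUBCLASS_OF = {
    "dbo:Award": {"d0:Description"},
    "dbo:Organisation": {"d0:SocialPerson"},
    "d0:SocialPerson": {"d0:SocialAgent"},
}
DOMAIN = {}                              # property -> classes implied for the subject
RANGE = {"dbo:award": {"dbo:Award"}}     # property -> classes implied for the object
DISJOINT = {frozenset({"d0:Description", "d0:SocialAgent"})}


def superclasses(cls):
    """Return cls together with all of its transitive superclasses."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(SUBCLASS_OF.get(c, ()))
    return seen


def check_statement(triple, types):
    """Add one statement plus the asserted types of its subject and object, and
    report every entity whose inferred types contain a disjoint pair."""
    s, p, o = triple
    asserted = {}
    asserted.setdefault(s, set()).update(types.get(s, ()), DOMAIN.get(p, ()))
    asserted.setdefault(o, set()).update(types.get(o, ()), RANGE.get(p, ()))
    explanations = []
    for entity, classes in asserted.items():
        closure = set()
        for c in classes:
            closure |= superclasses(c)
        for a, b in combinations(sorted(closure), 2):
            if frozenset({a, b}) in DISJOINT:
                explanations.append((entity, a, b))
    return explanations


def find_inconsistent(statements, types):
    """Workflow loop: keep each inconsistent statement with its explanations."""
    results = []
    for triple in statements:
        explanations = check_statement(triple, types)
        if explanations:
            results.append((triple, explanations))
    return results
```

For example, checking ("dbr:Tim_Berners-Lee", "dbo:award", "dbr:Royal_Society") with dbr:Royal_Society typed as dbo:Organisation reports a clash between d0:Description and d0:SocialAgent on dbr:Royal_Society, while the same statement with an object typed dbo:Award passes.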
Overall Workflow
• Inspecting a single statement takes 2.6 seconds
– on a standard laptop
• DBpedia 2014 has 15,001,543 statements
– in the dbpedia-owl namespace
→ consistency checking would take 451 days
• Solution
– cache results for signatures (predicate + subject types + object types)
– there are only 34,554 different signatures!
[Figure: the same workflow (RDF Graph → Statements + Types → Reasoning → Inconsistent Statements + explanations → User), now with an added Cache]
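A minimal sketch of caching by signature, under stated assumptions. The per-statement check only sees the predicate and the types of subject and object, so (assuming the ontology contains no axioms about specific individuals) two statements with the same signature must get the same verdict, and the reasoner is needed only once per signature. Keeping the 2.6 s per check, the reasoning time drops from 15,001,543 × 2.6 s ≈ 39.0 million s ≈ 451 days to 34,554 × 2.6 s ≈ 89,840 s, roughly 25 hours. The names `signature`, `check_all` and the callback `is_consistent` (a stand-in for the reasoner call) are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Signature = Tuple[str, FrozenSet[str], FrozenSet[str]]


def signature(triple: Triple, types: Dict[str, Set[str]]) -> Signature:
    """Signature of a statement: predicate + subject types + object types."""
    s, p, o = triple
    return (p, frozenset(types.get(s, ())), frozenset(types.get(o, ())))


def check_all(statements: Iterable[Triple],
              types: Dict[str, Set[str]],
              is_consistent: Callable[[Signature], bool]) -> Tuple[List[Triple], int]:
    """Check every statement, calling the expensive check once per signature.

    Returns the inconsistent statements and the number of distinct signatures,
    which equals the number of calls to is_consistent.
    """
    cache: Dict[Signature, bool] = {}
    inconsistent: List[Triple] = []
    for triple in statements:
        sig = signature(triple, types)
        if sig not in cache:
            cache[sig] = is_consistent(sig)  # only cache misses reach the reasoner
        if not cache[sig]:
            inconsistent.append(triple)
    return inconsistent, len(cache)
```

Frozensets are used for the type sets so that the signature is hashable and independent of the order in which types are listed.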
Overall Workflow
• Overall, we find 3,654,255 inconsistent statements (24.4%)
– cf.: only 97,749 (0.7%) without DOLCE
• Too much to inspect!
– We are looking for systematic errors
– Cluster explanations w/ DBSCAN
– Each cluster represents a systematic error
[Figure: the workflow extended with a Clustering step: RDF Graph → Statements + Types → Reasoning (with Cache) → Inconsistent Statements + explanations → Clustering → Clusters → User]
Clustering Inconsistent Statements
• Each explanation as a binary vector
– with 0/1 for all axioms involved in the explanations
– 1,467 axioms (dimensions) in total
• DBSCAN
– Manhattan distance (i.e., number of axioms added/removed)
– MinPts=100 (minimum frequency for a systematic error)
– ε=4 (neighboring explanations differ by at most four added/removed axioms, e.g., two axioms exchanged for two others)
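A minimal DBSCAN over explanation vectors might look like the sketch below. Each explanation is stored as the set of axiom ids it uses, so the Manhattan distance between two 0/1 vectors is simply the size of the symmetric difference of the two sets. The neighborhood search is naive and quadratic, so this only illustrates the parameters and is not meant to scale to millions of statements; the function names are illustrative.

```python
from typing import FrozenSet, List, Optional

NOISE = -1


def manhattan(a: FrozenSet[int], b: FrozenSet[int]) -> int:
    """Manhattan distance between two 0/1 vectors stored as sets of axiom ids:
    the number of axioms that occur in exactly one of the two explanations."""
    return len(a ^ b)


def dbscan(points: List[FrozenSet[int]], eps: int, min_pts: int) -> List[int]:
    """Textbook DBSCAN with a naive O(n^2) neighborhood search.

    Returns one label per point: a cluster id (0, 1, ...) or NOISE.
    A point's neighborhood includes the point itself.
    """
    n = len(points)
    labels: List[Optional[int]] = [None] * n

    def neighbors(i: int) -> List[int]:
        return [j for j in range(n) if manhattan(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may become a border point later
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, but is not expanded
                continue
            if labels[j] is not None:
                continue               # already assigned to a cluster
            labels[j] = cluster
            j_seeds = neighbors(j)
            if len(j_seeds) >= min_pts:
                queue.extend(j_seeds)  # j is a core point too: keep growing
    return labels
```

With the parameters from the slide, the call would be dbscan(vectors, eps=4, min_pts=100). scikit-learn's sklearn.cluster.DBSCAN(eps=4, min_samples=100, metric="manhattan") implements the same algorithm on a 0/1 matrix and, like this sketch, counts a point as part of its own neighborhood and labels noise as -1.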
Major Systematic Errors
• Inspection of the top 40 clusters
– they contain 96% of all inconsistent statements
• Overcommitment (19) – using properties in different contexts
– e.g., dbo:team is defined as a relation between persons and sports teams
– but is also used to relate participating teams to events
– fix: relax domain/range constraints, or introduce new properties
• Metonymy (11) – ambiguous language terms
– e.g., instances of dbo:Species include both species and individual animals
– hard to refactor
Major Systematic Errors
• Misalignment (5) – classes/properties mapped to the wrong concept in DOLCE
– occurs occasionally when intended and actual use differ
– e.g., dbo:commander is more frequently used with events (e.g., battles) than with military units, so d0:hasParticipant would be a better alignment than d0:coparticipatesWith
– fix: change alignment
• Version branching (3) – semantics of dbo concepts have changed
– e.g., dbo:team relates career stations and teams in DBpedia 3.9, but athletes and teams in DBpedia 2014
– fix: change alignment
A Look at the Long Tail
• DBSCAN identifies clusters and “noise”
– i.e., statements that are not contained in clusters
• Manual inspection of a sample of 100 instances
– 64 are erroneous
– 30 are false positives (i.e., correct statements flagged as inconsistent)
– 6 are questionable
• Typical error sources in the long tail
– are expected to be cross-cutting, i.e., occurring with various classes and properties
A Look at the Long Tail
• Typical error sources in the long tail
• Links in longer text (23)
– e.g., dbr:Cosmo_Kramer dbo:occupation dbr:Bagel .
• Wrong links in Wikipedia (9)
– e.g., dbr:Stone_(band) dbo:associatedMusicalArtist dbr:Dementia .
– Dementia should link to the band, not the disease
• Redirects (7)
– e.g., dbr:Ben_Casey dbo:company dbr:Bing_Crosby .
– the link target Bing_Crosby_Productions redirects to Bing_Crosby
• Links with anchors (6)
– e.g., dbr:Tim_Berners-Lee dbo:award dbr:Royal_Society .
– the original link target is Royal_Society#Fellows
– anchors are ignored by DBpedia
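The redirect and anchor cases can be illustrated with a toy link resolver: once the anchor is dropped and redirects are followed, a link to Royal_Society#Fellows becomes dbr:Royal_Society and a link to Bing_Crosby_Productions becomes dbr:Bing_Crosby, which is how the type clashes above arise. This is a hypothetical sketch, not code from the DBpedia Extraction Framework; the redirect table is made up from the slide example.

```python
from typing import Dict

# Hypothetical redirect table (page title -> redirect target), taken from the slide example.
REDIRECTS: Dict[str, str] = {"Bing_Crosby_Productions": "Bing_Crosby"}


def resolve_link(target: str, redirects: Dict[str, str] = REDIRECTS) -> str:
    """Simplified link-to-resource mapping: drop any '#anchor', then follow redirects."""
    page, _, _anchor = target.partition("#")   # the anchor is discarded
    seen = set()
    while page in redirects and page not in seen:  # stop on redirect cycles
        seen.add(page)
        page = redirects[page]
    return "dbr:" + page
```

Both steps lose exactly the information that would have distinguished the intended target (the Fellows section, the production company), so the resulting statement points to a resource of the wrong type.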
Conclusions
• We have shown that
– DOLCE helps identify inconsistent statements
– Cluster analysis makes it possible to identify systematic errors
– User interaction is minimized
• we inspected one statement from each of the 40 clusters
• which together cover 3,497,068 affected statements
• Outcomes for future DBpedia versions
– DBpedia ontology changes
– Mapping changes
– DOLCE alignment changes
– Bug reports