06/01/17 Heiko Paulheim 1
Data-driven Joint Debugging
of the DBpedia Mappings and Ontology
Towards Addressing the Causes
instead of the Symptoms of Data Quality in DBpedia
Heiko Paulheim
Motivation
• Various works on finding errors in Knowledge Graphs
– 2017 survey: 17 approaches
– 15/17 are evaluated on DBpedia
• Question:
– How does DBpedia benefit from those works?
H. Paulheim: Knowledge Graph Refinement – A Survey of Approaches and Evaluation Methods. SWJ 8(3), 2017
Motivation
• What comes out of those research works
– A list of (possibly) wrong statements
– Source code for finding erroneous statements
– ...
Motivation
• Possible option 1: Remove erroneous triples from DBpedia
• Challenges
– May remove correct axioms, may need thresholding
– Needs to be repeated for each release
– Needs to be materialized on all of DBpedia
[Figure: pipeline with Wikipedia, DBpedia Mappings Wiki, DBpedia Extraction Framework, and a post filter]
Motivation
• Materialized on full DBpedia: 8/15 approaches
Motivation
• Possible option 2: Integrate into DBpedia Extraction Framework
• Challenges
– Development workload
– Some approaches are not fully automated (technically or conceptually)
– Scalability
[Figure: pipeline with Wikipedia, DBpedia Mappings Wiki, and DBpedia Extraction Framework plus filter module]
Motivation
• Scalability analyzed: 6/15 approaches
– Disclaimer: does not imply that it is actually scalable!
Motivation
• Do we have a third option?
– Paulheim & Gangemi (2015): >95% of all inconsistencies in DBpedia boil down to 40 common root causes
– Disclaimer: not equivalent to “wrong statements”
[Figure: pipeline with Wikipedia, DBpedia Mappings Wiki, and DBpedia Extraction Framework, plus inconsistency detection and identification of suspicious mappings and ontology constructs]
H. Paulheim, A. Gangemi: Serving DBpedia with DOLCE – More than Just Adding a Cherry on Top (ISWC 2015)
Approach
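[Figure: example graph (an Obama-free example!)]
– dbr:Agua_Caliente_Airport dbo:operator dbr:San_Diego_County,_California
– dbr:Agua_Caliente_Airport rdf:type dbo:Airport; foaf:name “Agua Caliente Airport”
– dbr:San_Diego_County,_California rdf:type dbo:Settlement
– dbo:operator rdfs:range dbo:Organisation
– Classes shown: dbo:Airport, dbo:Infrastructure, dbo:ArchitecturalStructure, dbo:Settlement, dbo:PopulatedPlace, dbo:Place, dbo:Organisation, dbo:Agent, with an owl:disjointWith axiom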
Approach
• Find inconsistencies in extracted statements
– Using the DBpedia ontology and DOLCE as a top-level ontology
• Trace them back to mappings
– In the example, there are three candidates
• Property mapping to the predicate dbo:operator
• Class mapping (subject) to dbo:Airport
• Class mapping (object) to dbo:Settlement
• Unfortunately, provenance information for DBpedia is not that fine-grained
– i.e., we do not know which mapping was responsible for which statement in the end
– First step: heuristic reconstruction
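The three candidates named for the introductory example can be enumerated mechanically. The following is a minimal sketch, not the author's implementation; the `Statement` type and the function name are hypothetical and only illustrate which mapping elements a single inconsistent statement points to.

```python
from typing import List, NamedTuple, Tuple


class Statement(NamedTuple):
    """Hypothetical view of one extracted statement after type lookup."""
    subject_class: str  # class assigned to the subject by a class mapping
    predicate: str      # property assigned by a property mapping
    object_class: str   # class assigned to the object by a class mapping


def candidate_mapping_elements(stmt: Statement) -> List[Tuple[str, str]]:
    """Return (kind, ontology term) pairs that may have caused the statement."""
    return [
        ("property mapping", stmt.predicate),
        ("class mapping (subject)", stmt.subject_class),
        ("class mapping (object)", stmt.object_class),
    ]


# The introductory example: an airport operated by a settlement.
example = Statement("dbo:Airport", "dbo:operator", "dbo:Settlement")
candidates = candidate_mapping_elements(example)
```

Without fine-grained provenance, all three candidates have to be counted as involved in the inconsistency.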
Approach: Identifying Mapping Elements
• We use the RML representation of the Mapping Wiki contents [1]
– https://www.w3.org/TR/r2rml/
• Mapping elements highlighted in the RML example:
– Wikipedia page, DBpedia resource
– DBpedia ontology class
– DBpedia ontology property
[1] Dimou et al.: DBpedia Mappings Quality Assessment (ISWC Poster 2016)
Approach (ctd.)
• After heuristically reconstructing the mappings, we can determine
– How often is a mapping element involved in an inconsistency?
– How often is a mapping element used, but not involved in an inconsistency?
Approach (ctd.)
• Using the two counters c_m and i_m, we can compute two scores for the hypothesis that m is problematic
• Borrowed from Association Rule Mining (support and confidence)
[Equations for support and confidence]
• N is the total number of statements in DBpedia
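The formulas themselves did not survive in this transcript. A minimal sketch, assuming the usual association-rule reading in which i_m counts statements where m is involved in an inconsistency and c_m counts statements where m is used without one, so that support = i_m / N and confidence = i_m / (i_m + c_m). These definitions, the function name, and the toy data are assumptions, not taken from the slides.

```python
from collections import Counter


def score_mapping_elements(statements, n_total):
    """statements: iterable of (mapping_elements, is_inconsistent) pairs."""
    consistent = Counter()    # c_m: used, not involved in an inconsistency
    inconsistent = Counter()  # i_m: involved in an inconsistency
    for elements, is_inconsistent in statements:
        for m in elements:
            if is_inconsistent:
                inconsistent[m] += 1
            else:
                consistent[m] += 1
    scores = {}
    for m in set(consistent) | set(inconsistent):
        i_m, c_m = inconsistent[m], consistent[m]
        scores[m] = {
            "support": i_m / n_total,           # assumed: i_m / N
            "confidence": i_m / (i_m + c_m),    # assumed: i_m / (i_m + c_m)
        }
    return scores


# Toy data: four statements with their reconstructed mapping elements.
toy_statements = [
    ({"dbo:operator", "dbo:Airport", "dbo:Settlement"}, True),
    ({"dbo:operator", "dbo:Airport", "dbo:Organisation"}, False),
    ({"dbo:Airport"}, False),
    ({"dbo:operator", "dbo:Settlement"}, True),
]
scores = score_mapping_elements(toy_statements, n_total=len(toy_statements))
```

In this toy data, dbo:Settlement appears only in inconsistent statements and therefore gets the highest confidence.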
Identifying Interesting Problems
• Hypothesis: mapping elements with high support and high confidence hint at problems worth investigating
– High support: fixing the issue would fix a lot of individual statements
– High confidence: this mapping element actually hints at the root cause
• i.e., fixing this does not break many other things
• Unfortunately, the two scores are on different scales
– Difficult to use the average, harmonic mean, or the like
– Support: μ = 0.0002, σ = 0.003
– Confidence: μ = 0.114, σ = 0.260
• Fix: use logarithmic support instead
– LogSupport: μ = 0.179, σ = 0.139
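The slides do not spell out how logarithmic support is computed. One plausible form, shown here purely as an assumption, is log(i_m + 1) / log(N + 1), which stays in [0, 1] but spreads small supports over a much wider range. The value of `n_total` is a toy number, not the size of DBpedia.

```python
import math


def log_support(i_m: int, n_total: int) -> float:
    """Assumed form of logarithmic support: log(i_m + 1) / log(N + 1)."""
    return math.log(i_m + 1) / math.log(n_total + 1)


n_total = 1_000_000            # toy value for N
plain = 1_000 / n_total        # plain support of a mapping element with i_m = 1,000
logged = log_support(1_000, n_total)  # roughly one half on the log scale
```

With this choice, an element involved in one thousandth of all statements moves from the very bottom of the scale to about its middle, which makes it comparable to confidence.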
Identifying Interesting Problems (ctd.)
• Inspect mappings that have a high harmonic mean of confidence and log support
[Figure: plot with marks at 0.25, 0.5, and 0.75, labeled “more interesting”]
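A minimal sketch of the ranking step, assuming each mapping element already has a confidence and a log support value. The element names and numbers are toy data, and the function names are hypothetical.

```python
def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two non-negative scores; 0 if both are 0."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


def rank_by_score(elements):
    """elements: dict name -> (confidence, log_support); best score first."""
    scored = {m: harmonic_mean(c, ls) for m, (c, ls) in elements.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


toy = {
    "m1": (0.9, 0.6),   # high confidence, high log support
    "m2": (0.9, 0.05),  # high confidence, but hardly any affected statements
    "m3": (0.2, 0.7),   # many affected statements, but low confidence
}
ranking = rank_by_score(toy)
```

The harmonic mean rewards elements that do well on both scores, so m1 ranks first while m2 and m3 are each pulled down by their weaker score.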
Example Findings
• Case 1: Mapping to wrong property
• Example:
– branch in infobox military unit is mapped to dbo:militaryBranch
• but dbo:militaryBranch has dbo:Person as its domain
– Correction: dbo:commandStructure
– Overall score: 0.721
– Affects 12,172 statements (31% of all dbo:militaryBranch statements)
Example Findings
• Case 2: Mappings that should be removed
• Example:
– dbo:picture
– Most of the statements are inconsistent (64.5% places, 23.0% persons)
– Reason: statements are extracted from picture captions
dbr:Brixton_Academy dbo:picture dbr:Brixton .
dbr:Justify_My_Love dbo:picture dbr:Madonna_(entertainer) .
Example Findings
• Case 3: Ontology problems (domain/range)
• Example 1:
– Populated places (e.g., cities) are used both as places and as organizations
– For some properties, the range is only one of the two
• e.g., dbo:operator (see introductory example)
– Polysemy should be reflected in the ontology
• Example 2:
– dbo:architect, dbo:designer, dbo:engineer etc. have dbo:Person as their range
– Significant fractions (8.6%, 7.6%, 58.4%, resp.) have a dbo:Organisation as object
– Range should be broadened
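Fractions like those quoted for dbo:architect can be obtained by counting, per property, how many statements have an object of a given type. A minimal sketch on toy data follows; the function name and the input format are assumptions, and the numbers are not those from the slides.

```python
def fraction_with_object_type(statements, prop, object_type):
    """Share of statements using prop whose object carries object_type."""
    types_per_statement = [types for p, types in statements if p == prop]
    if not types_per_statement:
        return 0.0
    hits = sum(1 for types in types_per_statement if object_type in types)
    return hits / len(types_per_statement)


# Toy statements as (property, set of object types) pairs.
toy = [
    ("dbo:architect", {"dbo:Person"}),
    ("dbo:architect", {"dbo:Organisation"}),
    ("dbo:architect", {"dbo:Person"}),
    ("dbo:architect", {"dbo:Person"}),
    ("dbo:designer", {"dbo:Organisation"}),
]
share = fraction_with_object_type(toy, "dbo:architect", "dbo:Organisation")
```

A large share for a type outside the declared range suggests that the range, rather than the individual statements, is what needs fixing.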
Example Findings
• Case 4: Missing properties
• Example 1:
– dbo:president links an organization to its president
– Majority use (8,354, or 76.2%): links a person to the president s/he served for
• Example 2:
– dbo:instrument links an artist to the instrument s/he plays
– Prominent alternative use (3,828, or 7.2%): links a genre to its characteristic instrument
Obama example alert!
Future Work
• Classify ontology, mapping, and other errors automatically
– Currently ongoing: using different language editions of DBpedia
• Heuristic:
– Problem present in many languages → ontology problem
– Problem present only in one language → mapping problem
• From post-processing to live processing
– e.g., on-the-fly validation in DBpedia Mappings Wiki
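The language-edition heuristic above can be written as a small decision rule. This is a minimal sketch; the slides do not say how many languages count as “many”, so the `min_languages` threshold and the function name are assumptions.

```python
def classify_problem(languages_with_problem, min_languages=2):
    """Classify a suspicious element by the language editions showing the problem."""
    if not languages_with_problem:
        return "no problem observed"
    if len(languages_with_problem) >= min_languages:
        # Seen across editions: the shared ontology is the likely cause.
        return "ontology problem"
    # Seen in a single edition: that edition's mapping is the likely cause.
    return "mapping problem"


many = classify_problem({"en", "de", "fr"})
single = classify_problem({"en"})
```

Because mappings are maintained per language edition while the ontology is shared, a problem confined to one edition points at that edition's mapping.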
Take Aways
• Fixing bugs in knowledge graphs is nice
– But often a one-time solution
– Preserving the efforts is hard
• Proposed solution
– Identify and address the root problem
– Scoring mechanism helps identify interesting problems
– Preserving the efforts by eliminating the root causes
• Provenance matters!
– The more we know about how a statement gets into a knowledge graph, the better we can automate the error analysis
