Are Linked Datasets fit for Open-domain Question Answering? A Quality Assessment
Source: http://lod-cloud.net/versions/2011-09-19/lod-cloud_colored.png
QA systems
Quality assessment of the LOD datasets
The answer lies here!
Digging into the QA system
Typical IR system performance measures
● Overall Performance
○ F1
○ Precision
○ Recall
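To make the overall measures concrete, here is a minimal Python sketch that scores one question by comparing the set of answers a QA system returns with a gold-standard answer set. The function names and example answer IRIs are hypothetical, and averaging per-question scores is only one convention; some benchmarks instead derive F1 from the averaged precision and recall.

```python
from typing import List, Set, Tuple


def precision_recall_f1(gold: Set[str], predicted: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall and F1 for a single question."""
    correct = len(gold & predicted)  # answers that are both returned and expected
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1


def macro_average(scores: List[Tuple[float, float, float]]) -> Tuple[float, float, float]:
    """Average per-question (precision, recall, F1); one of several conventions."""
    if not scores:
        return 0.0, 0.0, 0.0
    n = len(scores)
    p = sum(s[0] for s in scores) / n
    r = sum(s[1] for s in scores) / n
    f = sum(s[2] for s in scores) / n
    return p, r, f
```

For example, with gold answers {dbr:Berlin, dbr:Bonn} and returned answers {dbr:Berlin, dbr:Hamburg, dbr:Munich}, one of three returned answers is correct and one of two expected answers is found, giving P = 1/3, R = 1/2 and F1 = 0.4.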
Data- and component/module-oriented measures
● Search & retrieval module
○ Indexer
■ Top-K word accuracy; P@10, P@1000, etc.
○ Retriever
■ Ranking, re-ranking, MRR, etc.
● Preprocessing / linguistic analysis
○ NLP: POS tags, NER, etc.
○ Entity linking & annotation (semantics)
○ Relation extraction & annotation
■ Annotation accuracy/precision
■ Consistency, interlinking, etc.
● Query formulation
○ SPARQL conversion
■ Conversion accuracy/precision
● Datasource/knowledge base
○ Completeness
○ Data diversity
○ Trust and provenance
○ Coverage
○ Timeliness (up to date)
○ etc.
Evaluated in this study
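The retrieval-module measures named above (P@10, P@1000, MRR) can be illustrated with a short Python sketch. It assumes each query yields a ranked list of document or resource identifiers plus a set of relevant identifiers; the identifiers and function names are hypothetical. P@k here divides by k even when fewer than k items were retrieved, which is the usual convention.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """P@k: fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k  # divide by k even if fewer than k items were returned


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none was retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MRR over several queries, each given as (ranked list, relevant set)."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in runs) / len(runs)
```

For a query ranked d3, d1, d7, d2 with relevant items {d1, d2}, P@2 = 0.5 and the reciprocal rank is 1/2; adding a second query whose first relevant item sits at rank 3 gives MRR = (1/2 + 1/3) / 2 = 5/12.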
owl:DatatypeProperty
dc:creator dc:publisher
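The vocabulary terms on this slide (owl:DatatypeProperty, dc:creator, dc:publisher) hint at what the semantic-accuracy and provenance checks inspect. The Python sketch below is an assumption-laden illustration, not the implementation used in the study: it assumes basic provenance means the dataset resource carries dc:creator or dc:publisher, and that a property is misused when an owl:DatatypeProperty has a non-literal object or an owl:ObjectProperty has a literal object. Triples are plain (subject, predicate, object) string tuples with prefixed names, literals are recognised by a leading double quote, and all function names are hypothetical.

```python
from typing import Iterable, Tuple

# A triple is (subject, predicate, object); prefixed names keep the example short.
Triple = Tuple[str, str, str]

# Assumption for this sketch: these properties count as "basic provenance".
BASIC_PROVENANCE_PROPERTIES = {"dc:creator", "dc:publisher"}


def is_literal(term: str) -> bool:
    """Simplification: literals are written with a leading double quote."""
    return term.startswith('"')


def has_basic_provenance(triples: Iterable[Triple], dataset: str) -> int:
    """Return 1 if the dataset resource has dc:creator or dc:publisher, else 0."""
    for s, p, _o in triples:
        if s == dataset and p in BASIC_PROVENANCE_PROPERTIES:
            return 1
    return 0


def correct_property_usage_ratio(triples: Iterable[Triple]) -> float:
    """Share of uses of declared OWL properties whose object has the expected kind.

    A use counts as misused when an owl:DatatypeProperty has a non-literal
    object or an owl:ObjectProperty has a literal object; 1.0 means no misuse.
    """
    all_triples = list(triples)
    datatype_props = {s for s, p, o in all_triples
                      if p == "rdf:type" and o == "owl:DatatypeProperty"}
    object_props = {s for s, p, o in all_triples
                    if p == "rdf:type" and o == "owl:ObjectProperty"}
    checked = 0
    misused = 0
    for _s, p, o in all_triples:
        if p in datatype_props:
            checked += 1
            if not is_literal(o):
                misused += 1
        elif p in object_props:
            checked += 1
            if is_literal(o):
                misused += 1
    return 1.0 if checked == 0 else 1 - misused / checked
```

Treating "no declared properties used" as 1.0 (nothing misused) is a design choice of this sketch; a real metric might report such a case differently.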
Figure: DBpedia data slice sizes (in MB)
Figure: Wikidata data slice sizes (in MB)
DBPEDIA SLICE ASSESSMENT RESULTS
Dimension / Metric                              DB_Rest    DB_Poli    DB_Film    DB_Soc
Availability
  EstimatedDereferenceabilityMetric             0.013      0.013      0.012      0.012
  EstimatedDereferenceabilityForwardLinksMetric 0.027      0.027      0.027      0.027
  NoMisreportedContentTypesMetric               0          1          1          1
  RDFAvailabilityMetric                         0          0          0          0
  EndPointAvailabilityMetric                    0          0          0          0
Interlinking
  EstimatedInterlinkDetectionMetric             -          -          -          -
  EstimatedLinkExternalDataProviders            -          -          -          -
  EstimatedDereferenceBackLinks                 0.012      0.014      0.015      0.022
Semantic accuracy
  OntologyHijacking                             1          1          1          1
  MisusedOwlDatatypeOrObjectProperties          1          1          1          1
Data diversity
  HumanReadableLabelling                        0.953      0.985      0.997      1
  MultipleLanguageUsageMetric                   1          2          3          3
Trust and Provenance
  Basic Provenance                              0          0          0          0
  Extended Provenance                           0          0          0          0
  Provenance Richness                           0          0          0          0
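The Data diversity rows of the table above rest on two simple ideas: how many resources carry a human-readable label, and how many languages their tagged literals use. The Python sketch below illustrates both under stated assumptions: the set of label properties, the tuple-based triple representation with literals written as '"text"@lang', and the per-resource averaging are choices made for illustration, not necessarily how the study's tool computes the labelling and multiple-language-usage metrics. All names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Assumption: these properties count as human-readable labels in this sketch.
LABEL_PROPERTIES = {"rdfs:label", "skos:prefLabel"}


def labelling_ratio(triples: Iterable[Triple]) -> float:
    """Share of subjects that carry at least one label property."""
    all_triples = list(triples)
    subjects = {s for s, _p, _o in all_triples}
    labelled = {s for s, p, _o in all_triples if p in LABEL_PROPERTIES}
    return len(labelled) / len(subjects) if subjects else 0.0


def language_tag(term: str) -> Optional[str]:
    """Return the tag of a literal written as '"text"@tag', else None."""
    if term.startswith('"') and '"@' in term:
        return term.rsplit('"@', 1)[1]
    return None


def average_languages_per_resource(triples: Iterable[Triple]) -> float:
    """Mean number of distinct language tags per subject with tagged literals."""
    tags: Dict[str, Set[str]] = {}
    for s, _p, o in triples:
        tag = language_tag(o)
        if tag is not None:
            tags.setdefault(s, set()).add(tag)
    if not tags:
        return 0.0
    return sum(len(t) for t in tags.values()) / len(tags)
```

With labels "Bonn"@en and "Bonn"@de on one resource, "Germany"@en on a second, and an unlabelled third subject, the labelling ratio is 2/3 and the average is 1.5 languages per tagged resource.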
WIKIDATA SLICE ASSESSMENT RESULTS
Dimension / Metric                              Wiki_Rest  Wiki_Poli  Wiki_Film  Wiki_Soc
Availability
  EstimatedDereferenceabilityMetric             0.051      0.063      0.048      0.062
  EstimatedDereferenceabilityForwardLinksMetric 0.093      0.053      0.050      0.064
  NoMisreportedContentTypesMetric               0          1          0          1
  RDFAvailabilityMetric                         0          0          0          0
  EndPointAvailabilityMetric                    0          0          0          0
Interlinking
  EstimatedInterlinkDetectionMetric             -          -          -          -
  EstimatedLinkExternalDataProviders            5          11         9          8
  EstimatedDereferenceBackLinks                 0.013      0.098      0.089      0.083
Semantic accuracy
  OntologyHijacking                             1          1          1          1
  MisusedOwlDatatypeOrObjectProperties          1          1          1          1
Data diversity
  HumanReadableLabelling                        0.175      0.076      0.091      0.102
  MultipleLanguageUsageMetric                   2          3          2          3
Trust and Provenance
  Basic Provenance                              0          0          0          0
  Extended Provenance                           0          0          0          0
  Provenance Richness                           0.055      0.083      0.010      0.025
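The EstimatedLinkExternalDataProviders row above counts, roughly, how many other data providers a slice links to. A minimal Python sketch of that idea follows, assuming a "provider" is approximated by the host name of an object IRI that differs from the dataset's own host; real metrics may use pay-level domains and sampling instead, and the function name and example triples are hypothetical.

```python
from typing import Iterable, Set, Tuple
from urllib.parse import urlparse

Triple = Tuple[str, str, str]


def external_providers(triples: Iterable[Triple], local_host: str) -> Set[str]:
    """Hosts of object IRIs that differ from the dataset's own host.

    Only objects are inspected, so vocabulary namespaces used as predicates
    (e.g. owl:sameAs) are not counted as linked data providers.
    """
    local = local_host.lower()
    hosts: Set[str] = set()
    for _s, _p, o in triples:
        if o.startswith(("http://", "https://")):
            host = urlparse(o).netloc.lower()
            if host and host != local:
                hosts.add(host)
    return hosts
```

Applied to a slice, `len(external_providers(slice_triples, "www.wikidata.org"))` would give a count comparable in spirit to the 5 to 11 providers reported for the Wikidata slices, though not necessarily computed the same way.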
QUESTIONS?
<hthakkar@uni-bonn.de>
