Page 1 / 20
Survey on Challenges of
Question Answering
in the Semantic Web
Semantic Web journal 2016
Höffner et al.
Leipzig University, Institute of Computer Science, AKSW Group
DongGyun Hong (Saltlux Inc.)
2018. 11. 16
Page 2 / 20
Contents
1. Introduction
2. Methodology (to find SQA systems)
3. 7 Challenges
4. 7 Challenges in Adam QA
5. Conclusion
Page 3 / 20
Introduction
• Semantic question answering (SQA)
– Asking questions in natural language and receiving answers from an RDF
knowledge base.
• SQA systems
– Since natural language is complex and ambiguous, reliable SQA systems
require many different components.
– Instead of being built in a shared effort, however, many essential components
are redeveloped, which is an inefficient use of researchers' time and resources.
Page 4 / 20
Introduction
• Contributions
– Surveyed 72 publications about 62 systems developed from 2010 to 2015.
– Identified challenges faced by those approaches and collected solutions for
them from the 72 publications.
– Made recommendations on how to develop future SQA systems.
Page 5 / 20
Methodology
• Inclusion criteria
– Candidate 1: First 300 publications of Google Scholar search results
▪ Query: “‘question answering’ AND (‘Semantic Web’ OR ‘data web’)”
– Candidate 2: All publications in the proceedings of the target venues
▪ Target venues: ISWC, ESWC, WWW, NLDB, QALD challenge
• Exclusion Criteria
– Published before November 2010 or after July 2015
– Not related to SQA
• Result
– 72 publications describing 62 distinct SQA systems.
▪ (39 of them from candidate 1, 33 of them from candidate 2)
Page 6 / 20
7 Challenges
• Lexical Gap
• Ambiguity
• Multilingualism
• Complex Queries
• Distributed Knowledge
• Procedural, Temporal and Spatial Questions
• Templates
[Figure: number of publications per year per addressed challenge]
Page 7 / 20
Lexical Gap
• The vocabulary used in a question is different from the one used in
the labels of the knowledge base. (linking problem)
– Different forms of the same word
▪ Inflection (run <-> running, ran), misspellings (running <-> runnign, runing)
– Different words with similar meaning
▪ Synonyms (run <-> sprint)
▪ Hypernym-hyponym pairs (chemical process <-> photosynthesis)
– Different phrases of the same RDF property
▪ “What is the population of A?”, “How many people are there in A?” -> ‘population’
Page 8 / 20
Lexical Gap - Different forms of the same word
• String normalization
– Conversion to lower case or to base form
▪ Stemming, lemmatizing (running, ran -> run)
• Similarity functions
– A similarity function quantifies how close two strings are; a threshold decides whether they match
▪ Jaro-Winkler distance
▪ Edit distance
▪ Longest common substring
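The normalization and similarity steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the implementation of any surveyed system; the helper names and the threshold of 2 edits are assumptions chosen for the example, and real systems would add stemming or lemmatization on top of the lowercasing shown here.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace; stemming/lemmatizing would be a further step."""
    return " ".join(text.lower().split())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def longest_common_substring(a: str, b: str) -> str:
    """Longest contiguous block shared by a and b."""
    m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(
        0, len(a), 0, len(b)
    )
    return a[m.a:m.a + m.size]


def match_label(word, labels, max_distance=2):
    """Return KB labels within max_distance edits of word, closest first.

    max_distance is a hypothetical threshold picked for this sketch.
    """
    w = normalize(word)
    scored = [(edit_distance(w, normalize(label)), label) for label in labels]
    return [label for dist, label in sorted(scored) if dist <= max_distance]


print(match_label("Runnign", ["running", "sprint", "runner"]))
```

A lower threshold rejects more misspellings, a higher one starts to accept unrelated labels, so the value has to be tuned per knowledge base.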
Page 9 / 20
Lexical Gap - Different words with similar meaning
• Automatic Query Expansion
– Using additional labels from lexical databases such as WordNet
– Increases recall, but can introduce mismatches between merely related words
and thus decrease precision.
[Figure: WordNet]
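The recall/precision trade-off of query expansion can be seen in a toy sketch. The synonym table below is hand-written and hypothetical; it only stands in for a lexical database such as WordNet, and the function names are assumptions made for this example.

```python
# Hypothetical, hand-written synonym table standing in for a lexical database.
SYNONYMS = {
    "run": {"sprint", "operate"},
    "big": {"large", "populous"},
}


def expand(term):
    """Return the term together with its listed synonyms."""
    return {term} | SYNONYMS.get(term, set())


def lookup(term, kb_labels, expand_query=False):
    """Return the KB labels matched by the term, optionally after expansion."""
    candidates = expand(term) if expand_query else {term}
    return sorted(label for label in kb_labels if label in candidates)


kb_labels = {"sprint", "operate", "walk"}
print(lookup("run", kb_labels))                     # nothing matches without expansion
print(lookup("run", kb_labels, expand_query=True))  # more hits, not all of them intended
```

With expansion the question word "run" now reaches the label "sprint" (higher recall), but it also reaches "operate", which is only correct for a different sense of "run" (lower precision).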
Page 10 / 20
Lexical Gap - Different phrases of the same RDF property
• Pattern libraries
– BOA [Gerber et al.] generates natural language patterns for RDF predicates from a text corpus and a
knowledge base
▪ E.g. (:writing, “X wrote Y”), (:writer, “X is written by Y”), (:population, “How many people are there in X?”)
– PARALEX [Fader et al.]
[Figure: PARALEX examples of paraphrases from the QA dataset (WikiAnswers) and of lexical entries]
[Figure: PARALEX learning, mapping the natural language question “How big is nyc?” to the formal query Population(?, new-york)]
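How a pattern library is used at question time can be sketched as below. This is not how BOA or PARALEX learn or store their patterns; the regular expressions are hand-written, hypothetical stand-ins for learned patterns, and the output format (a triple pattern with a `?answer` variable) is an assumption made for the example.

```python
import re

# Hypothetical pattern library: (predicate, regex with one named group for the entity).
PATTERNS = [
    (":population", re.compile(r"^what is the population of (?P<x>.+?)\??$", re.I)),
    (":population", re.compile(r"^how many people are there in (?P<x>.+?)\??$", re.I)),
]


def to_triple_pattern(question):
    """Map a question to (entity, predicate, variable), or None if no pattern fits."""
    q = question.strip()
    for predicate, regex in PATTERNS:
        m = regex.match(q)
        if m:
            return (m.group("x"), predicate, "?answer")
    return None


print(to_triple_pattern("How many people are there in Leipzig?"))
print(to_triple_pattern("What is the population of Leipzig"))
```

Both phrasings end up at the same predicate, which is exactly the gap that plain string matching against the property label cannot close.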
Page 11 / 20
Ambiguity
• The phenomenon of the same phrase having different meanings.
– Homonymy: same string refers to different concepts
▪ (money) bank vs. (river) bank
– Polysemy: same string refers to different but related concepts
▪ bank (as a company) vs. bank (as a building)
[Figure: “이동국” (Lee Dong-gook) in the Adam KB]
Page 12 / 20
Ambiguity - Disambiguation
• Resource-based methods
– Ranking the candidate RDF resources based on their properties and the
connections between them
– gAnswer [Huang et al.]
▪ Example question: “Who was married to an actor that played in Philadelphia?”
[Figure: subgraph matching in gAnswer]
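A much simplified sketch of resource-based ranking is shown below. It is not gAnswer's subgraph matching algorithm: it only counts how many triples connect each candidate through a predicate that the question asks for. The triples and identifiers are a hypothetical toy knowledge base.

```python
# Toy KB as (subject, predicate, object) triples; all identifiers are hypothetical.
TRIPLES = [
    ("Philadelphia_(film)", "starring", "Tom_Hanks"),
    ("Philadelphia_(film)", "starring", "Antonio_Banderas"),
    ("Philadelphia_(city)", "country", "United_States"),
    ("Antonio_Banderas", "spouse", "Melanie_Griffith"),
]


def score(candidate, required_predicates):
    """Count triples that touch the candidate via a predicate the question needs."""
    return sum(
        1
        for s, p, o in TRIPLES
        if candidate in (s, o) and p in required_predicates
    )


def disambiguate(candidates, required_predicates):
    """Pick the candidate with the most question-relevant connections."""
    return max(candidates, key=lambda c: score(c, required_predicates))


# "an actor that played in Philadelphia" needs a 'starring' edge,
# so the film reading outranks the city reading.
print(disambiguate(["Philadelphia_(city)", "Philadelphia_(film)"], {"starring"}))
```

Ranking by relevant connections differs from ranking by the raw number of triples, which would favor whichever resource is simply described in more detail.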
Page 13 / 20
Complex Queries
• Complex Queries
– Require multiple facts, restrictions, aggregation or filtered results
▪ E.g., comparisons, yes/no questions, quantifiers, superlatives
– PYTHIA [Unger et al.] constructs formal queries even for complex questions using
an ontology-based grammar
Page 14 / 20
Templates
• (1) Template-based approach
– Map input questions to either manually or automatically created SPARQL
query templates
• (2) Template-free approach
– Build SPARQL queries based on the given syntactic structure of the input
question.
Example of a template-based approach: TBSL [Unger et al.]
Example of a template-free approach: Xser [Xu et al.]
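The template-based approach can be illustrated with a single hand-written template. This is a sketch only, not TBSL's template set or its slot-filling procedure; the template, the `fill` helper and the choice of slot values are assumptions made for the example.

```python
from string import Template

# Hypothetical SPARQL template for superlative questions, with slots for a class
# and a numeric property.
SUPERLATIVE = Template(
    "SELECT ?x WHERE { ?x a $cls . ?x $prop ?v . } ORDER BY DESC(?v) LIMIT 1"
)


def fill(template, **slots):
    """Instantiate a query template; raises KeyError if a slot is left unfilled."""
    return template.substitute(**slots)


# "Which city has the largest population?" with illustrative slot values.
print(fill(SUPERLATIVE, cls="dbo:City", prop="dbo:populationTotal"))
```

The hard part in a real system is choosing the template and the slot values from the question, which is where the lexical gap and ambiguity challenges come back in; a template-free system instead derives the query structure from the parse of the question.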
Page 15 / 20
Others
• Multilingualism
– SQA systems that can handle multiple input languages, which may even
differ from the language used to encode the knowledge.
• Distributed Knowledge
– Some questions are only answerable with multiple knowledge bases
• Procedural Questions
– E.g. “How” questions (step-by-step instructions)
• Temporal Questions
– E.g. Temporal questions about clinical narratives
• Spatial Questions
– E.g. Relationship of locations such as crossing, inclusion and nearness.
Page 16 / 20
7 Challenges in Adam QA
• Lexical Gap
– String normalization, similarity functions, synonyms -> available
– Patterns for RDF predicates -> unavailable
▪ Current: string matching
• Ambiguity
– Ranking the candidate RDF resources -> available (but a naïve approach)
▪ Current: resources are ranked by their number of triples
Page 17 / 20
7 Challenges in Adam QA
• Complex Queries
– Comparisons, yes/no, superlatives, quantifiers -> partially available
• Templates
– Template-based approach -> available
– Template-free approach -> soon (GBQA?)
Page 18 / 20
7 Challenges in Adam QA
• Multilingualism
– Unavailable
• Distributed Knowledge
– Unavailable
• Procedural, Temporal and Spatial Questions
– Partially available
Page 19 / 20
Conclusion
• Analyzed 62 systems and their contributions to the seven challenges for
SQA systems.
• Recommendations for future SQA systems
– Modularization & Reusing existing parts
– Benchmarking single algorithmic modules instead of benchmarking a
system as a whole.
Page 20 / 20
Thank you.
