© Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases:
A Distributional-Compositional Semantics Perspective
André Freitas, Sean O’Riain, Edward Curry
DEOS 2013, Oxford, UK
Big Data
• Big Data: a more complete data-based picture of the world.
Growing Schema Size
10s-100s attributes
1,000s-1,000,000s attributes
• Heterogeneous, complex and large-scale databases.
• Very large and dynamic “schemas”.
Growing Semantic Heterogeneity
• Multiple perspectives (conceptualizations) of reality.
• Ambiguity, vagueness, inconsistency.
Problem
• Structured queries are still the primary way to query databases.
[Figure: structured query. Axes: query construction time (Low to High) and schema size & heterogeneity (Low to High), ranging from 10-100s attributes to 10^3-10^6 attributes.]
Vocabulary Problem for Databases
Who is the daughter of Bill Clinton married to?
Schema-agnostic queries
Possible representations
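To make the vocabulary problem concrete, here is a minimal sketch that stores the facts behind the question under one possible representation, as entity-attribute-value triples. The predicate names (`child`, `spouse`), the identifiers, and the helper `exact_lookup` are illustrative assumptions, not taken from the slides.

```python
# One possible representation of the facts (hypothetical predicate names).
TRIPLES = [
    ("Bill_Clinton", "child", "Chelsea_Clinton"),
    ("Chelsea_Clinton", "spouse", "Marc_Mezvinsky"),
]


def exact_lookup(entity, attribute):
    """Return values whose entity and attribute match the arguments exactly."""
    return [v for (e, a, v) in TRIPLES if e == entity and a == attribute]


# The user's own word does not appear in the schema, so nothing matches.
with_user_term = exact_lookup("Bill_Clinton", "daughter")

# The same request succeeds only if the user already knows the schema term.
with_schema_term = exact_lookup("Bill_Clinton", "child")

print(with_user_term)    # []
print(with_schema_term)  # ['Chelsea_Clinton']
```

An exact-match query works only when the user's vocabulary and the dataset's vocabulary coincide, which is the gap a schema-agnostic query has to bridge.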
Vocabulary Problem for Databases
Who is the daughter of Bill Clinton married to?
Semantic Gap
Lexical-level
Abstraction-level
Structural-level
[Figure: the query and the data, separated by the semantic gap at the lexical, abstraction and structural levels.]
Solution: Schema-agnostic queries
Lexical-level
Abstraction-level
Structural-level
Distributional Semantics
Compositional Semantics
Based on the statistical analysis of large unstructured corpora
Query Processing and Planning
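As a rough illustration of what "statistical analysis of large unstructured corpora" can mean, the sketch below builds word co-occurrence vectors from a tiny toy corpus and compares them with cosine similarity. The corpus, the window size and the function names are assumptions made for this example; the slides do not specify which distributional model is used.

```python
import math
from collections import Counter, defaultdict

# Toy stand-in for a large unstructured corpus (an assumption for illustration).
CORPUS = [
    "the daughter is a child of her parents",
    "a child is the son or daughter of parents",
    "the spouse is the person someone is married to",
    "a husband or wife is a married spouse",
]


def context_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = context_vectors(CORPUS)
related = cosine(vectors["daughter"], vectors["child"])
less_related = cosine(vectors["daughter"], vectors["husband"])
print(related > less_related)  # True
```

With so little text the counts are dominated by function words, so the scores themselves mean little; the point is only that words used in similar contexts end up with similar vectors, without any hand-built mapping between them.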
[Figure: labels "Statistical analysis" and "Datasets".]
Core Elements of the Proposed Approach
• Hybrid database/IR/QA model.
• Ranked query results.
• Existing IR approaches: traditional Vector Space Models (VSMs) were not able to:
   (i) capture the structure of the data;
   (ii) support precise and comprehensive semantic matching.
• A VSM supporting these two requirements was formulated: Ƭ-Space.
• Ranking function based on a distributional semantic relatedness measure.
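The idea of ranking by relatedness can be sketched as follows. This is not the Ƭ-Space formulation: the scores in `RELATEDNESS` are a hand-written, hypothetical table standing in for a distributional measure, and the greedy "follow the best-ranked predicate" strategy in `answer` is a simplification chosen for the example.

```python
# Hypothetical relatedness scores between query terms and schema predicates.
RELATEDNESS = {
    ("daughter", "child"): 0.8,
    ("daughter", "spouse"): 0.3,
    ("married", "spouse"): 0.9,
    ("married", "child"): 0.3,
}

# Toy entity-attribute-value data (predicate names are assumptions).
TRIPLES = [
    ("Bill_Clinton", "child", "Chelsea_Clinton"),
    ("Bill_Clinton", "spouse", "Hillary_Clinton"),
    ("Bill_Clinton", "birthPlace", "Hope,_Arkansas"),
    ("Chelsea_Clinton", "spouse", "Marc_Mezvinsky"),
    ("Chelsea_Clinton", "birthPlace", "Little_Rock,_Arkansas"),
]


def rank_predicates(entity, query_term):
    """Score every triple of `entity` against `query_term`, best first."""
    scored = [
        (RELATEDNESS.get((query_term, predicate), 0.0), predicate, value)
        for (e, predicate, value) in TRIPLES
        if e == entity
    ]
    return sorted(scored, reverse=True)


def answer(pivot_entity, query_terms):
    """Resolve each query term in turn by following the top-ranked predicate."""
    current = pivot_entity
    for term in query_terms:
        ranked = rank_predicates(current, term)
        if not ranked or ranked[0][0] == 0.0:
            return None  # no predicate is related to this term
        current = ranked[0][2]
    return current


# "Who is the daughter of Bill Clinton married to?"
result = answer("Bill_Clinton", ["daughter", "married"])
print(result)  # Marc_Mezvinsky
```

Because every candidate receives a score, the same function can return a ranked list instead of a single answer, which matches the "ranked query results" element listed above.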
Does it work?
• DBpedia 3.7 + YAGO.
• 102 natural language queries (QALD 2011).
Entity-Attribute-Value (EAV) Dataset:
45,767 predicates
5,556,492 classes
9,434,677 instances
Selected Publications
André Freitas, Edward Curry, João Gabriel Oliveira, João C. Pereira da Silva, Sean O'Riain, Querying the Semantic Web using Semantic Relatedness: A Vocabulary Independent Approach. Data & Knowledge Engineering (DKE) Journal, 2013. (Article).

André Freitas, Fabricio de Faria, Sean O'Riain, Edward Curry, Answering Natural Language Queries over Linked Data Graphs: A Distributional Semantics Approach. In Proceedings of the 36th Annual ACM SIGIR Conference, Dublin, Ireland, 2013. (Demonstration Paper in Proceedings).

André Freitas, Edward Curry, João Gabriel Oliveira, Sean O'Riain, Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches and Trends. IEEE Internet Computing, Special Issue on Internet-Scale Data, 2012. (Article).

André Freitas, Edward Curry, João Gabriel Oliveira, Sean O'Riain, A Distributional Structured Semantic Space for Querying RDF Graph Data. International Journal of Semantic Computing (IJSC), 2012. (Article).
http://treo.deri.ie



Editor's Notes

  • #3: Part of the Big Data vision
  • #4: Part of the Big Data vision
  • #5: Part of the Big Data vision
  • #6: Part of the Big Data vision
  • #12: Include user feedback
  • #13: Include user feedback