Digital Enterprise Research Institute                                          www.deri.ie




            A Distributional Structured Semantic
            Space for Querying RDF Graph Data
                     André Freitas, Edward Curry, João G. Oliveira,
                                      Seán O’Riain




 Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
Outline




           Motivation & Problem Space
           Description of the Proposed Approach
           Evaluation
           Conclusions
Motivation & Problem Space




           Linked Data: Heterogeneous and distributed data
            environment.
           Traditional Database and Information Retrieval
            approaches for querying/searching Linked Data
            provide limited solutions for casual users.
           Querying and searching Linked Data remains a
            fundamental challenge.



Query/Search Spectrum




                                        Adapted from Kauffman et al (2009)
Semantic Gap


      From which university did the wife of
      Barack Obama graduate?
          Data model independent (DMI) queries
                           Semantic Matching
Semantic Matching Problem



       From which university did the wife of Barack Obama graduate?



                                        Semantic matching
Semantic Matching Requirements


           Expressivity:
                 Supporting expressive natural language queries.
                 Path, conjunctions, disjunctions, aggregations, conditions.
           Flexibility & Accuracy:
                 Accurate and flexible semantic matching approach.
           Maintainability:
                 Easily transportable across datasets from different domains.
                 Can support open-domain (e.g. Wikipedia) and domain-specific
                  (e.g. Financial) datasets without customization.
           Performance:
                 Suitable for real-time querying (low query execution time).
           Scalability:
                 Scalable to a large number of datasets (Organization-scale, Web-
                  scale).
Research Questions




       Q1: How to provide a query mechanism which allows
       users to expressively query linked datasets without a
       previous understanding of the vocabularies behind
       the data?

       Q2: Which semantic model could support this query
       mechanism?




                                 Proposed Approach
Proposed Approach (Outline)




           Treo:
                 Query approach based on dynamic Linked Data navigation.
                 Performance limitations.
           Treo T-Space:
                 Existing IR approaches: traditional Vector Space Models
                  (VSMs) were not able to: (i) capture the structure of graphs
                  and (ii) support a precise semantic matching behavior.
                 A VSM supporting these two requirements was formulated:
                  T-Space.
                 Ranking function based on semantic relatedness.
First Approach (Treo)




           Natural language queries.
           Ranked query results.
           Two-phase search process combining entity search
            with spreading activation search.
           Use of Wikipedia-based semantic relatedness to
            semantically match query terms to dataset terms.
           Limited query execution performance.

       Requirements: Expressivity, Flexibility & Accuracy.
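
A minimal sketch of the two-phase idea, assuming a toy graph and made-up relatedness scores (the graph contents, the `RELATEDNESS` table, the 0.5 threshold and all function names are hypothetical and not taken from Treo). It resolves a pivot entity first, then greedily follows the predicate most related to each remaining query term; a real spreading activation search would explore several candidate nodes rather than a single best one.

```python
# Toy RDF-like graph: subject -> list of (predicate, object). Illustrative data only.
GRAPH = {
    "Barack_Obama": [("spouse", "Michelle_Obama"), ("birthPlace", "Honolulu")],
    "Michelle_Obama": [("almaMater", "Princeton_University"), ("birthPlace", "Chicago")],
}

# Hypothetical scores standing in for a Wikipedia-based relatedness measure.
RELATEDNESS = {
    ("wife", "spouse"): 0.9,
    ("wife", "birthPlace"): 0.1,
    ("graduate", "almaMater"): 0.8,
    ("graduate", "birthPlace"): 0.1,
}


def relatedness(query_term, predicate):
    """Look up the (assumed) relatedness between a query term and a predicate."""
    return RELATEDNESS.get((query_term, predicate), 0.0)


def entity_search(label):
    """Phase 1: resolve the pivot entity with a naive label match."""
    key = label.replace(" ", "_")
    return key if key in GRAPH else None


def spreading_activation(pivot, query_terms, threshold=0.5):
    """Phase 2: from the pivot, follow the best-matching predicate per query term."""
    path, node = [], pivot
    for term in query_terms:
        candidates = [(relatedness(term, p), p, o) for p, o in GRAPH.get(node, [])]
        if not candidates:
            break
        score, predicate, obj = max(candidates)
        if score < threshold:
            break  # no predicate is related enough; stop navigating
        path.append((node, predicate, obj))
        node = obj
    return path


pivot = entity_search("Barack Obama")
path = spreading_activation(pivot, ["wife", "graduate"])
```

For the running example query, `path` holds the two triples that lead from the pivot entity to the university.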
Semantic Relatedness




           Computation of a measure of “semantic proximity”
            between two terms.
           Allows approximate semantic matching between
            query terms and dataset terms.
           Most existing approaches use WordNet-based
            solutions for approximate semantic matching
            (limited solution).
           Use of Wikipedia-based semantic relatedness
            measures (addresses the limitations of WordNet-
            based measures).
Query Approach Rationale (Treo)

  [Figure sequence; surviving labels: RDF, Wikipedia Link Measure]

  Final Query-Data Matching:                Currently limited to path queries




  Querying Linked Data using Semantic Relatedness: A Vocabulary Independent
  Approach. In Proceedings of the 16th International Conference on Applications of
  Natural Language to Information Systems (NLDB) 2011.




                                        Treo (Irish): Direction, path
From which university did the wife of Barack Obama
 graduate?
Second Approach: Treo T-Space


      Motivation:
          Performance problem of the original approach.
          Transportability across different domains.
      Rationale:
          Rephrasing of the proposed approach as a vector
             space model.
          Addressing the limitations of the vector space
             model (BoW: lack of structure) to represent the
             structure present in the data.
          Keeping the semantic matching behavior.

      Requirements: Performance & Scalability, Maintainability.
T-Space




      A Vector Space Model for building RDF indexes
       that preserve the graph structure and allow
       semantic matching/search behavior.
Generalization (Treo T-Space)




      Approach:
          Ranked results.
          Natural language queries.
          Two-phase search process combining entity search
             with spreading activation search.
          Use of distributional semantics to semantically
             match query terms to dataset terms.
          Semantic manifold ('structured vector space')
             model: T-Space.
Distributional Semantic Model




           Assumption: the context surrounding a given word
            in a text provides important information about its
            meaning.
           Simplified semantic model.
           Explicit Semantic Analysis (ESA) is the primary
            distributional model used in this work.

       A wife is a female partner in a marriage. The term "wife"
       seems to be a close term to bride, the latter is a female
       participant in a wedding ceremony, while a wife is a married
       woman during her marriage.
       ...
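
As an illustration of how an ESA-style model can score relatedness: ESA represents a term as a weighted vector over Wikipedia concepts, and two terms are compared by the cosine of their vectors. The sketch below assumes tiny hand-written concept vectors (the `CONCEPTS` table and its weights are hypothetical; real vectors are derived from corpus statistics over Wikipedia articles).

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(concept, 0.0) for concept, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical concept vectors: term -> {Wikipedia concept: weight}.
CONCEPTS = {
    "wife": {"Marriage": 0.9, "Family": 0.6},
    "spouse": {"Marriage": 0.8, "Family": 0.5, "Law": 0.2},
    "university": {"Education": 0.9, "Research": 0.5},
}


def relatedness(term_a, term_b):
    """Relatedness of two terms as the cosine of their concept vectors."""
    return cosine(CONCEPTS.get(term_a, {}), CONCEPTS.get(term_b, {}))
```

With these toy vectors, `relatedness("wife", "spouse")` is high because the two terms share concepts, while `relatedness("wife", "university")` is zero because they share none.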
T-Space


  Embedding the structure of a graph into a semantic vector space.

  Distributional semantic model: semantic statistical knowledge
  extracted from large Web corpora.

  [Figure; surviving labels: relations, instances, classes]
Querying the T-Space




  Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches
  and Trends. IEEE Internet Computing, Special Issue on Internet-Scale Data, 2012.

  A Multidimensional Semantic Space for Data Model Independent Queries over RDF
  Data. In Proceedings of the 5th IEEE International Conference on Semantic
  Computing (ICSC), 2011.
Ranking/Filtering Results


           How to filter non-related results.
           Analysis of semantic relatedness as a ranking
            function.
           Creation of a methodology for the determination of
            a semantic relatedness threshold.
           Semantic differential analysis.

            Requirements: Flexibility & Accuracy.

        A Distributional Approach for Terminological Semantic Search on the
        Linked Data Web. 27th ACM Applied Computing Symposium,
        Semantic Web and Its Applications Track, 2012.
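
A small sketch of relatedness used as a ranking function with a cut-off. The scores and the 0.05 threshold are placeholders; the slides do not give the threshold value or the procedure used to determine it, and `rank_and_filter` is a hypothetical helper name.

```python
def rank_and_filter(scored_results, threshold):
    """Drop results scoring below the threshold, then sort by descending score."""
    kept = [(result, score) for result, score in scored_results if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical relatedness scores of dataset terms against the query term "wife".
candidates = [("spouse", 0.41), ("birthPlace", 0.02), ("almaMater", 0.17)]
ranked = rank_and_filter(candidates, threshold=0.05)
```

Here `birthPlace` falls below the cut-off and is filtered out as non-related; the remaining terms are returned best first.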
VSM & Formalization




           Based on the Generalized Vector Space Model.
           Simplification and clarification of T-Space
            properties.
           Tensors are used to support domain specific
            meaning.
           Tensors can be used to connect concepts across
            datasets and T-Spaces.
VSM & Formalization




     1. The distributional space can be transformed to the
      keyword space.


       Requirements: Flexibility & Accuracy
VSM & Formalization



    2. The vector field defines
    an additional structure
    over the vector space
    model.


  Requirements: Expressivity
Customized Distributional Model


   Distributional models customizable for different datasets.
Customized Distributional Model


                    Transformation from different distributional models

  [Figure; surviving labels: Wikipedia, DBPedia, Financial Dataset, Financial Corpus]

      Requirements: Maintainability

      A Distributional Structured Semantic Space for Querying RDF Graph
      Data. International Journal of Semantic Computing (IJSC), 2012.




                                          Evaluation
                                        (Treo T-Space)
Quality of Results



          Q1: How does the approach work for addressing the
           vocabulary problem?
          Q2: How does the approach work for addressing
           natural language queries?

                 Precision, recall, mean reciprocal rank, % of
                  answered queries.
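
For reference, the listed measures can be computed as in the sketch below. These are the standard set-based and rank-based definitions; the slides do not state how averaging was done or how partially answered queries were treated, so those details are left out, and the function names are hypothetical.

```python
def precision_recall(returned, relevant):
    """Set-based precision and recall for a single query."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_results, relevant_set) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)


def answered_ratio(runs):
    """Fraction of queries for which at least one relevant result is returned."""
    if not runs:
        return 0.0
    answered = sum(1 for ranked, rel in runs if any(item in rel for item in ranked))
    return answered / len(runs)
```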
Quality of Results



          QALD DBPedia Training Set.
          50 natural language queries.
          DBpedia 3.6.
          Full DBPedia QuerySet (50 queries)
            Avg. Precision   Avg. Recall   MRR     % of queries answered
            0.482            0.491         0.516   58%

          Partial DBPedia QuerySet (38 queries)
            Avg. Precision   Avg. Recall   MRR     % of queries answered
            0.634            0.645         0.679   76%
Error Distribution




          Q4: What is the error associated with each
           component of the search approach?

                 Error distribution measurements.
Error Distribution




          Conclusion: Robustness of the distributional
           semantic model for open domain queries.
          Conclusion: Major need for improvement of the
           pre/post processing phases.
Evaluating Terminology-level Semantic Matching


          Q1: How does the approach work for addressing the
           vocabulary problem?
          Q5: Does the approach improve the semantic
           matching of the queries?

                 Quantitative evaluation: P@5, P@10, MRR, % of the
                  queries answered, comparative evaluation using
                  string matching and WordNet-based query
                  expansion.
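
P@5 and P@10 are instances of precision at a cut-off k. A minimal sketch using the common convention of dividing by k even when fewer than k results are returned (the slides do not say which convention was applied, and the function name is hypothetical):

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked results that are relevant (denominator is k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    return sum(1 for item in top_k if item in relevant) / k
```

For example, a ranking with two relevant items among its first five results has P@5 = 0.4.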
Evaluating Terminology-level Semantic Matching




           Avg. Precision@5   Avg. Precision@10   MRR     % of queries answered
           0.732              0.646               0.646   92.25%

           Approach                        % of queries answered
           ESA                             92.25%
           String matching                 45.77%
           String matching + WordNet QE    52.48%
Comparative Analysis




          Q3: What is the best distributional semantic model
           for the vocabulary problem?

                 Preliminary comparative analysis between
                  different distributional semantic models.
Comparative Analysis (Treo vs Treo T-Space)


        Wikipedia Link Measure (WLM) vs Explicit Semantic Analysis (ESA)

        ESA (Full Query Set)
          Avg. Precision   Avg. Recall   MRR     % of queries answered
          0.482            0.491         0.516   58%

        WLM (Full Query Set)
          Avg. Precision   Avg. Recall   MRR     % of queries answered
          0.395            0.451         0.489   56%

        Improvement
          % Avg. Precision   % Avg. Recall   % MRR   % of queries answered
          18%                8.2%            5.2%    3.5%
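
The slides do not state how the improvement percentages were derived. As a check, the sketch below takes the difference between the ESA and WLM values relative to the ESA value, which is an inference from the numbers and not a documented formula. The results land close to the reported figures; the small gaps would be consistent with the originals being computed from unrounded values.

```python
# Reported values from the two tables above (full query set).
esa = {"precision": 0.482, "recall": 0.491, "mrr": 0.516, "answered": 0.58}
wlm = {"precision": 0.395, "recall": 0.451, "mrr": 0.489, "answered": 0.56}


def gain_relative_to_esa(metric):
    """Percentage gap between ESA and WLM, normalised by the ESA value (assumed)."""
    return 100.0 * (esa[metric] - wlm[metric]) / esa[metric]


gains = {metric: round(gain_relative_to_esa(metric), 1) for metric in esa}
```

Normalising by the WLM value instead gives about 22% for precision, which is further from the reported 18%.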
Comparative Analysis


          Conclusion: Of the two tested approaches, ESA
           provides the better semantic model.
Conclusions



          The T-Space model provides a principled way to build data
           model independent queries over RDF graphs.
          The distributional semantic model supports a flexible
           matching between query terms and dataset terms in a
           semantic best-effort scenario.
          The ESA semantic model provides a better distributional model
           than WLM.
          Improvements are needed on the pre/post processing phase of
           the approach.
Associated Article



   André Freitas, Edward Curry, João Gabriel Oliveira, Sean O'Riain, A
   Distributional Structured Semantic Space for Querying RDF Graph
   Data. International Journal of Semantic Computing (IJSC), 2012.


     http://www.worldscinet.com/ijsc/05/0504/S1793351X1100133X.html


     http://andrefreitas.org/papers/preprint_distributional_structured_space.pdf
Related Publications


   André Freitas, Edward Curry, João Gabriel Oliveira, Sean O'Riain, Querying Heterogeneous Datasets on the Linked Data Web:
  Challenges, Approaches and Trends. IEEE Internet Computing, Special Issue on Internet-Scale Data, 2012 (Article).

  André Freitas, Edward Curry, João Gabriel Oliveira, Sean O'Riain, A Distributional Structured Semantic Space for Querying
  RDF Graph Data. International Journal of Semantic Computing (IJSC), 2012 (Article).

  André Freitas, Sean O'Riain, Edward Curry, A Distributional Approach for Terminological Semantic Search on the Linked Data
  Web. 27th ACM Applied Computing Symposium, Semantic Web and Its Applications Track, 2012 (Conference Full Paper).

  André Freitas, João Gabriel Oliveira, Edward Curry, Sean O'Riain, A Multidimensional Semantic Space for Data Model
  Independent Queries over RDF Data. In Proceedings of the 5th International Conference on Semantic Computing (ICSC),
  2011. (Conference Full Paper).

  André Freitas, João Gabriel Oliveira, Sean O'Riain, Edward Curry, João Carlos Pereira da Silva, Querying Linked Data using
  Semantic Relatedness: A Vocabulary Independent Approach. In Proceedings of the 16th International Conference on
  Applications of Natural Language to Information Systems (NLDB) 2011. (Conference Full Paper).

  André Freitas, João Gabriel Oliveira, Sean O'Riain, Edward Curry, João Carlos Pereira da Silva, Treo: Combining Entity-Search,
  Spreading Activation and Semantic Relatedness for Querying Linked Data, In 1st Workshop on Question Answering over
  Linked Data (QALD-1) Workshop at 8th Extended Semantic Web Conference (ESWC), 2011 (Workshop Full Paper).

  André Freitas, João Gabriel Oliveira, Sean O'Riain, Edward Curry, João Carlos Pereira da Silva, Treo: Best-Effort Natural
  Language Queries over Linked Data, In Proceedings of the 16th International Conference on Applications of Natural
  Language to Information Systems (NLDB), 2011 (Poster in Proceedings).




                                        http://andrefreitas.org

A distributional structured semantic space for querying rdf graph data

  • 1. Digital Enterprise Research Institute www.deri.ie A Distributional Structured Semantic Space for Querying RDF Graph Data André Freitas, Edward Curry, João G. Oliveira, Seán O’Riain  Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
  • 2. Outline Digital Enterprise Research Institute www.deri.ie  Motivation & Problem Space  Description of the Proposed Approach  Evaluation  Conclusions
  • 3. Motivation & Problem Space Digital Enterprise Research Institute www.deri.ie  Linked Data: Heterogeneous and distributed data environment.  Traditional Database and Information Retrieval approaches for querying/searching Linked Data provide limited solutions for casual users.  Querying and searching Linked Data remains a fundamental challenge. ?
  • 4. Query/Search Spectrum Digital Enterprise Research Institute www.deri.ie Adapted from Kauffman et al (2009)
  • 5. Semantic Gap Digital Enterprise Research Institute www.deri.ie From which university did the wife of Barack Obama graduate? Data model independent (DMI) queries Semantic Matching
  • 6. Semantic Matching Problem Digital Enterprise Research Institute www.deri.ie From which university did the wife of Barack Obama graduate?
  • 7. Semantic Matching Problem Digital Enterprise Research Institute www.deri.ie From which university did the wife of Barack Obama graduate? Semantic matching
  • 8. Semantic Matching Requirements Digital Enterprise Research Institute www.deri.ie  Expressivity:  Supporting expressive natural language queries.  Path, conjunctions, disjunctions, aggregations, conditions.  Flexibility & Accuracy:  Accurate and flexible semantic matching approach.  Maintainability:  Easily transportable across datasets from different domains.  Can support Open domain (e.g. Wikipedia) and doman-specific (e.g. Financial) datasets without customization.  Performance:  Suitable for realtime querying (low query execution time).  Scalability:  Scalable to a large number of datasets (Organization-scale, Web- scale).
  • 9. Research Questions
      • Q1: How to provide a query mechanism which allows users to expressively query linked datasets without a previous understanding of the vocabularies behind the data?
      • Q2: Which semantic model could support this query mechanism?
  • 10. Proposed Approach
  • 11. Proposed Approach (Outline)
      • Treo:
          • Query approach based on dynamic Linked Data navigation.
          • Performance limitations.
      • Treo T-Space:
          • Existing IR approaches: traditional Vector Space Models (VSMs) were not able to (i) capture the structure of graphs and (ii) support a precise semantic matching behavior.
          • A VSM supporting these two requirements was formulated: T-Space.
          • Ranking function based on semantic relatedness.
  • 12. Proposed Approach (Outline)
  • 13. First Approach (Treo)
      • Natural language queries.
      • Ranked query results.
      • Two-phase search process combining entity search with spreading activation search.
      • Use of Wikipedia-based semantic relatedness to semantically match query terms to dataset terms.
      • Limited query execution performance.
      • Requirements: Expressivity, Flexibility & Accuracy.
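The two-phase search on this slide can be illustrated with a toy sketch. It is not the authors' implementation: the `GRAPH` data, the hand-written `RELATEDNESS` table (standing in for a Wikipedia-based relatedness measure) and all function names are assumptions made up for illustration. The navigation step is also a greedy simplification of spreading activation, since it follows only the single best-scoring edge at each step.

```python
# Toy RDF-like graph: subject -> list of (predicate, object). Hypothetical data.
GRAPH = {
    "Barack_Obama": [("spouse", "Michelle_Obama"), ("birthPlace", "Honolulu")],
    "Michelle_Obama": [("almaMater", "Princeton_University"),
                       ("birthPlace", "Chicago")],
}

# Hand-written scores standing in for a Wikipedia-based relatedness measure.
RELATEDNESS = {
    ("wife", "spouse"): 0.9, ("wife", "birthPlace"): 0.1,
    ("university", "almaMater"): 0.8, ("university", "birthPlace"): 0.2,
}


def relatedness(query_term, dataset_term):
    """Look up the (assumed) relatedness between a query and a dataset term."""
    return RELATEDNESS.get((query_term, dataset_term), 0.0)


def entity_search(name):
    """Phase 1: resolve the key entity of the query to a pivot node."""
    key = name.replace(" ", "_")
    return key if key in GRAPH else None


def navigate(pivot, query_terms):
    """Phase 2: from the pivot, follow for each query term the edge whose
    predicate is most related to that term (greedy simplification)."""
    path, node = [], pivot
    for term in query_terms:
        edges = GRAPH.get(node, [])
        if not edges:
            break
        predicate, obj = max(edges, key=lambda e: relatedness(term, e[0]))
        path.append((node, predicate, obj))
        node = obj
    return path


pivot = entity_search("Barack Obama")
path = navigate(pivot, ["wife", "university"])
for triple in path:
    print(triple)
```

The query terms "wife" and "university" never appear in the toy graph; the match to `spouse` and `almaMater` comes only from the relatedness scores, which is the vocabulary-independence idea the slide describes.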
  • 14. Semantic Relatedness
      • Computation of a measure of “semantic proximity” between two terms.
      • Allows approximate semantic matching between query terms and dataset terms.
      • Most existing approaches use WordNet-based solutions for approximate semantic matching (limited solution).
      • Use of Wikipedia-based semantic relatedness measures (addresses the limitations of WordNet-based measures).
  • 15. Query Approach Rationale (Treo)
  • 16. Query Approach Rationale (Treo) [figure label: RDF]
  • 17. Query Approach Rationale (Treo) [figure label: Wikipedia Link Measure]
  • 18. Query Approach Rationale (Treo)
  • 19. Query Approach Rationale (Treo) [figure label: RDF]
  • 20. Query Approach Rationale (Treo)
  • 21. Query Approach Rationale (Treo)
  • 22. Query Approach Rationale (Treo)
  • 23. Query Approach Rationale (Treo)
      • Final Query-Data Matching: currently limited to path queries.
      • Reference: Querying Linked Data using Semantic Relatedness: A Vocabulary Independent Approach. In Proceedings of the 16th International Conference on Applications of Natural Language to Information Systems (NLDB), 2011.
  • 24. Treo (Irish): Direction, path
  • 25. From which university did the wife of Barack Obama graduate?
  • 26. Second Approach: Treo T-Space
      • Motivation:
          • Performance problem of the original approach.
          • Transportability across different domains.
      • Rationale:
          • Rephrasing of the proposed approach as a vector space model.
          • Addressing the limitations of the vector space model (BoW: lack of structure) to represent the structure present in the data.
          • Keeping the semantic matching behavior.
      • Requirements: Performance & Scalability, Maintainability.
  • 27. T-Space
      • A Vector Space Model for building RDF indexes that preserve the graph structure and allow a semantic matching/search behavior.
  • 28. Generalization (Treo T-Space)
      • Approach:
          • Ranked results.
          • Natural language queries.
          • Two-phase search process combining entity search with spreading activation search.
          • Use of distributional semantics to semantically match query terms to dataset terms.
          • Semantic manifold (‘structured vector space’) model: T-Space.
  • 29. Distributional Semantic Model
      • Assumption: the context surrounding a given word in a text provides important information about its meaning.
      • Simplified semantic model.
      • Explicit Semantic Analysis (ESA) is the primary distributional model used in this work.
      • Example context: A wife is a female partner in a marriage. The term "wife" seems to be a close term to bride, the latter is a female participant in a wedding ceremony, while a wife is a married woman during her marriage. ...
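As a rough illustration of the distributional idea behind ESA, a term can be represented by its weights over a set of concept documents, and two terms compared by the cosine of those vectors. The sketch below is a minimal toy under stated assumptions: the three-entry `CONCEPTS` corpus, the plain TF-IDF weighting and the function names are invented here, whereas ESA proper builds its concept space from Wikipedia articles.

```python
import math
from collections import Counter

# Tiny stand-in for a concept corpus (hypothetical text, not Wikipedia).
CONCEPTS = {
    "Marriage": "a wife is a female partner in a marriage",
    "Wedding": "a bride is a female participant in a wedding ceremony "
               "who becomes a wife",
    "University": "a university is an institution from which a student "
                  "can graduate",
}


def build_index(concepts):
    """Map each term to a TF-IDF weighted vector over concepts."""
    n = len(concepts)
    tf = {name: Counter(text.split()) for name, text in concepts.items()}
    df = Counter(term for counts in tf.values() for term in counts)
    index = {}
    for name, counts in tf.items():
        for term, freq in counts.items():
            idf = math.log(n / df[term])
            if idf > 0:  # terms present in every concept carry no signal
                index.setdefault(term, {})[name] = freq * idf
    return index


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def relatedness(index, a, b):
    """Relatedness of two terms as the cosine of their concept vectors."""
    return cosine(index.get(a, {}), index.get(b, {}))


INDEX = build_index(CONCEPTS)
print(relatedness(INDEX, "wife", "bride"))
print(relatedness(INDEX, "wife", "university"))
```

In this toy corpus "wife" and "bride" share the Wedding concept and so score above zero, while "wife" and "university" share no concept and score zero.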
  • 30. T-Space
      • Embedding the structure of a graph into a semantic vector space.
      • Distributional semantic model: semantic statistical knowledge extracted from large Web corpora.
      • Figure labels: relations, instances, classes.
  • 31. Querying the T-Space
  • 32. Querying the T-Space
  • 33. Querying the T-Space
  • 34. Querying the T-Space
      • Reference: A Multidimensional Semantic Space for Data Model Independent Queries over RDF Data. Proceedings of the 5th IEEE International Conference on Semantic Computing (ICSC), 2011.
      • Reference: Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches and Trends. Special Issue on Internet-Scale Data. In IEEE Internet Computing, 2012.
  • 35. Ranking/Filtering Results
      • How to filter non-related results.
      • Analysis of semantic relatedness as a ranking function.
      • Creation of a methodology for the determination of a semantic relatedness threshold.
      • Semantic differential analysis.
      • Requirements: Flexibility & Accuracy.
      • Reference: A Distributional Approach for Terminological Semantic Search on the Linked Data Web. 27th ACM Applied Computing Symposium, Semantic Web and Its Applications Track, 2012.
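Using relatedness as a ranking function with a cut-off can be sketched as follows. The scores and the threshold value of 0.10 are invented for illustration; the slide refers to a methodology for determining the threshold, which this sketch does not reproduce.

```python
def rank_and_filter(scores, threshold):
    """Keep candidates whose relatedness is at least `threshold`,
    ordered from most to least related."""
    kept = [(term, score) for term, score in scores.items()
            if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical relatedness of dataset terms to the query term "wife".
candidate_scores = {
    "birthPlace": 0.03,
    "partner": 0.41,
    "religion": 0.05,
    "spouse": 0.62,
}

ranked = rank_and_filter(candidate_scores, threshold=0.10)
print(ranked)
```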
  • 36. VSM & Formalization
      • Based on the Generalized Vector Space Model.
      • Simplification and clarification of T-Space properties.
      • Tensors are used to support domain-specific meaning.
      • Tensors can be used to connect concepts across datasets and T-Spaces.
  • 37. VSM & Formalization
      • 1. The distributional space can be transformed to the keyword space.
      • Requirements: Flexibility & Accuracy.
  • 38. VSM & Formalization
      • 2. The vector field defines an additional structure over the vector space model.
      • Requirements: Expressivity.
  • 39. Customized Distributional Model
      • Distributional models customizable for different datasets.
  • 40. Customized Distributional Model
      • Transformation from different distributional models.
      • Figure labels: Wikipedia, Financial Corpus, DBpedia, Financial Dataset.
      • Requirements: Maintainability.
      • Reference: A Distributional Structured Semantic Space for Querying RDF Graph Data. International Journal of Semantic Computing (IJSC), 2012.
  • 41. Evaluation (Treo T-Space)
  • 42. Quality of Results
      • Q1: How does the approach work for addressing the vocabulary problem?
      • Q2: How does the approach work for addressing natural language queries?
      • Measures: precision, recall, mean reciprocal rank (MRR), % of answered queries.
  • 43. Quality of Results
      • QALD DBpedia training set.
      • 50 natural language queries.
      • DBpedia 3.6.
      • Full DBpedia query set (50 queries): avg. precision 0.482, avg. recall 0.491, MRR 0.516, 58% of queries answered.
      • Partial DBpedia query set (38 queries): avg. precision 0.634, avg. recall 0.645, MRR 0.679, 76% of queries answered.
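The measures reported on this slide can be computed per query and then averaged, as in the minimal sketch below. The function names and the sample `runs` are assumptions for illustration. In particular, counting a query as "answered" when at least one relevant result is returned is an assumed definition; the slides do not state how that percentage was derived.

```python
def precision_recall(returned, relevant):
    """Set-based precision and recall for one query."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant result, or 0 if there is none."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def evaluate(runs):
    """Average the measures over (ranked_results, relevant_answers) pairs."""
    if not runs:
        raise ValueError("at least one query is required")
    totals = {"precision": 0.0, "recall": 0.0, "mrr": 0.0, "answered": 0.0}
    for ranked, relevant in runs:
        precision, recall = precision_recall(ranked, relevant)
        rr = reciprocal_rank(ranked, relevant)
        totals["precision"] += precision
        totals["recall"] += recall
        totals["mrr"] += rr
        # Assumed definition: answered = some relevant result was returned.
        totals["answered"] += 1.0 if rr > 0 else 0.0
    return {name: value / len(runs) for name, value in totals.items()}


# Three hypothetical queries: ranked results and the set of correct answers.
runs = [
    (["a", "b"], {"a"}),
    (["x", "y", "z"], {"y", "q"}),
    ([], {"k"}),
]
summary = evaluate(runs)
print(summary)
```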
  • 44. Error Distribution
      • Q4: What is the error associated with each component of the search approach?
      • Error distribution measurements.
  • 45. Error Distribution
      • Conclusion: Robustness of the distributional semantic model for open domain queries.
      • Conclusion: Major need for improvement of the pre/post processing phases.
  • 46. Evaluating Terminology-level Semantic Matching
      • Q1: How does the approach work for addressing the vocabulary problem?
      • Q5: Does the approach improve the semantic matching of the queries?
      • Quantitative evaluation: P@5, P@10, MRR, % of the queries answered, comparative evaluation using string matching and WordNet-based query expansion.
  • 47. Evaluating Terminology-level Semantic Matching
      • Results: avg. precision@5 0.732, avg. precision@10 0.646, MRR 0.646, 92.25% of queries answered.
      • % of queries answered by approach: ESA 92.25%; string matching 45.77%; string matching + WordNet QE 52.48%.
  • 48. Comparative Analysis
      • Q3: What is the best distributional semantic model for the vocabulary problem?
      • Preliminary comparative analysis between different distributional semantic models.
  • 49. Comparative Analysis (Treo vs Treo T-Space)
      • Wikipedia Link Measure (WLM) vs Explicit Semantic Analysis (ESA).
      • ESA (full query set): avg. precision 0.482, avg. recall 0.491, MRR 0.516, 58% of queries answered.
      • WLM (full query set): avg. precision 0.395, avg. recall 0.451, MRR 0.489, 56% of queries answered.
      • Improvement: avg. precision 18%, avg. recall 8.2%, MRR 5.2%, queries answered 3.5%.
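The slide does not say how the improvement percentages were derived. The sketch below recomputes them from the ESA and WLM rows under two possible readings: the difference divided by the ESA score, and the difference divided by the WLM score. The variable and function names are assumptions; the numeric inputs are taken from the slide.

```python
# Scores from the slide; "answered" is in percent.
esa = {"precision": 0.482, "recall": 0.491, "mrr": 0.516, "answered": 58.0}
wlm = {"precision": 0.395, "recall": 0.451, "mrr": 0.489, "answered": 56.0}
reported = {"precision": 18.0, "recall": 8.2, "mrr": 5.2, "answered": 3.5}


def relative_gain(new, old, base):
    """Difference between two scores as a percentage of `base`."""
    return 100.0 * (new - old) / base


# Reading 1: difference relative to the ESA score.
gain_vs_esa = {k: relative_gain(esa[k], wlm[k], esa[k]) for k in esa}
# Reading 2: difference relative to the WLM score (the usual baseline).
gain_vs_wlm = {k: relative_gain(esa[k], wlm[k], wlm[k]) for k in esa}

for k in esa:
    print(f"{k}: reported {reported[k]:.1f}%, "
          f"vs ESA {gain_vs_esa[k]:.2f}%, vs WLM {gain_vs_wlm[k]:.2f}%")
```

With these inputs the reported figures sit within about 0.1 percentage point of the first reading, which suggests the differences were normalized by the ESA scores. Measured against the WLM baseline instead, the precision gain would be roughly 22%.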
  • 50. Comparative Analysis
      • Conclusion: Of the two tested approaches, ESA provides the better semantic model.
  • 51. Conclusions
      • The T-Space model provides a principled way to build data model independent queries over RDF graphs.
      • The distributional semantic model supports flexible matching between query terms and dataset terms in a semantic best-effort scenario.
      • The ESA semantic model provides a better distributional model than WLM.
      • Improvements are needed in the pre/post processing phases of the approach.
  • 52. Associated Article
      • André Freitas, Edward Curry, João Gabriel Oliveira, Sean O'Riain, A Distributional Structured Semantic Space for Querying RDF Graph Data. International Journal of Semantic Computing (IJSC), 2012.
      • http://www.worldscinet.com/ijsc/05/0504/S1793351X1100133X.html
      • http://andrefreitas.org/papers/preprint_distributional_structured_space.pdf
  • 53. Related Publications
      • André Freitas, Edward Curry, João Gabriel Oliveira, Sean O'Riain, Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches and Trends. IEEE Internet Computing, Special Issue on Internet-Scale Data, 2012 (Article).
      • André Freitas, Edward Curry, João Gabriel Oliveira, Sean O'Riain, A Distributional Structured Semantic Space for Querying RDF Graph Data. International Journal of Semantic Computing (IJSC), 2012 (Article).
      • André Freitas, Sean O'Riain, Edward Curry, A Distributional Approach for Terminological Semantic Search on the Linked Data Web. 27th ACM Applied Computing Symposium, Semantic Web and Its Applications Track, 2012 (Conference Full Paper).
      • André Freitas, João Gabriel Oliveira, Edward Curry, Sean O'Riain, A Multidimensional Semantic Space for Data Model Independent Queries over RDF Data. In Proceedings of the 5th International Conference on Semantic Computing (ICSC), 2011 (Conference Full Paper).
      • André Freitas, João Gabriel Oliveira, Sean O'Riain, Edward Curry, João Carlos Pereira da Silva, Querying Linked Data using Semantic Relatedness: A Vocabulary Independent Approach. In Proceedings of the 16th International Conference on Applications of Natural Language to Information Systems (NLDB), 2011 (Conference Full Paper).
      • André Freitas, João Gabriel Oliveira, Sean O'Riain, Edward Curry, João Carlos Pereira da Silva, Treo: Combining Entity-Search, Spreading Activation and Semantic Relatedness for Querying Linked Data. In 1st Workshop on Question Answering over Linked Data (QALD-1), Workshop at 8th Extended Semantic Web Conference (ESWC), 2011 (Workshop Full Paper).
      • André Freitas, João Gabriel Oliveira, Sean O'Riain, Edward Curry, João Carlos Pereira da Silva, Treo: Best-Effort Natural Language Queries over Linked Data. In Proceedings of the 16th International Conference on Applications of Natural Language to Information Systems (NLDB), 2011 (Poster in Proceedings).
  • 54. http://andrefreitas.org