Improving Semantic Search Using
      Query Log Analysis

            Khadija Elbedweihy, Stuart N. Wrigley and Fabio Ciravegna
                                                      OAK Research Group,
                                           Department of Computer Science,
                                                  University of Sheffield, UK
Outline

• Introduction
• Semantic Query Log Analysis
  - Query-Concepts Model
  - Concept-Predicates Model
  - Instance-Types Model
• Results Augmentation
• Data Visualisation
INTRODUCTION
Motivation

• Little work on results returned (answers) and
  presentation style.
   – Users want direct answers augmented with more
     information for a richer experience1
   – Users want a more user-friendly and attractive
     results presentation format1

• Semantic query logs: logs of queries issued to repositories
  containing RDF data.


1. See our paper from this morning’s IWEST 2012 workshop
Related Work
Semantic query logs analysis:
• Möller et al. identified patterns of Linked Data usage with
  respect to different types of agents.

• Arias et al. analysed the structure of the SPARQL queries
  to identify most frequent language elements.

• Luczak-Rösch et al. analysed query logs to detect errors
  and weaknesses in LD ontologies and support their
  maintenance.
Related Work (cont’d)

How our work is different:
Analyze semantic query logs to produce models capturing
different patterns of information needs on Linked Data:

 Concepts used together in a query: query-concepts model
 Predicates used with a concept: concept-predicates model
 Concepts used as types of an LD entity: instance-types model

The models make use of the “collaborative knowledge”
inherent in the logs to enhance the search process.
SEMANTIC QUERY LOG ANALYSIS
Extraction
• Query log entries follow the Combined Log Format (CLF):




                                                        Extract SPARQL query


   SELECT DISTINCT ?genre ?instrument WHERE
   {
       <…dbpedia.org…/Ringo_Starr> ?rel <…dbpedia.org/…/The_Beatles>.
       <…dbpedia.org…/Ringo_Starr> dbpedia:genre ?genre.
       <…dbpedia.org…/Ringo_Starr> dbpedia:instrument ?instrument.
   }
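This extraction step can be sketched in a few lines of Python. The log line below is invented for illustration; the only assumption is that the SPARQL query travels URL-encoded in the `query` parameter of the CLF request field.

```python
import re
from urllib.parse import unquote_plus

# Invented CLF entry for illustration; the SPARQL query is URL-encoded
# in the "query" parameter of the request field.
LOG_LINE = ('127.0.0.1 - - [30/Jan/2012:00:00:01 +0000] '
            '"GET /sparql?query=SELECT+%3Fgenre+WHERE+%7B%3Fs+%3Fp+%3Fgenre%7D HTTP/1.1" '
            '200 1024 "-" "client/1.0"')

def extract_sparql(clf_line):
    """Pull the URL-decoded SPARQL query out of a Combined Log Format entry."""
    match = re.search(r'[?&]query=([^ &"]+)', clf_line)
    return unquote_plus(match.group(1)) if match else None

print(extract_sparql(LOG_LINE))  # -> SELECT ?genre WHERE {?s ?p ?genre}
```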
Analysis
   SELECT DISTINCT ?genre ?instrument WHERE
   {
       <…dbpedia.org…/Ringo_Starr> ?rel <…dbpedia.org/…/The_Beatles>.
       <…dbpedia.org…/Ringo_Starr> dbpedia:genre ?genre.
       <…dbpedia.org…/Ringo_Starr> dbpedia:instrument ?instrument.
   }


• For each bound resource (subject or object), query the
   endpoint for the type of the resource:

              http://dbpedia.org/resource/Ringo_Starr


       type
                        http://dbpedia.org/ontology/MusicalArtist
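The type lookup itself is a plain SPARQL query against the endpoint; a minimal helper that builds it (the function name is ours):

```python
# IRI of rdf:type, used to ask the endpoint for a resource's types.
RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'

def type_lookup_query(resource_uri):
    """Build the SPARQL query that retrieves the types of a bound resource."""
    return 'SELECT ?type WHERE { <%s> <%s> ?type }' % (resource_uri, RDF_TYPE)

q = type_lookup_query('http://dbpedia.org/resource/Ringo_Starr')
```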
Query-Concepts Model
   SELECT DISTINCT ?genre ?instrument WHERE

   { <…dbpedia.org…/Ringo_Starr> ?rel <…dbpedia.org/…/The_Beatles>.
     <…dbpedia.org…/Ringo_Starr> dbpedia:instrument ?instrument. }



1) Retrieve types of resources in the query:
   Ringo_Starr type dbpedia-owl:MusicalArtist, umbel:MusicalPerformer
   The_Beatles type dbpedia-owl:Band, schema:MusicGroup


2) Increment the co-occurrence of each concept in the first list
   with each concept in the second:

   (MusicalArtist, Band)      (MusicalPerformer, MusicGroup)

   (MusicalArtist, MusicGroup)      (MusicalPerformer, Band)
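The counting step above can be sketched as a plain co-occurrence counter; pair keys are stored sorted so the model is order-independent, and the example data mirrors the Ringo_Starr / The_Beatles query:

```python
from collections import Counter
from itertools import product

def update_query_concepts(model, types_a, types_b):
    """Increment the co-occurrence count for every concept pair drawn from
    the two resources' type lists (pairs stored as sorted tuples)."""
    for a, b in product(types_a, types_b):
        model[tuple(sorted((a, b)))] += 1

query_concepts = Counter()
update_query_concepts(query_concepts,
                      ['MusicalArtist', 'MusicalPerformer'],  # types of Ringo_Starr
                      ['Band', 'MusicGroup'])                  # types of The_Beatles
```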
Concept-Predicates Model
    SELECT DISTINCT ?genre ?instrument WHERE

    { <…dbpedia.org…/Ringo_Starr> ?rel <…dbpedia.org/…/The_Beatles>.
       <…dbpedia.org…/Ringo_Starr> dbpedia:genre ?genre.
       <…dbpedia.org…/Ringo_Starr> dbpedia:instrument ?instrument. }


1) Retrieve types of resources used as subjects in the query:
    Ringo_Starr type dbpedia-owl:MusicalArtist, umbel:MusicalPerformer


2) Identify bound predicates (dbpedia:genre, dbpedia:instrument)

3) Increment the co-occurrence of each type with the predicate used in
    the same triple pattern:

   (MusicalPerformer, genre)      (MusicalPerformer, instrument)

   (MusicalArtist, genre)      (MusicalArtist, instrument)
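A minimal sketch of this counting step, simplified to the example above where all triple patterns share one subject, so each subject type is crossed with each bound predicate:

```python
from collections import Counter

def update_concept_predicates(model, subject_types, predicates):
    """Increment the co-occurrence of each subject type with each bound
    predicate used with that subject."""
    for subj_type in subject_types:
        for pred in predicates:
            model[(subj_type, pred)] += 1

concept_predicates = Counter()
update_concept_predicates(concept_predicates,
                          ['MusicalArtist', 'MusicalPerformer'],  # types of Ringo_Starr
                          ['genre', 'instrument'])                 # bound predicates
```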
Instance-Types Model
   SELECT DISTINCT ?genre ?instrument WHERE

   { <…dbpedia.org…/Ringo_Starr> ?rel <…dbpedia.org/…/The_Beatles>.
     <…dbpedia.org…/Ringo_Starr> dbpedia:instrument ?instrument. }



1) Retrieve types of resources in the query:
   Ringo_Starr type dbpedia-owl:MusicalArtist, umbel:MusicalPerformer
   The_Beatles type dbpedia-owl:Band, schema:MusicGroup


2) Increment the co-occurrence of concepts found as types for the
   same instance:

             (MusicalArtist, MusicalPerformer)

             (Band, MusicGroup)
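The same counting idea applies here, now over unordered pairs of types attached to a single instance; a minimal sketch:

```python
from collections import Counter
from itertools import combinations

def update_instance_types(model, instance_types):
    """Increment the co-occurrence of every unordered pair of concepts
    found as types of the same instance."""
    for a, b in combinations(sorted(instance_types), 2):
        model[(a, b)] += 1

instance_type_model = Counter()
update_instance_types(instance_type_model,
                      ['MusicalArtist', 'MusicalPerformer'])  # types of Ringo_Starr
update_instance_types(instance_type_model,
                      ['Band', 'MusicGroup'])                 # types of The_Beatles
```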
RESULT AUGMENTATION
Dataset
• Two sets of DBpedia query logs made available at the
  USEWOD2011 and USEWOD2012 workshops.

• The logs contained around 5 million queries issued to
  DBpedia over a time period spanning almost 2 years.

                                            USEWOD2012   USEWOD2011
   Number of analyzed queries                8,866,028    4,951,803
   Number of unique triple patterns          4,095,011    2,641,098
   Number of unique bound triple patterns    3,619,216    2,571,662
Results Enhancement
• Google, Yahoo!, Bing, etc. enhance search
  results using structured data




• FalconS and VisiNav return extra information together
  with each entity in the answers (e.g. type, label)

• Evaluation of Semantic Search showed that augmenting
  answers with extra information provides a richer user
  experience2.
2. See our paper from this morning’s IWEST 2012 workshop
FalconS Results
Query: ‘population of New York city’




• The information shown is drawn from a manually
  (arbitrarily) predefined set.
Motivation for proposed approach
• Utilize query logs as a source of collaborative knowledge
  that captures implicit associations between Linked Data
  entities and properties.

• Use this to select which information to show the user.

• Two recent studies3 analyzed semantic query logs and
  observed that a class of entities is usually queried with
  similar relations and concepts.


 3. Luczak-Rösch et al.; Elbedweihy et al.
Two Related Types of Result Augmentation
1. Additional result-related information.
  – More details about each result item
  – Provides better understanding of the answer.


2. Additional query-related information.
  – More results related to the query entities
  – Assists users in discovering useful findings
    (serendipity)
Return additional result-related information
Steps
1) For each result item, find the types of the instance.

2) Extract the most frequently queried predicates associated
   with these types from the concept-predicates model.

3) Generate queries with each (instance, predicate) pair.
     e.g. (<…dbpedia.org…/Ringo_Starr>, genre)

4) Show aggregated results to the user.
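The query-generation step above can be sketched as follows; the function name and the `?value` variable are ours, and the predicates stand in for the top entries of the concept-predicates model:

```python
def augmentation_queries(instance_uri, predicates):
    """One SPARQL query per (instance, predicate) pair; the predicates come
    from the concept-predicates model for the instance's types."""
    return ['SELECT ?value WHERE { <%s> <%s> ?value }' % (instance_uri, p)
            for p in predicates]

queries = augmentation_queries('http://dbpedia.org/resource/Ringo_Starr',
                               ['http://dbpedia.org/ontology/genre',
                                'http://dbpedia.org/ontology/instrument'])
```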
Return additional result-related information
• MusicalArtist-> genre, associatedBand, occupation, instrument,
  birthDate, birthPlace, hometown, prop:yearsActive, foaf:surname,
  prop:associatedActs, …

Query: “Who played drums for the Beatles?”

Result: Ringo Starr
  Pop music, Rock music (genre)
  Keyboard, Drum, Acoustic guitar (instrument)
  The Beatles, Plastic Ono Band, Rory Storm (assoc. Band)
Return additional query-related information
Steps
1) Extract all concepts from query.

2) For any instances, find their types.

3) For each query concept, find most frequently occurring
   concepts from the query-concepts model.

4) For each related concept, query for instances that have a
   relation with the originating instance.

5) Show aggregated results to the user.
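Step 3 above, ranking related concepts from the query-concepts model, can be sketched like this (pair keys stored as sorted tuples, as in the model-building step; the counts below are invented for illustration):

```python
from collections import Counter

def top_related_concepts(query_concepts, concept, k=3):
    """Rank the concepts that most frequently co-occur with `concept` in the
    query-concepts model."""
    scores = Counter()
    for (a, b), count in query_concepts.items():
        if a == concept:
            scores[b] += count
        elif b == concept:
            scores[a] += count
    return [c for c, _ in scores.most_common(k)]

# Toy model with invented counts.
model = Counter({('City', 'University'): 5,
                 ('City', 'Person'): 3,
                 ('Book', 'City'): 1})
print(top_related_concepts(model, 'City'))  # -> ['University', 'Person', 'Book']
```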
Return additional query-related information
• City-> Book, Person, Country, Organisation, SportsTeam, MusicGroup,
  Film, RadioStation, River, University, SoccerPlayer, Hospital, ...


Query: “Where is the University of Sheffield located?”

Result: Sheffield, UK
  Nick Clegg, Clive Betts, David Blunkett (Person)
  Sheffield United F.C., Sheffield Wednesday (SportsTeam)
  Hallam FM, Real Radio, BBC Radio Sheffield (RadioStn.)
  Jessop Hosp., Northern General, Royal Hallamshire (Hospital)
  Uni. of Sheffield, Sheffield Hallam Uni. (University)
VISUALISATION
Data Visualization
• View-based interfaces (e.g. Semantic Crystal and Smeagol)
  support users in query formulation by showing the
  underlying data and connections.

• Helpful for users, especially those unfamiliar with the
  search domain.

• They try to bridge the gap between user terms and tool
  terms (the habitability problem).

• They face the challenge of visualizing large datasets
  without cluttering the view and degrading user experience.
Data Visualization: Proposed approach
• Visualizing large datasets (especially heterogeneous ones)
  is a challenge.

• To overcome this, we need to select and visualize specific
  parts of the data.

• Exploit the collaborative knowledge in query logs to drive
  the selection of concepts and predicates added to the
  user’s subgraph of interest.
Data Visualization: Proposed approach
Steps
1) User enters NL query
2) Return best-attempt results
3) Identify query instances and find their types
4) For each type:
     • Extract most queried predicates associated with it from
       concept-predicates model.
     • Extract most queried concepts associated with it from
       query-concepts model.
5) Add these to the user’s query graph (see next slide)
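Step 4 combines both models; a minimal sketch for one instance type (function name and the counts are ours, invented around the "capital of Egypt" example):

```python
from collections import Counter

def graph_additions(instance_type, concept_predicates, query_concepts, k=2):
    """For one type of a query instance, pick the top-k predicates (from the
    concept-predicates model) and top-k co-occurring concepts (from the
    query-concepts model) to draw next to it in the query graph."""
    preds = Counter({p: n for (t, p), n in concept_predicates.items()
                     if t == instance_type})
    concepts = Counter()
    for (a, b), n in query_concepts.items():
        if a == instance_type:
            concepts[b] += n
        elif b == instance_type:
            concepts[a] += n
    return ([p for p, _ in preds.most_common(k)],
            [c for c, _ in concepts.most_common(k)])

# Invented counts for illustration.
cp = Counter({('Country', 'capital'): 7, ('Country', 'population'): 4,
              ('Country', 'anthem'): 1})
qc = Counter({('City', 'Country'): 6, ('Country', 'River'): 2})
print(graph_additions('Country', cp, qc))
```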
Example
Query: “What is the capital of Egypt?”

Answer: Cairo (best-attempt results)

Result-related information:
➔ latitude: 30.058056
➔ longitude: 31.228889
➔ population: 6758581
➔ area: 453000000
➔ time zone: Eastern European Time
➔ subdivision: Governorates of Egypt
➔ page: http://www.cairo.gov.eg/default.aspx
➔ nickname: The City of a Thousand Minarets, Capital of the
  Arab World
➔ depiction:
Example
Query: “What is the capital of Egypt?”

Answer: Cairo

Query-related information:

➔ Cairo Uni., Ain Shams Uni., German Uni., British Uni. (University)
➔ Ittihad El Shorta, El Shams Club, AlNasr Egypt (SportsTeam)
➔ Orascom Telecom, HSBC Bank, EgyptAir, Olympic Grp (Organisation)
➔ Nile River (River)
➔ Al Azhar Park (Park)
➔ Hani Shaker, Sherine, Umm Kulthum, Amr Diab (MusicalArtist)
➔ Nile TV, AL Nile, Al-Baghdadia TV (BroadCaster)
➔ Egyptian Museum, Museum of Islamic Art (Museum)
Data Visualization: Proposed approach
Step 5: Add concepts and predicates to the user’s query graph.

[Figure: the query instance at the centre of the graph, with the
most queried predicates for “Country” on one side and the most
queried concepts for “Country” on the other]
Questions




Thank You


Questions?
