IWEST 2012 workshop located at ESWC 2012




Evaluating Semantic Search Systems to Identify Future Directions of Research
Khadija Elbedweihy (1), Stuart N. Wrigley (1), Fabio Ciravegna (1),
Dorothee Reinhard (2), Abraham Bernstein (2)
(1) University of Sheffield, UK
(2) University of Zurich, Switzerland
18.06.2012
Outline
•   Introduction
•   Evaluation Design
•   Evaluation Execution
•   Usability Feedback and Analysis
•   Future Directions for Research
•   Conclusions




INTRODUCTION

Semantic Search
• Semantic Search tools have different
    • querying approaches (e.g., forms, graphs, keywords).
    • search strategies during processing and query execution.
    • format and content of the results presented to the user.


• These factors influence the perceived performance and usability of
  the tool.

• Searching is a user-centric process; usability evaluation is as
  important as, if not more important than, assessing performance.

Previous evaluation efforts
• Kaufmann (2007): compared 4 SW query interfaces (NL and graph-based)
• SemSearch Challenge: ad-hoc object retrieval using keywords
• Question Answering Over Linked Data (QALD): two NL interfaces
• TREC Entity List Completion (ELC) Task: similar to SemSearch

• All previous evaluations based upon the Cranfield methodology
      – test collection; set of tasks; set of relevance judgments.


• Little or no focus on usability

EVALUATION DESIGN

Evaluation Design
Aspect          Details
Tools           • Any query input style
                • Answers extracted from data (e.g., list of URIs or literals but not documents)
Data            Mooney Natural Language Learning Data
                • known within the search community
                • simple and well-known domain for subjects (geography)
                • questions already available
                     • Give me all the state capitals of the USA?
                     • Which rivers in Arkansas are longer than Alleghany river?
Subjects        38 subjects (26 males, 12 females); aged between 20 and 35 years old
Criteria        • Usability:
                     • query input (expressiveness, etc.)
                     • usefulness and suitability of returned answers (data) and presentation
                • Performance: speed of execution (also affects user satisfaction)

Data Captured
• Results for each question:
      –   time required to formulate query
      –   number of attempts required to answer question
      –   success rate (user found satisfying answer or not)
      –   query execution time
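
The per-question measurements above could be logged and averaged roughly as follows. This is a minimal sketch, not the tooling used in the evaluation; the `QuestionResult` record, its field names and the `summarise` helper are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class QuestionResult:
    """One subject's outcome on one question (hypothetical record layout)."""
    input_time_s: float      # time required to formulate the query
    attempts: int            # number of attempts required to answer the question
    answer_found: bool       # whether the user found a satisfying answer
    execution_time_s: float  # query execution time


def summarise(results):
    """Return the mean of each captured measure over a list of results."""
    return {
        "mean_input_time_s": mean(r.input_time_s for r in results),
        "mean_attempts": mean(r.attempts for r in results),
        "answer_found_rate": mean(1.0 if r.answer_found else 0.0 for r in results),
        "mean_execution_time_s": mean(r.execution_time_s for r in results),
    }
```

Averaging the found/not-found flag as 1.0 or 0.0 gives a rate between 0 and 1, the same scale as a "mean answer found rate".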


• Questionnaires capturing user experience
     – System Usability Scale (SUS) questionnaire
     – Extended questionnaire
     – Demographics questionnaire
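
For reference, the standard SUS scoring rule turns ten responses on a 1 to 5 scale into a score between 0 and 100. The sketch below shows that rule only; the function name `sus_score` is an assumption, and the extended and demographics questionnaires are not covered because the slides do not describe how they were scored.

```python
def sus_score(responses):
    """Score one SUS questionnaire (ten items, each answered 1 to 5)."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly ten items")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1   # odd-numbered (positively worded) items
        else:
            total += 5 - response   # even-numbered (negatively worded) items
    return total * 2.5              # rescale the 0-40 sum to 0-100
```

A subject who answers 3 to every item scores 50; per-tool mean SUS values are then the average of these per-subject scores.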



EVALUATION EXECUTION

Participating tools

Tool         Description
K-Search     Form-based
Ginseng      Natural language with constrained vocabulary and grammar
NLP-Reduce   Natural language for full English questions, sentence fragments,
             and keywords.
PowerAqua    Natural language interface




Running the experiment




ANALYSIS AND FEEDBACK

Results
Criterion                         K-Search     Ginseng         NLP-Reduce   PowerAqua
                                  Form-based   Controlled NL   NL-based     NL-based
Mean experiment time (s)          4313.84      3612.12         4798.58      2003.9
Mean SUS (0-100)                  44.38        40              25.94        72.25
Mean Ext. Questionnaire (0-100)   47.29        45              44.63        80.67
Mean number of attempts           2.37         2.03            5.54         2.01
Mean answer found rate            0.41         0.19            0.21         0.55
Mean execution time (s)           0.44         0.51            0.51         11
Mean input time (s)               69.11        81.63           29           16.03

Slide callouts:
• SUS ratings, in column order: ‘Bad’, ‘Bad’, ‘Awful’, ‘Good’
• NLP-Reduce: twice the number of attempts
• PowerAqua: slowest execution time
• Ginseng: slowest input time

Feedback: input style
Input          Positive                                    Negative
Free NL        • fast (16 and 29 sec on average)           mismatch (habitability) problem: “I need to
               • most natural (query in plain natural      know and use the terms expected by the
                 language)                                 system and not my own terms to get results”
Contr. NL      • guidance: suggestions and auto-           very restricted language model:
                 completion                                • frustration (low SUS)
               • avoids habitability problem (only valid   • limit flexibility and expressiveness
                 queries)                                  • slow query formulation (highest input
                                                             time: 81.63 sec)
Form           • allow users to build more complex         • more difficult to use than NL
                 queries than NL                           • time consuming (input time: 69.11 sec on
               • helpful to know the search space            average)
                 (concepts & relations)




Feedback: results
Aspect         Comments
Presentation   Results not user-friendly
               • provided full URIs of the concepts
                 (e.g. ‘http://www.mooney.net/geo#tennesse2’)
               • used ontology labels for providing a NL representation of the answer
                 (e.g. ‘montgomeryAI’)
Management Users have high expectations; requested advanced means of managing
           the results such as:
           • storing and reusing results of previous queries
           • filtering results according to some suitable criteria
           • checking the provenance of the results
           • basic manipulations such as sorting results




FUTURE DIRECTIONS FOR RESEARCH

Input Style
• Visualising the search space shows:
      • what type of information is available (exploration)
      • what queries are supported (query formulation guidance).
• Typing queries in natural language is fast and easy

• Provide ‘dual query formulation’ approach
      • users unfamiliar with the domain can correctly formulate their
        intended queries using the view-based input
      • users familiar with the domain can use faster NL queries

Input Style
• Comparatives and Superlatives still a challenge
      e.g., FREyA uses an ‘intervention approach’
             • if a numerical datatype property is found in user query:
               1. generates maximum, minimum and sum functions
               2. user chooses the required function
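
The two steps above can be sketched as follows. This is an illustration of the idea only and is not FREyA's code or API: the property names in `NUMERIC_PROPERTIES`, the helper names and the sample values are all assumptions.

```python
# Hypothetical numeric datatype properties of a geography ontology.
NUMERIC_PROPERTIES = {"length", "population", "area"}

# The three functions generated when a numeric property is found.
FUNCTIONS = {"maximum": max, "minimum": min, "sum": sum}


def candidate_functions(query_terms):
    """Step 1: offer the functions for each numeric property in the query."""
    return {
        term: list(FUNCTIONS)
        for term in query_terms
        if term in NUMERIC_PROPERTIES
    }


def apply_choice(values, chosen_function):
    """Step 2: apply the function the user picked to the property's values."""
    return FUNCTIONS[chosen_function](values)


# Usage with made-up values: the user asks about river length and picks "maximum".
options = candidate_functions(["river", "length"])
longest = apply_choice([100, 250, 75], options["length"][0])
```

Leaving the choice to the user avoids having the system guess whether, say, "longest" should map to a maximum or a sum.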




Query Execution
Delays in response time negatively affect user experience and
satisfaction.

• Provide feedback
      • reduces the effect of delays (users are more willing to wait if they
        know the status of their search process).

• Provide intermediate (partial) results
      • gradually incremented to provide the complete result set.
      • similar to (arguably better than) basic feedback
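
One way to deliver partial results is a generator that hands back the growing result set in batches. The sketch below assumes results arrive one at a time from some iterable; the function name and the `batch_size` parameter are hypothetical and not taken from any of the evaluated tools.

```python
def incremental_results(result_source, batch_size=2):
    """Yield snapshots of the result list as it grows, one batch at a time."""
    collected = []
    for item in result_source:
        collected.append(item)
        if len(collected) % batch_size == 0:
            yield list(collected)  # partial result set so far
    # Emit the complete set if the last batch was not full (or nothing arrived).
    if len(collected) % batch_size != 0 or not collected:
        yield list(collected)


# Usage: a user interface would redraw the result list on every snapshot.
for snapshot in incremental_results(iter(["r1", "r2", "r3"]), batch_size=2):
    pass  # e.g. refresh the display with `snapshot`
```

Each snapshot doubles as progress feedback, since the user sees the list grow while the query is still running.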




Results
 • Presentation
      • Attractive, accessible, understandable and user-friendly.
      • Augment with associated information: ‘richer’ user experience.


 • Management
      • Filter, sort
      • Some complex questions require multiple sub-queries
      • Ability to store and reuse the result set could be helpful.
      • Queries can then be constructed by combining saved queries
        with logical operators such as ‘AND’ and ‘OR’.
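
Storing result sets and combining them with logical operators maps naturally onto set intersection and union. The following is a minimal sketch under that assumption; the `ResultStore` class, its method names and the sample identifiers are invented for illustration.

```python
class ResultStore:
    """Keeps named result sets so that later queries can reuse them."""

    def __init__(self):
        self._saved = {}  # query name -> set of result identifiers

    def save(self, name, results):
        self._saved[name] = set(results)

    def combine(self, left, operator, right):
        """Combine two saved result sets with 'AND' or 'OR'."""
        a, b = self._saved[left], self._saved[right]
        if operator == "AND":
            return a & b  # results present in both saved queries
        if operator == "OR":
            return a | b  # results present in either saved query
        raise ValueError(f"unsupported operator: {operator}")


# Usage with made-up identifiers for two saved sub-queries.
store = ResultStore()
store.save("rivers_in_state", {"river_a", "river_b"})
store.save("long_rivers", {"river_b", "river_c"})
both = store.combine("rivers_in_state", "AND", "long_rivers")
```

A complex question that needs several sub-queries can then be answered by saving each sub-query's results and combining them, rather than by writing one large query.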


CONCLUSIONS

Conclusions & Recommendations
• Query input approaches serve different purposes:
      – View-based: explore and understand
      – NL-based: efficiency and simplicity

• Dual query approach to input
      – natural language and view-based input styles
      – improve search effectiveness and user satisfaction

• More sophisticated results presentation and management
      – customise: sort, filter, check provenance and (temporarily) save
      – enrich: supplementary information


THANK YOU

