Anna Mastora & Sarantos Kapidakis
Laboratory on Digital Libraries and Electronic Publishing
     Department of Archives and Library Science
                   Ionian University

2nd Workshop on Digital Information Management
              25-26 April 2012, Corfu, Greece
   Introduction
   Research hypothesis
   Language & Problems in Information Retrieval
   Query Expansion
   Context
   Knowledge Organisation Systems (KOS)
   Considerations




   Information Retrieval (IR) is about retrieving
    relevant results in response to expressed
    information needs
   The expression of information needs uses natural
    language representations
    ◦ Language is clearly ambiguous (Szostak, 2010)
       Not all linguistic representations are distinguishably
        informative of the user’s intent




   Description of terms
    ◦ Deals with each term individually
      e.g. definitions in a dictionary


   Discrimination of terms
    ◦ Deals with the relations between terms
      e.g. hierarchical relationships within a thesaurus


                                                       (Blair, 2006)




   We can clarify meaning by studying the actual use of
    language in everyday activities; deriving information
    about the context within which words are used will
    help us better clarify the identified
    indeterminacies




   […] for it’s not an underlying logic that clarifies what we
    mean, it’s the context, activities and practices in which we
    use language that provide the fundamental clarification of
    meaning we are looking for (Blair, 2006)

   Only in the stream of thought and life do words have
    meaning. (L. Wittgenstein, “Zettel”, §173)

   Meaning depends on consistent usage but requires more than
    that; it also requires that speakers be able to check that
    someone’s usage is consistent (Jaworski, 2011)



Language is a labyrinth of paths. You approach from
one side and know your way about; you approach the
same place from another side and no longer know
your way about. (§203)

                                            L. Wittgenstein
                      “Philosophical Investigations” (1967)




   To deal with the problems caused by the diversity of
    linguistic representations, Information Systems
    implement Query Expansion.

    ◦ Supplementing the original query with additional,
      meaningful words or phrases (manually,
      automatically, semi-automatically)
      What does *meaningful* mean?
      More information about the “context”
        What is the *context*?
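
A minimal sketch of what supplementing a query could look like, assuming a small hand-made table of related terms. The table `RELATED_TERMS`, the function `expand_query` and the `confirm` callback are hypothetical names chosen for illustration; they do not come from the slides. Passing no callback stands in for automatic expansion, passing one stands in for semi-automatic expansion.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical expansion vocabulary; a real system would derive it from a
# thesaurus, query logs, or corpus statistics.
RELATED_TERMS: Dict[str, List[str]] = {
    "car": ["automobile", "vehicle"],
    "sick": ["ill"],
}


def expand_query(
    query: str,
    related: Dict[str, List[str]] = RELATED_TERMS,
    confirm: Optional[Callable[[str, str], bool]] = None,
) -> List[str]:
    """Return the original query terms followed by any added terms.

    With confirm=None every candidate is added (automatic expansion).
    With a confirm callback, a candidate is added only if the callback
    accepts it (semi-automatic expansion).
    """
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for candidate in related.get(term, []):
            if candidate in expanded:
                continue  # never add the same term twice
            if confirm is None or confirm(term, candidate):
                expanded.append(candidate)
    return expanded


automatic = expand_query("sick car")
# The lambda plays the role of a user who rejects the suggestion "vehicle".
semi_automatic = expand_query(
    "sick car", confirm=lambda term, candidate: candidate != "vehicle"
)
```

The sketch deliberately leaves open the question the slide raises: which candidates count as meaningful depends on context that the table alone cannot supply.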




More information about the involved parties within
 the information retrieval process




    The multi-faceted concept of Context (Bhatia & Kumar, 2010)
- asking the users explicitly
- statistical processing of log files (e.g. Query chains)
- qualitative evaluation of log files, data or systems
- pre-processing of document corpora
- machine learning techniques (un-, semi-, supervised)
- implementation of user behaviour models
- personalisation/ profiling of users
- knowledge organisation structures
- ...
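
As an illustration of the statistical processing of log files, the sketch below groups each user's consecutive queries into query chains and counts which terms users add when they reformulate. The log format, the 300-second gap and the function names are assumptions made for this example, not details taken from the slides.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical log record: (user_id, timestamp_in_seconds, query)
LogEntry = Tuple[str, int, str]


def build_query_chains(log: List[LogEntry], max_gap: int = 300) -> List[List[str]]:
    """Group each user's consecutive queries into chains.

    A new chain starts when the gap to the same user's previous query
    exceeds max_gap seconds (an assumed cut-off).
    """
    chains: List[List[str]] = []
    open_chain: Dict[str, Tuple[int, List[str]]] = {}  # user -> (last time, chain)
    for user, ts, query in sorted(log, key=lambda entry: (entry[0], entry[1])):
        last = open_chain.get(user)
        if last is not None and ts - last[0] <= max_gap:
            last[1].append(query)
            open_chain[user] = (ts, last[1])
        else:
            chain = [query]
            chains.append(chain)
            open_chain[user] = (ts, chain)
    return chains


def reformulation_terms(chains: List[List[str]]) -> Dict[str, Counter]:
    """For each term of a chain's first query, count the new terms that
    appear in the later queries of the same chain."""
    added: Dict[str, Counter] = {}
    for chain in chains:
        first = set(chain[0].lower().split())
        for later in chain[1:]:
            new_terms = set(later.lower().split()) - first
            for term in first:
                added.setdefault(term, Counter()).update(new_terms)
    return added


log = [
    ("u1", 0, "jaguar"),
    ("u1", 40, "jaguar car price"),
    ("u2", 10, "jaguar"),
    ("u2", 70, "jaguar animal habitat"),
    ("u1", 5000, "corfu weather"),
]
chains = build_query_chains(log)
stats = reformulation_terms(chains)
```

In this toy log the term "jaguar" is followed by both car-related and animal-related additions, which is the kind of evidence about real language use that could feed an expansion step.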




   Meant to be some kind of a problem solver; a mediator
    between the inquirer and the content which is stored in
    the documents



    ◦ Placing a concept within a hierarchical definition establishes
     what sort of thing this is and what sort of thing it is not,
     and often sorts the subsidiary elements of which it may be
     comprised (Szostak, 2010)




The expert does not hold a “more complete”
definition of the words than we do. He simply
knows more about certain words than we do, and
by providing this additional information about
them, it may be useful for identifying each word in
different circumstances.
                                         (Blair, 2006)




   Schemes for organising information & promoting knowledge
    management (Hodge, 2000)
    ◦ Term lists (authority files, glossaries, dictionaries, gazetteers)
    ◦ Classification & categories (classification scheme, taxonomy,
      subject headings)
    ◦ Relationship lists (thesaurus, semantic network, ontology)
   Constitute ways of formalising knowledge and,
    consequently, communication through linguistic expressions
    or any other kind of definitional or descriptive sign, such as
    visual signs.
   Used in Query Expansion to derive context
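
One way to picture a relationship list being used for Query Expansion is the sketch below. It models a thesaurus entry with the common relationship types (broader, narrower, related, non-preferred synonyms) and expands a term along the relations the caller selects. The toy thesaurus, the class `Concept` and the function `expand_with_kos` are invented for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Concept:
    """One thesaurus entry with the usual relationship types."""
    label: str
    broader: List[str] = field(default_factory=list)   # BT
    narrower: List[str] = field(default_factory=list)  # NT
    related: List[str] = field(default_factory=list)   # RT
    synonyms: List[str] = field(default_factory=list)  # UF (non-preferred terms)


# Toy thesaurus, invented for this example.
THESAURUS: Dict[str, Concept] = {
    "libraries": Concept(
        "libraries",
        broader=["information services"],
        narrower=["academic libraries", "digital libraries"],
        related=["archives"],
    ),
    "sick": Concept("sick", synonyms=["ill"]),
}


def expand_with_kos(
    term: str,
    thesaurus: Dict[str, Concept],
    relations: Sequence[str] = ("synonyms", "narrower"),
) -> List[str]:
    """Return the term plus the labels reachable through the chosen relations.

    Terms missing from the thesaurus are returned unchanged.
    """
    concept = thesaurus.get(term)
    if concept is None:
        return [term]
    result = [term]
    for relation in relations:
        for label in getattr(concept, relation):
            if label not in result:
                result.append(label)
    return result


default_expansion = expand_with_kos("libraries", THESAURUS)
wider_expansion = expand_with_kos("libraries", THESAURUS, relations=("broader", "related"))
```

Which relations to follow is a design choice: narrower terms and synonyms tend to keep the query focused, while broader and related terms widen it.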




   The epistemological basis of any theory of
    Knowledge Organization is an accepted postulate.
    In other words, how knowledge is organized and
    represented depends largely on the understanding
    of how knowledge is organized and represented.
                                   (Alexiev & Marksbury, 2010)




   Since language is involved in their creation and
    development, KOSs themselves bear the inherent
    characteristics of the use of language
    ◦ Ambiguity [PoS: “looks”, Semantic: “bank”, Syntactic: “He hit
      the girl with the hat”]
    ◦ Homonymy [Homophones: “too / two”, Homographs: “tire”]
    ◦ Polysemy [“mouth” (on the face OR the opening of a cave)]
    ◦ Synonymy [“sick – ill”]
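
The sketch below shows, under simplifying assumptions, why these characteristics matter for expansion: a polysemous term such as “bank” or “mouth” has several candidate senses, and only the rest of the query can suggest which one is meant. The sense inventory `SENSES` and its cue words are hypothetical, and the overlap count is a much-simplified stand-in for dictionary-overlap disambiguation, not a method described in the slides.

```python
from typing import Dict, Optional, Set

# Hypothetical sense inventory: term -> sense label -> cue words.
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "bank": {
        "financial institution": {"money", "loan", "account", "interest"},
        "river side": {"river", "water", "shore", "fishing"},
    },
    "mouth": {
        "part of the face": {"face", "teeth", "lips", "tongue"},
        "opening of a cave": {"cave", "entrance", "rock"},
    },
}


def pick_sense(
    term: str,
    query: str,
    senses: Dict[str, Dict[str, Set[str]]] = SENSES,
) -> Optional[str]:
    """Return the sense whose cue words overlap most with the rest of the query.

    Returns None when the term is unknown or no cue word matches, which
    means the query stays ambiguous.
    """
    context = set(query.lower().split()) - {term}
    best_sense: Optional[str] = None
    best_overlap = 0
    for sense, cues in senses.get(term, {}).items():
        overlap = len(cues & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


financial = pick_sense("bank", "bank loan interest")
cave = pick_sense("mouth", "cave mouth entrance")
unresolved = pick_sense("bank", "bank")
```

The single-word query “bank” stays unresolved, which mirrors the point that not all linguistic representations are distinguishably informative of the user’s intent.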




   Even if KOSs try to capture and deliver the absolute
    meaning [or a more targeted one] of what they
    describe, they are still considered “collection
    independent knowledge structures” (Efthimiadis, 1996)
    ◦ There are still missing parts for the communication
      of the intended meaning

    Ambiguity differs only by degree between universal
     and domain-specific classifications, though that
     difference of degree is likely quite significant
                                              (Szostak, 2010)




   Take advantage of *user models*

   Take advantage of *user evaluation*

They deliver information about the real use of language



    The degree of ambiguity lessens within groups that
     regularly interact (though it does not disappear)
                                             (Szostak, 2010)



Thank you!
                                                              Contact:
                                                        Anna Mastora
                                            mastora [at] ionio [dot] gr



This research has been co-financed by the European Union (European Social Fund
– ESF) and Greek national funds through the Operational Program "Education and
Lifelong Learning" of the National Strategic Reference Framework (NSRF) -
Research Funding Program: Heracleitus II. Investing in knowledge society through
the European Social Fund.




Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Organisation
