A pair of shoes in the thesaurus: reflexions on human and computer indexing
Eric Sieverts
Media, information & communication, Amsterdam University of Applied Sciences / Section Innovation & Development, University Library Utrecht
Society of Indexers Conference 2010: The challenging future of indexing
30 September 2010, Middelburg
agenda
- the holy grail for search systems: let people find what they search for
- searching in the world of Google
- what's wrong with Google (and the like)
- metadata and indexing
- indexing and knowledge organization
- knowledge organization and the semantic web
Eric Sieverts  |  e.g.sieverts@uu.nl  |  e.g.sieverts@hva.nl  |  http://www.library.uu.nl/medew/it/eric  |  Middelburg 30-9-2010
searching in the world of Google
- Google appears to be "the measure of all things" in search: with Google "everything can be found"
- but isn't there a paradox? if Google (or Yahoo! or Bing) contains everything (> 500,000,000,000 items), can "it" still be found?
- >> anticipation of the user's intentions and peerless ranking algorithms become increasingly important
the basic search-and-find paradigm
[diagram: searcher / query, match, documents]
validity for free-text matching?
- "just read; it does not mean what you're reading" (paraphrasing a Dutch poetry title, "Lees maar er staat niet wat er staat")
- How does Google know what you mean?
- How does Google know what a document means?
[image, filename: thesaurus.jpg]
- is this meant to be representative of the ease of use of thesauri?
- to what query is this Google's answer?
Want to know something about "hallenkerken" (Dutch for "hall churches") through Google Books?
Google's first hit is a book about building thesauri, which contains the word only once, in an example of broader and narrower terms
searching in the world of Google
- the new Google Instant tries to predict user intent (the holy grail for search engine developers)
- after 1 or 2 letters have been typed, it already presents results for the statistically most probable (longer) words
- but is Google really guessing right?
classical situation with controlled human indexing
- searcher must enter the "term(s)" that have been used to characterize the subject
- indexer must assign "correct" terms to characterize the document
- in principle a perfect match is possible
classical situation with controlled human indexing: however
- not user-friendly: searcher has to invent the correct terms
- expensive: indexers must analyze the document in order to assign the correct terms
search in the world of Google
- searcher just types some words (or often only a single word)
- search system contains (all) the words from the documents themselves
- often you don't find all you need; still satisfied?
why still user satisfaction?
despite recall and precision problems:
- search system looks attractively simple
- searcher always finds something (in 500 billion web pages)
- smart relevance ranking provides some relevant items among the first 10, for most (simple) questions and for the majority of users; very often the #1 result is already relevant
- and: who cares about lousy recall & precision (in the Google world)?
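Recall and precision are used on this slide without being defined. A minimal sketch of the usual definitions, assuming a hypothetical query for which a human has judged which documents are relevant (the document ids and counts below are invented for illustration):

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids of the documents the search system returned
    relevant:  ids of the documents judged relevant (hypothetical ground truth)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Ten results on the first page, four of them relevant,
# while the whole collection holds twenty relevant documents.
first_page = range(1, 11)
relevant_docs = [1, 3, 5, 7] + list(range(101, 117))
p, r = precision_recall(first_page, relevant_docs)
print(f"precision={p:.2f} recall={r:.2f}")
```

A searcher who only looks at the first page sees a usable answer and never notices the sixteen relevant documents that were missed, which is the point the slide makes about satisfaction despite poor recall.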
language technology at searcher side
- original simple query is expanded & disambiguated
- statistics generate additional terms to refine queries
- search system contains just the words from the documents themselves
- will improved queries result in better answers?
language technology for better "query"
- "word stemming" and "fuzzy search": automatically search for more word forms >> better recall
- a semantic network (or ontology) contains semantic relations between words: query expanded with semantically related terms >> better recall
- for different meanings of a word, a semantic network (or ontology) contains relations with different words >> disambiguation >> better precision
- no scientific evidence yet about how much improvement this gives
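The first two techniques on this slide can be sketched in a few lines. This is only an illustration under loud assumptions: the suffix list is a toy stand-in for a real stemming algorithm, and the RELATED table is a hypothetical miniature semantic network, not data from any existing thesaurus.

```python
SUFFIXES = ("ing", "es", "s")  # toy suffix list, not a real stemming algorithm


def stem(word):
    """Strip one common English suffix; a crude stand-in for a real stemmer."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical miniature semantic network: stem -> semantically related terms
RELATED = {
    "church": ["chapel", "cathedral"],
    "boot": ["shoe", "footwear"],
}


def expand_query(query):
    """Return the stems to search for: the query stems plus related terms."""
    terms = set()
    for word in query.split():
        s = stem(word)
        terms.add(s)
        terms.update(stem(t) for t in RELATED.get(s, []))
    return terms


print(sorted(expand_query("churches")))
```

Stemming lets "churches" match documents that say "church"; the related terms widen the net further. Both steps raise recall, and both can lower precision when the added terms drift away from what the searcher meant.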
language technology for better "query"
- statistical analysis of the search result generates characteristic terms, from which users can choose to refine their query
- such words can also be derived from a synonym list, thesaurus, semantic network, et cetera
- mostly >> better precision
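One simple way to derive characteristic terms from a search result is to compare how often a word occurs in the result documents with how often it occurs in the whole collection. The sketch below assumes that scoring rule (a plain ratio of document frequencies) and an invented five-document collection; real systems use more refined statistics.

```python
from collections import Counter
from fractions import Fraction


def doc_freq(docs):
    """Count in how many documents each word occurs."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.lower().split()))
    return counts


def characteristic_terms(results, collection, top_n=3):
    """Rank words that are over-represented in `results`.

    Assumes every result document is also part of `collection`,
    so each word has a non-zero collection frequency.
    """
    in_results = doc_freq(results)
    in_collection = doc_freq(collection)
    scores = {
        term: Fraction(n, len(results))
        / Fraction(in_collection[term], len(collection))
        for term, n in in_results.items()
    }
    # Highest ratio first; ties go to words found in more results, then A-Z.
    ranked = sorted(scores, key=lambda t: (-scores[t], -in_results[t], t))
    return ranked[:top_n]


collection = [
    "knight and bishop endgame in chess",
    "chess opening with the knight",
    "riding a horse in the forest",
    "horse and rider in dressage",
    "the bishop of the church",
]
results = collection[:2]  # what a query for "knight" would retrieve
suggestions = characteristic_terms(results, collection)
print(suggestions)
```

Common words such as "the" and "in" score low because they are just as frequent outside the result set, so the suggestions offered to the user lean towards words that set the result apart.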
language technology at the document
- search with "correct" or "important" terms
- language technology enriches the document with "correct" terms (from a thesaurus) or derives characteristic terms from the text
- in principle a perfect match is possible
automatic classification
automatic classification or enrichment
1. deriving specific terms from the document itself
   - on the basis of word lists and text analysis, specific types of terms (e.g. names of persons, places, products, parties, companies, etc.) can be recognized and marked as such
2. adding characteristics to classify a document
   - after training, a system can analyze documents and classify them with terms from a thesaurus or with classes from a taxonomy
despite some limitations it's getting better all the time, even for less tangible tasks such as sentiment analysis
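Step 1 can be sketched with nothing more than word lists. The lists below are hypothetical and tiny; a production system would combine much larger lists with the text analysis the slide mentions.

```python
import re

# Hypothetical word lists (gazetteers), one per entity type
WORD_LISTS = {
    "person": ["Rodin", "Balzac"],
    "place": ["Middelburg", "Utrecht", "Amsterdam"],
    "company": ["Google", "Reuters"],
}


def mark_entities(text):
    """Return (term, type, position) for every word-list term found in text."""
    found = []
    for entity_type, terms in WORD_LISTS.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text):
                found.append((term, entity_type, m.start()))
    return sorted(found, key=lambda item: item[2])


sample = "Google Books lists a study of Rodin printed in Utrecht."
for term, entity_type, pos in mark_entities(sample):
    print(f"{pos:3d}  {entity_type:8s} {term}")
```

Plain list lookup cannot tell a person from a place with the same name; that is where the additional text analysis has to come in.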
The Calais Web Service automatically creates rich semantic metadata: named entities, facts, events
geographical recognition in Google Books
training a system (Joop van Gent, Irion)
[diagram elements: thesaurus, training documents, analysis module, "fingerprints", training module, enrichment of thesaurus]
classification with system (Joop van Gent, Irion)
[diagram elements: enriched thesaurus, new documents, analysis module, "fingerprints", classification module, enriched documents]
endgame tips: checkmate with bishop and knight (in Dutch: "horse")
chess / equestrianism
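The training and classification slides can be illustrated with a deliberately simple sketch. It assumes that a "fingerprint" is just a word-frequency profile and that classification picks the thesaurus term whose profile is most similar (cosine similarity); the slides do not say how the Irion system computes its fingerprints, so every function name and training sentence here is hypothetical.

```python
import math
from collections import Counter


def fingerprint(text):
    """Word-frequency profile of a text (a very simple 'fingerprint')."""
    return Counter(text.lower().split())


def train(training_docs):
    """Enrich each thesaurus term with the fingerprints of its training texts."""
    enriched = {}
    for term, text in training_docs:
        enriched.setdefault(term, Counter()).update(fingerprint(text))
    return enriched


def cosine(a, b):
    """Cosine similarity between two word-frequency profiles."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, enriched):
    """Return the thesaurus term whose fingerprint is closest to the text."""
    fp = fingerprint(text)
    return max(enriched, key=lambda term: cosine(fp, enriched[term]))


training_docs = [
    ("chess", "checkmate with bishop and knight in the endgame"),
    ("chess", "the knight moves before the bishop in this opening"),
    ("equestrianism", "the rider trains the horse for dressage"),
    ("equestrianism", "jumping a horse over fences in the arena"),
]
enriched_thesaurus = train(training_docs)
print(classify("endgame tips for knight and bishop", enriched_thesaurus))
print(classify("dressage tips for horse and rider", enriched_thesaurus))
```

In Dutch the chess knight and the animal share one word, so a single keyword cannot separate the two classes; the surrounding words in the fingerprint do that work.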
knowledge organization systems
metadata: more than keywords or thesauri?
knowledge organization systems
can be more than just metadata models or tools for subject indexing
4 types of KOS:
- categorization systems (like classifications and taxonomies)
- metadata models (like MARC or Dublin Core)
- relational models (like thesauri, semantic networks, ontologies)
- term lists (like authority files)
more about ontologies in a moment
knowledge organization systems
4 types of functions for KOS:
- description and labeling (e.g. subject indexing with a thesaurus)
- definition (e.g. specification of the meaning of concepts in a thesaurus or ontology)
- translation (e.g. concordance between systems for interoperability)
- navigation (through the systematic structure of a taxonomy or classification, or the hierarchy of concepts in a thesaurus or ontology)
some of these play a role in the semantic web
ontologies
"knowledge representation" in which knowledge about (a small part of) the world is stored
- mostly not directly used for subject indexing
- allows more complete and complex representations of reality than a thesaurus
  - with many possible types of relations between concepts
  - with fixed roles and properties of these concepts
- often for limited domains ("wine ontology")
- sometimes broader, in so-called "core ontologies"; for example CIDOC-CRM (conceptual reference model) for concepts, relations and properties in the field of cultural heritage
relations between some concepts  in a simple "wine ontology"
example of the relations  between concepts about the statue of Balzac by  Rodin  [in CIDOC-CRM]
semantic web
ontologies
"ontologies" in relation to the semantic web
- in a more general sense: a general name for all kinds of systems for subject indexing (thesauri, classifications, taxonomies, name authority lists, ...)
- essential requirement: an ontology must be available in a form that can be read, interpreted and processed by a computer program -> needs notations and formal languages to describe it
ontology notation for semantic web
- RDF (resource description framework): standard to describe relations between an object and its metadata
- OWL (web ontology language): standard for computer-readable description of ontologies
- RDFS (RDF Schema): standard for description of a KOS in RDF
- SKOS (simple knowledge organization system): standard for describing KOSs and relations between them in RDF
resource description framework
- RDF (commonly serialized as XML) describes the relation between a resource (or object), its metadata and the metadata standards used
- resources should have a URI to refer to them
- RDF uses "namespaces" to refer to computer-readable descriptions of the standards (linked via URL)
- RDF is meant to (re)use and to combine existing semantic systems
- properties (metadata) are registered in so-called triples: subject <predicate> object (which we could perhaps also write as: thing <property> value)
- RDF triples are used in "linked data"
rdf triples: subject <predicate> object
  doc1 <has author> auth1
  auth1 <has name> john smith
  auth1 <has affiliation> home inc.
  auth1 <has email> [email_address]
[graphical representation of a simple network of 4 RDF triples]
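The triple idea is easy to try out without any RDF tooling. The sketch below stores three of the triples from this slide as plain Python tuples and follows the links between them; the function names `match` and `author_names` are made up for this illustration, and real RDF would use URIs rather than short labels.

```python
# Three of the slide's triples as (subject, predicate, object) tuples
triples = [
    ("doc1", "has author", "auth1"),
    ("auth1", "has name", "john smith"),
    ("auth1", "has affiliation", "home inc."),
]


def match(pattern, store):
    """Return the triples that fit (s, p, o); None acts as a wildcard."""
    return [
        t for t in store
        if all(want is None or want == got for want, got in zip(pattern, t))
    ]


def author_names(doc, store):
    """Follow doc -> author -> name across two triples."""
    names = []
    for _, _, author in match((doc, "has author", None), store):
        names += [o for _, _, o in match((author, "has name", None), store)]
    return names


print(author_names("doc1", triples))
print(match(("auth1", None, None), triples))
```

Because the object of one triple (auth1) is the subject of others, separate statements join up into a network, which is what makes combining data from different sources possible.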
SKOS representation of thesaurus
term & relations can be described in RDF
  Term: Economic cooperation
  Used For: Economic co-operation
  Broader terms: Economic policy
  Narrower terms: Economic integration, European economic cooperation, European industrial cooperation, Industrial cooperation
  Related terms: Interdependence
  Scope Note: Includes cooperative measures in banking, trade, industry etc., between and among countries.
SKOS representation in RDF
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept>
    <skos:prefLabel>Economic cooperation</skos:prefLabel>
    <skos:altLabel>Economic co-operation</skos:altLabel>
    <skos:scopeNote>Includes cooperative measures in banking, trade, industry etc., between and among countries.</skos:scopeNote>
    <skos:broader>
      <skos:Concept>
        <skos:prefLabel>Economic policy</skos:prefLabel>
      </skos:Concept>
    </skos:broader>
    <skos:related>
      <skos:Concept>
        <skos:prefLabel>Interdependence</skos:prefLabel>
      </skos:Concept>
    </skos:related>
    <skos:narrower>
      <skos:Concept>
        <skos:prefLabel>Economic integration</skos:prefLabel>
      </skos:Concept>
    </skos:narrower>
    <!-- ...more narrower terms omitted ... -->
  </skos:Concept>
</rdf:RDF>
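To show what "computer-readable" buys us, here is a small sketch that reads a shortened copy of this record with Python's standard XML parser. It assumes the standard W3C namespace URIs for RDF and SKOS and treats the record as ordinary XML; a real application would more likely use a dedicated RDF library. The helper names `label` and `related_labels` are invented for the example.

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"

# Shortened copy of the record on the slide
RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept>
    <skos:prefLabel>Economic cooperation</skos:prefLabel>
    <skos:altLabel>Economic co-operation</skos:altLabel>
    <skos:broader>
      <skos:Concept><skos:prefLabel>Economic policy</skos:prefLabel></skos:Concept>
    </skos:broader>
    <skos:narrower>
      <skos:Concept><skos:prefLabel>Economic integration</skos:prefLabel></skos:Concept>
    </skos:narrower>
  </skos:Concept>
</rdf:RDF>"""


def label(concept):
    """Preferred label of a skos:Concept element."""
    return concept.findtext(f"{{{SKOS}}}prefLabel")


def related_labels(concept, relation):
    """Labels of the concepts nested under skos:<relation> of `concept`."""
    path = f"{{{SKOS}}}{relation}/{{{SKOS}}}Concept"
    return [label(c) for c in concept.findall(path)]


root = ET.fromstring(RDF_XML)
top = root.find(f"{{{SKOS}}}Concept")
print(label(top))
print("broader:", related_labels(top, "broader"))
print("narrower:", related_labels(top, "narrower"))
```

The same few lines work for any thesaurus published this way, because the meaning of prefLabel, broader and narrower is fixed by the SKOS standard and not by the individual thesaurus.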
RDF and "linked data"
a lot of buzz recently about "linked (open) data"
- it's just RDF triples, so it's computer-readable
- it's on the internet, so it's open
- it's meant to be re-used, so it's an important ingredient for the semantic web
- it's standardized, so it can be re-used
- everybody can (and has to) contribute data, so it is also somewhat messy
the "linked data cloud" - September 2010 - 24 billion RDF triples online
[labels in the linked data cloud diagram]
- viaf: virtual international authority file
- dbpedia: data from Wikipedia
- last.fm: artists
- geonames: 6.2 M toponyms
- BBC: wildlife finder
- LCSH
- Reuters: openCalais
- IMDB
topic maps
XML-based information systems
- that can be considered as ontologies
- that need no additional notations and/or standards to make them computer-readable
- that combine knowledge representations and the indexed information in a single self-contained, interlinked system
- suited to make local knowledge accessible
topic maps consist of:
concepts (= topics) that are characterized by
- "names" (any word, or even several words, that describes them) (names are topics themselves as well!)
- "types" (describing to which class of concepts a topic belongs) (types are topics themselves as well!)
- "associations" (specified types of relations between topics) (associations are also topics, thus having types!)
- "occurrences" (information items "about" the concept-topic) (occurrences are also topics, thus having types!)
all of this described in XML
simple example of an opera topic map (adapted from Pepper)
- topic types: composer, opera, city, country
- topics and their names: verdi; puccini; tosca; madame butterfly / madama butterfly; lucca; roma / rome; italy / italia / italië / italien
- association types: situated in, influenced, composed, location for, place of birth
- occurrences
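The structure of this example can be mimicked with ordinary data structures. The sketch below is a strong simplification: in a real topic map the types and associations are topics themselves and everything is stored in XML, whereas here they are plain strings. The links between the topics are an assumption, reconstructed from the labels on the slide, since the diagram itself is not preserved in this text.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    topic_type: str
    names: list
    occurrences: list = field(default_factory=list)  # e.g. links to documents


topics = {
    "puccini": Topic("composer", ["Puccini"]),
    "verdi": Topic("composer", ["Verdi"]),
    "tosca": Topic("opera", ["Tosca"]),
    "madama-butterfly": Topic("opera", ["Madama Butterfly", "Madame Butterfly"]),
    "lucca": Topic("city", ["Lucca"]),
    "rome": Topic("city", ["Rome", "Roma"]),
    "italy": Topic("country", ["Italy", "Italia", "Italië", "Italien"]),
}

# (topic, association type, topic); the concrete links are assumed
associations = [
    ("puccini", "composed", "tosca"),
    ("puccini", "composed", "madama-butterfly"),
    ("lucca", "place of birth", "puccini"),
    ("rome", "location for", "tosca"),
    ("lucca", "situated in", "italy"),
    ("rome", "situated in", "italy"),
    ("verdi", "influenced", "puccini"),
]


def find_topic(name):
    """Look a topic up by any of its names, in any language."""
    for topic_id, topic in topics.items():
        if name.lower() in (n.lower() for n in topic.names):
            return topic_id
    return None


def associated(topic_id, assoc_type):
    """Topics reached from topic_id through one type of association."""
    return [b for a, t, b in associations if a == topic_id and t == assoc_type]


print(find_topic("Italië"))
print(associated(find_topic("Puccini"), "composed"))
```

Multiple names per topic give language-independent access, and typed associations let a user navigate from a composer to operas to the cities where they are set.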
topic map application (© Antony Pitts, Kal Ahmed, MusicDNA)
- the Royal Academy of Music in London developed a model to describe "everything" around music, from work/composition to the experience of a particular performance
- conceptually similar to the relational FRBR model in the library world
©  Antony Pitts, Kal Ahmed, MusicDNA
semantic web
ultimate application of interoperability
using a combination of methods and standards for storing, structuring, filling, formalizing, describing and interpreting metadata:
- RDF(S)
- ontologies (as well as thesauri, taxonomies, semantic networks, …)
- formal languages (like SKOS and OWL)
- annotation of resources/objects (= subject indexing)
so that computers will be able to interpret meaning and to combine knowledge from separate systems
rdf annotation of web resource (© Guus Schreiber, UvA / VU)
iconclass annotation
"species ontology" (© Guus Schreiber, UvA / VU)
the semantic web (and interoperability) still require a lot of subject indexing, but with smart systems that:
- (help to) index dumb documents
- can infer meaning
- can match heterogeneous metadata
- can improve dumb searches
even a monkey may find correct information, even information he didn't know he was looking for

A pair of shoes in the thesaurus; some reflexions on human and computer indexing

  • 1. Eric Sieverts Media, information & communication Amsterdam University of Applied Sciences / Section Innovation & Development University Library Utrecht A pair of shoes in the thesaurus reflexions on human and computer indexing Society of Indexers Conference 2010 The challenging future of indexing 30 September 2010, Middelburg
  • 2. agenda the holy grail for search systems: let people find what they search searching in the world of Google what's wrong with Google (and alikes) metadata and indexing indexing and knowledge organization knowledge organization and the semantic web Eric Sieverts | e.g.sieverts@uu.nl | e.g.sieverts@hva.nl | http://www.library.uu.nl/medew/it/eric | Middelburg 30-9-2010
  • 3. searching in the world of Google appears to be &quot;the measure of all things&quot; in search: with Google &quot;everything can be found&quot; Eric Sieverts | e.g.sieverts@uu.nl | e.g.sieverts@hva.nl | http://www.library.uu.nl/medew/it/eric | Middelburg 30-9-2010
  • 4. searching in the world of Google appears to be &quot;the measure of all things&quot; in search: with Google &quot;everything can be found&quot; but isn't there a paradox ? if Google (or Yahoo! or Bing) contains everything (> 500.000.000.000 items) can &quot;it&quot; still be found ? >> anticipation of user's intentions & peerless ranking algorithms become increasingly important Eric Sieverts | e.g.sieverts@uu.nl | e.g.sieverts@hva.nl | http://www.library.uu.nl/medew/it/eric | Middelburg 30-9-2010
  • 5. search, search, search, search, search, ...... searcher / query documents match the basic search-and-find paradigm Eric Sieverts | e.g.sieverts@uu.nl | e.g.sieverts@hva.nl | http://www.library.uu.nl/medew/it/eric | Middelburg 30-9-2010
  • 6. search, search, search, ...... validity for free-text matching ? match Eric Sieverts | e.g.sieverts@uu.nl | e.g.sieverts@hva.nl | http://www.library.uu.nl/medew/it/eric | Middelburg 30-9-2010 (paraphrasing a Dutch poetry title &quot;Lees maar er staat niet wat er staat&quot;) &quot;just read; it does not mean what you're reading&quot; How does Google know what you mean? How does Google know what a document means?
  • 7. filename: thesaurus.jpg is this meant to be representative for the ease of use of thesauri? to what query is this Google's answer ?
  • 8. Want to know something about &quot; hallenkerken &quot; (Dutch for &quot;hall church&quot;) thru Google Books? Google's first hit is a book about building thesauri, containing the word in a single example of broader and narrower terms
  • 9. searching in the world of The new Google Instant tries to predict user intent (the holy grail for search engine developers) after typing 1 or 2 letters it already presents results for statistically most probable (longer) words but is Google really guessing right? Eric Sieverts | e.g.sieverts@uu.nl | e.g.sieverts@hva.nl | http://www.library.uu.nl/medew/it/eric | Middelburg 30-9-2010
  • 10. match classical situation with controlled human indexing searcher must enter the &quot;term(s)&quot; that have been used to characterize the subject indexer must assign “correct” terms to characterize the document in principle perfect match is possible Eric Sieverts | e.g.sieverts@uu.nl | e.g.sieverts@hva.nl | http://www.library.uu.nl/medew/it/eric | Middelburg 30-9-2010 search, search, search, ......
  • 11. match not user-friendly: searcher has to invent the correct terms expensive: indexers must analyze the document in order to assign the correct terms however Eric Sieverts | e.g.sieverts@uu.nl | e.g.sieverts@hva.nl | http://www.library.uu.nl/medew/it/eric | Middelburg 30-9-2010 search, search, search, ...... classical situation with controlled human indexing
  • 12. search in the world of searcher just types some words (or often only one single word) search system contains (all) the words from the documents themselves often you don't find all you need - still satisfied ? match Eric Sieverts | e.g.sieverts@uu.nl | e.g.sieverts@hva.nl | http://www.library.uu.nl/medew/it/eric | Middelburg 30-9-2010 search, search, search, ......
  • 13. why still user satisfaction ? despite recall and precision problems: search system looks attractively simple searcher always finds something (in 500 billion web pages) smart relevance ranking, providing some relevant items among first 10 for most (simple) questions, for majority of users, very often even #1 already and: who cares about lousy recall & precision (in the Google -world)? Eric Sieverts | e.g.sieverts@uu.nl | e.g.sieverts@hva.nl | http://www.library.uu.nl/medew/it/eric | Middelburg 30-9-2010
  • 14. language technology at searcher side original simple query expanded & disambiguated statistics generate additional terms to refine queries search system contains just the words from the documents themselves improved queries will result in better answers ? match Eric Sieverts | e.g.sieverts@uu.nl | e.g.sieverts@hva.nl | http://www.library.uu.nl/medew/it/eric | Middelburg 30-9-2010 search, search, search, ......
  • 15. language technology for better &quot;query&quot; &quot;word stemming&quot; and &quot;fuzzy search&quot; : automatically search for more wordforms >> better recall semantic network (or ontology) contains semantic relations between words : query expanded with semantically related terms >> better recall for different meanings of a word, a semantic network (or ontology) contains relations with different words >> disambiguation >> better precision no scientific evidence yet about how much improvement Eric Sieverts | e.g.sieverts@uu.nl | e.g.sieverts@hva.nl | http://www.library.uu.nl/medew/it/eric | Middelburg 30-9-2010
  • 16. language technology for better &quot;query&quot; statistical analysis of search result generates characteristic terms, from which user can choose to refine its query such words can also be derived from a synonym list, thesaurus, semantic network et cetera mostly >> better precision Eric Sieverts | e.g.sieverts@uu.nl | e.g.sieverts@hva.nl | http://www.library.uu.nl/medew/it/eric | Middelburg 30-9-2010
  • 17. language technology at the document search with &quot;correct&quot; or “important” terms language technology enriches document with &quot;correct&quot; term (from thesaurus) or derives characteristic terms from the text in principle perfect match is possible match Eric Sieverts | e.g.sieverts@uu.nl | e.g.sieverts@hva.nl | http://www.library.uu.nl/medew/it/eric | Middelburg 30-9-2010 search, search, search, ......
  • 19. automatic classification or enrichment
      • 1. deriving specific terms from the document itself: on the basis of word lists and text analysis, specific types of terms (e.g. names of persons, places, products, parties, companies, etc.) can be recognized and marked as such
      • 2. adding characteristics to classify a document: after it has been trained, a system can analyze documents and classify them with terms from a thesaurus or with classes from a taxonomy
      • despite some limitations it's getting better all the time, even for less tangible tasks such as sentiment analysis
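The first approach, recognizing specific types of terms on the basis of word lists, can be sketched in a few lines of Python. The `GAZETTEER` below is a hypothetical word list made up for this illustration; production systems combine much larger lists with linguistic analysis.

```python
import re

# Hypothetical word lists, one per entity type.
GAZETTEER = {
    "person": ["Eric Sieverts"],
    "place": ["Middelburg", "Utrecht"],
    "company": ["Reuters"],
}


def mark_entities(text):
    """Return (name, entity type) pairs in the order they appear in the text."""
    found = []
    for entity_type, names in GAZETTEER.items():
        for name in names:
            # Word boundaries prevent matches inside longer words.
            pattern = r"\b" + re.escape(name) + r"\b"
            for match in re.finditer(pattern, text):
                found.append((match.start(), name, entity_type))
    return [(name, entity_type) for _, name, entity_type in sorted(found)]
```

A pure list lookup like this cannot tell apart two entities that share a name, which is one of the limitations the slide alludes to.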
  • 20. The Calais Web Service automatically creates rich semantic metadata: Named Entities, Facts, Events
  • 22. training a system (diagram: Joop van Gent, Irion); components: thesaurus, training documents, analysis module, "fingerprints", training module, enrichment of thesaurus
  • 23. classification with system (diagram: Joop van Gent, Irion); components: enriched thesaurus, new documents, analysis module, "fingerprints", classification module, enriched documents
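The two diagrams describe a train-then-classify pipeline built around document "fingerprints". The Python sketch below is one possible reading of that idea, not the Irion system itself: it assumes a fingerprint is simply a bag of words, that the "enriched thesaurus" stores one merged fingerprint per thesaurus term, and that classification picks the term with the highest cosine similarity.

```python
import math
from collections import Counter


def fingerprint(text):
    """Analysis module (assumed): reduce a text to word counts."""
    return Counter(text.lower().split())


def train(labelled_docs):
    """Training module: merge fingerprints of training documents per term."""
    enriched_thesaurus = {}
    for term, text in labelled_docs:
        enriched_thesaurus.setdefault(term, Counter()).update(fingerprint(text))
    return enriched_thesaurus


def cosine(a, b):
    """Cosine similarity between two word-count fingerprints."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, enriched_thesaurus):
    """Classification module: return the best-matching thesaurus term."""
    document_print = fingerprint(text)
    return max(
        enriched_thesaurus,
        key=lambda term: cosine(document_print, enriched_thesaurus[term]),
    )
```

Trained on one chess text and one equestrian text, such a classifier would assign a new text about knights and bishops to "chess" because of the surrounding vocabulary, even though the word for the piece itself is ambiguous.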
  • 24. endgame tips: checkmate with bishop and knight (in Dutch: "horse"): chess / equestrianism
  • 25. knowledge organization systems; metadata: more than keywords or thesauri?
  • 26. knowledge organization systems
      • can be more than just metadata models or tools for subject indexing
      • 4 types of KOS:
          • categorization systems (like classifications and taxonomies)
          • metadata models (like MARC or Dublin Core)
          • relational models (like thesauri, semantic networks, ontologies)
          • term lists (like authority files)
      • more about ontologies in a moment
  • 27. knowledge organization systems: 4 types of functions for KOS
      • description and labeling (e.g. subject indexing with a thesaurus)
      • definition (e.g. specification of the meaning of concepts in a thesaurus or ontology)
      • translation (e.g. concordance between systems for interoperability)
      • navigation (through the systematic structure of a taxonomy or classification, or the hierarchy of concepts in a thesaurus or ontology)
      • some of these play a role in the semantic web
  • 28. ontologies
      • "knowledge representation" in which knowledge about (a small part of) the world is stored
      • mostly not directly used for subject indexing
      • allows more complete and complex representations of reality than a thesaurus
          • with many possible types of relations between concepts
          • with fixed roles and properties of these concepts
      • often for limited domains ("wine ontology")
      • sometimes broader, in so-called "core ontologies"; for example CIDOC-CRM (conceptual reference model) for concepts, relations and properties in the field of cultural heritage
  • 29. relations between some concepts in a simple "wine ontology"
  • 30. example of the relations between concepts about the statue of Balzac by Rodin [in CIDOC-CRM]
  • 32. "ontologies" in relation to the semantic web
      • in a more general connotation: general name for all kinds of subject indexing systems (thesauri, classifications, taxonomies, name authority lists, .....)
      • essential requirement: the ontology must be available in a form that can be read, interpreted and processed by a computer program -> needs notations and formal languages to describe it
  • 33. ontology notation for the semantic web
      • RDF (Resource Description Framework): standard to describe relations between an object and its metadata
      • OWL (Web Ontology Language): standard for computer-readable description of ontologies
      • RDFS (RDF Schema): standard for description of a KOS in RDF
      • SKOS (Simple Knowledge Organization System): standard for describing KOSs and relations between them in RDF
  • 34. resource description framework
      • RDF uses XML to describe the relation between a resource (or object), its metadata and the metadata standards used
      • resources should have a URI to refer to them
      • RDF uses "namespaces" to refer to computer-readable descriptions of the standards (link via URL)
      • RDF is meant to (re)use and to combine existing semantic systems
      • properties (metadata) are registered in so-called triples: subject <predicate> object (which we could perhaps also write: thing <property> value)
      • RDF triples are used in "linked data"
  • 35. RDF triples: subject <predicate> object
      • doc1 <has author> auth1
      • auth1 <has name> john smith
      • auth1 <has affiliation> home inc.
      • auth1 <has email> [email_address]
      • (figure: graphical representation of a simple network of 4 RDF triples)
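To make the triple idea concrete, the small network from this slide can be held as plain Python tuples and queried by pattern. This is only a sketch: it does not use an RDF library, real RDF identifies subjects and predicates with URIs rather than short strings, and the e-mail triple is left out because the address is elided in the slide.

```python
# Three of the four triples from the slide, as (subject, predicate, object).
TRIPLES = [
    ("doc1", "has author", "auth1"),
    ("auth1", "has name", "john smith"),
    ("auth1", "has affiliation", "home inc."),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def author_names(triples, document):
    """Follow doc -> author -> name, i.e. walk two edges of the graph."""
    names = []
    for _, _, author in match(triples, subject=document, predicate="has author"):
        for _, _, name in match(triples, subject=author, predicate="has name"):
            names.append(name)
    return names
```

Because the object of one triple (auth1) is the subject of others, separate statements link up into a graph that can be traversed, which is what makes triples suitable for combining data from different sources.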
  • 36. SKOS: the representation of a thesaurus term & its relations can be described in RDF
      • Term: Economic cooperation
      • Used For: Economic co-operation
      • Broader terms: Economic policy
      • Narrower terms: Economic integration, European economic cooperation, European industrial cooperation, Industrial cooperation
      • Related terms: Interdependence
      • Scope Note: Includes cooperative measures in banking, trade, industry etc., between and among countries.
  • 37. SKOS representation in RDF
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:skos="http://www.w3.org/2004/02/skos/core#">
        <skos:Concept>
          <skos:prefLabel>Economic cooperation</skos:prefLabel>
          <skos:altLabel>Economic co-operation</skos:altLabel>
          <skos:scopeNote>Includes cooperative measures in banking, trade, industry etc., between and among countries.</skos:scopeNote>
          <skos:broader>
            <skos:Concept>
              <skos:prefLabel>Economic policy</skos:prefLabel>
            </skos:Concept>
          </skos:broader>
          <skos:related>
            <skos:Concept>
              <skos:prefLabel>Interdependence</skos:prefLabel>
            </skos:Concept>
          </skos:related>
          <skos:narrower>
            <skos:Concept>
              <skos:prefLabel>Economic integration</skos:prefLabel>
            </skos:Concept>
          </skos:narrower>
          <!-- ...more narrower terms omitted ... -->
        </skos:Concept>
      </rdf:RDF>
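Because the SKOS description is ordinary XML, a program can read the thesaurus relations back out of it. The sketch below uses only Python's standard `xml.etree.ElementTree` on a shortened copy of the slide's example; a dedicated RDF library would be the more robust choice for real data, and the `read_concept` helper is a name invented for this illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

# Shortened version of the example from the slide.
SKOS_XML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept>
    <skos:prefLabel>Economic cooperation</skos:prefLabel>
    <skos:altLabel>Economic co-operation</skos:altLabel>
    <skos:broader>
      <skos:Concept><skos:prefLabel>Economic policy</skos:prefLabel></skos:Concept>
    </skos:broader>
    <skos:narrower>
      <skos:Concept><skos:prefLabel>Economic integration</skos:prefLabel></skos:Concept>
    </skos:narrower>
  </skos:Concept>
</rdf:RDF>
""".strip()


def read_concept(xml_text):
    """Extract the labels and relations of the first top-level concept."""
    root = ET.fromstring(xml_text)
    concept = root.find("skos:Concept", NS)

    def labels(relation):
        path = "skos:" + relation + "/skos:Concept/skos:prefLabel"
        return [element.text for element in concept.findall(path, NS)]

    return {
        "prefLabel": concept.findtext("skos:prefLabel", namespaces=NS),
        "altLabel": concept.findtext("skos:altLabel", namespaces=NS),
        "broader": labels("broader"),
        "narrower": labels("narrower"),
    }
```

The resulting dictionary mirrors the familiar thesaurus record (preferred term, used-for, broader, narrower), which shows that nothing is lost in the move to a machine-readable notation.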
  • 38. RDF and "linked data"
      • a lot of buzz recently about "linked (open) data"
      • it's just RDF triples, so it's computer readable
      • it's on the internet, so it's open
      • it's meant to be re-used, so it's an important ingredient for the semantic web
      • it's standardized, so it can be re-used
      • everybody can (and has to) contribute data, so it is also somewhat messy
  • 39. the "linked data cloud", September 2010: 24 billion RDF triples online
  • 40. viaf: virtual international authority file; dbpedia: data from Wikipedia; last.fm: artists; geonames: 6.2 M toponyms; BBC: wildlife finder; LCSH; Reuters: openCalais; IMDB
  • 41. topic maps: XML-based information systems
      • that can be considered as ontologies
      • that need no additional notations and/or standards to make them computer-readable
      • that combine knowledge representations and the indexed information in a single self-contained, interlinked system
      • suited to make local knowledge accessible
  • 42. topic maps consist of concepts (= topics) that are characterized with:
      • "names" (can be any word, even multiple, to describe them) (names are topics themselves as well!)
      • "types" (describing to what class of concepts it belongs) (types are topics themselves as well!)
      • "associations" (specified types of relations between topics) (associations are also topics, thus having types!)
      • "occurrences" (information items "about" the concept-topic) (occurrences are also topics, thus having types!)
      • all of this described in XML
  • 43. simple example of opera topic map, adopted from Pepper (diagram labels)
      • topics and their names: verdi; puccini; lucca; italy / italia / italië / italien; tosca; madame-butterfly / madama-butterfly; roma / rome
      • topic types: composer, opera, city, country
      • association types: situated in, influenced, composed, location for, place of birth
      • occurrences
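The building blocks of a topic map can be mimicked with a few Python data classes. This is a rough sketch rather than the XML topic map syntax: the class and field names are invented here, and the three associations are a plausible reading of the diagram labels, not a transcription of the original figure.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    names: list                      # a topic may carry several names
    type_id: str = None              # id of another topic acting as its type
    occurrences: list = field(default_factory=list)  # information items about it


@dataclass
class Association:
    type_id: str                     # association types are topics as well
    source: str
    target: str


topics = {
    # topic types and association types, themselves stored as topics
    "composer": Topic(["composer"]),
    "opera": Topic(["opera"]),
    "city": Topic(["city"]),
    "country": Topic(["country"]),
    "composed": Topic(["composed"]),
    "place of birth": Topic(["place of birth"]),
    "situated in": Topic(["situated in"]),
    # the concepts themselves
    "puccini": Topic(["puccini"], "composer"),
    "tosca": Topic(["tosca"], "opera"),
    "lucca": Topic(["lucca"], "city"),
    "italy": Topic(["italy", "italia", "italië", "italien"], "country"),
}

# Assumed pairings, read from the association labels in the diagram.
associations = [
    Association("composed", "puccini", "tosca"),
    Association("place of birth", "puccini", "lucca"),
    Association("situated in", "lucca", "italy"),
]


def related(topic_id, association_type):
    """Return the targets linked from a topic by one association type."""
    return [
        association.target
        for association in associations
        if association.source == topic_id
        and association.type_id == association_type
    ]
```

Storing the types in the same dictionary as the concepts reflects the point made on the previous slide: types and associations are topics themselves, so they can be named and described like anything else.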
  • 44. topic map application (© Antony Pitts, Kal Ahmed, MusicDNA)
      • the Royal Academy of Music in London developed a model to describe "everything" around music, from work/composition to the experience of a particular performance
      • conceptually similar to the relational FRBR model in the library world
  • 45. © Antony Pitts, Kal Ahmed, MusicDNA
  • 46. semantic web
      • ultimate application of interoperability
      • using a combination of methods and standards for storing, structuring, filling, formalizing, describing and interpreting metadata:
          • RDF(S)
          • ontologies (as well as thesauri, taxonomies, semantic networks, …)
          • formal languages (like SKOS and OWL)
          • annotation of resources/objects (= subject indexing)
      • so that computers will be able to interpret meaning and to combine knowledge from separate systems
  • 47. RDF annotation of a web resource (© Guus Schreiber, UvA / VU)
  • 49. "species ontology" (© Guus Schreiber, UvA / VU)
  • 50. the semantic web (and interoperability) still require a lot of subject indexing, but with smart systems that:
      • (help to) index dumb documents
      • can infer meaning
      • can match heterogeneous metadata
      • can improve dumb searches
      • even a monkey may find correct information, even information he didn't know he was looking for
      • (diagram: the query "search, search, search, search, search, ......" is matched against the documents)
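A small illustration of how a system "can infer meaning" to improve a dumb search: if the thesaurus hierarchy is available to the program, a search on a broad term can also retrieve documents indexed with narrower terms. The Python sketch below reuses the broader/narrower relations from the "Economic cooperation" example earlier in the deck; the document identifiers and function names are made up for illustration.

```python
# Broader-term relations taken from the thesaurus example (slide 36).
BROADER = {
    "Economic integration": "Economic cooperation",
    "Industrial cooperation": "Economic cooperation",
    "Economic cooperation": "Economic policy",
}


def ancestors(term):
    """Follow the broader-term chain upwards from a term."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


def search(query_term, indexed_docs):
    """Find documents indexed with the query term or any narrower term."""
    return [
        doc_id
        for doc_id, terms in indexed_docs.items()
        if any(
            term == query_term or query_term in ancestors(term)
            for term in terms
        )
    ]
```

A searcher who asks only for "Economic policy" would then also get documents indexed under "Economic integration", without having to know that the narrower term exists.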