UB Utrecht                  HvA-MIC                     GO Opleidingen




      searching the internet
better with Google / Google not always best


                      Eric Sieverts
                         @sieverts

                                               CODARTS, 04-03-2013
agenda


    • searching the web
    • smart searching
    • google options
    • beyond google
    • beyond general web search


         for all links see: http://sieverts.pbworks.com/codarts


the general agenda

    general web search
      – how to …
      – the web ?=? everything
    specific material search
      – how to …
      – importance of specific material types?
    when & why
an ever changing google landscape




            •   unreliable numbers
            •   irreproducible results
            •   disappearing functions
            •   changing interfaces

building block approach

    systematic searching in structured information systems (like JStor etc.)
      start analytically with the so-called building block approach
      e.g.: subject "modern american composers"
         – break it down into 3 facets
         – collect keywords for each facet
         – combine keywords with OR and AND operators

        facet 1: modern        facet 2: american      facet 3: composers
        modern                 american               composer
        contemporary           america                composers
        20th century           usa                    songwriters
        twentieth century      united states          …
        …                      …

        (OR between the terms within a facet, AND between the facets)
    this yields the query:
    (modern OR contemporary OR "twentieth century" OR "20th century")
       AND (america OR american OR usa OR "united states")
       AND (composer OR composers OR songwriter OR songwriters)
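
The building block approach can be sketched in a few lines of Python. This is only an illustration of the OR-within-facet, AND-between-facets rule; the function name build_boolean_query and the list layout are assumptions made for this sketch, not part of any search system.

```python
def build_boolean_query(facets):
    """Combine keyword facets: OR within a facet, AND between facets."""
    groups = []
    for keywords in facets:
        # Quote multi-word keywords so they are searched as exact phrases.
        terms = [f'"{k}"' if " " in k else k for k in keywords]
        groups.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)


# The three facets of the subject "modern american composers".
facets = [
    ["modern", "contemporary", "twentieth century", "20th century"],
    ["america", "american", "usa", "united states"],
    ["composer", "composers", "songwriter", "songwriters"],
]

query = build_boolean_query(facets)
print(query)
```

Adding a synonym to a facet then only means adding one item to the corresponding list.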
building block approach

    also with Google ?
    web search engines are not specifically designed for such structured
    queries, but they can handle them

    Google and Yahoo make it even easier, since you may omit the parentheses
    and the AND operator (AND is the default):

    modern OR contemporary OR "twentieth century" OR "20th century" america
    OR american OR usa OR "united states" composer OR composers OR
    songwriter OR songwriters

    [the space between "20th century" and america, and between
     "united states" and composer, acts as an implied AND]

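
A minimal sketch of the same idea for the Google-style notation, assuming the behaviour described on the slide (no parentheses, a space as implied AND). The helper names google_query and google_search_url are hypothetical; the only outside piece is the ordinary Google search address with a q parameter.

```python
from urllib.parse import urlencode


def google_query(facets):
    """OR within a facet; facets are separated by a space (implied AND)."""
    groups = []
    for keywords in facets:
        # Quote multi-word keywords so they are searched as exact phrases.
        terms = [f'"{k}"' if " " in k else k for k in keywords]
        groups.append(" OR ".join(terms))
    return " ".join(groups)


def google_search_url(query):
    """Put the query into a search URL, percent-encoding it where needed."""
    return "https://www.google.com/search?" + urlencode({"q": query})


facets = [
    ["modern", "contemporary", "twentieth century", "20th century"],
    ["america", "american", "usa", "united states"],
    ["composer", "composers", "songwriter", "songwriters"],
]

flat_query = google_query(facets)
print(flat_query)
print(google_search_url(flat_query))
```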
relevance ranking (1)

    Google (and other web search engines) focus primarily
    on presenting search results in order of relevance
    how do they know what is relevant?
     – they interpret the importance of words for the subject matter of
       the retrieved documents
       (are your search terms present in title, url, headings, ... ?)
         • you can enhance the importance of a certain term in your
           query by repeating that word a couple of times
     – they estimate the importance of the relation between words in
       the retrieved documents, i.e. whether
        • your search words occur close together
        • your search words occur in the same order as you entered them
          >> formulate your query the way you expect the answer to be formulated
          >> word order matters
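
To make the proximity and word-order idea concrete, here is a toy scoring function. It is an assumption-laden sketch and certainly not how Google computes relevance: it only looks at the first occurrence of each word, and the factor 2.0 for matching word order is an arbitrary choice.

```python
def proximity_score(query, document):
    """Toy relevance score: higher when the query words occur close
    together and in the same order as they were entered."""
    terms = query.lower().split()
    words = document.lower().split()
    positions = []
    for term in terms:
        if term not in words:
            return 0.0  # a query word is missing: no match at all
        positions.append(words.index(term))  # first occurrence only
    # Number of document words spanned by the matched query words.
    span = max(positions) - min(positions) + 1
    closeness = len(terms) / span
    # Arbitrary bonus when the words appear in the order of the query.
    in_order = positions == sorted(positions)
    return closeness * (2.0 if in_order else 1.0)


query = "modern american composers"
doc_a = "modern american composers of film music"
doc_b = "composers who left the american midwest for a modern city"

print(proximity_score(query, doc_a))
print(proximity_score(query, doc_b))
```

In this sketch doc_a scores higher than doc_b, because its words are adjacent and in query order.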
relevance ranking (2)

     Google (and other web search engines) are primarily
     focused on presenting search results in order of relevance
     how do they know what is relevant?
      – importance or quality of retrieved web pages is deduced from
        the number and the importance of links from other sites
        (for each page a PageRank is calculated)
      – importance of retrieved web pages for your personal interest is
        deduced from your previous search and browse behaviour,
        which is monitored whenever you're logged in

     since every search engine uses somewhat different algorithms for its
     relevance calculations (and their coverage is different as well) there
     tends to be little overlap between the top 10 results from different engines
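
The link-based part of this ranking can be illustrated with a toy PageRank calculation. This is a textbook-style power iteration on a made-up four-page web, not Google's production algorithm; the damping factor 0.85 and the page names are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small base amount, independent of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its rank equally over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical mini-web: a, b and d all link to c; c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page c ends up highest because three pages link to it, and a comes second because it is linked from the important page c: both the number and the importance of links count.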
search terms

     use of proper search terms is crucial for search success
     think of :
      –   singular / plural , verbs / nouns / adjectives , conjugations , ...
      –   spelling variations (behavior / behaviour)
      –   compound terms (writer / songwriter)
      –   synonyms, acronyms (compact disc / compact disk / cd / digital disc)

     how would the answer to my question be formulated in a
     relevant document? "think as if you were the document"
      –   use the right terms
      –   as an "exact phrase" or in the most probable word order
      –   use a wildcard for variable words ("modern * * composers")
      –   use known examples of the items on a list you want to find
      –   choose between popular and scientific terms, etc.
refining searches

 if results are too broad, too diverse
  – add another essential term or set of terms to your query
  – see what your search engine suggests
    while you enter your query
  – exclude an unwanted term with NOT (francis bacon NOT philosopher)
    NB: Google does not understand NOT ; use the minus sign instead:
        francis bacon -philosopher
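
A small sketch of the NOT-to-minus rewrite, for queries copied from a database that uses the NOT operator. The function name not_to_minus is hypothetical, and the sketch assumes NOT is written in capitals and followed by a word or a quoted phrase.

```python
import re


def not_to_minus(query):
    """Rewrite 'NOT term' as '-term', the exclusion notation Google expects."""
    # Only the upper-case operator is rewritten; a lower-case "not" is an
    # ordinary search word and is left alone.
    return re.sub(r'\bNOT\s+(?=["\w])', "-", query)


print(not_to_minus("francis bacon NOT philosopher"))
print(not_to_minus('jaguar NOT car NOT "land rover"'))
```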
nice interactive infographic "how search works"
     http://www.google.com/insidesearch/howsearchworks/thestory/
is Google outsmarting us ?
     Google tries to improve and to broaden your queries
     •   automatic spelling corrections (veilgheid >> veiligheid)
     •   automatic search for words with same word stem (singular/plural,
         verb, conjugation, inflection, …)
     •   expands acronyms (jfk >> john f kennedy | wwii >> world war II)
     •   adds some synonyms (vaccination >> immunization)
     •   transforms separate words to compound term & vice versa
         (veiligheid maatregel >> veiligheidsmaatregel | catfood >> cat food)
     •   may treat a term as optional if it is not differentiating enough
     •   personalisation based on previous search behaviour

     [you are never sure what/when or not; this happens more often and
      more elaborately in English than in Dutch]

     but what if you don't like all of this ........ >> "verbatim"
     "verbatim": searched only literally for the exact words you entered
     on google.nl: "woord voor woord" [Dutch for "word for word"]
some more "how to"


     • domain search:     site:edu OR site:edu.* [for all edu (sub)domains]
                          site:shell.com OR site:philips.com
     • url search:        inurl:novelty
     • title search:      intitle:catalytic
     • filetype search:   filetype:pdf
                          filetype:xls OR filetype:xlsx
                          filetype:doc OR filetype:docx
                          filetype:rss
                          [more types than shown in the advanced search
                           drop-down menu]
     • exact search:      "greenhouses"       [or VERBATIM for all words]

advanced search

     Google hides its advanced search screen:
     you must first perform a simple search
     to get the "cog wheel" that leads to it

some more "how to"

     some of this can be done from the advanced search screen
     but regular search box offers greater flexibility,
     once you know the syntax
     • domain search: [in combination with real search terms]
                         site:codarts.nl
                         site:edu OR site:edu.* [for all edu (sub)domains]
                         site:last.fm OR site:spotify.com
     • url search:       inurl:course
     • title search:     intitle:guitar



some more "how to" (2)

     • filetype search:    filetype:pdf
                           filetype:xls OR filetype:xlsx
                           filetype:doc OR filetype:docx
                           filetype:rss
                           [more types than shown in the advanced search
                            drop-down menu]
     • numeric search:     10..20              [includes all values in between]
                           $10..$20            [not for other currencies]
     • punctuation:        &, %, dot, ...          [can be searched]
                           €, /, ", comma, ...     [are ignored]
     • exact search:       "greenhouses"         [or VERBATIM for all words]
     • synonym search:     ~guitar
     • time limitations:   [after search, hidden in top menu]
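
The operators above can be combined in the regular search box. The following sketch assembles such a query string; the function build_query and its parameter names are hypothetical, and only operators listed on these slides are used.

```python
def build_query(terms="", sites=(), filetypes=(), intitle=None, inurl=None,
                number_range=None):
    """Assemble a search-box query from ordinary terms plus operators."""
    parts = []
    if terms:
        parts.append(terms)
    if sites:
        # Several domains are alternatives, so they are joined with OR.
        parts.append(" OR ".join(f"site:{s}" for s in sites))
    if filetypes:
        parts.append(" OR ".join(f"filetype:{t}" for t in filetypes))
    if intitle:
        parts.append(f"intitle:{intitle}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    if number_range:
        low, high = number_range
        parts.append(f"{low}..{high}")  # numeric range, e.g. 10..20
    # Parts are separated by a space, which acts as an implied AND.
    return " ".join(parts)


print(build_query("guitar course", sites=["codarts.nl"], filetypes=["pdf"]))
print(build_query("jazz", sites=["last.fm", "spotify.com"], intitle="guitar"))
print(build_query("amplifier", number_range=("$10", "$20")))
```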

[screenshots: synonym search; date limitations]
whoever searches for "Bach" is probably more interested
       in data about him than in websites about him, and
       most probably in "J.S." rather than one of his relatives

Google's "Knowledge Graph"
knows 500 million objects
with 3.5 billion properties and
even more mutual relations
(but only in English)
it also interprets the intention of your query (sometimes ;-)

general search engines besides google
 • Bing         microsoft, large
 • Yahoo!       content=Bing, large
 • Blekko       uses hashtags to search more [domain-] selective
                also many predefined hashtags; e.g. /likes for Facebook
 • DuckDuckGo assures privacy, no personalisation, no filter-bubble,
                rather small, !Bang-function offers many extras
 • Gigablast    green search engine, rather small, some unique functions
 • Exalead      french, many advanced functions, primarily demo system
 • Millionshort leaves out results from most popular sites → the long tail
 • WolframAlpha knowledge engine, facts, calculations
 together, these others have 30% market share in US; in NL only 3%
 •   Yandex        in Russia more popular than Google
 •   Baidu         in China more popular than Google
 •   Naver, Daum   in South Korea more popular than Google
 •   Seznam        in Czechia more popular than Google
material type specific search
     science   google scholar, microsoft academic, scirus,
               oaister, scientific commons, science.gov
     reference wikipedia, quora, wolfram|alpha, answers.com
     news     google news, yahoo news, bing news, cnn, bbc
     old news way-back-machine, historische kranten KB
     images google image, yahoo image, bing image, flickr,
                tineye (ip-check), panoramio (geo-search)
     video      google video, youtube, youtube edu channel,
                bing video, blinkx, voxalead-news
     tweets     twitter search, topsy, postpost, snapbird
     social     socialsearcher, socialmention, whostalkin, kurrently
     forums     google groups, omgili, boardtracker
     blogs      google blogs, icerocket, [rss] CTRLQ, RSS SearchHub
scientific search

     books
       –   Google Books (full text search)
       –   Hathitrust Digital Library (open book scan project / part of G-books)
       –   Librarything (catalog of 58,000,000 books from 1,000,000 owners)
       –   GoodReads (reviews, recommendations, friends, ...)
       –   Open Textbook Catalog (open access textbooks)

     journal articles
       –   licensed databases (like JStor, ...)
       –   Google Scholar (articles, dissertations, reports, ...)
       –   sEURch / UvA-library ("discovery" systems of EUR / UvA)
       –   Scirus / SciVerse (journal articles -Elsevier- , database content, webpages)
       –   Magportal (also -English- popular magazines)
       –   DeepDyve (scientific articles "for rent" - for 24 hours)

Google Books

     •   all pages scanned and full-text searchable
     •   important to discover specific subjects/terms - not primary book topic
     •   often limitations on display and browsability
         (no preview / snippet view / limited preview / full preview)
     •   content from publishers and large libraries
     •   problems with viewing copyrighted material also from libraries
     •   build your personal ‘My Library’
     •   Dutch-language books not only from Ghent University (and soon the
         KB), but also from the US/UK
     •   also some ‘magazines’
     •   metadata on the about-this-book page

Google Scholar

     •   > 100 million scientific publications (mostly articles)
     •   differences between availability (and hence searchability) of
         full-text (majority), bibliographic-only, and citation data
     •   competitor of Web of Science, Scopus, Scirus, ...
     •   indexing many selected -even licensed- sources (publishers,
         abstract-databases, university sites, institutional repositories, ...)
     •   includes numbers of citations! [and links to them]
     •   number of citations important factor for relevance ranking
         (!! reason why recent publications get low rankings)
     •   advanced search limited, many mistakes in metadata (authors etc.)
     •   accessibility of full-text often a problem because of licences
     •   often many versions of same article (including sometimes free ones)
     •   coupling with library subscriptions to allow smoother linking
     •   no info about sources, updates etc.
[screenshot of a Google Scholar result, with callouts: open access version;
 number of citations ("if this article is interesting, these 23 more recent
 ones probably are too"); access via the Utrecht University subscription]
facts and reference

     encyclopedias
       – wikipedia
       – internet movie database
       – ...
     Q&A (human powered)
       – Quora
       – Yahoo-answers
     direct answers, facts and calculations
       – Wolfram|Alpha
     dictionaries, translations
       –   answers.com (metasearch)
       –   Roget thesaurus
       –   Bartleby
       –   Google Translate
       –   Google Translated search
       –   Synoniemen.net (Dutch)
wikipedia

     •   >250 languages
     •   “wisdom of the crowds” ?=? “wisdom” for all topics?
     •   quite good for “factual” topics
     •   many detailed specific topics (>20 million entries, >1 million in Dutch)
     •   there are policies & guidelines
         & management: stewards, administrators
     •   to search Wikipedia, use Google rather than the internal search;
         limit to:               site:wikipedia.org
         this gives more complete results
         and searches all language versions together

google's "translated search" is now almost hidden
     it translates the original query (here in English) into the chosen
     languages and translates the results back into English
     ... and pages selected from the result list are translated into
     English too
old stuff : web & news

     •   web archive
          – "way-back machine": old versions of websites, back to 1996
            access thru the -original- url, NO search
            internal site links will mostly work
          – also other archived materials (including music)
     •   historical Dutch newspapers
           – historische kranten KB (1618-1995 ; full-text search)
     •   historical international newspapers
           – British newspapers 1800-1900
           – historic American newspapers
           – international overview
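
Since the way-back machine is reached through the original url rather than through search, a snapshot address can be composed directly. The sketch below assumes the web.archive.org/web/<timestamp>/<url> address pattern of the Internet Archive, which is not stated on the slide; the function name wayback_url is hypothetical.

```python
def wayback_url(original_url, timestamp):
    """Compose a way-back machine address for an original url.

    timestamp is a string of digits such as "1998", "199802" or
    "19980201"; the archive is assumed to pick the nearest snapshot.
    """
    if not (timestamp.isdigit() and 4 <= len(timestamp) <= 14):
        raise ValueError("timestamp must be 4 to 14 digits, e.g. '19980201'")
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


print(wayback_url("http://www.google.com/", "19980201"))
```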



… and the very oldest one from February 1998: [screenshot]
twitter & social search

     twitter search (often limited to messages from past 1 - 2 weeks only)
           – twitter (also advanced search)
           – topsy (best one at the moment, also older messages)
           – postpost (search your own timeline - everything you're following)
           – snapbird (search thru all tweets of particular person -
                        you have to know twittername)
     real time / social search
           – socialsearcher (facebook | twitter | g+ : side by side)
           – socialmention (also weblogs)
           – samepoint, whostalkin, kurrently, … (also weblogs)
     forum discussions
         – omgili, boardtracker, ...
         – Google groups

multimedia search / images

     mostly search by keywords
       – Google-image (simple image recognition)
       – Yahoo-image (also pictures from Flickr)
       – Bing-image
       – Flickr (photo upload-site; search on user tags;
                      filter on “Creative Commons” material)
       – photographs on twitter (twicsy, picfog, topsy, skylines.io, …)
       – special sites (beeldbank nationaal archief, wikimedia commons, ...)

     special techniques:
       – geographical (panoramio [google-maps], worldc.am [instagram], ...)
       – Google (search by example)
       – Tineye (search for -almost- exact copies; e.g. to check for
                 copyright infringement)

image search

     Content based image retrieval (CBIR)
     •   search on colors
          – examples: Tineye, Chromatik, Picitup, Google, ...




image search

 Content based image retrieval
 • search by example

     – draw it yourself
       Retrievr, ...

     – existing image
       Google (visually similar)
       Tineye (almost exact copies)
       Retrievr, ...
       example found on the web or
       uploaded from your own computer



example

google looks for most probable
keywords to describe this image
and in the search box combines
them already with the image




           ... and how about these
           "visually similar images" ?
photoshopped advertisement, but what's the original ?
multimedia search / video
     (mostly) uploaded material
      – YouTube (growth: 70 hours/minute ; also many "how to" videos)
        also: YouTube-channels / YouTube-education / YouTube-teachers /
        YouTube-movies / YouTube-shows / …
      – Vimeo

     (mostly) broadcasted material
      – Blinkx (35 million hours video, speech recognition?)
      – VoxaleadNews (speech recognition in several languages - also NL!
        hence "full-text" search on spoken words)
      – Bing-video (not easy to find from European home page)
      – Google-video (also videos from YouTube; metadata search only)
      – Dutch TV-programs:
          • Uitzending gemist (limited search functionality)
          • Beeld & Geluid (metadata search; use “uitgebreid zoeken”)
          • Academia (selection from Beeld & Geluid for higher education)
the end
     any questions?

More Related Content

PDF
Beyond Google: Advanced Search
PDF
WTF is Semantic Web?
PPTX
Beyond Google: Advanced Internet Search Tips and Tricks
PPTX
Library 2012 presentation
PPT
OLLI Workshop : Beyond The Basics of Google Searching April 2009
DOC
Advanced search made easy
PDF
when the link makes sense
PPT
8th grade research
Beyond Google: Advanced Search
WTF is Semantic Web?
Beyond Google: Advanced Internet Search Tips and Tricks
Library 2012 presentation
OLLI Workshop : Beyond The Basics of Google Searching April 2009
Advanced search made easy
when the link makes sense
8th grade research

What's hot (7)

PPTX
Name That Graph !
PDF
An introduction to Semantic Web and Linked Data
PPT
Pick n Mix: Choosing the right research tool
PPT
Web Search Alert 2006
PDF
"Whatever I can get..."
PPT
Research 2 0
PPT
Queen Mary MA Performance Induction
Name That Graph !
An introduction to Semantic Web and Linked Data
Pick n Mix: Choosing the right research tool
Web Search Alert 2006
"Whatever I can get..."
Research 2 0
Queen Mary MA Performance Induction
Ad

Viewers also liked (11)

PPT
Models of Information Searching
PPTX
CT231: Research & search skills
PPT
The 8-Fold Path to Web Searching Power
PPT
Blossom591 interactivepresentation
PPTX
20110521 eightfold path and meditation2
 
PPT
Information Searching Skills
PPTX
Information Search Skills
PDF
Searching the Web of Data (Tutorial)
PPT
Gathering information and Scanning the environment
PPT
Searching techniques
PPTX
Effective web search techniques
Models of Information Searching
CT231: Research & search skills
The 8-Fold Path to Web Searching Power
Blossom591 interactivepresentation
20110521 eightfold path and meditation2
 
Information Searching Skills
Information Search Skills
Searching the Web of Data (Tutorial)
Gathering information and Scanning the environment
Searching techniques
Effective web search techniques
Ad

Similar to Searching the internet - better with Google / Google not always best (20)

PPTX
PPT
Searching the internet - what patent searchers should know
DOC
Searching techniques
PPTX
dulces
PPT
Tifle.Week1
PPTX
Search Google Like a Pro
PDF
GoogleSmart
PPTX
Information update november
PPTX
Google search techniques
PPTX
Google search techniques
PPT
Mpl brownbag sept2011
PPT
2011 simple-webinar_searchsecrets_trv_l_145_final
PPT
Advanced Search Basics
PDF
Google rules for_searching
DOC
Searching teacherdemo tips
PDF
Digital Literacy: Learning How to Search and Evaluate Information
DOC
1 01 Notes Internet Search Tools T
PPT
Searching the Internet
PDF
Crib Search
PDF
Crib Search
Searching the internet - what patent searchers should know
Searching techniques
dulces
Tifle.Week1
Search Google Like a Pro
GoogleSmart
Information update november
Google search techniques
Google search techniques
Mpl brownbag sept2011
2011 simple-webinar_searchsecrets_trv_l_145_final
Advanced Search Basics
Google rules for_searching
Searching teacherdemo tips
Digital Literacy: Learning How to Search and Evaluate Information
1 01 Notes Internet Search Tools T
Searching the Internet
Crib Search
Crib Search

More from Eric Sieverts (20)

PPTX
Automatische classificatie
PPTX
Een andere blik op Google
PPT
Searching the internet - what patent searchers should know
PPT
Wij zullen vinden - ook in 2023
PPTX
Zoekmachines weten het antwoord
PPTX
Vertrouwen op semantische zoeksystemen of zelf aan het stuur
PDF
Semantisch zoeken in een webomgeving
PPT
Information Retrieval: van specialisme tot commodity
PPT
Semantisch Zoeken - knowledge graph, semantisch web, linked data, rdf, ontolo...
PPT
Semantisch zoeken - over knowledge graph, semantisch web, rdf enz.
PPT
Zin en onzin van metadata
PPT
40 jaar informatiegebruik
PPT
UBU 3.0: semantisch web & linked data voor de UB?
PPT
Metadata, standaarden, interoperabiliteit, semantisch web en linked data
PPT
Searchtrends
PPT
A pair of shoes in the thesaurus; some reflexions on human and computer indexing
PPT
Een digitale bibliotheek of alleen Google?
PPT
Project Panorama: vistas on validated information
PPT
Lifehacking met RSS en Netvibes? De strijd tegen informatie overload
PPT
Vinden dankzij / ondanks metadata
Automatische classificatie
Een andere blik op Google
Searching the internet - what patent searchers should know
Wij zullen vinden - ook in 2023
Zoekmachines weten het antwoord
Vertrouwen op semantische zoeksystemen of zelf aan het stuur
Semantisch zoeken in een webomgeving
Information Retrieval: van specialisme tot commodity
Semantisch Zoeken - knowledge graph, semantisch web, linked data, rdf, ontolo...
Semantisch zoeken - over knowledge graph, semantisch web, rdf enz.
Zin en onzin van metadata
40 jaar informatiegebruik
UBU 3.0: semantisch web & linked data voor de UB?
Metadata, standaarden, interoperabiliteit, semantisch web en linked data
Searchtrends
A pair of shoes in the thesaurus; some reflexions on human and computer indexing
Een digitale bibliotheek of alleen Google?
Project Panorama: vistas on validated information
Lifehacking met RSS en Netvibes? De strijd tegen informatie overload
Vinden dankzij / ondanks metadata

Recently uploaded (20)

PDF
VCE English Exam - Section C Student Revision Booklet
PDF
Complications of Minimal Access Surgery at WLH
PDF
Computing-Curriculum for Schools in Ghana
PPTX
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...
PPTX
Microbial diseases, their pathogenesis and prophylaxis
PDF
O7-L3 Supply Chain Operations - ICLT Program
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
PDF
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
PDF
Anesthesia in Laparoscopic Surgery in India
PDF
Basic Mud Logging Guide for educational purpose
PPTX
Institutional Correction lecture only . . .
PPTX
GDM (1) (1).pptx small presentation for students
PPTX
Pharmacology of Heart Failure /Pharmacotherapy of CHF
PDF
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
PPTX
PPH.pptx obstetrics and gynecology in nursing
PDF
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
PDF
102 student loan defaulters named and shamed – Is someone you know on the list?
PDF
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
VCE English Exam - Section C Student Revision Booklet
Complications of Minimal Access Surgery at WLH
Computing-Curriculum for Schools in Ghana
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...
Microbial diseases, their pathogenesis and prophylaxis
O7-L3 Supply Chain Operations - ICLT Program
O5-L3 Freight Transport Ops (International) V1.pdf
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
Anesthesia in Laparoscopic Surgery in India
Basic Mud Logging Guide for educational purpose
Institutional Correction lecture only . . .
GDM (1) (1).pptx small presentation for students
Pharmacology of Heart Failure /Pharmacotherapy of CHF
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
PPH.pptx obstetrics and gynecology in nursing
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
102 student loan defaulters named and shamed – Is someone you know on the list?
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
3rd Neelam Sanjeevareddy Memorial Lecture.pdf

Searching the internet - better with Google / Google not always best

  • 1. UB Utrecht HvA-MIC GO Opleidingen searching the internet better with Google / Google not always best Eric Sieverts @sieverts CODARTS, 04-03-2013
  • 2. agenda • searching the web • smart searching • google options • beyond google • beyond general web search for all links see: http://sieverts.pbworks.com/codarts 2
  • 3. the general agenda importance web of specific ?=? material everything types? general specific web material search search how to … how to … when & why
  • 4. an ever changing google landscape • unreliable numbers • irreproducible results • disappearing functions • changing interfaces 4
  • 5. 5
  • 6. building block approach systematic searching in structured information systems (like JStor etc.) start analytically with so-called building block approach e.g.: subject "modern american composers" – it breaks up in 3 facets – collect keywords for each facet – combine keywords with OR and AND operators modern american composers modern american composer contemporary america composers 20th century OR usa OR songwriters OR twentieth century united states … … … 6 AND AND
  • 7. building block approach modern american composers modern american composer contemporary america composers 20th century OR usa OR songwriters OR twentieth century united states … … … AND AND it makes a query: (modern OR contemporary OR "twentieth century" OR "20th century") AND (america OR american OR usa OR "united states") AND (composer OR composers OR songwriter OR songwriters) 7
  • 8. building block approach also with Google ? web search engines are not specifically designed for such structured queries, but it is possible to do Google and Yahoo make it even easier, since you may omit parentheses and the AND-operator (since it is default) : implied AND modern OR contemporary OR "twentieth century" OR "20th century" america OR american OR usa OR "united states" composer OR composers OR songwriter OR songwriters implied AND 8
  • 9. relevance ranking (1) Google (and other web search engines) are primarily focused on presenting search results in order of relevance how do they know what is relevant? – they interpret the importance of words for the subject matter of the retrieved documents (your search terms present in title, url, headings, ... ?) • you can enhance importance of a certain term for your query by repeating that word a couple of times – they estimate the importance of the relation between words in the retrieved documents: whether .. • your search words occur close together • your search words occur in same order as you entered them 9 >> formulate your query like you expect it formulated
  • 11. relevance ranking (2) Google (and other web search engines) are primarily focused on presenting search results in order of relevance how do they know what is relevant? – importance or quality of retrieved web pages is deduced from the number and the importance of links from other sites (for each site a pagerank is calculated) – importance of retrieved web pages for your personal interest is deduced on basis of your previous search and browse behaviour, which is monitored whenever you're logged in since every search engine uses somewhat different algorithms for its relevance calculations (and their coverage is different as well) there tends to be little overlap between top 10 results form different engines 11
  • 13. search terms use of proper search terms is crucial for search success think of : – singular / plural , verbs / nouns / adjectives , conjugations , ... – spelling variations (behavior / behaviour) – compound terms (writer / songwriter) – synonyms, acronyms (compact disc / compact disk / cd / digital disc) how would the answer to my question be formulated in a relevant document? "think as if being a document" – the right terms – as an "exact phrase" or in most probable word order – use wildcard for variable words ("modern * * composers") – use known examples from a list to be found – use of popular <> scientific terms etc. 13
  • 14. refining searches if results are too broad, too diverse – add another essential term or set of terms to your query – see what your search engine suggests while you enter your query – exclude unwanted term with NOT (francis bacon NOT philosopher) NB: Google does not understand NOT; use minus-sign instead: francis bacon -philosopher
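If you are used to writing NOT in library databases, the rewrite to Google's minus-sign can be done mechanically. A minimal sketch, assuming that only an upper-case NOT is meant as an operator; the function name is made up for this example.

```python
import re


def not_to_minus(query):
    """Rewrite 'NOT term' as '-term' (the minus-sign must touch the term)."""
    # only upper-case NOT is treated as an operator; lower-case 'not' is left alone
    return re.sub(r"\bNOT\s+(?=\S)", "-", query)


print(not_to_minus("francis bacon NOT philosopher"))
```

Note that there is no space between the minus-sign and the excluded term.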
  • 15. nice interactive infographic "how search works" http://www.google.com/insidesearch/howsearchworks/thestory/ 15
  • 16. is Google outsmarting us? Google tries to improve and to broaden your queries • automatic spelling corrections (veilgheid >> veiligheid) • automatic search for words with same word stem (singular/plural, verb, conjugation, inflection, …) • expands acronyms (jfk >> john f kennedy | wwii >> world war II) • adds some synonyms (vaccination >> immunization) • transforms separate words to compound term & vice versa (veiligheid maatregel >> veiligheidsmaatregel | catfood >> cat food) • may leave out a term as optional if it is not differentiating enough • personalisation based on previous search behaviour [all of this more often and more elaborately in English than in Dutch; you are never sure what, when, or whether it is applied] but what if you don't like all of this? >> "verbatim"
  • 19. searched only literally for the exact words you entered; on google.nl: "woord voor woord"
  • 20. some more "how to" • domain search: site:edu OR site:edu.* [for all edu (sub)domains] site:shell.com OR site:philips.com • url search: inurl:novelty • title search: intitle:catalytic • filetype search: filetype:pdf filetype:xls OR filetype:xlsx filetype:doc OR filetype:docx filetype:rss [more types than shown in advanced search drop-down menu] • exact search: "greenhouses" [or VERBATIM for all words]
  • 21. advanced search Google is hiding its advanced search screen : you must perform a simple search first, to get the "cog wheel" 21
  • 22. some more "how to" some of this can be done from the advanced search screen but regular search box offers greater flexibility, once you know the syntax • domain search: [in combination with real search terms] site:codarts.nl site:edu OR site:edu.* [for all edu (sub)domains] site:last.fm OR site:spotify.com • url search: inurl:course • title search: intitle:guitar 22
  • 23. some more "how to" (2) • filetype search: filetype:pdf filetype:xls OR filetype:xlsx filetype:doc OR filetype:docx filetype:rss [more types than shown in advanced search drop-down menu] • numeric search: 10..20 [includes all values in between] $10..$20 [not for other currencies] • punctuation: &, %, dot, ... [can be searched] €, /, ", comma, ... [are ignored] • exact search: "greenhouses" [or VERBATIM for all words] • synonym search: ~guitar • time limitations: [after search, hidden in top menu]
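Once you know the syntax, queries with these operators can also be put together programmatically. The helper below is a sketch: the function names and parameters are made up for this example, and only operators listed on the slides are used (site:, filetype:, intitle:, inurl:, and the numeric range with two dots). The resulting string is what you would type in the regular search box; search_url only wraps it in a standard Google search address.

```python
from urllib.parse import urlencode


def google_query(terms, sites=(), filetypes=(), intitle=None, inurl=None,
                 number_range=None):
    """Build a query string from search terms plus optional operators."""
    parts = list(terms)
    if sites:
        # several domains are alternatives, so they are joined with OR
        parts.append(" OR ".join(f"site:{site}" for site in sites))
    if filetypes:
        parts.append(" OR ".join(f"filetype:{ext}" for ext in filetypes))
    if intitle:
        parts.append(f"intitle:{intitle}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    if number_range:
        low, high = number_range
        parts.append(f"{low}..{high}")  # matches all values in between
    return " ".join(parts)


def search_url(query):
    """Turn a query string into a URL that can be opened in a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = google_query(["guitar"], sites=["last.fm", "spotify.com"],
                     filetypes=["pdf"])
print(query)
print(search_url(query))
```

As the slides stress, combine the operators with real search terms; a site: or filetype: restriction on its own says little about the subject.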
  • 27. whoever searches for “Bach” is probably more interested in data about him than in websites about him, and most probably in "J.S." rather than one of his relatives. Google's "Knowledge Graph" knows 500 million objects with 3.5 billion properties and even more mutual relations (but only in English)
  • 28. it also interprets the intention of your query (sometimes ;-) 28
  • 30. general search engines besides google • Bing microsoft, large • Yahoo! content=Bing, large • Blekko uses hashtags to search more [domain-] selective also many predefined hashtags; e.g. /likes for Facebook • DuckDuckGo assures privacy, no personalisation, no filter-bubble, rather small, !Bang-function offers many extras • Gigablast green search engine, rather small, some unique functions • Exalead french, many advanced functions, primarily demo system • Millionshort leaves out results from most popular sites → the long tail • WolframAlpha knowledge engine, facts, calculations together, these others have 30% market share in US; in NL only 3% • Yandex in Russia more popular than Google • Baidu in China more popular than Google • Naver, Daum in South Korea more popular than Google • Seznam in Czechia more popular than Google 30
  • 31. material type specific search science google scholar, microsoft academic, scirus, oaister, scientific commons, science.gov reference wikipedia, quora, wolfram|alpha, answers.com news google news, yahoo news, bing news, cnn, bbc old news way-back-machine, historische kranten KB images google image, yahoo image, bing image, flickr, tineye (ip-check), panoramio (geo-search) video google video, youtube, youtube edu channel, bing video, blinkx, voxalead-news tweets twitter search, topsy, postpost, snapbird social socialsearcher, socialmention, whostalkin, kurrently forums google groups, omgili, boardtracker blogs google blogs, icerocket, [rss] CTRLQ, RSS SearchHub 31
  • 32. scientific search books – Google Books (full text search) – Hathitrust Digital Library (open book scan project / part of G-books) – Librarything (catalog of 58,000,000 books from 1,000,000 owners) – GoodReads (reviews, recommendations, friends, ...) – Open Textbook Catalog (open access textbooks) journal articles – licensed databases (like JStor, ...) – Google Scholar (articles, dissertations, reports, ...) – sEURch / UvA-library ("discovery" systems of EUR / UvA) – Scirus / SciVerse (journal articles -Elsevier- , database content, webpages) – Magportal (also -English- popular magazines) – DeepDyve (scientific articles "for rent" - for 24 hours)
  • 33. Google Books • all pages scanned and full-text searchable • important to discover specific subjects/terms - not primary book topic • often limitations on display and browsability (no preview / snippet view / limited preview / full preview) • content from publishers and large libraries • problems with viewing copyrighted material also from libraries • build your personal ‘My Library’ • NL-books not only from Gent University (and soon KB), also from US/UK • also some ‘magazines’ • metadata on about-this-book-page 33
  • 37. Google Scholar • > 100 million scientific publications (most articles) • differences between availability (and hence searchability) of full-text (majority), bibliographic-only, and citation data • competitor of Web of Science, Scopus, Scirus, ... • indexing many selected -even licensed- sources (publishers, abstract-databases, university sites, institutional repositories, ...) • includes numbers of citations! [and links to them] • number of citations important factor for relevance ranking (!! reason why recent publications get low rankings) • advanced search limited, many mistakes in metadata (authors etc.) • accessibility of full-text often a problem because of licences • often many versions of same article (including sometimes free ones) • coupling with library subscriptions to allow smoother linking • no info about sources, updates etc. 37
  • 38. [Google Scholar screenshot, callouts:] open access | "if this article is interesting, these 23 more recent ones probably are too" | number of citations | subscription Univ. Utrecht
  • 41. facts and reference encyclopedias – wikipedia – internet movie database – ... Q&A (human powered) – Quora – Yahoo-answers direct answers, facts and calculations – Wolfram|Alpha dictionaries, translations – answers.com (metasearch) – Roget thesaurus – Bartleby – Google Translate – Google Translated search > – Synoniemen.net (dutch) 41
  • 42. wikipedia • >250 languages • “wisdom of the crowds” ?=? “wisdom” for all topics? • quite good for “factual” topics • many detailed specific topics (>20 million lemmas, >1 million NL) • there are policies & guidelines & management: stewards, administrators • for searching the wikipedia use Google rather than internal search limit to: site:wikipedia.org gives more complete results and searches directly in all language versions together 42
  • 46. translates original query (here in english) into chosen languages and translates results back into english
  • 47. ... and pages selected from the result list are translated in English too
  • 50. old stuff : web & news • web archive – "way-back machine": old versions of websites, back to 1996 access thru the -original- url, NO search internal site links will mostly work – also other archived materials (a.o. music) • historical Dutch newspapers – historische kranten KB (1618-1995 ; full-text search) • historical international newspapers – British newspapers 1800-1900 – historic American newspapers – international overview 50
  • 53. … and the very oldest one from february 1998: 53
  • 54. twitter & social search twitter search (often limited to messages from past 1 - 2 weeks only) – twitter (also advanced search) – topsy (best one at the moment, also older messages) – postpost (search your own timeline - everything you're following) – snapbird (search thru all tweets of particular person - you have to know twittername) real time / social search – socialsearcher (facebook | twitter | g+ : side by side) – socialmention (also weblogs) – samepoint, whostalkin, kurrently, … (also weblogs) forum discussions – omgili, boardtracker, ... – Google groups 54
  • 62. multimedia search / images mostly search by keywords – Google-image (simple image recognition) – Yahoo-image (also pictures from Flickr) – Bing-image – Flickr (photo upload-site; search on user tags; filter on “Creative Commons” material) – photographs on twitter (twicsy, picfog, topsy, skylines.io, …) – special sites (beeldbank nationaal archief, wikimedia commons, ...) special techniques: – geographical (panoramio [google-maps], worldc.am [instagram], ...) – Google (search by example) – Tineye (search for -almost- exact copies; a.o. copyright infringed?) 62
  • 64. image search Content based image retrieval (CBIR) • search on colors – examples: Tineye, Chromatik, Picitup, Google, ... 64
  • 65. image search Content based image retrieval • search by example – draw it yourself Retrievr, ... – existing image Google (visually similar) Tineye (almost exact copies) Retrievr, ... example found on the web or uploaded from your own computer 65
  • 68. google looks for most probable keywords to describe this image and in the search box combines them already with the image ... and how about these "visually similar images" ?
  • 74. multimedia search / video (mostly) uploaded material – YouTube (growth: 70 hours/minute; also many "how to" videos) also: YouTube-channels / YouTube-education / YouTube-teachers / YouTube-movies / YouTube-shows / … – Vimeo (mostly) broadcast material – Blinkx (35 million hours video, speech recognition?) – VoxaleadNews (speech recognition in several languages - also NL! hence "full-text" search on spoken words) – Bing-video (not easy to find from European home page) – Google-video (also videos from YouTube; metadata search only) – Dutch TV-programs: • Uitzending gemist (limited search functionality) • Beeld & Geluid (metadata search; use “uitgebreid zoeken”, i.e. advanced search) • Academia (selection from Beeld & Geluid for higher education)
  • 76. ?
  • 77. the end any questions? 77

Editor's Notes

  • #7, #8, #9, #10, #12, #14, #15: Assignment: refine the search until the first 50 results contain no non-relevant ones, paying attention to these points; use a thesaurus or Word synonyms; truncation