How Search Engines Search
Week 12
Spiders and Algorithms
 Search engines perform two technical tasks:
 Search and Structure
 search for new sites and add them to their
databases
 structure searches for users of the search engine
Searching for New Sites
 Search engines search somewhat like ‘spiders’ for
new sites to add
 They crawl the web finding pages for inclusion by
following links from pages already in their database:
 http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html
 http://computer.howstuffworks.com/internet/basics/search-engine1.htm
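The link-following idea above can be sketched in a few lines of Python. This is only a toy illustration: the "web" is a made-up dictionary (TOY_WEB and the .example page names are assumptions, not real sites), and a real spider would download pages over the network rather than read a dictionary.

```python
from collections import deque

# Hypothetical in-memory "web": each page maps to the pages it links to.
TOY_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],
    "e.example": ["a.example"],  # no page links here, so the spider never finds it
}

def crawl(seed_pages, links):
    """Breadth-first crawl: start from known pages and follow links to new ones."""
    database = list(seed_pages)   # pages already in the database, in discovery order
    seen = set(seed_pages)
    queue = deque(seed_pages)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:    # a new page: add it to the database
                seen.add(target)
                database.append(target)
                queue.append(target)
    return database

print(crawl(["a.example"], TOY_WEB))
# ['a.example', 'b.example', 'c.example', 'd.example']
```

Note that e.example is never added: a spider that only follows links cannot discover a page that nothing links to.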
See picture of how it works:
 http://computer.howstuffworks.com/internet/basics/search-engine1.htm
Structuring Searches for Users
 All search engines use a search algorithm to
structure searches for users.
 In computer science, a search algorithm is an
algorithm for “finding an item with specified
properties among a collection of items”.
*http://en.wikipedia.org/wiki/Search_algorithm
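As a small illustration of that definition, the sketch below finds the items with a specified property in a collection by checking them one at a time (a linear search). The function name find_items and the sample pages are made up for this example; real search engines use far more elaborate index structures.

```python
def find_items(collection, has_property):
    """Linear search: check each item and keep those with the specified property."""
    return [item for item in collection if has_property(item)]

# Hypothetical collection of pages.
pages = [
    {"title": "How Spiders Crawl", "words": 900},
    {"title": "Cooking With Garlic", "words": 300},
    {"title": "Search Spiders Explained", "words": 1500},
]

# The "specified property" here: the title mentions spiders.
matches = find_items(pages, lambda page: "spiders" in page["title"].lower())
print([page["title"] for page in matches])
# ['How Spiders Crawl', 'Search Spiders Explained']
```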
Search Algorithms are Proprietary
 Most search engines keep their search algorithms
secret, or proprietary [that is, the algorithm is corporate
property, and its details are kept at least partially secret].
 All search engines feature different [or at least
slightly different] search algorithms
 All search engines use some form of their own
search algorithm
Google Search Algorithm
 Let’s look at the best-known search
engine and how it searches
 Google: their search algorithm operates according
to a basic principle of relevance ranking
 results are ranked according to an algorithm they call
PageRank [the name is a Google trademark]
 See link for basic explanation of origins of Google search
*http://en.wikipedia.org/wiki/Google
A picture of Google’s Search Algorithm:
Google’s Search Algorithm
 See link for explanation:
 From http://en.wikipedia.org/wiki/PageRank
 Pretty Mathematical! We won’t go into all that.
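For the curious, the core idea can still be shown in a short Python sketch. This follows the simplified textbook description of PageRank (a page is important if important pages link to it), not Google's actual production system, which is proprietary. The four-page link graph is made up, and the damping value of 0.85 is the commonly cited default, used here as an assumption.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank along links.

    `links` maps every page to the list of pages it links to; every
    linked-to page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}      # start with equal rank
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            # A page with no outgoing links shares its rank with every page.
            receivers = targets if targets else pages
            share = damping * rank[page] / len(receivers)
            for target in receivers:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: a, b and d all link to c; only c links to a.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page c ends up ranked highest because three pages link to it, and page d ends up lowest because nothing links to it.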
Algorithms in Simple Terms
 However, the algorithm [like most search engine
algorithms] can be broken down into three basic concepts,
1) popularity, 2) density, and 3) keywords:
 site popularity [how many users visit the site]
 site density [how many other sites link to it]
 Keywords
 Keywords are still key [no pun intended], as is how they intersect with
the first two
 These are considered in:
 ranking a site
 including it in your search results.
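A toy Python sketch can show how the three concepts might interact. Everything here is an assumption for illustration: the field names, the sample sites, and the weights are invented, and no real search engine is known to score pages this way.

```python
def toy_score(site, query_terms, weights=(1.0, 1.0, 2.0)):
    """Combine popularity, density and keyword matches into one made-up score."""
    w_popularity, w_density, w_keywords = weights
    words = site["text"].lower().split()
    keyword_hits = sum(words.count(term.lower()) for term in query_terms)
    if keyword_hits == 0:
        return 0.0    # no keyword match: the site is left out of the results
    return (w_popularity * site["visits"]
            + w_density * site["inbound_links"]
            + w_keywords * keyword_hits)

def toy_search(sites, query_terms):
    """Return the names of matching sites, best score first."""
    scored = [(toy_score(site, query_terms), site["name"]) for site in sites]
    ranked = sorted((pair for pair in scored if pair[0] > 0), reverse=True)
    return [name for _, name in ranked]

# Hypothetical sites.
sites = [
    {"name": "spider-facts", "visits": 50, "inbound_links": 10,
     "text": "spiders crawl the web"},
    {"name": "big-portal", "visits": 500, "inbound_links": 80,
     "text": "news weather sports"},
    {"name": "crawler-guide", "visits": 20, "inbound_links": 30,
     "text": "how web spiders and crawlers work"},
]

print(toy_search(sites, ["spiders"]))
# ['spider-facts', 'crawler-guide']
```

The very popular big-portal is excluded because it never mentions the keyword, which is the sense in which keywords still decide inclusion while popularity and density decide the order.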
Updating Search Algorithms
 If that weren’t enough, search engines regularly update their
search algorithms
 Google released two major updates, termed ‘Panda’ [2011] and
‘Penguin’ [2012], similar to updates to a PC or Mac operating
system, down to the catchy names
http://www.webmarketingpros.com/blog/how-to-recover-from-the-google-penguin-update/
http://blog.junta42.com/2011/04/4-steps-to-make-googles-panda-update-work-for-you/
Updating Search Algorithms
 These updates were designed to catch and eliminate from
searches ‘low-quality sites’ [those that have little content, are ad-heavy,
or replicate other pages]
 http://www.business2community.com/seo/animalistic-algorithms-googles-panda-and-penguin-shakeups-0270910
 http://googleblog.blogspot.com/2011/02/finding-more-high-quality-sites-in.html
Technical Stuff
 This is the ‘background’ information on how search
engines search
 We don’t need this ‘technical stuff’ in order to
use a search engine
 [i.e. we don’t need to explicitly construct search
algorithms or know about spiders]
 However, it can help us think about how to
approach constructing searches
Web and Database Searching
 Refer to p. 67, textbook, for discussion of controlled
vocabulary:
 All databases [this includes online catalogues and
subscription databases like Ebsco] include
controlled vocabulary
Controlled Vocabulary – LC Subject Headings
Databases and online catalogues [Ebsco, our RHC catalogue for
books, as well as others]:
• Use controlled vocabulary
• Allow us to narrow by subjects
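A minimal Python sketch of what a controlled vocabulary does: several everyday terms are mapped to one agreed-upon subject heading, so a search on any of them retrieves the same records. The headings, the term mapping, and the catalogue records below are illustrative assumptions, not taken from the actual LC list or the RHC catalogue.

```python
# Hypothetical mapping from everyday terms to one agreed-upon subject heading.
SUBJECT_HEADINGS = {
    "cars": "Automobiles",
    "autos": "Automobiles",
    "automobiles": "Automobiles",
    "movies": "Motion pictures",
    "films": "Motion pictures",
}

# Hypothetical catalogue records, each tagged with a subject heading.
CATALOGUE = [
    {"title": "A History of the Car", "subject": "Automobiles"},
    {"title": "Film Noir Classics", "subject": "Motion pictures"},
    {"title": "Engines and Drivers", "subject": "Automobiles"},
]

def search_by_subject(term):
    """Translate the user's term to its subject heading, then narrow by it."""
    heading = SUBJECT_HEADINGS.get(term.lower())
    if heading is None:
        return []     # term is not in the controlled vocabulary
    return [record["title"] for record in CATALOGUE
            if record["subject"] == heading]

print(search_by_subject("autos"))
# ['A History of the Car', 'Engines and Drivers']
```

Neither title contains the word "autos", yet both are found, because the match happens on the subject heading rather than on the words in the title.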
Web and Database Searching
The Internet features no controlled vocabulary
– i.e. no subject headings or agreed-upon subjects like those in
databases
We can, however:
• Search specific fields
• Eliminate or specify terms or related terms
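These two techniques can be sketched as a small Python helper that assembles a web search string. The helper name build_query and its parameters are made up for this example. The operators it emits (quotation marks for a phrase, a minus sign to eliminate a term, OR for related terms, intitle: and site: for fields) are widely supported, but exact syntax varies between search engines, so check the help pages of the one you use.

```python
def build_query(terms, exclude=(), any_of=(), site=None, in_title=None):
    """Assemble a search string from required, related and excluded terms."""
    # Multi-word terms are quoted so they are searched as an exact phrase.
    parts = [f'"{term}"' if " " in term else term for term in terms]
    if any_of:
        parts.append("(" + " OR ".join(any_of) + ")")   # related terms
    parts.extend(f"-{term}" for term in exclude)        # eliminate terms
    if in_title:
        parts.append(f"intitle:{in_title}")             # search the title field
    if site:
        parts.append(f"site:{site}")                    # search the site/domain field
    return " ".join(parts)

print(build_query(["search engine"],
                  exclude=["marketing"],
                  any_of=["spider", "crawler"],
                  site="edu",
                  in_title="algorithm"))
# "search engine" (spider OR crawler) -marketing intitle:algorithm site:edu
```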
