Web Crawler
●   Each search engine uses
    a crawler, also called a spider.
●   A web crawler is a
    computer program that
    browses the WWW in a
    methodical, automated manner.
●   A web spider is a kind of
    web crawler.
●   This process is called
    Web crawling or
    spidering.
●   Image source :
    http://www.codeproject.com/KB/IP/Crawler.aspx
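The "methodical" browsing described above can be sketched as a breadth-first traversal that visits each page at most once. This is only an illustration, not how any particular search engine works: the `PAGES` dictionary, the `example.com` URLs, and the `crawl` function are hypothetical stand-ins for the real Web, so the sketch runs without network access.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs it links to.
PAGES = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
}

def crawl(start, fetch_links, max_pages=100):
    """Visit pages breadth-first, each at most once, and return the visit order."""
    seen = {start}          # URLs already queued, so no page is visited twice
    queue = deque([start])  # frontier of pages still to visit
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

# fetch_links is a plain dictionary lookup here; a real crawler would download the page.
visited = crawl("http://example.com/", lambda url: PAGES.get(url, []))
print(visited)
```

A real crawler would also need politeness rules (for example honoring robots.txt and rate limits), which this sketch leaves out.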
Spider
A spider is a program that crawls the Internet in a specific way for a
specific purpose. Spiders are the basis for modern search engines, such
as Google and AltaVista.
These spiders automatically retrieve data from the Web and pass it on to
other applications that index the contents of the Web site for the best
set of search terms.
Source : http://www.ibm.com/developerworks/linux/library/l-spider/
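As a rough sketch of the "retrieve data and pass it on" step, the following uses Python's standard `html.parser` module to pull the links and the visible text out of one page. The `PageParser` class, the `parse_page` helper, and the sample HTML are assumptions made for illustration; the links would feed the crawler, and the text would go to the indexing application.

```python
from html.parser import HTMLParser

class PageParser(HTMLParser):
    """Collect link targets and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Record the href of every anchor tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Keep non-empty text fragments for the indexer.
        if data.strip():
            self.text_parts.append(data.strip())

def parse_page(html):
    """Return (links, text) extracted from an HTML string."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return parser.links, " ".join(parser.text_parts)

# Hypothetical page content; a real spider would download this over HTTP.
SAMPLE_HTML = (
    "<html><body><h1>Spiders</h1>"
    '<p>See <a href="/crawl">crawling</a>.</p></body></html>'
)
links, text = parse_page(SAMPLE_HTML)
print(links)
print(text)
```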
Information Indexing
Documents from an agent are indexed by indexing software.

[Diagram: Agents -> Documents -> Indexing Software (extracts words or similar items) -> Index -> Database]

● Information is put into a database.
● There are many different types of indexing.
● The kind of index built determines how the information will be displayed.
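One common type of index is an inverted index, which maps each word to the documents that contain it. The sketch below is a minimal version under stated assumptions: the `DOCS` dictionary, its URLs, and the simple lowercase word tokenizer are all hypothetical, and the slides do not say which kind of index a given engine builds.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Build an inverted index: word -> set of document URLs containing it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

# Hypothetical documents handed over by the agents (crawlers).
DOCS = {
    "http://example.com/crawler": "A web crawler browses the WWW.",
    "http://example.com/spider": "A web spider is a kind of web crawler.",
}

index = build_index(DOCS)
print(sorted(index["crawler"]))
print(sorted(index["spider"]))
```

Only the words and the URLs are kept in the index; the full document text is not needed to answer a keyword lookup.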
Searching and Visiting

To visit web pages related to your search keywords, you type the
keywords into the search engine's web page.

Some search engines allow you to use several keywords in one search.
Searching

The engine searches for your keywords in its database.
Results are returned as an HTML document.
The results also include some additional information.
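The lookup and the HTML result page can be sketched as follows. This is an illustration only: the `INDEX` and `TITLES` data, the URLs, and the rule that a page must contain every keyword are assumptions, since the slides do not specify how an engine combines several keywords.

```python
from html import escape

# Hypothetical index (word -> URLs) and page titles stored in the database.
INDEX = {
    "web": {"http://example.com/crawler", "http://example.com/spider"},
    "crawler": {"http://example.com/crawler"},
    "spider": {"http://example.com/spider"},
}
TITLES = {
    "http://example.com/crawler": "Web Crawler",
    "http://example.com/spider": "Spider",
}

def search(index, query):
    """Return the sorted URLs that contain every keyword in the query."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)

def render_results(urls, titles):
    """Render the result URLs as an HTML list of links."""
    items = "\n".join(
        f'<li><a href="{escape(url, quote=True)}">{escape(titles.get(url, url))}</a></li>'
        for url in urls
    )
    return f"<ul>\n{items}\n</ul>"

results = search(INDEX, "web crawler")
page = render_results(results, TITLES)
print(page)
```

The rendered page holds only titles and links, so following a result takes the reader to the original site.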
Visiting

If you are interested in the title of a result, you click the link and
go directly to that page.
Search engines or their databases do not store the documents of the
indexed sites; the link leads to the site itself.


Week10 Web Presentation
