The document provides an overview of web crawling: the process of gathering web pages in order to index them and support search. The objective is to quickly and efficiently gather as many useful pages as possible, together with the link structure that connects them. The presentation covers the basic operation of a crawler, which starts from a seed set of URLs and maintains a frontier of URLs still to be crawled, and describes the common modules of a crawler architecture, such as URL filtering tests. It also discusses politeness, distributed crawling, DNS resolution, and the different types of crawlers.
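To make the seed-set and frontier loop concrete, below is a minimal sketch in Python. It assumes a simple breadth-first frontier and uses hypothetical helper names (fetch_page, extract_links, url_filter) that are not part of the source; a real crawler would also honour robots.txt, rate-limit requests per host (politeness), and distribute the frontier across machines.

```python
# Minimal sketch of the basic crawler loop: start from a seed set, keep a
# frontier of URLs to fetch, apply a URL filter test, and avoid revisits.
# Helper names here are illustrative placeholders, not the document's API.
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def fetch_page(url: str) -> str:
    """Download a page body; a real crawler adds retries and robots.txt checks."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.text


def extract_links(base_url: str, html: str) -> list[str]:
    """Pull absolute hyperlinks out of a fetched page."""
    soup = BeautifulSoup(html, "html.parser")
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]


def url_filter(url: str) -> bool:
    """Example URL filter test: keep only http(s) URLs."""
    return urlparse(url).scheme in ("http", "https")


def crawl(seed_urls: list[str], max_pages: int = 100) -> set[str]:
    frontier = deque(seed_urls)   # URLs still to be crawled
    seen = set(seed_urls)         # URL-seen test to avoid duplicates
    fetched = set()
    while frontier and len(fetched) < max_pages:
        url = frontier.popleft()
        try:
            html = fetch_page(url)
        except requests.RequestException:
            continue              # skip unreachable pages
        fetched.add(url)
        for link in extract_links(url, html):
            if url_filter(link) and link not in seen:
                seen.add(link)
                frontier.append(link)
    return fetched


if __name__ == "__main__":
    pages = crawl(["https://example.com"], max_pages=10)
    print(f"Fetched {len(pages)} pages")
```

The frontier here is a plain FIFO queue, which yields breadth-first crawling; production crawlers replace it with a prioritized, per-host queue so that politeness and page importance can be enforced.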
Related topics: