Web pages are served over HTTP and viewed in browsers; crawlers fetch those same pages automatically to build search indexes. A crawler starts from a set of seed URLs and recursively fetches the new URLs it discovers on each page. Along the way it must respect crawl delays (politeness toward each host), detect and skip duplicate URLs, follow redirects correctly, and avoid crawler traps that generate endless links. Large-scale crawlers add techniques such as multi-threading, non-blocking sockets, distributed storage, and change-rate estimation to crawl and refresh billions of pages efficiently.
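To make the basic loop concrete, here is a minimal single-threaded sketch of that recursive fetch cycle: a frontier queue seeded with starting URLs, a seen-set for deduplication, redirect handling via the standard library, a depth cap as a crude guard against crawler traps, and a fixed politeness delay. The seed URL, limits, and delay are illustrative assumptions; a production crawler would also honor robots.txt, schedule per-host politeness, and distribute work across many machines, none of which is shown here.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags on a fetched page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, max_pages=50, max_depth=3, delay_seconds=1.0):
    """Breadth-first crawl from the seeds, with dedup, a depth cap
    (a crude crawler-trap guard), and a politeness delay."""
    frontier = deque((url, 0) for url in seed_urls)   # URLs waiting to be fetched
    seen = set(seed_urls)                             # dedup: never enqueue a URL twice
    fetched = 0

    while frontier and fetched < max_pages:
        url, depth = frontier.popleft()
        try:
            # urlopen follows HTTP redirects automatically
            with urlopen(url, timeout=10) as response:
                final_url = response.geturl()         # URL after any redirects
                body = response.read().decode("utf-8", errors="replace")
        except Exception as exc:
            print(f"skip {url}: {exc}")
            continue

        fetched += 1
        print(f"fetched {final_url} (depth {depth})")

        if depth < max_depth:
            parser = LinkExtractor()
            parser.feed(body)
            for href in parser.links:
                # resolve relative links and drop fragments before dedup
                absolute, _fragment = urldefrag(urljoin(final_url, href))
                if absolute.startswith("http") and absolute not in seen:
                    seen.add(absolute)
                    frontier.append((absolute, depth + 1))

        time.sleep(delay_seconds)                     # politeness: pause between requests


if __name__ == "__main__":
    # hypothetical seed; real crawls start from many curated seed URLs
    crawl(["https://example.com/"])
```

The large-scale techniques mentioned above replace the weak points of this sketch: worker threads or non-blocking sockets overlap many fetches instead of sleeping serially, the seen-set and fetched content move into distributed storage, and estimated change rates decide how often each page is revisited.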