The document presents Smart Crawler, a two-stage framework for efficiently harvesting deep-web interfaces. The first stage uses site-based searching to prioritize relevant websites; the second uses adaptive link ranking for in-site searching. Together these achieve higher harvest rates than existing crawlers. Experimental results demonstrate the crawler's agility, accuracy, and effectiveness in retrieving deep-web resources from large-scale sites.
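The two stages described above can be illustrated with a minimal sketch. It is not Smart Crawler's actual implementation: the function `rank_sites`, the class `AdaptiveLinkRanker`, the anchor-text term weights, and the reward values are all hypothetical, chosen only to show how site prioritization and feedback-driven link ranking might fit together.

```python
import heapq


def rank_sites(site_scores):
    """Stage 1 (illustrative): order sites from most to least relevant.

    `site_scores` maps a site name to an assumed relevance score.
    """
    # heapq is a min-heap, so negate scores to pop the highest score first.
    heap = [(-score, site) for site, score in site_scores.items()]
    heapq.heapify(heap)
    ordered = []
    while heap:
        _, site = heapq.heappop(heap)
        ordered.append(site)
    return ordered


class AdaptiveLinkRanker:
    """Stage 2 (illustrative): rank in-site links, adapting to feedback.

    Each link is a (url, anchor_text) pair. Weights are learned per
    anchor-text term; this scoring rule is an assumption for the sketch.
    """

    def __init__(self):
        self.term_weight = {}

    def score(self, anchor_text):
        # Sum the learned weights of the terms in the anchor text.
        terms = anchor_text.lower().split()
        return sum(self.term_weight.get(t, 0.0) for t in terms)

    def feedback(self, anchor_text, found_form):
        # Reward terms whose links led to a searchable form, penalize others.
        delta = 1.0 if found_form else -0.5
        for t in anchor_text.lower().split():
            self.term_weight[t] = self.term_weight.get(t, 0.0) + delta

    def order(self, links):
        # Highest-scoring links are visited first.
        return sorted(links, key=lambda link: self.score(link[1]), reverse=True)


sites = rank_sites({"a.example": 0.2, "b.example": 0.9, "c.example": 0.5})

ranker = AdaptiveLinkRanker()
ranker.feedback("advanced search", True)   # this link led to a search form
ranker.feedback("contact us", False)       # this one did not
links = [
    ("/contact", "contact us"),
    ("/search", "advanced search"),
    ("/about", "about"),
]
ordered_links = ranker.order(links)
```

In this sketch, the site queue is visited in descending relevance order, and within a site the ranker moves links resembling past successes to the front as feedback accumulates.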