The document proposes a novelty detection approach for web crawlers that minimizes the retrieval of redundant documents. It first reviews the generic crawler methodology, then introduces the proposed crawler, which applies semantic text summarization and n-gram fingerprint similarity to identify novel pages not already present in the database. Implementation results show that the proposed approach significantly reduces redundancy and memory requirements compared with a generic crawler.
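The core idea of comparing a candidate page against stored pages via n-gram fingerprints can be sketched as follows. This is a minimal illustration, not the document's exact algorithm: the word-trigram size, the hash truncation, the Jaccard similarity measure, and the `0.5` novelty threshold are all assumed parameters chosen for the example.

```python
import hashlib

def ngram_fingerprint(text, n=3):
    """Fingerprint a text as a set of hashed word n-grams (n=3 assumed)."""
    words = text.lower().split()
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    # Truncated hashes keep the fingerprint compact, reducing memory per page.
    return {hashlib.md5(g.encode()).hexdigest()[:8] for g in grams}

def jaccard(a, b):
    """Set overlap between two fingerprints, in [0, 1]."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_novel(summary, stored_fingerprints, threshold=0.5):
    """A page counts as novel if no stored fingerprint is too similar.

    `summary` would be the output of the summarization step;
    `threshold` is a hypothetical tuning parameter.
    """
    fp = ngram_fingerprint(summary)
    return all(jaccard(fp, s) < threshold for s in stored_fingerprints)

# Usage: only novel pages would be added to the crawler's database.
db = [ngram_fingerprint("the quick brown fox jumps over the lazy dog")]
print(is_novel("the quick brown fox jumps over the lazy dog", db))   # duplicate
print(is_novel("an unrelated page about web crawler indexing", db))  # novel
```

Because each page is reduced to a small set of hashes rather than its full text, the database of seen pages stays compact, which is consistent with the memory savings the document reports.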