Perplexity uses stealth crawlers to evade website no-crawl directives

Mahesh Ramichetty

Enterprise Architecture | DevSecOps | Technology Leadership | 6X AWS | 4X Azure | Digital Transformation | Low-Code/No-Code | gRPC | APISec Certified | Process Mining | Data First | Hyper Orchestration | Convergence of Data and GenAI

Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives. The linked report describes stealth crawling behavior from Perplexity, an AI-powered answer engine. Although Perplexity initially crawls from their declared user agent, when they are presented with a network block, they appear to obscure their crawling identity in an attempt to circumvent the website’s preferences. We see continued evidence that Perplexity is repeatedly modifying their user agent and changing their source ASNs to hide their crawling activity, as well as ignoring (or sometimes failing to even fetch) robots.txt files.

This is exactly the problem: the crawler sheds its declared identity to get into websites that have opted out. Now is the time to think this through, including whether sensitive data is being taken out. And with the Comet browser, has Perplexity got a wild card for its stealth crawlers? https://lnkd.in/gEm8KBxS
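Why does changing identity matter? robots.txt rules are matched against the user agent a crawler announces, so a crawler that swaps its declared token for a generic browser string falls under different rules. A minimal sketch using Python's standard-library `urllib.robotparser`, assuming a hypothetical robots.txt that blocks the token `PerplexityBot` and allows everyone else (the site, URL and browser string are illustrative):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the declared crawler token is blocked,
# every other user agent falls through to the "*" group and is allowed.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

def can_fetch(user_agent: str, url: str) -> bool:
    """Evaluate the hypothetical robots.txt for the given user agent."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

url = "https://example.com/article"

# Crawler announcing its declared token: matched by the blocking group.
declared = can_fetch("PerplexityBot", url)

# Same client presenting a generic browser string: only "*" applies.
generic = can_fetch("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0", url)

print(declared, generic)  # False True
```

Nothing in robots.txt enforces the rule; compliance is voluntary, which is why sites fall back to network blocks, and why rotating user agents and source ASNs undermines both.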
