Cloudflare accuses Perplexity of dodging anti-scraping rules
AI startup Perplexity has been accused of ignoring website restrictions and scraping content from sites that have explicitly opted out of being scraped, according to a new report from internet infrastructure provider Cloudflare.
On Monday, Cloudflare published research saying it had observed Perplexity evading website blocks and concealing its crawling and scraping activity. According to Cloudflare’s researchers, Perplexity tried to hide its identity while scraping web pages “in an attempt to circumvent the website’s preferences.”
AI tools like those developed by Perplexity rely heavily on massive volumes of data harvested from the web. It has become common for AI companies to extract text, images, and video from online sources, often without consent, to power their models. In response, many websites publish a robots.txt file, a standard that tells search engine crawlers and AI scrapers which parts of a site they may or may not crawl. However, robots.txt cannot enforce anything on its own: it works only when crawlers choose to honor it, so its effectiveness has varied.
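Because robots.txt is only a set of published rules, the check happens on the crawler’s side. The sketch below uses Python’s standard-library `urllib.robotparser` to show how a well-behaved crawler would consult a site’s rules before fetching a page. The robots.txt contents, the example URL, and the use of “PerplexityBot” as the blocked crawler name are illustrative assumptions, not taken from any real site or from Cloudflare’s report.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, not a real site's file):
# block a crawler that identifies itself as "PerplexityBot", allow everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

CHROME_MAC_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    Nothing forces a crawler to call this: honoring robots.txt is voluntary.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"
    print(is_allowed(ROBOTS_TXT, "PerplexityBot", url))  # False: matches the blocked group
    print(is_allowed(ROBOTS_TXT, CHROME_MAC_UA, url))    # True: falls through to "*"
```

The answer depends entirely on the name the crawler reports: the same request presented with a generic Chrome user agent only matches the wildcard group and is allowed.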
Cloudflare asserts that Perplexity has been deliberately bypassing these restrictions by altering its bot’s “user agent” (a line of text that identifies the visitor’s browser and device) and by rotating through various autonomous system numbers (ASNs), which identify networks on the internet.
“This activity was observed across tens of thousands of domains and millions of requests per day. We were able to fingerprint this crawler using a combination of machine learning and network signals,” Cloudflare said in its blog post.
Perplexity spokesperson Jesse Dwyer dismissed the allegations, characterizing Cloudflare’s report as a “sales pitch.” Dwyer also said that the screenshots in the post “show that no content was accessed.” He further asserted that the bot named in Cloudflare’s research “isn’t even ours.”
Cloudflare said it began investigating after customers complained that Perplexity was still scraping their websites even after they had blocked its crawlers through robots.txt and specific bot-blocking rules. Tests conducted by Cloudflare confirmed the behavior. “We observed that Perplexity uses not only their declared user-agent, but also a generic browser intended to impersonate Google Chrome on macOS when their declared crawler was blocked,” the company reported.
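To see why that tactic works, consider a minimal sketch of a simple site-side filter, assuming it blocks requests whose user agent contains a declared crawler name or whose IP address falls in a known crawler network. The name tokens, the `192.0.2.0/24` range (an RFC 5737 documentation block), and the function `is_blocked` are all hypothetical placeholders, not Perplexity’s real identifiers or Cloudflare’s actual rules.

```python
import ipaddress

# Hypothetical blocklist: placeholder crawler names and a documentation-only
# IP range standing in for a crawler's declared network.
DECLARED_UA_TOKENS = ("perplexitybot", "perplexity-user")
DECLARED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

CHROME_MAC_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def is_blocked(user_agent: str, client_ip: str) -> bool:
    """Block if the request declares a listed crawler name or comes from a listed network."""
    ua = user_agent.lower()
    ua_match = any(token in ua for token in DECLARED_UA_TOKENS)
    addr = ipaddress.ip_address(client_ip)
    net_match = any(addr in net for net in DECLARED_NETWORKS)
    return ua_match or net_match


if __name__ == "__main__":
    # Declared crawler name from the listed network: blocked.
    print(is_blocked("PerplexityBot/1.0", "192.0.2.10"))   # True
    # Chrome-on-macOS user agent from an unlisted network: passes the filter.
    print(is_blocked(CHROME_MAC_UA, "203.0.113.7"))        # False
```

A filter like this only catches traffic that identifies itself honestly. Once both the user agent and the source network change, nothing in the request matches, which is why Cloudflare says it relied on machine learning and network signals to fingerprint the crawler instead.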
In response, Cloudflare has removed Perplexity’s bots from its list of verified bots and introduced new techniques to block this activity.
Cloudflare has been vocal recently about the impact of AI scrapers. In July, the company launched a marketplace allowing website owners to charge AI scrapers for accessing their content. CEO Matthew Prince warned that AI poses a serious threat to the internet’s economic model, especially for publishers. Cloudflare also released a free tool last year designed to prevent bots from scraping content for AI training.
This is not the first time Perplexity has faced such criticism. In 2024, media outlets including Wired accused the company of plagiarizing their content.