> The bots of Big Tech, namely Google, Meta and Apple, are of course exempt from this by pretty much every website and by Cloudflare. But try being anyone other than them, no luck. Cloudflare is the biggest enabler of this monopolistic behavior.

The Big Tech bots provide proven value to most sites. They have also, over the years, shown that they respect robots.txt, including its crawl speed directives.
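(For reference, the crawl speed knob is the non-standard Crawl-delay line in robots.txt; Googlebot ignores it and is throttled through Search Console instead, but several other large crawlers honor it. A minimal, purely illustrative sketch, with example bot names:)

    # Illustrative robots.txt only; Crawl-delay is non-standard and
    # not honored by every crawler (Googlebot ignores it, for example).
    User-agent: *
    Crawl-delay: 10    # ask polite bots to wait ~10 seconds between requests

    User-agent: GPTBot
    Disallow: /        # opt a specific crawler out entirely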

If you manage a site with millions of pages, and over the course of a couple of years you see dozens of new crawlers requesting at the same volume as Google, some of them crawling fast enough (and without any ramp-up period) to degrade services and wake your on-call engineers, and you can't identify any benefit to you from those crawlers, what are you going to do? Are you going to pay a lot more to stop scaling down your cluster during off-peak traffic, or are you going to start blocking bots?

Cloudflare happens to be the largest provider of anti-DDoS and bot protection services, but if it wasn't them, it'd be someone else. I miss the open web, but I understand why site operators don't want to waste bandwidth and compute on high-volume bots that do not present a good value proposition to them.

Yes, this does make it much harder for non-incumbents, and I don't know what to do about that.



It's because those SEO bots keep crawling over and over, which Perplexity does not seem to do (given that the URLs are user-requested). Those are different cases, and robots.txt is only about the former. Cloudflare in this case is not doing "DDoS protection", because I presume Perplexity does not constantly refetch, crawl, or DDoS the website. (If Perplexity does do those things, then they are guilty.)

https://www.robotstxt.org/faq/what.html

I wonder if Cloudflare users explicitly have to allow Google, or if it's pre-allowed for them when setting up Cloudflare.

Despite what Cloudflare wants us to think here, the web was always meant to be an open information network, and spam protection should not fundamentally change that characteristic.


I believe that AI crawlers are the main thing currently blocked by default when you enroll a new site. No traditional crawlers are blocked; it's not that the big incumbents are allow-listed. And I think that clearly marked "user request" agents like ChatGPT-User are not blocked by default.

But at the end of the day it's up to the site operator, and any server or reverse proxy provides an easy way to block well-behaved bots that use a consistent user-agent.
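For instance, in nginx a rough sketch looks something like the following, placed inside the relevant server block (the bot tokens are just examples; you'd tune the regex to whatever user-agents show up in your logs):

    # Reject declared crawlers by User-Agent. This only works for bots
    # that identify themselves consistently; it does nothing against spoofers.
    if ($http_user_agent ~* "(GPTBot|CCBot|Bytespider)") {
        return 403;
    }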


> The Big Tech bots provide proven value to most sites.

They provide value to their own companies. If you get some value from them, it's just a side effect.


It goes without saying that they are profit-oriented. The point is that they historically offered a clear trade: let us crawl you, and we will refer traffic to you. An AI crawler does not provide clear value back. An AI user-request agent might or might not provide enough clear value back for sites to want to participate. (The same goes for the search incumbents if they go all-in on LLM search results and don't refer much traffic back.)



