It's because those SEO bots keep crawling over and over, which Perplexity does not seem to do (given that the URLs are user-requested). Those are different cases, and robots.txt is only about the former. Cloudflare in this case is not doing "DDoS protection", because I presume Perplexity does not constantly refetch, crawl, or DDoS the website. (If Perplexity does those things, then they are guilty.)
I wonder if Cloudflare users explicitly have to allow Google or if it's pre-allowed for them when setting up Cloudflare.
Despite what Cloudflare wants us to think here, the web was always meant to be an open information network, and spam protection should not fundamentally change that characteristic.
I believe that AI crawlers are the main thing that is currently blocked by default when you enroll a new site. No traditional crawlers are blocked; it's not that the big incumbents are allow-listed. And I think that clearly marked "user request" agents like ChatGPT-User are not blocked by default.
But at the end of the day it's up to the site operator, and any server or reverse proxy provides an easy way to block well-behaved bots that use a consistent user-agent.
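To make the user-agent point concrete, here is a minimal sketch of the kind of check a server or proxy could run on each request. The function name and the blocklist tokens are my own assumptions for illustration, not Cloudflare's actual defaults, and this only works against bots that identify themselves honestly.

```python
# Illustrative blocklist. These tokens are an assumption for the example,
# not any vendor's default configuration.
BLOCKED_AGENT_TOKENS = ("GPTBot", "PerplexityBot")


def is_blocked(user_agent, tokens=BLOCKED_AGENT_TOKENS):
    """Return True if the User-Agent header contains a blocked token.

    The match is a case-insensitive substring test, so it only catches
    bots that send a consistent, recognizable user-agent string.
    """
    ua = (user_agent or "").lower()
    return any(token.lower() in ua for token in tokens)


if __name__ == "__main__":
    crawler = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"
    browser = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"
    print(is_blocked(crawler))  # True
    print(is_blocked(browser))  # False
```

A site operator who wants user-requested agents such as ChatGPT-User to get through would simply leave that token off the list.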
https://www.robotstxt.org/faq/what.html
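For anyone who wants to see what the crawler side of robots.txt looks like, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt contents and the example.com URL are made up for illustration. A well-behaved crawler runs a check like this before fetching; nothing forces a bot to do so.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is disallowed, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_text):
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    url = "https://example.com/article"
    print(parser.can_fetch("GPTBot", url))     # False
    print(parser.can_fetch("Googlebot", url))  # True
```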