In theory retrieving a page on behalf of a user would be acceptable, but these a...

zarzavat · 2025-08-04T18:42:05 1754332925

If you allow Googlebot to crawl your website and train Gemini, but you don't allow smaller AI companies to do the same thing, then you're contributing to Google's hegemony. Given that AI is likely to be an increasingly important part of society in the future, that kind of discrimination is anti-social. I don't want a future where everything is run by Google even more than it currently is.

Crawling is legal. Training is presumably legal. Long may the little guys do both.

dgreensp · 2025-08-04T19:27:32 1754335652

Googlebot respects robots.txt. And Google doesn't use the fetched data from users of Chrome to supplement their search index (as a2128 is speculating that Perplexity might do when they fetch pages on the user's behalf).

foota · 2025-08-04T20:35:35 1754339735

Yes, but there's no way to say "allow indexing for search, but not for AI use", right?

warkdarrior · 2025-08-04T21:13:32 1754342012

But there is: https://developers.google.com/search/docs/crawling-indexing/...

There is a user agent for search that you can control in robots.txt.

    user-agent: Googlebot

There is a separate robots.txt token for AI training.

    user-agent: Google-Extended
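To sanity-check a split like this, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The `Allow`/`Disallow` rules and the `example.com` URL are assumptions for illustration, since only the two user-agent tokens are named above. The stdlib parser also only approximates how Google itself evaluates robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow search indexing, opt out of AI training.
# Only the two user-agent tokens come from the thread; the rules are assumed.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
page = "https://example.com/page"  # placeholder URL

search_allowed = parser.can_fetch("Googlebot", page)
training_allowed = parser.can_fetch("Google-Extended", page)

print(search_allowed, training_allowed)  # prints: True False
```

Any token with no matching group and no `*` group falls through to "allowed" in this parser, so every other crawler would need its own entry.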

foota · 2025-08-05T06:51:41 1754376701

Wow, I had no idea this page existed, thanks for the reference!