In theory, retrieving a page on behalf of a user would be acceptable, but these are AI companies that have disregarded all norms surrounding copyright and the like. It would be stupid of them not to also save the contents of the page and use them for future AI training or further crawling.
If you allow Googlebot to crawl your website and train Gemini, but you don't allow smaller AI companies to do the same thing, then you're contributing to Google's hegemony. Given that AI is likely to be an increasingly important part of society in the future, that kind of discrimination is anti-social. I don't want a future where everything is run by Google even more than it currently is.
Crawling is legal. Training is presumably legal. Long may the little guys do both.
Googlebot respects robots.txt. And Google doesn't use pages fetched by Chrome users to supplement its search index (as a2128 speculates Perplexity might do when it fetches pages on the user's behalf).