As others have mentioned, the problem is one of scale. Perhaps robots.txt needs a rate limit (how many times a bot may hit the site), so that a site could say a bot is welcome, but only X times per hour, for example. That would at least move us from a binary scrape-or-no-scrape choice to a spectrum.
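
For what it's worth, something close to this already exists informally: some robots.txt files carry non-standard `Crawl-delay` and `Request-rate` lines, and Python's `urllib.robotparser` can read both. As far as I know they are not part of the core robots.txt standard, and whether any given crawler honors them is up to that crawler. Here is a rough sketch of how a well-behaved bot might turn such a line into a pause between requests. The robots.txt contents, the bot name `ExampleBot`, and the helper `seconds_between_requests` are all made up for illustration.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: any bot may crawl, but at most 10 requests per hour.
ROBOTS_TXT = """\
User-agent: *
Request-rate: 10/3600
Allow: /
"""


def seconds_between_requests(robots_txt: str, user_agent: str) -> Optional[float]:
    """Return the pause (in seconds) the site asks for, or None if it asks for none.

    Illustrative helper, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    # Request-rate: N/S means "N requests per S seconds".
    rate = parser.request_rate(user_agent)
    if rate is not None and rate.requests > 0:
        return rate.seconds / rate.requests

    # Fall back to Crawl-delay, which is already a pause in seconds.
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else None


if __name__ == "__main__":
    print(seconds_between_requests(ROBOTS_TXT, "ExampleBot"))
```

A polite crawler would sleep for that long between fetches. Nothing here is enforced, of course: the site still has to trust the bot to read the file and obey it, which is the same weakness robots.txt has today.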