Cloudflare banning bad actors has at least made scraping more expensive, and changed the economics of it - more sophisticated deception is necessarily more expensive. If the cost of entry is high enough, scrapers might be willing to pay for access.
But I can imagine more extreme measures. e.g. old web of trust style request signing[0]. I don’t see any easy way for scrapers to beat a functioning WOT system. We just don’t happen to have one of those yet.
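One way to picture the request-signing idea: the site serves a request only if the requester's signing key is reachable from keys the site already trusts. Here is a toy sketch of that gating check; the graph, names, and depth limit are all made up for illustration, and a real system would verify actual cryptographic signatures (e.g. OpenPGP) rather than dict membership:

```python
from collections import deque

# Toy model of WoT-style request gating: accept a request only if the
# requester's key is reachable from one of the site's trust anchors via
# key signatures. Hypothetical data; not a real WoT implementation.

# trust_graph[signer] = keys that signer has signed
trust_graph = {
    "site-anchor": {"alice", "bob"},
    "alice": {"carol"},
    "bob": set(),
    "carol": set(),
}

def is_trusted(requester_key, anchors, graph, max_depth=4):
    """BFS from the trust anchors; accept keys within max_depth signature hops."""
    seen = set(anchors)
    queue = deque((a, 0) for a in anchors)
    while queue:
        key, depth = queue.popleft()
        if key == requester_key:
            return True
        if depth >= max_depth:
            continue
        for signed in graph.get(key, ()):
            if signed not in seen:
                seen.add(signed)
                queue.append((signed, depth + 1))
    return False

print(is_trusted("carol", {"site-anchor"}, trust_graph))        # True: signed by alice
print(is_trusted("scraper-key", {"site-anchor"}, trust_graph))  # False: no chain to it
```

The depth limit matters: the longer the chain you accept, the cheaper it is for a scraper to find some key, somewhere, willing to sign theirs.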
> Cloudflare banning bad actors has at least made scraping more expensive, and changed the economics of it - more sophisticated deception is necessarily more expensive. If the cost of entry is high enough, scrapers might be willing to pay for access.
I think this might actually point at the end state. Scraping bots will eventually emulate a person well enough to be indistinguishable (are we there yet?). Then content creators will have to price their content appropriately. Have a Patreon, for example, where articles are priced at the point where the creator is fine with people taking that content and adding it to a model. This is essentially similar to studios pricing their content appropriately… for Netflix to buy it and broadcast it to many streaming users.
Then they will have the problem of making their business model resistant to non-paying users. Netflix can’t stop me from pointing a camcorder at my TV while playing their movies and distributing the recording. But somehow that fact isn’t catastrophic to their business model, for whatever reason.
Cloudflare can try to ban bad actors. I’m not sure if it is Cloudflare, but as someone who usually browses without JavaScript enabled I often bump into “maybe you are a bot” walls. I recognize that I’m weird for not running JavaScript, but eventually their filters will run into the problem that the net that catches bots also catches normal people.
> Then they will have the problem of making their business model resistant to non-paying users. Netflix can’t stop me from pointing a camcorder at my TV while playing their movies and distributing the recording. But somehow that fact isn’t catastrophic to their business model, for whatever reason.
Interested to see some LLM-adversarial equivalent of MPAA dots![1]
Beating web of trust is actually pretty easy: pay people to trust you.
Yes, you can identify who got paid to sign a key and ban them. They will create another key, go to someone else, pretend to be someone not yet signed up for the WoT (or pay them), get their new key signed, and then sign more keys for money.
So many people will agree to trust for money, and accountability will be so diffuse, that you won't be able to ban them all. Even you, a site operator, would accept enough money from OpenAI to sign their key, for a promise the key will only be used against your competitor's site.
It wouldn't take much to build a roughly binary tree of fake identities with exponential fanout, get some real people to sign random points in the tree, and use the leaf nodes to access your site.
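The economics of that attack are easy to sketch: one purchased signature at the root of a fanout-ary tree of fake identities yields exponentially many "trusted" leaf keys. A toy illustration (all names hypothetical, no real WoT library involved):

```python
# One bought signature at the root, fanout children per fake identity,
# depth levels deep: the attacker mints fanout**depth leaf keys, each
# with a plausible-looking signature chain back to a real signer.

def build_sybil_tree(root, fanout=2, depth=10):
    """Return signer -> signed-children edges and the final leaf keys."""
    edges = {}
    level = [root]
    for _ in range(depth):
        next_level = []
        for node in level:
            children = [f"{node}.{i}" for i in range(fanout)]
            edges[node] = children
            next_level.extend(children)
        level = next_level
    return edges, level

edges, leaves = build_sybil_tree("bought-signature", fanout=2, depth=10)
print(len(leaves))  # 1024 leaf identities from a single paid root signature
```

Banning the leaves is pointless; the attacker regrows the tree from any surviving node faster than you can trace the chains.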
Heck, we even have a similar problem right now with IP addresses, and not even with very long trust chains. You are "trusted" by your ISP, who is "trusted" by one of the RIRs or by another ISP. The RIRs trust each other, and you trust your local RIR (or probably all of them). We can trace any IP to see who owns it. But is that useful, or is it pointless because all the actors involved make money off it?

You know, when we tried making IPs more identifying, all that happened was that VPN companies sprang up to make money by leasing non-identifying IPs. And most VPN exits don't show up as owned by the VPN company, because then they'd be too easy to identify as non-identifying; they pay hosting providers to use their IPs. Sometimes they even pay residential ISPs, so you can't even go by hosting provider. The original Internet was a web of trust (represented by physical connectivity), but that's long gone.
It is inevitable, not because of some technological predestination, but because if these services get hard-blocked and unable to perform their duties, they will ship the agent as a web browser or browser add-on, just like all the VSCode forks, and then the requests will happen locally through the same pipe as the user's normal browser. It will be functionally indistinguishable from normal web traffic because it will be normal web traffic.
0: https://en.m.wikipedia.org/wiki/Web_of_trust