
But I can send my personal shopper and you'll be none the wiser.


To stretch the analogy to the breaking point: If you send 10,000 personal shoppers all at once to the same store just to check prices, the store's going to be rightfully annoyed that they aren't making sales because legit buyers can't get in.


Your comment and the above comment of course show different cases.

An agent making a request on the explicit behalf of someone else is probably something most of us agree is reasonable. "What are the current stories on Hacker News?" -- the agent is just making the same request to the same website that I would have made anyway.

But the sort of non-explicit just-in-case crawling that Perplexity might do for a general question where it crawls 4-6 sources isn't as easy to defend. "Are polar bears always white?" -- Now it's making requests I wouldn't necessarily have made, and it could even be seen as a sort of amplification attack.

That said, TFA's example is where they register secretexample.com and then ask Perplexity "what is secretexample.com about?" and Perplexity sends a request to answer the question, so that's an example of the first case, not the second.


As a person who has a couple of sites out there, and witnesses AI crawlers coming and fetching pages from these sites, I have a question:

What prevents these companies from keeping a copy of that particular page, which I specifically disallowed for bot scraping, and feeding it into their next training cycle?

Pinky promises? Ethics? Laws? Technical limitations? Leeroy Jenkins?


> What prevents these companies from keeping a copy of that particular page, which I specifically disallowed for bot scraping, and feeding it into their next training cycle?

What prevents anyone else? robots.txt is a request, not an access policy.
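
To make that concrete: honoring robots.txt is something the client does voluntarily, and nothing on the server side enforces it. A minimal sketch of what a polite fetcher does, using Python's standard urllib.robotparser (the site URL and user agent string are placeholders):

  from urllib import robotparser

  # A polite client checks robots.txt before fetching; a rude one simply skips this step.
  rp = robotparser.RobotFileParser()
  rp.set_url("https://example.com/robots.txt")  # placeholder site
  rp.read()

  if rp.can_fetch("ExampleCrawler/1.0", "https://example.com/some/page"):
      print("robots.txt allows this fetch")
  else:
      print("robots.txt asks us not to fetch this page")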


This honor system mostly worked at scale because interests aligned, which no longer seems to be the case.

Does information no longer want to be free? Maybe the internet, just like social media, was a social experiment in the end, albeit a successful one. Thanks, GenAI.


“Information Wants To Be Free. Information also wants to be expensive. ...That tension will not go away.” - the full aphorism

https://en.wikipedia.org/wiki/Information_wants_to_be_free


Can the Terms of Service of individual content creators leverage a "death of a thousand cuts" model to produce a legal honeypot which would require organizations like Perplexity to be bound up in tens of thousands of conciliation court cases?

Big Tech has hidden behind ToS for years. Now, it seems as though it only works for them, never against them. It also seems as though this would be easy to orchestrate and prove, forcing these companies into a legal nightmare or risking insolvency due to the sheer volume of cases filed against them.

Why couldn't something like this be used to flip the table? A conciliation brigading, of sorts.


Because lawyers are expensive and big tech companies have lots of them. Because it takes a ton of time and effort to sue someone. Because you need to show standing, which means you need to be able to demonstrate you lost something of value by their actions. Because the power imbalance is heavily weighted towards a corporation. Because the way to deal with such things should be legislation and not court decisions. And lots more reasons...


That's exactly why I said conciliation court. None of what you've outlined is required nor is it expensive. But, for each case, the defendant is still required to show up.

I've successfully used conciliation court against large corporations in the past which is why I question it here.

And while this should be handled via legislation, it won't be. Beyond that, a workaround could force that to happen.


> conciliation court

Sorry, I had never heard that term before. You would still have to show standing though. How would you try to prove that their violating your TOS cost you money?


Is it not viable to produce a work of art and say it's free for humans but not for bots, cannot be used for training, and that said violation costs X?

Again, I can't copy and distribute a game Microsoft rents to me. But if I do, I can be held accountable for a ridiculous amount of money. If it's my work of art, the terms can dictate who doesn't need to pay and who does. If an LLM is consuming my work of art and now distributing it within its user base, how is that not the same?


These are arguments you would tell the judge. And the judge would almost certainly tell you 'this is the wrong venue for that. You are in small claims. I need an itemized list of monetary damages you have suffered before I can make a judgement.'


Maybe you could say the increase in traffic increased your hosting costs by a penny or whatever.


Thanks for sharing your experience. A little off-topic but I'd like to start hosting some personal content, guides/tutorials, etc.

Do you still see authentic human traffic on your domains, is it easy to discern?

I feel like I missed the bus on running a blog pre-AI.


I intentionally don't keep detailed analytics on my homepage server and my digital garden, because I respect my users and don't want to push unnecessary JavaScript on them. The blog platform I use (Mataroa) keeps rudimentary analytics (essentially page hit counters, nothing more) on the index, RSS and per post.

Both my blog homepage and posts see mostly human traffic. Sometimes bots crawl the site and they appear as spikes in the analytics.

Looks like my homepage, which doesn't have anything but links, is pretty popular with crawlers. My digital garden doesn't get much interest from them. All in all, human traffic on my sites is pretty much alive.

I don't believe in missing the bus in anything actually, because I don't write these for others, first. Both my blog (more meta) and digital garden (more technical) are written for myself primarily, and left open. I post links to both when it's appropriate, but they are not made to be popular. If people read it and learn something or solve one of their problems, that's enough for me.

This is why my software is GPLv3, Digital Garden is GFDL and blog is CC BY-NC-SA 2.0. This is why everything is running with absolutely minimum analytics and without any ads whatsoever.

Lastly, this is why I don't want AI crawlers on my site and my data in the models. This thing is made by a human for humans, absolutely for free. It's not OK for somebody to take something designed to be free, sell it, and make money off it.


> I intentionally don't keep detailed analytics on my homepage server and my digital garden, because I respect my users and don't want to push unnecessary JavaScript on them.

Absolutely, I'm in agreement here. I want to run a JS-free blog, just plain old static HTML. I plan to use GoAccess to parse the access logs but that's it. I think I would find it encouraging to see real human traffic.

> I don't write these for others, first. Both my blog (more meta) and digital garden (more technical) are written for myself primarily, and left open.

That is a great way to view it, thank you.


> That is a great way to view it, thank you.

You're welcome. I'm glad it helped.

> I want to run a JS-free blog, just plain old static HTML.

If you want to start fast until you find a template you want to work with, I can recommend Mataroa [0]. The blog has almost no JS (it binds a couple of keys for navigation, that's it), and it's $10/year. When you feel comfortable with your self-hosted solution, you can move there. It's all Markdown at the end of the day.

> I plan to use GoAccess to parse the access logs but that's it.

That's the only thing I use, too. Nothing else.

If you want to look at what I do, how I do, and reach out to me, the rabbit hole starts from my profile, here.

Wish you all the best, and may you find bliss and joy you never dreamed of!

[0]: https://www.mataroa.blog


if you do analytics, it is not so hard, but then you need to store user data (if not directly, then worse, with a third party), which should be viewed as a liability. I see ~2/3 human traffic, ~1/3 bot traffic (I just parse user agent strings and count whitelisted browsers as human), but my main landing page is all dynamically populated webgl. I just asked Gemini what it sees on the website, and it states "The page appears to be loading, with the text "Loading room data...".[1] There are also labels for "BG", "FG", and "CURSOR", and a background weather animation." -- so I can feel reasonably confident I don't need to worry about AI, for now; it needs a machine-friendly frontend.
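
for the curious, a rough sketch of that kind of user-agent bucketing; the hint lists and the combined-log-format assumption are mine for illustration, not the parent's actual setup:

  import re
  from collections import Counter

  # Hypothetical lists; browsers and bots are identified loosely by UA substrings.
  HUMAN_UA_HINTS = ("firefox", "chrome", "safari", "edg/")
  KNOWN_BOT_HINTS = ("bot", "crawler", "spider", "gptbot", "ccbot")

  def classify(user_agent):
      ua = (user_agent or "").lower()
      if any(hint in ua for hint in KNOWN_BOT_HINTS):
          return "bot"
      if any(hint in ua for hint in HUMAN_UA_HINTS):
          return "human"
      return "unknown"

  counts = Counter()
  with open("access.log") as log:  # assumes Apache/nginx combined log format
      for line in log:
          # the user agent is the last double-quoted field in a combined-format line
          match = re.search(r'"([^"]*)"\s*$', line)
          counts[classify(match.group(1) if match else "")] += 1

  print(counts)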

you could go proper insanomode, too. remaking The Internet is trivial if you don't care about existing web standards -- replacing HTTP with your own TCP implementation, getting off html/js/css, etc. being greenfield, you can control the protocol, server, and client implementation, and put it in whatever language you want. I made a stateful Internet implementation in Python earlier for proof-of-concept, but I want to port it and expand on it in rust soon (just for fun; I don't do serious biznos). you'll very likely have 100% human traffic then, even if you're the only person curious and trusting enough to run your client.


  > I made a stateful Internet implementation in Python earlier for proof-of-concept
Is there a repo or some other form of public access? I'd like to see this.


it's not in a shareable state; it's unsafe as-is. can share the general idea and sample "webpage" files, though.

the server ("lodge") passes JSON to the client from what are called .branch files. the client receives JSON, parses it, then builds the UI and state representation from the JSON, then stored in that client's memory (self.current_doc and self.page_state in python client).

branches can invoke waterwheel (.ww) files hosted on the lodge. waterwheel files on the lodge contain scripts which define how patches (as JSON) are to be sent to the client. the client updates its state based on the JSON patch it receives. sample .branch and .ww from the python implementation (in a pastebin so as not to make everyone scroll through this): https://pastebin.com/A0DEZDmR
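
from that description alone, a very rough client-side sketch; the attribute names (current_doc, page_state) follow the comment, but the JSON shapes are guesses since the real .branch/.ww schemas aren't shown:

  import json

  class Client:
      def __init__(self):
          self.current_doc = {}   # UI tree built from the .branch JSON
          self.page_state = {}    # mutable state that waterwheel patches modify

      def load_branch(self, branch_json):
          # what the lodge sends when a .branch file is requested (shape assumed)
          doc = json.loads(branch_json)
          self.current_doc = doc.get("ui", {})
          self.page_state = doc.get("state", {})

      def apply_patch(self, patch_json):
          # shallow merge; the real waterwheel patches presumably do more
          patch = json.loads(patch_json)
          self.page_state.update(patch.get("state", {}))

  client = Client()
  client.load_branch('{"ui": {"title": "Room"}, "state": {"loaded": false}}')
  client.apply_patch('{"state": {"loaded": true}}')
  print(client.page_state)  # {'loaded': True}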


I was right to ask, this seems extremely cool. Hit me up via mail [in bio] if you ever end up polishing it enough to share.


It's your server. You're free to do whatever you want. You can serve different versions of the page depending on the UserAgent (has been done many times before).

You can put up a paywall depending on UserAgent or OS (has been done).

In short, it's a 2-way street: the client on the other end of the TCP pipe makes a request, and your server fulfills the request as it sees fit.
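
A minimal sketch of that kind of User-Agent branching, using Python's stdlib http.server; the bot hints and page bodies are made up, and real setups usually do this in the web server or CDN config instead:

  from http.server import BaseHTTPRequestHandler, HTTPServer

  BOT_HINTS = ("bot", "crawler", "spider")  # illustrative, not exhaustive

  class UAHandler(BaseHTTPRequestHandler):
      def do_GET(self):
          ua = (self.headers.get("User-Agent") or "").lower()
          if any(hint in ua for hint in BOT_HINTS):
              body = b"<html><body>Nothing interesting here.</body></html>"
          else:
              body = b"<html><body>Hello, human reader.</body></html>"
          self.send_response(200)
          self.send_header("Content-Type", "text/html")
          self.send_header("Content-Length", str(len(body)))
          self.end_headers()
          self.wfile.write(body)

  if __name__ == "__main__":
      HTTPServer(("127.0.0.1", 8080), UAHandler).serve_forever()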


The way to prevent people from downloading your pages and using them is to take them off the public internet. There are laws to prevent people from violating your copyright or from preventing access to your service (by excessive traffic). But there is (thankfully) no magical right that stops people from reading your content and describing it.


Many site operators want people to access their content, but prevent AI companies from scraping their sites for training data. People who think like that made tools like Anubis, and it works.

I also want to keep this distinction on the sites I own. I also use licenses to signal that this site is not to be used for AI training, because it's CC BY-NC-SA 2.0.

So, I license my content appropriately (no derivatives, non-commercial, shareable under the same license with attribution), add technical countermeasures on top because companies don't respect these licenses (because monies) and circumvent these mechanisms (because monies), and I'm the one who has to suck this up and shut up (because their monies)?

Makes no sense whatsoever.


I don't want AI companies to scrape my sites (or use the files I wrote) for training data either, but that is not specifically what I am trying to stop (unless the files are supposed to be private and unpublished). I should not stop them from using the files for what they want, once they have them. (I also specifically do not want to block use of lynx, curl, Dillo, etc.)

What I want to stop is excessive crawling and scraping of my server. Once they have the file they can do what they want with it. Another comment (44786237) mentions that robots.txt is only for restricting recursive access; I agree, and that is what should be blocked. They also should not access the same file several times quickly, just as much as they should not access all of the files, since it should be unnecessary to do so. (If someone wants to make a mirror of the files, there may be other ways, e.g. if there is an archive file available to download many at once (possibly because the site operator made their own index and then did it this way). If it is a git repository, then it can be cloned.)


Of course some people want that. And at the moment they can prevent it. But those methods may stop working. Will it then be alright to do it? Of course not, so why bother mentioning that they are able to prevent it now - just give a justification.

Your license is probably not relevant. I can go to the cinema and watch a movie, then come on this website and describe the whole plot. That isn't copyright infringement. Even if I told it to the whole world, it wouldn't be copyright infringement. Probably the movie seller would prefer it if I didn't tell anyone. Why should I care?

I actually agree that AI companies are generally bad and should be stopped - because they use an exorbitant amount of bandwidth and harm the services for other users. At least they should be heavily taxed. I don't even begrudge people for using Anubis, at least in some cases. But it is wrong-headed (and actually wrong in fact) to try to say someone may or may not use my content for some purpose because it hurts my feelings or it messes with my ad revenue. We have laws against copyright infringement, and to prevent service disruption. We should not have laws that say, yes you can read my site but no you can't use it to train an LLM, or to build a search index. That would be unethical. Call for a windfall tax if they piss you off so much.


> I can go to the cinema and watch a movie, then come on this website and describe the whole plot. That isn't copyright infringement.

This is a false analogy. A correct one would be going to 1000 movies and creating the 1001st movie with scenes cropped from these 1000 movies, assembled as a new movie, and that is copyright infringement. I don't think any of the studios would applaud and support you for your creativity.

> But it is wrong-headed (and actually wrong in fact) to try to say someone may or may not use my content for some purpose because it hurts my feelings or it messes with my ad revenue.

Why does it have to be always about money? Personally it's not. I just don't want my work to be abused and sold to people to benefit a third party without my consent and will (and all my work is licensed appropriately for that).

> We should not have laws that say, yes you can read my site but no you can't use it to train an LLM, or to build a search index.

This goes both ways. If big corporations can scrape my material without asking me and resell it as an output of a model, I can equally distill their models further and sell it as my own. If companies can scrape my pages to sell my content as theirs, I can scrape theirs and unpaywall them.

But that would be copyright infringement, just because they have more money. What angers me is the "all is fair game because you're a small fish, and this is a capitalist marketplace" mentality.

If companies can paywall their content to humans that don't pay, I can paywall AI companies and demand money or push them out of my lawn, just because I feel like that. The inverse is very unethical, but very capitalist, yes.

It's not always about money.

P.S.: Oh, try to claim that you can train a model with medical data without any clearance because it'd be unethical to have laws limiting this. It'll be fun. Believe me.


> This is a false analogy.

I think you are describing something much more like Stable Diffusion. This article is about Perplexity, which is much closer to "watch a movie and tell me the plot" than to "take these 1000 movies and make a collage". The copyright points are different -- Stable Diffusion is on much shakier ground than Perplexity.

> Why does it have to be always about money?

Before I mentioned money I said "because it hurts my feelings". I'm sorry I can't give a more charitable interpretation, but I really do see this kind of objection as "I don't want you to have access to this web page because I don't like LLMs". This is not a principled objection, it is just "I don't like you, go away". I don't think this is a good principle to build the web on.

Obviously you can make your website private, if you want, and that would be a shame. But you can't have this kind of pick-and-choose "public when you feel like" option. By the way I did not mention, but I am ok with people using Anubis and the like as a compromise while the situation remains unjust. But the justification is very important.

> If companies can scrape my pages to sell my content as theirs, I can scrape theirs and unpaywall them.

This is probably not a gambit you want to make. You literally can do this, and they would probably like it if you did. You don't want to do that, because the output of LLMs is usually not that good.

In fact, LLM companies should probably be taxed, and the taxes used to fund real human AI-free creations. This will probably not happen, but I am used to disappointment.

> P.S.: Oh, try to claim that you can train a model with medical data

Medical data is not public, for good reasons.


> Many site operators want people to access their content, but prevent AI companies from scraping their sites for training data.

That is unfortunately not a distinction that is currently legally enforceable. Until that changes all other "solutions" are pointless and only cause more harm.

> People who think like that made tools like Anubis, and it works.

It works to get real humans like myself to stop visiting your site while scrapers will have people whose entire job is to work around such "protections". Just like traditional DRM inconveniences honest customers and not pirates. And to be clear, what you are advocating for is DRM.

> I also want to keep this distinction on the sites I own. I also use licenses to signal that this site is not good to use for AI training, because it's CC BY-NC-SA-2.0.

If AI crawlers cared about that, we wouldn't be talking about this issue. And a license can only give more permissions than there are without one.


> It works to get real humans like myself to stop visiting your site

If we're talking about Anubis, it's pretty invisible. You wait a couple of seconds on the first visit, and don't get challenged again for a couple of weeks, at least. With more tuning, some of the sites using Anubis work perfectly well without users ever seeing Anubis' wall, while still stopping AI crawlers.

> And to be clear, what you are advocating for is DRM.

Yes. It's pretty ironic that someone like me who believes in open access prefers a DRM solution to keep companies from abusing the small fish, but life is an interesting phenomenon, and these things happen.

> Until that changes all other "solutions" are pointless and only cause more harm.

As an addendum to the above paragraph, I'm not happy that I have to insert draconian measures between the user and the information I want to share, but I need a way to signal to these faceless things that I'm not letting them have their way. What do you propose? Taking my sites offline? Burning myself in front of one of the HQs?

> If AI crawlers cared about that we wouldn't be talking about this issue. A license and only give more permissions than there are without one.

AI crawlers default to "Public Domain" when they find no licenses. Some of my lamest source code repositories made it into "The Stack" because I forgot to add COPYING.md. A fork of a GPLv2 tool I wrote some patches for also got into "The Stack", because COPYING.md was not in the root folder of the repository. I'd rather add licenses (which I can accept) to things than leave them as-is, because AI companies eagerly grab anything without a license.

All licenses I use mandate attribution and continuation of the license, at least, and my blog doesn't allow any derivations of what I have written. So you can't ingest it into a model to be derived and remixed with something else.


> If we're talking about Anubis, it's pretty invisible. You wait a couple of seconds on the first visit, and don't get challenged again for a couple of weeks, at least. With more tuning, some of the sites using Anubis work perfectly well without users ever seeing Anubis' wall, while still stopping AI crawlers.

It's not invisible, the sites using it don't work perfectly well for all users and it doesn't stop AI crawlers.


I haven't seen any problems with any Anubis enabled site I encountered. Can you give examples? This is interesting.


I've never seen problems with Anubis.


I guess that's a question that might be answered by the NYT vs OpenAI lawsuit at least on the enforceability of copyright claims if you're a corporation like NYT.

If you don't have the funds to sue an AI corp, I'd probably think of a plan B. Maybe poison the data for unauthenticated users. Or embrace the inevitability. Or see the bright side of getting embedded in models as if you're leaving your mark.


the fact that it would be discovered almost immediately.

If you give them a URL that does not appear in Google, ask them to visit that URL specifically, and then notice the content from that URL in the training data, it's proof that they're doing this, which would be quite damaging to them.


> […] it's proof that they're doing this, which would be quite damaging to them.

Is it? It's damning, but is it damaging at all?

I'm getting the impression that anyone's data being available for training, if some bot can get to it, is just how things are now, rather than an unsettled point of contention. There's too much money invested in this thing for any other outcome, and with the present decline of the rule of law…


Nothing, and that's why I expect they all do it.


technical limitations / data poisoning measures


Hacker News wants you to visit the site, look at the main page, enter threads and participate in discussion.

When you swap in an AI and ask what the current stories are, the AI fetches the front page and every thread and feeds it back to you. You are less likely to participate in discussion because you've already had the info summarized.


Who cares what Hacker News wants? You’re not obliged to participate in discussion.

Am I supposed to spend money on Amazon.com when I visit the website just because Amazon wants me to?


If most people quit spending money on Amazon then Amazon stops being worth running.

If most people stop discussing things on HN, and the discussion is indeed one of the major reasons it’s kept running, then HN stops being worth running.


What's the point of a human coming to a site if all the threads are empty and its front page is a glorified RSS feed for lazy people's AI agents?


Who cares what you want?


Most humans place the desires of human beings over the desires of companies.


Indeed. But that is a false equivalence -- this is a conflict of desires between small companies and creators and an AI corp, where the AI corp wants to steal their content and give it to users under its own branding.

> You’re not obliged to participate in discussion.

Are website owners obligated to serve content to AI agents and/or LLM scrapers?


It was a corollary example


Foo news wants you to visit the site, look at the main page, watch the ads, click on them and buy the products advertised by third parties which will give money to Foo news in exchange for this service.

And yet people install ad blockers and defend their freedom to not participate in this because they don't want to be annoyed by ads.

They claim that since they are free to not buy an advertised product, why should they be forced to see ads for it. But Foo news claims that they are also free to not waste bandwidth serving their free website to people who declare (by using an ad blocker or the modern alternative, an AI summarizer) that they won't participate in the funding of the service.


It's not ads. We have ads in paper magazines and newspapers and no one went around with scissors to remove them. It's obnoxious ads, designed to violently grab your attention, and trackers (malware). It's like a newspaper giving your address to a whole crew of salesmen who intrude on your property at 3am, watch you sleeping, and install cameras in your bathroom. All so that they can jump at you in the street to loudly claim they have the underwear you told your partner you like. If you're going to be that invasive about my person, then I'm going to be that forceful about restrictions.


This is one of the dumbest things about ad networks. Google has enough data about your watching habits on Youtube and their algorithm is basically as good as it gets in terms of showing you what you want to watch and getting you hooked on it, but the moment they show you ads, all that technical expertise appears to have vanished into thin air and all they show you is fake mobile ads?

People hate obnoxious ads because the money that pays for them is essentially a bribe to artificially elevate content above its deserved ranking. It feels like you're being manipulated into an unfavorable trade.


> their algorithm is basically as good as it gets in terms of showing you what you want to watch and getting you hooked on it

It is? Are we talking about the same YouTube? I get absolutely useless recommendations, I get un-hooked within a couple videos, and I even keep getting recommendations for the same videos I've literally watched yesterday. Who in the world gets hooked by this??


> We have ads in paper magazines and newspapers and no one went around with scissors to remove them.

I never saw people bother with scissors but I've seen people pulling the ads out of the newspaper countless times.


> And yet people install ad blockers and defend their freedom to not participate in this because they don't want to be annoyed by ads.

I think this is a pretty different scenario. Here the user and the news website are talking directly to each other, but then the user is making a choice about what to do with the content the news website sends to them. With AI agents, there is a company inserting themselves between the user and the news website and acting as a middleman.

It seems reasonable to me that the news website might say they only want to deal with users and not middlemen.


I understand; but as an exercise to better understand this problem I'll keep playing devil's advocate and raise this:

What if my executive assistant reads the news website and gives me a digest?

Would the website owners rather prefer me doing my reading directly?


Yes. Because they want to own your attention and that only works if they are interfacing directly to you.

I remember that Samsung was at one time offering to play non-skippable full-screen ads on their newest 8K OLED TVs, and their argument was precisely that these ads will reach those rich people who normally pay extra to avoid getting spammed with ads. Or going with your executive assistant example, there are situations where it makes sense to bribe them to get access to you and/or your data. E.g. an "evil maid attack".


With all the crypto development how come we haven't got to

  HTTP/1.1 402 Payment Required
  WWW-price: 0.0000001 BTC, 0.000001 ETH, 0.00001 DOGE

> You are less likely to participate in discussion

you (or AI on your behalf) paid instead. Many sites would probably like it better.
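
A toy sketch of what that could look like server-side, using Python's stdlib http.server; the WWW-Price and X-Payment-Proof headers are hypothetical, mirroring the comment above rather than any standard:

  from http.server import BaseHTTPRequestHandler, HTTPServer

  class PaywalledHandler(BaseHTTPRequestHandler):
      def do_GET(self):
          # no proof of payment yet: answer 402 and advertise a price (made-up headers)
          if self.headers.get("X-Payment-Proof") is None:
              self.send_response(402, "Payment Required")
              self.send_header("WWW-Price", "0.0000001 BTC, 0.000001 ETH, 0.00001 DOGE")
              self.send_header("Content-Length", "0")
              self.end_headers()
              return
          body = b"Here is the article you (or your AI) paid for.\n"
          self.send_response(200)
          self.send_header("Content-Type", "text/plain")
          self.send_header("Content-Length", str(len(body)))
          self.end_headers()
          self.wfile.write(body)

  if __name__ == "__main__":
      HTTPServer(("127.0.0.1", 8080), PaywalledHandler).serve_forever()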


If people were forced to pay for websites by the http request people would demand that websites stop loading a ton of externally hosted JS, stop filling sites with ads, and would demand that websites actually have content worth the price.

There are so many links I click on these days that are such trash I'd be demanding refunds constantly.


>There are so many links I click on these days that are such trash

That is why AI "summarization" becomes a necessary intermediate layer. You'd see neither trash nor ads, and thus you'd pay instead of being exposed to the ads. AI saves the Internet :)


It's not a development problem, it's an adoption problem. Publishers are desperate to sell us on a $20+/month subscription, they don't want to offer convenient affordable access to single articles.


$20/month would be nice if it wasn't a tier with fewer ads. I want no ads, and full-text RSS feeds (because I want to use my own clients to read). It's like how Netflix refuses to build a basic search and filter, or Spotify refuses to build an actual library manager. They don't want you in control of your consumption.


Easy "By Appointment only" or "rate limited to authenticated users" done.


That's not stretching the analogy to the breaking point at all -- that literally happens to the custom CMS/wiki/image host I built for my niche, kpopping.com. We are constantly attacked by crawlers. Meanwhile Google rewards WordPress slop that buys backlinks with #1 pageranks for years. Welcome to the internet.


Too bad. Build a bigger store or publish this information so we don't need 10,000 personal shoppers. Was this not the whole point of having a website? Who distorted that simple idea into the garbage websites we have now?


Weird take. The store doesn't owe your personal shoppers anything.


That's fair, but if there's enough supply and demand for this to get traction (and online shopping is big, and autonomous agents are sort of trending), this conflict of interest paired with a no-compromise "we don't owe you anything" attitude is bound to escalate into an arms race. And YMMV, but I don't like where that race may end.

If store businesses rely at least partially on obscurity of information that can be solved through automated means (e.g. storefronts tend to push visitors towards products they don't want, and buyer agents fight that by looking for what the buyers instructed them to find), just playing this cat and mouse game of blocking agents, finding workarounds, and repeating the cycle only creates perverse technological contraptions that neither party is really interested in - but both are circumstantially forced to invest in.


By the same token, the personal shoppers don't owe the store anything either.


Surely they owe them money for the goods and service, no? I thought that's how stores worked.


Context friend. This article and entire comments sections is about questionable web page access. Context.


You're replying in a store metaphor thread though. Context matters.


Then they can't complain if they're barred entry.


http is neutral. it's up to the client to ignore robots.txt

You can block IPs at the host level, but there are pretty easy ways around that with proxy networks.


> http is neutral.

Who misled you with that statement?


Http doesnt have emotions or thought last time I checked.


It seems that a 403 makes you sad though.


iproyal.com makes me smile again


And Cloudflare makes you cry. See, it's not neutral. Glad you learned something today. The more one learns every day, the less stupid one becomes.


IETF?


> Who distorted that simple idea into the garbage websites we have now?

Corporate America. Where clean code goes to die.


It’s possible to violate all sorts of social norms. Societies that celebrate people that do so are on the far opposite end of the spectrum from high trust ones. They are rather unpleasant.


Just the Silicon Valley ethos extended to it's logical conclusions. These companies take advantage of public space, utilities and goodwill at industrial scale to "move fast and break things" and then everyone else has to deal with the ensuing consequences. Like how cities are awash in those fucking electric scooters now.

Mind you I'm not saying electric scooters are a bad idea, I have one and I quite enjoy it. I'm saying we didn't need five fucking startups all competing to provide them at the lowest cost possible just for 2/3s of them to end up in fucking landfills when the VC funding ran out.


My city impounded them and made them pay a fee to get them back. Now they have to pay a fee every year to be able to operate. Win/win.


Do those fees actually improve anything for the citizens who now have to deal with vehicles abandoned on sidewalks everywhere, or do they just buy the mayor a nicer yacht?


[flagged]


> Oh, this is a bunch of baloney...

Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.

Comments should get more thoughtful and substantive, not less, as a topic gets more divisive.

When disagreeing, please reply to the argument instead of calling names. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."

Please don't fulminate. Please don't sneer, including at the rest of the community.

Eschew flamebait. Avoid generic tangents. Omit internet tropes.

Please don't use Hacker News for political or ideological battle. It tramples curiosity.

https://news.ycombinator.com/newsguidelines.html


[flagged]


You can't comment like this on Hacker News, no matter what you're replying to. It's not what this site is for, and destroys what it is for.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.


[flagged]


this is such a wild comment -- there are countless products where, regardless of purchase, the user is still served advertisements. i have no idea what reality, or timeline, this comment belongs in.

broadcast television, paid streaming entertainment is just straight up the most glaringly obvious example of a paid service overflowing with advertisements.

paid radio broadcasts (xm/Sirius).

operating systems (windows serves you ads any chance it gets).

monthly subscriptions to gyms where youre constantly hit with ads, marketing, and promotions be it at the gym or via push notification (you got opted into and therefore have to opt out of intentionally after the service is paid).

mobile phones, especially prepaid come LOADED with ads and bloatware.

i mean the list goes on -- you cannot be serious.


> pay for services in full directly

Those are hybrid subscriptions/subsidies. Not paid in full.

If you are being exposed to ads in something you paid for, you are almost certainly being charged less money. Companies can compete on cost by introducing ads, and it's why the cheaper you go, the more ad infested it gets.

Pure ad-free things tend to be much more expensive than their ad-subsidized counterparts. Ad-subsidized has become so ubiquitous, though, that people think that price is the true price.


this seems like semantics and corporate hand-waving -- that's not what is conveyed to the user in what i have observed as the context of paid services and the promises asserted around what a purchase gets a customer.

in the subsidized example, xm/Sirius is marketed to users as an "ad-free paid radio broadcast"; the marketing literally attempts to leverage the notion of it being ad-free as a consequence of your purchase (power) in order to highlight its supposed competitive edge and usefulness, and provide the user an incentive to spend money, except for the fact that the marketing is false. you still get served promotions and ads, just less "conventional" ads.

i go to a football game and im literally inundated with ads -- the whole game has time stoppage dedicated to serving ads. i guess my season ticket purchase with the hopes of seeing football in person is.. apparently not spending enough money?

i see this as attempting to move the goalposts and gaslight users on their purchase expectations, as a way to offload the responsibility and accountability back onto the user -- "you don't pay enough, you only think that you pay enough, so we are still going to serve you ads because <insert financial justification here around the expectations we've undermined>".

why then is there any expectation of a service being ad-free upon purchasing?

who the hell actually enjoys sitting through 1.5 hours of advertisements and play stoppage?

over time users have been conditioned to just tolerate it, and over time, the advertising reclaims ground it previously gave up one inch at a time in the same way people are price-gouged in those stadiums -- they don't have much alternative, but apparently the problem is the user should fork up more money for tickets so as to align their expectations with reality? while they're getting strong-armed at the concession stand via proximity and circumstance and lack of competition, no less.

are you really trying to tell me the problem there is, they need to make... more money? and THEN and only THEN we can have ad-free, paid for entertainment otherwise known as american football? is this really about user expectations, or is this about companies wanting their cake and eating it, too?


[flagged]


Go spend some time in Brazil or South Africa or other places where no-one trusts anyone (for good reasons), then report back.


A place where you can lose your wallet and get it back with all the cash inside.

The horror!!


[flagged]


No, you're describing a low-trust society.

Please learn what words mean before you comment on them.


[flagged]


Isn't that the system that we are already living in?

Democracy in its American form, and in many others, shows almost complete paralysis of the entire system if bad actors infiltrate it (looking at you, Donald).

It is honestly a little sad, since conservatives usually think of their society as this high trust society, and they were the ones who primarily voted for, and are being taken advantage of by, the few untrustworthy individuals.

Politics is a cult/religion and you can't prove me otherwise.

I vote because I vote for the lesser evil, not for the greater good. I do think that, frankly, both parties, or just most parties in every nation, fall so short of reality. But I created a Discord server of 100 people and I can see how I can't manage even 100 people, so maybe I expect too much from the govt.

I used to focus so much on history and politics, but it's a bloody mess and there is no good or bad. Now I just feel like going into the woods and into the dark, living alone, maybe coding.


Let me guess: A low violence society is bad because people get attacked and beat up?


That's quite literally the opposite of what high trust means...


That's a very sad and lonely way to live.


I don't think we're talking about the same thing.


Obviously. You should heed the advice of other posters who told you to look up the meaning of the word.


[flagged]


> High trust is prima facie incompatible with capitalism

Quite compatible

> If you want a high trust society, you don't want capitalism.

There is nothing at all in capitalism that would prevent a high level of trust in society.

> Capitalism is inherently low trust

But that's not true. The thing about capitalism is that it's RESILIENT to low trust. It does not require low levels of trust, but is capable of functioning in such conditions.

> If the penalty for deceit was greater than the penalty for non-deceit

Who are the judges? Capitalism is the most resistant to deception: deceivers under capitalism receive fewer benefits than under any other economic system, simply because capitalism is based on the premise that people cheat, act out of greed, and try to get the most for themselves at the expense of others. These qualities exist in people regardless of the existence of capitalism; capitalism just ensures prosperity in society even when people have these qualities.



Why bring up capitalism? I don't get it. What's stopping people from lying and cheating under any other system?


When lying and cheating doesn't get you ahead, there is no reason to do it.


If we look at any communist society, the only way to get ahead was lying and cheating. China was forced to adopt capitalist markets to deal with this, hence why modern China hardly resembles the USSR, Cuba, Venezuela, or Laos.


Communist with a capital C.

I've never seen a stateless, classless, moneyless society. It may be impossible.


You seriously think that mankind wasn't lying and cheating long before inventing capitalism?


Sure, but the risk/reward ratio was different.


The problem is that without capitalism ONLY lying and cheating will get you ahead. Look at ANY country that builds its economy on the restriction of people's economic freedom, on the absence of private property rights - these are the most deceitful and disgusting regimes in the world with zero level of public trust.


It's all about scale. The impact of your personal shopper is insignificant unless you manage to scale it up into a business where everyone has a personal shopper by default.


How is everyone having a personal shopper a problem of scale? I was going to shop myself, but I sent someone else to do it for me.

At this moment I am using Perplexity's Comet browser to take a Spotify playlist and add all the tracks to my YouTube Music playlist. I love it.


We'll see more of this sort of thing as AI agents become more popular and capable. They will do things that the site or app should be able to do (or rather, things that users want to be able to do) but don't offer. The YouTube music playlist is a good example. One thing I'd like to be able to do is make a playlist of some specific artists. But you can't. You have to select specific songs.

If sites want to avoid people using agents, they should offer the functionality that people are using the agents to accomplish.


Let's look at the opposite: the benefit to a store of a mom who would need to bring her 3 kids to the store vs. that mom having a personal shopper. In this case, the personal shopper is "better" for the store as far as physical space. However, I'm sure the store would still rather have the mom and 3 kids physically in the store, so that the kids can nag mom into buying unneeded items that are placed specifically to attract those kids' attention.


> so that the kids can nag mom into buying unneeded items

Excellent. Personal shoppers are 'adblock for IRL'.

>You owe the companies nothing. You especially don't owe them any courtesy. They have re-arranged the world to put themselves in front of you. They never asked for your permission, don't even start asking for theirs.


I didn't use the word "problem". In fact I presented no opinion at all. I'm just pointing out that scale matters a lot. In fact, in tech, it's often the only thing that matters. It's naive (or narrative) to think it doesn't.

Everyone having a personal shopper obviously changes the relationship to the products and services you use or purchase via personal shopper. Good, bad, whatever.


Well then. Seems like you would be a fool to not allow personal shoppers then.

The point is the web is changing, and people use a different type of browser now. And that browser happens to be LLMs.

Anybody complaining about the new browser has just not got it yet, or has and is trying to keep things the old way because they don’t know how or won’t change with the times. We have seen it before: Kodak, Blockbuster, whatever.

Grow up, Cloudflare; some of your business models don’t make sense any more.


Some people use LLMs to search. Other people still prefer going to the actual websites. I'm not going to use an LLM to give me a list of the latest HN posts or NY Times articles, for example.


> Anybody complaining about the new browser has just not got it yet, or has and is trying to keep things the old way because they don’t know how or won’t change with the times. We have seen it before, Kodak, blockbuster, whatever.

You say this as though all LLM/otherwise automated traffic is for the purposes of fulfilling a request made by a user 100% of the time which is just flatly on-its-face untrue.

Companies make vast amounts of requests for indexing purposes. That could be to facilitate user requests someday, perhaps, but it is not today and not why it's happening. And worse still, LLMs introduce a new third option: that it's not for indexing or for later linking but is instead either for training the language model itself, or for the model to ingest and regurgitate later on with no attribution, with the added fun that it might just make some shit up about whatever you said and be wrong. And as the person buying the web hosting, all of that is subsidized by me.

"The web is changing" does not mean every website must follow suit. Since I built my blog about 2 internet eternities ago, I have seen fad tech come and fad tech go. My blog remains more or less exactly what it was 2 decades ago, with more content and a better stylesheet. I have requested in my robots.txt that my content not be used for LLM training, and I fully expect that to be ignored because tech bros don't respect anyone, even fellow tech bros, when it means they have to change their behavior.


Tech bros just respect money. Making money is very easy in the short term if you don't show ethics. Venture capitalism and the whole growth/indie hacking is focused around making money and making it fast.

It's a clear road to disaster. I am honestly surprised by how great Hacker News is in comparison, where most people are sharing for the love of the craft. And for that Hacker News holds a special place in my heart. (Slightly exaggerating to give it a thematic ending, I suppose.)


Do not conflate your own experience with everyone else's.


Perplexity isn't your personal anything. It's a service just like Postmates and Uber. You want a personal shopper equivalent? You're going to pay more money. It won't say perplexity all over it.


> But I can send my personal shopper and you'll be none the wiser.

They will be quite the wiser if they track/limit how often your shopper enters the store. You probably aren't entering the same store fifteen times every day, and neither would your shopper be if they were only doing it on your behalf.
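
Translated out of the metaphor, that's ordinary per-client rate limiting; a naive sketch keyed by IP (the window and threshold are arbitrary):

  import time
  from collections import defaultdict, deque

  WINDOW_SECONDS = 60   # "per day" in the store metaphor, shrunk for illustration
  MAX_REQUESTS = 30     # "fifteen visits" scaled to a request budget

  history = defaultdict(deque)

  def allow(client_ip, now=None):
      """Return True if this client is still under its request budget."""
      now = time.time() if now is None else now
      recent = history[client_ip]
      while recent and now - recent[0] > WINDOW_SECONDS:
          recent.popleft()
      if len(recent) >= MAX_REQUESTS:
          return False
      recent.append(now)
      return True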


True, and I would ask, what is your point? Is it that no rule can have 100% perfect enforcement? That all rules have a grey area if you look close enough? Was it just a "gotcha" statement meant to insinuate what the prior commenter said was invalid?


But the store owner can ask the personal shopper to leave, if e.g. they find out that they work for a personal shopper service.


What the article is advocating for is hiring bouncers that strip all shoppers so they can do just that.


And you can be trespassed and prosecuted if you continue to violate.


Sure. There's lots of things you could do, but you don't do them because they are wrong.

Might does not make right.


How is it wrong to send my personal shopper? How is it wrong to have an agent act directly on my behalf?

It's like saying a web browser that is customized in any way is wrong. If one configures their browser to eagerly load links so that their next click is instant, is that now wrong?


Here's a good rule of thumb: if you have to do it without other people knowing, because otherwise they wouldn't let you do it: chances are it's a bad thing to do.


if you send your personal shopper to a store, and the business is... closed for business, or refusing you entry, and you just... go in anyway.

that's called breaking and entering, and generally frowned upon -- by-passing the "closed sign".


[flagged]


Whoa, please don't post like this. We end up banning accounts that do.

https://news.ycombinator.com/newsguidelines.html


Aw, alright. I thought it was a funny way to make the point and I figured the yo momma structure was traditional enough to not be taken as a proper insult. Heard tho.


Thanks for this. Now that you explain your intent, I see the joke. Unfortunately, it's too easy for the intent not to come across in these forsaken little text blobs that we're all limited to here. A lot of it boils down to the absence of voice tone and body language.



