Hacker News

LLMs are not "someone"; LLMs are something. And they don't "read content": by definition, they acquire and reuse that content (for example, by summarizing it) as part of their product.

So here the consent is indeed about what can be done with the data.

In general, it's absolutely the norm that public websites (i.e., unauthenticated ones) restrict even who can access the data. The simplest example that comes to mind is geoblocking. I have every right to say that my website is not made available to anybody in the US, for example. Would you still call that website "public"? Would bypassing the block via a VPN be a violation of my consent? This is mostly a moral discussion, I suppose.

But anyway, that's not what's happening here. LLMs access content for the sole purpose of doing something with that content, either training or providing the service to their customers. They are not humans, they are not consumers, and they don't simply fetch the content and present it to the users (a much more neutral action, like what curl or a browser does). In the case of LLMs, it's impossible to distinguish the act of accessing from the act of using, so the distinction you draw doesn't apply, in my opinion.



LLMs are indeed not "someone". They are programs, like web browsers, acting on user instruction. The user is a person. I am only talking about people - I never said that an LLM does anything of its own volition.

> The simplest example that comes to mind is geoblocking.

Do you think it is alright to geoblock people, for arbitrary reasons? It is one thing when GDPR imposes a legal obligation on you for serving content in a particular way. Note that that actually doesn't prevent you from seeing the content, it just prevents you from being served by that server. The distinction is important - circumventing a geoblock is something I think should be legally protected.

> They are not humans, they are not consumers, they don't simply fetch the content and present it to the users

They simply fetch the content, run it through software, and present it to the user. As far as you, the service owner, are concerned, they are simply fetching the content for the user. It is none of your business what the user and the AI company go on to do with "your content".


> like web browsers, acting on user instruction.

No, they are not like browsers. A browser accesses my content in a transparent way. An LLM reuses the information and acts as an opaque intermediary which - maybe - will at most add a reference to my content.

> I never said that an LLM does anything of its own volition

It doesn't matter why it does what it does; it matters what it does. Your previous comment stressed the idea that it's possible to regulate _what can be done_ with my intellectual property (licensing), but not who can access it once it has been made public. What I am saying is that this is exactly the case for LLMs, which _use_ my intellectual property; they are not a tool to _access_ it (like a browser).

> Do you think it is alright to geoblock people, for arbitrary reasons?

Yes. Why wouldn't it be? And if you believe it's not, where do you draw the line? Once you share a picture with your partner, does everyone have the right to see it? Or if you share it with your group of friends? Or if you share it on a private social media profile (where you have acquaintances)? When does the audience turn from "a restricted group" to "everyone"? And why would it be different with my blog? If I want my blog accessible only from my country, I can absolutely do that and there is nothing wrong with it at all. Nobody is entitled to my intellectual property. Obviously I am playing devil's advocate, but this was to say that the fact that something is public doesn't mean it's unrestricted. And don't get me started on "the spirit of the internet". I can't imagine anything breaking that spirit more than LLMs acting as an interface between people and the other people on the internet. That spirit is gone, and belongs to a time when the internet was tiny. When OpenAI and company respect the "spirit of the internet", maybe I will think about doing the same.

> As far as you, the service owner, are concerned, they are simply fetching the content for the user. It is none of your business what the user and the AI company go on to do with "your content".

No, as far as I am concerned the program can take my information, summarize, change, distort, or misinterpret it, and then present it back to its user. This can happen with or without the user ever knowing that the information came from me. Considering this equal to the user accessing the information is something I simply will not concede, and it is a fundamental disagreement between us, from which many other disagreements stem.



