Unless I am misunderstanding you, you are talking about something different from what the article describes. The article is about web crawling. You are talking about local / personal LLM usage. No one has a problem with local / personal LLM usage. The issue arises when Perplexity uses web crawlers.


You probably need a computer that costs $250,000 or more to run the kind of LLM that Perplexity uses, but with batching it costs pennies to have that same LLM fetch a page for you, summarize the content, and tell you what is on it. Power is the same story: running the LLM for a single user costs a huge amount relative to what it takes per user in a cloud environment shared by many users.
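For a rough sense of the scale, here is a back-of-envelope sketch in Python. Every figure in it (hardware price, amortization period, power draw, electricity price, request throughput) is an illustrative assumption, not a measured number from Perplexity; the point is only that amortizing one expensive server over many batched requests drops the per-request cost to a fraction of a cent, while a single dedicated user bears the whole hourly cost alone.

    # Back-of-envelope cost comparison: dedicated personal server vs. shared batched inference.
    # Every number below is an illustrative assumption, not a measured figure.

    HARDWARE_COST = 250_000          # USD, multi-GPU server able to run a large LLM
    AMORTIZATION_YEARS = 3           # assumed useful life of the hardware
    POWER_KW = 5.0                   # assumed average draw under load, in kilowatts
    POWER_PRICE = 0.15               # assumed USD per kWh

    HOURS_PER_YEAR = 24 * 365
    hourly_hardware = HARDWARE_COST / (AMORTIZATION_YEARS * HOURS_PER_YEAR)
    hourly_power = POWER_KW * POWER_PRICE
    hourly_total = hourly_hardware + hourly_power

    # Shared cloud deployment: continuous batching keeps the GPUs busy for many users at once.
    BATCHED_REQUESTS_PER_HOUR = 50_000   # assumed throughput with batching
    print(f"Hourly cost of the server: ${hourly_total:.2f}")
    print(f"Per-request cost when shared: "
          f"{hourly_total / BATCHED_REQUESTS_PER_HOUR * 100:.4f} cents")

    # Dedicated personal deployment: the same hourly cost, but only a handful of requests.
    SINGLE_USER_REQUESTS_PER_HOUR = 10   # assumed personal usage
    print(f"Per-request cost when dedicated: "
          f"${hourly_total / SINGLE_USER_REQUESTS_PER_HOUR:.2f}")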

Perplexity's "web crawler" is mostly operating like this on behalf of users, so they don't need a massively expensive computer to run an LLM.


Does Perplexity store crawled pages for training?


Is the article really talking about crawling? In one of their screenshots, where they ask for information about the "honeypot" website, you can see that the model requested pages from that site. But that is most definitely "fetching by proxy because I asked a question about the website," not random crawling.

It is confusing.


Yea, now I feel like my comment might be misleading. The title mentions crawling; the article itself is talking about something else.



