Unless I am misunderstanding you, you are talking about something different from what the article describes. The article is about web crawling. You are talking about local / personal LLM usage. No one has a problem with local / personal LLM usage. The issue arises when Perplexity uses web crawlers.


You probably need a computer that costs $250,000 or more to run the kind of LLM that Perplexity uses, but with batching it costs pennies to have that same LLM fetch a page for you, summarize the content, and tell you what is on it. Power is the same story: running the LLM for a single user costs a huge amount relative to what it takes per user in a cloud environment shared by many users.
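For a rough sense of the scale, here is a back-of-envelope sketch in Python. Every figure in it (hardware price, amortization period, power draw, electricity price, request throughput) is an illustrative assumption, not a measured number from Perplexity; the point is only that amortizing one expensive server over many batched requests drops the per-request cost to a fraction of a cent, while a single dedicated user bears the whole hourly cost alone.

    # Back-of-envelope cost comparison: dedicated personal server vs. shared batched inference.
    # Every number below is an illustrative assumption, not a measured figure.

    HARDWARE_COST = 250_000          # USD, multi-GPU server able to run a large LLM
    AMORTIZATION_YEARS = 3           # assumed useful life of the hardware
    POWER_KW = 5.0                   # assumed average draw under load, in kilowatts
    POWER_PRICE = 0.15               # assumed USD per kWh

    HOURS_PER_YEAR = 24 * 365
    hourly_hardware = HARDWARE_COST / (AMORTIZATION_YEARS * HOURS_PER_YEAR)
    hourly_power = POWER_KW * POWER_PRICE
    hourly_total = hourly_hardware + hourly_power

    # Shared cloud deployment: continuous batching keeps the GPUs busy for many users at once.
    BATCHED_REQUESTS_PER_HOUR = 50_000   # assumed throughput with batching
    print(f"Hourly cost of the server: ${hourly_total:.2f}")
    print(f"Per-request cost when shared: "
          f"{hourly_total / BATCHED_REQUESTS_PER_HOUR * 100:.4f} cents")

    # Dedicated personal deployment: the same hourly cost, but only a handful of requests.
    SINGLE_USER_REQUESTS_PER_HOUR = 10   # assumed personal usage
    print(f"Per-request cost when dedicated: "
          f"${hourly_total / SINGLE_USER_REQUESTS_PER_HOUR:.2f}")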

Perplexity's "web crawler" is mostly operating like this on behalf of users, so they don't need a massively expensive computer to run an LLM.


Does Perplexity store crawled pages for training?


Is the article really talking about crawling? In one of their screenshots, where they ask for information about the "honeypot" website, you can see that the model requested pages from that site. But that is most definitely "fetching by proxy because I asked a question about the website," not random crawling.

It is confusing.


Yea, now I feel like my comment might be misleading. The title mentions crawling; the article itself is talking about something else.



