
> But if, as I suspect, Perplexity are visiting that page and then using information from that webpage in order to train their model then sorry mate, you're a crawler, you're just using a user as a proxy for your crawling activity.

If the access is not recursive and fetches only one file, then it should hopefully be OK, especially if the file is used only because you requested it. HTML is a partial exception: common browsers will usually also download CSS, JavaScript, WebAssembly, pictures, favicons (even if the web page does not declare any), etc. Many "small web" formats deliberately avoid this.
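To make the single-file vs. browser difference concrete, here is a rough sketch (my own, not anything Perplexity or a particular browser actually runs) that lists the extra URLs a browser-like client would typically request after fetching one HTML page. It only looks at declarative tags (`<link rel="stylesheet">`, `<link rel="icon">`, `<script src>`, `<img src>`) and assumes the browser falls back to `/favicon.ico` when no icon is declared. WebAssembly and anything else loaded by scripts at runtime won't show up here. The function name `browser_subresources` is made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SubresourceCollector(HTMLParser):
    """Collects URLs a typical browser would fetch in addition to the HTML page itself."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []
        self.has_icon = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attribute values may be None for bare attributes like `async`
        if tag == "link" and a.get("href"):
            rel = (a.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.urls.append(urljoin(self.base_url, a["href"]))
            if "icon" in rel:  # matches rel="icon" and rel="shortcut icon"
                self.has_icon = True
                self.urls.append(urljoin(self.base_url, a["href"]))
        elif tag in ("script", "img") and a.get("src"):
            self.urls.append(urljoin(self.base_url, a["src"]))


def browser_subresources(html, base_url):
    """Hypothetical helper: extra requests beyond the single HTML document."""
    p = SubresourceCollector(base_url)
    p.feed(html)
    p.close()
    if not p.has_icon:
        # Assumption: many browsers request /favicon.ico even when no icon is declared.
        p.urls.append(urljoin(base_url, "/favicon.ico"))
    return p.urls


page = ('<html><head><link rel="stylesheet" href="style.css">'
        '<script src="/app.js"></script></head>'
        '<body><img src="cat.png"></body></html>')
print(browser_subresources(page, "https://example.com/post/1"))
```

A plain single-file fetch (the "small web" style) stops after the page itself; everything in that list is additional traffic a browser would generate on the user's behalf.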

However, if they then use it to train their model without documenting that, that can be a problem, especially if the file being accessed is not intended to be public; but this is a different issue from the above.


