
> You asked the web-enabled AI to look at the domains.

Right, and the domain was configured to disallow crawlers, but Perplexity crawled it anyway. I am really struggling to see how this is hard to understand. If you mean to say "I don't think there is anything wrong with ignoring robots.txt" then just say that. Don't pretend they didn't make it clear what they're objecting to, because they spell it out repeatedly.



> Perplexity crawled it anyway

No, they did not. Crawling = recursive fetching, which wasn't what was happening here.

But also, I don't think there is anything wrong with ignoring robots.txt. In fact, I believe it is discriminatory and people should ignore it. See: https://wiki.archiveteam.org/index.php/Robots.txt


> I don't think there is anything wrong with ignoring robots.txt

Neither do I, I just thought your reply was disingenuous.

> Crawling = recursive fetching

I do not find this convincing. I am ok with using the word crawler for recursive fetching only. But robots.txt is not only for excluding crawlers and never has been. From the very beginning it was used to exclude specific automated clients, whether they only fetch one page or many, and that is certainly how the vast majority of people think about it today.
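To make that concrete: a robots.txt rule is matched against one user agent and one URL path, and nothing in the check asks whether the client will recurse afterwards. Below is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt contents and the agent names (`ExampleBot`, `SomeBrowser`) are made up for illustration, and the helper functions are hypothetical. This is not how any particular company's client works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: it blocks one named agent
# ("ExampleBot") from the whole site and allows every other agent.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch_one_page(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Decide whether a single, non-recursive fetch of `url` is allowed.

    can_fetch() takes exactly one user agent and one URL; the answer
    does not depend on whether the client follows links afterwards.
    """
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    page = "https://example.com/article.html"
    print(may_fetch_one_page(rp, "ExampleBot/1.0", page))   # False
    print(may_fetch_one_page(rp, "SomeBrowser/1.0", page))  # True
```

The named agent is refused even for a single page, while other agents fall through to the `*` group and are allowed.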

Like I implied in my first comment, I have no problem with you saying you dislike robots.txt, but it is not reasonable to pretend the article is unclear in some way.



