
"Stealth" crawlers are always going to win the game.

There are ways to build scrapers using browser automation tools [0,1] that make detection virtually impossible. You can still serve captchas, but the person building the automation tools can add human-in-the-loop workflows to process these during normal business hours (i.e., when a call center is staffed).

I've seen some raster-level scraping techniques used in game dev testing 15 years ago that would really bother some of these internet police officers.

[0] https://www.w3.org/TR/webdriver2/

[1] https://chromedevtools.github.io/devtools-protocol/



> "Stealth" crawlers are always going to win the game.

no, because we'll end up with remote attestation needed to access any site of value


Yes, because there's always the option for a camera pointed at the screen and a robot arm moving the mouse. AI is hoping to solve much harder problems.


Won't work with biometric attestation. For example, banks in China require periodic facial recognition to continue the banking session.


What's stopping these companies from offloading the scraping onto their users?

"Either pay us $50/month or install our extension, and when prompted, solve any captchas or authenticate with your ID (as applicable) on the given website so we can train on the content."


yea but those are not open sites, try imposing that on an open site you'd want to actually attract human traffic to


see, literally, reddit requiring teenagers to open their mouth and roll their heads around to enter.


I heard that it can be easily bypassed with a realistic 3D human game model with basic mouth-open and head-tilt animations; even gmod can do that.


Almost no site of value will use remote attestation because an alternative that works with all of your devices, operating systems, ad blockers and extensions will attract more users than your locked-down site.


> alternative that works with all of your devices, operating systems, ad blockers and extensions

When 99.9% of users are using the same few types of locked-down devices, operating systems, and browsers that all support remote attestation, the 0.1% doesn't matter. This is already the case on mobile devices; it's only a matter of time until computers become just as locked down.


tell that to the massive content sites already using widevine


But for the case of Perplexity-User, presumably the user is in the loop to provide their attestation.

This case (“go research this subject for me”) is the grey area here. It’s not the same as simple scraping or search indexing; it’s a new activity that is similar in some ways.



