Hacker News | rustc's comments

Since when does PayPal Honey replace ads on websites?

> PayPal Honey is a browser extension that automatically finds and applies coupon codes at checkout with a single click.


They overwrite ad attributions, affiliate links, and clickthrough attributions with their own.


> Should curl be considered a bot too? What's the difference?

Perplexity definitely does:

    $ curl -sI https://www.perplexity.ai | head -1
    HTTP/2 403


> Crawling and scraping is legal. If your web server serves the content without authentication, it's legal to receive it, even if it's an automated process.

> If you want to gatekeep your content, use authentication.

Are there no limits on what you use the content for? I can start my own search engine that just scrapes Google results?


Yes, I believe that's basically what https://serpapi.com/ is doing.


There are many APIs that scrape Google, but I don't know of any search engine that scrapes and rebrands Google results. Kagi.com pays Google for search results. Either Kagi has a better deal than the SERP APIs (which I doubt), or this is not legal.


I tried to scrape Google results once using an automated process, and quickly got banned from all of Google. They banned my IP address completely. It kind of really sucked for a while, until my ISP assigned a new IP address. Funny enough, this was about 15 years ago and I was exploring developing something very similar to what LLMs are today.


I think OP based this on an old case about what you can do with data from Facebook vs. LinkedIn, which hinged on whether you needed to be logged in to get it. I don't think that's relevant to the kind of scraping at issue here. Perplexity is clearly in the wrong.


It's ironic Perplexity itself blocks crawlers:

    $ curl -sI https://www.perplexity.ai | head -1
    HTTP/2 403

Edit: trying to fake a browser user agent with curl also doesn't work; they're using a more sophisticated method to detect crawlers.


Someone already asked the CEO about this: https://x.com/AravSrinivas/status/1819610286036488625


The bots are coming from inside the house


Ironically... they use Cloudflare.



And now you spend the same time verifying/reviewing AI output?


If before I did a thing in 60 minutes and now Claude Code does it in 5 minutes, I will not spend 55 minutes reviewing that code.

I will maybe spend 5-10 minutes reviewing and refining the code with the help of Claude Code and then the rest of the time I will go for another feature/bugfix.


Worth adding that sometimes I will spend an ~equivalent amount of time doing something in Claude Code, but the result is better.

Case in point: recently I was working on a mobile app where I had to check for a whole litany of user permissions and present UI to the user if any particular permission was missing, including instructions on how to rectify it.

Super annoying to do manually, but Claude Code was not only able to exhaustively enumerate all possible combos of missing permissions, but also automatically create the UIs for each edge case. I reviewed all of it for accuracy, which took some time.

I probably would've missed some of the more obscure edge cases on my own.

Overall maybe not much faster than doing it myself, but I'm pretty sure the results were substantially better.


I spend a fraction of the time verifying LLM-produced rote code --- which I do in fact do; I'm not a vibe coder --- compared to what I'd spend writing it. I don't understand why people always expect this to be a mic-drop rebuttal.


Do you feel like you end up with as clear of a mental model reviewing it as you do if you wrote it?

I'm still trying to figure out the answer to that question for myself. Maybe the answer is, "Probably not, and it probably doesn't matter" but I'm still trying to figure out what kind of downstream effects that may have later on my judgment.


Yes, of course I do. It's rote stuff. To the balance of time we're accruing to me dealing with generated code, add "stripping off all the comments", "fixing variable names to be like I like them", etc. My fingerprints are still all over everything. And it's still radically faster than doing this all by hand.

Mental expenditure on programming is also not linear through a task; it takes much more energy to get started than to do the back half. Ever stared at an empty function for a minute trying to come up with the right variable name, or choosing which value to compute first? LLMs are geniuses at just getting things started.


> did they not make it clear to their customers that letting the Nonsense Generator have root was a bad idea?

No, the opposite (from replit.com home page):

> The safest place for vibe coding

> Vibe coding makes software creation accessible to everyone, entirely through natural language. Whether it’s personal software for yourself and family, a new business coming to life, or internal tools at your workplace, Replit is the best place for anybody to build.


> I recently made a user script that tries to highlight LLM generated text

How does that work?


It uses one of Mozilla's models: https://huggingface.co/fakespot-ai/roberta-base-ai-text-dete...

It has a pretty high false-positive rate, but it reliably highlights AI-generated spam websites and saves me from having to read them.
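For the curious, the highlighting decision such a user script might make can be sketched like this. The label name "AI" and the threshold are my assumptions, modeled on how Hugging Face text-classification pipelines report results; none of this is taken from the actual extension.

```python
# Illustrative sketch only: the "AI" label and 0.9 threshold are assumptions,
# not the real internals of the user script described above.

def should_highlight(prediction: dict, threshold: float = 0.9) -> bool:
    """Only highlight confident 'AI' verdicts, to cut down false positives."""
    return prediction["label"] == "AI" and prediction["score"] >= threshold

# Wiring it to the linked model would look roughly like this (not run here;
# it needs the `transformers` package and a model download):
#
#   from transformers import pipeline
#   detector = pipeline("text-classification",
#                       model="fakespot-ai/roberta-base-ai-text-detection")
#   if should_highlight(detector(page_text[:512])[0]):
#       highlight(page_text)  # hypothetical DOM step in the user script
```

Raising the threshold trades missed detections for fewer false positives, which matters given the false-positive rate mentioned above.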


Random number generator and vibes probably.


> Ha, nice find! I'm the Adriaan in adriaan.com. I'm testing some new script features that might improve deliverability. It's not sending any personal data. I use another domain to have the least effect of ad-blockers.

You are sending the user agent, path, referrer, and a session ID, plus the IP (which is sent automatically), to your personal server, and you're also using a different domain to track users who have ad blockers installed. Even Google Analytics does not use random domain names to track adblock users (yet).
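For concreteness, a beacon carrying those fields amounts to something like the sketch below. The endpoint and query-parameter names are invented for illustration; only the list of fields comes from the comment above.

```python
# Sketch of the kind of tracking beacon described above. "/collect" and the
# parameter names are made up; the fields themselves are from the comment.
from urllib.parse import urlencode

def build_beacon(host: str, path: str, referrer: str,
                 session_id: str, user_agent: str) -> str:
    params = {
        "path": path,
        "ref": referrer,
        "sid": session_id,
        "ua": user_agent,
        # The visitor's IP needs no parameter at all: the server sees it
        # on every request, which is part of the complaint.
    }
    return f"https://{host}/collect?" + urlencode(params)

print(build_beacon("adriaan.com", "/post",
                   "https://duckduckgo.com/", "abc123", "Mozilla/5.0"))
```

Serving this from a second domain is exactly what defeats blocklist-based ad blockers, since filter lists key on known analytics hostnames.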


So the correct title must be: "SimpleAnalytics tracks you TWICE when you're reading about Google tracking you (even when using DuckDuckGo)."


"Honeypot even when using DuckDuckGo"


Nice reminder to disable JavaScript, or just use Tor Browser, to open any links you don't want associated with your public presence.


How would this apply to open-weight models? The creators of the models cannot know who is using the model and for what.


Why would it be the creators' responsibility? It should be the responsibility of whoever is running the model.


Just the same? If you publish a model that doesn't follow these rules, nobody in the EU could use it in their business. You could likewise publish unlicensed source code, and nobody could really use it for anything business-related either.


> If you publish a model that doesn't follow these rules

The rules require tracking outputs, which open-weight models cannot do. So I'm wondering if open-weight models have separate rules or this effectively bans releasing such models.


Of course they can. Let's assume you are using such a model in your product; tracking its output is now your responsibility. It is really no different from the way you would use an open source library.


That is confusing, since closed-weight models (the models themselves, not the applications using them) also can't track outputs. It would be weird if the rules applied to the model rather than the application, because then they would effectively apply only to open-weight models: closed-weight models are, by definition, never released to the public.

Trying to understand the rules, but they don't seem to make a clear distinction between these things. I assume they are aimed at the applications that use the models, not the models themselves.


You ban open weight models and the problem is solved


Users of the model could be legally required to take part in the tracking.


The service provider (or whoever was _running_ the thing) would be liable, I'd assume, not the model creator.


Probably because that requires a paid account ($100/yr).

