
> wage slaves

In the knowledge worker space, the wages are pretty nice. It's a stretch to call them "wage slaves".


If you can't quit your job because you have to pay your mortgage, then you are.

There is no middle class. "Wage slave" is the correct description.

Devs are expensive. Of course management wants to measure what they produce. It's a hard problem. There are no magic solutions yet.

GP isn't saying that there is evidence that open offices work. GP is saying that execs want such evidence. Way back when Google was young its execs thought outside the box, so it's no surprise that they didn't copy what MSFT was doing.

Wait, so now there are thinking-outside-the-box-execs who don’t need any evidence, and regular-gimme-evidence-execs who do?

Yes, but only in young startups. Once a company's earnings go beyond a certain point, it gets MBAs for executives.

If we hadn't effectively made shoplifting not-a-crime, we wouldn't have that worry right now.

Blame jurisdictions that made shoplifting up to $900 or similarly large amounts practically not-a-crime.

sarcasm detector broken

Is a "semantic layer" nothing more than a fancy name for a SQL VIEW in a NoSQL?

No, it's more than that.

A semantic layer is about decomposing views into dimensions and aggregates, then letting downstream apps/users compose their own views on top without having to redefine/re-calculate business-level metrics.

This makes data analysis more flexible than SQL views, which are hard-coded to particular groupings.


It's a lot more. A SQL VIEW is just a saved query, whereas a semantic layer defines the shared meaning of the data and helps enforce consistent metrics, joins, and logic across tools. You'd be surprised at how many ways "active customer" can be represented in SQL.

Doesn't a view do that?

  create view active_cx as select * from customer join audit_events using(...) join ... where -- active condition

  -- use active_cx wherever

  select ... from orders join active_cx using(...) where ts > start_of_month() group by active_cx.id

It sounds like "semantic layer" == views/queries created automatically and on the fly.

Kind of annoying the article writes "What is [a semantic layer] anyway?" twice but never defines it directly.

OP here - I wrote extensively about that elsewhere, which is why I linked to an existing article rather than explaining it once more, and focused here on the why and the how of building one. See also comment above: https://news.ycombinator.com/reply?id=44960004&goto=item%3Fi...

I looked for such a link in TFA, and it wasn't obvious.

The original Unix in-kernel wait queues were also like that.

Ideally the value should be two words -- 16 bytes on 64-bit systems.

Emm, what? Why? If you mean two processor words, which I gather from what you are saying, then I think you are already in the space of full memory barriers.

In that case, why not just say, ideally it would be 256K words, or whatever?


Because mainstream modern architectures (practically speaking, x86-64-v2+ and ARMv8+) give you[1] a two-word compare-and-swap or LL/SC.

[1] https://ibraheem.ca/posts/128-bit-atomics/


However using compare-and-swap as the atomic operation for implementing multiple events can be very inefficient, because it introduces waiting loops where the threads can waste a lot of time when there is high contention.

The signaling of multiple events is implemented efficiently with atomic bit set and bit clear operations, which have been supported since the Intel 80386 in 1985, so they are available in all Intel/AMD CPUs. They are also available in 64-bit Arm CPUs starting with Armv8.2-A, since Cortex-A55 & Cortex-A75 in 2017-2018 (in theory the atomic bit operations were added in Armv8.1-A, but there were no consumer CPUs with that ISA).

With atomic bit operations, each thread signals its event independently of the others and there are no waiting loops. The monitoring thread can very quickly determine the event with the highest priority using LZCNT (count leading zeros) or equivalent instructions, which are available on all modern Arm-based or Intel/AMD CPUs.

When a futex is used to implement waiting for multiple events, despite not having proper support in the kernel, the thread that sets its bit to signal an event must also execute a FUTEX_WAKE, so that the monitoring thread will examine the futex value. Because the atomic bit operations are fetch-and-OP operations, like fetch-and-add or atomic exchange, the thread that signals an event can determine whether the previous event signaled by it has been handled or not by the monitoring thread, so it can act accordingly.

So currently on Linux you are limited to waiting for up to 32 events. The number of events can be extended by using a multi-level bitmap, but then the overhead increases significantly. Using a 64-bit futex value would have been much better.

In theory compare-and-swap or the equivalent instruction pair load-exclusive/store-conditional are more universal, but in practice they should be avoided whenever high contention is expected. The high performance algorithms for accessing shared resources are all based on using only fetch-and-add, atomic exchange, atomic bit operations and load-acquire/store-release instructions.

This fact has forced the Arm company to correct their mistake from the first version of the 64-bit ARM ISA, where there were no atomic read-modify-write operations, so they have added all such operations in the first revision of the ISA, i.e. Armv8.1-A.


> In theory compare-and-swap or the equivalent instruction pair load-exclusive/store-conditional are more universal, but in practice they should be avoided whenever high contention is expected. The high performance algorithms for accessing shared resources are all based on using only fetch-and-add, atomic exchange, atomic bit operations and load-acquire/store-release instructions.

> This fact has forced ... there were no atomic read-modify-write operations, so they have added all such operations in the first revision of the ISA, i.e. Armv8.1-A.

I'm not sure if you meant for these two paragraphs to be related, but asking to make sure:

  - Isn't compare-and-swap (CMPXCHG on x86) also read-modify-write, which in the first quoted paragraph you mention is slow?
  - I think I've benchmarked LOCK CMPXCHG vs LOCK OR before, with various configurations of reading/writing threads. I was almost sure it was going to be an optimization, and the difference ended up being unobservable. IIRC, some StackOverflow posts led me to the notion that LOCK OR still needs to acquire ownership of the target address in memory (RMW). Do you have any more insights? Cases where LOCK OR is better? Or should I have used a different instruction to set a single bit atomically?

In terms of the relative cycle cost for instructions, the answer definitely has changed a lot over time.

As CAS has become more and more important as the world has scaled out, hardware companies have been more willing to favor "performance" in the cost/performance tradeoff. Meaning, it shouldn't surprise you if an uncontended CAS is as fast as a fetch-and-OR, even though the latter is obviously a much simpler operation logically.

But hardware platforms are a very diverse place.

Generally, if you can design your algorithm with a load-and-store, there's a very good chance you're going to deal with contention much better than w/ CAS. But, if the best you can do is use load-and-store but then have a retry loop if the value isn't right, that probably isn't going to be better.

For instance, I have an in-memory debugging "ring buffer" that keeps an "epoch"; threads logging to the ring buffer fetch-and-add themselves an epoch, then mod by the buffer size to find their slot.

Typically, the best performance will happen when I'm keeping one ring buffer per thread-- not too surprising, as there's no contention (but impact of page faults can potentially slow this down).

If the ring buffer is big enough that there's never a collision where a slow writer is still writing when the next cycle through the ring buffer happens, then the only contention is around the counter, and everything still tends to be pretty good; but the work the hardware has to do to sync the value will 100% slow it down, despite the fact that there are no retries. If you don't use a big buffer, you have to do something different to get a true ring buffer, or you can lock each record and send the fast writer back to get a new index if it sees a lock. The contention still has the effect of slowing things down either way.

The worst performance will come with the CAS operation though, because when lots of threads are debugging lots of things, there will be lots of retries.


One thing to add here, I've enjoyed reasonably extensive support for `atomic_compare_exchange_strong()` and the `_explicit` variant for quite a long time (despite the need for the cache line lock on x86).

But, last I checked (the last release, early last year) musl still does not provide a 128-bit version, which is disappointing, and hopefully the AVX-related semantics changes will encourage them to add it? :)


Because there are apps that use two-word pass/return-by-value values internally, so it'd be convenient.

I think part of it is that you shouldn't be using recursive locks, so why bother specifying support for them? IMO.
