Efficiency—why??
A takeaway from Cat Swetel’s talk at Agile 2025:
Understand why your company cares about efficiency.
Is it part of the business model?
Trying to ride out hard times?
or something else?
“Efficiency can be about hoarding or efficiency can be about access.” - Cat Swetel
Nubank serves millions of people who aren’t profitable for older banks, so older banks won’t (can’t) serve them. These are people with high transaction volumes and low balances. Nubank can only do this with super efficient transaction processing. It’s part of their business model.
Nubank is entirely on AWS, so they’re really good at optimizing cloud costs. For instance, they use a lot of Spot instances: cheaper than regular instances, but that computer can be taken away with 2 minutes’ notice.
With Spot, you pay less, but your infrastructure is less stable. The software has to compensate: services are quick to start up and shut down, and every part of the system handles ephemerality. Optimize for fault tolerance, not the happy path.
Any delay in one service slows down the services calling it. When this leads those services to scale up, then more EC2 instances start up. This negates savings downstream. The whole flow has to expect ephemerality to see any savings--efficiency is not a local concern.
For Spot instances, Nubank worked with the instability and saved money. In an other case, paying more for infrastructure saved money.
Their database, Datomic, has a local cache and an external cache. When transaction volumes shot up, the local cache ran out of space. Fetches from the external cache increased latency, slowing down all the services depending on this. Some of their spot instances happened to have an SSD, and they noticed that using that SSD to expand the local cache saved time (and therefore money) across the entire transaction flow. Numbers like: Spending $1 for SSD saved $3500 across the whole flow!
If each team minimized cost, then the database nodes wouldn’t add that SSD. Efficiency in the system is bigger than efficiency in the components. Nubank measures the cost of a flow, not a service. (They use honeycomb.io for this!)
Nubank learned: the Cost <-> Stability tradeoff isn’t. Instability is expensive! When fluctuations lead to scale-up, scale-down thrashing, efficiency is defeated.
They continually drive down cost per customer served, so they can serve more people. This is efficiency as business model.
Director of Engineering, Kubernetes at Zynga
1wGood take
cofounder/CTO, honeycomb.io
1wWell, it CAN be... every additional 9 of reliability costs an order of magnitude more. as Alex Ewerlöf and others have pointed out. 😅 But starting off by understanding what efficiency means to you and why is absolutely the right place to start, as Cat Swetel says!
Co-Founder & CEO at Draftt • Hiring ✨
1wThis is the kind of efficiency most dashboards miss. Loved the framing, efficiency at the system level, not the service level. That $1 → $3500 SSD example says it all.