Fluid Compute and Active CPU Pricing at Vercel: Innovation, Efficiency, How It Was Built, and the Challenges Solved

In the world of web infrastructure, efficiency and cost have always been central concerns. With Fluid compute and the introduction of Active CPU pricing, Vercel has taken a major step toward redefining serverless computing. But what exactly is Fluid compute, how does Active CPU pricing work, how did Vercel build it, and what technical challenges had to be solved to make it possible?

What is Fluid Compute?

Fluid compute is an evolution of the traditional serverless model. In classic serverless, each function instance handles a single request and you pay for its total execution time, including the idle periods it spends waiting on I/O. Fluid compute allows a single instance to handle multiple concurrent requests, maximizing resource utilization, reducing cold starts, and significantly lowering costs, especially for I/O-bound workloads such as API calls or requests to AI models.

Instead of having instances waiting around doing nothing, Fluid compute reuses active resources and only scales when truly necessary. It also supports advanced tasks like streaming and post-response processing, all without requiring developers to change their code.
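To see why this matters, consider a typical I/O-bound handler. Here is a minimal sketch, assuming a Next.js App Router project deployed on Vercel; the upstream URL and payload are placeholders:

```ts
// app/api/answer/route.ts (hypothetical path). While `fetch` waits on the
// upstream service, this handler's CPU sits idle, so under Fluid compute
// the same instance can pick up other requests instead of blocking.
export async function POST(request: Request): Promise<Response> {
  const { prompt } = await request.json();

  // I/O-bound wait: most of this handler's wall time is spent right here.
  const upstream = await fetch("https://api.example.com/v1/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });

  // Pass the upstream body through as a stream rather than buffering it.
  return new Response(upstream.body, {
    headers: { "Content-Type": "application/json" },
  });
}
```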

How Did Vercel Build Fluid Compute?

1. Inspired by React Server Components and Streaming

The initial inspiration came from the React team’s work on React Server Components (RSC), which enabled streaming UI updates from the server. Vercel wanted to bring true streaming capabilities to their serverless functions, but AWS Lambda (where Vercel Functions run) didn’t originally support streaming responses.

2. Building a Custom Transport Layer

To overcome Lambda’s limitations, Vercel engineered a secure TCP-based protocol that created a tunnel between each Lambda function and Vercel’s infrastructure. Instead of sending a single blob response, functions could now send multiple packets—such as ResponseStarted, ResponseBody, and ResponseEnd—enabling chunk-by-chunk HTTP streaming. This tunnel allowed Vercel’s Function Router to reconstruct and stream responses efficiently to clients.
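The exact wire format isn't public, but the packets named above are easy to picture as a tagged union. A minimal TypeScript sketch, where only the packet names come from Vercel's post and every field is an illustrative assumption:

```ts
// Hypothetical shapes for the streaming packets. The names ResponseStarted,
// ResponseBody, and ResponseEnd are Vercel's; the fields are assumptions.
type TunnelPacket =
  | { kind: "ResponseStarted"; status: number; headers: Record<string, string> }
  | { kind: "ResponseBody"; chunk: Uint8Array } // one chunk of the HTTP body
  | { kind: "ResponseEnd" };                    // body complete
```

Splitting a response into typed packets like this is what lets the Function Router begin forwarding bytes to the client before the function has finished running.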

3. The Rust-Based Core

At the heart of the system is a Rust-based core that acts as a bridge between Vercel’s infrastructure and the user’s code. This core communicates with the language runtime (Node.js or Python) via HTTP, and with the Function Router using the custom TCP protocol. Each response chunk from the language process is transformed into a packet and streamed back through the tunnel.
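That bridging step can be sketched in a few lines. Assume a hypothetical sendPacket that writes to the TCP tunnel; the real core is Rust, and TypeScript is used here only for illustration:

```ts
type TunnelPacket =
  | { kind: "ResponseStarted"; status: number }
  | { kind: "ResponseBody"; chunk: Uint8Array }
  | { kind: "ResponseEnd" };

// Call the language runtime over HTTP and forward each body chunk through
// the tunnel as its own packet, instead of buffering one blob response.
async function bridge(
  runtimeUrl: string,
  sendPacket: (p: TunnelPacket) => void,
): Promise<void> {
  const res = await fetch(runtimeUrl); // HTTP hop to the Node.js/Python process
  sendPacket({ kind: "ResponseStarted", status: res.status });

  const reader = res.body!.getReader();
  let result = await reader.read();
  while (!result.done) {
    sendPacket({ kind: "ResponseBody", chunk: result.value });
    result = await reader.read();
  }
  sendPacket({ kind: "ResponseEnd" });
}
```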

4. Protocol Extensibility

The custom protocol wasn’t limited to HTTP responses. Vercel extended it with new packet types for features like waitUntil (for background tasks), request metrics, session tracing, and larger log payloads. This extensibility allowed rapid rollout of new features without changing the core transport logic.
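On the developer side, the most visible of these features is waitUntil, exposed through the public @vercel/functions package. It lets a handler respond immediately while background work finishes after the response is sent; the analytics endpoint below is a placeholder:

```ts
import { waitUntil } from "@vercel/functions";

export async function GET(): Promise<Response> {
  // Schedule work that outlives the response. The extended protocol is what
  // keeps the instance alive until this promise settles.
  waitUntil(
    fetch("https://analytics.example.com/event", { // hypothetical endpoint
      method: "POST",
      body: JSON.stringify({ event: "page_view", at: Date.now() }),
    }),
  );
  return new Response("ok"); // sent without waiting for the fetch above
}
```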

5. Breaking the One-Invocation-Per-Instance Model

Traditionally, AWS Lambda handled one request per instance at a time. Vercel’s new transport allowed them to multiplex multiple concurrent requests through the same tunnel to a single instance, drastically improving resource utilization and reducing cold starts.
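Multiplexing only requires that each packet carry a request identifier so the router can fan responses back out to the right clients. A toy demultiplexer, with shapes that are assumptions rather than Vercel's actual format:

```ts
type MuxPacket =
  | { kind: "ResponseBody"; requestId: string; chunk: Uint8Array }
  | { kind: "ResponseEnd"; requestId: string };

// requestId -> sink forwarding chunks to the waiting client (null = done)
const sinks = new Map<string, (chunk: Uint8Array | null) => void>();

// Many concurrent requests share one tunnel; the router splits them apart.
function demux(packet: MuxPacket): void {
  const sink = sinks.get(packet.requestId);
  if (!sink) return; // request already completed or was cancelled
  if (packet.kind === "ResponseBody") {
    sink(packet.chunk);
  } else {
    sink(null);
    sinks.delete(packet.requestId);
  }
}
```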

6. Compute-Resolver: Smart Routing

To make this work at scale, Vercel built a service called compute-resolver—a DNS-like system that tracks which Function Router pods have active connections to a given function. This increases the chance of routing new requests to already-running instances, maximizing reuse and reducing the need to spin up new instances. The resolver handles over 100,000 requests per second with sub-millisecond latency.
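The core idea is a lookup table from function to warm connections. A deliberately tiny, in-memory sketch; the real service is distributed and not public, and every name here is illustrative:

```ts
// functionId -> router pods currently holding live tunnels to that function
const activePods = new Map<string, Set<string>>();

function registerConnection(functionId: string, podId: string): void {
  if (!activePods.has(functionId)) activePods.set(functionId, new Set());
  activePods.get(functionId)!.add(podId);
}

// Prefer a pod with a warm connection; returning undefined means the caller
// falls back to starting a fresh instance (a cold start).
function resolve(functionId: string): string | undefined {
  const pods = activePods.get(functionId);
  if (!pods || pods.size === 0) return undefined;
  const candidates = [...pods];
  return candidates[Math.floor(Math.random() * candidates.length)];
}
```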

7. Real-Time Resource Monitoring and Health Management

Efficient multiplexing requires careful resource management. The Rust core continuously monitors CPU, memory usage, file descriptors, and more. If an instance is overloaded, it can “nack” (negatively acknowledge) new requests, signaling the router to pause traffic until the instance is healthy again. This adaptive system ensures optimal performance and avoids crashes or latency spikes.
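An admission check in that spirit might look like the following; the thresholds and stat fields are illustrative assumptions, not Vercel's actual limits:

```ts
interface InstanceStats {
  cpuUtilization: number;       // 0..1, sampled continuously
  memoryBytes: number;
  openFileDescriptors: number;
}

const LIMITS = { cpu: 0.9, memoryBytes: 900 * 1024 * 1024, fds: 900 };

// "nack" tells the router to stop sending traffic until the instance recovers.
function admit(stats: InstanceStats): "ack" | "nack" {
  const overloaded =
    stats.cpuUtilization > LIMITS.cpu ||
    stats.memoryBytes > LIMITS.memoryBytes ||
    stats.openFileDescriptors > LIMITS.fds;
  return overloaded ? "nack" : "ack";
}
```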

8. Language-Agnostic and Adaptive

All of these innovations are language-agnostic and work with both Node.js and Python (with more languages on the way). The system adapts automatically to each workload’s profile, optimizing resource usage several times per second.

Active CPU Pricing: Pay Only for What You Use

The real game-changer is Active CPU pricing. Traditionally, you pay for the wall time a function is “alive,” even if it is just waiting for another service to respond. With Active CPU pricing, you are billed only for the milliseconds the CPU is actually working, plus a charge for the memory provisioned over the invocation’s lifetime. On workloads with lots of waiting and little active computation, that can mean savings of up to 95%. It is a fairer, more transparent model, ideal for modern applications, especially those built around AI or high concurrency.
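A back-of-the-envelope comparison shows where the savings come from. The rates below are made up for illustration; only the billing structure (CPU billed while busy, memory billed for the duration) reflects the model described above:

```ts
const wallMs = 1000;    // total request duration
const activeCpuMs = 50; // CPU actually busy (I/O-heavy workload)
const memoryGb = 1;

const cpuRatePerMs = 0.000_001;     // illustrative $/CPU-ms
const memRatePerGbMs = 0.000_000_1; // illustrative $/GB-ms

// Wall-time billing: CPU and memory both charged for the full duration.
const wallTimeCost = wallMs * (cpuRatePerMs + memoryGb * memRatePerGbMs);

// Active CPU billing: CPU charged only while busy; memory for the duration.
const activeCpuCost =
  activeCpuMs * cpuRatePerMs + wallMs * memoryGb * memRatePerGbMs;

console.log({ wallTimeCost, activeCpuCost }); // roughly 7x cheaper here
```

The more a workload waits relative to the CPU work it does, the wider that gap grows, which is how mostly-idle workloads approach the 95% figure.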

Real Impact and the Future

Today, more than 75% of Vercel function invocations use Fluid compute, with up to 95% savings on compute costs. This democratizes access to high-performance infrastructure and enables new applications that were previously too expensive to run.

In summary, Fluid compute and Active CPU pricing represent a fundamental shift in how we think about serverless infrastructure: more efficient, fairer, and ready for the challenges of the modern web.

source: https://vercel.com/blog/fluid-how-we-built-serverless-servers

