Fluid Compute and Active CPU Pricing at Vercel: Innovation, Efficiency, How It Was Built, and the Challenges Solved

In the world of web infrastructure, efficiency and cost have always been central concerns. With Fluid compute and the introduction of Active CPU pricing, Vercel has taken a major step toward redefining serverless computing. But what exactly is Fluid compute, how does Active CPU pricing work, how did Vercel build it, and what technical challenges had to be solved to make it possible?

What is Fluid Compute?

Fluid compute is an evolution of the traditional serverless model. In classic serverless, each function instance handles a single request and you pay for its total execution time, including the idle periods it spends waiting on I/O. Fluid compute allows a single instance to handle multiple concurrent requests, maximizing resource utilization, reducing cold starts, and significantly lowering costs, especially for I/O-bound workloads such as API calls or requests to AI models.

Instead of having instances waiting around doing nothing, Fluid compute reuses active resources and only scales when truly necessary. It also supports advanced tasks like streaming and post-response processing, all without requiring developers to change their code.
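To see why this matters, consider a typical I/O-bound handler. Here is a minimal sketch, assuming a Next.js App Router project deployed on Vercel; the upstream URL and payload are placeholders:

```ts
// app/api/answer/route.ts (hypothetical path). While `fetch` waits on the
// upstream service, this handler's CPU sits idle, so under Fluid compute
// the same instance can pick up other requests instead of blocking.
export async function POST(request: Request): Promise<Response> {
  const { prompt } = await request.json();

  // I/O-bound wait: most of this handler's wall time is spent right here.
  const upstream = await fetch("https://api.example.com/v1/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });

  // Pass the upstream body through as a stream rather than buffering it.
  return new Response(upstream.body, {
    headers: { "Content-Type": "application/json" },
  });
}
```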

How Did Vercel Build Fluid Compute?

1. Inspired by React Server Components and Streaming

The initial inspiration came from the React team’s work on React Server Components (RSC), which enabled streaming UI updates from the server. Vercel wanted to bring true streaming capabilities to their serverless functions, but AWS Lambda (where Vercel Functions run) didn’t originally support streaming responses.

2. Building a Custom Transport Layer

To overcome Lambda’s limitations, Vercel engineered a secure TCP-based protocol that created a tunnel between each Lambda function and Vercel’s infrastructure. Instead of sending a single blob response, functions could now send multiple packets—such as ResponseStarted, ResponseBody, and ResponseEnd—enabling chunk-by-chunk HTTP streaming. This tunnel allowed Vercel’s Function Router to reconstruct and stream responses efficiently to clients.
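The exact wire format isn't public, but the packets named above are easy to picture as a tagged union. A minimal TypeScript sketch, where only the packet names come from Vercel's post and every field is an illustrative assumption:

```ts
// Hypothetical shapes for the streaming packets. The names ResponseStarted,
// ResponseBody, and ResponseEnd are Vercel's; the fields are assumptions.
type TunnelPacket =
  | { kind: "ResponseStarted"; status: number; headers: Record<string, string> }
  | { kind: "ResponseBody"; chunk: Uint8Array } // one chunk of the HTTP body
  | { kind: "ResponseEnd" };                    // body complete
```

Splitting a response into typed packets like this is what lets the Function Router begin forwarding bytes to the client before the function has finished running.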

3. The Rust-Based Core

At the heart of the system is a Rust-based core that acts as a bridge between Vercel’s infrastructure and the user’s code. This core communicates with the language runtime (Node.js or Python) via HTTP, and with the Function Router using the custom TCP protocol. Each response chunk from the language process is transformed into a packet and streamed back through the tunnel.
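That bridging step can be sketched in a few lines. Assume a hypothetical sendPacket that writes to the TCP tunnel; the real core is Rust, and TypeScript is used here only for illustration:

```ts
type TunnelPacket =
  | { kind: "ResponseStarted"; status: number }
  | { kind: "ResponseBody"; chunk: Uint8Array }
  | { kind: "ResponseEnd" };

// Call the language runtime over HTTP and forward each body chunk through
// the tunnel as its own packet, instead of buffering one blob response.
async function bridge(
  runtimeUrl: string,
  sendPacket: (p: TunnelPacket) => void,
): Promise<void> {
  const res = await fetch(runtimeUrl); // HTTP hop to the Node.js/Python process
  sendPacket({ kind: "ResponseStarted", status: res.status });

  const reader = res.body!.getReader();
  let result = await reader.read();
  while (!result.done) {
    sendPacket({ kind: "ResponseBody", chunk: result.value });
    result = await reader.read();
  }
  sendPacket({ kind: "ResponseEnd" });
}
```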

4. Protocol Extensibility

The custom protocol wasn’t limited to HTTP responses. Vercel extended it with new packet types for features like waitUntil (for background tasks), request metrics, session tracing, and larger log payloads. This extensibility allowed rapid rollout of new features without changing the core transport logic.
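On the developer side, the most visible of these features is waitUntil, exposed through the public @vercel/functions package. It lets a handler respond immediately while background work finishes after the response is sent; the analytics endpoint below is a placeholder:

```ts
import { waitUntil } from "@vercel/functions";

export async function GET(): Promise<Response> {
  // Schedule work that outlives the response. The extended protocol is what
  // keeps the instance alive until this promise settles.
  waitUntil(
    fetch("https://analytics.example.com/event", { // hypothetical endpoint
      method: "POST",
      body: JSON.stringify({ event: "page_view", at: Date.now() }),
    }),
  );
  return new Response("ok"); // sent without waiting for the fetch above
}
```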

5. Breaking the One-Invocation-Per-Instance Model

Traditionally, AWS Lambda handled one request per instance at a time. Vercel’s new transport allowed them to multiplex multiple concurrent requests through the same tunnel to a single instance, drastically improving resource utilization and reducing cold starts.
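Multiplexing only requires that each packet carry a request identifier so the router can fan responses back out to the right clients. A toy demultiplexer, with shapes that are assumptions rather than Vercel's actual format:

```ts
type MuxPacket =
  | { kind: "ResponseBody"; requestId: string; chunk: Uint8Array }
  | { kind: "ResponseEnd"; requestId: string };

// requestId -> sink forwarding chunks to the waiting client (null = done)
const sinks = new Map<string, (chunk: Uint8Array | null) => void>();

// Many concurrent requests share one tunnel; the router splits them apart.
function demux(packet: MuxPacket): void {
  const sink = sinks.get(packet.requestId);
  if (!sink) return; // request already completed or was cancelled
  if (packet.kind === "ResponseBody") {
    sink(packet.chunk);
  } else {
    sink(null);
    sinks.delete(packet.requestId);
  }
}
```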

6. Compute-Resolver: Smart Routing

To make this work at scale, Vercel built a service called compute-resolver—a DNS-like system that tracks which Function Router pods have active connections to a given function. This increases the chance of routing new requests to already-running instances, maximizing reuse and reducing the need to spin up new instances. The resolver handles over 100,000 requests per second with sub-millisecond latency.
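The core idea is a lookup table from function to warm connections. A deliberately tiny, in-memory sketch; the real service is distributed and not public, and every name here is illustrative:

```ts
// functionId -> router pods currently holding live tunnels to that function
const activePods = new Map<string, Set<string>>();

function registerConnection(functionId: string, podId: string): void {
  if (!activePods.has(functionId)) activePods.set(functionId, new Set());
  activePods.get(functionId)!.add(podId);
}

// Prefer a pod with a warm connection; returning undefined means the caller
// falls back to starting a fresh instance (a cold start).
function resolve(functionId: string): string | undefined {
  const pods = activePods.get(functionId);
  if (!pods || pods.size === 0) return undefined;
  const candidates = [...pods];
  return candidates[Math.floor(Math.random() * candidates.length)];
}
```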

7. Real-Time Resource Monitoring and Health Management

Efficient multiplexing requires careful resource management. The Rust core continuously monitors CPU, memory usage, file descriptors, and more. If an instance is overloaded, it can “nack” (negatively acknowledge) new requests, signaling the router to pause traffic until the instance is healthy again. This adaptive system ensures optimal performance and avoids crashes or latency spikes.
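An admission check in that spirit might look like the following; the thresholds and stat fields are illustrative assumptions, not Vercel's actual limits:

```ts
interface InstanceStats {
  cpuUtilization: number;       // 0..1, sampled continuously
  memoryBytes: number;
  openFileDescriptors: number;
}

const LIMITS = { cpu: 0.9, memoryBytes: 900 * 1024 * 1024, fds: 900 };

// "nack" tells the router to stop sending traffic until the instance recovers.
function admit(stats: InstanceStats): "ack" | "nack" {
  const overloaded =
    stats.cpuUtilization > LIMITS.cpu ||
    stats.memoryBytes > LIMITS.memoryBytes ||
    stats.openFileDescriptors > LIMITS.fds;
  return overloaded ? "nack" : "ack";
}
```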

8. Language-Agnostic and Adaptive

All of these innovations are language-agnostic and work with both Node.js and Python (with more languages on the way). The system adapts automatically to each workload’s profile, optimizing resource usage several times per second.

Active CPU Pricing: Pay Only for What You Use

The real game-changer is Active CPU pricing. Traditionally, you pay for the wall time a function is “alive,” even if it is just waiting for another service to respond. With Active CPU pricing, you are billed only for the milliseconds the CPU is actually working, plus a charge for the memory provisioned over the invocation’s lifetime. On workloads with lots of waiting and little active computation, that can mean savings of up to 95%. It is a fairer, more transparent model, ideal for modern applications, especially those built around AI or high concurrency.
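A back-of-the-envelope comparison shows where the savings come from. The rates below are made up for illustration; only the billing structure (CPU billed while busy, memory billed for the duration) reflects the model described above:

```ts
const wallMs = 1000;    // total request duration
const activeCpuMs = 50; // CPU actually busy (I/O-heavy workload)
const memoryGb = 1;

const cpuRatePerMs = 0.000_001;     // illustrative $/CPU-ms
const memRatePerGbMs = 0.000_000_1; // illustrative $/GB-ms

// Wall-time billing: CPU and memory both charged for the full duration.
const wallTimeCost = wallMs * (cpuRatePerMs + memoryGb * memRatePerGbMs);

// Active CPU billing: CPU charged only while busy; memory for the duration.
const activeCpuCost =
  activeCpuMs * cpuRatePerMs + wallMs * memoryGb * memRatePerGbMs;

console.log({ wallTimeCost, activeCpuCost }); // roughly 7x cheaper here
```

The more a workload waits relative to the CPU work it does, the wider that gap grows, which is how mostly-idle workloads approach the 95% figure.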

Real Impact and the Future

Today, more than 75% of Vercel function invocations use Fluid compute, with up to 95% savings on compute costs. This democratizes access to high-performance infrastructure and enables new applications that were previously too expensive to run.

In summary, Fluid compute and Active CPU pricing represent a fundamental shift in how we think about serverless infrastructure: more efficient, fairer, and ready for the challenges of the modern web.

source: https://vercel.com/blog/fluid-how-we-built-serverless-servers

