Why Cloud-Centric AI Agents Won’t Scale: Device-First Is the Only Viable Option
Cloud-First agents are overwhelmed. Device-First agents are context-aware and scalable. Intelligence belongs where the data lives: on endpoint devices.


As we move into the era of agentic AI, where intelligent agents will assist us across every domain from healthcare to robotics, one flawed assumption continues to echo in boardrooms and product meetings:

“Most inference will happen in the cloud.”

That assumption is not just wrong. It’s economically, physically, and strategically unsustainable.

1. Cloud Inference Needs More Than a Prompt

Running AI inference in the cloud isn’t just about sending a text prompt. Real-world agents running on robots, autonomous vehicles, and smart devices need to send rich, time-sensitive context to function correctly:

• Live video and audio feeds

• LiDAR and sensor fusion data

• Environmental conditions (temperature, noise, pressure)

• Interaction history, local state, user inputs

This isn’t theoretical. These are the inputs required to infer correctly. Without context, models hallucinate or misfire.

And this context must be streamed to the cloud for every inference, over and over again.

2. What Context Data Really Means (per Agent)

Let’s assume:

• 3 camera feeds @ 720p/30fps (compressed): ~2 Mbps

• 1 LiDAR stream: ~1 Mbps

• Microphone/audio: ~0.2 Mbps

• Environmental + metadata: ~0.1 Mbps

• Context snapshots, app data: ~1 GB/day

Total per agent: ~3 Mbps sustained, or roughly 32.4 GB/day

This is a conservative figure that excludes autonomous vehicles and other data-intensive machines, which can each generate on the order of 4 TB of data per day.
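
The per-agent figure is simple to verify; here is the arithmetic as a minimal Python sketch. It uses the rounded ~3 Mbps total (the listed streams actually sum to ~3.3 Mbps, and the ~1 GB/day of snapshots is excluded from the streaming rate, so 32.4 GB/day is, if anything, slightly low):

    # Per-agent daily context volume implied by the sustained stream rate above.
    SECONDS_PER_DAY = 86_400
    rate_mbps = 3.0  # rounded total of the listed streams (unrounded they sum to ~3.3 Mbps)
    gb_per_day = rate_mbps * SECONDS_PER_DAY / 8 / 1_000  # megabits -> gigabytes (decimal)
    print(f"~{rate_mbps:.0f} Mbps sustained -> ~{gb_per_day:.1f} GB/day per agent")
    # -> ~3 Mbps sustained -> ~32.4 GB/day per agent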

Let’s use credible third-party projections for the number of intelligent agents in the near future.

• Gartner: By 2028, “15 billion connected things will include embedded AI.”

• OECD / McKinsey: By 2030, over 100 billion connected devices, many with local AI capabilities

• Anthropic, OpenAI, and Google have all suggested that every user/device/app will host one or more persistent agents

Let’s conservatively model for:

• 10 billion agents (low-end conservative scenario)

• 1 million tokens/day/agent (still conservative)

For 10 billion agents, that’s:

• 324 exabytes/day

• Sustained throughput of ~30,000 Tbps globally

• Roughly 25× the entire world’s international internet bandwidth capacity (1,217 Tbps, source: TeleGeography 2023)
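
These fleet-level totals follow mechanically from the per-agent figure; a short sketch of the same arithmetic:

    # Fleet-scale totals, assuming 10 billion agents at ~3 Mbps (~32.4 GB/day) each.
    AGENTS = 10e9
    GB_PER_AGENT_PER_DAY = 32.4
    MBPS_PER_AGENT = 3.0
    WORLD_INTL_BANDWIDTH_TBPS = 1_217  # TeleGeography 2023 figure cited above

    daily_eb = AGENTS * GB_PER_AGENT_PER_DAY / 1e9   # 1 EB = 1 billion GB
    sustained_tbps = AGENTS * MBPS_PER_AGENT / 1e6   # 1 Tbps = 1 million Mbps
    ratio = sustained_tbps / WORLD_INTL_BANDWIDTH_TBPS
    print(f"{daily_eb:.0f} EB/day, {sustained_tbps:,.0f} Tbps sustained, ~{ratio:.0f}x world capacity")
    # -> 324 EB/day, 30,000 Tbps sustained, ~25x world capacity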

3. Bandwidth Cost Just to Send Context to the Cloud

Let’s use hyperscaler internal transfer cost: $0.02 per GB

• 10B agents × 32.4 GB = 324 EB/day

• 1 EB = 1 billion GB → 324 EB = 324B GB/day

• × $0.02/GB = $6.48 billion/day

• ~$2.37 trillion/year, just to move context to the cloud

This doesn’t include inference cost, compute infrastructure, or retrievals; it is just the cost of sending context upstream.

And that’s assuming no duplication, retries, or encryption overhead.
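
The cost estimate is the same daily volume multiplied through; a quick sketch under the $0.02/GB assumption:

    # Bandwidth cost of shipping context upstream, at the assumed $0.02 per GB.
    AGENTS = 10e9
    GB_PER_AGENT_PER_DAY = 32.4
    COST_PER_GB = 0.02  # USD, the hyperscaler transfer price assumed above

    daily_gb = AGENTS * GB_PER_AGENT_PER_DAY   # 324 billion GB (324 EB) per day
    daily_cost = daily_gb * COST_PER_GB        # about $6.48 billion per day
    annual_cost = daily_cost * 365             # about $2.37 trillion per year
    print(f"${daily_cost / 1e9:.2f}B per day, ${annual_cost / 1e12:.2f}T per year")
    # -> $6.48B per day, $2.37T per year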

4. Real-time agents cannot wait for the cloud

Cloud inference latency typically ranges from 300 to 1,000 milliseconds. In contrast, real-world agents such as robotics systems, AR glasses, and autonomous vehicles require responses within 5 to 30 milliseconds. Even if we assume the cloud is always available and responsive, the latency gap alone makes real-time decision-making impossible.

In practice, cloud services are not always available. Availability figures ranging from 90 percent to 99.9 percent are common in service level agreements, depending on region, provider, and network conditions. For example, 99.9 percent availability still allows for more than 8 hours of downtime per year. In many consumer-grade or edge environments, availability may be closer to 90 percent, especially when connectivity is variable or infrastructure is shared.
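
To make those availability figures concrete, the downtime they imply over a year is easy to compute:

    # Downtime per year implied by common availability levels.
    HOURS_PER_YEAR = 24 * 365  # 8,760 hours

    for availability in (0.90, 0.99, 0.999):
        downtime_hours = HOURS_PER_YEAR * (1 - availability)
        print(f"{availability:.1%} available -> ~{downtime_hours:,.0f} hours of downtime per year")
    # 90.0% -> ~876 hours; 99.0% -> ~88 hours; 99.9% -> ~9 hours (the "more than 8 hours" above)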

In practice, this means agents that rely on cloud inference can be expected to fail up to 10 percent of the time in such environments, and 100 percent of the time whenever connectivity is lost or degraded.

Cloud-based inference does not just cost too much; it also cannot meet the performance or reliability requirements of most agentic systems, even under ideal assumptions.

5. “High-Value AI” Isn’t Where You Think It Is

There’s a lingering belief that the real value in AI is in the cloud: training giant models and aggregating data.

But that was true in the Web 2.0 era, when AI gave us access to knowledge, and we were the inference engine.

In the agentic era:

• Knowledge is commoditized

• Inference is where decisions and value are created

Whether it’s a drone deciding on a flight path, a surgical robot adjusting grip, or an AI agent monitoring factory safety, the value lies in the decision made in context, in real time.

You don’t monetize knowledge. You monetize action.

6. The Only Sustainable Path: Device-First, Cloud-Aware AI


[Image: Attributes of the Cloud-First vs. Device-First model.]
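
As a rough illustration of what a device-first, cloud-aware agent loop can look like, here is a minimal sketch; the interfaces, names, and confidence threshold below are hypothetical, not taken from this article or any specific product. The agent infers locally by default, escalates to the cloud only when the local answer is not confident enough, and falls back to the local result whenever the cloud is unreachable.

    # Illustrative device-first, cloud-aware decision loop.
    # All names, interfaces, and thresholds here are hypothetical.
    from dataclasses import dataclass
    from typing import Optional, Protocol

    CONFIDENCE_THRESHOLD = 0.7  # below this, the local answer is treated as unreliable

    class Model(Protocol):
        def predict(self, context: dict) -> tuple[str, float]: ...

    @dataclass
    class Decision:
        action: str
        confidence: float
        source: str  # "device" or "cloud"

    def infer(context: dict, local_model: Model, cloud_model: Optional[Model] = None) -> Decision:
        """Run inference on-device by default; treat the cloud as a fallback, not a default."""
        action, confidence = local_model.predict(context)  # raw context never leaves the device
        if confidence >= CONFIDENCE_THRESHOLD or cloud_model is None:
            return Decision(action, confidence, source="device")
        try:
            # Fallback path: escalate only the cases the local model cannot handle confidently.
            cloud_action, cloud_confidence = cloud_model.predict(context)
            return Decision(cloud_action, cloud_confidence, source="cloud")
        except (ConnectionError, TimeoutError):
            # Cloud unreachable or too slow: act on the best local answer rather than blocking.
            return Decision(action, confidence, source="device")

In this shape, the raw sensor context stays on the device, and the cloud sees only the occasional escalated request, which is the economic point of the sections above.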

7. Final Thought

Anyone arguing that AI inference must stay in the cloud is applying the logic of Web 2.0 to the reality of agentic AI. That model collapses under scale.

The AI economy will be built not on access to knowledge, but on the ability to act on it instantly, locally, and intelligently.

• Inference is the monetization layer

• Context lives on the device

• Cloud is a fallback, not a default

The future of AI is not more infrastructure. It’s smarter devices.


Anthony Loiacono

Founder, Chairman | Media, Licensing, Equine IOT Technologies

3w

Absolutely. Cloud-first thinking made sense yesterday. But in the agentic era, intelligence isn’t about scale—it’s about proximity. Latency, energy, and bandwidth aren’t just tech hurdles—they’re economic and user experience dead-ends. The edge isn’t an optimization—it’s the requirement. Fay Arjomandi

Srinath Hosur

Touch and Sensing ASIC Architect at Apple

1mo

Nice breakdown!!

Chad Pralle

Windows AI NPU Strategy and Execution, BS and MS EE, ex-Microsoft, ex-AT&T Wireless, ex-Motorola, former CMO

1mo

Having spent the last few years getting AI onto edge devices, you're definitely on the right track here. The problem is that most people don't really understand what AI is and think it's only about the frontier LLM models. You are right to highlight other types of AI here and the need for some of that to be on the device. But there will always be models that are suitable for cloud inferencing, and simply too large to execute locally. Running those models in the cloud isn't a fallback, it's a necessity.

Debarag Banerjee

Chief AI & Data Officer | Stanford PhD AI Pioneer | AI-Powered Tech Innovation for Global Enterprises | 15 Patents | Ex-Intel, Booking Holdings, Samsung, Flipkart, Jio | Successful Tech Entrepreneur -Avnera, WiViu

1mo

Completely agree, Siavash Alamouti. That's why my long-term roadmap includes Edge AI as a key element.
