Why Cloud-Centric AI Agents Won’t Scale: Device-First Is the Only Viable Option
As we move into the era of agentic AI, where intelligent agents will assist us across every domain from healthcare to robotics, one flawed assumption continues to echo in boardrooms and product meetings:
“Most inference will happen in the cloud.”
That assumption is not just wrong. It’s economically, physically, and strategically unsustainable.
1. Cloud Inference Needs More Than a Prompt
Running AI inference in the cloud isn’t just about sending a text prompt. Real-world agents, whether they run on robots, autonomous vehicles, or smart devices, need to send rich, time-sensitive context to function correctly:
• Live video and audio feeds
• LiDAR and sensor fusion data
• Environmental conditions (temperature, noise, pressure)
• Interaction history, local state, user inputs
This isn’t theoretical. These are the inputs required to infer correctly. Without context, models hallucinate or misfire.
And this context must be streamed to the cloud for every inference, over and over again.
2. What Context Data Really Means (per Agent)
Let’s assume:
• 3 camera feeds @ 720p/30fps (compressed): ~2 Mbps
• 1 LiDAR stream: ~1 Mbps
• Microphone/audio: ~0.2 Mbps
• Environmental + metadata: ~0.1 Mbps
• Context snapshots, app data: ~1 GB/day
Total per agent: ~3.3 Mbps of streams, rounded down to ~3 Mbps ≈ 32.4 GB/day (plus ~1 GB/day of snapshots)
This is a conservative figure; it discounts autonomous vehicles and other smart devices that can generate up to 4 TB of data per day each.
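The per-agent arithmetic above can be sanity-checked in a few lines, using the stream rates from the list:

```python
# Back-of-envelope: convert a sustained bitrate to daily data volume.
STREAM_MBPS = 2.0 + 1.0 + 0.2 + 0.1   # cameras + LiDAR + audio + env/metadata
SECONDS_PER_DAY = 86_400

def mbps_to_gb_per_day(mbps: float) -> float:
    """Megabits per second -> gigabytes per day (8 bits/byte, 1 GB = 1000 MB)."""
    return mbps * SECONDS_PER_DAY / 8 / 1000

streaming_gb = mbps_to_gb_per_day(STREAM_MBPS)  # ~35.6 GB/day at 3.3 Mbps
rounded_gb = mbps_to_gb_per_day(3.0)            # ~32.4 GB/day at the rounded 3 Mbps
print(f"{streaming_gb:.1f} GB/day streaming; {rounded_gb:.1f} GB/day at 3 Mbps")
```

The article's 32.4 GB/day figure corresponds to the rounded 3 Mbps rate; the raw 3.3 Mbps sum gives roughly 35.6 GB/day before snapshots.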
Let’s use credible third-party projections for the number of intelligent agents in the near future.
• Gartner: By 2028, “15 billion connected things will include embedded AI.”
• OECD / McKinsey: By 2030, over 100 billion connected devices, many with local AI capabilities
• Anthropic, OpenAI, and Google have all suggested that every user/device/app will host one or more persistent agents
Let’s conservatively model for:
• 10 billion agents (low-end conservative scenario)
• 1 million tokens/day/agent (still conservative)
For 10 billion agents, that’s:
• 324 exabytes of context data per day
• Sustained throughput of ~30,000 Tbps globally (10 billion agents × 3 Mbps)
• Roughly 25× the entire world’s international internet capacity (1,217 Tbps, source: TeleGeography 2023)
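The fleet-level arithmetic follows directly from the per-agent rate and the agent count (the 1,217 Tbps international-capacity figure is taken from the source cited above):

```python
# Fleet-level sustained throughput for 10 billion agents at 3 Mbps each.
AGENTS = 10_000_000_000          # low-end scenario from the article
MBPS_PER_AGENT = 3.0
INTL_CAPACITY_TBPS = 1_217       # TeleGeography 2023, international bandwidth

fleet_tbps = AGENTS * MBPS_PER_AGENT / 1_000_000   # Mbps -> Tbps
ratio = fleet_tbps / INTL_CAPACITY_TBPS
print(f"{fleet_tbps:,.0f} Tbps sustained; ~{ratio:.0f}x international capacity")
```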
3. Bandwidth Cost Just to Send Context to the Cloud
Let’s use a typical hyperscaler internal transfer cost of $0.02 per GB:
• 10B agents × 32.4 GB = 324 EB/day
• 1 EB = 1 billion GB → 324 EB = 324B GB/day
• × $0.02/GB = $6.48 billion/day
• $2.36 trillion/year, just to move context to the cloud
This doesn’t include inference cost, compute infrastructure, or retrievals; it is just the cost of sending context upstream.
And that’s assuming no duplication, retries, or encryption overhead.
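The cost arithmetic above, using the article's own inputs, works out as follows:

```python
# Daily and annual cost of moving context upstream, at the stated $0.02/GB rate.
GB_PER_AGENT_PER_DAY = 32.4
AGENTS = 10_000_000_000
COST_PER_GB = 0.02               # hyperscaler internal transfer, USD

daily_gb = AGENTS * GB_PER_AGENT_PER_DAY   # 3.24e11 GB = 324 EB/day
daily_cost = daily_gb * COST_PER_GB        # $6.48 billion/day
annual_cost = daily_cost * 365             # ~$2.37 trillion/year
print(f"${daily_cost/1e9:.2f}B/day; ${annual_cost/1e12:.2f}T/year")
```

Note this still excludes retries, duplication, and encryption overhead, as the article points out.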
4. Real-Time Agents Cannot Wait for the Cloud
Cloud inference latency typically ranges from 300 to 1,000 milliseconds. In contrast, real-world agents such as robotics systems, AR glasses, and autonomous vehicles require responses within 5 to 30 milliseconds. Even if we assume the cloud is always available and responsive, the latency gap alone makes real-time decision-making impossible.
In practice, cloud services are not always available. Availability figures ranging from 90 percent to 99.9 percent are common in service level agreements, depending on region, provider, and network conditions. For example, 99.9 percent availability still allows for more than 8 hours of downtime per year. In many consumer-grade or edge environments, availability may be closer to 90 percent, especially when connectivity is variable or infrastructure is shared.
This means agents relying on cloud inference can fail up to 10 percent of the time in environments with consumer-grade connectivity, and 100 percent of the time whenever connectivity is lost or degraded.
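The downtime figures quoted above follow from a one-line conversion of availability into hours of outage per year:

```python
# Convert an SLA availability fraction into expected downtime per year.
HOURS_PER_YEAR = 24 * 365

def downtime_hours_per_year(availability: float) -> float:
    """E.g. 0.999 ('three nines') -> ~8.76 hours/year of downtime."""
    return (1 - availability) * HOURS_PER_YEAR

print(f"99.9%: {downtime_hours_per_year(0.999):.2f} h/year")
print(f"90.0%: {downtime_hours_per_year(0.90):.0f} h/year (~36.5 days)")
```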
Cloud-based inference is not only too expensive; it cannot meet the performance or reliability requirements of most agentic systems, even under ideal assumptions.
5. “High-Value AI” Isn’t Where You Think It Is
There’s a lingering belief that the real value in AI lives in the cloud: training giant models and aggregating data.
But that was true in the Web 2.0 era, when AI gave us access to knowledge, and we were the inference engine.
In the agentic era:
• Knowledge is commoditized
• Inference is where decisions and value are created
Whether it’s a drone deciding on a flight path, a surgical robot adjusting grip, or an AI agent monitoring factory safety, the value lies in the decision made in context, in real time.
You don’t monetize knowledge. You monetize action.
6. The Only Sustainable Path: Device-First, Cloud-Aware AI
The architecture that scales inverts the default: context stays on the device, inference runs locally against it, and the cloud is reserved for what genuinely requires it, such as training, fleet-level aggregation, and models too large to run at the edge. The device acts in real time; the cloud assists asynchronously.
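A device-first, cloud-aware control loop might look like the following minimal sketch. All names here (`LocalModel`, `cloud_infer`, the latency budget) are hypothetical illustrations, not a real API:

```python
# Hypothetical sketch: act locally within a real-time budget; treat the cloud
# as optional, failure-tolerant enrichment rather than the critical path.
import time
from typing import Optional

LATENCY_BUDGET_MS = 30  # hard real-time budget cited in the article

class LocalModel:
    """Stand-in for an on-device model (e.g. NPU/GPU-resident)."""
    def infer(self, context: dict) -> str:
        return f"act({context['sensor']})"

def cloud_infer(context: dict) -> str:
    """Stand-in for a cloud call; simulated as unreachable here."""
    raise TimeoutError("network unavailable")

def decide(model: LocalModel, context: dict) -> str:
    """Critical path: context never leaves the device."""
    start = time.monotonic()
    action = model.infer(context)
    elapsed_ms = (time.monotonic() - start) * 1000
    if elapsed_ms > LATENCY_BUDGET_MS:
        pass  # would log a budget violation; still act on the local result
    return action

def enrich_offline(context: dict) -> Optional[str]:
    """Cloud as fallback: non-critical, tolerant of outages."""
    try:
        return cloud_infer(context)
    except (TimeoutError, ConnectionError):
        return None  # the agent keeps working without it
```

The design point is that `decide` never blocks on the network: the cloud path can fail 10 or 100 percent of the time without stopping the agent from acting.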
7. Final Thought
Anyone arguing that AI inference must stay in the cloud is applying the logic of Web 2.0 to the reality of agentic AI. That model collapses under scale.
The AI economy will be built not on access to knowledge, but on the ability to act on it instantly, locally, and intelligently.
• Inference is the monetization layer
• Context lives on the device
• Cloud is a fallback, not a default
The future of AI is not more infrastructure. It’s smarter devices.