RAG is So Yesterday - Hello MCP
As someone who's been knee-deep in AI at Cisco, I've seen trends come and go, and today I want to talk about one that's starting to feel a bit dated: Retrieval-Augmented Generation, or RAG. Don't get me wrong: RAG has been a game-changer for making LLMs more accurate by pulling in external knowledge and combining it with what the model already learned in training. But as we push the boundaries of what AI can do, there's something new I'm excited about: the Model Context Protocol, or MCP. It's not hype; it's just the next logical step in building smarter, more scalable systems. Let me explain why I think RAG might be yesterday's news, and how MCP could help us tackle real-world challenges more effectively.
First, a quick refresher on RAG for context. Imagine you're in a massive library, and you ask a question of the librarian (that's the LLM). The librarian is super smart, but their knowledge is limited to what they've learned from the books already on the shelves. If you ask about something in a recently published book that hasn't made it into the library yet, they might infer a convincing-sounding answer anyway (hallucinate). Here's where RAG comes in and can help.
Retrieval: If the librarian thinks there may be more relevant information (context) out there, they run off to search a filing cabinet of new material that hasn't yet made it into the library and grab a few excerpts that may help answer the question.
Augmentation: The librarian then weaves (generates) that information into their answer.
RAG is efficient for injecting fresh info into models that might otherwise hallucinate or rely on outdated training data to answer your question. It's also much less expensive than model fine-tuning, so it's a great way to get started.
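To make the retrieve-then-augment flow concrete, here's a minimal sketch in Python. The `vector_store.search` and `llm.generate` calls are hypothetical placeholders rather than any specific library; the point is the shape of the pipeline, not a production implementation.

```python
def answer_with_rag(question: str, vector_store, llm, top_k: int = 3) -> str:
    """Minimal retrieve-then-augment loop (hypothetical vector store and LLM clients)."""
    # Retrieval: pull the most relevant excerpts for this question.
    excerpts = vector_store.search(query=question, limit=top_k)

    # Augmentation: weave the excerpts into the prompt as extra context.
    context = "\n\n".join(doc.text for doc in excerpts)
    prompt = (
        "Answer the question using the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

    # Generation: the LLM produces the final answer from question + context.
    return llm.generate(prompt)
```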
But here's the rub: as queries get more complex, that library search can become a bottleneck. Imagine you ask a follow-up question, and the answer the LLM needed wasn't in the books it went searching for just a moment ago.
A basic RAG system might struggle here if the excerpts it pulled for the first question don't contain the information needed for the second. The system has to start the search over, and if the follow-up relies heavily on context from earlier turns in the conversation, a simple RAG approach may not connect those dots efficiently.
Before we get to MCP, let's introduce function calling, supported by many newer LLMs. This is essentially the LLM's way of "phoning a friend" for help beyond its training data. When you ask a question like "What's the weather in Sydney?", the model doesn't hallucinate an answer. Instead, it's trained to output a structured request (often JSON) asking the app to call an external function, like getWeather(city: "Sydney").
The app then executes that function to get the result, and feeds it back to the LLM for a natural response. It's a game-changer for agents, enabling multi-step reasoning like ReAct (Reason + Act). But here's the catch: it's model-specific. OpenAI does it one way, Anthropic another, and you often need custom code for each tool, which can get messy in complex apps.
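Here's a vendor-neutral sketch of that exchange. The JSON shape is representative rather than any particular provider's exact schema, and `get_weather` is a made-up function for illustration.

```python
import json

# A made-up tool the app exposes to the model (illustrative only).
def get_weather(city: str) -> dict:
    # A real app would call a weather API here; we return a stub.
    return {"city": city, "forecast": "22C and sunny"}

# 1. Instead of answering directly, the model emits a structured tool call.
#    The exact schema varies by vendor; this JSON is representative.
model_output = '{"tool": "get_weather", "arguments": {"city": "Sydney"}}'

# 2. The app parses the request and executes the matching function.
request = json.loads(model_output)
tools = {"get_weather": get_weather}
result = tools[request["tool"]](**request["arguments"])

# 3. The result is fed back to the model, which phrases a natural-language reply.
followup_prompt = f"Tool result: {json.dumps(result)}. Now answer the user's question."
print(followup_prompt)
```

Notice that steps 2 and 3 are all custom app code: the model decides what to call, but you own the wiring. That's the gap MCP goes after.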
Now enter MCP, short for Model Context Protocol, introduced by Anthropic in late 2024. Think of it as a universal adapter for LLMs, like USB-C for AI tools. MCP standardizes how models connect to external data, services, or actions through a client-server setup. An MCP server exposes tools (e.g., database queries or file access) in a consistent format, while an MCP client (integrated into your app or IDE) discovers and calls them dynamically.
Open-source LLMs such as Meta's Llama family and Mistral 7B can take part too: an MCP client discovers the available tools and presents them to the model automatically. What really sets MCP apart is its integration with tools via dedicated servers. These MCP servers act as a handyman's toolbox, not just fetching data but actively exposing tools the agent can use to reason and respond. For example, imagine an MCP server tapping into a SQL database for a quick lookup on customer metrics, reaching into Splunk for observability data, or searching a PDF knowledge library to extract policy details.
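Here's roughly what such a server looks like, as a sketch using the Python MCP SDK's FastMCP helper. Exact names and arguments may differ between SDK versions, and `lookup_customer_metric` is a hypothetical example tool, not a real Cisco integration.

```python
# Minimal MCP server exposing one tool (pip install mcp).
# API details may vary across SDK versions; the tool itself is hypothetical.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("customer-metrics")

@mcp.tool()
def lookup_customer_metric(customer_id: str, metric: str) -> str:
    """Return a named metric for a customer (stubbed for illustration)."""
    # A real server would query SQL or Splunk here; we return a placeholder.
    return f"{metric} for {customer_id}: 42"

if __name__ == "__main__":
    # Serves the tool over stdio so any MCP client can discover and call it.
    mcp.run()
```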
The key differences? Function calling is LLM-driven: the model decides what to call and formats it, but execution relies on your app's custom logic. It's great for simple, one-off integrations but scales poorly with many tools or vendors. MCP layers on top, standardizing how calls happen—tool discovery, invocation, and responses are uniform, making it more interoperable for enterprise use. Function calling feels like ad-hoc wiring; MCP is a plug-and-play ecosystem. From what I've seen, MCP reduces integration headaches by 50% or more in multi-tool setups, though it's newer and requires some setup.
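To see what "uniform discovery and invocation" means in practice, here's the client side of the same sketch, again using the Python MCP SDK (API details may vary by version; `metrics_server.py` refers to the hypothetical server above).

```python
# Sketch of an MCP client: the same discovery/invocation calls work
# against any MCP server, regardless of who wrote it.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the hypothetical server from the previous sketch as a subprocess.
    params = StdioServerParameters(command="python", args=["metrics_server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discovery is standardized: every MCP server answers the same way.
            tools = await session.list_tools()
            print([t.name for t in tools.tools])
            # Invocation is standardized too.
            result = await session.call_tool(
                "lookup_customer_metric",
                arguments={"customer_id": "ACME-001", "metric": "open_tickets"},
            )
            print(result)

asyncio.run(main())
```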
Where do they live in the stack? Function calling is baked into the LLM's core (fine-tuned during training) and handled in the toolchain—think backend servers or agent frameworks like LangChain. In an agentic web app, it's often in the middleware, parsing LLM outputs and executing calls. MCP, however, spans the ecosystem: servers run as lightweight processes (local or remote, e.g., for GitHub or Splunk access), while clients integrate into frontends like IDEs (Cursor, Cline, Claude Desktop) or custom agents. At Cisco, we're leveraging agentic protocols in our AI platforms to ensure secure tool access across hybrid clouds—it's all about making AI reliable without exposing sensitive data.
Looking ahead, I believe the future isn't just chatting with a RAG-powered bot that does a simple lookup. Multiple agentic calls will be operating in the background as you work. The next decade will be about building agentic systems: autonomous setups where MCP connects LLMs and tools dynamically. This scales "test-time compute," meaning we can pour more processing power into reasoning during inference without bloating the model itself. It also ramps up tool usage in the reasoning loop, allowing for iterative problem-solving.
Picture an AI agent at your company troubleshooting an issue: the orchestration agent (the agentic app) uses reasoning to determine the workflow and the tools needed to act on the prompt it received. It then calls MCP to pull historical logs via SQL or Splunk, cross-references them with a PDF search of internal policies and procedures, consults the LLM for insights, and loops back if needed, all without human intervention (see the sketch below). It's practical scaling that addresses real enterprise needs, like handling massive data volumes on site without needing to move the data.
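That troubleshooting scenario boils down to a reason-act loop. Here's a deliberately simplified sketch; `llm_plan_next_step` and `call_mcp_tool` are hypothetical helpers standing in for the LLM call and the MCP client shown earlier.

```python
def troubleshoot(issue: str, llm_plan_next_step, call_mcp_tool, max_steps: int = 5) -> str:
    """Simplified orchestration loop: reason, act via MCP, feed results back."""
    context: list[str] = [f"Issue reported: {issue}"]
    for _ in range(max_steps):
        # The LLM reasons over everything gathered so far and picks the next action,
        # e.g. {"tool": "splunk_search", "args": {...}} or {"done": True, "answer": "..."}.
        step = llm_plan_next_step(context)
        if step.get("done"):
            return step["answer"]
        # The chosen tool (SQL, Splunk, PDF search, ...) is invoked through MCP.
        observation = call_mcp_tool(step["tool"], step["args"])
        # The observation is looped back into the context for the next round of reasoning.
        context.append(f"{step['tool']} returned: {observation}")
    return "Escalating to a human: step budget exhausted."
```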
This isn't without challenges. Implementing MCP requires thoughtful design to avoid context overload or tool integration hiccups.
There are MCP servers for Cisco Networking, Webex, and Splunk that can be integrated into agentic apps.
At Cisco, we're committed to leading in AI innovation, ensuring our solutions are reliable and ethical. We're investing in agentic frameworks, including A2A and AGNTCY, to help create the next generation of agentic protocols, making AI feel less like a black box and more like a trusted colleague. Just last week, Jeetu Patel (Cisco's President & Chief Product Officer) announced Cisco is donating AGNTCY to the Linux Foundation to ensure that future agent-to-agent interoperability is truly open.
In the end, MCP isn't about ditching RAG entirely—it's evolution. It's a humble reminder that AI is a journey, and we're all figuring it out together. What do you think? Have you experimented with similar concepts? Drop a comment below—I'd love to hear your take.
For more technical detail, read this blog post by @Omar Santos