LiteLLM and MCP: One Gateway to Rule All AI Models
Picture this: You’ve built a sophisticated AI tool integration, but your client suddenly wants to switch from OpenAI to Claude for cost reasons. Or maybe they need to use local models for sensitive data while leveraging cloud models for general queries. Without proper abstraction, each change means rewriting your integration code. LiteLLM combined with the Model Context Protocol (MCP) transforms this nightmare into a simple configuration change.
This article demonstrates how LiteLLM’s universal gateway integrates with MCP to create truly portable AI tool integrations. Whether using OpenAI, Anthropic, AWS Bedrock, or local models through Ollama, your MCP tools work seamlessly across all of them.
About Our MCP Server: The Customer Service Assistant
Before exploring how to connect LiteLLM to MCP, it’s helpful to understand what we’re connecting to. We built a full customer service MCP server using FastMCP in our complete MCP guide, and that server is the foundation for demonstrating different client integrations.
Our MCP server exposes three powerful tools that any AI system can leverage.
The server also provides a customer resource (customer://{customer_id}) for direct customer data access and includes a prompt template for generating professional customer service responses.
What makes this special is that these tools work with any MCP-compatible client, whether it’s built on OpenAI, Claude, LangChain, DSPy, or any other framework. The same server, built once, serves them all. This is the power of standardization that MCP brings to AI tool integration.
In this article, we’ll explore how LiteLLM connects to this server and enables these tools to work with over 100 different LLM providers.
Understanding LiteLLM: The Universal LLM Gateway
LiteLLM is more than just another AI library — it’s a universal translator for language models. Think of it as the Rosetta Stone of AI APIs, enabling you to write code once and run it with any supported model. Key features include a unified, OpenAI-compatible interface to more than 100 providers, automatic translation of request and response formats, built-in fallbacks and retries, cost tracking, and streaming and async support.
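As a quick illustration of that unified interface, the short snippet below (not from the example repository, and assuming the relevant API keys are set in your environment) calls two different providers with identical code; only the model string changes:

import litellm

messages = [{"role": "user", "content": "Summarize our refund policy in one sentence."}]

# Same call shape for every provider; only the model string changes.
openai_reply = litellm.completion(model="gpt-4.1-mini", messages=messages)
claude_reply = litellm.completion(model="claude-sonnet-4-0", messages=messages)

# Responses follow the OpenAI schema regardless of provider.
print(openai_reply.choices[0].message.content)
print(claude_reply.choices[0].message.content)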
For deeper dives into LiteLLM’s capabilities, check out my articles on building a multi-provider chat application and enhancing it with RAG and streaming.
The Power of LiteLLM + MCP
Combining LiteLLM with MCP creates unprecedented flexibility: the same MCP tools work with every provider LiteLLM supports, switching models becomes a configuration change instead of a rewrite, and concerns like cost, compliance, and fallback handling become routing decisions rather than separate integrations.
Building Your First LiteLLM + MCP Integration
Let’s create an integration demonstrating LiteLLM’s ability to use MCP tools across different providers.
Step 1: Setting Up the Integration
Here’s the core setup that connects LiteLLM to your MCP server:
import asyncio
import json

import litellm
from litellm import experimental_mcp_client
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

from config import Config


async def setup_litellm_mcp():
    """Set up LiteLLM with MCP tools."""
    # Create MCP server connection
    server_params = StdioServerParameters(
        command="poetry",
        args=["run", "python", "src/main.py"]
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize the MCP connection
            await session.initialize()

            # Load MCP tools in OpenAI format
            tools = await experimental_mcp_client.load_mcp_tools(
                session=session,
                format="openai"
            )
            print(f"Loaded {len(tools)} MCP tools")
The key insight here is the format="openai" parameter. LiteLLM's experimental MCP client loads tools in OpenAI's function calling format, which LiteLLM can then translate to any provider's format.
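For illustration, each entry in tools ends up looking like a standard OpenAI function-calling definition. The exact schema comes from the MCP server, so treat the tool name and parameters below as hypothetical placeholders:

# Roughly what one entry in `tools` looks like after load_mcp_tools(format="openai").
# The tool name and parameters here are hypothetical placeholders.
example_tool = {
    "type": "function",
    "function": {
        "name": "create_support_ticket",
        "description": "Create a support ticket for a customer",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "issue": {"type": "string"},
            },
            "required": ["customer_id", "issue"],
        },
    },
}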
Step 2: Multi-Model Testing
One of LiteLLM’s strengths is enabling the same code to work with multiple models:
# Dynamically select models based on configuration
models_to_test = []
if Config.LLM_PROVIDER == "openai":
    models_to_test.append(Config.OPENAI_MODEL)
elif Config.LLM_PROVIDER == "anthropic":
    models_to_test.append(Config.ANTHROPIC_MODEL)
else:
    # Test with multiple providers
    models_to_test = [Config.OPENAI_MODEL, Config.ANTHROPIC_MODEL]

for model in models_to_test:
    print(f"\nTesting with {model}...")

    # Same code works for any model
    response = await litellm.acompletion(
        model=model,
        messages=messages,
        tools=tools,
    )
This flexibility means you can test the same prompt and tool set against multiple providers, compare quality and cost side by side, and switch providers with a configuration change rather than a code change. The Config object used in these snippets comes from the example repository; a minimal sketch appears below.
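The sketch below is an assumption about Config's shape, based only on the attributes the code above references. It simply reads environment variables, and the default model names are placeholders taken from elsewhere in this article:

import os

class Config:
    """Minimal stand-in for the repository's config module (assumed shape)."""
    LLM_PROVIDER = os.getenv("LLM_PROVIDER", "openai")
    OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-4.1")
    ANTHROPIC_MODEL = os.getenv("ANTHROPIC_MODEL", "claude-sonnet-4-0")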
Step 3: Handling Tool Execution
LiteLLM standardizes tool execution across providers:
# Check if the model made tool calls
if hasattr(message, "tool_calls") and message.tool_calls:
    print(f"🔧 Tool calls made: {len(message.tool_calls)}")

    # Process each tool call
    for call in message.tool_calls:
        print(f" - Executing {call.function.name}")

        # Execute the tool through MCP
        arguments = json.loads(call.function.arguments)
        result = await session.call_tool(
            call.function.name,
            arguments
        )

        # Add tool result to conversation
        messages.append({
            "role": "tool",
            "content": str(result.content),
            "tool_call_id": call.id,
        })

    # Get final response with tool results
    final_response = await litellm.acompletion(
        model=model,
        messages=messages,
        tools=tools,
    )
This code demonstrates how LiteLLM surfaces tool calls in a single, OpenAI-style shape regardless of provider: you inspect message.tool_calls, execute each call through the MCP session, append the results as tool messages, and request a final response with the tool output in context.
Understanding the Flow
Let’s visualize how LiteLLM orchestrates communication between different LLM providers and MCP tools:
This diagram reveals LiteLLM’s role as a universal translator, converting between different provider formats while maintaining a consistent interface for your application, and it shows how that translation lets many different models drive the same MCP tooling.
Real-World Scenarios
Here are a few routing patterns that LiteLLM makes straightforward.
Scenario 1: Cost-Optimized Routing
# Route simple queries to cheaper models
if is_simple_query(message):
    model = "gpt-4.1-mini"  # Cheaper
else:
    model = "gpt-4.1"  # More capable

response = await litellm.acompletion(
    model=model,
    messages=messages,
    tools=tools
)
Scenario 2: Compliance-Based Routing
# Use local models for sensitive data
if contains_pii(message):
    model = "ollama/llama2"  # Local model
else:
    model = "claude-sonnet-4-0"  # Cloud model
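Likewise, contains_pii is left to the reader. A naive regex-based sketch is shown below; a real deployment would use a proper PII detection library:

import re

# Hypothetical sketch: flag obvious email addresses and SSN-like patterns as PII.
PII_PATTERNS = [
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email addresses
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN format
]

def contains_pii(message: str) -> bool:
    return any(pattern.search(message) for pattern in PII_PATTERNS)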
Scenario 3: Fallback Handling
# LiteLLM can automatically handle fallbacks
models = ["gpt-4.1", "claude-sonnet-4-0", "ollama/mixtral"]

for model in models:
    try:
        response = await litellm.acompletion(
            model=model,
            messages=messages,
            tools=tools
        )
        break  # Success, exit loop
    except Exception as e:
        print(f"Failed with {model}, trying next...")
        continue
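If every model in the list fails, the loop above falls through silently. One way to harden it, sketched from the same code rather than taken from the repository, is a helper that raises when all providers are exhausted:

async def complete_with_fallbacks(models, messages, tools):
    """Try each model in order; raise if every provider fails."""
    last_error = None
    for model in models:
        try:
            return await litellm.acompletion(
                model=model,
                messages=messages,
                tools=tools,
            )
        except Exception as e:
            print(f"Failed with {model}: {e}")
            last_error = e
    raise RuntimeError("All providers failed") from last_error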
Architectural Insights
The complete architecture shows how LiteLLM bridges multiple worlds:
This architecture provides several key benefits: a single integration codebase that talks to every provider, MCP tools that are built once and reused everywhere, and routing decisions (cost, compliance, availability) that live in configuration rather than in application logic.
Advanced Patterns
Pattern 1: Provider-Specific Optimizations
# Customize parameters per provider
provider_configs = {
    "gpt-4.1": {"temperature": 0.7, "max_tokens": 2000},
    "claude-sonnet-4-0": {"temperature": 0.5, "max_tokens": 4000},
    "ollama/mixtral": {"temperature": 0.8, "max_tokens": 1000}
}

model = select_best_model(query)
config = provider_configs.get(model, {})

response = await litellm.acompletion(
    model=model,
    messages=messages,
    tools=tools,
    **config  # Provider-specific parameters
)
Pattern 2: Cost and Performance Tracking
# LiteLLM tracks costs automatically
from litellm import completion_cost

response = await litellm.acompletion(
    model=model,
    messages=messages,
    tools=tools
)

cost = completion_cost(completion_response=response)
print(f"This request cost: ${cost:.4f}")
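Because every response comes back in the same shape, tracking spend across providers is just an accumulation. The running-total wrapper below is an assumption for illustration; completion_cost is LiteLLM’s own helper:

from litellm import completion_cost

total_cost = 0.0

async def tracked_completion(**kwargs):
    """Hypothetical wrapper that accumulates cost across calls to any provider."""
    global total_cost
    response = await litellm.acompletion(**kwargs)
    total_cost += completion_cost(completion_response=response)
    return response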
Pattern 3: Streaming with Tool Support
# Stream responses while maintaining tool support
async for chunk in await litellm.acompletion(
    model=model,
    messages=messages,
    tools=tools,
    stream=True
):
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

    # Handle streaming tool calls
    if hasattr(chunk.choices[0].delta, "tool_calls"):
        # Process tool calls in real-time
        pass
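Streaming tool calls arrive as fragments that you accumulate by index before executing anything. A sketch of that bookkeeping follows; the field names assume the OpenAI streaming schema that LiteLLM mirrors:

# Accumulate streamed tool-call fragments keyed by their index.
pending_calls = {}

def collect_tool_call_deltas(delta):
    """Merge partial tool-call chunks; arguments arrive as string fragments."""
    for tc in delta.tool_calls or []:
        entry = pending_calls.setdefault(tc.index, {"id": None, "name": None, "arguments": ""})
        if tc.id:
            entry["id"] = tc.id
        if tc.function and tc.function.name:
            entry["name"] = tc.function.name
        if tc.function and tc.function.arguments:
            entry["arguments"] += tc.function.arguments
    # Once the stream finishes, json.loads each entry's arguments and call the MCP tool.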
Getting Started
Clone the example repository:
git clone git@github.com:RichardHightower/mcp_article1.git
cd mcp_article1
Install LiteLLM and dependencies (follow instructions in README.md):
poetry add litellm mcp
Configure your providers:
# .env file
OPENAI_API_KEY=your-key
ANTHROPIC_API_KEY=your-key
# Add any other provider keys
Run the integration:
poetry run python src/litellm_integration.py
Key Takeaways
The combination of LiteLLM and MCP represents the ultimate flexibility in AI tool integration: one set of MCP tools, one integration codebase, and the freedom to run it against more than 100 providers, swapping models for cost, capability, or compliance reasons without touching the code.
By abstracting the tool layer (MCP) and the model layer (LiteLLM), you create AI systems that adapt to changing requirements without code changes. This is enterprise-grade flexibility at its finest.
Next Steps
Ready to build provider-agnostic AI tools? Start by cloning the example repository and running the LiteLLM integration against the customer service MCP server, then point the same code at your own MCP server, and add providers and routing rules as your requirements evolve.
The future of AI isn’t tied to a single provider — it’s about choosing the right tool for each job. With LiteLLM and MCP, you can make that choice without rewriting your code.
Want to explore more integration patterns? Check out our articles on OpenAI + MCP, DSPy’s self-optimizing approach, and LangChain workflows. For the complete guide to building MCP servers, see our comprehensive guide.
About the Author
Rick Hightower brings extensive enterprise experience as a former executive and distinguished engineer at a Fortune 100 company, where he specialized in Machine Learning and AI solutions to deliver an intelligent customer experience. His expertise spans the theoretical foundations and practical applications of AI technologies.
As a TensorFlow-certified professional and graduate of Stanford University’s comprehensive Machine Learning Specialization, Rick combines academic rigor with real-world implementation experience. His training includes mastery of supervised learning techniques, neural networks, and advanced AI concepts, which he has successfully applied to enterprise-scale solutions.
With a deep understanding of both the business and technical aspects of AI implementation, Rick bridges the gap between theoretical machine learning concepts and practical business applications, helping organizations leverage AI to create tangible value.