Multi-Provider Chat App: LiteLLM, Streamlit, Ollama, Gemini, Claude, Perplexity, and Modern LLM Integration
Ever dreamed of chatting with multiple AI models seamlessly? This tutorial shows how to build your own multi-provider chat app that connects ChatGPT, Claude, Gemini, Perplexity, and local models in a single conversation, using LiteLLM and Streamlit for a clean, user-friendly experience.
The app lets users switch providers, manage conversations, and tune settings with minimal code. It also runs local models through Ollama for privacy and cost control. Future enhancements include RAG capabilities and file upload support.
Building a Multi-Provider Chat App: LiteLLM, Streamlit, and Modern LLM Integration
Have you ever wanted to create your own chat application that can leverage multiple language models from different providers? Imagine switching seamlessly between ChatGPT, Claude, Gemini, and even local models running on your own machine — all within the same conversation interface.
In this tutorial, we’ll explore how to build exactly that: a powerful, flexible chat application that supports multiple LLM providers through a clean, user-friendly interface. Best of all, we’ll do it with surprisingly little code thanks to the power of LiteLLM and Streamlit.
What We’re Building
We’re creating a multi-provider chat application that allows users to:
- Switch seamlessly between providers (OpenAI, Anthropic Claude, Google Gemini, Perplexity, and local Ollama models)
- Select a specific model and adjust settings such as temperature from a sidebar
- Save, load, and manage conversations across sessions
- Run open-source models locally through Ollama for privacy and cost control
Let’s dive into how this application works and explore the technologies that make it possible.
Streamlit: Rapid Application Development for AI
Streamlit is a revolutionary Python library that transforms the way developers build data and AI applications. Unlike traditional web frameworks that require HTML, CSS, and JavaScript knowledge, Streamlit lets you create interactive web applications using pure Python. This dramatically accelerates development time — what might take days or weeks with conventional frameworks can often be accomplished in hours with Streamlit.
I’ve explored Streamlit extensively in a series of articles, from basic concepts and implementations to more advanced techniques and real-world applications. For those wanting a comprehensive guide, check out our Streamlit Mastery book, which covers everything from fundamentals to deployment.
In our chat application, Streamlit powers the entire user interface — from the chat display to the sidebar controls — with just a few hundred lines of Python. This efficiency allows us to focus on integrating LLM providers rather than wrestling with frontend development.
Ollama: Bringing AI to Your Local Machine
Ollama represents a significant advancement in democratizing access to powerful language models. It allows you to run open-source large language models directly on your own hardware, eliminating API costs and addressing privacy concerns. In our application, we’ve integrated several strong open-source models through Ollama, including Llama 3.3, Llama 4 Scout, and Gemma 3.
Our application dynamically adjusts settings based on the selected model’s requirements, optimizing for performance and memory usage while providing helpful guidance to users about resource needs for different models.
LiteLLM: One API to Rule Them All
LiteLLM serves as the unifying layer that enables our application to communicate seamlessly with multiple LLM providers through a consistent interface. It abstracts away the differences between API formats, authentication methods, and response structures, allowing us to switch between providers with minimal code changes.
This library handles everything from formatting messages correctly for each provider to managing API keys and handling streaming responses. Without LiteLLM, we would need to implement separate client code for each provider, significantly increasing the complexity of our application.
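To make this concrete, here is a minimal, illustrative sketch of LiteLLM's unified call shape. The model names and the local Ollama endpoint are examples, not the app's configuration:

import litellm

messages = [{"role": "user", "content": "Summarize LiteLLM in one sentence."}]

# OpenAI (reads OPENAI_API_KEY from the environment)
openai_reply = litellm.completion(model="gpt-4o", messages=messages)

# Anthropic (reads ANTHROPIC_API_KEY from the environment)
claude_reply = litellm.completion(
    model="anthropic/claude-3-haiku-20240307",
    messages=messages,
)

# A local Ollama model (assumes an Ollama server on the default port)
local_reply = litellm.completion(
    model="ollama/gemma3:27b",
    messages=messages,
    api_base="http://localhost:11434",
)

# All three responses share the same OpenAI-style shape
print(openai_reply.choices[0].message.content)

The same call signature works for every provider; only the model string (and, for Ollama, the api_base) changes.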
Provider Integration Challenges and Solutions
Integrating multiple LLM providers presented several interesting challenges, each requiring custom solutions:
OpenAI (GPT-4o, GPT-4.1) required handling specific response formats and settings like reasoning_effort. We implemented special handling for newer models like GPT-4o that have different temperature restrictions and token limits compared to older models.
Google Gemini models needed particular attention to message formatting and response parsing. We created a dedicated provider class that correctly handles Gemini’s API quirks while maintaining the same interface as other providers.
Anthropic Claude models (Claude 3 Opus, Sonnet, and Haiku) required adaptation to their specific message structure and system prompt positioning. Our implementation dynamically adjusts token limits based on the specific Claude model being used.
Perplexity presented unique challenges with its strict alternating message format requirements. We implemented special validation logic to ensure messages are always properly structured before sending requests.
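The project's validation code isn't reproduced here, but the core idea can be sketched in a few lines. The helper name and merge strategy below are illustrative, not the actual implementation:

from typing import Dict, List

def enforce_alternating_roles(messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Merge consecutive same-role messages so the history strictly
    alternates between user and assistant turns, which Perplexity expects.
    System messages are kept at the front."""
    system = [m for m in messages if m["role"] == "system"]
    turns: List[Dict[str, str]] = []
    for msg in messages:
        if msg["role"] == "system":
            continue
        if turns and turns[-1]["role"] == msg["role"]:
            # Collapse back-to-back messages from the same role
            turns[-1]["content"] += "\n\n" + msg["content"]
        else:
            turns.append(dict(msg))
    return system + turns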
Each provider integration required careful tuning and customization while maintaining a consistent interface for our application. This approach allows users to seamlessly switch between providers while experiencing the unique strengths of each model.
The Technology Stack
Our application leverages several key technologies:
LiteLLM is the secret sauce that allows us to create a unified interface for multiple AI model providers. Instead of handling different API formats and authentication methods for each provider, LiteLLM provides a consistent abstraction layer.
Streamlit gives us the ability to quickly build a responsive web interface without writing HTML, CSS, or JavaScript. It turns Python code into interactive web applications with minimal effort.
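As a taste of how little code that takes, here is a bare-bones Streamlit chat loop. It is illustrative only and echoes the user instead of calling an LLM:

import streamlit as st

st.title("Multi-Provider Chat")

# Keep the chat history across reruns
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Accept new input and respond
if prompt := st.chat_input("Say something"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)
    with st.chat_message("assistant"):
        reply = f"Echo: {prompt}"  # an LLM call would go here
        st.markdown(reply)
    st.session_state.messages.append({"role": "assistant", "content": reply})

Run it with streamlit run app.py and you have a working chat shell before any provider is wired in.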
Project Structure
Before diving into the details, let’s get a high-level overview of our project structure:
chat/
├── pyproject.toml       # Project dependencies and metadata
├── test/                # Test directory
│   └── chat/            # Test files for the chat application
├── docs/                # Documentation
│   └── images/          # Images for documentation
└── src/                 # Source code
    └── chat/            # Main application code
        ├── __init__.py
        ├── app.py           # Main application entry point
        ├── ai/              # LLM provider integrations
        ├── ui/              # User interface components
        ├── conversation/    # Conversation models and storage
        └── util/            # Utility functions
Core Components
Let’s now look at the main components of our application:
Getting Started with LiteLLM
LiteLLM is a powerful library that provides a unified interface to multiple LLM providers. Let’s see how we’ve implemented the provider integration.
The LLM Provider Abstract Base Class
At the core of our provider integration is an abstract base class that defines the interface for all LLM providers:
class LLMProvider(ABC):
    """Abstract base class for LLM providers."""

    @abstractmethod
    async def generate_completion(
        self,
        prompt: str,
        output_format: str = "text",
        options: Optional[Dict[str, Any]] = None,
        conversation: Optional[Conversation] = None
    ) -> str:
        """Generate a completion from the LLM for the given prompt."""
        pass

    async def generate_json(
        self,
        prompt: str,
        schema: Dict[str, Any],
        options: Optional[Dict[str, Any]] = None,
        conversation: Optional[Conversation] = None
    ) -> Dict[str, Any]:
        """Generate JSON output matching the schema."""
        # Implementation details...
This abstract class ensures that all providers implement the same interface, making them interchangeable in our application.
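Because every provider honors this interface, the rest of the application can be written against the abstraction and swap providers freely. The small factory below is a hypothetical illustration of that idea; in the project, this role is played by provider_manager.py, and the options keys are assumptions:

import asyncio

def create_provider(name: str, model: str) -> LLMProvider:
    """Hypothetical factory for illustration only."""
    if name == "Anthropic":
        return AnthropicProvider(model=model)
    if name == "Ollama":
        return OllamaProvider(model=model)
    raise ValueError(f"Unknown provider: {name}")

async def demo() -> None:
    provider: LLMProvider = create_provider("Anthropic", "claude-3-7-sonnet-latest")
    reply = await provider.generate_completion(
        "Explain LiteLLM in one sentence.",
        options={"temperature": 0.7},  # option keys assumed; see the providers
    )
    print(reply)

asyncio.run(demo())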
Provider Implementations
Let’s look at one of our provider implementations, the AnthropicProvider:
class AnthropicProvider(LLMProvider):
    """Integration with Anthropic Claude models using LiteLLM."""

    def __init__(
        self,
        api_key: Optional[str] = None,
        model: str = "claude-3-7-sonnet-latest"
    ):
        self.api_key = api_key or os.getenv("ANTHROPIC_API_KEY")
        if not self.api_key:
            raise ValueError(
                "Anthropic API key is required. "
                "Set it in .env or as an environment variable."
            )

        self.model = model
        self.original_model_name = model
        # Use LiteLLM's model naming convention for Anthropic
        if not self.model.startswith("anthropic/"):
            self.model = f"anthropic/{model}"

        os.environ["ANTHROPIC_API_KEY"] = self.api_key

        try:
            self.client = litellm
            logger.info(
                f"AnthropicProvider initialized with model: {self.model}"
            )
        except ImportError:
            logger.error(
                "litellm package not installed. "
                "Please install it (e.g., pip install litellm)"
            )
            raise

    async def generate_completion(
        self,
        prompt: str,
        output_format: str = "text",
        options: Optional[Dict[str, Any]] = None,
        conversation: Optional[Conversation] = None
    ) -> str:
        """Generate a completion from Claude using LiteLLM."""
        # Implementation details...
We’ve implemented similar classes for OpenAI, Google Gemini, Perplexity, and Ollama, each following the same pattern but with provider-specific configurations.
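The method bodies are elided above. For orientation, here is a simplified sketch of what a generate_completion built on LiteLLM can look like inside one of these provider classes; the option names and defaults are assumptions, not the project's exact code:

# Inside a provider class such as AnthropicProvider (simplified sketch)
async def generate_completion(
    self,
    prompt: str,
    output_format: str = "text",
    options: Optional[Dict[str, Any]] = None,
    conversation: Optional[Conversation] = None
) -> str:
    options = options or {}
    messages: List[Dict[str, str]] = []
    if conversation is not None:
        # Reuse the prior turns stored in the conversation
        messages.extend(conversation.to_llm_messages())
    messages.append({"role": "user", "content": prompt})

    response = await litellm.acompletion(
        model=self.model,
        messages=messages,
        temperature=options.get("temperature", 0.7),
        max_tokens=options.get("max_tokens", 1024),
    )
    return response.choices[0].message.content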
Building the User Interface with Streamlit
Now, let’s look at how we’ve built the user interface using Streamlit. The UI is divided into three main components: the chat display, the settings sidebar, and conversation management.
Chat UI
Here’s a look at the chat UI implementation:
def display_chat_messages(
    messages: List[Dict[str, str]]
) -> None:
    """Display the chat message history."""
    for message in messages:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])


def handle_user_input(
    llm_provider: Optional[LLMProvider],
    conversation: Optional[Conversation],
    conversation_storage: ConversationStorage,
    selected_provider: str,
    selected_model: str,
    temperature: float,
    system_prompt: str = "You are a helpful and concise "
                         "chat assistant..."
) -> None:
    """Handle user input, generate responses, and update
    the conversation."""
    # Implementation details...
The UI is clean and intuitive, leveraging Streamlit’s built-in chat components.
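The full handle_user_input is longer than shown here; the condensed sketch below captures the flow. The _sketch suffix and the options keys are illustrative, and streaming, persistence, and error handling are omitted:

import asyncio

def handle_user_input_sketch(
    llm_provider: Optional[LLMProvider],
    temperature: float,
    system_prompt: str,
) -> None:
    prompt = st.chat_input("Type your message...")
    if not prompt or llm_provider is None:
        return

    # Record and show the user's turn
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    # Generate and show the assistant's turn
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            reply = asyncio.run(
                llm_provider.generate_completion(
                    prompt,
                    options={
                        "temperature": temperature,
                        "system_prompt": system_prompt,
                    },
                )
            )
        st.markdown(reply)

    st.session_state.messages.append({"role": "assistant", "content": reply})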
Sidebar for Settings
The sidebar provides settings for selecting the provider, model, and managing conversations:
def render_provider_settings(
    providers: Dict[str, Dict[str, Any]]
) -> Tuple[str, str, float]:
    """Render the provider settings section in the sidebar."""
    st.header("Provider Settings")

    # Provider selection
    selected_provider = st.selectbox(
        "Select Provider",
        list(providers.keys())
    )

    # Model selection for the chosen provider
    provider_info = providers[selected_provider]
    selected_model = st.selectbox(
        "Select Model",
        provider_info["models"]
    )

    # Temperature slider
    temperature = st.slider(
        "Temperature",
        min_value=0.0,
        max_value=1.0,
        value=0.7,
        step=0.1
    )

    # Provider-specific settings
    if selected_provider == "Ollama":
        render_ollama_settings(selected_model)

    return selected_provider, selected_model, temperature
Conversation Management
A key feature of our application is the ability to save and load conversations. Let’s look at how we implement this functionality:
class Conversation(BaseModel):
    """A model for storing conversation history."""

    id: str
    title: Optional[str] = None
    messages: List[Message] = Field(default_factory=list)
    created_at: datetime = Field(default_factory=datetime.now)
    updated_at: datetime = Field(default_factory=datetime.now)

    def add_message(self, content: str, message_type: MessageType,
                    role: Optional[str] = None) -> Message:
        """Add a new message to the conversation."""
        # Implementation details...

    def to_llm_messages(self) -> List[dict]:
        """Convert conversation history to a format suitable for LLM APIs."""
        # Implementation details...
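The two method bodies are elided above. Minimal sketches of each are shown below; the Message fields (role, type, content) and the MessageType mapping are assumptions based on how the model is used elsewhere, not the project's exact code:

# Simplified sketches of the elided methods
def add_message(self, content: str, message_type: MessageType,
                role: Optional[str] = None) -> Message:
    message = Message(
        content=content,
        type=message_type,
        # Assumed mapping: USER messages come from the user, others from the model
        role=role or ("user" if message_type == MessageType.USER else "assistant"),
    )
    self.messages.append(message)
    self.updated_at = datetime.now()
    return message

def to_llm_messages(self) -> List[dict]:
    # Convert stored messages into the {"role", "content"} shape LLM APIs expect
    return [{"role": m.role, "content": m.content} for m in self.messages]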
And the storage mechanism:
class ConversationStorage:
    """Utility class for storing and retrieving
    conversations."""

    def __init__(
        self,
        storage_dir: Union[str, Path] = "conversations"
    ):
        self.storage_dir = Path(storage_dir)
        self.storage_dir.mkdir(
            parents=True,
            exist_ok=True
        )
        logger.info(
            f"Initialized ConversationStorage in directory: "
            f"{self.storage_dir}"
        )

    def save_conversation(
        self,
        conversation: Conversation
    ) -> bool:
        """Save a conversation to a JSON file."""
        # Implementation details...

    def load_conversation(
        self,
        conversation_id: str
    ) -> Optional[Conversation]:
        """Load a conversation from a JSON file."""
        # Implementation details...
This allows us to persist conversations across sessions and switch between them.
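The save and load bodies are elided above. Assuming Pydantic v2 (model_dump_json and model_validate_json), minimal versions can look like this; the project's actual implementation may differ:

# Simplified sketches of the elided persistence methods (Pydantic v2 assumed)
def save_conversation(self, conversation: Conversation) -> bool:
    try:
        path = self.storage_dir / f"{conversation.id}.json"
        path.write_text(conversation.model_dump_json(indent=2), encoding="utf-8")
        return True
    except OSError as exc:
        logger.error(f"Failed to save conversation {conversation.id}: {exc}")
        return False

def load_conversation(self, conversation_id: str) -> Optional[Conversation]:
    path = self.storage_dir / f"{conversation_id}.json"
    if not path.exists():
        return None
    return Conversation.model_validate_json(path.read_text(encoding="utf-8"))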
Putting It All Together: The Main Application
Finally, let’s look at how everything comes together in the main application:
def main():
    """Main application function."""
    # Setup environment and page
    setup_environment()
    setup_page()

    # Get available providers
    providers = get_available_providers()

    # Get conversation storage
    conversation_storage = get_conversation_storage()

    # Render sidebar components
    with st.sidebar:
        # Provider settings
        (
            selected_provider,
            selected_model,
            temperature
        ) = render_provider_settings(providers)

        # Conversation management
        render_conversation_management(
            conversation_storage,
            selected_provider,
            selected_model
        )

    # Initialize provider
    llm_provider, error_message = initialize_provider(
        selected_provider,
        selected_model
    )

    # Display error message if provider initialization failed
    if error_message:
        st.error(error_message)
        st.sidebar.error(
            f"Provider failed: {error_message}"
        )

    # Initialize chat history
    initialize_chat_history(
        selected_provider,
        selected_model
    )

    # Initialize conversation
    initialize_conversation_id()
    conversation = get_conversation(conversation_storage)

    # Display existing chat messages
    display_chat_messages(st.session_state.messages)

    # Handle user input
    handle_user_input(
        llm_provider=llm_provider,
        conversation=conversation,
        conversation_storage=conversation_storage,
        selected_provider=selected_provider,
        selected_model=selected_model,
        temperature=temperature
    )

    # Render current conversation details in sidebar
    with st.sidebar:
        render_current_conversation_details(
            conversation_storage,
            selected_provider,
            selected_model
        )
Special Feature: Ollama Integration
One of the most exciting features of our application is the ability to use local models through Ollama. Here’s how we’ve implemented Ollama-specific settings:
def render_ollama_settings(selected_model: str = ""):
    """Render Ollama-specific settings."""
    st.subheader("Ollama Settings")

    # Get the current base URL
    current_base_url = os.environ.get(
        "OLLAMA_BASE_URL",
        "http://localhost:11434"
    )

    # Allow the user to change the base URL
    ollama_base_url = st.text_input(
        "Ollama API Base URL",
        value=current_base_url
    )

    # Model-specific settings based on size
    if selected_model:
        st.subheader(f"Model: {selected_model}")

        # Show different settings based on model size
        is_large_model = any(
            size in selected_model
            for size in ["70b", "72b"]
        )
        is_medium_model = any(
            size in selected_model
            for size in ["27b", "32b"]
        )

        if is_large_model:
            st.warning(
                "⚠️ This is a very large model that requires "
                "significant RAM (40-45GB)."
            )
            # Context size settings for large models
            # ...
This lets users run powerful local models such as Llama and Gemma directly on their own machines.
System Architecture Diagram
Here’s a high-level view of our application’s architecture:
Class Diagram
Here’s a simplified class diagram showing the relationships between our main components:
Sequence Diagram: Chat Interaction
This sequence diagram illustrates how a typical chat interaction works in our application:
Project Directory Structure
Let’s examine the detailed structure of our project:
chat/
├── pyproject.toml                      # Project metadata, dependencies, and build configuration
├── test/                               # Test directory
│   └── chat/                           # Test files for the chat application
├── docs/                               # Documentation
│   └── images/                         # Images for documentation
└── src/                                # Source code
    └── chat/                           # Main application code
        ├── __init__.py                 # Package initialization
        ├── app.py                      # Main application entry point
        ├── ai/                         # LLM provider integrations
        │   ├── __init__.py
        │   ├── anthropic.py            # Anthropic Claude provider
        │   ├── google_gemini.py        # Google Gemini provider
        │   ├── llm_provider.py         # Abstract base class for providers
        │   ├── ollama.py               # Ollama local model provider
        │   ├── open_ai.py              # OpenAI provider
        │   ├── perplexity.py           # Perplexity provider
        │   └── provider_manager.py     # Provider initialization and management
        ├── conversation/               # Conversation models and storage
        │   ├── __init__.py
        │   ├── conversation.py         # Conversation and Message models
        │   └── conversation_storage.py # Conversation persistence
        ├── ui/                         # User interface components
        │   ├── __init__.py
        │   ├── chat.py                 # Chat display and input handling
        │   ├── conversation_manager.py # UI for conversation management
        │   └── sidebar.py              # Sidebar UI components
        └── util/                       # Utility functions
            ├── __init__.py
            ├── json_util.py            # JSON handling utilities
            └── logging_util.py         # Logging configuration
Key Directories and Files
Let’s briefly describe the main directories and their purposes:
1. src/chat/: The main application package.
2. src/chat/ai/: The LLM provider integrations, with one module per provider plus the provider manager.
3. src/chat/conversation/: Handles conversation models and storage (the Conversation and Message models and JSON persistence).
4. src/chat/ui/: Contains the Streamlit UI components (chat display, sidebar, and conversation management).
5. src/chat/util/: Utility functions for JSON handling and logging configuration.
6. src/chat/app.py: The main application entry point that ties everything together.
Running the Application
Now that we understand the structure and components of our application, let’s see how to run it:
1. Install dependencies: Install Poetry, then install the project dependencies:
pip install poetry
poetry install
2. Set up API keys: Create a .env file in the root directory with your API keys:
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
GOOGLE_API_KEY=your_google_api_key
PERPLEXITY_API_KEY=your_perplexity_api_key
3. Run the application:
poetry run streamlit run src/chat/app.py
4. For Ollama support: Install Ollama from ollama.ai and pull the models you want to use:
ollama pull gemma3:27b
ollama pull llama4:scout
Local LLM Integration with Ollama
One of the most powerful features of our application is its integration with Ollama, which allows you to run models locally on your machine. This is especially valuable when you want to avoid per-token API costs, keep sensitive data on your own hardware, or experiment without relying on an external service.
Our application provides special configuration options for Ollama, including a configurable API base URL and model-size-aware guidance for memory and context settings.
Here’s a glimpse of the Ollama provider implementation:
class OllamaProvider(LLMProvider):
    """Integration with Ollama models using LiteLLM."""

    def __init__(
        self,
        api_key: Optional[str] = None,
        model: str = "llama3.3:latest"
    ):
        # Ollama doesn't require an API key, but we keep
        # this parameter for consistency
        self.api_key = api_key

        # Apply LiteLLM's naming convention for Ollama models
        self.original_model_name = model
        if model.startswith("ollama/"):
            # Already in LiteLLM's format
            self.model = model
        else:
            # Models with versions/variants like gemma3:27b
            # are formatted as ollama/gemma3:27b
            self.model = f"ollama/{model}"

        # Default Ollama base URL
        self.base_url = os.getenv(
            "OLLAMA_BASE_URL",
            "http://localhost:11434"
        )
        os.environ["OLLAMA_API_BASE"] = self.base_url

        try:
            self.client = litellm
            logger.info(
                f"OllamaProvider initialized with model: "
                f"{self.model} at {self.base_url}"
            )
        except ImportError:
            logger.error(
                "litellm package not installed. "
                "Please install it (e.g., pip install litellm)"
            )
            raise
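Before handing a model name to OllamaProvider, it can be useful to confirm that a local Ollama server is actually running. The helper below is an illustrative sketch, not part of the project's code; it queries Ollama's /api/tags endpoint, which lists locally pulled models:

import os
from typing import List, Optional

import requests

def list_local_ollama_models(base_url: Optional[str] = None) -> List[str]:
    """Return the names of models pulled into the local Ollama server."""
    base_url = base_url or os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
    response = requests.get(f"{base_url}/api/tags", timeout=5)
    response.raise_for_status()
    return [m["name"] for m in response.json().get("models", [])]

# Example: only offer Ollama in the provider list if the server responds
try:
    print(list_local_ollama_models())
except requests.RequestException:
    print("Ollama server not reachable; hide local models in the UI.")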
Future Enhancements
While our current application is already quite powerful, there are several exciting enhancements planned for future articles:
1. RAG (Retrieval-Augmented Generation)
We’ll be adding RAG capabilities to allow the chat application to pull information from your documents and provide more contextually relevant responses. This will be particularly useful for domain-specific applications where you want the LLM to have access to your proprietary information.
For more information on building RAG systems, check out:
2. File Upload and Access
We’ll implement the ability to upload and process various file types, allowing the chat application to analyze and discuss their contents.
3. MCP (Model Context Protocol) Support
We’ll add support for the Model Context Protocol (MCP), an open standard for connecting language models to external tools and data sources. This enables more sophisticated interactions, such as better-grounded reasoning, fact-checking, and specialized task delegation.
To learn more about MCP, check out:
Conclusion
In this tutorial, we’ve explored how to build a powerful multi-provider chat application using LiteLLM and Streamlit. We’ve seen how to:
- Unify multiple LLM providers behind a single abstraction with LiteLLM
- Build a clean chat interface and settings sidebar with Streamlit
- Run local models through Ollama alongside cloud providers
- Persist and manage conversations across sessions
The complete source code for this project is available on GitHub at https://github.com/RichardHightower/chat.
By leveraging these technologies, you can create a flexible, powerful chat application that gives you access to the best AI models available, all through a single interface.
About the Author
Rick Hightower is a software developer and technology enthusiast with a passion for AI and natural language processing. He has extensive experience in building scalable, distributed systems and is currently focused on AI integration in enterprise applications.
Rick Hightower brings extensive enterprise experience as a former executive and distinguished engineer at a Fortune 100 company, where he specialized in Machine Learning and AI solutions that deliver intelligent customer experiences. His expertise spans both the theoretical foundations and practical applications of AI technologies.
As a TensorFlow certified professional and graduate of Stanford University’s comprehensive Machine Learning Specialization, Rick combines academic rigor with real-world implementation experience. His training includes mastery of supervised learning techniques, neural networks, and advanced AI concepts, which he has successfully applied to enterprise-scale solutions.
With a deep understanding of both the business and technical aspects of AI implementation, Rick bridges the gap between theoretical machine learning concepts and practical business applications, helping organizations leverage AI to create tangible value. He is actively developing GenAI applications.