© 2024 CloudNatix, All Rights Reserved
LLMariner
Transform your Kubernetes Cluster Into a GenAI platform
LLMariner
Provide a unified AI/ML platform
with efficient GPU and K8s management
[Diagram: LLMariner provides LLM capabilities (inference, fine-tuning, RAG), workbenches (Jupyter Notebook), and non-LLM training on top of public/private clouds with heterogeneous GPUs (different generations and architectures).]
Example Use Cases
● Develop LLM applications with an OpenAI-compatible API
○ Leverage the existing ecosystem to build applications
● Fine-tune models while keeping data safe and secure in your on-premises datacenter
Examples: code auto-completion, chatbot
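As a sketch of the first use case, the snippet below builds a chat-completion request in the OpenAI API shape. The base URL and model name are placeholders, not values defined by LLMariner; substitute the endpoint and model from your own deployment.

```python
import json

# Build an OpenAI-style chat completion request. The base URL and model
# name are placeholders -- substitute values from your own LLMariner
# deployment.
def chat_completion_request(model, messages, base_url="http://localhost:8080/v1"):
    url = f"{base_url}/chat/completions"
    body = json.dumps({"model": model, "messages": messages})
    return url, body

url, body = chat_completion_request(
    model="llama-3.1-8b",  # hypothetical model name
    messages=[{"role": "user", "content": "Write a haiku about Kubernetes."}],
)
# POST `body` (Content-Type: application/json) to `url` with any
# OpenAI-compatible client or plain HTTP.
```

Because the wire format matches the OpenAI API, the official `openai` Python client also works by pointing its `base_url` at the same endpoint.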
Key Features
For the AI/ML team:
● LLM inference
● LLM fine-tuning
● RAG
● Jupyter Notebook
● General-purpose training
For the infrastructure team:
● Flexible deployment model
● Efficient GPU management
● Security / access control
● GPU visibility/showback (*)
● Highly reliable GPU management (*)
(*) under development
High Level Architecture
[Diagram: the LLMariner Control Plane for AI/ML runs in a control-plane K8s cluster and exposes the API endpoint; LLMariner Agents for AI/ML run in multiple worker GPU K8s clusters.]
Key Feature Details
Features for AI/ML team and infra team
[Diagram: for the AI/ML team, the control plane exposes an OpenAI-compatible API (chat completion, embedding, RAG, fine-tuning, …) and a workbench with Jupyter Notebooks. Control-plane services include user management, model management (open models, closed models owned by your org, fine-tuned models), cluster federation, API authn/authz (Dex), API key management, orgs & projects management, cluster management, secure session management, and API usage audits. Worker K8s clusters run the inference engine (vLLM, Nvidia Triton, Ollama) with runtime management (e.g., autoscaling, routing), fine-tuning jobs, general-purpose training jobs (Kueue), Jupyter Notebooks, GPU workload management, and storage management (files, vector DBs).]
LLM Inference Serving
● Compatible with the OpenAI API
○ Can leverage the existing ecosystem and applications
● Advanced capabilities beyond standalone inference runtimes such as vLLM
○ Optimized request serving and GPU management
○ Multiple inference runtime support
○ Multiple model support
○ Built-in RAG integration
Multiple Model and Runtime Support
● Multiple model support
● Multiple inference runtime support
[Diagram: supported models — open models from Hugging Face, private models in customers’ environments, and fine-tuned models generated with LLMariner — served by multiple inference runtimes: vLLM, Ollama, Nvidia Triton Inference Server, and Hugging Face TGI (the latter two marked upcoming/experimental).]
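One way to picture multiple-runtime support is a dispatch table from model format to serving runtime. This is an illustrative assumption for explanation only, not LLMariner's actual selection logic; the runtime names come from the slide, the format keys are hypothetical.

```python
# Illustrative sketch only: map a model's weight format to a serving
# runtime. The format keys are assumptions; the runtimes are those
# named on the slide.
RUNTIME_BY_FORMAT = {
    "safetensors": "vLLM",    # Hugging Face-style open or fine-tuned weights
    "gguf": "Ollama",         # quantized single-file models
    "triton-repo": "Nvidia Triton Inference Server",  # upcoming per the slide
}

def pick_runtime(model_format: str) -> str:
    try:
        return RUNTIME_BY_FORMAT[model_format]
    except KeyError:
        raise ValueError(f"no runtime registered for format {model_format!r}")

print(pick_runtime("gguf"))  # -> Ollama
```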
Optimized Inference Serving
● Efficiently utilize GPUs to achieve high throughput and low latency
● Key technologies:
○ Autoscaling
○ Model-aware request load balancing & routing
○ Multi-model management & caching
○ Multi-cluster/cloud federation
[Diagram: the LLMariner Inference Manager Engine autoscales vLLM replicas (Llama 3.1, Gemma 2) in Cluster X and routes requests across runtimes in Cluster Y (vLLM serving Llama 3.1, Ollama serving DeepSeek Coder).]
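The model-aware routing idea above can be sketched as a small scheduling function: prefer replicas that already have the requested model loaded (avoiding a cold load), and break ties by current load. This is an assumed mechanism for illustration, not LLMariner's implementation; the replica names are hypothetical.

```python
# Sketch of model-aware request routing (an assumption, not LLMariner's
# actual implementation): prefer replicas with the model already loaded,
# then pick the least-loaded candidate.
def route(model, replicas, in_flight):
    """replicas: {replica_name: set of loaded model names}
    in_flight: {replica_name: current in-flight request count}"""
    warm = [r for r, models in replicas.items() if model in models]
    candidates = warm or list(replicas)  # cold start: fall back to any replica
    return min(candidates, key=lambda r: in_flight[r])

replicas = {"x-vllm-0": {"llama-3.1"}, "x-vllm-1": {"gemma-2"},
            "y-ollama-0": {"deepseek-coder"}}
in_flight = {"x-vllm-0": 3, "x-vllm-1": 0, "y-ollama-0": 1}
print(route("llama-3.1", replicas, in_flight))  # -> x-vllm-0 (model is warm there)
```

An autoscaler would then add or remove replicas per model as the in-flight counts grow or shrink.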
Built-in RAG Integration
● Use the OpenAI-compatible API to manage vector stores and files
○ Milvus is used as the underlying vector DB
● Inference engine retrieves relevant data when processing requests
[Diagram: files are uploaded and embedded into the vector store; the LLMariner inference engine retrieves relevant data when processing requests.]
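The upload path can be sketched in the OpenAI Vector Stores API shape (create a vector store, then attach an uploaded file to it). The base URL and IDs below are placeholders for your own deployment; the endpoint paths mirror the OpenAI API, which the slide says LLMariner is compatible with.

```python
import json

BASE_URL = "http://localhost:8080/v1"  # placeholder LLMariner endpoint

# Sketch of the RAG data path in the OpenAI Files / Vector Stores API
# shape. IDs below are placeholders.
def create_vector_store_request(name, base_url=BASE_URL):
    return f"{base_url}/vector_stores", json.dumps({"name": name})

def attach_file_request(vector_store_id, file_id, base_url=BASE_URL):
    return (f"{base_url}/vector_stores/{vector_store_id}/files",
            json.dumps({"file_id": file_id}))

store_url, store_body = create_vector_store_request("product-docs")
attach_url, attach_body = attach_file_request("vs_123", "file-abc123")
```

At inference time, a chat request that references the vector store lets the engine retrieve matching chunks from Milvus before generation.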
Beyond LLM Inference
● Provide LLM fine-tuning, general-purpose training, and Jupyter
Notebook management
● Empower AI/ML teams to harness the full power of GPUs in a
secure self-contained environment
[Diagram: a Supervised Fine-tuning Trainer job running in a GPU K8s cluster.]
A Fine-tuning Example
● Submit a fine-tuning job using the OpenAI Python library
○ The fine-tuning job runs in an underlying Kubernetes cluster
● Enforce quotas via integration with the open-source Kueue
[Diagram: submitted fine-tuning jobs run on GPUs in a K8s cluster, with quota enforcement by Kueue.]
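The submission above follows the OpenAI fine-tuning API shape (`POST /v1/fine_tuning/jobs`). The sketch below builds that request; the base URL, file ID, and model name are placeholders for your own deployment.

```python
import json

# Sketch of submitting a fine-tuning job in the OpenAI API shape.
# The file ID and model name are placeholders.
def fine_tuning_job_request(training_file, model,
                            base_url="http://localhost:8080/v1"):
    url = f"{base_url}/fine_tuning/jobs"
    body = json.dumps({"training_file": training_file, "model": model})
    return url, body

url, body = fine_tuning_job_request("file-abc123", "llama-3.1-8b")
# POSTing this with the OpenAI Python client (pointed at the LLMariner
# endpoint) creates a job that runs as pods in the worker K8s cluster.
```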
Enterprise-Ready Access Control
● Control API scope with “organizations” and “projects”
○ A user in Project X can access fine-tuned models generated by other users in Project X
○ A user in Project Y cannot access the fine-tuned models in Project X
● Can be integrated with a customer’s identity management platform (e.g., SAML, OIDC)
[Diagram: User 1 in Project X creates a fine-tuned model; User 2 in the same project can read it; User 3 in Project Y cannot access it.]
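The project-scoping rule reduces to a simple predicate, sketched below purely as an illustration of the policy (not LLMariner's implementation).

```python
# Minimal sketch of the project-scoped rule described above: a
# fine-tuned model is visible only within the project that created it.
def can_access_model(user_project: str, model_project: str) -> bool:
    return user_project == model_project

print(can_access_model("Project X", "Project X"))  # True: same project
print(can_access_model("Project Y", "Project X"))  # False: cross-project
```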
Supported Deployment Models
● Single public cloud
● Single private cloud
● Air-gapped environment
● Appliance
● Hybrid cloud (public & private)
● Multi-cloud federation
[Diagram: in a hybrid deployment, the LLMariner Control Plane runs in a public cloud while an LLMariner Agent runs in a private cloud; in a multi-cloud federation, the control plane in Cloud A federates agents running in Cloud B and Cloud Y.]
Note: worker clusters require no open incoming ports; only outgoing port 443 is needed.
