© 2024 CloudNatix, All Rights Reserved
LLMariner
Transform your Kubernetes Cluster Into a GenAI platform
LLMariner
Provide a unified AI/ML platform
with efficient GPU and K8s management
[Diagram: LLMariner provides LLM capabilities (inference, fine-tuning, RAG), workbenches (Jupyter Notebook), and non-LLM training on top of public/private clouds with heterogeneous GPUs (different generations and architectures).]
Example Use Cases
● Develop LLM applications with an OpenAI-compatible API
○ Leverage the existing ecosystem to build applications
● Fine-tune models while keeping data safe and secure in your on-premises datacenter
Examples: code auto-completion, chatbot
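As a sketch of the first use case, the snippet below builds a chat-completion request in the OpenAI API shape. The base URL and model name are placeholders, not values defined by LLMariner; substitute the endpoint and model from your own deployment.

```python
import json

# Build an OpenAI-style chat completion request. The base URL and model
# name are placeholders -- substitute values from your own LLMariner
# deployment.
def chat_completion_request(model, messages, base_url="http://localhost:8080/v1"):
    url = f"{base_url}/chat/completions"
    body = json.dumps({"model": model, "messages": messages})
    return url, body

url, body = chat_completion_request(
    model="llama-3.1-8b",  # hypothetical model name
    messages=[{"role": "user", "content": "Write a haiku about Kubernetes."}],
)
# POST `body` (Content-Type: application/json) to `url` with any
# OpenAI-compatible client or plain HTTP.
```

Because the wire format matches the OpenAI API, the official `openai` Python client also works by pointing its `base_url` at the same endpoint.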
Key Features
For the AI/ML team:
● LLM inference
● LLM fine-tuning
● RAG
● Jupyter Notebook
● General-purpose training
For the infrastructure team:
● Flexible deployment model
● Efficient GPU management
● Security / access control
● GPU visibility/showback (*)
● Highly reliable GPU management (*)
(*) under development
High Level Architecture
[Diagram: the LLMariner Control Plane for AI/ML runs in a control-plane K8s cluster and exposes the API endpoint; LLMariner Agents for AI/ML run in multiple worker GPU K8s clusters.]
Key Feature Details
Features for AI/ML team and infra team
[Diagram: for the AI/ML team, the control plane exposes an OpenAI-compatible API (chat completion, embedding, RAG, fine-tuning, …) and a workbench with Jupyter Notebooks. Control-plane services include user management, model management (open models, closed models owned by your org, fine-tuned models), cluster federation, API authn/authz (Dex), API key management, orgs & projects management, cluster management, secure session management, and API usage audits. Worker K8s clusters run the inference engine (vLLM, Nvidia Triton, Ollama) with runtime management (e.g., autoscaling, routing), fine-tuning jobs, general-purpose training jobs (Kueue), Jupyter Notebooks, GPU workload management, and storage management (files, vector DBs).]
LLM Inference Serving
● Compatible with the OpenAI API
○ Can leverage the existing ecosystem and applications
● Advanced capabilities beyond standalone inference runtimes such as vLLM
○ Optimized request serving and GPU management
○ Multiple inference runtime support
○ Multiple model support
○ Built-in RAG integration
Multiple Model and Runtime Support
● Multiple model support
● Multiple inference runtime support
[Diagram: supported models — open models from Hugging Face, private models in customers’ environments, and fine-tuned models generated with LLMariner — served by multiple inference runtimes: vLLM, Ollama, Nvidia Triton Inference Server, and Hugging Face TGI (the latter two marked upcoming/experimental).]
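One way to picture multiple-runtime support is a dispatch table from model format to serving runtime. This is an illustrative assumption for explanation only, not LLMariner's actual selection logic; the runtime names come from the slide, the format keys are hypothetical.

```python
# Illustrative sketch only: map a model's weight format to a serving
# runtime. The format keys are assumptions; the runtimes are those
# named on the slide.
RUNTIME_BY_FORMAT = {
    "safetensors": "vLLM",    # Hugging Face-style open or fine-tuned weights
    "gguf": "Ollama",         # quantized single-file models
    "triton-repo": "Nvidia Triton Inference Server",  # upcoming per the slide
}

def pick_runtime(model_format: str) -> str:
    try:
        return RUNTIME_BY_FORMAT[model_format]
    except KeyError:
        raise ValueError(f"no runtime registered for format {model_format!r}")

print(pick_runtime("gguf"))  # -> Ollama
```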
Optimized Inference Serving
● Efficiently utilize GPUs to achieve high throughput and low latency
● Key technologies:
○ Autoscaling
○ Model-aware request load balancing & routing
○ Multi-model management & caching
○ Multi-cluster/cloud federation
[Diagram: the LLMariner Inference Manager Engine autoscales vLLM replicas (Llama 3.1, Gemma 2) in Cluster X and routes requests across runtimes in Cluster Y (vLLM serving Llama 3.1, Ollama serving DeepSeek Coder).]
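The model-aware routing idea above can be sketched as a small scheduling function: prefer replicas that already have the requested model loaded (avoiding a cold load), and break ties by current load. This is an assumed mechanism for illustration, not LLMariner's implementation; the replica names are hypothetical.

```python
# Sketch of model-aware request routing (an assumption, not LLMariner's
# actual implementation): prefer replicas with the model already loaded,
# then pick the least-loaded candidate.
def route(model, replicas, in_flight):
    """replicas: {replica_name: set of loaded model names}
    in_flight: {replica_name: current in-flight request count}"""
    warm = [r for r, models in replicas.items() if model in models]
    candidates = warm or list(replicas)  # cold start: fall back to any replica
    return min(candidates, key=lambda r: in_flight[r])

replicas = {"x-vllm-0": {"llama-3.1"}, "x-vllm-1": {"gemma-2"},
            "y-ollama-0": {"deepseek-coder"}}
in_flight = {"x-vllm-0": 3, "x-vllm-1": 0, "y-ollama-0": 1}
print(route("llama-3.1", replicas, in_flight))  # -> x-vllm-0 (model is warm there)
```

An autoscaler would then add or remove replicas per model as the in-flight counts grow or shrink.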
Built-in RAG Integration
● Use the OpenAI-compatible API to manage vector stores and files
○ Milvus is used as the underlying vector DB
● Inference engine retrieves relevant data when processing requests
[Diagram: files are uploaded and embedded into the vector store; the LLMariner inference engine retrieves relevant data when processing requests.]
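The upload path can be sketched in the OpenAI Vector Stores API shape (create a vector store, then attach an uploaded file to it). The base URL and IDs below are placeholders for your own deployment; the endpoint paths mirror the OpenAI API, which the slide says LLMariner is compatible with.

```python
import json

BASE_URL = "http://localhost:8080/v1"  # placeholder LLMariner endpoint

# Sketch of the RAG data path in the OpenAI Files / Vector Stores API
# shape. IDs below are placeholders.
def create_vector_store_request(name, base_url=BASE_URL):
    return f"{base_url}/vector_stores", json.dumps({"name": name})

def attach_file_request(vector_store_id, file_id, base_url=BASE_URL):
    return (f"{base_url}/vector_stores/{vector_store_id}/files",
            json.dumps({"file_id": file_id}))

store_url, store_body = create_vector_store_request("product-docs")
attach_url, attach_body = attach_file_request("vs_123", "file-abc123")
```

At inference time, a chat request that references the vector store lets the engine retrieve matching chunks from Milvus before generation.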
Beyond LLM Inference
● Provide LLM fine-tuning, general-purpose training, and Jupyter
Notebook management
● Empower AI/ML teams to harness the full power of GPUs in a
secure self-contained environment
[Diagram: a Supervised Fine-tuning Trainer job running in a GPU K8s cluster.]
A Fine-tuning Example
● Submit a fine-tuning job using the OpenAI Python library
○ The fine-tuning job runs in an underlying Kubernetes cluster
● Enforce quotas via integration with the open-source Kueue
[Diagram: submitted fine-tuning jobs run on GPUs in a K8s cluster, with quota enforcement by Kueue.]
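The submission above follows the OpenAI fine-tuning API shape (`POST /v1/fine_tuning/jobs`). The sketch below builds that request; the base URL, file ID, and model name are placeholders for your own deployment.

```python
import json

# Sketch of submitting a fine-tuning job in the OpenAI API shape.
# The file ID and model name are placeholders.
def fine_tuning_job_request(training_file, model,
                            base_url="http://localhost:8080/v1"):
    url = f"{base_url}/fine_tuning/jobs"
    body = json.dumps({"training_file": training_file, "model": model})
    return url, body

url, body = fine_tuning_job_request("file-abc123", "llama-3.1-8b")
# POSTing this with the OpenAI Python client (pointed at the LLMariner
# endpoint) creates a job that runs as pods in the worker K8s cluster.
```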
Enterprise-Ready Access Control
● Control API scope with “organizations” and “projects”
○ A user in Project X can access fine-tuned models generated by other users in Project X
○ A user in Project Y cannot access the fine-tuned models in Project X
● Can be integrated with a customer’s identity management platform (e.g., SAML, OIDC)
[Diagram: User 1 in Project X creates a fine-tuned model; User 2 in the same project can read it; User 3 in Project Y cannot access it.]
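The project-scoping rule reduces to a simple predicate, sketched below purely as an illustration of the policy (not LLMariner's implementation).

```python
# Minimal sketch of the project-scoped rule described above: a
# fine-tuned model is visible only within the project that created it.
def can_access_model(user_project: str, model_project: str) -> bool:
    return user_project == model_project

print(can_access_model("Project X", "Project X"))  # True: same project
print(can_access_model("Project Y", "Project X"))  # False: cross-project
```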
Supported Deployment Models
● Single public cloud
● Single private cloud
● Air-gapped environment
● Appliance
● Hybrid cloud (public & private)
● Multi-cloud federation
[Diagram: in a hybrid deployment, the LLMariner Control Plane runs in a public cloud while an LLMariner Agent runs in a private cloud; in a multi-cloud federation, the control plane in Cloud A federates agents running in Cloud B and Cloud Y.]
Note: worker clusters require no open incoming ports; only outgoing port 443 is needed.
