© 2024 CloudNatix, All Rights Reserved
LLMariner
Transform your Kubernetes Cluster Into a GenAI platform
LLMariner
Provide a unified AI/ML platform
with efficient GPU and K8s management
[Diagram: LLMariner provides LLM capabilities (inference, fine-tuning, RAG), a Workbench (Jupyter Notebook), and non-LLM training on top of public/private clouds with heterogeneous GPUs (generations G1/G2, architectures A1/A2).]
Example Use Cases
● Develop LLM applications with an OpenAI-compatible API
○ Leverage the existing ecosystem to build applications
● Fine-tune models while keeping data safe and secure in your on-premises datacenter
Example applications: code auto-completion, chat bot
Key Features
For the AI/ML team:
● LLM inference
● LLM fine-tuning
● RAG
● Jupyter Notebook
● General-purpose training
For the infrastructure team:
● Flexible deployment model
● Efficient GPU management
● Security / access control
● GPU visibility/showback (*)
● Highly reliable GPU management (*)
(*) under development
High Level Architecture
[Diagram: an API endpoint fronts the LLMariner Control Plane for AI/ML, running in a control-plane K8s cluster; it manages multiple worker GPU K8s clusters, each running an LLMariner Agent for AI/ML.]
Key Feature Details
LLM Inference Serving
● Compatible with OpenAI API
○ Can leverage the existing ecosystem and applications
● Advanced capabilities surpassing standard inference runtimes,
such as vLLM
○ Optimized request serving and GPU management
○ Multiple inference runtime support
○ Multiple model support
○ Built-in RAG integration
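Because the endpoint is OpenAI-compatible, a standard chat completion request works against it. The sketch below builds such a request with only the Python standard library to show the wire format; the base URL, API key, and model name are illustrative assumptions, not LLMariner defaults.

```python
# Sketch: an OpenAI-style chat completion request against an
# LLMariner inference endpoint. Base URL, key, and model are
# illustrative assumptions.
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str,
                       model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style /v1/chat/completions request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_chat_request("http://llmariner.example.com", "<api-key>",
                         "google-gemma-2b-it", "What is Kubernetes?")
# Sending it requires a running endpoint:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

In practice the official OpenAI Python client can be pointed at the same endpoint by overriding its base URL, which is how existing applications plug in unchanged.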
Multiple Model and Runtime Support
● Multiple model support
● Multiple inference runtime support
Models: open models from Hugging Face, private models in customers’ environments, and fine-tuned models generated with LLMariner.
Runtimes: vLLM and Ollama, with Hugging Face TGI and Nvidia TensorRT-LLM upcoming.
Optimized Inference Serving
● Efficiently utilize GPUs to achieve high throughput and low latency
● Key technologies:
○ Autoscaling
○ Model-aware request load balancing & routing
○ Multi-model management & caching
○ Multi-cluster/cloud federation
[Diagram: the LLMariner Inference Manager Engine autoscales runtimes across clusters. Cluster X runs vLLM serving Llama 3.1 and Gemma 2; Cluster Y runs vLLM (Llama 3.1) and Ollama (Deepseek Coder).]
Built-in RAG Integration
● Use an OpenAI-compatible API to manage vector stores and files
○ Use Milvus as an underlying vector DB
● Inference engine retrieves relevant data when processing requests
[Diagram: files are uploaded and embedded into the vector store; the LLMariner inference engine retrieves the relevant data when serving requests.]
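Since vector stores are managed through the same OpenAI-compatible API, creating one is an ordinary API call. The sketch below shows the request shape using only the standard library; the base URL and store name are illustrative assumptions.

```python
# Sketch: an OpenAI-style vector store creation request. Files
# uploaded and attached to the store are embedded (LLMariner backs
# this with Milvus) and retrieved by the inference engine at query
# time. Base URL and name are illustrative assumptions.
import json
import urllib.request


def create_vector_store_request(base_url: str,
                                name: str) -> urllib.request.Request:
    """Build an OpenAI-style /v1/vector_stores request."""
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/vector_stores",
        data=body,
        headers={"Content-Type": "application/json"},
    )


req = create_vector_store_request("http://llmariner.example.com",
                                  "product-docs")
```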
Beyond LLM Inference
● Provide LLM fine-tuning, general-purpose training, and Jupyter
Notebook management
● Empower AI/ML teams to harness the full power of GPUs in a
secure self-contained environment
[Diagram: a Supervised Fine-tuning Trainer running in a GPU K8s cluster.]
A Fine-tuning Example
● Submit a fine-tuning job using the OpenAI Python library
○ The fine-tuning job runs in an underlying Kubernetes cluster
● Enforce quotas through integration with the open-source Kueue project
[Diagram: submitted fine-tuning jobs run on GPUs in a K8s cluster, with quota enforcement by Kueue.]
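The submission itself is a standard OpenAI-style fine-tuning job request, which LLMariner turns into a job on the underlying cluster, queued by Kueue. The sketch below uses only the standard library to show the request shape; the base URL, file ID, and model name are illustrative assumptions.

```python
# Sketch: an OpenAI-style fine-tuning job submission. LLMariner runs
# the resulting job in the underlying K8s cluster under Kueue quota
# enforcement. Base URL, file ID, and model are illustrative.
import json
import urllib.request


def create_fine_tuning_request(base_url: str, training_file: str,
                               model: str) -> urllib.request.Request:
    """Build an OpenAI-style /v1/fine_tuning/jobs request."""
    body = json.dumps({
        "model": model,
        "training_file": training_file,  # ID of a previously uploaded file
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/fine_tuning/jobs",
        data=body,
        headers={"Content-Type": "application/json"},
    )


req = create_fine_tuning_request("http://llmariner.example.com",
                                 "file-abc123", "google-gemma-2b-it")
```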
Enterprise-Ready Access Control
● Control API scope with “organizations” and “projects”
○ A user in Project X can access fine-tuned models generated by
other users in Project X
○ A user in Project Y cannot access the fine-tuned models in Project X
● Can be integrated with a customer’s identity management platform
(e.g., SAML, OIDC)
[Diagram: User 1 in Project X creates a fine-tuned model that User 2 (also in Project X) can read; User 3 in Project Y cannot access it.]
Supported Deployment Models
● Single public cloud
● Single private cloud
● Air-gapped environment
● Appliance
● Hybrid cloud (public & private)
● Multi-cloud federation
[Diagram: example topologies include a single K8s cluster running the LLMariner Control Plane and Agent; a hybrid deployment with the Control Plane in a public cloud and the Agent in a private cloud; and a multi-cloud federation with the Control Plane in Cloud A and Agents in Clouds B and Y.]
Note: worker clusters need no open incoming ports; only outgoing port 443 is required.
