Build AI Cloud using Kubernetes
CLOUD NATIVE INDORE: TECH TALK
Anjul Sahu, CEO, CloudRaft
About Me
Anjul Sahu, CEO, CloudRaft
Founder & CEO, CloudRaft, an AI & Cloud Native consulting company
Organizer, Cloud Native Indore
More than 16 years in the industry building large-scale systems
Previously worked for telecom companies, banks, product companies, and startups
Passionate about new technology
NVIDIA DGX GH200 (2023): a cluster of 256 GH200 Grace Hopper Superchips
1 exaflop of AI performance, 144 TB of shared memory
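To put the headline figures in perspective, here is a rough sketch that divides them evenly across the 256 superchips. It assumes an even split, that "1 exaflop" is a low-precision AI throughput figure rather than FP64, and that the 144 TB figure uses binary units (1 TB = 1024 GB); the names are illustrative only.

```python
# Rough per-superchip share of the DGX GH200 headline figures.
# Assumptions: totals split evenly across superchips, and the 144 TB
# memory figure uses binary units (1 TB = 1024 GB).

SUPERCHIPS = 256
TOTAL_AI_FLOPS = 1e18   # "1 exaflop": a low-precision AI figure, not FP64
TOTAL_MEMORY_TB = 144   # memory pool shared across the cluster


def per_superchip(total: float, chips: int = SUPERCHIPS) -> float:
    """Even split of a cluster-wide total across its superchips."""
    return total / chips


pflops_per_chip = per_superchip(TOTAL_AI_FLOPS) / 1e15       # petaflops each
memory_per_chip_gb = per_superchip(TOTAL_MEMORY_TB * 1024)   # GB each

if __name__ == "__main__":
    print(f"{pflops_per_chip:.2f} PFLOPS and {memory_per_chip_gb:.0f} GB per superchip")
```

This prints about 3.91 PFLOPS and 576 GB per superchip, which shows why a single cluster can hold models far larger than one GPU's memory.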
In this Presentation: Overview
01 What is AI Cloud?
02 Current Trends in AI Infrastructure
03 How Cloud Native Helps in Running AI
04 Architecture of AI Cloud
05 Cloud Native Projects for AI
06 Challenges
07 Q&A
An AI Cloud simplifies AI implementation for organizations by integrating it into daily operations. AI Clouds cover the AI lifecycle, from creating features and models to operating, monitoring, and sharing them throughout the organization. Platforms supporting the full AI lifecycle are known as AI platforms, and when available in scalable environments, they are termed AI Clouds.
Features of AI Cloud
On-prem, hybrid, or cloud deployment
Support for the end-to-end AI lifecycle
Self-service
Scalable
Reliable
GPUs & high performance
AI/ML frameworks
Full stack: IaaS, PaaS, SaaS
Billing or chargeback
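Billing or chargeback usually comes down to metering GPU-hours per team and pricing them. The sketch below assumes a hypothetical usage-record schema (`GpuUsage`) and a single flat placeholder rate; in a real AI Cloud the records would come from the cluster's metering pipeline, and pricing may vary by GPU type.

```python
from dataclasses import dataclass


@dataclass
class GpuUsage:
    """One metered usage record (hypothetical schema)."""
    team: str
    gpu_count: int
    hours: float


def chargeback(usages: list[GpuUsage], rate_per_gpu_hour: float) -> dict[str, float]:
    """Sum GPU-hours per team and convert them to an internal charge."""
    charges: dict[str, float] = {}
    for u in usages:
        gpu_hours = u.gpu_count * u.hours
        charges[u.team] = charges.get(u.team, 0.0) + gpu_hours * rate_per_gpu_hour
    return charges


if __name__ == "__main__":
    records = [
        GpuUsage("research", gpu_count=8, hours=10.0),
        GpuUsage("search", gpu_count=2, hours=5.0),
        GpuUsage("research", gpu_count=1, hours=4.0),
    ]
    print(chargeback(records, rate_per_gpu_hour=2.5))  # placeholder rate
```

Charging per GPU-hour rather than per node keeps the model fair when teams share nodes through fractional or partial GPU allocations.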
Current Trends in AI Infrastructure
01 Data sovereignty requirements: enterprise data-loss risk, AI safety, and new government policies that keep data local
02 Specialized GPU clouds, e.g. CoreWeave, Salad, RunPod, Nebius, and Lambda Labs
03 Cloud native and Kubernetes are accelerators for AI
04 2x data every 18 months: the demand for data to build better AI/ML models is growing faster than Moore's Law (roughly a doubling every two years), doubling every 18 months
05 GenAI, bigger models: model sizes keep increasing, which means more powerful infrastructure is required
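The two growth trends above, data doubling every 18 months and ever-bigger models, can be put into numbers with a small sketch. The 70B-parameter model is a hypothetical example; 2 bytes per parameter corresponds to FP16 weights, and about 16 bytes per parameter is a commonly cited rule of thumb for mixed-precision Adam training state (weights, gradients, and optimizer state). Activations and KV cache are ignored.

```python
def growth_factor(months: float, doubling_period_months: float) -> float:
    """Growth multiple of a quantity that doubles every `doubling_period_months`."""
    return 2 ** (months / doubling_period_months)


def memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Decimal GB for a model's per-parameter state.

    (params_billion * 1e9 params * bytes) / 1e9 bytes-per-GB reduces to a
    plain product. Excludes activations, KV cache, and framework overhead.
    """
    return params_billion * bytes_per_param


if __name__ == "__main__":
    six_years = 72  # months
    print("Data, doubling every 18 months:", growth_factor(six_years, 18))
    print("Moore's Law, doubling every 24 months:", growth_factor(six_years, 24))
    # A hypothetical 70B-parameter model:
    print("FP16 weights (2 B/param):", memory_gb(70, 2), "GB")
    print("Mixed-precision Adam state (~16 B/param):", memory_gb(70, 16), "GB")
```

Over six years, an 18-month doubling compounds to 16x versus 8x for a two-year doubling, and training state for even a mid-sized model runs to terabytes, well beyond a single GPU.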
AI Runs on GPUs and Accelerators
AI workloads are dominated by matrix multiplications, which are massively parallelizable
GPUs excel at parallel computation
Typical CPUs have tens of cores/threads, while GPUs have thousands (more than 4,000)
For these workloads, a CPU is at least 10x slower
Training, or even serving, any sizeable AI model is impractical without accelerators such as GPUs, TPUs, or other ASICs
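Why matrix multiplication parallelizes so well: every output element is an independent dot product. The sketch below computes rows concurrently and also counts the work using the usual 2·m·n·k FLOP convention. It is illustrative only; CPython threads are limited by the GIL, so this demonstrates independence rather than speed, while GPUs exploit the same independence across thousands of cores.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_row(a_row: list[float], b: list[list[float]]) -> list[float]:
    """One output row needs only one row of A and all of B, so rows
    (and in fact individual elements) can be computed independently."""
    return [sum(x * y for x, y in zip(a_row, col)) for col in zip(*b)]


def matmul_parallel(a: list[list[float]], b: list[list[float]], workers: int = 4) -> list[list[float]]:
    """Hand each row to a worker; pool.map keeps the rows in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: matmul_row(row, b), a))


def matmul_flops(m: int, n: int, k: int) -> int:
    """m*n outputs, each a k-term dot product, conventionally 2k FLOPs
    (k multiplies and roughly k adds)."""
    return 2 * m * n * k


if __name__ == "__main__":
    print(matmul_parallel([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    # A single 4096 x 4096 x 4096 matmul: ~137 billion FLOPs, 16.7M independent outputs.
    print(matmul_flops(4096, 4096, 4096))
```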
Build AI Cloud with CloudRaft AI Platform
How Cloud Native Helps in Running AI Workloads
"Research teams can now take advantage of the frameworks we've built on top of Kubernetes, which make it easy to launch experiments, scale them by 10x or 50x, and take little effort to manage."
- Christopher Berner, Head of Infrastructure for OpenAI
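As a sketch of what "scale them by 10x or 50x" can look like on Kubernetes, the helper below builds a standard `batch/v1` Job manifest in which scaling an experiment is a one-parameter change. It assumes GPUs are exposed through the `nvidia.com/gpu` extended resource (as the NVIDIA device plugin does); the helper name, job name, and image are hypothetical, and this is not OpenAI's or CloudRaft's actual tooling.

```python
import json


def experiment_job(name: str, image: str, replicas: int, gpus_per_pod: int) -> dict:
    """Build a Kubernetes batch/v1 Job manifest for a GPU experiment (hypothetical helper)."""
    if replicas < 1 or gpus_per_pod < 0:
        raise ValueError("replicas must be >= 1 and gpus_per_pod >= 0")
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "parallelism": replicas,   # pods allowed to run at the same time
            "completions": replicas,   # successful pods needed to finish the Job
            "backoffLimit": 3,         # retries before the Job is marked failed
            "template": {
                "spec": {
                    "restartPolicy": "Never",  # Jobs require Never or OnFailure
                    "containers": [
                        {
                            "name": "trainer",
                            "image": image,
                            # GPUs requested via the device plugin's extended resource
                            "resources": {"limits": {"nvidia.com/gpu": gpus_per_pod}},
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical image; going from 1 to 50 pods only changes `replicas`.
    job = experiment_job("sweep-lr", "registry.example.com/train:latest", replicas=50, gpus_per_pod=1)
    print(json.dumps(job, indent=2))  # e.g. pipe into: kubectl apply -f -
```

Keeping the experiment definition declarative means the scheduler, not the researcher, handles placement, retries, and cleanup, which is the "little effort to manage" part of the quote.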
AI Cloud Reference Architecture
Cloud Native Projects for AI
Distributed Training
Model / LLM Observability
Vector Databases
Data Architectures
Governance and Policy
General Orchestration
ML Serving
CI/CD Delivery
Workload Observability
AutoML
Security
The ecosystem is evolving fast...
Challenges in Building AI Cloud
Building an AI Cloud requires a large investment
GPU supply chain constraints
Skills shortage
High reliability is required for long-running distributed training jobs
Unknown security threats and AI risks in a fast-evolving ecosystem
Sustainability: a single H100 can draw up to 700 W, roughly 6,100 kWh a year at full load, which exceeds the annual electricity use of an average household in many countries
Hardware limitations, such as storage or the network, can become bottlenecks
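Why long-running distributed training needs high reliability: with many GPUs, some failure during a multi-week run becomes almost certain. The sketch assumes independent failures at a constant rate (an exponential model) and uses a hypothetical per-GPU MTBF of 50,000 hours purely for illustration. It also applies Young's first-order approximation, interval ≈ sqrt(2 × checkpoint cost × system MTBF), to suggest how often to checkpoint.

```python
import math


def expected_failures(num_gpus: int, hours: float, mtbf_hours_per_gpu: float) -> float:
    """Expected number of GPU failures during a run (constant failure rate)."""
    return num_gpus * hours / mtbf_hours_per_gpu


def p_no_failure(num_gpus: int, hours: float, mtbf_hours_per_gpu: float) -> float:
    """Probability the whole run finishes with zero failures.

    Assumes independent, exponentially distributed failures, so the cluster
    behaves like one component with rate num_gpus / mtbf.
    """
    return math.exp(-expected_failures(num_gpus, hours, mtbf_hours_per_gpu))


def young_checkpoint_interval(checkpoint_cost_hours: float, system_mtbf_hours: float) -> float:
    """Young's first-order approximation of the optimal checkpoint interval."""
    return math.sqrt(2 * checkpoint_cost_hours * system_mtbf_hours)


if __name__ == "__main__":
    gpus, run_hours, mtbf = 1024, 30 * 24, 50_000  # hypothetical inputs
    print("expected failures:", expected_failures(gpus, run_hours, mtbf))
    print("P(no failure):", p_no_failure(gpus, run_hours, mtbf))
    system_mtbf = mtbf / gpus  # cluster-wide MTBF in hours
    print("checkpoint every ~%.1f h" % young_checkpoint_interval(0.1, system_mtbf))
```

Under these assumptions a 30-day, 1,024-GPU run expects about 15 failures, so automatic restart from frequent checkpoints matters more than any single GPU's reliability.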
Why do we need an AI Cloud?
Data privacy
AI is making humans more productive
AGI is possible
Running your own AI Cloud can still cost less than using hyperscalers
It is a game changer for many enterprises
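The cost argument can be framed as a simple break-even calculation: how many months until owned infrastructure pays back its upfront cost compared with renting equivalent capacity. This is a minimal sketch with hypothetical placeholder inputs, not real prices; it ignores financing, depreciation schedules, and hardware refresh, and assumes the cloud figure reflects the capacity you would actually use.

```python
import math


def break_even_months(capex: float, own_opex_per_month: float, cloud_cost_per_month: float) -> float:
    """Months until owning becomes cheaper than renting equivalent capacity.

    Returns math.inf when renting is never more expensive month to month.
    """
    monthly_saving = cloud_cost_per_month - own_opex_per_month
    if monthly_saving <= 0:
        return math.inf
    return capex / monthly_saving


if __name__ == "__main__":
    # Placeholder figures for illustration only.
    print(break_even_months(capex=300_000, own_opex_per_month=5_000, cloud_cost_per_month=20_000))
```

Utilization is the deciding factor: at low utilization the effective cloud cost drops, the monthly saving shrinks, and the break-even point moves out.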
This talk is based on our recent work, which would not have been possible without the groundbreaking innovations of Kubernetes, NVIDIA, and the CNCF (Cloud Native Computing Foundation).
See our insights on AI at cloudraft.io/blog
Q & A
"Success in creating AI would be the biggest event in human history. Unfortunately, it might also be the last, unless we learn how to avoid the risks."
- Stephen Hawking, Theoretical Physicist

