Kubernetes for Machine
Learning
by Akash Agrawal
Agenda
● Machine Learning Overview
● Machine Learning in Production
● Machine Learning on Google Cloud Platform (GCP)
● Kubernetes Overview
● Google Kubernetes Engine (GKE) Overview
● Kubeflow
● Design & Best Practices
About Me
● Google Developer Expert (Google Cloud Platform category)
● 11 years of experience in the IT industry
● Worked with various clients such as Sabre, Citi Bank, Goldman Sachs, L&T Infotech, etc.
● Currently working as an Independent Consultant (Technical Adviser/Architect) & Tech Evangelist
What this Talk is (about/not about)
● About:
○ ML System Understanding
○ ML & Kubernetes Integration / Design
● Not About:
○ ML Code Syntax/Structure
○ ML Algorithms
Machine Learning Overview
● Teaching Computers to recognize patterns in the same way as our brains
do
● Model Building ---> Model Training ---> Model Serving
Machine Learning Overview
● Machine Learning Lifecycle:
○ Build Machine Learning Model:
■ Write Machine Learning code in any supported framework, e.g. TensorFlow, scikit-learn, XGBoost, PyTorch
○ Input Data:
■ You divide Input Data into Training & Testing Data
■ At Inference/Serving time you pass Inference Input Data
■ Data may or may not have labels
○ Train the Model with Input Data:
■ Training generates a Model (some kind of graph, e.g. a TensorFlow Graph/DAG)
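The lifecycle steps above (divide the data, train, then check against held-out data) can be sketched in plain Python. This is only an illustration and not code from the talk: the toy data set, the 80/20 split ratio, and the helper names `split`, `train` and `predict` are all assumptions, and a real project would use one of the frameworks listed above.

```python
import random

def split(rows, test_ratio=0.2, seed=0):
    """Shuffle labelled rows and divide them into training and testing sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]

def train(rows):
    """Fit y = w*x + b by ordinary least squares; the 'model' is just (w, b)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in rows)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    w = cov_xy / var_x
    return w, mean_y - w * mean_x

def predict(model, x):
    """Apply the trained parameters to a new input."""
    w, b = model
    return w * x + b

# Toy labelled data following y = 2x + 1 (hypothetical, for illustration only).
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
train_rows, test_rows = split(data)
model = train(train_rows)

# Largest absolute error on the held-out testing rows.
test_error = max(abs(predict(model, x) - y) for x, y in test_rows)
```

Here the trained "model" is only a pair of numbers; in a framework such as TensorFlow it would be the saved graph mentioned above.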
Machine Learning Overview
● Machine Learning Lifecycle:
○ Serve/Inference:
■ You can take the model & serve it as a REST API endpoint
○ Predictions:
■ You use these REST API endpoints for Online/Batch Prediction (Confidence Value)
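Serving a model behind a REST endpoint that returns a prediction with a confidence value might look roughly like the sketch below, which uses only the Python standard library. The `/predict` path, the JSON field `x`, port 8080, and the hard-coded `WEIGHT` and `BIAS` values are assumptions made for illustration; a production service would load a trained model and typically run a dedicated serving system.

```python
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical trained parameters; a real service would load a saved model.
WEIGHT, BIAS = 1.5, -3.0

def predict(x):
    """Return a label plus a confidence value derived from a logistic score."""
    p = 1.0 / (1.0 + math.exp(-(WEIGHT * x + BIAS)))
    label = "positive" if p >= 0.5 else "negative"
    return {"label": label, "confidence": round(max(p, 1.0 - p), 4)}

class PredictHandler(BaseHTTPRequestHandler):
    """Handle POST /predict with a JSON body such as {"x": 2.0}."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            x = float(json.loads(self.rfile.read(length))["x"])
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected a JSON object with a numeric field x")
            return
        body = json.dumps(predict(x)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port=8080):
    """Start the blocking HTTP server; call this from a container entrypoint."""
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the endpoint. Online prediction then sends one request per input, while batch prediction would loop over stored inputs and call `predict` directly.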
Machine Learning Overview
● Extra Steps:
○ Pre Processing
○ Post Processing
At what Stage are you with ML today?
● Experimenting / Learning
● Building Proof of Concepts (POCs) / Prototyping
● Designing (Deployment/Workflows/Scaling/Management) for Production
Machine Learning In Production
● A few extra things to take care of:
○ Collaborative Environment with people in different roles, e.g. Data Scientists / Platform Engineers / DevOps / Researchers
○ Production ML Applications are designed to run 24/7/365
○ Input Data (Training/Testing & Inference) flows in continuously - Streaming/Batch
○ You can use different kinds of frameworks for building ML models, e.g. TensorFlow, scikit-learn, XGBoost, PyTorch etc.
○ These models are constantly updated, improved upon & deployed
○ Repetitive ML Tasks like Feature Engineering, Hyperparameter Tuning, Data Cleansing &
Validations
Machine Learning In Production
● A few extra things to take care of:
○ Config Separation across different environments
○ RBAC (Role-Based Access Control)
○ Different Deployment/Hosting Options: Cloud (e.g. GCP) or Private Data Centers/Cloud (e.g. VMware based)
○ Different Hardware/Accelerators for compute-intensive workloads, e.g. GPUs/TPUs
○ Scaling Requirements:
■ Distributed Processing (Training or Serving)
■ Distributed Processing (e.g. one Model running on multiple GPUs/TPUs, or one GPU running multiple Models)
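Two of the points above, config separation per environment and accelerator requests, can be illustrated by generating a Kubernetes Deployment manifest from Python. This is a sketch under assumptions: the `model-server` name, the `example.com/model-server:1.0` image, and the replica counts are hypothetical. The `nvidia.com/gpu` resource name is the one exposed by the NVIDIA device plugin and only applies to clusters that have GPU nodes with that plugin installed.

```python
import json

def serving_deployment(name, image, replicas, gpus_per_pod):
    """Build a Kubernetes Deployment manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        # GPUs are requested through resource limits.
                        "resources": {"limits": {"nvidia.com/gpu": gpus_per_pod}},
                    }]
                },
            },
        },
    }

# Per-environment settings kept apart from the manifest template
# (hypothetical values, for illustration only).
ENVIRONMENTS = {
    "staging": {"replicas": 1, "gpus_per_pod": 1},
    "production": {"replicas": 4, "gpus_per_pod": 1},
}

manifest = serving_deployment(
    "model-server", "example.com/model-server:1.0", **ENVIRONMENTS["production"]
)
manifest_json = json.dumps(manifest, indent=2)
```

Writing `manifest_json` to a file gives something `kubectl apply -f` can consume, since Kubernetes accepts JSON as well as YAML manifests. The sketch covers replicated serving only; spreading one model across several GPUs depends on the ML framework.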
Machine Learning on GCP
● 3 ways:
○ ML as an API (Cloud Vision API, Cloud Video Intelligence API, Cloud Speech API, Cloud Natural Language API, Cloud Translation API)
○ AutoML
○ Custom Models
■ With Cloud ML Engine
■ With Kubernetes / GKE / Kubeflow etc.
Kubernetes Overview
● Kubernetes is an Open Source system for Container Orchestration
(Deployment/Management/Scaling)
● Features:
○ Scheduling
○ Self Healing / Auto Repairing
○ Scaling (Manual / Auto Scaling / Scaling Out / Scaling In)
○ ...
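As an illustration of the Auto Scaling feature, the Horizontal Pod Autoscaler documentation describes its core calculation as desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance of 1.0. The sketch below reproduces that arithmetic; the function name, the replica bounds, and the 0.1 tolerance passed as a default are illustrative assumptions, and the real controller applies further rules (stabilization windows, pod readiness) that are not modelled here.

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the Horizontal Pod Autoscaler scaling calculation."""
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))

scale_out = desired_replicas(4, current_metric=90, target_metric=60)
scale_in = desired_replicas(4, current_metric=30, target_metric=60)
```

With a target of 60% CPU, four replicas at 90% scale out and four replicas at 30% scale in.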
Google Kubernetes Engine (GKE) Overview
● Managed Service for Kubernetes on Google Cloud (focused on
Deployment/Management/Scaling)
● Provides a Reliable, Efficient & Secure way to run Kubernetes Clusters (on GCP)
● GKE On-Prem
Google Kubernetes Engine (GKE) Overview
● Features:
○ Fully Managed
○ Auto Scaling / Auto Upgrade / Auto Repair
○ Integration: IAM / Stackdriver / VPC
○ Security, Compliance, Runs on Container-Optimized OS (COS)
○ Accelerators Support : GPUs/TPUs
○ Various Cluster Topologies : Zonal Clusters / Regional Clusters
○ Workload Portability : On-Premises / Cloud
○ ...
Kubeflow:
● Focused on Deployment of ML Workflows on Kubernetes (Simple,
Portable & Scalable)
● Goal: support deployment of best-of-breed Open Source Systems for ML on diverse Infrastructure
● Anywhere you are running Kubernetes, you can run Kubeflow
Kubeflow:
● Features:
○ Pipelines: for deploying & managing End to End ML Workflows.
○ Integration:
■ Jupyter Notebooks
■ TensorFlow Model Training Controller
■ Seldon Core : for Model Serving
○ Multi-Framework Support: TensorFlow, PyTorch, Apache MXNet
○ Share/Reuse using AI Hub
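The idea behind Pipelines, an end-to-end ML workflow expressed as steps with dependencies, can be sketched with the Python standard library. This is not the Kubeflow Pipelines SDK: the step names, the shared `ctx` dictionary, and the quality-gate threshold are all hypothetical, and in Kubeflow each step would typically run as its own container on the cluster.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def preprocess(ctx):
    # Drop missing values from the raw input.
    ctx["clean"] = [v for v in ctx["raw"] if v is not None]

def train(ctx):
    # Stand-in "model": the mean of the cleaned data.
    ctx["model"] = sum(ctx["clean"]) / len(ctx["clean"])

def evaluate(ctx):
    # Largest deviation of any data point from the model.
    ctx["error"] = max(abs(v - ctx["model"]) for v in ctx["clean"])

def deploy(ctx):
    # Hypothetical quality gate before serving.
    ctx["deployed"] = ctx["error"] < 5.0

STEPS = {"preprocess": preprocess, "train": train,
         "evaluate": evaluate, "deploy": deploy}

# Each step maps to the set of steps that must finish before it.
DEPENDS_ON = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}

def run_pipeline(ctx):
    """Run every step in dependency order and return that order."""
    order = list(TopologicalSorter(DEPENDS_ON).static_order())
    for name in order:
        STEPS[name](ctx)
    return order

context = {"raw": [1.0, None, 2.0, 3.0]}
executed = run_pipeline(context)
```

Declaring dependencies rather than a fixed script is what lets a workflow engine rerun, cache, or parallelise individual steps.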
Design & Best Practices:
● Separate out Compute & Storage
● Scaling & Self Healing Capabilities
● Cloud & GKE Topologies
● Docker Best Practices
● Kubernetes Best Practices
● ML Framework Best Practices
Look for:
● AI Hub: https://aihub.cloud.google.com/
● Qwiklabs:
○ Quests:
■ Kubernetes Solutions: https://www.qwiklabs.com/quests/45
○ Labs:
■ Kubeflow Labs
Google Cloud Platform - Resources
● Google Cloud Platform 101 (Cloud Next '19): https://www.youtube.com/watch?v=vmOMataJZWw
● Google Cloud Developer Cheat Sheet: https://raw.githubusercontent.com/gregsramblings/google-cloud-4-words/master/Poster-medres.png
● 100+ announcements from Google Cloud Next '19: https://cloud.google.com/blog/topics/inside-google-cloud/100-plus-announcements-from-google-cloud-next19
Google Cloud Platform - Resources
● Google Cloud Next '19 Sessions: https://www.youtube.com/playlist?list=PLIivdWyY5sqIXvUGVrFuZibCUdKVzEoUw
● GCP Certification Resources: https://github.com/ddneves/awesome-gcp-certifications
Akash Agrawal
LinkedIn : akash-agrawal-58a97813
Twitter : @akkiagrawal29
Thanks
