WORKSHOP
Monitor ML models in production
How to monitor and manage the health of your ML models
Leah Kolben, CTO
@leah4kosh
leah@cnvrg.io
whoami
• Developer/Data scientist => CTO
• cnvrg.io = built by data scientists, for data scientists to help teams:
• Get from data to models to production quickly and efficiently
• Bridge science and engineering
agenda
• Introduction – recap previous webinars
• Kubernetes overview
• Why should we monitor our models
• Monitor tools
• LIVE workshop
• Summary
Introduction
• Previous webinars:
• Train your ML models on Kubernetes
• Run ETL jobs using Spark on Kubernetes
• Deploy your ML model in production
• Today: “My model is in production, now what?”
• Use Grafana to monitor your model metrics – CPU & Memory
• Use Kibana to monitor your ML model logs
• Use Elasticsearch to index your model logs and create alerts
Kubernetes - recap
• Provides a runtime environment for Docker containers
• Provides an abstraction layer for containers to run on
• Deploy as microservices
• All services are natively load balanced
• Can scale up and down dynamically
• Monitor the health of the containers
• Schedule runs and cronjobs
• Use the same API across EVERY cloud provider and bare metal!
ML in production - recap
• Goals:
• Quickly deploy your ML model as a service
• Reduce costs and manpower with auto scaling
• Load balance the traffic
• Natively monitored by Kubernetes
• Update your model continuously: canary deployments, blue/green deployments
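A canary deployment sends a small share of requests to the new model version while the rest stay on the current one. On Kubernetes this split normally happens at the routing layer; the sketch below only illustrates the idea in plain Python. The function name `pick_version`, the version labels and the 10% share are assumptions made for illustration and do not come from the workshop.

```python
import hashlib


def pick_version(request_id: str, canary_share: float = 0.10) -> str:
    """Return "canary" for roughly `canary_share` of request IDs, else "stable".

    Hashing the ID (instead of drawing a random number) keeps the choice
    deterministic, so the same request ID always hits the same version.
    """
    if not 0.0 <= canary_share <= 1.0:
        raise ValueError("canary_share must be between 0 and 1")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    return "canary" if bucket < canary_share * 2**64 else "stable"


if __name__ == "__main__":
    counts = {"stable": 0, "canary": 0}
    for i in range(10_000):
        counts[pick_version(f"request-{i}")] += 1
    print(counts)  # roughly a 90/10 split
```

Raising `canary_share` step by step, while watching the monitoring dashboards, is one way to roll a new version out gradually.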
Why should we monitor our models?
• Track your model performance
• Prevent model drift
• Monitor auto scaling and high traffic load
• Know when to retrain your model and update it
• Monitor updates – validate that the new version is better than the old one
• Keep your production models updated and relevant!
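One way to notice drift is to compare the distribution of a feature (or of the model's scores) in production with the distribution seen at training time. The sketch below uses the population stability index (PSI), which is one common choice and not something the workshop prescribes. The function names, the 10 bins and the sample data are assumptions for illustration.

```python
import math
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of `values` falling into each bin defined by sorted `edges`.

    Values below the first edge or above the last edge are counted in the
    first or last bin, so every value lands somewhere.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = 0
        # Advance while the value lies at or beyond the next inner edge.
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    return [c / len(values) for c in counts]


def population_stability_index(expected: Sequence[float],
                               actual: Sequence[float],
                               bins: int = 10,
                               eps: float = 1e-6) -> float:
    """PSI between a reference sample and a production sample.

    Bin edges are equal-width over the range of `expected`.
    """
    lo, hi = min(expected), max(expected)
    if lo == hi:
        raise ValueError("expected sample has no spread")
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    e_frac = bin_fractions(expected, edges)
    a_frac = bin_fractions(actual, edges)
    psi = 0.0
    for e, a in zip(e_frac, a_frac):
        # Clamp empty bins to eps so the logarithm stays defined.
        e, a = max(e, eps), max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi


if __name__ == "__main__":
    training = [i / 100 for i in range(100)]
    production = [v + 0.5 for v in training]  # same shape, shifted upward
    print(population_stability_index(training, training))
    print(population_stability_index(training, production))
```

A frequently quoted rule of thumb treats a PSI above about 0.25 as a significant shift, but the right threshold depends on the model and should be tuned rather than taken as given.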
Monitor tools
• Use the EKS stack on Kubernetes
• Use open source tools that can be quickly installed on Kubernetes using Helm
• Use Grafana to monitor the resources your model uses
• Use Kibana to show and track your model's input/output
• Use Elasticsearch to index the logs
• Use ElastAlert for tagging and creating alerts regarding the health of your models
Grafana
• Open source analytics and monitoring for a variety of databases
• Natively connects to the EKS stack
• Use custom dashboards to track your model usage
Grafana
Grafana – custom dashboard
Kibana
• Part of the EKS stack
• Log all input and output
• Log internal logs
• Search and find prediction outputs
• Visualize and understand the data
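Logging every input and output is easiest to search later when each prediction is written as one structured JSON line. The sketch below wraps a predict function so that it does this; a log collector running in the cluster can then ship the lines to Elasticsearch for Kibana to query. The field names, the logger name `model.predictions` and the wrapper `logged_predict` are assumptions for illustration, not the workshop's actual setup.

```python
import json
import logging
import time
import uuid
from typing import Any, Callable, Dict

logger = logging.getLogger("model.predictions")


def build_record(model_version: str, features: Dict[str, Any],
                 prediction: Any, latency_ms: float) -> Dict[str, Any]:
    """Collect everything about one prediction into a JSON-friendly dict."""
    return {
        "request_id": str(uuid.uuid4()),  # lets you find a single call later
        "timestamp": time.time(),         # seconds since the Unix epoch
        "model_version": model_version,
        "features": features,
        "prediction": prediction,
        "latency_ms": round(latency_ms, 3),
    }


def logged_predict(predict: Callable[[Dict[str, Any]], Any],
                   features: Dict[str, Any],
                   model_version: str = "v1") -> Any:
    """Call `predict` and log input, output and latency as one JSON line."""
    start = time.perf_counter()
    prediction = predict(features)
    latency_ms = (time.perf_counter() - start) * 1000.0
    record = build_record(model_version, features, prediction, latency_ms)
    # default=str keeps logging from failing on values json cannot encode.
    logger.info(json.dumps(record, default=str))
    return prediction


if __name__ == "__main__":
    # Plain messages on the console; each line is a complete JSON document.
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    logged_predict(lambda f: f["x"] * 2, {"x": 2.0})
```

Keeping `model_version` in every record makes it possible to filter a dashboard by version when comparing an old and a new model.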
Kibana
Elasticsearch & ElastAlert
• Store and index all input/output of your models
• Get a sense of your data
• Query your data
• Create custom rules on the data to monitor the health of your model
• Trigger custom webhooks upon alerts – for example, to start retraining CI/CD pipelines
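The rule-then-webhook flow can be sketched in plain Python. This is not ElastAlert's own rule format (ElastAlert uses its own configuration files); it only shows the logic of a threshold rule over recent prediction records and the webhook request such a rule might produce. The `confidence` field, both thresholds, the payload shape and the URL are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def low_confidence_rate(records: List[Dict[str, Any]],
                        threshold: float = 0.6) -> float:
    """Share of records whose `confidence` is below `threshold`."""
    if not records:
        return 0.0
    low = sum(1 for r in records if r["confidence"] < threshold)
    return low / len(records)


def build_alert(records: List[Dict[str, Any]],
                max_rate: float = 0.3,
                threshold: float = 0.6) -> Optional[Dict[str, Any]]:
    """Return an alert payload if too many predictions are low-confidence."""
    rate = low_confidence_rate(records, threshold)
    if rate <= max_rate:
        return None  # model looks healthy under this rule
    return {
        "alert": "low_confidence_rate",
        "rate": rate,
        "max_rate": max_rate,
        "action": "retrain",  # hint for the receiving pipeline
    }


def webhook_request(url: str, alert: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying the alert."""
    body = json.dumps(alert).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    recent = [{"confidence": c} for c in (0.9, 0.5, 0.4, 0.3)]
    alert = build_alert(recent)
    if alert is not None:
        # Hypothetical endpoint; urllib.request.urlopen(req) would send it.
        req = webhook_request("https://example.invalid/hooks/retrain", alert)
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Building the request separately from sending it keeps the rule logic easy to test without network access.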
Let’s do it!
Using cnvrg to deploy & monitor your models
• Your only responsibility is to write the predict function
• End-to-end pipelines
• EKS stack is deployed automatically
• Dedicated Grafana for each endpoint
• Dedicated Kibana for each endpoint
• Alerts and triggers out of the box
• Continual learning support
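To make "write the predict function" concrete, here is a minimal sketch of what such a function could look like. The slides do not show the signature cnvrg expects, so the name `predict`, its argument, its return shape and the toy logistic model with fixed weights are all assumptions for illustration.

```python
import math
from typing import Any, Dict, Sequence

# Hypothetical parameters; a real service would load a trained model
# once at startup instead of hard-coding weights.
WEIGHTS = (0.8, -0.4)
BIAS = 0.1


def predict(features: Sequence[float]) -> Dict[str, Any]:
    """Score one input with a toy logistic model and return score and label."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features, got {len(features)}")
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    score = 1.0 / (1.0 + math.exp(-z))  # logistic function
    return {"score": score, "label": 1 if score >= 0.5 else 0}


if __name__ == "__main__":
    print(predict([0.0, 0.0]))
    print(predict([-1.0, 1.0]))
```

Validating the input inside the function means malformed requests fail loudly and show up in the logs rather than producing silent bad predictions.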
DEMO
Summary
• Kubernetes is becoming the standard way to deploy models
• Overview of Kubernetes
• Overview of ML in production
• Monitor – why and how
• Live demo
• Deploy, monitor and retrain models using cnvrg
Thanks!
https://cnvrg.io
info@cnvrg.io
+972-506-660186

How to monitor your ML models in production with Kubernetes
