CI/CD Templates: Continuous
Delivery of ML-Enabled Data
Pipelines on Databricks
Michael Shtelma, Sr. Solutions Architect
Ivan Trusov, Solutions Architect
Agenda
The Challenges of implementing
CI/CD for ML pipelines
The CI/CD challenges forcing ML teams to choose
between Databricks notebooks and local IDEs
Introducing DatabricksLabs
CI/CD Templates
How CI/CD Templates solves ML team production
challenges
Demo and Next Steps
Problem:
Organisations struggle to get the business to start using
their models to drive additional revenue
Cause:
Due to the complexity of the ML lifecycle, only a few models end up
in production and drive additional revenue for the business.
Most of them are either delayed or discontinued at
different stages of the ML project
It is challenging for organizations to
gain value from ML due to the complexity of
the ML lifecycle
What challenges do ML teams
face when they try to
implement CD4ML?
ML teams struggle to combine traditional CI/CD
tools with Databricks notebooks
1. Benefits of Databricks notebooks
Easy to use
Scalable
Provide access to ML tools such as MLflow for model logging and serving
2. Challenges
Non-trivial to hook into traditional software development tools such as CI tools or local IDEs.
3. Result
Teams find themselves choosing between
using traditional IDE-based workflows, but struggling to test and deploy at scale, or
using Databricks notebooks or other cloud notebooks, but then struggling to ensure
testing and deployment reliability via CI/CD pipelines.
What’s the solution?
CI/CD Templates gives you the benefits of
traditional CI/CD workflows and the scale of
Databricks clusters
CI/CD Templates allows you to
● create a production pipeline from a template in a few steps,
● which automatically hooks into GitHub Actions,
● runs tests and deployments on Databricks upon git commit or
whatever trigger you define, and
● gives you a test status directly in GitHub, so you know whether your
commit broke the build
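How a CI step could turn a test run on Databricks into the pass/fail status shown in GitHub can be sketched in a few lines. This is an illustration under stated assumptions, not the templates' actual code: it reads the `state` object that the Databricks Jobs API returns for a run (fields `life_cycle_state` and `result_state`) and derives a status label and a process exit code. The function names `ci_status` and `exit_code` are hypothetical.

```python
"""Sketch: turn a Databricks run state into a CI pass/fail signal."""

# Life-cycle states after which a run will not change any more.
FINISHED_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def ci_status(run_state: dict) -> str:
    """Map the `state` object of a Databricks run to a status label.

    Returns "pending" while the run is still in progress, "success" when it
    terminated with result_state SUCCESS, and "failure" otherwise.
    """
    life_cycle = run_state.get("life_cycle_state")
    if life_cycle not in FINISHED_STATES:
        return "pending"
    if life_cycle == "TERMINATED" and run_state.get("result_state") == "SUCCESS":
        return "success"
    return "failure"


def exit_code(run_state: dict) -> int:
    """Exit code a CI step could return: 0 marks the commit as passing."""
    status = ci_status(run_state)
    if status == "pending":
        raise ValueError("run has not finished yet")
    return 0 if status == "success" else 1
```

A CI system such as GitHub Actions marks a step as failed when its process exits with a non-zero code, so returning `exit_code(...)` from the test step would be enough to surface the result next to the commit.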
A scalable CI/CD pipeline in 5 easy steps
1. Install and customize with a single command
2. Create a new GitHub repo and add your Databricks host
and token to it as secrets
3. Initialize git in your project directory and commit the code.
4. Push your new CI/CD Templates project to the repo. Your tests will
start running automatically on Databricks. Depending on whether the tests
succeed or fail, you will get a green checkmark or a red x next to your
commit.
5. You’re done! You now have a fully scalable CI/CD pipeline.
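The host and token secrets from step 2 have to reach the code that talks to Databricks. A minimal sketch of that hand-over follows, assuming the CI workflow exposes the secrets as the environment variables `DATABRICKS_HOST` and `DATABRICKS_TOKEN` (the names the Databricks CLI reads; the slides do not state which names the templates use). The helper `databricks_auth` is hypothetical.

```python
import os
from typing import Mapping, Optional


def databricks_auth(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the workspace host and token from the environment.

    Returns the base URL and the HTTP headers for Databricks REST calls.
    Raises RuntimeError if a secret is missing, so the CI job fails early
    with a readable message instead of an authentication error later on.
    """
    env = os.environ if env is None else env
    missing = [k for k in ("DATABRICKS_HOST", "DATABRICKS_TOKEN") if not env.get(k)]
    if missing:
        raise RuntimeError("missing secrets: " + ", ".join(missing))
    return {
        "base_url": env["DATABRICKS_HOST"].rstrip("/"),
        "headers": {"Authorization": "Bearer " + env["DATABRICKS_TOKEN"]},
    }
```

Keeping the values in repository secrets rather than in the committed code means the token never appears in the git history.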
Project structure
1. Python package that holds the logic of the project.
Your models and pipelines are developed here.
2. Configuration where you define the Databricks jobs
that run the pipelines developed in the Python package
3. Tests directory that holds the local unit tests and the
integration tests
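The split between package code and tests can be illustrated with a deliberately tiny example. Everything here is hypothetical (the function names, the baseline "model", the test class); it only shows the idea that pipeline logic lives in importable functions and that unit tests exercise it locally without a cluster.

```python
"""Sketch of package code plus a local unit test (hypothetical names)."""
import unittest
from statistics import mean


def fit_mean_baseline(labels):
    """'Train' a trivial model that always predicts the mean training label."""
    if not labels:
        raise ValueError("need at least one label")
    return {"prediction": mean(labels)}


def predict(model, n_rows):
    """Return one prediction per input row."""
    return [model["prediction"]] * n_rows


class BaselineTest(unittest.TestCase):
    """Would live in the tests directory and run locally, without a cluster."""

    def test_predicts_training_mean(self):
        model = fit_mean_baseline([1.0, 2.0, 3.0])
        self.assertEqual(predict(model, 2), [2.0, 2.0])

    def test_rejects_empty_input(self):
        with self.assertRaises(ValueError):
            fit_mean_baseline([])
```

In a real project the two functions would sit in the package and the test class in the tests directory; integration tests would call the same functions from a job running on Databricks.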
CI/CD Templates executes tests and deployments
directly on Databricks while storing packages, model
logs and other artifacts in MLflow
CI/CD Templates - now powered by dbx
With dbx you can:
● customize project structure and specify it during deployments
● use new CI tools easily (PRs are welcome!)
● run custom data pipelines directly from the IDE on interactive clusters
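A deployment that is driven by a configuration file might look roughly like the sketch below. The job fields (`name`, `new_cluster`, `spark_version`, `node_type_id`, `num_workers`, `spark_python_task`, `python_file`) follow the Databricks Jobs API; the per-environment layout, the job name, the file path and the `validate` helper are assumptions made for illustration and are not taken from the slides.

```python
"""Sketch: a per-environment job definition, checked before deployment."""

# Hypothetical deployment config: environment name -> list of job definitions.
DEPLOYMENT = {
    "default": {
        "jobs": [
            {
                "name": "sample-pipeline",  # hypothetical job name
                "new_cluster": {
                    "spark_version": "7.3.x-scala2.12",  # example value
                    "node_type_id": "i3.xlarge",  # example value (AWS)
                    "num_workers": 2,
                },
                "spark_python_task": {
                    # hypothetical path inside the project
                    "python_file": "pipelines/sample/entrypoint.py",
                },
            }
        ]
    }
}

REQUIRED_JOB_KEYS = ("name", "new_cluster", "spark_python_task")


def validate(deployment: dict, environment: str) -> list:
    """Return the job names for `environment`, raising on incomplete jobs."""
    jobs = deployment[environment]["jobs"]
    for job in jobs:
        missing = [key for key in REQUIRED_JOB_KEYS if key not in job]
        if missing:
            raise ValueError(f"job {job.get('name', '?')!r} lacks {missing}")
    return [job["name"] for job in jobs]
```

Checking the definition locally before deploying catches a missing cluster or task block without waiting for a run on the cluster to fail.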
Push Flow
Release Flow
Demo:
CI/CD Templates
Summary
The Challenges of implementing
CD4ML
The CI/CD challenges forcing ML teams to choose
between Databricks notebooks and local IDEs
Introducing DatabricksLabs
CI/CD Templates
How CI/CD Templates solves ML team production
challenges
Next Steps
Search DatabricksLabs cicd-templates or go
directly to
https://github.com/databrickslabs/cicd-templates
to get started
michael.shtelma@databricks.com
ivan.trusov@databricks.com
