CD in Machine
Learning systems
Juan López
@juaneto
Goals and
structure
Continuous deployment
What it is and why everybody wants it
Idea → Develop → Deploy in prod
● New features on the fly.
● Quality goes up (smaller changes).
● Faster development.
● Experimentation.
● Innovation.
So… we want to reduce the gap between
a new idea and the moment that idea is in
production.
Machine learning
Where do we use it? Not only hype
● Image recognition
● Recommendations
● Predictions
● etc.
Machine learning
What is it?
● Subset of artificial intelligence.
● Statistical models that systems use to
effectively perform a specific task.
● It doesn't use explicit instructions,
relying on patterns and inference
instead.
So… we want to reduce the gap between
a new idea and the moment that idea is in
production.
How do we achieve CD?
Eric Breck, Shanqing Cai, Eric Nielsen, Michael Salib, D. Sculley (Google, Inc.).
"The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction", 2017.
Machine Learning Systems
[Diagram: Code, Data, Model, Production, Monitoring]
Code
Apply the best practices for writing
your code. Code is always code.
● Not only the model: these are complex systems.
● Extreme programming.
● Quality gates.
● Feature toggles.
● Test pyramid.
[Figure: test pyramid, after Vishal Naik (Thoughtworks Insights). Layers shown: automated unit tests, automated component tests, automated integration tests, automated API tests, automated GUI tests, manual session-based testing.]
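The "Feature toggles" bullet on the Code slide can be sketched roughly as follows. This is a minimal illustration, not the speaker's implementation: `FeatureToggles`, `legacy_score`, `new_model_score`, and the `use_new_model` flag are all hypothetical names, and a real system would probably load its flags from configuration rather than from memory.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FeatureToggles:
    """Hypothetical in-memory toggle store (real flags would come from config)."""
    flags: Dict[str, bool] = field(default_factory=dict)

    def is_enabled(self, name: str) -> bool:
        # Unknown toggles default to off, so unfinished code stays dark.
        return self.flags.get(name, False)


def legacy_score(request: dict) -> float:
    """Stand-in for the behaviour currently in production."""
    return 0.5


def new_model_score(request: dict) -> float:
    """Stand-in for a code path that is merged but not yet released."""
    return min(1.0, 0.1 * request.get("visits", 0))


def score(request: dict, toggles: FeatureToggles) -> float:
    """Pick the code path at run time based on the toggle."""
    if toggles.is_enabled("use_new_model"):
        return new_model_score(request)
    return legacy_score(request)
```

With the toggle off, `score` keeps returning the legacy value, so the new path can be deployed continuously and switched on later.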
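Code pipeline
Build → Test → Acceptance test → Deploy to staging → Deploy to pro → Smoke test
● Continuous integration: build and test.
● Continuous delivery: adds acceptance test and deploy to staging.
● Continuous deployment: adds deploy to pro and smoke test.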
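A quality gate in a pipeline like the one above could look roughly like this sketch: stages run in order and the first failing gate stops everything after it. The `Stage` type, `run_pipeline`, and the lambdas standing in for real build and test steps are assumptions made for illustration only.

```python
from typing import Callable, List, Tuple

# A stage is a (name, check) pair; check returns True when the gate passes.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failing gate.

    Returns the names of the stages that passed.
    """
    passed: List[str] = []
    for name, check in stages:
        if not check():
            break
        passed.append(name)
    return passed


# Hypothetical stages; the lambdas stand in for real build, test,
# and deployment steps.
stages: List[Stage] = [
    ("build", lambda: True),
    ("test", lambda: True),
    ("acceptance test", lambda: False),  # a failing gate
    ("deploy to staging", lambda: True),
]
```

Here `run_pipeline(stages)` never reaches "deploy to staging", because the acceptance gate fails first.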
Unlike in traditional software systems,
the “behavior of ML systems is not specified
directly in code but is learned from data”.
So our tests depend on the data sets
used to train the models.
Data
Data pipeline
Ingest
● Data lake
● Know your sources. Data Catalog.
● Have a schema. Govern your data.
● Watch for silent failures.
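The schema and silent-failure bullets above can be sketched as two small checks at ingest time. Everything named here is an assumption for illustration: the `SCHEMA` columns, the 20% null-rate threshold, and the idea that a "silent failure" shows up as a column that quietly stops being filled.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical schema: column name -> expected Python type.
SCHEMA: Dict[str, type] = {"user_id": int, "country": str, "amount": float}


def schema_errors(record: Dict[str, Any],
                  schema: Dict[str, type] = SCHEMA) -> List[str]:
    """Return human-readable schema violations for one record."""
    errors = []
    for column, expected in schema.items():
        if column not in record:
            errors.append(f"missing column: {column}")
        elif record[column] is not None and not isinstance(record[column], expected):
            actual = type(record[column]).__name__
            errors.append(f"{column}: expected {expected.__name__}, got {actual}")
    for column in record:
        if column not in schema:
            errors.append(f"unexpected column: {column}")
    return errors


def null_rate(records: Sequence[Dict[str, Any]], column: str) -> float:
    """Fraction of records where the column is missing or None."""
    if not records:
        return 0.0
    nulls = sum(1 for r in records if r.get(column) is None)
    return nulls / len(records)


def silent_failure_suspected(records: Sequence[Dict[str, Any]],
                             column: str,
                             max_null_rate: float = 0.2) -> bool:
    """A source that still sends rows but stops filling a column fails silently."""
    return null_rate(records, column) > max_null_rate
```

The first check rejects records that break the schema loudly; the second catches batches that are well formed but suspiciously empty.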
Data wrangling/munging
● Data mart (not data warehouse).
● Be careful with data cooking:
if your features are bad, everything
is bad.
● Data cleaning.
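As one possible reading of the "Data cleaning" bullet, the sketch below drops rows with missing required fields, normalises strings, and removes exact duplicates. The function name and the chosen rules are assumptions; which cleaning steps are safe depends on the data, and the sketch assumes all values are hashable.

```python
from typing import Any, Dict, List, Sequence


def clean_rows(rows: Sequence[Dict[str, Any]],
               required: Sequence[str]) -> List[Dict[str, Any]]:
    """Drop incomplete rows, normalise strings, and remove exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Skip rows where a required field is missing or empty.
        if any(row.get(col) in (None, "") for col in required):
            continue
        # Normalise string values so " ES " and "es" compare equal.
        normalised = {
            key: value.strip().lower() if isinstance(value, str) else value
            for key, value in row.items()
        }
        # Keys are unique, so sorting the items never compares values.
        fingerprint = tuple(sorted(normalised.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalised)
    return cleaned
```

Keeping every rule explicit in one small function makes it easier to review how much "cooking" the data receives before it becomes a feature.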
Get training data
● Data scientists: make their lives easier.
● Big data: use importance-weighted sampling.
● Data security.
● Versioning data.
● Training/serving skew.
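The importance-weight sampling bullet is terse, so the following sketch assumes it refers to the usual trick for very large data sets: keep each example with some probability p and give the kept examples a weight of 1/p, so that weighted statistics still reflect the full data. The function name and the per-example `keep_probability` callback are hypothetical.

```python
import random
from typing import Callable, Iterable, List, Tuple


def importance_weighted_sample(
    examples: Iterable[dict],
    keep_probability: Callable[[dict], float],
    seed: int = 0,
) -> List[Tuple[dict, float]]:
    """Downsample examples and attach a 1/p weight to each kept example."""
    rng = random.Random(seed)  # local generator keeps the sample reproducible
    sample = []
    for example in examples:
        p = keep_probability(example)
        if not 0.0 < p <= 1.0:
            raise ValueError(f"keep probability must be in (0, 1], got {p}")
        if rng.random() < p:
            sample.append((example, 1.0 / p))
    return sample
```

For example, keeping every positive example (p = 1) and a quarter of the negatives (p = 0.25) yields negatives with weight 4, which a training loop can pass on as sample weights.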
“All models are wrong”. Common aphorism in statistics.
“All models are wrong, some are useful”. George Box.
“All models are wrong, some are useful for a short
period of time”. TensorFlow's team.
Model
First of all
● Design & evaluate the reward function.
● Define errors & failure.
● Ensure mechanisms for user feedback.
● Try to tie model changes to a clear metric of the subjective user experience.
● One objective vs. many metrics.
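One way to make "define errors & failure" concrete is to attach a cost to each kind of error and collapse them into a single number. This is only a sketch for a binary classifier; the cost values (a false negative costing five times a false positive) are invented for illustration and would need to come from the product context.

```python
from typing import Sequence


def expected_cost(y_true: Sequence[int],
                  y_pred: Sequence[int],
                  fp_cost: float = 1.0,
                  fn_cost: float = 5.0) -> float:
    """Average cost per example, with separate prices for each error type."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    false_positives = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    false_negatives = sum(1 for t, p in zip(y_true, y_pred) if p == 0 and t == 1)
    return (false_positives * fp_cost + false_negatives * fn_cost) / len(y_true)
```

A single cost like this gives model changes one objective to optimise, while precision, recall, and similar metrics stay available for diagnosis.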
Model pipeline
Code new model candidate
● Code is code.
● Run tests in your pipeline.
● New version of the model.
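A new model version can be identified by everything that produced it. The sketch below is one possible scheme, not the one used in the talk: it hashes a code revision, a data version, and the hyperparameters into a short identifier. The argument names and the 12-character length are arbitrary choices.

```python
import hashlib
import json


def model_version(code_revision: str,
                  data_version: str,
                  hyperparameters: dict) -> str:
    """Derive a short, reproducible identifier for a model candidate."""
    payload = json.dumps(
        {
            "code": code_revision,
            "data": data_version,
            "hyperparameters": hyperparameters,
        },
        sort_keys=True,  # key order must not change the version
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
```

Because the data version is part of the hash, retraining the same code on new data yields a new model version, which keeps model and data versioning tied together.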
Training model
● Feature engineering (unbalanced data,
unknown unknowns, etc.).
● Be critical with your features: data dependencies
cost more than code dependencies.
● Training/serving skew.
● Deterministic training dramatically simplifies reasoning about the system.
● Tune hyperparameters.
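Deterministic training mostly comes down to controlling every source of randomness. The toy example below, a one-feature linear model trained with plain SGD, is an assumption-laden sketch: it seeds a local random generator for both initialisation and shuffling, so two runs with the same seed and data give identical weights. Real frameworks have more sources of nondeterminism (parallelism, GPU kernels) that this does not cover.

```python
import random
from typing import List, Sequence, Tuple


def train_linear(data: Sequence[Tuple[float, float]],
                 epochs: int = 50,
                 learning_rate: float = 0.05,
                 seed: int = 42) -> Tuple[float, float]:
    """Fit y = weight * x + bias with SGD, using a seeded local generator."""
    rng = random.Random(seed)  # no global state, so runs are repeatable
    weight = rng.uniform(-0.1, 0.1)
    bias = 0.0
    rows: List[Tuple[float, float]] = list(data)  # copy; the input is not mutated
    for _ in range(epochs):
        rng.shuffle(rows)  # shuffling order is fixed by the seed
        for x, y in rows:
            error = (weight * x + bias) - y
            weight -= learning_rate * error * x
            bias -= learning_rate * error
    return weight, bias
```

With this in place, a pipeline test can simply train twice and compare the results for equality, which is a cheap way to detect accidental nondeterminism.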
Model competition
[Diagram: the model in PRO and the candidates Model 1, Model 2, ..., Model n compete for PRODUCTION.]
Model performance
● Test performance with production data.
● Check your reward functions and failures, e.g. with a ROC curve.
● Be careful: satisfy a baseline of quality in all data slices.
● Baseline of accuracy.
● Feedback loop.
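The "baseline of quality in all data slices" bullet can be sketched as a per-slice accuracy check. The field names (`label`, `prediction`) and the choice of accuracy as the metric are assumptions; the point is only that a good overall number can hide a bad slice.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def accuracy_by_slice(examples: Iterable[dict], slice_key: str) -> Dict[str, float]:
    """Accuracy per value of slice_key; each example has 'label' and 'prediction'."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for example in examples:
        slice_value = example[slice_key]
        total[slice_value] += 1
        correct[slice_value] += int(example["label"] == example["prediction"])
    return {s: correct[s] / total[s] for s in total}


def slices_below_baseline(examples: Iterable[dict],
                          slice_key: str,
                          baseline: float) -> List[str]:
    """Names of the slices whose accuracy falls under the baseline."""
    scores = accuracy_by_slice(examples, slice_key)
    return sorted(s for s, accuracy in scores.items() if accuracy < baseline)
```

A candidate model could be blocked from promotion whenever `slices_below_baseline` returns a non-empty list, even if its overall accuracy looks fine.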
Model champion
[Diagram: Model 1 wins and takes the place of the model in PRO in PRODUCTION.]
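The competition between the model in PRO and the candidates can be sketched as a simple champion/challenger rule. The function, the model names, and the minimum-improvement margin are hypothetical; the scores are assumed to come from the same evaluation on the same data.

```python
from typing import Dict


def pick_champion(scores: Dict[str, float],
                  current: str,
                  min_improvement: float = 0.01) -> str:
    """Return the model that should serve production traffic.

    A challenger only replaces the current model when it beats it by at
    least min_improvement, which avoids churn on noise-level differences.
    """
    challengers = {name: s for name, s in scores.items() if name != current}
    if not challengers:
        return current
    best = max(challengers, key=challengers.get)
    if challengers[best] >= scores[current] + min_improvement:
        return best
    return current
```

With scores of 0.80 for the model in PRO, 0.86 for Model 1, and 0.83 for Model 2, this rule promotes Model 1.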
Deploy champion model
● Shadow traffic.
● Test the models with real data.
● Canary releases.
● A/B tests.
● Rollbacks.
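Canary releases and rollbacks can be sketched together: route a fixed share of users to the new model, then widen or roll back depending on how it behaves. The hashing scheme, the error-rate comparison, and the tolerance and step values below are all assumptions for illustration.

```python
import hashlib


def bucket(user_id: str, buckets: int = 100) -> int:
    """Map a user to a stable bucket so they always see the same model."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def route(user_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent of users to the canary model."""
    return "canary" if bucket(user_id) < canary_percent else "stable"


def next_canary_percent(current_percent: int,
                        canary_error_rate: float,
                        stable_error_rate: float,
                        tolerance: float = 0.01,
                        step: int = 10) -> int:
    """Roll back to 0% when the canary is clearly worse; otherwise widen it."""
    if canary_error_rate > stable_error_rate + tolerance:
        return 0
    return min(100, current_percent + step)
```

Shadow traffic follows the same routing idea, except that the new model's answers are logged and compared rather than returned to the user.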
Monitoring
...because shit happens
● Create a dashboard with clear and useful
information.
● Schema changes.
● Infra monitoring (training speed, serving
latency, RAM usage, etc.).
● User feedback.
● Stale models.
● Feedback loop.
● Errors (model, APIs, etc.).
● Silent failures.
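Two of the monitoring bullets, stale models and silent failures, lend themselves to small automated checks. The sketch below assumes a model counts as stale after a fixed age (seven days here, an arbitrary value) and that a silent failure can show up as a shift in the mean prediction score; both thresholds and function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean
from typing import Sequence


def is_stale(trained_at: datetime,
             now: datetime,
             max_age: timedelta = timedelta(days=7)) -> bool:
    """True when the serving model is older than the allowed age."""
    return now - trained_at > max_age


def mean_shift_alert(baseline_scores: Sequence[float],
                     live_scores: Sequence[float],
                     max_shift: float = 0.1) -> bool:
    """True when live predictions drift away from the baseline mean.

    The service may still answer every request without errors, so this
    kind of check is one way to notice a failure that raises no exception.
    """
    return abs(mean(live_scores) - mean(baseline_scores)) > max_shift
```

Both checks return plain booleans, so they can feed a dashboard or an alert without any extra machinery.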
Conclusions
● Code is always code
● Objective driven modeling
● Know your data
● Clear metrics for complex systems
Juan López
@juaneto
Thank you
