CD in Machine Learning Systems

Juan López
@juaneto

Continuous deployment
What it is and why everybody wants it
Idea → Develop → Deploy in prod
● New features on the fly.
● Quality goes up (smaller changes).
● Faster development.
● Experimentation.
● Innovation.

So… we want to shorten the gap between
having a new idea and having that idea
running in production.

Machine learning
Where do we use it? Not only hype
● Image recognition
● Recommendations
● Predictions
● etc.

Machine learning
What is it?
● Subset of artificial intelligence.
● Statistical models that systems use to
effectively perform a specific task.
● It doesn't use explicit instructions,
relying on patterns and inference
instead.

How do we achieve CD?
The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction (2017)
Eric Breck, Shanqing Cai, Eric Nielsen, Michael Salib, D. Sculley. Google, Inc.

Code · Model · Data · Production · Monitoring

Code
Apply the best practices for writing
your code. Code is always code.

Code
● Not only the model: these are complex systems.
● Extreme programming.
● Quality gates.
● Feature toggles.
● Test pyramid.
Test pyramid layers (Vishal Naik, Thoughtworks Insights):
● Automated unit tests
● Automated component tests
● Automated integration tests
● Automated API tests
● Automated GUI tests
● Manual session-based testing
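
A minimal sketch of the feature toggle idea, assuming toggles live in a plain dictionary and users are bucketed by a stable hash. The toggle name `new_ranking_model`, the `TOGGLES` table and both placeholder "models" are hypothetical, not something from the talk.

```python
import hashlib

# Hypothetical toggle table; in practice this would come from configuration.
TOGGLES = {"new_ranking_model": {"enabled": True, "rollout_percent": 20}}


def is_enabled(toggle_name: str, user_id: str, toggles: dict = TOGGLES) -> bool:
    """Return True if the toggle is on for this user.

    Users are bucketed by a stable hash, so the same user always gets
    the same answer for a given toggle.
    """
    toggle = toggles.get(toggle_name)
    if toggle is None or not toggle["enabled"]:
        return False
    digest = hashlib.sha256(f"{toggle_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    return bucket < toggle["rollout_percent"]


def predict(user_id: str, features: list) -> float:
    """Route to the new code path only when the toggle is on."""
    if is_enabled("new_ranking_model", user_id):
        return sum(features) / len(features)  # placeholder "new" model
    return max(features)  # placeholder "current" model
```

Setting `enabled` to False switches every user back to the current path without a new deploy, which is what makes small, frequent releases safer.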

Code pipeline
Build → Test → Acceptance test → Deploy to staging → Deploy to pro → Smoke test
● Continuous integration: Build, Test.
● Continuous delivery: Acceptance test, Deploy to staging.

Unlike in traditional software systems,
the "behavior of ML systems is not specified
directly in code but is learned from data".
So our tests depend on the data sets
used to train the models.

Data

Ingest
● Data lake.
● Know your sources. Data catalog.
● Have a schema. Govern your data.
● Watch for silent failures.
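
One way to read "have a schema" and "watch for silent failures" together, as a rough sketch: validate every ingested record and fail loudly when too many are rejected. The `SCHEMA` columns and the 5% threshold are assumptions made up for the example.

```python
# Hypothetical schema: column name -> expected Python type.
SCHEMA = {"user_id": str, "age": int, "country": str}


def validate_record(record: dict, schema: dict = SCHEMA) -> list:
    """Return a list of schema violations for one ingested record."""
    problems = []
    for column, expected_type in schema.items():
        if column not in record:
            problems.append(f"missing column: {column}")
        elif not isinstance(record[column], expected_type):
            problems.append(f"bad type for {column}: {type(record[column]).__name__}")
    for column in record:
        if column not in schema:
            problems.append(f"unexpected column: {column}")
    return problems


def ingest(records: list, max_reject_ratio: float = 0.05) -> list:
    """Keep valid records and raise if too many are rejected."""
    valid = [r for r in records if not validate_record(r)]
    rejected = len(records) - len(valid)
    if records and rejected / len(records) > max_reject_ratio:
        raise ValueError(f"{rejected} of {len(records)} records violate the schema")
    return valid
```

Dropping bad records quietly is exactly the kind of silent failure the slide warns about, so the sketch turns a high rejection ratio into an error.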

Data wrangling/munging
● Data mart (not a data warehouse).
● Be careful with data cooking:
if your features are bad, everything
is bad.
● Data cleaning.

Get training data
● Data scientists: make their life easier.
● Big data: importance-weighted sampling.
● Data security.
● Versioning data.
● Training/serving skew.
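
A small sketch of importance-weighted sampling, assuming each example is kept with some probability p and then carries a weight of 1/p. The click log and the 10% keep rate are invented for illustration.

```python
import random


def importance_sample(examples, keep_probability, seed=0):
    """Subsample examples, attaching weight 1/p to each one kept.

    keep_probability(example) must return a value in (0, 1].
    Weighted statistics over the sample remain unbiased estimates
    of the same statistics over the full data.
    """
    rng = random.Random(seed)
    sample = []
    for example in examples:
        p = keep_probability(example)
        if rng.random() < p:
            sample.append((example, 1.0 / p))
    return sample


# Hypothetical click log: keep every click (rare) and about 10% of non-clicks.
log = [{"clicked": i % 50 == 0} for i in range(10_000)]
sample = importance_sample(log, lambda e: 1.0 if e["clicked"] else 0.1)
estimated_total = sum(weight for _, weight in sample)
```

The sample is much smaller than the log, yet the weights let training and evaluation account for the examples that were dropped.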

"All models are wrong". Common aphorism in statistics.
"All models are wrong, some are useful". George Box.
"All models are wrong, some are useful for a short
period of time". TensorFlow's team.

Model

First of all
● Design & evaluate the reward function.
● Define errors & failure.
● Ensure mechanisms for user feedback.
● Try to tie model changes to a clear metric of the subjective user experience.
● Objective vs. many metrics.

Code new model candidate
● Code is code.
● Run tests in your pipeline.
● New version of the model.

Training model
● Feature engineering (unbalanced data,
unknown unknowns, etc.).
● Be critical with your features: data dependencies
cost more than code dependencies.
● Training/serving skew.
● Deterministic training dramatically simplifies
reasoning about the whole system.
● Tune hyperparameters.
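
A toy illustration of deterministic training, assuming all randomness (initialisation and shuffling) is drawn from one seeded generator. The tiny SGD loop and the data are placeholders; real frameworks have further sources of nondeterminism (for example GPU kernels or parallel data loading) that this sketch does not cover.

```python
import random


def train(data, seed, epochs=20, learning_rate=0.01):
    """Fit y ~ w * x + b with SGD; all randomness comes from `seed`."""
    rng = random.Random(seed)        # local generator, no global state
    w, b = rng.uniform(-1, 1), 0.0   # seeded initialisation
    rows = list(data)
    for _ in range(epochs):
        rng.shuffle(rows)            # seeded shuffling
        for x, y in rows:
            error = (w * x + b) - y
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b


# Hypothetical training data on the line y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in range(10)]
run_a = train(data, seed=42)
run_b = train(data, seed=42)
```

Because both runs share the seed and the data, `run_a` and `run_b` are identical, so a difference between two models can be traced to a code or data change rather than to chance.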

Model competition
PRODUCTION: Model in PRO
Candidates: Model 1, Model 2, Model n
Model performance
● Test performance with production data.
● Check your reward functions and failures, e.g. ROC curve.
● Be careful: satisfy a baseline of quality in all data slices.
● Baseline of accuracy.
● Feedback loop.
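
The "baseline of quality in all data slices" point can be sketched as a per-slice accuracy check. The rows, the `country` slice key and the 0.8 baseline are hypothetical.

```python
from collections import defaultdict


def accuracy_by_slice(rows, slice_key):
    """Return {slice value: accuracy} for rows with 'label' and 'prediction'."""
    hits, totals = defaultdict(int), defaultdict(int)
    for row in rows:
        totals[row[slice_key]] += 1
        hits[row[slice_key]] += int(row["label"] == row["prediction"])
    return {value: hits[value] / totals[value] for value in totals}


def failing_slices(rows, slice_key, baseline):
    """Return the slices whose accuracy is below the baseline."""
    return {
        value: acc
        for value, acc in accuracy_by_slice(rows, slice_key).items()
        if acc < baseline
    }


# Hypothetical evaluation rows, sliced by country.
rows = (
    [{"country": "ES", "label": 1, "prediction": 1}] * 9
    + [{"country": "ES", "label": 1, "prediction": 0}] * 1
    + [{"country": "PT", "label": 1, "prediction": 1}] * 1
    + [{"country": "PT", "label": 0, "prediction": 1}] * 1
)
```

Here the overall accuracy is 10 out of 12, which passes a 0.8 baseline, while `failing_slices(rows, "country", 0.8)` still flags the PT slice at 0.5. A single global number would hide that.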

Model champion
PRODUCTION: Model in PRO is now Model 1
Remaining candidates: Model 2, Model n
Deploy champion model
● Shadow traffic.
● Test the models with real data.
● Canary releases.
● A/B tests.
● Rollbacks.
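
Shadow traffic, sketched under the assumption that models are plain callables: the champion answers the request, while the candidate sees the same real request but only has its output recorded. The `serve` function and the in-memory `shadow_log` list are hypothetical, and in production the shadow call would usually run asynchronously.

```python
import logging

logger = logging.getLogger("shadow")


def serve(request, champion, candidate, shadow_log):
    """Answer with the champion; run the candidate in the shadow.

    The candidate's output is recorded, never returned, and its
    errors never reach the user.
    """
    response = champion(request)
    try:
        shadow_log.append(
            {"request": request, "champion": response, "candidate": candidate(request)}
        )
    except Exception:  # a broken candidate must not break production
        logger.exception("candidate failed on shadow traffic")
    return response
```

Comparing the recorded pairs shows how the candidate behaves on real data before any user depends on it.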

Monitoring
...because shit happens

Monitoring
● Create a dashboard with clear and useful
information.
● Schema changes.
● Infra monitoring (training speed, serving
latency, RAM usage, etc.).
● User feedback.
● Stale models.
● Feedback loop.
● Errors (model, APIs, etc.).
● Silent failures.
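
Two of the checks above, stale models and silent failures, as a rough sketch. The 7-day maximum age, the expected positive rate and the tolerance are assumed values chosen only for the example.

```python
from datetime import datetime, timedelta, timezone


def is_stale(trained_at, now, max_age=timedelta(days=7)):
    """True if the model in production was trained too long ago."""
    return now - trained_at > max_age


def positive_rate_alert(predictions, expected_rate, tolerance=0.10):
    """True if the share of positive predictions drifts from what we expect.

    A model that silently starts answering almost always 0 (or 1) raises
    no exception, so the output distribution itself is monitored.
    """
    if not predictions:
        return True  # no traffic at all is also worth an alert
    rate = sum(predictions) / len(predictions)
    return abs(rate - expected_rate) > tolerance


# Hypothetical timestamps for a model trained nine days before the check.
trained_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
stale = is_stale(trained_at, now)
```

Both functions return booleans, so they can feed a dashboard or an alerting rule directly.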

● Code is always code
● Objective driven modeling
● Know your data
● Clear metrics for complex systems

Juan López
@juaneto
Thank you
