23 April 2019
IMPROVING HOW WE DELIVER MACHINE LEARNING MODELS
WHO ARE WE?
David Tan
Developer @ ThoughtWorks
Jonathan Heng
Developer @ ThoughtWorks
THE PLAN TODAY
Why do we need to improve the ML workflow?
What are some better practices?
How can we practice these practices?
TEMPERATURE CHECK
Who has...
● Trained an ML model before?
● Deployed an ML model for fun?
● Deployed an ML model at work?
● Deployed a model using an automated CI pipeline?
WHAT’S THE PROBLEM?
OBSERVATION
One of these is not like the others
Continuous delivery practices can help us change this.
JOURNEY ON AN ML PROJECT
Jupyter Notebooks → ??? → Production
IT’S NEVER JUST ML
We got 99 problems and machine learning ain’t one
Potentially unethical outcomes
How can we help people do difficult things?
Source: Machine Learning: The High Interest Credit Card of Technical Debt (Google, 2015)
HELPING PEOPLE DO DIFFICULT THINGS
Sensible defaults
● Two reference repos:
○ github.com/ThoughtWorksInc/ml-cd-starter-kit
○ github.com/ThoughtWorksInc/ml-app-template
● Better ways of working
○ 6 common problems and suggested solutions
LET’S GO
6 common problems and suggested solutions
1. WORKS ON MY MACHINE
1. WORKS ON MY MACHINE
To start, simply:
• docker build …
• docker run ...
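The Docker image pins the whole environment for everyone who builds and runs it. As a complementary, hypothetical sketch (an assumption, not something the slides prescribe), a Python service could also fail fast at startup when installed package versions drift from the pinned ones. The function name `find_version_mismatches` and the example version numbers are illustrative; `importlib.metadata.version` is the standard-library lookup available from Python 3.8.

```python
from importlib import metadata
from typing import Dict, List, Optional


def _installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_version_mismatches(
    pinned: Dict[str, str],
    installed: Optional[Dict[str, str]] = None,
) -> List[str]:
    """List packages whose installed version differs from the pinned one.

    `installed` can be passed explicitly (e.g. in tests); by default the
    versions are read from the current Python environment.
    """
    problems = []
    for package, expected in sorted(pinned.items()):
        if installed is not None:
            actual = installed.get(package)
        else:
            actual = _installed_version(package)
        if actual is None:
            problems.append(f"{package}: not installed (expected {expected})")
        elif actual != expected:
            problems.append(f"{package}: installed {actual}, expected {expected}")
    return problems
```

A non-empty result could be turned into a startup failure, so a mismatched environment is caught before it serves a single prediction.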
1. WORKS ON MY MACHINE
Demo
2. NO DATA / DATA SUCKS
2. NO DATA / DATA SUCKS
[Diagram: layers labelled Business problem, ML model, Feature engineering, Data]
Mitigation measures
● Think about data access before starting the project
● Collect better data with every release (more on this a few slides from now)
● “Wizard of Oz” / Fake it till we make it
○ Provide the interface, but without the ML implementation
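One way to read “provide the interface but without the ML implementation”: ship the prediction interface backed by a hand-written rule, so the surrounding system (and its data collection) can go live before any model exists. A minimal sketch; the house-price domain, the `Predictor` protocol, `RuleBasedPredictor` and its parameters are all hypothetical.

```python
from typing import Dict, Protocol


class Predictor(Protocol):
    """Hypothetical interface that both the stub and a future model implement."""

    def predict(self, features: Dict[str, float]) -> float: ...


class RuleBasedPredictor:
    """'Wizard of Oz' stand-in: same interface, no ML behind it yet."""

    def __init__(self, base_price: float, price_per_sqm: float) -> None:
        self.base_price = base_price
        self.price_per_sqm = price_per_sqm

    def predict(self, features: Dict[str, float]) -> float:
        # A hand-written heuristic that can ship today; a trained model exposing
        # the same predict() method can replace it without touching callers.
        return self.base_price + self.price_per_sqm * features.get("floor_area_sqm", 0.0)


def handle_request(predictor: Predictor, features: Dict[str, float]) -> Dict[str, float]:
    # Callers depend only on the interface, not on what implements it.
    return {"prediction": predictor.predict(features)}
```

Swapping in a real model later is then a change to which object is passed to `handle_request`, not to the API contract.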
3. DEPLOYMENTS ARE COMPLICATED
3. DEPLOYMENTS ARE COMPLICATED
Mitigation measures
● CI pipeline
● Deploy early and often
● Tracer bullet
● Bring the pain forward
3. DEPLOY EARLY AND OFTEN
Source: Continuous Delivery (Jez Humble, Dave Farley)
[Diagram: Local env → push → Source code repository → trigger → Run unit tests, with feedback to Local env]
3. DEPLOY EARLY AND OFTEN
[Diagram: Local env → push → Source code repository → trigger → Run unit tests → Train and evaluate model, with feedback to Local env]
Source: Continuous Delivery (Jez Humble, Dave Farley)
3. DEPLOY EARLY AND OFTEN
[Diagram: Local env → push → Source code repository → trigger → Run unit tests → Train and evaluate model → Deploy candidate model to staging, with the model stored in an Artifact repository and feedback to Local env]
Source: Continuous Delivery (Jez Humble, Dave Farley)
3. DEPLOY EARLY AND OFTEN
[Diagram: Local env → push → Source code repository → trigger → Run unit tests → Train and evaluate model → Deploy candidate model to staging → Deploy model to production, with the model stored in an Artifact repository]
Source: Continuous Delivery (Jez Humble, Dave Farley)
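The gating in these pipeline diagrams (each stage runs only if every earlier stage succeeded) can be sketched in a few lines. In practice the CI server (GoCD in the demo) provides this; the `run_pipeline` function and the stage names below are illustrative assumptions.

```python
from typing import Callable, List, Tuple

# A stage is a (name, action) pair; the action returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that succeeded. Because execution stops
    at the first failing stage, nothing after it (for example a production
    deployment) ever runs.
    """
    succeeded = []
    for name, action in stages:
        if not action():
            break
        succeeded.append(name)
    return succeeded
```

For example, a failing `train_and_evaluate_model` stage means `deploy_to_production` is never attempted, which is what makes deploying early and often safe.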
4. HOW DO WE CHOOSE BETWEEN CANDIDATE MODELS?
4. CHOOSING BUILDS
Include model evaluation metrics in CI pipeline
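A minimal sketch of a metrics check that a CI stage could run after training, assuming a regression model evaluated with R² and RMSE (the metrics shown later in the deck). The thresholds are hypothetical and project-specific; in practice scikit-learn's `r2_score` and `mean_squared_error` would likely replace the hand-written metrics.

```python
import math
from typing import Dict, Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def check_model_metrics(
    y_true: Sequence[float], y_pred: Sequence[float], min_r2: float, max_rmse: float
) -> Dict[str, float]:
    """Raise if the candidate misses either threshold; otherwise return its metrics.

    An uncaught exception makes the Python process exit with a non-zero
    status, which CI servers treat as a failed job, so the build stops here.
    """
    metrics = {"r2": r2_score(y_true, y_pred), "rmse": rmse(y_true, y_pred)}
    if metrics["r2"] < min_r2 or metrics["rmse"] > max_rmse:
        raise AssertionError(f"model below thresholds: {metrics}")
    return metrics
```

Printing the returned metrics into the job log also makes them visible per build, which is what lets a team compare candidate builds.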
5. HOW’S THE MODEL DOING IN THE WILD?
5. OBSERVE!
Monitoring service usage
Benefit #1: Feedback on production model
5. OBSERVE!
Monitoring model output
Benefit #1: Feedback on production model
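Monitoring usage and outputs starts with logging every prediction in a machine-readable form. A minimal sketch, assuming the service writes one JSON object per log line that a collector such as Fluentd forwards to ElasticSearch for Kibana or Grafana dashboards; the logger name and record fields are hypothetical.

```python
import json
import logging
import time
import uuid
from typing import Any, Dict

logger = logging.getLogger("model_service")  # hypothetical logger name


def log_prediction(features: Dict[str, Any], prediction: float, model_version: str) -> Dict[str, Any]:
    """Emit one JSON log line per prediction and return the logged record."""
    record = {
        "event": "prediction",
        "request_id": str(uuid.uuid4()),  # lets later labels be joined back to this prediction
        "timestamp": time.time(),
        "model_version": model_version,  # e.g. the image tag of the deployed build
        "features": features,            # model inputs, for skew and drift checks
        "prediction": prediction,        # model output, for output monitoring
    }
    logger.info(json.dumps(record))
    return record
```

Logging inputs alongside outputs is what makes the next two benefits (skew detection and closing the data loop) possible.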
5. OBSERVE!
Monitoring model inputs
● Could help identify training-serving skew
Benefit #1: Feedback on production model
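Comparing logged serving inputs with the training data is one way to spot training-serving skew. The slides do not name a specific measure; the sketch below uses the Population Stability Index (PSI) for a single numeric feature, with equal-width bins over the training range. A commonly quoted rule of thumb treats PSI above about 0.25 as a significant shift, but any alert threshold is an assumption to tune per feature.

```python
import bisect
import math
from typing import List, Sequence


def _bin_fractions(values: Sequence[float], edges: List[float]) -> List[float]:
    """Fraction of values in each of the len(edges) + 1 bins defined by the cut points."""
    counts = [0] * (len(edges) + 1)
    for value in values:
        # bisect_right counts how many cut points are <= value, i.e. the bin index.
        counts[bisect.bisect_right(edges, value)] += 1
    return [count / len(values) for count in counts]


def population_stability_index(
    training: Sequence[float], serving: Sequence[float], n_bins: int = 10, eps: float = 1e-6
) -> float:
    """PSI of one numeric feature: sum over bins of (a - e) * ln(a / e)."""
    lo, hi = min(training), max(training)
    if hi == lo:
        raise ValueError("training values are constant; PSI bins are undefined")
    width = (hi - lo) / n_bins
    # Equal-width cut points spanning the training range; serving values outside
    # that range fall into the first or last bin.
    edges = [lo + width * i for i in range(1, n_bins)]
    expected = _bin_fractions(training, edges)
    actual = _bin_fractions(serving, edges)
    psi = 0.0
    for e, a in zip(expected, actual):
        # Clamp empty bins to eps so the logarithm stays finite.
        e, a = max(e, eps), max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi
```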
5. OBSERVE!
Benefit #2: Interpretability of predictions
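For a linear model, a prediction can be explained exactly by splitting it into per-feature contributions relative to a baseline input; logging these alongside the prediction answers “why did the model decide this?”. This sketch covers only the linear case (tools such as SHAP or LIME play the analogous role for non-linear models), and the coefficients, baseline and feature names are hypothetical.

```python
from typing import Dict, List, Tuple


def explain_linear_prediction(
    coefficients: Dict[str, float],
    intercept: float,
    baseline: Dict[str, float],
    features: Dict[str, float],
) -> Tuple[float, List[Tuple[str, float]]]:
    """Split a linear model's prediction into per-feature contributions.

    Each contribution is coef * (feature - baseline), so the prediction equals
    the baseline prediction plus the sum of contributions.
    """
    contributions = {
        name: coef * (features[name] - baseline[name]) for name, coef in coefficients.items()
    }
    baseline_prediction = intercept + sum(
        coef * baseline[name] for name, coef in coefficients.items()
    )
    prediction = baseline_prediction + sum(contributions.values())
    # Largest absolute contribution first: the main reasons behind this prediction.
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return prediction, ranked
```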
5. OBSERVE!
Benefit #3: Closing the data collection loop
[Diagram: the data collection loop connecting Data turking, a Data / feature repository, Train & test model, Evaluate models, Deploy model, the Model Service and its Logs; arrows distinguish the flow of data from the flow of model]
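Closing the loop means turning service logs into training data once the true outcome is known. A minimal sketch, assuming each logged prediction carries a `request_id` and that ground-truth labels arrive later keyed by that id; predictions without a label yet are returned separately as candidates for manual labelling (“data turking”). The function name and record layout are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def build_training_rows(
    prediction_logs: List[Dict[str, Any]], labels: Dict[str, float]
) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Join logged predictions with ground truth that arrived later."""
    labelled: List[Dict[str, Any]] = []
    needs_labelling: List[str] = []
    for record in prediction_logs:
        actual = labels.get(record["request_id"])
        if actual is None:
            # No ground truth yet: queue the request for manual labelling.
            needs_labelling.append(record["request_id"])
        else:
            # Logged inputs plus the observed outcome form a new training row.
            labelled.append({**record["features"], "label": actual})
    return labelled, needs_labelling
```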
5. OBSERVE!
Benefit #4: Ability to measure goodness of any model
[Diagram: CI pipeline stages build_and_test, evaluate_model, deploy_staging and deploy_prod, triggered by git push, alongside an evaluate_model_w_new_data stage. Example evaluation output: model = my-image:$BUILD_ID, r_2 = 0.7, rmse = 42]
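Because every build is a versioned image, any build can be scored against the same freshly collected data. A minimal sketch of champion selection by RMSE, assuming each build's predictions on the new data have already been computed; the build tags are hypothetical examples of the `my-image:$BUILD_ID` pattern.

```python
import math
from typing import Dict, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pick_champion(
    predictions_by_build: Dict[str, Sequence[float]], y_new: Sequence[float]
) -> Tuple[str, Dict[str, float]]:
    """Score every build on the same unseen data; the lowest RMSE wins."""
    scores = {build: rmse(y_new, preds) for build, preds in predictions_by_build.items()}
    champion = min(scores, key=lambda build: scores[build])
    return champion, scores
```

Scoring all candidates on the same unseen data keeps the comparison fair, since each build's own test split may differ.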
5. HOW’S THE MODEL IN THE WILD?
OBSERVE!
Summing up
● Mitigation measures
○ Logging + Monitoring
● Benefits
○ Feedback on production models
○ Interpretability (how did the model decide on this particular prediction?)
○ Better data for training
○ Better (unseen) data for evaluating candidate/champion models
5. HOW’S THE MODEL IN THE WILD?
Demo
GoCD, MLFlow, Kubernetes + Helm, ElasticSearch, Fluentd, Kibana, Grafana
6. HARMFUL MODELS IN PRODUCTION
6. HARMFUL MODELS IN PRODUCTION
● PredPol algorithm reinforces racial biases in policing data
● Recruiting tool shows bias against women
Actual news headlines
Image source: I’m an AI researcher, and here’s what scares me about AI (Rachel Thomas)
6. HARMFUL MODELS IN PRODUCTION
Mitigation measures
● Discuss and define what “bad” looks like in our context
● “Black Mirror” retros
● Measure unfairness
○ Make fairness a measurable fitness function
● Data ethics checklist
● Human-in-the-loop / appeal processes
● Ability to recover from harmful models
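“Make fairness a measurable fitness function” can be sketched as another CI check. This example assumes binary predictions and uses demographic parity (the gap in positive-prediction rates between groups), which is only one of several fairness definitions; deciding which definition matters is exactly the “define what bad looks like” discussion. Libraries such as Fairlearn offer maintained implementations; the function names and the threshold here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence


def positive_rate_by_group(predictions: Sequence[int], groups: Sequence[str]) -> Dict[str, float]:
    """Share of positive (1) predictions within each group."""
    totals: Dict[str, int] = defaultdict(int)
    positives: Dict[str, int] = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_difference(predictions: Sequence[int], groups: Sequence[str]) -> float:
    """Largest gap in positive-prediction rate between any two groups."""
    rates = positive_rate_by_group(predictions, groups)
    return max(rates.values()) - min(rates.values())


def check_fairness(predictions: Sequence[int], groups: Sequence[str], max_gap: float) -> None:
    """Fail the build (via an uncaught exception) if the gap exceeds max_gap."""
    gap = demographic_parity_difference(predictions, groups)
    if gap > max_gap:
        raise AssertionError(f"demographic parity gap {gap:.2f} exceeds {max_gap:.2f}")
```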
6. HARMFUL MODELS IN PRODUCTION
Demo: rollback to last good build
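Rolling back amounts to redeploying the most recent earlier build that passed its checks; on a CI server such as GoCD this typically means re-running the deploy stage of that earlier pipeline run. A minimal sketch of choosing the target, assuming a hypothetical `Build` record per pipeline run.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Build:
    """Hypothetical record of one pipeline run."""

    build_id: int
    image: str           # e.g. a tag following the my-image:$BUILD_ID pattern
    passed_checks: bool  # did tests and metric/fairness checks pass?


def last_good_build(history: List[Build], current_id: int) -> Optional[Build]:
    """Most recent build older than current_id that passed all checks, if any."""
    candidates = [b for b in history if b.build_id < current_id and b.passed_checks]
    return max(candidates, key=lambda b: b.build_id, default=None)
```

Because each build's image is immutable, redeploying it restores exactly the previous model rather than a retrained approximation.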
SUMMING UP
How can we make it easier to do the right thing?
MAKE IT EASIER TO DO THE RIGHT THING
● Better ways of working
○ Environment management
○ Closing the data collection loop
○ Deploy early and often
○ Automated tracking of hyperparameters and metrics
○ Logging and monitoring
○ Do no harm
● Two reference repos:
○ github.com/ThoughtWorksInc/ml-cd-starter-kit
○ github.com/ThoughtWorksInc/ml-app-template
github.com/ThoughtWorksInc/ml-cd-starter-kit
Provision and configure cross-cutting services
● GoCD
● EFKG (ElasticSearch, Fluentd, Kibana, Grafana)
● MLFlow
github.com/ThoughtWorksInc/ml-app-template
Project boilerplate template
● Unit tests
● Train model
● Test model metrics
● Dockerised setup
● Store CI pipeline as code
● Track hyperparameters and metrics of each training run on CI
● Logging (predictions, inputs, explanatory variables)
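The template tracks each CI training run with MLflow (via calls such as `mlflow.start_run()`, `mlflow.log_param()` and `mlflow.log_metric()`). To show the idea without a tracking server, here is a dependency-free stand-in; the `RunTracker` class is hypothetical and not part of either repo.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class RunTracker:
    """In-memory record of training runs (a stand-in for a tracking server)."""

    runs: List[Dict[str, Any]] = field(default_factory=list)

    def log_run(self, build_id: str, params: Dict[str, Any], metrics: Dict[str, float]) -> Dict[str, Any]:
        """Record the hyperparameters and resulting metrics of one CI training run."""
        run = {
            "build_id": build_id,
            "timestamp": time.time(),
            "params": dict(params),
            "metrics": dict(metrics),
        }
        self.runs.append(run)
        return run

    def best_run(self, metric: str, lower_is_better: bool = True) -> Dict[str, Any]:
        """Return the run with the best value of `metric`."""
        def score(run: Dict[str, Any]) -> float:
            return run["metrics"][metric]
        return min(self.runs, key=score) if lower_is_better else max(self.runs, key=score)

    def to_json_lines(self) -> str:
        """Serialise runs as JSON lines, e.g. to archive as a CI artifact."""
        return "".join(json.dumps(run) + "\n" for run in self.runs)
```

Keying runs by build id ties every set of hyperparameters and metrics back to the exact image that produced them.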
SUMMING UP
[Diagram: from Notebook / playground, commit and push into a Continuous Delivery cycle (Experiment / Develop → Test → Deploy → Monitor) that leads to PROD (maybe)]
FURTHER READING
● https://www.thoughtworks.com/intelligent-empowerment
● www.continuousdelivery.com
David Tan / Jonathan Heng
davidtan+jonheng@thoughtworks.com
THANK YOU.

Improving How We Deliver Machine Learning Models (XCONF 2019)
