“Deployment for free”: removing the need to write model deployment code at Stitch Fix
mlconf April 2021
Stefan Krawczyk
#mlconf #MLOps #machinelearning
@stefkrawczyk
linkedin.com/in/skrawczyk
Try out Stitch Fix → goo.gl/Q3tCQ3
> Stitch Fix
“Deployment for free”
Model Envelope & envelope mechanics
Impact of being on-call
Summary & Future Work
Stitch Fix is a personal styling service
Key points:
1. Very algorithmically driven company
2. Single DS Department: Algorithms (145+)
3. “Full Stack Data Science”
a. No reimplementation handoff
b. End to end ownership
c. Built on top of data platform tools & abstractions.
For more information: https://algorithms-tour.stitchfix.com/ & https://cultivating-algos.stitchfix.com/
Where do I fit in?
Stefan Krawczyk
Mgr. Data Platform - Model Lifecycle
Stitch Fix
> “Deployment for free”
Model Envelope & envelope mechanics
Impact of being on-call
Summary & Future Work
Typical Model Deployment Process
● Many ways to approach.
● Heavily impacts MLOps.
Model Deployment at Stitch Fix
Once a model is in an envelope...
This comes for free!
Who owns what?
DS Concerns Platform Concerns DS Concerns
Deployments are “triggered”
DS Concerns Platform Concerns DS Concerns
Guess who is on-call?
Reality: two steps to get a model to production
Self-service: takes <1 hour
No code is written!
Can be a terminal point.
Step 1 Step 2
Step 1. save a model via Model Envelope API
etl.py
import model_envelope as me
from sklearn import linear_model

# Load training features and labels (placeholder for the real data access).
df_X, df_y = load_data_somehow()
model = linear_model.LogisticRegression(multi_class='auto')
model.fit(df_X, df_y)

# Save the fitted model, plus example inputs and outputs, into a Model Envelope.
my_envelope = me.save_model(instance_name='my_model_instance_name',
                            instance_description='my_model_instance_description',
                            model=model,
                            query_function='predict',
                            api_input=df_X, api_output=df_y,
                            tags={'canonical_name': 'foo-bar'})
Note: no deployment trigger in ETL code.
Step 2a. deploy model as a microservice
Go to Model Envelope Registry UI:
1) Create deployment configuration.
2) Create Rule for auto deployment.
a) Else query for model & hit deploy.
3) Done.
Result:
● Web service with API endpoints
○ Comes with a Swagger UI & schema →
● Model in production < 1 hour.
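The slides do not show what a deployment Rule looks like, so the following is only a guess at the idea: a rule is a predicate over envelope tags, and auto deployment picks the newest envelope that satisfies it. `EnvelopeRecord`, `TagRule`, and `select_for_deployment` are hypothetical names, not part of the Model Envelope Registry.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class EnvelopeRecord:
    """Hypothetical registry row: an id that grows over time, plus tags."""
    envelope_id: int
    tags: Dict[str, str]


@dataclass(frozen=True)
class TagRule:
    """Hypothetical rule: every required tag must match exactly."""
    required_tags: Dict[str, str]

    def matches(self, record: EnvelopeRecord) -> bool:
        return all(record.tags.get(k) == v for k, v in self.required_tags.items())


def select_for_deployment(rule: TagRule,
                          records: List[EnvelopeRecord]) -> Optional[EnvelopeRecord]:
    """Return the newest matching envelope, or None if nothing matches."""
    matching = [r for r in records if rule.matches(r)]
    return max(matching, key=lambda r: r.envelope_id, default=None)
```

Under this reading, the ETL only saves envelopes with tags; whether one gets deployed is decided entirely by the rule, which matches the note that the ETL code contains no deployment trigger.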
Step 2b. deploy model as a batch task
Create workflow configuration:
1) Create batch inference task in workflow.
a) Specify Rule & inputs + outputs.
2) Deploy workflow.
3) Done.
Result:
● Spark or Python task that creates a table.
● We keep an inference log.
● Model in production < 1 hour.
Stitch Fix
“Deployment for free”
> Model Envelope & envelope mechanics
Impact of being on-call
Summary & Future Work
Q: What is the Model Envelope? A: It’s a container.
Enables treating the model as a “black box”.
🤔 Wait this feels familiar?
You: “MLFlow much?”
Me: Yes & No.
This is all internal code -- nothing from open source.
In terms of functionality we’re closer to a mix of:
● MLFlow
● ModelDB
● TFX
But this talk is too short to cover everything...
Typical Model Envelope use
1. Call save_model() right after model creation in an ETL.
2. There are also APIs to save metrics & hyperparameters, and to retrieve envelopes.
3. Once in an envelope (✉), information is immutable except:
a. tags -- for curation purposes.
b. metrics -- can add/adjust metrics.
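The immutability rule above can be illustrated with a small sketch. This is not the internal Model Envelope code; `Envelope`, `add_tag`, and `set_metric` are hypothetical names. A frozen dataclass blocks reassignment of its fields, while the tag and metric dictionaries it holds can still be updated in place.

```python
import pickle
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Envelope:
    """Hypothetical envelope: fixed identity and model bytes, editable tags/metrics."""
    instance_name: str
    model_bytes: bytes
    tags: Dict[str, str] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)

    def add_tag(self, key: str, value: str) -> None:
        # Mutates the dict contents; the field itself is never reassigned.
        self.tags[key] = value

    def set_metric(self, name: str, value: float) -> None:
        # Metrics may be added or adjusted after the envelope is saved.
        self.metrics[name] = value


# A plain dict stands in for a fitted model in this sketch.
envelope = Envelope(instance_name="my_model_instance_name",
                    model_bytes=pickle.dumps({"weights": [0.1, 0.2]}))
envelope.add_tag("canonical_name", "foo-bar")
envelope.set_metric("auc", 0.91)

try:
    envelope.instance_name = "renamed"  # rejected: the field is frozen
except FrozenInstanceError:
    pass
```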
What does save_model() do?
Let’s dive deeper into these.
How do we infer a Model API Schema?
Goal: infer from code rather than explicit specification.
We require either fully annotated functions that use only python/typing standard types:
def good_predict_function(self, x: float, y: List[int]) -> List[float]:
    ...
Or example inputs, which are inspected to derive a schema (required for DataFrame inputs):
def predict_needs_examples_function(self, x: pd.DataFrame, y):
    ...
my_envelope = me.save_model(instance_name='my_model_instance_name',
                            instance_description='my_model_instance_description',
                            model=model,
                            query_function='predict',
                            api_input=df_X, api_output=df_y,
                            tags={'canonical_name': 'foo-bar'})
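As a rough illustration of the first path (inferring a schema from annotations), here is a minimal sketch built on the standard library's `typing.get_type_hints` and `inspect.signature`. The `infer_schema` helper and `ExampleModel` class are hypothetical; the actual implementation is internal and is not shown in the talk.

```python
import inspect
from typing import Any, Callable, Dict, List, get_type_hints


class ExampleModel:
    """Stand-in model whose predict function is fully annotated."""

    def predict(self, x: float, y: List[int]) -> List[float]:
        return [x * value for value in y]


def infer_schema(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build an input/output schema from annotations, or fail loudly."""
    hints = get_type_hints(func)
    params = [name for name in inspect.signature(func).parameters if name != "self"]
    missing = [name for name in params if name not in hints]
    if missing or "return" not in hints:
        # This is the case where example inputs/outputs would be needed instead.
        raise TypeError(f"not fully annotated; missing: {missing or ['return']}")
    return {"inputs": {name: hints[name] for name in params},
            "output": hints["return"]}


schema = infer_schema(ExampleModel.predict)
```

A function such as `predict_needs_examples_function`, which leaves `y` unannotated, would raise here, which is the situation where example inputs have to be supplied.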
Model API Schema - Under the hood
● One of the most complex parts of the code base (90%+ test coverage!)
● We make heavy use of the typing_inspect module & isinstance().
○ We create a schema similar to TFX.
● Key component to enable exercising models in different contexts.
○ Enables code creation and input/output validation.
● Current limitations: no default values, one function per envelope.
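To make the input validation point concrete, the sketch below checks a request payload against a schema of typing annotations. It uses the standard library's `typing.get_origin` and `typing.get_args` in place of the `typing_inspect` module mentioned above, and only handles plain classes and `List[...]`; `conforms` and `validate_inputs` are hypothetical helpers.

```python
from typing import Any, Dict, List, get_args, get_origin


def conforms(value: Any, expected: Any) -> bool:
    """Return True if value matches the expected annotation."""
    if expected is Any:
        return True
    if get_origin(expected) is list:
        args = get_args(expected)
        item_type = args[0] if args else Any
        return isinstance(value, list) and all(conforms(v, item_type) for v in value)
    if expected is float:
        # Accept ints where floats are expected, but not booleans.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, expected)


def validate_inputs(schema: Dict[str, Any], payload: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means the payload is valid."""
    errors = [f"missing input: {name}" for name in schema if name not in payload]
    errors += [f"unexpected input: {name}" for name in payload if name not in schema]
    errors += [f"wrong type for input: {name}"
               for name, expected in schema.items()
               if name in payload and not conforms(payload[name], expected)]
    return errors
```

The same schema object could drive both a web service request check and a batch task column check, which is one way to read "exercising models in different contexts".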
How do we capture python dependencies?
import model_envelope as me
from sklearn import linear_model

df_X, df_y = load_data_somehow()
model = linear_model.LogisticRegression(multi_class='auto')
model.fit(df_X, df_y)
my_envelope = me.save_model(instance_name='my_model_instance_name',
                            instance_description='my_model_instance_description',
                            model=model,
                            query_function='predict',
                            api_input=df_X, api_output=df_y,
                            tags={'canonical_name': 'foo-bar'})
Point: no explicit passing of scikit-learn to save_model().
Assumption:
We all run on the same* base linux environment in training & production.
Store the following in the Model Envelope:
● Result of import sys; sys.version_info
● Results of > pip freeze
● Results of > conda list --export
Local python modules (not installable):
● Add modules as part of save_model() call.
● We store them with the model bytes.
How do we build the python deployment env.?
Filter:
● hard coded list of dependencies to filter. E.g. jupyterhub.
● upkeep cheap; add/update every few months.
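The filter step can be pictured as dropping denylisted packages from the captured pip freeze output before building the deployment environment. In this sketch only `jupyterhub` comes from the slide; the helper names and the name normalization (lowercasing and collapsing `-`, `_`, `.` as PEP 503 does) are assumptions.

```python
import re
from typing import Iterable, List, Set

# Hard-coded denylist; "jupyterhub" is the example given on the slide.
DENYLIST: Set[str] = {"jupyterhub"}


def package_name(requirement: str) -> str:
    """Extract a normalized package name from one pip freeze line."""
    raw = re.split(r"[=<>!~ @\[;]", requirement.strip(), maxsplit=1)[0]
    return re.sub(r"[-_.]+", "-", raw).lower()


def filter_requirements(lines: Iterable[str], denylist: Set[str]) -> List[str]:
    """Keep requirement lines whose package is not on the denylist."""
    kept = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blanks and comments
        if package_name(stripped) not in denylist:
            kept.append(stripped)
    return kept
```

A denylist keeps the default behaviour permissive: anything captured at training time is installed unless it is known to be irrelevant to serving.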
Stitch Fix
“Deployment for free”
Model Envelope & envelope mechanics
> Impact of being on-call
Summary & Future Work
Remember this split:
DS Concerns Platform Concerns DS Concerns
My team is on-call for
Impact of being on-call
Two truths:
● No one wants to be paged.
● No one wants to be paged for a model they didn’t write!
But, this incentivizes Platform to build out MLOps capabilities:
● Capture bad models before they’re deployed!
● Enable observability, monitoring, and alerting to speed up debugging.
Luckily we have autonomy and freedom to do so!
What can we change?
API
Automatic capture == license to change:
● Model API schema
● Dependency capture
● Environment info: git, job, etc.
Incentives for DS to additionally provide:
● Datasets for analysis
● Metrics
● Tags
Deployment
MLOps approaches to:
● Model validation
● Model deployment & rollback
● Model deployment vehicle:
○ From logging, monitoring, alerting
○ To architecture: microservice, or Ray, or?
● Dashboarding/UIs
Overarching benefit
1. Data Scientists get to focus more on modeling.
a. more business wins.
2. Platform focuses on MLOps:
a. can be a rising tide that raises all boats!
Stitch Fix
“Deployment for free”
Model Envelope & envelope mechanics
Impact of being on-call
> Summary & Future Work
Summary - “Deployment for free”
We enable deployment for free by:
● Capturing a comprehensive model artifact we call the Model Envelope.
● Using the Model Envelope to generate code & environments for model deployment.
● Having Platform own the Model Envelope and be on-call for generated services & tasks.
Business wins:
● Data Scientists get to focus more on modeling.
● Platform is incentivized to improve and iterate on MLOps practices.
Future Work
● Better MLOps features:
○ Observability, scalable data capture, & alerting.
○ Model Validation & CD patterns.
● “Models on Rails”:
○ Target specific SLA requirements.
● Configuration driven model creation:
○ Abstract away glue code required to train & save models.
Thank you! We’re hiring! Questions?
Try out Stitch Fix → goo.gl/Q3tCQ3
@stefkrawczyk
linkedin.com/in/skrawczyk
