MLOps pipelines using MLFlow - From training to production
Dr. Andreas Weiden, skillbyte
CAIML#24, April 13th, 2023
Problem Description
Team A, a (majority) Data Scientist team, creates many machine learning models
Team B, a (majority) Data Engineer team, needs to deploy these models into production and use them for a recommender system
So two problems:
1. Technical
Need to deploy these models to multiple targets (→ this talk)
2. Organizational
Need to make two teams work together (→ not this talk)
MLOps
Got its name from DevOps and GitOps
Continuous training and deployment for machine learning systems
ml-ops.org
Pipelines
Created and maintained by Team A
Daily training runs, since fresh data is constantly coming in
Output various artifacts which are needed by the prediction services, run by Team B
Examples of what the Data Science Magic™ can be:
popularity of items
user embeddings from user-item interactions
item embeddings from item descriptions
Need somewhere to store the outputs of those pipelines
And deploy them, too
MLFlow
Manage the end-to-end machine learning lifecycle
Open source: GitHub
Four pillars:
Tracking
Log parameters, code versions, metrics, artifacts
Projects
Models
Registry
Basic unit is a Run
Whenever your pipeline runs, a new Run is created
Runs can be grouped under Experiments
You can add arbitrary data to a Run as well as the output artifacts
import mlflow
import datetime, numpy as np, pickle, random
from tempfile import TemporaryDirectory

mlflow.set_experiment("Pipeline A")
run_name = f"Pipeline A {datetime.datetime.now().isoformat()}"
tags = {"version": "0.0.1"}

with mlflow.start_run(run_name=run_name, tags=tags) as run:
    # Log hyperparameters and evaluation metrics for this run
    mlflow.log_param("ndims", 1024)
    mlflow.log_metric("recall", random.random())
    # Attach arbitrary output artifacts (here: dummy embeddings) to the run
    with TemporaryDirectory() as temp_dir:
        with open(f"{temp_dir}/out.pickle", "wb") as f:
            pickle.dump([np.random.rand(1024) for _ in range(100)], f)
        mlflow.log_artifacts(temp_dir)
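The prediction side can later read those runs back through the same tracking API. A minimal sketch (the experiment name matches the example above; error handling omitted):

import mlflow

# Find the most recent run of the experiment and fetch its artifacts
runs = mlflow.search_runs(experiment_names=["Pipeline A"],
                          order_by=["start_time DESC"], max_results=1)
run_id = runs.iloc[0]["run_id"]
local_dir = mlflow.artifacts.download_artifacts(run_id=run_id)  # contains out.pickle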
Projects
Package Data Science code including dependencies
Git and containerization already do this, if you lock your dependencies (which you should)
Models
Package ML models and deploy them
Containers and/or model artifacts
A standard format for packaging machine learning models that can be used in a variety of downstream tools, e.g.
real-time serving through a REST API
batch inference on Apache Spark
Saves model-specific data and environment data:
# Directory written by mlflow.sklearn.save_model(model, "my_model")
my_model/
├── MLmodel
├── model.pkl
├── conda.yaml
├── python_env.yaml
└── requirements.txt
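To make this concrete, here is a minimal sketch of producing and consuming such a model directory with the standard MLFlow model APIs (the scikit-learn model is just a placeholder, not the one from the talk):

import mlflow
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)

# Writes the my_model/ directory shown above (MLmodel, model.pkl, env files)
mlflow.sklearn.save_model(model, "my_model")

# Downstream tools reload it through the generic pyfunc interface
loaded = mlflow.pyfunc.load_model("my_model")
print(loaded.predict(X[:5]))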
Registry
Model storage and lifecycle (versioning, stage transitions)
Can associate runs with a Model:
import mlflow

with mlflow.start_run() as run:
    ...  # train and log the model under the artifact path "model"
    # Register the logged model as a new version of "Model A" in the registry
    mlflow.register_model(f"runs:/{run.info.run_id}/model", "Model A")
Sounds interesting, but only very rudimentary
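For reference, such a stage transition looks roughly like this through the client API (a sketch; the model name and version number are placeholders):

from mlflow import MlflowClient

client = MlflowClient()
# Promote version 3 of "Model A" to Production, archiving the previous one
client.transition_model_version_stage(
    name="Model A", version=3, stage="Production",
    archive_existing_versions=True,
)
# Consumers can then resolve the current Production version
for mv in client.get_latest_versions("Model A", stages=["Production"]):
    print(mv.version, mv.run_id)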
Deployment options
ml-ops.org
Allow deploying arbitrary machine learning artifacts
Full control of the API
Not so fast…
… managed vector stores are also a thing
They give you e.g.
pre-filtering (see the query sketch below)
a full-blown query syntax
Quite a few options exist nowadays
Elasticsearch
Google Vertex AI
RediSearch
Milvus
…
→ Need a generic way to deploy machine learning artifacts to multiple targets
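As an illustration of such pre-filtering, a filtered vector query against Elasticsearch might look like this (a sketch assuming an Elasticsearch 8.x index with the fields used later in this deck, via the official Python client):

from elasticsearch import Elasticsearch

es = Elasticsearch("https://my-cluster:9200")  # placeholder URL
resp = es.search(
    index="items",  # placeholder index name
    knn={
        "field": "vector",
        "query_vector": [1, 2, 3],
        "k": 10,
        "num_candidates": 100,
        # Pre-filter: only consider documents of a given type
        "filter": {"term": {"type": "bar"}},
    },
)
print([hit["_id"] for hit in resp["hits"]["hits"]])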
Watcher
Simple microservice that periodically polls the MLFlow registry
Pushes the updated artifacts to all ML deployment options
Uploads embeddings to managed databases
Updates ConfigMap definitions with the correct Run ID per model and environment
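A minimal sketch of what such a watcher loop can look like (the model name, polling interval and push_to_targets helper are illustrative assumptions, not the actual service):

import time
import mlflow
from mlflow import MlflowClient

def push_to_targets(run_id, local_dir):
    # Hypothetical helper: upload embeddings to Elasticsearch / Vertex AI
    # and patch the Kubernetes ConfigMaps with the new run ID
    ...

client = MlflowClient()
last_seen = None

while True:
    # Ask the registry for the newest Production version of the model
    versions = client.get_latest_versions("Model A", stages=["Production"])
    if versions and versions[0].run_id != last_seen:
        run_id = versions[0].run_id
        # Download the run's artifacts (embeddings etc.) to a local directory
        local_dir = mlflow.artifacts.download_artifacts(run_id=run_id)
        push_to_targets(run_id, local_dir)
        last_seen = run_id
    time.sleep(300)  # poll every five minutes (interval is an assumption)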
Deployment targets
Elasticsearch, Google Vertex AI, Kubernetes ConfigMap
POST /${ES_INDEX}/_doc/ HTTP/1.1
Host: ${ES_URL}
Content-Type: application/json

{
  "id": "foo",
  "vector": [1, 2, 3],
  "popularity": 123,
  "type": "bar"
}
POST /v1/${INDEX_URL}:upsertDatapoints HTTP/1.1
Host: ${VERTEX_ENDPOINT}
Content-Type: application/json
Authorization: Bearer `gcloud auth print-access-token`

{
  "datapoints": [
    {
      "datapoint_id": "foo",
      "feature_vector": [1, 2, 3],
      "restricts": [
        {
          "namespace": "type",
          "allow_list": ["bar"]
        }
      ]
    }
  ]
}
apiVersion: v1
kind: ConfigMap
metadata:
  name: foo-configmap
data:
  RUN_ID: "1234abcde"
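One way the watcher could write that ConfigMap entry is the official Kubernetes Python client (an illustrative sketch, not the actual service code; namespace and names are placeholders):

from kubernetes import client, config

config.load_incluster_config()  # use load_kube_config() outside the cluster
v1 = client.CoreV1Api()

# Patch only the RUN_ID key; the Reloader (next slide) restarts consumers
v1.patch_namespaced_config_map(
    name="foo-configmap",
    namespace="default",
    body={"data": {"RUN_ID": "1234abcde"}},
)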
Configmap reloader
https://github.com/stakater/Reloader
Ensures that all deployments that rely on a ConfigMap get restarted whenever that ConfigMap changes
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    configmap.reloader.stakater.com/reload: "foo-configmap"
spec:
  template:
    spec:
      containers:
        - name: foo
          image: foo:0.0.1
          env:
            - name: MLFLOW_RUN_ID
              valueFrom:
                configMapKeyRef:
                  name: foo-configmap
                  key: RUN_ID
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: foo-configmap
data:
  RUN_ID: "1234abcde"
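On the consuming side, the service can then resolve its artifacts from that environment variable at startup, roughly like this (a sketch; the artifact name reuses the tracking example above and is an assumption):

import os
import pickle
import mlflow

# The Reloader restarts the pod whenever the ConfigMap (and thus RUN_ID) changes
run_id = os.environ["MLFLOW_RUN_ID"]
local_dir = mlflow.artifacts.download_artifacts(run_id=run_id)

with open(f"{local_dir}/out.pickle", "rb") as f:
    embeddings = pickle.load(f)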
Alternatives
Possible alternatives that we considered:
Model deployment directly through MLFlow
Seldon Core
AWS SageMaker
(Got more? Let me know!)
However, they all have the same drawbacks:
No control over final images
Image size not optimised
Custom logging, metrics, tracing, … difficult
No control over API
Only deploy to REST APIs, but we also want other targets
Summary
Embeddings and models are centrally produced → Need some central model storage
Each of the targets supports training and/or deploying ML models individually, but none of them supports doing so for all targets
→ If your needs are diverse enough, you may need to roll your own (ML deployment)
→ But use existing tools where applicable
Questions