MLflow: Accelerating Production Machine Learning
Matei Zaharia
@matei_zaharia
Machine Learning Development Is Complex
ML Lifecycle
[Diagram: Raw Data → Data Prep → Training → Deploy; each stage involves tuning (μ, λ, θ) and scaling, with model exchange, governance, and Delta shown alongside]
Custom ML Platforms
Facebook FBLearner, Uber Michelangelo, Google TFX
Standardize the data prep / training / deploy cycle:
if you work within the platform, you get these!
Limited to a few algorithms or frameworks
Tied to one company’s infrastructure
Can we provide similar benefits in an open manner?
Introducing MLflow (www.mlflow.org)
Open source machine learning platform
• Works with any ML library & language
• Runs the same way everywhere & cross-cloud
• Scales to big data with Apache Spark
Launched this June
• Already 48 contributors and many new features!
This Talk: MLflow Overview + New Announcements
How Does MLflow Work?
• Tracking: record and query experiments (code, data, config, results)
• Projects: package workflows into reproducible and reusable steps
• Models: model packaging format for diverse deployment tools
Model Development without MLflow
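import pickle

# load_text, extract_ngrams, train_model, compute_accuracy: helpers not shown on the slide
data = load_text(file)
ngrams = extract_ngrams(data, N=n)
model = train_model(ngrams, learning_rate=lr)
score = compute_accuracy(model)
print("For n=%d, lr=%f: accuracy=%f" % (n, lr, score))
with open("model.pkl", "wb") as f:  # pickle needs a file opened for binary writing
    pickle.dump(model, f)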
For n=2, lr=0.1: accuracy=0.71
For n=2, lr=0.2: accuracy=0.79
For n=2, lr=0.5: accuracy=0.83
For n=2, lr=0.9: accuracy=0.79
For n=3, lr=0.1: accuracy=0.83
For n=3, lr=0.2: accuracy=0.82
For n=4, lr=0.5: accuracy=0.75
...
What if I expand the input data?
What if I tune this other parameter?
What if I upgrade my ML library?
What version of my code was this result from?
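Answering these questions by hand means saving, for every run, the data file, the parameters, the result, and some identifier for the code that produced it. The following is a minimal stdlib-only sketch of that bookkeeping as a JSON-lines log. `train_and_score` is a hypothetical stand-in for the `load_text` / `extract_ngrams` / `train_model` / `compute_accuracy` pipeline above, and hashing the training source text is an assumed, crude substitute for a real code version such as a Git commit.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path


def train_and_score(data_file: str, n: int, lr: float) -> float:
    """Hypothetical stand-in for the real pipeline; returns a made-up accuracy."""
    return round(0.5 + 0.1 * n * lr, 4)


def code_version(source: str) -> str:
    """Crude code version (assumption): a short SHA-256 hash of the training code's text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()[:12]


def log_run(log_path: Path, record: dict) -> None:
    """Append one run as a JSON line, so earlier results are never overwritten."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def sweep(log_path: Path, data_file: str, source: str) -> None:
    """Grid over the slide's n and learning-rate values, logging every run."""
    version = code_version(source)
    for n in (2, 3, 4):
        for lr in (0.1, 0.2, 0.5, 0.9):
            score = train_and_score(data_file, n, lr)
            log_run(log_path, {
                "timestamp": time.time(),
                "data_file": data_file,
                "params": {"n": n, "learning_rate": lr},
                "metrics": {"accuracy": score},
                "code_version": version,
            })


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "runs.jsonl"
        sweep(log, "reviews.txt", source="def train(): ...")
        print(len(log.read_text().splitlines()), "runs logged")
```

Every new input, parameter, or library upgrade has to be threaded through this hand-written schema, which is exactly the bookkeeping that MLflow Tracking is meant to take over.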
Model Deployment without MLflow
DATA SCIENTIST (code & models) → PRODUCTION ENGINEER
DATA SCIENTIST: Please deploy this scikit-learn model!
DATA SCIENTIST: Please deploy this Spark model!
DATA SCIENTIST: Please deploy this R model!
DATA SCIENTIST: Please deploy this TensorFlow model!
DATA SCIENTIST: Please deploy this arXiv paper!
…
Model Development with MLflow
import mlflow
import mlflow.sklearn

data = load_text(file)
ngrams = extract_ngrams(data, N=n)
model = train_model(ngrams, learning_rate=lr)
score = compute_accuracy(model)

mlflow.log_param("data_file", file)
mlflow.log_param("n", n)
mlflow.log_param("learning_rate", lr)
mlflow.log_metric("score", score)
mlflow.sklearn.log_model(model, "model")  # "model": artifact path for the saved model within the run

$ mlflow ui
Track parameters, metrics, output files & code version
Search using UI or API
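Once runs are recorded in a structured form, searching and comparing them become simple queries. Below is a toy, in-memory sketch of both, using the accuracies printed on the "without MLflow" slide as sample data (stored under the metric name `score`, as in the MLflow version of the code). `search_runs` and `compare_runs` here are hypothetical helpers, not MLflow's API; in recent MLflow releases the closest real call is `mlflow.search_runs(filter_string="metrics.score > 0.8", order_by=["metrics.score DESC"])`, which returns a pandas DataFrame.

```python
from typing import Any, Dict, List

Run = Dict[str, Any]

# Sample runs: the (n, lr, accuracy) values printed on the "without MLflow" slide.
RUNS: List[Run] = [
    {"params": {"n": n, "learning_rate": lr}, "metrics": {"score": acc}}
    for n, lr, acc in [
        (2, 0.1, 0.71), (2, 0.2, 0.79), (2, 0.5, 0.83), (2, 0.9, 0.79),
        (3, 0.1, 0.83), (3, 0.2, 0.82), (4, 0.5, 0.75),
    ]
]


def search_runs(runs: List[Run], metric: str, threshold: float) -> List[Run]:
    """Runs whose `metric` exceeds `threshold`, best first (ties keep log order)."""
    hits = [r for r in runs if r["metrics"].get(metric, float("-inf")) > threshold]
    return sorted(hits, key=lambda r: r["metrics"][metric], reverse=True)


def compare_runs(runs: List[Run], param: str, metric: str) -> Dict[Any, float]:
    """Best `metric` for each distinct value of `param`, like a side-by-side view."""
    best: Dict[Any, float] = {}
    for r in runs:
        key = r["params"][param]
        best[key] = max(best.get(key, float("-inf")), r["metrics"][metric])
    return best
```

For example, `search_runs(RUNS, "score", 0.8)` returns the three runs scoring above 0.8, starting with n=2, lr=0.5, and `compare_runs(RUNS, "n", "score")` shows the best score for each n-gram size.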
MLflow UI: Inspecting Runs
MLflow UI: Comparing Runs
Packaging Code: MLflow Projects
[Diagram: a project spec bundles code, config, and dependencies; the same project supports local execution or runs on a remote cluster]
$ mlflow run git://...
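In MLflow, a project's spec is an `MLproject` file that names the environment and declares entry points, each with parameters and a command template; `mlflow run` fills that template with values passed as `-P name=value`. The script behind such a command is ordinary code with a command-line interface. Here is a minimal sketch of a hypothetical `train.py` entry point whose parameter names mirror the tracking example; the file name, flags, and defaults are assumptions for illustration.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """CLI for a hypothetical train.py entry point; names mirror the slide's parameters."""
    parser = argparse.ArgumentParser(description="Train an n-gram text model (sketch).")
    parser.add_argument("--data-file", required=True, help="Path to the raw text input.")
    parser.add_argument("--n", type=int, default=2, help="n-gram size.")
    parser.add_argument("--learning-rate", type=float, default=0.1, help="Learning rate.")
    return parser


def main(argv: Optional[List[str]] = None) -> dict:
    """Parse arguments; a real entry point would then train and log to MLflow."""
    args = build_parser().parse_args(argv)
    # Sketch only: return the parsed configuration instead of training a model.
    return {"data_file": args.data_file, "n": args.n, "learning_rate": args.learning_rate}
```

A matching command template might read `python train.py --data-file {data_file} --n {n} --learning-rate {learning_rate}`, so that `mlflow run <project-uri> -P data_file=reviews.txt -P n=3` runs the same code through local execution or on a remote cluster.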
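Packaging Models: MLflow Models
[Diagram: training apps (including MLlib) produce a model in the MLflow Model packaging format, which carries several flavors (Python flavor, ONNX flavor, . . .) and inference code; the same model feeds batch & stream scoring and REST serving tools]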
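The point of flavors is that one saved model can be loaded in several ways: by the library that trained it, or through a generic Python interface that only promises a `predict` function. In MLflow the list of flavors lives in a YAML file named `MLmodel`, and the generic one is called `python_function`. The stdlib-only toy below mimics that idea with a JSON metadata file; `flavors.json`, the `linear_json` format, and the loader registry are hypothetical and do not reproduce MLflow's on-disk layout.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

Predictor = Callable[[List[List[float]]], List[float]]


def _linear_predictor(weights_file: Path) -> Predictor:
    """Loader for a hypothetical linear model stored as JSON weights."""
    spec = json.loads(weights_file.read_text())
    w, b = spec["weights"], spec["bias"]

    def predict(rows: List[List[float]]) -> List[float]:
        return [sum(wi * xi for wi, xi in zip(w, row)) + b for row in rows]

    return predict


# Registry standing in for the loader module a generic flavor would name (assumption).
LOADERS: Dict[str, Callable[[Path], Predictor]] = {"linear_json": _linear_predictor}


def save_model(path: Path, weights: List[float], bias: float) -> None:
    """Write model data plus a metadata file listing the flavors it can be loaded as."""
    path.mkdir(parents=True, exist_ok=True)
    (path / "weights.json").write_text(json.dumps({"weights": weights, "bias": bias}))
    meta = {
        "flavors": {
            # Library-specific flavor: a tool that knows this format reads the raw weights.
            "linear_json": {"data": "weights.json"},
            # Generic flavor: any Python consumer can obtain a predict() function.
            "python_function": {"loader": "linear_json", "data": "weights.json"},
        }
    }
    (path / "flavors.json").write_text(json.dumps(meta, indent=2))


def load_generic(path: Path) -> Predictor:
    """Load through the generic flavor without knowing which library trained the model."""
    flavor = json.loads((path / "flavors.json").read_text())["flavors"]["python_function"]
    return LOADERS[flavor["loader"]](path / flavor["data"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        save_model(Path(tmp) / "model", [2.0, -1.0], 0.5)
        print(load_generic(Path(tmp) / "model")([[1.0, 1.0]]))
```

A batch or streaming scorer and a REST server can both call `load_generic(path)` without importing the training library, which is what lets one packaged model reach several deployment tools.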
Model Deployment with MLflow
DATA SCIENTIST: Please deploy this MLflow Model!
PRODUCTION ENGINEER: OK, it’s up in our REST server & Spark!
DATA SCIENTIST: Please run this MLflow Project nightly for updates!
PRODUCTION ENGINEER: Don’t even tell me what arXiv paper that’s from...
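Putting a packaged model "up in our REST server" mostly means wrapping its generic predict function in an HTTP endpoint. A minimal stdlib sketch follows, assuming a `predict` callable like the one a generic flavor would return; the `{"rows": ...}` request format, the port, and the handler are illustrative assumptions, not MLflow's scoring server (recent MLflow releases provide one through `mlflow models serve -m <model-uri>`).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, List

Predictor = Callable[[List[List[float]]], List[float]]


def score_request(body: bytes, predict: Predictor) -> bytes:
    """Decode a JSON request {"rows": [[...], ...]}, score it, and encode the reply."""
    rows = json.loads(body)["rows"]
    return json.dumps({"predictions": predict(rows)}).encode("utf-8")


def make_server(predict: Predictor, port: int = 5000) -> HTTPServer:
    """Build (but do not start) an HTTP server that scores POSTed JSON with `predict`."""

    class ScoringHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            length = int(self.headers.get("Content-Length", 0))
            try:
                reply, status = score_request(self.rfile.read(length), predict), 200
            except (ValueError, KeyError, TypeError) as exc:
                # Malformed JSON or missing "rows": report it instead of crashing.
                reply, status = json.dumps({"error": str(exc)}).encode("utf-8"), 400
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(reply)))
            self.end_headers()
            self.wfile.write(reply)

    return HTTPServer(("127.0.0.1", port), ScoringHandler)
```

Calling `make_server(predict, port=5000).serve_forever()` would block and answer each `POST` with a JSON list of predictions; separating `score_request` from the handler keeps the scoring logic testable without opening a socket.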
MLflow Development Status
Many new features since our release in June
• Model packaging for MLlib, H2O, TensorFlow, PyTorch, Keras
• Artifact storage on Azure, AWS, Google Cloud, and SFTP
• Java and Scala APIs
• New examples and UI features
Just released MLflow 0.7.0 today!
Major Announcement in MLflow 0.7.0
RStudio partnered with Databricks to add an MLflow R API
See Kevin Kuo’s talk on this at 14:40 today!
Conclusion
Workflow platforms can greatly simplify ML development
• Improve usability for both data scientists and engineers
Get started at mlflow.org, and come see our other talks:
• MLflow R API: 14:40 today
• ML Factories: 11:40 Thursday
• 1h Deep Dive: 14:00 Thursday