@TheNickWalsh +@TheNickWalsh +
Version Control for Machine Learning and AI
@TheNickWalsh
Workshop
Metis SF
Before we begin:
Datmo installation: datmo.com/get-started
Nick Walsh
Developer Evangelist, Datmo
@TheNickWalsh
Workshop Outline
1. Conventional version control
2. The curious case of QoDs
3. How Datmo bridges the gap
4. Iris dataset model example
5. Reproduce + use the model
What is Version Control?
“The management of changes to documents, computer programs, large web sites, and other collections of information.”
*AKA `Source Control`
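To make “management of changes” concrete, here is a minimal sketch that uses Python's standard `difflib` module to describe how one version of a file became the next. The file name and contents are made-up illustrations; real version control systems store and compare history in their own ways.

```python
import difflib


def describe_change(old: str, new: str, name: str = "config.txt") -> list[str]:
    """Return unified-diff lines showing how `old` became `new`."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
            lineterm="",  # splitlines() already removed the newlines
        )
    )


if __name__ == "__main__":
    version_1 = "learning_rate = 0.1\nepochs = 10"
    version_2 = "learning_rate = 0.01\nepochs = 10"
    for line in describe_change(version_1, version_2):
        print(line)
```

Lines starting with `-` were removed, lines starting with `+` were added, and unchanged context lines start with a space.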
Version Control Timeline
[Figure: version control timeline, from https://www.ctl.io/developers/assets/images/blog/scmhistory.png]
You’ve probably heard of Git.
Git is a version control system for tracking changes in computer files and coordinating work on those files among multiple people. It is primarily used for source code management in software development, but it can be used to keep track of changes in any set of files.
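One concrete detail of how Git tracks file changes: each version of a file is stored as a “blob” object whose ID is the SHA-1 hash of a short header (`blob <size>` plus a NUL byte) followed by the file's bytes. The sketch below reproduces that calculation with Python's `hashlib`. It assumes a repository using Git's default SHA-1 object format; repositories created with the SHA-256 object format compute IDs differently.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to a file with these bytes (SHA-1 format)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Should match the output of: echo hello | git hash-object --stdin
    print(git_blob_id(b"hello\n"))
```

Because the ID depends only on content, changing even one byte yields a new ID, which is how Git notices that a tracked file changed.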
So, GitHub, right?
(Yes, and no.)
Git(Hub) Revolutionized Software Development
GitHub = SCM + Hosting + Much More
For developers:
• Self-managed SCM servers became a thing of the past
• Developers could apply industry best practices to their own personal work
• A community of knowledge grew around a known standard
• Collaboration on open source software
For enterprises:
• Advent of continuous integration / deployment
• Removed the need for an external code issue tracking tool
• Consolidation of code storage and versioning tools
• Pull requests, code review, and documentation through READMEs
Workshop Outline
1. Conventional version control
2. The curious case of QoDs
3. How Datmo bridges the gap
4. Iris dataset model example
5. Reproduce + use the model
QoDs == Quantitative Oriented Developers
(Artificial Intelligence, Data Science, Machine Learning)
It’s time to talk about MLOps
[Figure source: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf]
MLOps: The Elephant in the Room
[Figure source: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf]
“ML systems have a special capacity for incurring technical debt, because they have all of the maintenance problems of traditional code plus an additional set of ML-specific issues. This debt may be difficult to detect because it exists at the system level.”
(Sculley et al., 2015)
“Typical methods for paying down code level technical debt are not sufficient to address ML-specific technical debt at the system level.”
(Sculley et al., 2015)
Here’s where traditional SCM falls short
[Figure source: http://eng.uber.com/wp-content/uploads/2017/09/image8.png]
In-house ML platforms at large companies:
• Uber Michelangelo: https://eng.uber.com/michelangelo/
• Facebook FBLearner Flow: https://code.facebook.com/posts/1072626246134461/introducing-fblearner-flow-facebook-s-ai-backbone/
As for everyone else?
Workshop Outline
1. Conventional version control
2. The curious case of QoDs
3. How Datmo bridges the gap
4. Iris dataset model example
5. Reproduce + use the model
What is Datmo?
Datmo is a CLI workflow tool for ML, AI, and Data Science developers. It helps with model version control, environment handling, and reproducing results through snapshots.
What are Datmo Snapshots?
Code
Environment
Configuration
Files*
Metrics
Why are they important?
Git Commits: Code, Files*
Datmo Snapshots: everything in a Git commit (Code, Files*), plus Environment, Configuration, and Metrics
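The difference can be sketched in a few lines of Python. The `SnapshotRecord` class below is hypothetical (it is not Datmo's data model or API): it hashes only code and files for a commit-like ID, and all five components for a snapshot-like ID, so changing a hyperparameter alters the second ID but not the first. It assumes, as the slide does, that configuration and metrics live outside the versioned source files; all field values in the usage are invented placeholders.

```python
import dataclasses
import hashlib
import json
from typing import Any


@dataclasses.dataclass(frozen=True)
class SnapshotRecord:
    """Hypothetical record of what a snapshot tracks (not Datmo's real schema)."""
    code: dict[str, str]         # path -> source text
    files: dict[str, str]        # path -> content hash of data/weight files
    environment: dict[str, str]  # e.g. interpreter and package versions
    config: dict[str, Any]       # hyperparameters used for the run
    metrics: dict[str, float]    # results produced by the run

    @staticmethod
    def _digest(parts: dict[str, Any]) -> str:
        # sort_keys gives a stable serialization, so equal content -> equal ID
        blob = json.dumps(parts, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()[:12]

    def commit_like_id(self) -> str:
        """Covers only what a code commit would: code and files."""
        return self._digest({"code": self.code, "files": self.files})

    def snapshot_like_id(self) -> str:
        """Covers code, files, environment, configuration, and metrics."""
        return self._digest(dataclasses.asdict(self))


if __name__ == "__main__":
    base = SnapshotRecord(
        code={"classifier.py": "# model code"},
        files={},
        environment={"python": "3.x"},
        config={"n_neighbors": 3},
        metrics={"accuracy": 0.0},  # placeholder, not a real result
    )
    tweaked = dataclasses.replace(base, config={"n_neighbors": 5})
    print(base.commit_like_id() == tweaked.commit_like_id())      # True
    print(base.snapshot_like_id() == tweaked.snapshot_like_id())  # False
```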
How will it help?
Datmo leverages containers to quickly spin up reproducible developer environments. It tracks each environment, along with model metadata, inside snapshots.
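A container image is what actually pins the environment; recording it is the easier half. As a rough, hypothetical illustration of the kind of environment metadata a snapshot might store (this is not how Datmo captures environments), the standard-library sketch below lists the interpreter version, platform, and installed Python distributions.

```python
import platform
import sys
from importlib import metadata
from typing import Any


def environment_manifest() -> dict[str, Any]:
    """Describe the running Python environment as a JSON-friendly dict."""
    packages: dict[str, str] = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip distributions with missing or broken metadata
            packages[name.lower()] = dist.version
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    import json

    print(json.dumps(environment_manifest(), indent=2))
```

Comparing two such manifests can explain why a result does not reproduce on another machine, but only a pinned image, such as the containers Datmo builds, lets you recreate the environment itself.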
From a broad perspective:
• Make ML Ops and workflows manageable and simple, not completely abstracted away.
• Reduce the amount of glue code so that people can build more robust pipelines.
GitHub = SCM + Hosting + More
Datmo = Model Versioning + Environments + Deployment + More
Workshop Outline
1. Conventional version control
2. The curious case of QoDs
3. How Datmo bridges the gap
4. Iris dataset model example
5. Reproduce + use the model
Datmo in today’s example
We’re going to use Datmo to show how we can quickly iterate on our model and streamline our workflow.
We’ll go through using snapshots for A/B testing, saving our tasks, and enabling you all to reproduce my results or make your own changes to the model.
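For the A/B testing step, the core comparison is simple once each snapshot carries its configuration and metrics. The sketch below assumes a hypothetical dictionary shape for a snapshot (`id`, `config`, `metrics`); it does not parse `datmo snapshot ls` output, and the IDs and numbers in the usage are placeholders rather than workshop results.

```python
from typing import Any

Snapshot = dict[str, Any]  # hypothetical shape: {"id": str, "config": dict, "metrics": dict}


def ab_compare(a: Snapshot, b: Snapshot) -> dict[str, Any]:
    """Show which settings differ between two snapshots and how shared metrics moved."""
    keys = sorted(set(a["config"]) | set(b["config"]))
    config_changes = {
        k: (a["config"].get(k), b["config"].get(k))
        for k in keys
        if a["config"].get(k) != b["config"].get(k)
    }
    shared = sorted(set(a["metrics"]) & set(b["metrics"]))
    metric_deltas = {m: b["metrics"][m] - a["metrics"][m] for m in shared}
    return {"from": a["id"], "to": b["id"],
            "config_changes": config_changes, "metric_deltas": metric_deltas}


if __name__ == "__main__":
    a = {"id": "snapshot-a", "config": {"n_neighbors": 3}, "metrics": {"accuracy": 0.5}}
    b = {"id": "snapshot-b", "config": {"n_neighbors": 5}, "metrics": {"accuracy": 0.75}}
    print(ab_compare(a, b))
```

Only metrics present in both snapshots are compared, so a run that logged an extra metric does not produce a misleading delta.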
Problem:
Multiclass Classification of Flower Species
Dataset: Fisher’s Iris Flower
http://archive.ics.uci.edu/ml/datasets/Iris
At a glance:
- 4 Features
- 3 Classes
- 150 Rows (50 per class)
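Before experimenting, it can help to check that a local copy of the data matches these numbers. A minimal sketch, assuming the comma-separated layout of the UCI `iris.data` file (four numeric columns followed by a class label on each row, possibly with trailing blank lines); the function names are illustrative.

```python
import csv
import io
from collections import Counter


def parse_iris(text: str) -> tuple[list[list[float]], list[str]]:
    """Split iris-style CSV text into feature rows and class labels."""
    features, labels = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines, e.g. at the end of the file
            continue
        *values, label = row
        features.append([float(v) for v in values])
        labels.append(label.strip())
    return features, labels


def check_iris_shape(features, labels, n_features=4, n_classes=3, per_class=50) -> None:
    """Raise ValueError unless the data has 4 features, 3 classes, 50 rows per class."""
    bad = [i for i, row in enumerate(features) if len(row) != n_features]
    if bad:
        raise ValueError(f"rows with wrong feature count: {bad[:5]}")
    counts = Counter(labels)
    if len(counts) != n_classes:
        raise ValueError(f"expected {n_classes} classes, found {sorted(counts)}")
    uneven = {c: n for c, n in counts.items() if n != per_class}
    if uneven:
        raise ValueError(f"classes without {per_class} rows: {uneven}")
```

Usage, with the local file path as an assumption: read the file, call `X, y = parse_iris(text)`, then `check_iris_shape(X, y)`; it returns silently when the data matches 4 features, 3 classes, and 150 rows.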
Model Experimentation
Live Demo
Workshop Outline
1. Conventional version control
2. The curious case of QoDs
3. How Datmo bridges the gap
4. Iris dataset model example
5. Reproduce + use the model
Reproducing the Model
One-time initial setup:
$ datmo setup
Fork the model
Fork from Web Platform GUI (top right corner):
https://datmo.com/nmwalsh/workshop-iris-classification
Fetch the model from Datmo
Clone the Datmo model:
$ datmo clone nmwalsh/workshop-iris-classification
Jump into this directory:
$ cd workshop-iris-classification
View all model Snapshots
$ datmo snapshot ls
Check out a particular snapshot
$ datmo snapshot checkout --id <snapshot-id>
Track Snapshots
https://datmo.com/nmwalsh/workshop-iris-classification
Run the Task
$ datmo task run "python3 classifier.py"
We want our Python file to run inside the container. Why?
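On the command itself: the whole command string must reach `datmo task run` as a single argument, and the quotes must be plain ASCII quotes, because typographic “curly” quotes (often introduced by slide software) are not treated as quoting by the shell. If you script this workflow from Python, building argument lists avoids shell quoting entirely. The helpers below are a hypothetical convenience wrapper, not part of Datmo; they use only the subcommands and flags shown in this workshop.

```python
import shlex
import subprocess


def task_run(command: str) -> list[str]:
    """argv for: datmo task run "<command>" (command passed as one argument)."""
    return ["datmo", "task", "run", command]


def snapshot_from_task(task_id: str, message: str) -> list[str]:
    """argv for: datmo snapshot task --id "<id>" -m "<message>"."""
    return ["datmo", "snapshot", "task", "--id", task_id, "-m", message]


def snapshot_checkout(snapshot_id: str) -> list[str]:
    """argv for: datmo snapshot checkout --id <snapshot-id>."""
    return ["datmo", "snapshot", "checkout", "--id", snapshot_id]


def run(argv: list[str], dry_run: bool = True) -> str:
    """Print the shell-equivalent command; execute it only when dry_run is False."""
    shown = shlex.join(argv)
    print("$", shown)
    if not dry_run:
        subprocess.run(argv, check=True)  # requires the datmo CLI on PATH
    return shown


if __name__ == "__main__":
    run(task_run("python3 classifier.py"))
```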
What just happened?
• datmo clone: Datmo cloned the model from the platform, bringing all of the necessary resources to your local machine.
• datmo snapshot checkout: Datmo set your current code to the state of the desired snapshot.
• datmo task run: Datmo built the environment inside a container, then executed the task inside that container and logged the results.
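The last step, running a command and logging what happened, can be approximated in plain Python. This is a hypothetical, host-only sketch: it records the command, exit code, captured output, and wall-clock duration, but it does not build or use a container, which is the part Datmo adds on top.

```python
import subprocess
import sys
import time
from typing import Any


def run_and_log(argv: list[str]) -> dict[str, Any]:
    """Run a command, capture its output, and return a log record."""
    start = time.perf_counter()
    proc = subprocess.run(argv, capture_output=True, text=True)
    return {
        "command": argv,
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "duration_s": round(time.perf_counter() - start, 3),
    }


if __name__ == "__main__":
    # Stand-in for `python3 classifier.py`: any command works here.
    record = run_and_log([sys.executable, "-c", "print('training done')"])
    print(record["returncode"], record["stdout"].strip())
```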
Make Your Own Snapshot!
1. Change a hyperparameter
2. Save the file
3. $ datmo task run "python3 classifier.py"
4. $ datmo snapshot task --id "<id>" -m "<your message>"
Tip: To see your new snapshot, use $ datmo snapshot ls
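The workshop's `classifier.py` is not reproduced here, so as a stand-in, here is a hypothetical, dependency-free script with one hyperparameter (`n_neighbors` for a k-nearest-neighbors classifier) that prints its configuration and accuracy as JSON, so the numbers end up in the task's logged output. Step 1 corresponds to editing `CONFIG`; the tiny 2-D dataset in the usage is made up for illustration and is not the Iris data.

```python
import json
import math
from collections import Counter
from typing import Any, Sequence

CONFIG = {"n_neighbors": 3}  # step 1: change this hyperparameter


def knn_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[str],
                x: Sequence[float], k: int) -> str:
    """Majority label among the k training points closest to x (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def run_experiment(train_X, train_y, test_X, test_y, config=CONFIG) -> dict[str, Any]:
    """Score the classifier on held-out data and print a JSON record of the run."""
    k = config["n_neighbors"]
    correct = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    record = {"config": dict(config), "metrics": {"accuracy": correct / len(test_y)}}
    print(json.dumps(record))
    return record


if __name__ == "__main__":
    train_X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    train_y = ["a", "a", "a", "b", "b", "b"]
    run_experiment(train_X, train_y, [[0.5, 0.5], [10.5, 10.5]], ["a", "b"])
```

Keeping the hyperparameters in one dictionary and printing them next to the metric makes each run self-describing, which is what you want to capture when you snapshot the task.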
Key Takeaways
1. Traditional source control isn’t enough for QoDs
2. Think about ML Ops before you’re “in too deep”
3. Just as GitHub revolutionized software engineering, Datmo does the same for QoDs
Going Forward
Nuts and Bolts of Source Control:
http://ericsink.com/scm/source_control.html
2015 NIPS Paper from Google
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
Code Available at:
https://datmo.com/nmwalsh/workshop-iris-classification
Thank You!

Version Control in AI/Machine Learning by Datmo