Version Control for
Machine Learning Projects
@mroutis
About me
- Name: Ramón Valles
- Handle: mroutis
- Work: Software developer
- Company: iterative.ai
Iterative AI
Time & Effort
DVC (Data Version Control)
Why is versioning important?
Datasets evolve over time
Different models
https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html
“Yea, we have version control”
https://www.reddit.com/r/ProgrammerHumor/comments/63rcvc/yea_we_have_version_control/
How do we identify each version?
https://www.reddit.com/r/ProgrammerHumor/comments/99a9k8/version_control
Manual work is error prone.
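One way to avoid naming versions by hand is to derive the identifier from the file's contents. The sketch below is only an illustration of that idea, not a description of how any particular tool stores versions; the function name `content_version`, the file `data.csv`, and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def content_version(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return a hex digest that identifies the file's exact contents."""
    digest = hashlib.sha256()  # assumption: any stable content hash would do
    with path.open("rb") as handle:
        # Read in chunks so large datasets do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data.csv"  # hypothetical dataset
        data.write_text("id,label\n1,cat\n")
        first = content_version(data)
        data.write_text("id,label\n1,cat\n2,dog\n")  # the dataset evolves
        second = content_version(data)
        print(first != second)  # True: new contents, new identifier
```

Two files with identical contents get the same identifier, so there is no need for names like "final_v2_really_final".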
Is versioning data / models enough?
Agile machine learning experimentation with DVC
Reproducibility Crisis
“I was recently chatting to a friend whose startup’s
machine learning models were so disorganized...
training the same model get different results!”
- Peter Warden (2018)
https://petewarden.com/2018/03/19/the-machine-learning-reproducibility-crisis/
What has been done so far to address this?
Tracking experiments
Manual work is error prone.
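A small amount of automation can replace hand-kept notes or spreadsheets. The following is a minimal sketch, assuming each run is described by a dictionary of parameters and a dictionary of metrics; the helper names `log_experiment` and `best_experiment`, the file `experiments.jsonl`, and the numbers in the demo are all made up for illustration.

```python
import json
import tempfile
import time
from pathlib import Path


def log_experiment(log_path: Path, params: dict, metrics: dict) -> dict:
    """Append one experiment record as a JSON line and return it."""
    record = {"timestamp": time.time(), "params": params, "metrics": metrics}
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def best_experiment(log_path: Path, metric: str) -> dict:
    """Return the logged record with the highest value for `metric`."""
    with log_path.open(encoding="utf-8") as handle:
        records = [json.loads(line) for line in handle if line.strip()]
    return max(records, key=lambda record: record["metrics"][metric])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "experiments.jsonl"  # hypothetical log file
        # Illustrative values only, not real results.
        log_experiment(log, {"max_depth": 3}, {"accuracy": 0.81})
        log_experiment(log, {"max_depth": 5}, {"accuracy": 0.86})
        print(best_experiment(log, "accuracy")["params"])  # {'max_depth': 5}
```

Because every run is appended by code rather than typed by hand, the log cannot silently drift from what was actually executed.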
Pipelines
- Chain processes together
- Reproduce in the correct order
- Make it easier to return to old work
- The output of one stage is the input of the next one
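The points above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than any tool's actual implementation: the stage names, file names, and the `deps`/`outs` layout are hypothetical, and the ordering relies on the standard library's `graphlib` (Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stages: each lists the files it reads (deps) and writes (outs).
STAGES = {
    "prepare": {"deps": ["raw.csv"], "outs": ["clean.csv"]},
    "train": {"deps": ["clean.csv"], "outs": ["model.pkl"]},
    "evaluate": {"deps": ["model.pkl", "clean.csv"], "outs": ["metrics.json"]},
}


def execution_order(stages: dict) -> list:
    """Order stages so each one runs after the stages that produce its inputs."""
    # Map every output file to the stage that writes it.
    producer = {out: name for name, spec in stages.items() for out in spec["outs"]}
    # For each stage, collect the stages it depends on through shared files.
    graph = {
        name: {producer[dep] for dep in spec["deps"] if dep in producer}
        for name, spec in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(execution_order(STAGES))  # ['prepare', 'train', 'evaluate']
```

Declaring inputs and outputs is enough to recover the correct order, which is what makes returning to old work easier: nobody has to remember which script ran first.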
Pipelines
How does DVC address these problems?
Versioning with DVC
Pipelines with DVC
Experiment Tracking with DVC
Reproduce experiments
Collaborate
Want to learn more?
- Check out docs: https://dvc.org/doc
- Chat with us: https://dvc.org/chat
- Contribute on GitHub: https://github.com/iterative/dvc
- Follow us on Twitter: @DVCorg
Thanks for listening!
Thanks for Listening - Chris Thile
Thanks for the invitation!
