ML Production Pipelines: A Classification Model
© Data Insights, 2020 2
Python, MLflow, and Git for ML in Production
Agenda
1. About Adri & Data Insights
2. How everything started
3. Our Approach
4. How Databricks helped
5. Challenges and improvements
About me
Adriana Menegozzo
Senior Data Scientist
Data Insights GmbH
Tumblingerstr. 12
D-80337 München
E-Mail: adriana@datainsights.de
DATA INSIGHTS: WHAT WE DO
01 – DATA INSIGHTS
Tailored Solutions
We cooperate with premium partners, but also offer technology-independent consulting to individually address a wide range of requirements.
Agile
Data Insights is an agile, dynamic and fast-growing start-up offering IT data consultancy, based in Munich.
Strategy & Technology
Our experts advise and support you comprehensively in the implementation of Big Data, Data Science & Cloud projects.
From Idea to Production
Our range of services covers the entire life cycle. We support our customers from the initial idea to implementation and operation.
01 – WHEN EVERYTHING STARTED…
Classification model
• Predictive Maintenance
• Churn Prediction
• Product Recommendation
TO GO BAD…
Proof of Concept
• Single Jupyter notebook
• Fairly small dataset
• Local environment
From PoC to production: what do we need?
• Constant development
• Software development standards
• Shared environment
• Monitoring system
• Model versioning
• Security standards
02 – OUR APPROACH
From notebook to library
Configurations via YAML
ML code structure
Library structure
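The slide names YAML configuration and a library layout, but the diagrams themselves did not survive extraction. As a rough sketch of the idea (not the speaker's actual code), a pipeline can read its settings into a small typed object instead of hard-coding them in a notebook. Every key name, path, and default below is a hypothetical placeholder. Parsing the YAML file itself would need a third-party parser such as PyYAML, so the sketch starts from the already-parsed mapping.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class TrainingConfig:
    """Typed view of a pipeline configuration (all keys are hypothetical)."""
    input_path: str
    target_column: str
    feature_columns: List[str]
    model_params: Dict[str, Any] = field(default_factory=dict)
    test_size: float = 0.2


def load_config(raw: Dict[str, Any]) -> TrainingConfig:
    """Validate a parsed config mapping and convert it to a TrainingConfig.

    `raw` is the mapping a YAML parser returns for the config file,
    for example the result of yaml.safe_load(...) when PyYAML is installed.
    """
    required = {"input_path", "target_column", "feature_columns"}
    missing = required - raw.keys()
    if missing:
        raise KeyError(f"missing config keys: {sorted(missing)}")
    config = TrainingConfig(**raw)
    if not 0.0 < config.test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    return config


# Parsed form of a YAML file such as:
#   input_path: data/churn.csv
#   target_column: churned
#   feature_columns: [months_active, support_calls]
#   model_params: {max_depth: 4}
EXAMPLE = {
    "input_path": "data/churn.csv",
    "target_column": "churned",
    "feature_columns": ["months_active", "support_calls"],
    "model_params": {"max_depth": 4},
}
```

Keeping validation in one function means a misspelled or missing key fails at start-up rather than halfway through a training run.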
Great achievement!
• Structure code into functions
• Add requirements
• Versioning
• Testable
• Runnable scripts
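To make the move from notebook cells to functions concrete, here is a minimal sketch under stated assumptions: the feature names, the threshold rule, and the command-line flags are all invented for illustration, and the "classifier" is a stand-in rather than a trained model. The point is only that small pure functions can be unit-tested and also called from a script entry point.

```python
import argparse
from typing import Dict, List, Optional, Sequence


def build_features(record: Dict[str, float]) -> List[float]:
    """Turn one raw record into a feature vector (hypothetical features)."""
    calls = record["support_calls"]
    months = max(record["months_active"], 1.0)  # avoid division by zero
    return [calls, calls / months]


def predict(features: Sequence[float], threshold: float = 0.5) -> int:
    """Stand-in classifier: flag records whose call rate exceeds the threshold."""
    return int(features[1] > threshold)


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Entry point: parse arguments, score one record, print and return the label."""
    parser = argparse.ArgumentParser(description="Score one record")
    parser.add_argument("--support-calls", type=float, required=True)
    parser.add_argument("--months-active", type=float, required=True)
    parser.add_argument("--threshold", type=float, default=0.5)
    args = parser.parse_args(argv)
    features = build_features(
        {"support_calls": args.support_calls, "months_active": args.months_active}
    )
    label = predict(features, args.threshold)
    print(label)
    return label
```

Because `main` accepts an argument list, a test can call `main(["--support-calls", "6", "--months-active", "3"])` directly; adding the usual `if __name__ == "__main__":` guard that calls `main()` would turn the module into a runnable script.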
Something is still missing…
• Shared environment
• Security layer for production data
• Production-ready architecture
• CI/CD
Wait a minute!
03 – HOW DATABRICKS HELPED…
Our Architecture
Connect Git to Databricks
Our Pipelines
Great achievement!
• Shared environment
• Security layer for production data
• Constant quality checks on data
• CI/CD
Something is still missing…
• Tracking of ML experiments
• Model versioning
• Model storage
• Model serving
Wait another minute!
03 – MLFLOW AND MODEL SERVING
Model registry
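The registry diagram from this slide was not recovered. To illustrate what a model registry keeps track of, the following toy in-memory stand-in registers numbered versions of a model and moves them between stages. It is deliberately not MLflow's API: the class, method names, and S3-style URIs are hypothetical, and only the stage names ("None", "Staging", "Production", "Archived") mirror those used by the MLflow Model Registry. Archiving the previous Production version automatically is a choice made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

STAGES = ("None", "Staging", "Production", "Archived")


@dataclass
class ModelVersion:
    version: int
    artifact_uri: str  # where the serialized model lives (hypothetical URI)
    stage: str = "None"


@dataclass
class ToyRegistry:
    """In-memory stand-in for a model registry; not MLflow's API."""
    models: Dict[str, List[ModelVersion]] = field(default_factory=dict)

    def register(self, name: str, artifact_uri: str) -> ModelVersion:
        """Store a new version; version numbers start at 1 and increase."""
        versions = self.models.setdefault(name, [])
        model_version = ModelVersion(len(versions) + 1, artifact_uri)
        versions.append(model_version)
        return model_version

    def promote(self, name: str, version: int, stage: str) -> None:
        """Move one version to `stage`, archiving any previous Production version."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        versions = self.models[name]
        target = next((mv for mv in versions if mv.version == version), None)
        if target is None:
            raise KeyError(f"{name} has no version {version}")
        if stage == "Production":
            for mv in versions:
                if mv.stage == "Production":
                    mv.stage = "Archived"
        target.stage = stage

    def latest(self, name: str, stage: str) -> Optional[ModelVersion]:
        """Return the highest-numbered version currently in `stage`, if any."""
        candidates = [mv for mv in self.models.get(name, []) if mv.stage == stage]
        return max(candidates, key=lambda mv: mv.version, default=None)
```

A scoring job would then ask for `latest("churn", "Production")` instead of hard-coding a file path, which is what lets a new model version be rolled out without changing the job.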
What we achieved!
• Read the production model from S3
• Create features and enforce the input schema
• Have an up-to-date cluster
• Use simple pipelines
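Enforcing the input schema can be sketched as follows. This is an assumption about what the step involves, not the speaker's implementation: the column names and types are hypothetical, and the function simply guarantees that the model receives exactly the expected columns, in a fixed order, cast to fixed types.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical schema: column name and the type each value is cast to.
INPUT_SCHEMA: List[Tuple[str, Callable[[Any], Any]]] = [
    ("customer_id", str),
    ("months_active", int),
    ("monthly_spend", float),
]


def enforce_schema(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `record` with exactly the schema's columns, ordered and cast.

    Extra columns are dropped; a missing column or a value that cannot be
    cast raises ValueError before the record reaches the model.
    """
    missing = [name for name, _ in INPUT_SCHEMA if name not in record]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return {name: cast(record[name]) for name, cast in INPUT_SCHEMA}
```

Failing early on a malformed record is usually preferable to letting a model score silently on columns in the wrong order or of the wrong type.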
Next steps:
• Alerting system
• Orchestration
• Use the new single-node cluster
That’s cool!
Data Insights GmbH | Tumblingerstr. 12 | 80337 München | Germany | +49 (0) 89 24 21 74 44 | info@datainsights.de
Thank You!
Adriana Menegozzo
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.
