Predicting customer churn
Using Azure Databricks, scikit-learn and MLflow
Introduction. Machine learning
Data → Preprocess → Training → Model → Predictions
Introduction. Machine learning. Process.
Collect raw data → Curate data → Train & Score → Take Insights Into Actions
Steps: Load data, Preprocess, Train/test split, Train model, Evaluate model
Introduction. Databricks
» Web-based platform for Spark
» Automated cluster management
» IPython-style notebooks
» Collaborative environment
» MLflow – model management repository
Introduction. Customer churn
Source: https://www.kaggle.com/blastchar/telco-customer-churn
Features: customer service data, account and demographic information
Target: Churner – a customer who has left the company’s service
Goal: Predict the probability that a customer will churn
Business outcome: Actions that improve customer retention
Data preparation. Loading the data.
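A minimal sketch of the loading step, assuming the data is read with pandas. The rows below are invented toy values; the column names (customerID, tenure, Contract, MonthlyCharges, Churn) are a subset of those in the Telco dataset, and the real file has many more columns.

```python
import io

import pandas as pd

# Toy rows shaped like the Telco churn CSV; the values are invented.
csv_text = """customerID,tenure,Contract,MonthlyCharges,Churn
0001-A,1,Month-to-month,29.85,No
0002-B,34,One year,56.95,No
0003-C,2,Month-to-month,53.85,Yes
0004-D,45,One year,42.30,No
"""

# With the real file, pass its path to read_csv instead of the buffer.
df = pd.read_csv(io.StringIO(csv_text))

# Separate the features from the binary target (1 = churner).
X = df.drop(columns=["customerID", "Churn"])
y = (df["Churn"] == "Yes").astype(int)

print(df.shape)
print(y.value_counts().to_dict())
```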
Data preparation. Train / test split
[Figure: Data is SPLIT into Train data and Test data]
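The split can be sketched with scikit-learn's train_test_split. The synthetic data, the 80/20 ratio and the stratified split are assumptions for illustration, not settings taken from the slides.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn data: about 25% positives (churners).
X, y = make_classification(n_samples=1000, weights=[0.75, 0.25], random_state=0)

# Hold out 20% for testing; stratify keeps the churn rate similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

print(X_train.shape, X_test.shape)
```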
Data preparation. Scale & one hot
One Hot Encoding

ID   Color
123  Blue
235  Red
312  Red
455  Green

Encode:

ID   Blue  Red  Green
123  1     0    0
235  0     1    0
312  0     1    0
455  0     0    1

Feature scaling

Age  Income
45   50000
20   35000
35   60000
65   45000

Scale (each value divided by its column maximum):

Age   Income
0.69  0.83
0.31  0.58
0.54  1
1     0.75
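A short sketch of both transformations on the example values, assuming scikit-learn. The choice of MaxAbsScaler is an assumption: it divides each column by its maximum absolute value, which reproduces values such as 0.69 and 0.83 from the example. Note that OneHotEncoder orders categories alphabetically (Blue, Green, Red).

```python
import pandas as pd
from sklearn.preprocessing import MaxAbsScaler, OneHotEncoder

colors = pd.DataFrame({"Color": ["Blue", "Red", "Red", "Green"]})
numbers = pd.DataFrame(
    {"Age": [45, 20, 35, 65], "Income": [50000, 35000, 60000, 45000]}
)

# One hot encoding: one 0/1 column per category.
encoder = OneHotEncoder()
one_hot = encoder.fit_transform(colors).toarray()
print(encoder.categories_[0])
print(one_hot)

# Feature scaling: divide each column by its largest absolute value.
scaler = MaxAbsScaler()
scaled = scaler.fit_transform(numbers)
print(scaled.round(2))
```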
Data preparation. Sampling
» Data sampling is needed when the classes are highly imbalanced.
» Upsampling – artificially create instances of the smaller class.
» Downsampling – remove some instances from the bigger class.
» We found upsampling (ADASYN or SMOTE) to work best.
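To make the two ideas concrete, here is a simplified sketch using plain random resampling from scikit-learn. This is a swapped-in technique, not ADASYN or SMOTE: those synthesize new minority points, whereas this only repeats or drops existing rows. The 90/10 toy data is an assumption.

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)  # 90 non-churners, 10 churners

X_major, X_minor = X[y == 0], X[y == 1]

# Upsampling: draw minority rows with replacement until the classes match.
X_minor_up = resample(X_minor, replace=True, n_samples=len(X_major), random_state=0)
X_up = np.vstack([X_major, X_minor_up])
y_up = np.array([0] * len(X_major) + [1] * len(X_minor_up))

# Downsampling: keep only a random subset of the majority rows.
X_major_down = resample(X_major, replace=False, n_samples=len(X_minor), random_state=0)
X_down = np.vstack([X_major_down, X_minor])
y_down = np.array([0] * len(X_major_down) + [1] * len(X_minor))

print(np.bincount(y_up), np.bincount(y_down))
```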
Pipeline
The machine learning pipeline consists of three steps:
» Preprocessor
» Sampler
» Classifier
Hyperparameter optimization
Finding the optimal set of hyperparameters to maximize performance
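A hedged sketch of one common approach, an exhaustive grid search with cross-validation. The slides do not name the search method, so GridSearchCV, the random forest, the parameter grid and the F1 scoring are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=400, weights=[0.75, 0.25], random_state=0)

# Candidate values to try; every combination is evaluated with 3-fold CV.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1",
    cv=3,
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```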
Training and evaluating the model
» Train the model on the training data
» Use the model to make predictions on the test data
» Calculate evaluation metrics
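These three steps can be sketched as follows, assuming scikit-learn; the synthetic data and the logistic regression model are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1. Train on the training data.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 2. Predict on the held-out test data.
y_pred = model.predict(X_test)

# 3. Calculate evaluation metrics.
metrics = {
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
print(metrics)
```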
Classification evaluation metrics
$$f_1 = 2 \times \frac{precision \times recall}{precision + recall}$$

$$precision = \frac{True\ positive}{True\ positive + False\ positive}$$

$$recall = \frac{True\ positive}{True\ positive + False\ negative}$$

                    Actual Positive  Actual Negative
Predicted Positive  True Positive    False Positive
Predicted Negative  False Negative   True Negative

Action            Preferred       False Positive cost  False Negative cost
Phone call        High precision  High                 Low
Sending an email  High recall     Low                  High
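A small worked example of the three formulas. The confusion-matrix counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 40 true positives, 10 false positives, 20 false negatives.
precision, recall, f1 = precision_recall_f1(tp=40, fp=10, fn=20)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.8 0.667 0.727
```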
MLflow
» Track training parameters
» Track performance metrics
» Store models
» Store graphs or other data
» Easily package and deploy
MLflow GUI
Scoring. Getting the best model
» Initialize MLflow client
» Fetch experiment run ids
» Fetch/store the run info
» Get the best run id
» Load (the best) model
Scoring. Predict and serve
» Predict probabilities
» Save the results
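A brief sketch of the scoring step, assuming a scikit-learn model and a CSV file as the output. The column names customer_id and churn_probability and the output path are hypothetical; the slides do not say where the results were saved.

```python
import os
import tempfile

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# predict_proba returns one column per class; column 1 is the positive (churn) class.
scores = pd.DataFrame(
    {
        "customer_id": range(len(X)),
        "churn_probability": model.predict_proba(X)[:, 1],
    }
)

# Save the results; a temporary directory stands in for the real destination.
output_path = os.path.join(tempfile.mkdtemp(), "churn_scores.csv")
scores.to_csv(output_path, index=False)
print(scores.head())
```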
Can we explain these predictions?
SHAP (SHapley Additive exPlanations)
“A unified approach to explain the output of any machine learning model”
SHAP Plots. Summary plot
SHAP Plots. Force plot (individual): Customer A, Customer B
THE END
Questions?
