Presentation

Predicting customer churn
Using Azure Databricks, Sklearn and Mlflow

Introduction. Machine learning
Data Preprocess Training Model Predictions

Introduction. Machine learning. Process.
Collect raw
data
Curate data
Train &
Score
Take Insights
Into Actions
SQL

Collect raw
data
Curate data
Train &
Score
Take Insights
Into Actions
Load
data
SQL

Collect raw
data
Curate data
Train &
Score
Take Insights
Into Actions
Load
data
Train/te
st
split
SQL

Collect raw
data
Curate data
Train &
Score
Take Insights
Into Actions
Load
data
Preprocess
Train/te
st
split
SQL

Collect raw
data
Curate data
Train &
Score
Take Insights
Into Actions
Load
data
Preprocess
Train/te
st
split
Train
model
SQL

Collect raw
data
Curate data
Train &
Score
Take Insights
Into Actions
Load
data
Preprocess
Train/te
st
split
Train
model
Evaluate
model
SQL

Introduction. Databricks
Web-based platform for spark «
Automated cluster management «
IPython-style notebooks «
Collaborative environment «
Mlflow – model management repository «
SQL

Introduction. Customer churn
Source: https://www.kaggle.com/blastchar/telco-customer-churn
Features: customer services data, account and demographic information
Target: Churner – customer that has left the company’s service
Goal: Predict probability that customer will churn
Business outcome: Actions that improve customer retention

Data preparation. Loading the data.

Data preparation. Train / test split
Data
Train
data
Test
data
SPLIT

Data preparation. Scale & one hot
One Hot Encoding Feature scaling
Encode
ID Color
123 Blue
235 Red
312 Red
455 Green
ID Blue Re
d
Green
123 1 0 0
235 0 1 0
312 0 1 0
455 0 0 1
Age Income
45 50000
20 35000
35 60000
65 45000
Scale
Age Income
0.69 0.83
0 0.58
0.54 1
1 0.75

Data preparation. Sampling
» Data sampling is needed when data classes are highly unbalanced.
» Upsampling – artificially create instances of smaller class.
» Downsampling – remove some instances from bigger class.
» We found upsampling (ADASYN or SMOTE) to work best.

Pipeline
The Machine learning pipeline consists of 3 steps:
» Preprocessor
» Sampler
» Classifier

Hyperparameter optimization
Finding the optimal set of hyperparameters to maximize performance

Training and evaluating the model
Train model on training data
Test the model to make
predictions on test data.
Calculate evaluation metrics

Classification evaluation metrics
𝑓1 = 2 ×
𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 × 𝑟𝑒𝑐𝑎𝑙𝑙
𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 + 𝑟𝑒𝑐𝑎𝑙𝑙
𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 =
𝑇𝑟𝑢𝑒 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒
𝑇𝑟𝑢𝑒 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒 + 𝐹𝑎𝑙𝑠𝑒 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒
𝑟𝑒𝑐𝑎𝑙𝑙 =
𝑇𝑟𝑢𝑒 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒
𝑇𝑟𝑢𝑒 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒 + 𝐹𝑎𝑙𝑠𝑒 𝑁𝑒𝑔𝑎𝑡𝑖𝑣𝑒
Actual
Positive Negative
Predicted
Positive True Positive False Positive
Negative False Negative True Negative
Action Preferred False Positive cost False Negative cost
Phone call High precision High Low
Sending an email High recall Low High

Mlflow
» Track training parameters
» Track performance metrics
» Store models
» Store graphs or other data
» Easily package and deploy

Scoring. Getting the best model
» Initialize Mlflow client
» Fetch experiment run ids
» Fetch/store the run info
» Get the best run id
» Load (the best) model

Scoring. Predict and serve
» Predict probabilities
» Save the results
Can we explain these predictions?

SHAP (SHapley Additive exPlanations)
“A unified approach to explain the output of any machine learning model”

SHAP Plots. Force plot (individual)
Customer A
Customer B

Presentation

More Related Content

What's hot (19)

Similar to Presentation (20)

Recently uploaded (20)

Presentation