Using data collected from existing
customers to build a model that helps
the marketing team achieve a higher
hit ratio
~Pranov Shobhan Mishra
Application of Ensemble Techniques: Predict likely
customers for a bank product
- A comparative study of linear models & tree-based
models
Refer to the GitHub link for the code
https://github.com/Pranov1984/Application-of-Ensemble-Techniques-Predict-likely-customers-for-a
Executive Summary
Overview:
A bank is trying to increase the number of customers who subscribe to its term deposit product. The marketing team wants to
use its resources judiciously by targeting the customers who have a high probability of subscribing. The bank holds historical data
from past campaigns along with the corresponding customer details.
Problem Statement:
Currently the effort to increase the number of term deposit subscribers is manual. The hit ratio is abysmal and the resources
used are very high for very little gain. The marketing team needs help identifying the customers who have a higher chance
of subscribing when contacted.
Goal Statement:
Using the data collected from existing customers, build a model that will help the marketing team identify potential customers who are
relatively more likely to subscribe to the term deposit, and thus increase the hit ratio.
Approach:
EDA
•Exploration & Visualization
•Class imbalance noticed in the dependent variable
Data Preparation
•Assign appropriate data types
•One Hot encoding
•Minority Oversampling as contrast
Model Approach
•Train Decision Tree, GLM & Ensembles on original data
•Tune the hyperparameters
•Use Minority Oversampling and train models
Evaluation Metric
•Finalize appropriate evaluation metric
•AUC, TPR, TNR, F1 score
Model Building
•Decision Tree
•Gradient Boosting
•XGBoost
•Stacked Ensemble Models
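The evaluation metrics named in the approach (AUC, TPR, TNR, F1 score) can be sketched in plain Python as follows. This is a minimal illustration, not the code from the linked repository: the function names and the ten-row toy data are assumptions made for this example, and the positive class (label 1) is assumed to mean "subscribed".

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """TPR (recall/sensitivity), TNR (specificity), precision and F1 score."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * tpr / (precision + tpr) if (precision + tpr) else 0.0
    return {"tpr": tpr, "tnr": tnr, "precision": precision, "f1": f1}


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration): 2 subscribers, 8 non-subscribers.
toy_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.3, 0.6, 0.2, 0.1, 0.1, 0.05, 0.2, 0.15, 0.1]
toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_metrics = classification_metrics(toy_true, toy_pred)
toy_auc = auc(toy_true, toy_scores)
print(toy_metrics, toy_auc)
```

The pairwise AUC above is quadratic in the number of rows, which is fine for a sketch; on a real dataset a library implementation would normally be used instead.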
• More than 10 different models were tried in order to identify the one that best predicts customer behaviour, so that the bank can
take proactive steps to increase the number of term deposit subscriptions.
• The data was imbalanced (majority class : minority class = 89:11), so an appropriate model evaluation metric had to be chosen. A
combination of F1 score (the harmonic mean of precision and recall), sensitivity and Area Under the Curve (AUC) was used to select the best model.
• The models tried were:
  • Simple models such as logistic regression, with different thresholds for classification
  • A decision tree followed by various ensemble models, all trained on the original dataset, with the results compared
  • Since the data was imbalanced, minority oversampling was used to change the ratio to 67:33.
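Moving from an 89:11 split to roughly 67:33 by minority oversampling can be sketched as below. The deck does not say which oversampling technique was used, so random duplication of minority rows is an assumption here; the function name, its parameters and the toy data are likewise hypothetical.

```python
import random


def oversample_minority(rows, labels, minority_label=1,
                        target_minority_share=0.33, seed=0):
    """Duplicate randomly chosen minority rows until they make up roughly
    `target_minority_share` of the data. Majority rows are left untouched."""
    if not 0 < target_minority_share < 1:
        raise ValueError("target_minority_share must be between 0 and 1")
    minority_idx = [i for i, y in enumerate(labels) if y == minority_label]
    if not minority_idx:
        raise ValueError("no minority rows to oversample")
    majority_count = len(labels) - len(minority_idx)

    # Solve m / (m + majority_count) = share for the minority count m.
    needed = round(target_minority_share * majority_count
                   / (1 - target_minority_share))
    extra = max(0, needed - len(minority_idx))

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    picks = [rng.choice(minority_idx) for _ in range(extra)]
    new_rows = list(rows) + [rows[i] for i in picks]
    new_labels = list(labels) + [labels[i] for i in picks]
    return new_rows, new_labels


# Toy data with the 89:11 split described above (feature values are made up).
toy_rows = [[i] for i in range(100)]
toy_labels = [0] * 89 + [1] * 11

balanced_rows, balanced_labels = oversample_minority(toy_rows, toy_labels)
minority_share = sum(balanced_labels) / len(balanced_labels)
print(len(balanced_labels), sum(balanced_labels), round(minority_share, 3))
```

Oversampling is normally applied to the training split only, so that the held-out data keeps the original class ratio and duplicated rows cannot leak into evaluation.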
Results:
• With the default classification threshold, logistic regression did not give satisfactory results: recall (sensitivity/true positive rate) and F1 score were
too poor to proceed with the model. When the threshold was tuned (to 25%), performance improved considerably (recall = 61%, F1 score = 57%).
This was better than any of the tree-based models tried on the original data.
• On the original data, the tree-based algorithms gave poor recall (sensitivity/true positive rate), poorer than that of
the linear model (logistic regression).
• The accuracy scores of the tree-based models were nevertheless higher than those of logistic regression. High accuracy combined with poor recall
indicates that the imbalance in the target variable was hurting the tree-based models: they needed more examples of the minority class to learn well
and generalize to unseen data.
• The recall and F1 scores of the decision tree and bagging algorithms were the worst, indicating that individual trees and correlated trees can give
unstable or unreliable results. The ensemble models with de-correlated trees generalized better.
• When the class imbalance was treated by changing the majority : minority ratio to 67:33, all the tree-based algorithms outperformed
logistic regression by a big margin. This suggests that the relationship between X and Y was not linear and that, given
an adequate number of observations, the tree-based algorithms learned robustly.
• With the rebalanced data, random forest and stacked ensemble models gave the best results. Check the comparison in the next slide and in the Jupyter
notebook.
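Two points from the results can be illustrated with a small sketch: why accuracy alone misleads at an 89:11 class ratio, and how lowering the classification threshold from 50% to 25% trades a little accuracy for much higher recall. The predicted probabilities below are invented for illustration and are not the deck's model outputs, so the printed numbers do not reproduce the reported 61% recall; the function names are also assumptions.

```python
def predict_with_threshold(probabilities, threshold):
    """Label a customer as a likely subscriber (1) when the predicted
    probability is at or above the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def recall(y_true, y_pred):
    positives = sum(y_true)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives if positives else 0.0


# 11 subscribers followed by 89 non-subscribers (the 89:11 ratio from the deck).
y_true = [1] * 11 + [0] * 89

# Hypothetical predicted probabilities, made up for this example.
probabilities = (
    [0.80, 0.70, 0.60, 0.45, 0.40, 0.35, 0.30, 0.28, 0.20, 0.15, 0.10]
    + [0.55] * 2 + [0.30] * 6 + [0.10] * 81
)

always_no = [0] * len(y_true)  # baseline that never predicts a subscriber
at_050 = predict_with_threshold(probabilities, 0.50)
at_025 = predict_with_threshold(probabilities, 0.25)

for name, preds in [("always 'no'", always_no),
                    ("threshold 0.50", at_050),
                    ("threshold 0.25", at_025)]:
    print(f"{name}: accuracy={accuracy(y_true, preds):.2f}, "
          f"recall={recall(y_true, preds):.2f}")
```

In this toy setup the baseline that never predicts a subscriber is already 89% accurate while finding no subscribers at all, which is why recall and F1 score, rather than accuracy, are the more informative metrics for this problem.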