GROUP – 7
Dawei Ye
Hrushikesh Basavanahalli
Jobil Joseph
Ryan Curtis
Shu-Feng Tsao
Yijing Liang
Predicting Credit Card Defaults
OPIM 5640 – Predictive Modeling – Final Assignment
Data source: Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA
Agenda
- The Business Problem
- Data: data source used
- Modeling methodology adopted
- Major steps in the methodology, in detail
- Models
- WHY and HOW we chose this model
- Conclusion
The Business Problem
Can she JUMP?
We took a dataset with 30,000 records of credit card borrowers, with details about their demographics, behavior, and payment patterns.
Dataset source: Kaggle
The goal is to build a predictive model that predicts, with acceptable accuracy, whether a credit card user will default on the upcoming payment.
A Doctor
a data doctor… with a
diagnostic approach
- Data
- Visualize data for initial diagnostics
- Build baseline model (logistic regression model)
- Revise the model after each iteration, checking improvement in accuracy
- Data cleansing
- Data pre-processing
- Exploratory data analytics
- Feature engineering / selection
- Trying different models
Data Dictionary
The data contains a binary response variable, default payment (Yes = 1, No = 0). There are 25 variables in total, including the response variable.
ID: ID of each client.
LIMIT_BAL: Amount of the given credit (NT dollar); it includes both the individual consumer credit and his/her family (supplementary) credit.
SEX: Gender (1 = male; 2 = female).
EDUCATION: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).
MARRIAGE: Marital status (1 = married; 2 = single; 3 = others).
AGE: Age (year).
PAY_0 – PAY_6: History of past payment. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; . . .; 8 = payment delay for eight months; 9 = payment delay for nine months and above.
BILL_AMT1 – BILL_AMT6: Amount of bill statement (NT dollar). BILL_AMT1 = amount of bill statement in September, 2005; BILL_AMT2 = amount of bill statement in August, 2005; . . .; BILL_AMT6 = amount of bill statement in April, 2005.
PAY_AMT1 – PAY_AMT6: Amount of previous payment (NT dollar). PAY_AMT1 = amount paid in September, 2005; PAY_AMT2 = amount paid in August, 2005; . . .; PAY_AMT6 = amount paid in April, 2005.
Data Overview for baseline model
Data summary: 30,000 records in total, split 70:30 into training and validation datasets using stratified random sampling.
Dataset | Records | Positive cases (default/1) | Negative cases (non-default/0)
Training | 21,000 | 4,666 | 16,334
Validation | 9,000 | 1,970 | 7,030
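The 70:30 stratified split described above can be sketched in plain Python. This is only an illustration, not the tool the team used: the function name `stratified_split`, the seed, and the synthetic label list are assumptions. The labels simply reuse the class totals implied by the slide (4,666 + 1,970 = 6,636 defaults out of 30,000), and an exactly proportional split like this one will not reproduce the slide's per-set counts digit for digit.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split row indices so each class keeps about the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, valid_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # random sampling within the class
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        valid_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(valid_idx)


# Synthetic labels with the class totals implied by the slide (hypothetical row order).
labels = [1] * 6636 + [0] * 23364
train_idx, valid_idx = stratified_split(labels)
train_pos = sum(labels[i] for i in train_idx)
valid_pos = sum(labels[i] for i in valid_idx)
print(len(train_idx), train_pos, len(valid_idx), valid_pos)
```

Sampling within each class separately is what keeps the default rate close to 22% in both the training and the validation set.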
Baseline model: Logistic Regression
Baseline evaluation (benchmark), AUC on ROC curve
Training: 72.27%
Validation: 72.70%
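For readers unfamiliar with the benchmark metric, here is a minimal sketch of how the area under the ROC curve can be computed from predicted scores. It uses the rank-based (Mann-Whitney) formulation; the function name `roc_auc` and the four-row toy data are assumptions for illustration, not the team's data or software.

```python
def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative (ties count half)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example: 1 = default, 0 = non-default.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
print(f"AUC = {auc:.2%}")
```

An AUC of 50% corresponds to random ranking and 100% to perfect ranking, which is why the baseline of about 72% leaves room for the later steps to improve on.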
Data Cleansing & Preprocessing
Type changes
No. | Variable | Initial type | Type changed to
1 | SEX | numerical continuous | character nominal
2 | EDUCATION | numerical continuous | character nominal
3 | MARRIAGE | numerical continuous | character nominal
4 | PAY_0 | numerical continuous | character nominal
5 | PAY_2 | numerical continuous | character nominal
6 | PAY_3 | numerical continuous | character nominal
7 | PAY_4 | numerical continuous | character nominal
8 | PAY_5 | numerical continuous | character nominal
9 | PAY_6 | numerical continuous | character nominal
Variable transformation
No. | Variable | Transformation
10 | BILL_AMT1 to BILL_AMT6 | Standardized the variable to scale
11 | PAY_AMT1 to PAY_AMT6 | Standardized the variable to scale
12 | LIMIT_BAL | Standardized the variable to scale
13 | AGE | Standardized the variable to scale
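The two preprocessing steps in the table can be sketched as follows. This is a hedged illustration: the helper names `standardize` and `to_nominal` are hypothetical, the sample values are made up, and using the population standard deviation is an assumption, since the slides do not say which variant was applied.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score a numeric column: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def to_nominal(values):
    """Recode numeric category codes as strings so a model treats them as labels, not magnitudes."""
    return [str(v) for v in values]


# Made-up sample values for illustration only.
limit_bal = [20000, 120000, 90000, 50000]
z_limit_bal = standardize(limit_bal)
sex = to_nominal([2, 2, 1, 1])
print(z_limit_bal, sex)
```

Treating codes such as SEX or PAY_0 as nominal stops the model from assuming that category 2 is "twice" category 1, which is one plausible reason for the AUC gain reported after preprocessing.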
Benchmark Check: Post Preprocessing
AUC on ROC curve | Benchmark | Pre-processing | Gain on benchmark
Training | 72.27% | 77.06% | 4.79%
Validation | 72.70% | 77.68% | 4.98%
Data Visualization / Exploration
Default by Education Level
[Chart: default by education level. Categories: Imputed, Grad School, Bachelors, High School, Other (×3)]
Data Visualization / Exploration
Default by Age
[Chart: default by age group. Bins: 21-27, 27-31, 31-37, 37-43, 43-79]
Data Visualization / Exploration
Default by Marriage Status
[Chart: default by marriage status. Categories: Married, Single, Other, Imputed]
Data Visualization / Exploration
Scatter plot across Bill Amount and Pay Amount
There is HIGH correlation among the bill amounts (value of the monthly bill) across the SIX months.
However, there is LOW correlation among the payment amounts across the SIX months.
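The correlation observation above can be checked with a Pearson coefficient between any two monthly columns. The sketch below uses made-up numbers chosen to show one highly correlated pair and one weakly correlated pair; the variable names and values are assumptions, not figures from the dataset.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up values: bills drift slowly month to month, payments jump around.
bill_month1 = [100, 200, 300, 400, 500]
bill_month2 = [110, 190, 310, 420, 480]
pay_month1 = [10, 50, 20, 40, 30]
pay_month2 = [30, 20, 10, 50, 40]

r_bill = pearson(bill_month1, bill_month2)
r_pay = pearson(pay_month1, pay_month2)
print(round(r_bill, 3), round(r_pay, 3))
```

Highly correlated columns carry overlapping information, which is one motivation for aggregating the six monthly values into a few derived features.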
Feature Engineering
Three categories of variables [pay_*, bill_amt*, pay_amt*] show behavioral patterns across the SIX months.
To extract the aggregated pattern across the SIX months, we derived FOUR new variables from these three categories.
Field name | Description
AMT_OWED | Running (cumulative) sum of bill amount minus payment amount for each individual
AVG_6MTH_OWED | Mean value of AMT_OWED over the 6-month period
MISSED_PAYMENTS | Maximum number of missed payments recorded for the individual
BALANCE_TO_LIMIT_RATIO | Average 6-month balance divided by the individual's credit limit; anything <= 0.3 is considered good
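One possible reading of the four derived fields is sketched below. The slides give only one-line descriptions, so several choices here are assumptions: the running sum is taken from the oldest month (April) to the newest (September), MISSED_PAYMENTS is taken as the largest PAY_* status with negative codes floored at zero, and "balance" in the ratio is taken to be the mean bill amount. The function name `derive_features` and the sample record are hypothetical.

```python
from itertools import accumulate

PAY_STATUS_COLUMNS = ("PAY_0", "PAY_2", "PAY_3", "PAY_4", "PAY_5", "PAY_6")


def derive_features(record):
    """Derive the four aggregate fields from one borrower's six months of data."""
    bills = [record[f"BILL_AMT{m}"] for m in range(1, 7)]
    pays = [record[f"PAY_AMT{m}"] for m in range(1, 7)]
    statuses = [record[name] for name in PAY_STATUS_COLUMNS]

    # Columns are stored newest first (1 = September, 6 = April); reverse to oldest first.
    net_by_month = [b - p for b, p in zip(reversed(bills), reversed(pays))]
    running_owed = list(accumulate(net_by_month))

    return {
        "AMT_OWED": running_owed[-1],
        "AVG_6MTH_OWED": sum(running_owed) / len(running_owed),
        "MISSED_PAYMENTS": max(0, max(statuses)),
        "BALANCE_TO_LIMIT_RATIO": (sum(bills) / len(bills)) / record["LIMIT_BAL"],
    }


# Hypothetical borrower, for illustration only.
record = {
    "LIMIT_BAL": 10000,
    "PAY_0": 2, "PAY_2": 0, "PAY_3": -1, "PAY_4": -1, "PAY_5": 0, "PAY_6": 0,
    "BILL_AMT1": 3000, "BILL_AMT2": 2500, "BILL_AMT3": 2000,
    "BILL_AMT4": 1500, "BILL_AMT5": 1000, "BILL_AMT6": 500,
    "PAY_AMT1": 500, "PAY_AMT2": 500, "PAY_AMT3": 500,
    "PAY_AMT4": 500, "PAY_AMT5": 500, "PAY_AMT6": 500,
}
features = derive_features(record)
print(features)
```

For this made-up borrower the balance-to-limit ratio is 0.175, which falls under the 0.3 level the slide describes as good.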
Data Visualization / Exploration
[Chart: Missed Payments by Gender]
Data Visualization / Exploration
[Chart: Missed Payments by Education Level]
Data Visualization - Clustering
To understand the underlying structure of the data, we ran a cluster analysis using the hierarchical method.
Six cluster groups were identified.
The cluster value was added to the data as a NEW variable.
In total, FIVE new variables were added to the dataset:
- FOUR derived variables, which were standardized.
- ONE variable representing the SIX clusters, typecast to a character nominal variable.
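As a rough illustration of hierarchical clustering producing a nominal cluster label, here is a small agglomerative sketch. The slides do not state the linkage or distance used, so average linkage with Euclidean distance is an assumption, as are the function name `agglomerative_labels`, the six toy points, and the choice of three clusters (the team reported six on the real data). This naive version is only suitable for small inputs.

```python
def agglomerative_labels(points, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(points[a], points[b])) ** 0.5

    def average_linkage(c1, c2):
        return sum(distance(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    # Emit the cluster id as a string so it behaves as a nominal variable.
    labels = [""] * len(points)
    for cluster_id, members in enumerate(clusters, start=1):
        for idx in members:
            labels[idx] = str(cluster_id)
    return labels


# Toy standardized points forming three obvious groups.
points = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.2, 4.9), (10.0, 0.0), (10.1, 0.2)]
cluster_labels = agglomerative_labels(points, n_clusters=3)
print(cluster_labels)
```

Storing the label as text mirrors the slide's step of typecasting the cluster variable to character nominal.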
Revised Benchmark
AUC on ROC curve | Benchmark | Pre-processing | Feature engineering | Gain on benchmark
Training | 72.27% | 77.06% | 77.15% | 4.88%
Validation | 72.70% | 77.68% | 77.78% | 5.08%
Dimensionality Reduction: PCA
The top TEN principal components, which together account for 96.32% of the variance in the data, were considered in place of the numerical variables.
Revised Benchmark
AUC on ROC curve | Benchmark | Pre-processing | Feature engineering | PCA | Gain on benchmark
Training | 72.27% | 77.06% | 77.15% | 77.12% | 4.85%
Validation | 72.70% | 77.68% | 77.78% | 77.69% | 4.99%
Dimensionality reduction using PCA decreases the AUC relative to the previous step. Therefore, we decided NOT to use principal components in the modeling.
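Choosing how many principal components to keep usually comes down to a cumulative explained-variance check like the one sketched here. The variance ratios below are invented for illustration (the slides report only that ten components reached 96.32%), and the function name `components_needed` and the 94% target are assumptions.

```python
from itertools import accumulate


def components_needed(explained_variance_ratio, target):
    """Return (k, cumulative share) for the smallest k whose components reach the target."""
    for k, cumulative in enumerate(accumulate(explained_variance_ratio), start=1):
        if cumulative >= target:
            return k, cumulative
    raise ValueError("target variance share is not reachable")


# Invented per-component variance shares, sorted from largest to smallest.
ratios = [0.40, 0.20, 0.12, 0.08, 0.06, 0.04, 0.03, 0.02, 0.01, 0.01, 0.01, 0.01, 0.01]
k, covered = components_needed(ratios, target=0.94)
print(k, round(covered, 4))
```

Keeping most of the variance does not guarantee keeping the predictive signal, which is consistent with the slight AUC drop reported in the table.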
Trying Different Models
The following models were considered for analysis:
- Stepwise Regression
- Bootstrap Forest
- Neural Networks
Evaluation criteria
Each model was evaluated on:
- AUC under the ROC curve
- Lift ratio
- Misclassification rate
- Accuracy of positive cases
Lift ratio, misclassification rate, and accuracy of positive cases were calculated at a probability cutoff of 0.5.
However, in some business contexts we may have to focus on other evaluation metrics, such as minimum misclassification rate or maximum sensitivity, which may lead to a different model.
To illustrate this, we added a further evaluation criterion: the probability cutoff that gives the minimum misclassification rate on the validation set.
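The cutoff-based criteria can be sketched as below. The slides do not define the metrics, so the definitions are assumptions: "accuracy of positive cases" is treated as the share of predicted defaults that are real defaults, and lift as that share divided by the overall default rate. The reported figures are at least consistent with this reading (for example, 67.93% divided by 4,666/21,000 is about 3.06). Function names and the ten-row toy data are hypothetical.

```python
def metrics_at_cutoff(y_true, probs, cutoff):
    """Confusion-matrix metrics when predicting default for probability >= cutoff."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, probs):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    n = len(y_true)
    base_rate = (tp + fn) / n
    positive_accuracy = tp / (tp + fp) if tp + fp else 0.0
    return {
        "misclassification_rate": (fp + fn) / n,
        "positive_accuracy": positive_accuracy,  # assumed meaning of "accuracy of positive cases"
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "lift": positive_accuracy / base_rate if base_rate else 0.0,  # assumed definition
    }


def best_cutoff(y_true, probs, candidates):
    """Pick the candidate cutoff with the lowest misclassification rate."""
    return min(candidates, key=lambda c: metrics_at_cutoff(y_true, probs, c)["misclassification_rate"])


# Toy validation data: 3 defaults among 10 borrowers.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.58, 0.45, 0.55, 0.42, 0.2, 0.2, 0.1, 0.1, 0.05]
at_half = metrics_at_cutoff(y_true, probs, 0.5)
chosen = best_cutoff(y_true, probs, [0.4, 0.45, 0.5, 0.55, 0.6])
print(at_half, chosen)
```

In this toy case, lowering the cutoff from 0.5 to 0.45 catches one more default without adding a false alarm, so the misclassification rate drops from 20% to 10%.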
Trying Different Models: Stepwise Regression
Trying Different Models: Bootstrap Forest
Trying Different Models: Neural Networks
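Model Evaluation
At probability cutoff 0.5
Model | Dataset | AUC under ROC | Lift ratio | Misclassification rate | Accuracy of positive cases | Threshold (cutoff)
Stepwise Regression | Training | 77.40% | 3.06 | 18.02% | 67.93% | 0.5
Stepwise Regression | Validation | 78.01% | 3.08 | 17.78% | 67.39% | 0.5
Bootstrap Forest | Training | 85.20% | 3.15 | 17.44% | 69.99% | 0.5
Bootstrap Forest | Validation | 78.80% | 3.11 | 17.53% | 68.08% | 0.5
Neural Networks | Training | 79.30% | 3.11 | 17.53% | 69.17% | 0.5
Neural Networks | Validation | 78.27% | 3.00 | 18.03% | 65.62% | 0.5
At probability cutoff with minimum misclassification rate
Model | Dataset | Lift ratio | Misclassification rate | Accuracy of positive cases | Threshold (cutoff)
Stepwise Regression | Training | 3.02 | 17.95% | 67.18% | 0.4
Stepwise Regression | Validation | 2.96 | 17.71% | 64.83% | 0.4
Bootstrap Forest | Training | 3.02 | 17.01% | 67.15% | 0.42
Bootstrap Forest | Validation | 3.16 | 17.56% | 69.16% | 0.42
Neural Networks | Training | 3.10 | 17.57% | 68.80% | 0.49
Neural Networks | Validation | 3.15 | 17.51% | 68.87% | 0.49
Models Comparison
Bootstrap Forest seems to fare better across most evaluation metrics. The final model gave a gain of 6.1 percentage points in AUC under the ROC curve over the initial baseline benchmark (validation AUC from 72.70% to 78.80%).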
Model Evaluation
Model Analysis: Column Contribution
Analyzing the column contributions in the model, PAY_0 (the most recent repayment status) is the most influential factor in this model.
Model Evaluation
Model Analysis: Column Contribution
Whenever PAY_0 has the value 2 (payment delayed for two months), the chances of correctly identifying default cases are higher.
For any other value of PAY_0, the chances of incorrect predictions are higher.
Conclusion
The Bootstrap Forest model seems to work better for this problem and context. However, the same problem with a different context or criterion may lead to a different model.
Extending the utility of this model beyond this dataset to the wider credit card industry:
“The model, with sufficient refinement and learning, should be able to predict default trends in the industry and help regulators formulate policies and take preemptive actions in the interest of both USERS and BANK.”



Editor's Notes

  • #4: To predict whether a borrower will default or NOT. The team's goal is to predict whether a borrower will default on his/her credit card dues. This would help banks decide on the RISK the bank is taking on while issuing a credit card or deciding on the - A default is NOT necessarily bad for the bank IF the borrower recovers and pays all the necessary fees! - But it is very important for the bank to assess the RISK it carries when approving revolving credit for the borrower.
  • #6: Go to a data doctor to resolve the modeling problem, to check whether the borrower is fit enough to pay or NOT.
  • #7: Analogy of a doctor diagnosing a patient and improving his health, so that we are confident he will be able to jump. The data doctor: accepted the data; visualized the data for initial diagnostics; data cleansing; data pre-processing; exploratory data analysis; feature engineering / selection; trying different models; evaluation of the model; publishing the model.