Estimating the probability of default: Credit Risk
Mohamed Arsalan Qadri
Sarvesh Saurabh
Mohit Ravi
Summary
• Credit risk – The probability of default
• Data Cleansing
• Logistic Regression
• Linear Discriminant Analysis
• Comparison of the LR and LDA
• Factor Analysis
Credit Risk
What is it?
• The risk of default on a debt that may arise from a borrower failing to make
required payments.
Impact on the lender?
• Lost principal and interest, disruption to cash flows, and increased collection
costs.
How to estimate it?
• Credit risk arises from the potential that a borrower or counterparty will fail to perform
on an obligation
Sources of risk?
• For most banks, loans are the largest and most obvious source of credit risk.
• There are other sources of credit risk, both on and off the balance sheet, including
letters of credit, unfunded loan commitments, and lines of credit.
• Other products, activities, and services that expose a bank to credit risk are credit
derivatives, foreign exchange, and cash management services.
Credit Scoring vs Risk
Estimation of risk?
• The lower the credit score, the higher the risk posed by the borrower.
• A credit score is a statistically derived numeric expression of a person's creditworthiness
that lenders use to assess the likelihood that the person will repay his or her debts.
• A credit score is based on, among other things, a person's past credit history, and ranges
from 300 to 850.
Credit Scoring
• Consumers can typically keep their credit scores high by maintaining a long history of
always paying their bills on time and not having too much debt.
• The FICO score is the most widely used credit scoring system.
• A credit score is primarily based on credit report information, typically sourced from
credit bureaus.
Data Cleaning
• Serious Delinquency in two years. (Make a pie chart for this)
• Revolving Utilization Of Unsecured Lines
• Age
• Number Of Time 30-59 Days Past Due Not Worse
• Number Of Time 60-89 Days Past Due Not Worse
• Number Of Times 90 Days Late
• Monthly Income
  • Missing values were replaced with the mean.
  • Multiple linear regression was run to predict the missing values.
  • The histogram was plotted again after the regression-based imputation.
• Debt Ratio
  • The Debt Ratio was extremely high in many cases.
  • On closer inspection, the high debt ratios occurred in records whose Monthly Income was unknown.
  • From this we inferred that, for those records, the Debt Ratio field most probably holds the debt itself.
  • We replaced the high debt ratio values by dividing them by the predicted monthly income.
  • The new mean after replacement was 0.67.
• Number of Dependents
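The two Monthly Income and Debt Ratio repairs described above can be sketched together. This is a minimal illustration, not the authors' code: the records are made-up toy values, a single predictor (age) stands in for the multiple linear regression used in the deck, and "income is missing" is assumed to be the rule that identifies the inflated debt ratios.

```python
from statistics import mean

# Toy records (hypothetical values). income=None marks a missing Monthly Income;
# for that record the debt_ratio field is assumed to hold the absolute debt.
records = [
    {"age": 25, "income": 2000.0, "debt_ratio": 0.30},
    {"age": 35, "income": 3000.0, "debt_ratio": 0.50},
    {"age": 45, "income": 4000.0, "debt_ratio": 0.40},
    {"age": 55, "income": None, "debt_ratio": 2500.0},
]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


# Fit the regression on the records whose income is known.
known = [r for r in records if r["income"] is not None]
slope, intercept = fit_line(
    [r["age"] for r in known], [r["income"] for r in known]
)

# Impute missing incomes, then rescale the matching debt ratios.
cleaned = []
for r in records:
    row = dict(r)  # copy so the raw records stay untouched
    if row["income"] is None:
        predicted_income = slope * row["age"] + intercept
        row["income"] = predicted_income
        # Assumption: the field holds debt, so debt / income gives the ratio.
        row["debt_ratio"] = row["debt_ratio"] / predicted_income
    cleaned.append(row)
```

Mean imputation would replace `predicted_income` with `mean(r["income"] for r in known)`; the regression variant keeps the imputed income tied to the borrower's other attributes.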
Data Modelling
• Split the dataset into training data (70%) and test data (30%).
• Computed the correlation matrix among the independent variables.
• The variables had very little correlation amongst themselves.
• Ran Logistic Regression using stepwise selection.
• Ran Linear Discriminant Analysis.
• Compared both models by measuring their prediction accuracy.
• Ran both models on the significant factors obtained from Factor Analysis.
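The first two steps, the 70/30 split and the correlation matrix, might look like the sketch below. The function names, the fixed seed, and the toy columns are assumptions made for illustration; the deck does not say which tool performed these steps.

```python
import random
from statistics import mean


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def correlation_matrix(columns):
    """Pairwise correlations for a dict of column name -> list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


train, test = train_test_split(range(10))
corr = correlation_matrix({"age": [30, 40, 50, 60], "dependents": [2, 0, 1, 0]})
```

Off-diagonal entries close to zero correspond to the "very little correlation" observation in the list above.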
Logistic Regression
• Ran Logistic Regression separately for each variable.
• Computed the ROC curve for each variable and compared the AUC value.
Stepwise Selection
• The overall model was significant.
• All the variables were included in the model.
• The model built on the training data was tested on the test data.
• A probability of default > 0.7 was coded as 1, and a probability of default < 0.7 was coded as 0.
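The coding rule in the last bullet can be sketched as follows. A logistic regression turns a linear score into a probability with the logistic function, and the 0.7 cutoff then turns that probability into a class. The function names are illustrative, the coefficients would come from the fitted model, and treating a probability of exactly 0.7 as class 0 is an assumption, since the slide leaves that case undefined.

```python
import math


def default_probability(features, coefficients, intercept):
    """Logistic model: P(default) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


def classify(probability, threshold=0.7):
    """Code a predicted probability as 1 (default) or 0 (no default)."""
    # Assumption: a probability exactly at the threshold is coded as 0.
    return 1 if probability > threshold else 0
```

A cutoff as high as 0.7 flags only the borrowers the model is most sure about, which is worth keeping in mind when reading the true positive rate reported on the test data.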
Logistic Regression on Test Data
Overall Accuracy = (41374+291)/(41374+291+175+2661) = 93.6%
True Positive Rate = TP / (TP+FN) = 291/(291+2661) = 9.85%
True Negative Rate = TN / (TN+FP) = 41374/(41374+175) = 99.5%
Confusion Matrix
            Predicted 0   Predicted 1
Actual 0    41374         175
Actual 1    2661          291
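The three rates can be recomputed from the four counts in the accuracy formula. In this sketch the assignment of 175 to false positives and 2661 to false negatives is inferred from the reported rates rather than read from the original figure, and the function name is illustrative.

```python
def classification_rates(tn, fp, fn, tp):
    """Accuracy, true positive rate and true negative rate from four counts."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "true_positive_rate": tp / (tp + fn),
        "true_negative_rate": tn / (tn + fp),
    }


# Counts from the logistic regression slide (cell placement is an inference).
lr_rates = classification_rates(tn=41374, fp=175, fn=2661, tp=291)
```

The recomputed values agree with the slide to within rounding in the last digit; the slide's figures appear to be truncated rather than rounded.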
ROC curve for Test Data
• AUC Value = 0.8557
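One way to read an AUC value: it is the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter. The sketch below computes it directly from that definition on made-up labels and scores; it is quadratic in the number of records, so it is meant for illustration rather than for a test set of this size.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ordered correctly.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking.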
Discriminant Analysis
Overall accuracy = (38134+1717)/(38134+3415+1235+1717) = 89.5%
True Positive Rate = TP / (TP+FN) = 58%
True Negative Rate = TN / (TN+FP) = 91.7%
Confusion Matrix (Serious Delinquency)
            Predicted 0   Predicted 1
Actual 0    38134         3415
Actual 1    1235          1717
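For readers unfamiliar with how LDA assigns a class, here is a minimal one-variable sketch. LDA assumes each class is normally distributed with a shared variance and picks the class with the larger discriminant score. The function names and toy values are illustrative; the deck's model used all the predictors, not a single one.

```python
import math
from statistics import mean


def fit_lda_1d(class0, class1):
    """Estimate class means, pooled variance and priors for one predictor."""
    n0, n1 = len(class0), len(class1)
    m0, m1 = mean(class0), mean(class1)
    pooled_var = (
        sum((x - m0) ** 2 for x in class0) + sum((x - m1) ** 2 for x in class1)
    ) / (n0 + n1 - 2)
    priors = (n0 / (n0 + n1), n1 / (n0 + n1))
    return {"means": (m0, m1), "var": pooled_var, "priors": priors}


def predict_lda_1d(model, x):
    """Pick the class with the larger linear discriminant score."""
    var = model["var"]
    scores = [
        x * m / var - m * m / (2 * var) + math.log(p)
        for m, p in zip(model["means"], model["priors"])
    ]
    return 0 if scores[0] >= scores[1] else 1


# Toy data: class 0 centred at 2, class 1 centred at 7, boundary at 4.5.
model = fit_lda_1d([1.0, 2.0, 3.0], [6.0, 7.0, 8.0])
```

When the predictors are far from normal, as heavily skewed credit variables tend to be, the shared-variance normal assumption is violated, which is the usual argument for preferring logistic regression.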
Comparison of Models
Linear Discriminant Analysis
Overall accuracy = 89.5%
Confusion Matrix (Serious Delinquency)
            Predicted 0   Predicted 1
Actual 0    38134         3415
Actual 1    1235          1717
Logistic Regression
Overall Accuracy = 93.6%
Normality of variables
Factor Analysis
Factor Pattern
Group      Variable                           Factor1    Factor2    Factor3    Factor4
Factor 1   NumberOfTimes90DaysLate            0.54684    0.28062    0.26286    -0.0429
Factor 1   NumberOfTime60_89DaysPastDueNot    0.50016    0.3943     0.37949    -0.0015
Factor 1   RevolvingUtilizationOfUnsecured    0.60945    0.24942    -0.1861    -0.0285
Factor 2   NumberOfOpenCreditLinesAndLoans    -0.5203    0.5275     0.1922     0.15051
Factor 2   NumberRealEstateLoansOrLines       -0.4698    0.61529    -0.0292    0.09694
Factor 2   NumberOfDependents_num             0.03058    0.46357    -0.6034    -0.008
Factor 2   Monthlyincome_debt                 -0.4298    0.5044     -0.09      -0.1628
Factor 2   NumberOfTime30_59DaysPastDueNot    0.40861    0.49901    0.31943    0.05977
Factor 3   age                                -0.4301    -0.1476    0.65733    -0.0396
Factor 4   DebtRatio                          0.05584    -0.0712    -0.0331    0.97112
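Two standard summaries can be read off a factor pattern like this one. Assuming the factors are orthogonal, the sum of squared loadings in a column is the variance that factor explains, and the sum of squared loadings in a row is the variable's communality. The sketch below computes both from the loadings in the table; the variable names `explained` and `communality` are illustrative.

```python
# Loadings copied from the factor pattern table (Factor1..Factor4).
loadings = {
    "NumberOfTimes90DaysLate": (0.54684, 0.28062, 0.26286, -0.0429),
    "NumberOfTime60_89DaysPastDueNot": (0.50016, 0.3943, 0.37949, -0.0015),
    "RevolvingUtilizationOfUnsecured": (0.60945, 0.24942, -0.1861, -0.0285),
    "NumberOfOpenCreditLinesAndLoans": (-0.5203, 0.5275, 0.1922, 0.15051),
    "NumberRealEstateLoansOrLines": (-0.4698, 0.61529, -0.0292, 0.09694),
    "NumberOfDependents_num": (0.03058, 0.46357, -0.6034, -0.008),
    "Monthlyincome_debt": (-0.4298, 0.5044, -0.09, -0.1628),
    "NumberOfTime30_59DaysPastDueNot": (0.40861, 0.49901, 0.31943, 0.05977),
    "age": (-0.4301, -0.1476, 0.65733, -0.0396),
    "DebtRatio": (0.05584, -0.0712, -0.0331, 0.97112),
}

# Variance explained by each factor: column-wise sum of squared loadings.
explained = [sum(row[k] ** 2 for row in loadings.values()) for k in range(4)]

# Communality of each variable: row-wise sum of squared loadings.
communality = {
    name: sum(value ** 2 for value in row) for name, row in loadings.items()
}
```

Factor 4 is dominated by a single loading (DebtRatio at 0.97112), so nearly all of its explained variance comes from that one variable.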
Conclusion
• About 80% of the time was spent on data cleaning.
• Logistic Regression gives better overall accuracy than LDA (93.6% vs 89.5%) when the data is not normally distributed.
• Factors can be grouped for a logical understanding, with Debt Ratio and age explaining high
variance.
Thank you

Editor's Notes

  • #23: The ROC curve shows how well a binary classifier performs by plotting its true positive rate against its false positive rate across classification thresholds. The diagonal line in the middle represents a classifier that guesses at random, which means it is right 50% of the time and wrong the other 50% of the time. Here we have the ROC curve for Monthly Income. From this ROC curve we can calculate the area under the curve (AUC), in this case 0.8508. The higher the AUC value, the better the model. On the right, we have the AUC values for all the variables. Monthly Income has the best AUC value, 0.8508. Most of the other variables fall below 0.7, and Debt Ratio does worse than 0.5.