Machine Learning Application:
Credit Scoring
Programming Techniques
Professor Carlos Costa
Master in Mathematical Finance
Federico Innocenti 53251
Miguel Albergaria 48547
Claudio Napoli 53358
Iacopo Fiorentino 53315
Lisbon, December 11th 2019
Context
► The data was collected from Thomson Reuters for firms
included in the main stock indexes.
► The goal is to assign each company a score, based on its
probability of default, and use it to decide whether to
grant that firm a loan.
► To do so we compute many financial ratios; in the end we
want to “differentiate winners from losers”.
Data preparation
► Importing the data, checking the data types and
removing missing values;
► Computing the correlation matrix;
► Plotting graphs to see how the data is distributed;
► Removing very low and very high values,
i.e., outliers.
► After all of that, we recomputed the correlation matrix
and the graphs to compare them with the first versions
and get a better view of our results.
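The cleaning steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not say how outliers were identified, so the 5th/95th percentile cut-offs, the helper names `drop_missing` and `trim_outliers`, and the toy data are all hypothetical.

```python
from statistics import quantiles

def drop_missing(rows):
    """Keep only rows (dicts) in which no value is None."""
    return [r for r in rows if all(v is not None for v in r.values())]

def trim_outliers(rows, column, lower_pct=5, upper_pct=95):
    """Drop rows whose `column` value lies outside the given percentiles.

    The percentile bounds are an assumption, not the rule used in the slides.
    """
    values = [r[column] for r in rows]
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [r for r in rows if low <= r[column] <= high]

# Toy data: one missing value and one extreme current ratio (hypothetical).
firms = [{"current_ratio": float(v)} for v in range(1, 21)]
firms += [{"current_ratio": None}, {"current_ratio": 500.0}]

clean = trim_outliers(drop_missing(firms), "current_ratio")
```

On this toy input the missing row and the extreme value of 500 are both discarded, along with the smallest observation that falls below the lower percentile.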
Modelling data
► Our data doesn't include a probability of default, so we need to create one.
► For the machine learning approach we use:
► Supervised learning: logistic regression and random forest
► Unsupervised learning: K-means clustering
► We decided to use a financial scorecard, which assigns a score to each of the
different ratios.
Setting the score
► Relevant ratios: current ratio, debt ratio, equity to asset ratio, debt to
equity ratio, return on assets, return on equity, long-term coverage ratio and
asset turnover ratio.
► A company's goal is to obtain the highest score, computed in the
way shown before. An example of the code is shown here: [code image not preserved]
► The final score is the sum of all the individual ratio scores.
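Since the original code image is not available, the following is a minimal sketch of how such a scorecard could look, not the authors' implementation. The band cut-offs, the point values, and the names `BANDS`, `ratio_score` and `total_score` are hypothetical; only two of the listed ratios are included, both assumed to be "higher is better".

```python
# Hypothetical bands: (lower bound, points), checked from the best band down.
# These cut-offs are illustrative only, not the values used in the slides.
BANDS = {
    "current_ratio": [(2.0, 100), (1.0, 60), (0.0, 20)],
    "return_on_equity": [(0.15, 100), (0.05, 60), (float("-inf"), 20)],
}

def ratio_score(name, value):
    """Return the points of the first band whose lower bound `value` reaches."""
    for lower_bound, points in BANDS[name]:
        if value >= lower_bound:
            return points
    return 0  # value below every band

def total_score(ratios):
    """Final score: the sum of the individual ratio scores."""
    return sum(ratio_score(name, value) for name, value in ratios.items())

firm = {"current_ratio": 1.4, "return_on_equity": 0.18}
score = total_score(firm)  # 60 points + 100 points
```

A ratio where lower values are better (for example the debt ratio) would need its bands ordered the other way round.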
Evaluation
► To evaluate our models we compute a confusion matrix, which shows the
results and gives a simple first parameter for comparing the three
models.
► After setting the score we binarize it: 1 denotes the lowest probability
of default and 0 the highest. We chose a threshold of 500 points and
then proceeded to the evaluation.
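The binarization and the confusion matrix can be sketched as follows. The threshold of 500 comes from the slide; whether a score of exactly 500 maps to 1 is an assumption, and the scores, the model output and the helper names are hypothetical.

```python
from collections import Counter

THRESHOLD = 500  # threshold stated in the slides

def binarize(score, threshold=THRESHOLD):
    """1 = low probability of default, 0 = high probability of default.

    Treating a score equal to the threshold as 1 is an assumption.
    """
    return 1 if score >= threshold else 0

def confusion_matrix(actual, predicted):
    """Count each (actual, predicted) label pair."""
    return Counter(zip(actual, predicted))

scores = [620, 480, 500, 310, 705]       # hypothetical scorecard totals
labels = [binarize(s) for s in scores]   # scorecard labels
predicted = [1, 1, 1, 0, 0]              # hypothetical model output

cm = confusion_matrix(labels, predicted)
```

Here `cm[(0, 1)]` counts firms labelled 0 by the scorecard that the model predicted as 1, and `cm[(1, 0)]` the opposite mistake.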
Logistic Regression
► We keep the logistic regression
at its default settings, with
a test size of 0.7.
► The final result is good, with an
AUC of 0.75, which means that
the model is good at distinguishing
the given classes.
► But there is a problem!
► The model makes type II errors. In
other words, it predicts 1 when
the true label is 0.
► So the F1 score (the harmonic mean
of precision and recall) is 0.68.
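To make the link between the error types and the F1 score concrete, here is a small sketch that computes precision, recall and F1 from predicted labels. The slides do not state which class is treated as positive; the sketch assumes the default class (label 0) is the positive one, so that "predicts 1 but actually is 0" counts as a false negative. The labels and the function name are hypothetical.

```python
def precision_recall_f1(actual, predicted, positive=0):
    """Precision, recall and F1 for the class `positive` (assumed: 0 = default)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)  # type I errors
    fn = sum(a == positive and p != positive for a, p in pairs)  # type II errors
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

actual    = [0, 0, 0, 0, 1, 1, 1, 1]   # hypothetical true labels
predicted = [0, 0, 1, 1, 1, 1, 1, 0]   # hypothetical model output

precision, recall, f1 = precision_recall_f1(actual, predicted)
```

Two of the four defaults are missed in this toy example, which lowers recall and therefore pulls the F1 score down even though precision is higher.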
Random Forest
► In order to optimize the process we set the “number of jobs” to 150 and the
“number of estimators” to 1, since it is a binomial classification.
► This model achieved a really high AUC of 0.87 and a good F1 score.
► High precision and high recall mean a low probability of type I and type II errors.
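As a reminder of what the AUC figures measure, the sketch below computes AUC directly from its rank interpretation: the probability that a randomly chosen positive example receives a higher model score than a randomly chosen negative one, with ties counted as one half. The data and the function name `auc` are hypothetical, and this is not the routine used in the slides.

```python
def auc(labels, scores, positive=1):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == positive]
    neg = [s for y, s in zip(labels, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0]              # hypothetical true labels
scores = [0.9, 0.7, 0.4, 0.6, 0.2]    # hypothetical predicted probabilities

value = auc(labels, scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes.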
K-Means
► We increased the maximum number of iterations to 400 in order to optimize this
model and to try to get more stable results.
► The main problem with the K-means clustering model is its low
precision when predicting the default cases (type I errors).
► On the other hand, it has an acceptable F1 score and an AUC of 0.80.
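The role of the iteration cap can be illustrated with a minimal one-dimensional version of the K-means (Lloyd's) algorithm. This is a sketch under assumptions: the slides do not say which library or features were used, so the single-feature input, the deterministic initialisation and the function name `kmeans_1d` are hypothetical. Only the cap of 400 iterations comes from the slide.

```python
def kmeans_1d(values, k=2, max_iter=400):
    """Lloyd's algorithm on scalars (k >= 2); returns (centroids, assignments)."""
    ordered = sorted(values)
    # Deterministic start (an assumption): spread centroids over the sorted data.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    assignments = []
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centroid.
        assignments = [min(range(k), key=lambda c: abs(v - centroids[c]))
                       for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:  # converged before the iteration cap
            break
        centroids = updated
    return centroids, assignments

scores = [300, 320, 350, 640, 660, 700]   # hypothetical scorecard totals
centroids, clusters = kmeans_1d(scores)
```

The loop stops as soon as the centroids no longer move, so a higher cap only matters when convergence is slow.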
Conclusions
► Standardizing the ratios and cleaning the data leads to a high AUC
for all three models.
► The best model is the Random Forest, which achieves the highest AUC.
► We confirm that machine learning algorithms are powerful tools for analysing
data and can help solve this specific problem.
