Loan Prediction System Using Machine Learning Algorithms Project Report

1
LOAN PREDICTION SYSTEM USING
MACHINE LEARNING
A Report for the Evaluation of Project
Submitted by
SOUMA MAITI (27500120016)
TRIASHA SAMANTA (27500120005)
In partial fulfillment for the award of the degree
Of
BACHELOR OF TECHNOLOGY (B. TECH) IN
COMPUTER SCIENCE AND ENGINEERING
MAULANA ABUL KALAM AZAD UNIVERSITY OF
TECHNOLOGY
Under the Supervision of Dr. Dhrubajyoti Ghosh
DECEMBER-2023

2
OMDAYAL GROUP OF INSTITUTION
SCHOOL OF COMPUTING AND SCIENCE AND
ENGINEERING
BONAFIDE CERTIFICATE
Certified that this project report “LOAN
PREDICTION SYSTEM” is the bonafide work of
“SOUMA MAITI (27500120016)” & “TRIASHA
SAMANTA(27500120005)” who carried out the
project work under my supervision.
Dipankar Hazra
Teacher in charge
Computing Science &
Engineering Department
OMDAYAL GROUP OF
INSTITUTION.
Dr. Dhrubajyoti Ghosh
Assistant Professor
Computing Science &
Engineering Department.
OMDAYAL GROUP OF
INSTITUTION

3
ACKNOWLEDGEMENT
I am pleased to acknowledge my sincere thanks to Board
of Management of OMDAYAL GROUP OF INSTITUTION for
their kind encouragement in doing this project and for
completing it successfully. I am grateful to them.
I convey thanks to Dipankar Hazra, Head of the
Department, Department of Computer Science
Engineering for providing us the necessary support and
details at the right time during the progressive reviews.
I would like to express my sincere and deep sense of
gratitude to my Project Guide Dr Dhrubajyoti Ghosh,
Assistant Professor for her valuable guidance,
suggestions, and constant encouragement paved way for
the successful completion of my project.
I wish to express our thanks to all Teaching and Non-
teaching staff members of the Department of COMPUTER
SCIENCE AND ENGINEERING who were helpful in many
ways for the completion of the project.

4
LOAN PREDICTION
SYSTEM USING
MACHINE
LEARNING

5
CHAPTER
NO
TITLE PAGE
NO
1. Abstract of the Project 6
2. Literature Survey 7-10
3. Introduction: Machine Learning 11-12
3.1 How Machine Learning Works 12-13
3.2 Terminologies of Machine Learning 13
3.3 Machine Learning Types 14-16
4. Various Machine Learning Algorithm 17
4.1 Logistic Regression 16-18
4.2 Support Vector Classifier 18-20
4.3 Random Forest Algorithm 20
4.4 Naive Bayes 21-22
4.5 Decision Tree Classification 22-23
4.6 Gradient Boosting Algorithm 24-25
5. Implementation Of Model 26
5.1 Implementation Of Model: Existing System 26
5.2 Implementation Of Model: Proposed System 26-28
6. Requirement: Hardware & Software 29
6.1 Various Python Libraries Used 30
7. Architecture Design 31
7.1 Sequence Diagram & Use Case Diagram 32-33
7.2 Activity Diagram & Collaboration Diagram 34
8. Methodology 35-48
9. Source Code 49-56
10. Summary & Conclusion 57
11. References 58
TABLE OF CONTENTS

6
1) ABSTRACT OF THE PROJECT
Technology has boosted the existence of human kind the quality
of life they live. Every day we are planning to create something
new and different. We have a solution for every other problem
we have machines to support our lives and make us somewhat
complete in the banking sector candidate gets proofs/ backup
before approval of the loan amount. The application approved or
not approved depends upon the historical data of the candidate
by the system. Every day lots of people applying for the loan in
the banking sector but Bank would have limited funds. In this
case, the right prediction would be very beneficial using some
classes-function algorithm. An example the logistic regression,
random forest classifier, support vector machine classifier, etc.
A Bank's profit and loss depend on the amount of the loans that
is whether the Client or customer is paying back the loan.
Recovery of loans is the most important for the banking sector.
The improvement process plays an important role in the banking
sector. The historical data of candidates was used to build a
machine learning model using different classification algorithms.
The main objective of this paper is to predict whether a new
applicant granted the loan or not using machine learning models
trained on the historical data.

7
2) LITERATURE SURVEY
A literature review is a body of text that aims to review the critical
points of current knowledge on and/or methodological approaches to
a particular topic. It is secondary sources and discuss published
information in a particular subject area and sometimes information
in a particular subject area within a certain time period. Its ultimate
goal is to bring the reader up to date with current literature on a topic
and forms the basis for another goal, such as future research that
may be needed in the area and precedes a research proposal and may
be just a simple summary of sources. Usually, it has an
organizational pattern and combines both summary and synthesis. A
summary is a recap of important information about the source, but a
synthesis is a reorganization, reshuffling of information. It might
give a new interpretation of old material or combine new with old
interpretations or it might trace the intellectual progression of the
field, including major debates. Depending on the situation, the
literature review may evaluate the sources and advise the reader on
the most pertinent or relevant of them.
Review of Literature Survey:
1) Title: A benchmark of machine learning approaches for
credit score prediction.
Author: Vincenzo Moscato, Antonio Picariello, Giancarl Sperlí
Year : 2021
Credit risk assessment plays a key role for correctly supporting
financial institutes in defining their bank policies and commercial
strategies. Over the last decade, the emerging of social lending
platforms has disrupted traditional services for credit risk assessment.
Through these platforms, lenders and borrowers can easily interact
among them without any involvement of financial institutes. In
particular, they support borrowers in the fundraising process,
enabling the participation of any number and size of lenders.
However, the lack of lenders’ experience and missing or uncertain
information about 4 borrower’s credit history can increase risks in

8
social lending platforms, requiring an accurate credit risk scoring. To
overcome such issues, the credit risk assessment problem of
financial operations is usually modeled as a binary problem on the
basis of debt’s repayment and proper machine learning techniques
can be consequently exploited. In this paper, we propose a bench
marking study of some of the most used credit risk scoring models to
predict if a loan will be repaid in a P2P platform. We deal with a
class imbalance problem and leverage several classifiers among the
most used in the literature, which are based on different sampling
techniques. A real social lending platform (Lending Club) data-set,
composed by 877,956 samples, has been used to perform the
experimental analysis considering different evaluation metrics (i.e.
AUC, Sensitivity, Specificity), also comparing the obtained
outcomes with respect to the state-of-the-art approaches. Finally, the
three best approaches have also been evaluated in terms of their
explain-ability by means of different explainable Artificial
Intelligence (XAI) tools.
2) Title : An Approach for Prediction of Loan approval using
Machine Learning Algorithm.
Author: Mohammad Ahmad Sheikh, Amit Kumar Goel, Tapas
Kumar
Year : 2020
In our banking system, banks have many products to sell but main
source of income of any banks is on its credit line. So they can earn
from interest of those loans which they credits.A bank’s profit or a
loss depends to a large extent on loans i.e. whether the customers are
paying back the loan or defaulting. By predicting the loan defaulters,
the bank can reduce its Non Performing Assets. This makes the
study of this phenomenon very important. Previous research in this
era has shown that there are so many methods to study the problem
of controlling loan default. But as the right predictions are very
important for the maximization of profits, it is essential to study the
nature of the different methods and their comparison. A very
important approach in predictive analytic is used to study the
problem of predicting loan defaulters: The Logistic regression model.
The data is collected from the Kaggle for studying and prediction.

9
Logistic Regression models have been performed and the different
measures of performances are computed. The models are compared
on the basis of the performance measures such as sensitivity and 5
specificity. The final results have shown that the model produce
different results.Model is marginally better because it includes
variables (personal attributes of customer like age, purpose, credit
history, credit amount, credit duration, etc.) other than checking
account information (which shows wealth of a customer) that should
be taken into account to calculate the probability of default on loan
correctly. Therefore, by using a logistic regression approach, the
right customers to be targeted for granting loan can be easily
detected by evaluating their likelihood of default on loan. The model
concludes that a bank should not only target the rich customers for
granting loan but it should assess the other attributes of a customer
as well which play a very important part in credit granting decisions
and predicting the loan defaulters.
3) Title : Predict Loan Approval in Banking System Machine
Learning Approach for Cooperative Banks Loan Approval.
Author: Amruta S. Aphale, Dr. Sandeep R. Shinde.
Year : 2020
In today’s world, taking loans from financial institutions has become
a very common phenomenon. Everyday a large number of people
make application for loans, for a variety of purposes. But all these
applicants are not reliable and everyone cannot be approved. Every
year, we read about a number of cases where people do not repay
bulk of the loan amount to the banks due to which they suffers huge
losses. The risk associated with making a decision on loan approval
is immense. So the idea of this project is to gather loan data from
multiple data sources and use various machine learning algorithms
on this data to extract important information. This model can be used
by the organizations in making the right decision to approve or reject
the loan request of the customers. In this paper, we examine a real
bank credit data and conduct several machine learning algorithms on
the data for that determine credit worthiness of customers in order to
formulate bank risk automated system.

10
4) Title : Loan Approval Prediction Using Machine Learning
Author: Yash Divate, Prashant Rana, Pratik Chavan
Year : 2021
With the upgrade in the financial area loads of individuals are
applying for bank advances however the bank has its restricted
resources which it needs to allow to restricted individuals just, so
discovering to whom the credit can be conceded which will be a
more secure choice for the bank is a commonplace interaction. So in
this task we attempt to decrease this danger factor behind choosing
the protected individual in order to save bunches of bank endeavors
and resources. This is finished by mining the Data of the past
records of individuals to whom the advance was conceded
previously and based on these records/encounters the machine was
prepared utilizing the AI model which give the most precise
outcome. The principle objective of this paper is to anticipate
whether relegating the advance to specific individual will be
protected or not. This paper is separated into four areas (i)Data
Collection (ii) Comparison of AI models on gathered information (iii)
Training of framework on most encouraging model (iv) Testing.

11
3) INTRODUCTION
The immense increase in capitalism, the fast-paced development and instantaneous
changes in the lifestyle has us in awe. EMI, loans at nominal rate, housing loans, vehicle
loans, these are some of the few words which have skyrocketed from the past few years.
The needs, wants and demands have never been increased this before. People gets loan
from banks; however, it may be baffling for the bankers to judge who can pay back the
loan nevertheless the bank shouldn’t be in loss. Banks earn most of their profits through
the loan sanctioning. Generally, banks pass loan after completing the numerous
verification processes despite all these, it is still not confirmed that the borrower will pay
back the loan or not. To get over the dilemma, I have built up a prediction model which
says if the loan has been assigned in the safe hands or not. Government agencies like keep
under surveillance why one person got a loan and the other person could not. In Machine
Learning techniques which include classification and prediction can be applied to conquer
this to a brilliant extent. Machine learning has eased today’s world by developing these
prediction models. Here we will be using the fine techniques of machine learning –
Decision tree algorithm to build this prediction model for loan assessment. It is as so
because decision tree gives accuracy in the prediction and is often used in the industry for
these models.
Machine Learning :
Machine learning (ML) is a type of artificial intelligence (AI) focused on building
computer systems that learn from data. The broad range of techniques ML encompasses
enables software applications to improve their performance over time.
Machine learning algorithm are trained to find relationships and patterns in data. They use
historical data as input to make predictions, classify information, cluster data points,
reduce dimensionality and even help generate new content, as demonstrated by new ML-
fueled applications such as Chat GPT, Dall-E 2 and GitHub Copilot.
Machine learning is widely applicable across many industries. Recommendation System ,
for example, are used by e-commerce, social media and news organizations to suggest
content based on a customer's past behavior. Machine learning algorithms and machine
vision are a critical component of self-driving cars, helping them navigate the roads safely.
In healthcare, machine learning is used to diagnose and suggest treatment plans. Other

12
common ML use cases include fraud detection, spam filtering, malware threat detection,
predictive maintenance and business process automation.
While machine learning is a powerful tool for solving problems, improving business
operations and automating tasks, it's also a complex and challenging technology, requiring
deep expertise and significant resources. Choosing the right algorithm for a task calls for a
strong grasp of mathematics and statistics. Training machine learning algorithms often
involves large amounts of good quality data to produce accurate results. The results
themselves can be difficult to understand -- particularly the outcomes produced by
complex algorithms, such as the deep learning neural network patterned after the human
brain. And ML Models can be costly to run and tune.
Still, most organizations either directly or indirectly through ML-infused products are
embracing machine learning. According to the "2023 AI and Machine Learning Research
Report" from Rackspace Technology, 72% of companies surveyed said that AI and
machine learning are part of their IT and business strategies, and 69% described AI/ML as
the most important technology. Companies that have adopted it reported using it to
improve existing processes (67%), predict business performance and industry trends (60%)
and reduce risk (53%).
Tech Target's guide to machine learning is a primer on this important field of computer
science, further explaining what machine learning is, how to do it and how it is applied in
business. You'll find information on the various types of machine learning algorithms, the
challenges and best practices associated with developing and destroying ML Models, and
what the future holds for machine learning. Throughout the guide, there are hyperlinks to
related articles that cover the topics in greater depth.
3.1) How Machine Learning works:
Machine learning uses two types of techniques: supervised learning, which trains a model
on known input and output data so that it can predict future outputs, and unsupervised
learning, which finds hidden patterns or intrinsic structures in input data.
The Machine Learning process starts with inputting training data into the selected
algorithm. Training data being known or unknown data to develop the final Machine
Learning algorithm. The type of training data input does impact the algorithm, and that
concept will be covered further momentarily.

13
3.2) Terminologies of Machine Learning :
 Model : A model is a specific representation learned from data by applying some
machine learning algorithm. A model is also called hypothesis.
 Feature : A feature is an individual measurable property of our data. A set of
numeric features can be conveniently described by a feature vector. Feature
vectors are fed as input to the model. For example, in order to predict a fruit,
there may be features like color, smell, taste, etc.
 Target(Label): A target variable or label is the value to be predicted by our
model. For the fruit example discussed in the features section, the label with each
set of input would be the name of the fruit like apple, orange, banana, etc.
 Training: The idea is to give a set of inputs(features) and it’s expected
outputs(labels), so after training, we will have a model (hypothesis) that will then
map new data to one of the categories trained on.
 Prediction: Once our model is ready, it can be fed a set of inputs to which it will
provide a predicted output(label).
Fig 1- How machine learning works

14
3.3) Machine Learning Types:
Learning is, of course, a very wide domain. Consequently, the field of machine learning
has branched into several sub-fields dealing with
different types of learning tasks. We give a rough taxonomy of learning paradigms, aiming
to provide some perspective of where the content sits within the wide field of machine
learning.
Terms frequently used are:
 Labeled data : Data consisting of a set of training examples, where each example is a
pair consisting of an input and a desired output value (also called the supervisory signal,
labels, etc)
 Classification : The goal is to predict discrete values, e.g. {1,0}, {True, False},
{spam, not spam}.
 Regression : The goal is to predict continuous values, e.g. home prices.
There some variations of how to define the types of Machine Learning Algorithms but
commonly they can be divided into categories according to their purpose and the main
categories are the following:
 Supervised learning
 Unsupervised Learning
 Semi-supervised Learning
 Reinforcement Learning
3.3.1) Supervised learning:
Learned to perform that task. Supervised learning algorithms include classification and
regression. Classification algorithms are used when the outputs are restricted to a limited
set of values, and regression algorithms are used when the outputs may have any numerical
value within a range. Similarity learning is an area of supervised machine learning closely
related to regression and classification, but the goal is to learn from examples using a
similarity function that measures how similar or related two objects are. It has applications
in ranking, recommendation systems, visual identity tracking, face verification, and
speaker verification. 16 In the case of semi-supervised learning algorithms, some of the
training examples are missing training labels, but they can nevertheless be used to improve

15
the quality of a model. In weakly supervised learning, the training labels are noisy, limited,
or imprecise; however, these labels are often cheaper to obtain, resulting in larger effective
training sets.
List of Common Algorithms:
• Nearest Neighbour
• Naive Bayes
• Decision Trees
• Linear Regression
• Support Vector Machines (SVM)
• Neural Networks
3.3.2) Unsupervised learning:
Unsupervised learning algorithms take a set of data that contains only inputs, and find
structure in the data, like grouping or clustering of data points. The algorithms, therefore,
learn from test data that has not been labeled, classified or categorized. Instead of
responding to feedback, unsupervised learning algorithms identify commonalities in the
data and react based on the presence or absence of such commonalities in each new piece
of data. A central application of unsupervised learning is in the field of density estimation
in statistics, though unsupervised learning encompasses other domains involving
summarizing and explaining data features. Cluster analysis is the assignment of a set of
observations into subsets (called clusters) so that observations within the same cluster are
similar according to one or more pre designated criteria, while observations drawn from
different clusters are dissimilar. Different clustering techniques make different
assumptions on the structure of the data, often defined by some similarity metric and
evaluated, for example, by internal compactness, or the similarity between members of the
same cluster, and separation, the difference between clusters. Other methods are based on
estimated density and graph connectivity.
List of Common Algorithms: • K-means clustering ,Association Rules.

16
3.3.3) Semi-supervised learning:
Semi-supervised learning falls between unsupervised learning (without any labeled
training data) and supervised learning (with completely labeled training data). Many
machine-learning researchers have found that unlabeled data, when used in conjunction
with a small amount of labeled data, can produce a considerable improvement in learning
accuracy.
3.3.4) Reinforcement Learning:
Reinforcement Learning is a type of Machine Learning, and thereby also a branch of
Artificial Intelligence. It allows machines and software agents to automatically determine
the ideal behaviour within a specific context, in order to maximize its performance. Simple
reward feedback is required for the agent to learn its behaviour; this is known as the
reinforcement signal. There are many different algorithms that tackle this issue. As a
matter of fact, Reinforcement Learning is defined by a specific type of problem, and all its
solutions are classed as Reinforcement Learning algorithms. In the problem, an agent is
supposed decide the best action to select based on his current state. When this step is
repeated, the problem is known as a Markov Decision Process.
List of Common Algorithms:
• Q-Learning • Temporal Difference (TD) • Deep Adversarial Networks
Use cases: Some applications of the reinforcement learning algorithms are computer
played board games (Chess, Go), robotic hands, and self-driving cars.

17
4) Various Machine Learning Algorithms Widely Used :
4.1 ) Logistic regression:
Logistic regression is one of the most popular Machine Learning algorithms, which comes
under the Supervised Learning technique. It is used for predicting the categorical
dependent variable using a given set of independent variables. Logistic regression predicts
the output of a categorical dependent variable. Therefore the outcome must be a
categorical or discrete value. It can be either Yes or No, 0 or 1, true or False, etc. But
instead of giving the exact value as 0 and 1, it gives the probabilistic values which lie
between 0 and 1. Logistic regression is used for solving the classification problems
It uses a logistic function called a sigmoid function to map predictions and their
probabilities. The sigmoid function refers to an S- shaped curve that converts any real
value to a range between 0 and 1.
The logit function is mathematically represented as
Fig 2- logistic Regression

18
4.2) Support Vector Classifier:
A support vector machine (SVM) is a type of supervised machine learning algorithm used
in machine learning to solve classification and regression tasks; SVMs are particularly
good at solving binary classification problems, which require classifying the elements of a
data set into two groups.
The aim of a support vector machine algorithm is to find the best possible line, or decision
boundary, that separates the data points of different data classes. This boundary is called
a hyperplane when working in high-dimensional feature spaces. The idea is to maximize
the margin, which is the distance between the hyperplane and the closest data points of
each category, thus making it easy to distinguish data classes.
SVMs are useful for analyzing complex data that can't be separated by a simple straight
line. Called nonlinear SMVs, they do this by using a mathematical trick that transforms
data into higher-dimensional space, where it is easier to find a boundary.
How do support vector machines work?
The key idea behind SVMs is to transform the input data into a higher-dimensional feature
space.
This transformation makes it easier to find a linear separation or to more effectively
classify the data set.
To do this, SVMs use a kernel function. Instead of explicitly calculating the coordinates of
the transformed space, the kernel function enables the SVM to implicitly compute the dot
products between the transformed feature vectors and avoid handling expensive,
unnecessary computations for extreme cases.
SVMs can handle both linearly separable and non-linearly separable data. They do this by
using different types of kernel functions, such as the linear kernel, polynomial kernel or
radial basis function (RBF) kernel. These kernels enable SVMs to effectively capture
complex relationships and patterns in the data.
During the training phase, SVMs use a mathematical formulation to find the optimal
hyperplane in a higher-dimensional space, often called the kernel space. This hyperplane is
crucial because it maximizes the margin between data points of different classes, while
minimizing the classification errors.

19
The kernel function plays a critical role in SVMs, as it makes it possible to map the data
from the original feature space to the kernel space. The choice of kernel function can have
a significant impact on the performance of the SVM algorithm; choosing the best kernel
function for a particular problem depends on the characteristics of the data.
Some of the most popular kernel functions for SVMs are the following:
 Linear Kernel : This is the simplest kernel function, and it maps the data to a higher-
dimensional space, where the data is linearly separable.
 Polynomial Kernel: This kernel function is more powerful than the linear kernel, and
it can be used to map the data to a higher-dimensional space, where the data is non-
linearly separable.
 RBF Kernel: This is the most popular kernel function for SVMs, and it is effective for
a wide range of classification problems.
 Sigmoid Kernel: This kernel function is similar to the RBF kernel, but it has a
different shape that can be useful for some classification problems.
The choice of kernel function for an SVM algorithm is a trade-off between accuracy and
complexity. The more powerful kernel functions, such as the RBF kernel, can achieve
higher accuracy than the simpler kernel functions, but they also require more data and
Fig 3- Support Vector Classifier

20
computation time to train the SVM algorithm. But this is becoming less of an issue due to
technological advances.
Once trained, SVMs can classify new, unseen data points by determining which side of the
decision boundary they fall on. The output of the SVM is the class label associated with
the side of the decision boundary.
4.3) Random Forest Algorithm:
Random Forest is a popular machine learning algorithm that belongs to the supervised
learning technique. It can be used for both Classification and Regression problems in ML.
It is based on the concept of ensemble learning, which is a process of combining multiple
classifiers to solve a complex problem and to improve the performance of the model.
As the name suggests, "Random Forest is a classifier that contains a number of decision
trees on various subsets of the given dataset and takes the average to improve the
predictive accuracy of that dataset." Instead of relying on one decision tree, the random
forest takes the prediction from each tree and based on the majority votes of predictions,
and it predicts the final output.
The greater number of trees in the forest leads to higher accuracy and prevents the problem
of over fitting.
Fig 4 - Random Forest Algorithm

21
4.4) Naive Bayes Algorithm:
It is a classification technique based on Bayes’ Theorem with an independence assumption
among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a
particular feature in a class is unrelated to the presence of any other feature.
The Naive Bayes classifier is a popular supervised machine learning algorithm used for
classification tasks such as text classification. It belongs to the family of generative
learning algorithms, which means that it models the distribution of inputs for a given class
or category. This approach is based on the assumption that the features of the input data are
conditionally independent given the class, allowing the algorithm to make predictions
quickly and accurately. In statistics, naive Bayes classifiers are considered as simple
probabilistic classifiers that apply Bayes’ theorem. This theorem is based on the
probability of a hypothesis, given the data and some prior knowledge. The naive Bayes
classifier assumes that all features in the input data are independent of each other,
which is often not true in real-world scenarios. However, despite this simplifying
assumption, the naive Bayes classifier is widely used because of its efficiency and good
performance in many real-world applications.
Moreover, it is worth noting that naive Bayes classifiers are among the simplest Bayesian
network models, yet they can achieve high accuracy levels when coupled with kernel
density estimation. This technique involves using a kernel function to estimate the
probability density function of the input data, allowing the classifier to improve its
performance in complex scenarios where the data distribution is not well-defined. As a
result, the naive Bayes classifier is a powerful tool in machine learning, particularly in text
classification, spam filtering, and sentiment analysis, among others.
For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches
in diameter. Even if these features depend on each other or upon the existence of the other
features, all of these properties independently contribute to the probability that this fruit is
an apple and that is why it is known as ‘Naive’.
An NB model is easy to build and particularly useful for very large data sets. Along with
simplicity, Naive Bayes is known to outperform even highly sophisticated classification
methods.
Bayes theorem provides a way of computing posterior probability P(c|x) from P(c), P(x)
and P(x|c). Look at the equation below:

22
Above,
 P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
 P(c) is the prior probability of class.
 P(x|c) is the likelihood which is the probability of the predictor given class.
 P(x) is the prior probability of the predictor
4.5) Decision Tree Classification Algorithm:
Decision Tree is a Supervised learning technique that can be used for both classification and
Regression problems, but mostly it is preferred for solving Classification problems. It is a
tree-structured classifier, where internal nodes represent the features of a datasets, branches
represent the decision rules and each leaf node represents the outcome. In a Decision tree,
there are two nodes, which are the Decision Node and Leaf Node. Decision nodes are used to
make any decision and have multiple branches, whereas Leaf nodes are the output of those
decisions and do not contain any further branch.
Fig 5- Equation of Naive Bayes

23
Decision Tree Terminologies:
 Root Node: Root node is from where the decision tree starts. It represents the entire
datasets, which further gets divided into two or more homogeneous sets
 Leaf Node: Leaf nodes are the final output node, and the tree cannot be segregated
further after getting a leaf node.
 Splitting: Splitting is the process of dividing the decision node/root node into sub-
nodes according to the given conditions
 Branch/Sub Tree: A tree formed by splitting the tree.
 Pruning: Pruning is the process of removing the unwanted branches from the tree.
 Parent/Child node: The root node of the tree is called the parent node, and other
nodes are called the child nodes.
Example: Suppose there is a candidate who has a job offer and wants to decide whether
he should accept the offer or Not. So, to solve this problem, the decision tree starts with the
root node (Salary attribute by ASM). The root node splits further into the next decision
node (distance from the office) and one leaf node based on the corresponding labels. The
next decision node further gets split into one decision node (Cab facility) and one leaf node.
Finally, the decision node splits into two leaf nodes (Accepted offers and Declined offer).
Consider the below diagram:
Fig 6- Decision Tree

24
4.6) Gradient Boosting Algorithm:
Gradient Boosting is a powerful boosting algorithm that combines several weak learners
into strong learners, in which each new model is trained to minimize the loss function such
as mean squared error or cross-entropy of the previous model using gradient descent. In
each iteration, the algorithm computes the gradient of the loss function with respect to the
predictions of the current ensemble and then trains a new weak model to minimize this
gradient. The predictions of the new model are then added to the ensemble, and the process
is repeated until a stopping criterion is met.
In contrast to Ada Boost, the weights of the training instances are not tweaked, instead,
each predictor is trained using the residual errors of the predecessor as labels. There is a
technique called the Gradient Boosted Trees whose base learner is CART (Classification
and Regression Trees). The below diagram explains how gradient-boosted trees are trained
for regression problems.
Fig 7- Gradient Boosting Classifier

25
The ensemble consists of M trees. Tree1 is trained using the feature matrix X and the
labels y. The predictions labeled y1(hat) are used to determine the training set residual
errors r1. Tree2 is then trained using the feature matrix X and the residual errors r1 of
Tree1 as labels. The predicted results r1(hat) are then used to determine the residual r2.
The process is repeated until all the M trees forming the ensemble are trained. There is an
important parameter used in this technique known as Shrinkage. Shrinkage refers to the
fact that the prediction of each tree in the ensemble is shrunk after it is multiplied by the
learning rate (eta) which ranges between 0 to 1. There is a trade-off between eta and the
number of estimators, decreasing learning rate needs to be compensated with increasing
estimators in order to reach certain model performance. Since all trees are trained now,
predictions can be made. Each tree predicts a label and the final prediction is given by the
formula
y(pred) = y1 + (eta * r1) + (eta * r2) + ....... + (eta * rN)

26
5) IMPLEMENTATION OF MODEL
5.1) EXISTING SYSTEM :
Banks need to analyze for the person who applies for the loan will repay the loan or not.
Sometime it happens that customer has provided partial data to the bank, in this case
person may get the loan without proper verification and bank may end up with loss.
Bankers cannot analyze the huge amounts of data manually, it may become a big headache
to check whether a person will repay its loan or not. It is very much necessary to know the
person getting loan is going in safe hand or not. So, it is pretty much important to have a
automated model which should predict the customer getting the loan will repay the loan or
not.
Disadvantage : To apply the loan we need to go to bank to apply it
The model will be able to predict whether a loan applicant will default on a given
loan.The system architecture is as below.
5.2) PROPOSED SYSTEM
The proposed model focuses on predicting the credibility of customers for loan repayment
by analyzing their details. The input to the model is the customer details collected. On the
output from the classifier, decision on whether to approve or reject the customer request
can be made. Using different data analytic tools loan prediction and there severity can be
forecast ed. In this process it is required to train the data using different algorithms and
then compare user data with trained data to predict the nature of loan. The training data set
is now supplied to machine learning model; on the basis of this data set the model is
trained. Every new applicant details filled at the time of application form acts as a test data
set. After the operation of testing, 8 model predict whether the new applicant is a fit case
for approval of the loan or not based upon the inference it conclude on the basis of the
training data sets. By providing real time input on the web app. In our project, Logistic
Regression gives high accuracy level compared with other algorithms. Finally, we are
predicting the result via data visualization and display the predicted output using web app
using flask.

27
The proposed model focuses on predicting the credibility of customers for loan repayment
by analyzing their details. The input to the model is the customer details collected. On the
output from the classifier, decision on whether to approve or reject the customer request can
be made. Using different data analytic tools loan prediction and there severity can be
forecast ed. In this process it is required to train the data using different algorithms and then
compare user data with trained data to predict the nature of loan. The training data set is
now supplied to machine learning model; on the basis of this data set the model is trained.
Every new applicant details filled at the time of application form acts as a test data set. After
the operation of testing, 8 model predict whether the new applicant is a fit case for approval
of the loan or not based upon the inference it conclude on the basis of the training data sets.
By providing real time input on the web app. In our project, Logistic Regression gives high
accuracy level compared with other algorithms. Finally, we are predicting the result via data
visualization and display the predicted output using web app using flask.
Advantage: No need to go to bank We can do the transaction from house, we can
consume the time doing from home.
Random
Forest
Loan
Train model Naive Bayes
Predict
best model Default 1
Logistic
Regression
0
Fig 8- Proposed Model

28
 Step 1: The Loan application goes through the trained model where the three
classification algorithms are applied.
 Step 2: The machine learning with the best performance in accuracy is selected.
 Step 3 : The machine learning algorithm is applied to the loan application.
 Step 4: The machine learning algorithm determines the probability of default. 1, being
true and 0 being false

29
6) REQUIREMENT SPECIFICATIONS
Prediction of modernized loan approval system based on machine learning approach is a
loan approval system from where we can know whether the loan will pass or not. In this
system, we take some data from the user like his monthly income, marriage status, loan
amount, loan duration, etc. Then the bank will decide according to its parameters whether
the client will get the loan or not. So there is a classification system, in this system, a
training set is employed to make the model and the classifier may classify the data items
into their appropriate class. A test datasets is created that trains the data and gives the
appropriate result that, is the client potential and can repay the loan. Prediction of a
modernized loan approval system is incredibly helpful for banks and also the clients. This
system checks the candidate on his priority basis. Customer can submit his application
directly to the bank so the bank will do the whole process, no third party or stockholder will
interfere in it. And finally, the bank will decide that the candidate is deserving or not on its
priority basis. The only object of this research paper is that the deserving candidate gets
straight forward and quick results.
HARDWARE AND SOFTWARE SPECIFICATION
HARDWARE REQUIREMENTS
● Hard disk : 500 GB and above. ● Processor : i3 and above.
● Ram : 4GB and above.
SOFTWARE REQUIREMENTS
● Operating System : Windows 10/11 ● Tools :Jupiter Note Book IDE
●Programming Language: Python 3 ● Streamlit App
● Visual Studio Code Editor

30
6.1) Python Libraries Used:
The machine learning models are implemented using python version 3.7 on
a Jupyter notebook with the listed libraries: numpy, pandas , matplotlib,
seaborn , and sklearn.
 Jupyter notebooks are a web-based interface in which you can
write, visualize, and execute python code in cells. It is good for
exploratory analysis and enable to run individual code cells.
 Numpy is a Python library that may be used to work with multi-
dimensional arrays, linear algebra, the Fourier transform, and matrices.
 Pandas is a data manipulation and analysis package written in
Python.
 Matplotlib is a Python package that allows you to create
static,animated, and interactive visualizations.
 Seaborn is a matplotlib-based python data visualization package. It
has a high-level interface for creating visually appealing and instructive
statistics visuals.
 Sklearn is a Python toolkit that allows you to create machine
learning and statistical models including clustering, classification, and
regression.

31
7) ARCHITECTURE DESIGN
7.1) Architecture Diagram:
Datasets Prepossessing
User Input
Web App

32
7.2) Sequence Diagram:
A Sequence diagram is a kind of interaction diagram that shows how
processes operate with one another and in what order. It is a construct of
Message Sequence diagrams are sometimes called event diagrams, event
sceneries and timing diagram..
Fig 9- Sequence Diagram

33
7.3) Use Case Diagram:
Unified Modeling Language (UML) is a standardized general-purpose
modeling language in the field of software engineering. The standard is
managed and was created by the Object Management Group. UML
includes a set of graphic notation techniques to create visual models of
software intensive systems. This language is used to specify, visualize,
modify, construct and document the artifacts of an object oriented
software intensive system under development.
Fig 10 - Use Case Diagram

34
7.4) Activity Diagram:
Activity diagram is a graphical representation of workflows of
stepwise activities and actions with support for choice, iteration and
concurrency. An activity diagram shows the overallflow of control.
7.5) Collaboration Diagram:
DATA
COLLECTION
DATA
PREPROCESSING
MACHINE
LEARNING
ALGORITHM
LOAN
PREDICTION
WEB
APPLICATION
●OUTPUT
DATA DATA DATA
DATA
DATA
Fig 11 - Activity Diagram
Fig 12 - Collaboration Diagram

35
8) METHEDOLOGY
DATA PREPROCESSING:
Data preprocessing is a process of preparing the raw data and making it
suitable for a machine learning model. It is the first and crucial step
while creating a machine learning model.
Data preprocessing is required tasks for cleaning the data and making it
suitable for a machine learning model which also increases the accuracy
and efficiency of a machine learning model.
It involves below steps:
 Getting the dataset
 Importing libraries
 Importing datasets
 Finding Missing Data
 Encoding Categorical Data
 Splitting dataset into training and test set
 Feature scaling
Data-set:
Datasets is provided to Machine Learning models on the basis of the
facts this version is trained.
We have collected the dataset from a website called Kaggle.
There are a total of 614 rows and 13 columns in the dataset.
There are columns for Loan_Id, Gender, Married or Not, No. Of
Dependents, Education Background of the loan seeker, Employment
status of the loan seeker, Income of Applicant, Income of Co-
applicant,Loan Amount, Credit History of the applicant and the loan
status of the applicant.

36
Importing Python libraries:
In order to perform data prepossessing using Python, we need to import
some predefined Python libraries. These libraries are used to perform
some specific jobs. There are three specific libraries that we will use for
data prepossessing.
We have imported python libraries like Pandas, Numpy, Seaborn, Sci-kit
Learn, matplotlib for our work.
Importing the Data set:
Now We have imported the dataset which we will use as historical data to
train the model.
Fig-13 - Data Set

37
Understanding the Data:
First of all we use the data.describe() method to shows the important
information from the data-set.
It provides the count, mean, standard deviation
(std), min, quartile and max in its output.
Another method is info () , This method show us the information about
the data set

38
As we can see in the output.
 There are 614 entries
 There are total 13 features (0 to 12)
 There are three types of datatype dtypes: float64(4), int64(1), object(8)
 It's Memory usage that is, memory usage: 62.5+ KB
 Also, We can check how many missing values available in the Non-
Null Count column
Exploratory Data Analysis:
In this section, We learn about extra information about data and it's
characteristics.

39
Data Cleaning:
In this step of data cleaning by checking we have eliminated all the
missing values because they affect the accuracy of the model. We have
achieved this by either filling the the missing values with a mean or mode
function or by dropping all missing values.
First we have checked the number of null values in each column of the
dataset.
Now we have checked the percentage of of missing values in each
column of the dataset.

40
Now we will handle the missing data entries in the data set. We can see
the number of missing values in four columns in the data set-
Gender,Dependents,Loan Amount, Loan Term are less than 5% so we
will drop the rows with the missing values.
Rest two columns- Self_Employed and Credit History which have greater
than 5% missing values we will use mode function to fill up the null
values.

41
Handling the categorical columns:
Loan_Status feature numeric values, we will replace the columns with
numeric values.
There are some values in the dependent column as 3+. we will replace it
by numeric value 4.
Feature Scaling:
Feature Selection is the method of reducing the input variable to your
model by using only relevant data and getting rid of noise in data.
It is the process of automatically choosing relevant features for your
machine learning model based on the type of problem you are trying to
solve. We do this by including or excluding important features without
changing them. It helps in cutting down the noise in our data and
reducing the size of our input data.

42
Splitting The Datasets Into The Training Set And Test Set
& Applying K-Fold Cross Validation:
Now we have split the datasets into two sets for training and testing. We
will apply cross validation and will check the accuracy of the various
models we have used in this work.
Implementing various machine learning models:
We will implement all the five machine learning algorithm- Logistic
Regression, Support Vector Classifier, Random Forest Classifier,
Decision Tree Classifier and Gradient Boosting Classifier and check the
accuracy of all the algorithms with average cross validation score

43
Logistic Regression:
So the accuracy of this model is 0.8018018018018
Support Vector Classifier:
Decision Tree Classifier:

44
Random Forest Classifier:
Gradient Boosting Classifier:

45
HYPERPARAMETER TUNING:
Hyper parameters tuning is the process of determining the right
combination of hyper parameters that maximizes the model performance.
It works by running multiple trials in a single training process. Each trial
is a complete execution of your training application with values for your
chosen hyper parameters, set within the limits you specify. This process
once finished will give you the set of hyper parameter values that are best
suited for the model to give optimal results.
We have used Random Search CV for the tuning.

46
After hyper parameter tuning the accuracy of the best 3 models are-
So we get the Random Forest Classifier model gives the best accuracy of
80.67 so we have choose this model for this work.

47
Model Deployment:
Finally, we are done so far. The last step is to deploy our model in
production map.
So we need to export our model and bind with web application API.
Using pickle we can export our model and store in to rf_model.pkl file, so
we can early access this file and calculate customize prediction using
Web App API.
User Interface:
The user interface of the app is made on Streamlit App. Streamlit is a free
and open-source framework to rapidly build and share beautiful machine
learning and data science web apps.
It is a Python-based library specifically designed for machine learning
engineers.
We have loaded the rf_model.pkl file in the streamlit app. code

48
Predicting Results:
We wil give some input in the app and app will give us the output
whether the loan is approved or not.

49
9) Source Code:
import pandas as pd
data = pd.read_csv('loan_prediction.csv')
# Loan_ID : Unique Loan ID
# Gender : Male/ Female
# Married : Applicant married (Y/N)
# Dependents : Number of dependents
# Education : Applicant Education (Graduate/ Under Graduate)
# Self_Employed : Self employed (Y/N)
# ApplicantIncome : Applicant income
# CoapplicantIncome : Coapplicant income
# LoanAmount : Loan amount in thousands of dollars
# Loan_Amount_Term : Term of loan in months
# Credit_History : Credit history meets guidelines yes or no
# Property_Area : Urban/ Semi Urban/ Rural
# Loan_Status : Loan approved (Y/N) this is the target variable
1. Display Top 5 Rows of The Dataset
data.head()
2. Check Last 5 Rows of The Dataset
data.tail()
3. Find Shape of Our Dataset (Number of Rows And Number of
Columns)
data.shape
print("Number of Rows",data.shape[0])print("Number of
Columns",data.shape[1])
4. Get Information About Our Dataset Like Total Number Rows,
Total Number of Columns, Datatypes of Each Column And Memory
Requirement
data.info()
5. Check Null Values In The Dataset
data.isnull().sum()
data.isnull().sum()*100 / len(data)
6. Handling The missing Values
data = data.drop('Loan_ID',axis=1)

50
data.head(1)
columns = ['Gender','Dependents','LoanAmount','Loan_Amount_Term']
data = data.dropna(subset=columns)
data['Self_Employed'].mode()[0]
data['Self_Employed']=data['Self_Employed'].fillna(data['Self_Employe
d'].mode()[0])
data['Gender'].unique()
data['Self_Employed'].unique()
data['Credit_History'].mode()[0]
ata['Credit_History']=data['Credit_History'].fillna(data['Credit_History']
mode()[0])
7. Handling Categorical Columns
data.sample(5)
data['Dependents']=data['Dependents'].replace(to_replace="3+",value=
4')
data['Dependents'].unique()
data['Loan_Status'].unique()
data['Gender'] =
data['Gender'].map({'Male':1,'Female':0}).astype('int')
data['Married'] =
data['Married'].map({'Yes':1,'No':0}).astype('int')data['Education'] =
data['Education'].map({'Graduate':1,'Not
Graduate':0}).astype('int')data['Self_Employed'] =
data['Self_Employed'].map({'Yes':1,'No':0}).astype('int')data['Property_
Area'] =
data['Property_Area'].map({'Rural':0,'Semiurban':2,'Urban':1}).astype('i
nt')
data['Loan_Status'] = data['Loan_Status'].map({'Y':1,'N':0}).astype('int')
data.head()
8. Store Feature Matrix In X And Response (Target) In Vector y
X = data.drop('Loan_Status',axis=1)
y = data['Loan_Status']
9. Feature Scaling

51
data.head()
cols =
['ApplicantIncome','CoapplicantIncome','LoanAmount','Loan_Amount_
Term']
from sklearn.preprocessing import StandardScalerst =
StandardScaler()X[cols]=st.fit_transform(X[cols])X
10. Splitting The Dataset Into The Training Set And Test Set &
Applying K-Fold Cross Validation
from sklearn.model_selection import train_test_splitfrom
sklearn.model_selection import cross_val_scorefrom sklearn.metrics
import accuracy_scoreimport numpy as np
model_df={}def model_val(model,X,y):
X_train,X_test,y_train,y_test=train_test_split(X,y,
test_size=0.20,
random_state=42)
model.fit(X_train,y_train)
y_pred=model.predict(X_test)
print(f"{model} accuracy is {accuracy_score(y_test,y_pred)}")
score = cross_val_score(model,X,y,cv=5)
print(f"{model} Avg cross val score is {np.mean(score)}")
model_df[model]=round(np.mean(score)*100,2)
model_df
11. Logistic Regression
from sklearn.linear_model import LogisticRegressionmodel =
LogisticRegression()model_val(model,X,y)
LogisticRegression() accuracy is 0.8018018018018018
LogisticRegression() Avg cross val score is 0.8047829647829647
12. SVC
from sklearn import svmmodel = svm.SVC()model_val(model,X,y)
SVC() accuracy is 0.7927927927927928
SVC() Avg cross val score is 0.7938902538902539
13. Decision Tree Classifier
From sklearn.tree import DecisionTreeClassifiermodel =
DecisionTreeClassifier()model_val(model,X,y)
DecisionTreeClassifier() accuracy is 0.7117117117117117

52
DecisionTreeClassifier() Avg cross val score is 0.7089434889434889
14. Random Forest Classifier
from sklearn.ensemble import RandomForestClassifiermodel
=RandomForestClassifier()model_val(model,X,y)
RandomForestClassifier() accuracy is 0.7567567567567568
RandomForestClassifier() Avg cross val score is 0.7776412776412777
15. Gradient Boosting Classifier
from sklearn.ensemble import GradientBoostingClassifiermodel
=GradientBoostingClassifier()model_val(model,X,y)
GradientBoostingClassifier() accuracy is 0.7927927927927928
GradientBoostingClassifier() Avg cross val score is 0.774004914004914
16. Hyperparameter Tuning
from sklearn.model_selection import RandomizedSearchCV
Logistic Regression
log_reg_grid={"C":np.logspace(-4,4,20),
"solver":['liblinear']}
rs_log_reg=RandomizedSearchCV(LogisticRegression(),
param_distributions=log_reg_grid,
n_iter=20,cv=5,verbose=True)
rs_log_reg.fit(X,y)
rs_log_reg.best_score_
rs_log_reg.best_params_
SVC
svc_grid = {'C':[0.25,0.50,0.75,1],"kernel":["linear"]}
rs_svc=RandomizedSearchCV(svm.SVC(),
param_distributions=svc_grid,
cv=5,
n_iter=20,
verbose=True)
rs_svc.fit(X,y)
rs_svc.best_score_
rs_svc.best_params_

53
Random Forest Classifier
RandomForestClassifier()
rf_grid={'n_estimators':np.arange(10,1000,10),
'max_features':['auto','sqrt'],
'max_depth':[None,3,5,10,20,30],
'min_samples_split':[2,5,20,50,100],
'min_samples_leaf':[1,2,5,10]
}
rs_rf=RandomizedSearchCV(RandomForestClassifier(),
param_distributions=rf_grid,
cv=5,
n_iter=20,
verbose=True)
rs_rf.fit(X,y)
rs_rf.best_score_
17. Save The Model
X = data.drop('Loan_Status',axis=1)y = data['Loan_Status']
rf = RandomForestClassifier(n_estimators=270,
min_samples_split=5,
min_samples_leaf=5,
max_features='sqrt',
max_depth=5)
rf.fit(X,y)
RandomForestClassifier(max_depth=5, max_features='sqrt',
min_samples_leaf=5,
min_samples_split=5, n_estimators=270)
:
import joblib
joblib.dump(rf,'loan_status_predict')
['loan_status_predict']
model = joblib.load('loan_status_predict')
import pandas as pddf = pd.DataFrame({
'Gender':1,

54
'Married':1,
'Dependents':2,
'Education':0,
'Self_Employed':0,
'ApplicantIncome':2889,
'CoapplicantIncome':0.0,
'LoanAmount':45,
'Loan_Amount_Term':180,
'Credit_History':0,
'Property_Area':1},index=[0])
df
result = model.predict(df)
if result==1:
print("Loan Approved")else:
print("Loan Not Approved")
Graphical User Interface(GUI)
import numpy as np
import streamlit as st
import joblib
import pandas as pd
#Loading the model
model=joblib.load('C:/Users/Souma
Maiti/OneDrive/Desktop/Project/rf_model.pkl')
def loan_prediction(inputs):
input_as_np_array=np.array(inputs).reshape(1,-1)
prediction=model.predict(input_as_np_array)
print(prediction)
if (prediction[0]==0):
return 'THE LOAN IS NOT APPROVED'
else:
return 'THE LOAN IS APPROVED FOR YOU'
def main():

55
#give a title
st.title('Loan Status Prediction App')
#Getting the input from user
#GENDER
Gender = st.selectbox(
'Gender',('Male', 'Female'))
st.write('You Selected:', Gender)
if Gender=='Male':
Gender=1
else:
Gender=0
#MARRIED
Married = st.selectbox(
'Married',('Yes', 'No'))
st.write('You Selected:',Married)
if Married=='Yes':
Married=1
else:
Married=0
#DEPENDENTS
Dependents=st.slider('Dependents',0,10,1)
#EDUCATION
Education = st.selectbox(
'Education',('Graduate', 'Not Graduate'))
st.write('You Selected:',Education)
if Education=='Graduate':
Education=1
else:
Education=0
#SELF_EMPLOYED
Self_Employed = st.selectbox(
'Self_Employed',('Yes', 'No'))
st.write('You Selected:',Self_Employed)
if Self_Employed=='Yes':

56
Self_Employed=1
else:
Self_Employed=0
ApplicantIncome =st.text_input('Applicant Income')
CoapplicantIncome=st.text_input('Co-Applicant Income')
LoanAmount=st.text_input('Loan Amount')
Loan_Amount_Term=st.text_input('Loan Amount Terms')
#CREDIT HISTORY
Credit_History= st.selectbox(
'Credit History',('Yes', 'No'))
st.write('You Selected:',Credit_History)
if Credit_History=='Yes':
Credit_History=1
else:
Credit_History=0
#PROPERTY AREA
Property_Area = st.selectbox(
'Property Area',('Rural', 'Semi Urban','Urban'))
st.write('You Selected:',Property_Area)
if Property_Area=='Rural':
Property_Area=0
if Property_Area=='Semi Urban':
Property_Area=1
else:
Property_Area=2
#Code for prediction
pred = ''
if st.button('Predict'):
pred=loan_prediction([Gender,Married,Dependents,Education,Self_Empl
oyed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_
Term,Credit_History,Property_Area])
st.success(pred)
if __name__== '__main__':
main()

57
10) SUMMARY AND CONCLUSION
SUMMARY:
This project objective is to predict the Loan Approval of the
user. So this online banking loan approval system will reduce
the paper work and reduce the wastage of bank asserts and
efforts and also saves the valuable time of the customer.
In our work, a total of five machine learning algorithms which
includes Logistic Regression, Decision Tree, Random forest
classification,Support Vectior Classifier and Gradient Boosting
Classifier are applied to predict the loan approval of customers.
The experimental results conclude that the accuracy of Random
Forest Classification machine learning algorithm is better
compared to others machine learning approach.
CONCLUSION:
The analytical process started from data cleaning and processing,
missing value, exploratory analysis and finally model building
and evaluation. The best accuracy on public test set is higher
accuracy score has been found out. This application can help to
find the Prediction of Bank Loan Approval.
FUTURE WORK:
• Bank Loan Approval prediction to connect with cloud.
• To optimize the work to implement in Artificial Intelligence
environment.

58
11) REFERENCES:
[1] Amruta S. Aphale , Dr. Sandeep R. Shinde, 2020, Predict Loan
Approval in Banking System Machine Learning Approach for
Cooperative Banks Loan Approval, International Journal Of Engineering
Research & Technology (IJERT) Volume 09, Issue 08 (August 2020)
[2] Ashwini S. Kadam, Shraddha R. Nikam, Ankita A. Aher, Gayatri V.
Shelke, Amar S.Chandgude, “Prediction for Loan Approval using
Machine Learning Algorithm” (IRJET) Volume: 08 Issue: 04 | Apr 2021.
[3] M. A. Sheikh, A. K. Goel and T. Kumar, "An Approach for Prediction
of Loan Approval using Machine Learning Algorithm," 2020
International Conference on Electronics and Sustainable Communication
Systems (ICESC), 2020, pp. 490- 494, doi:
10.1109/ICESC48915.2020.9155614.
[4] Rath, Golak & Das, Debasish & Acharya, Biswaranjan. (2021).
Modern Approach for Loan Sanctioning in Banks Using Machine
Learning. Pages={179-188} 10.1007/978-981-15-5243-4_15.
[5] Vincenzo Moscato, Antonio Picariello, Giancarlo Sperlí, A
benchmark of machine learning approaches for credit score prediction,
Expert Systems with Applications, Volume 165, 2021, 113986, ISSN
0957-4174.
[6]Yash Divate, Prashant Rana, Pratik Chavan, “Loan Approval
Prediction Using Machine Learning” International Research Journal of
Engineering and Technology (IRJET) Volume: 08 Issue: 05 | May 2021
[7] WWW.JAVAPOINT.COM

Loan Prediction System Using Machine Learning Algorithms Project Report

More Related Content

What's hot (20)

Similar to Loan Prediction System Using Machine Learning Algorithms Project Report (20)

More from Souma Maiti (20)

Recently uploaded (20)

Loan Prediction System Using Machine Learning Algorithms Project Report