EXPLORATORY DATA ANALYSIS
IN LOAN APPLICANT APPROVAL
A PROJECT REPORT
Submitted by
JEGAN. S
NAVEEN. V
VIJAYA BARATH. D
in partial fulfillment for the award of the degree
of
BACHELOR OF ENGINEERING
in
COMPUTER SCIENCE AND ENGINEERING
KONGUNADU COLLEGE OF ENGINEERING AND TECHNOLOGY
(AUTONOMOUS)
ANNA UNIVERSITY :: CHENNAI 600 025
APRIL 2023
KONGUNADU COLLEGE OF ENGINEERING AND TECHNOLOGY,
(AUTONOMOUS)
Tholurpatti (Po), Thottiam (Tk), Trichy (Dt) – 621 215
COLLEGE VISION & MISSION STATEMENT
VISION
“To become an Internationally Renowned Institution in Technical Education, Research and
Development by Transforming the Students into Competent Professionals with Leadership Skills
and Ethical Values.”
MISSION
• Providing the Best Resources and Infrastructure.
• Creating a Learner-centric Environment and Continuous Learning.
• Promoting Effective Links with Intellectuals and Industries.
• Enriching Employability and Entrepreneurial Skills.
• Adapting to Changes for Sustainable Development.
COMPUTER SCIENCE AND ENGINEERING VISION & MISSION STATEMENT
VISION
To produce competent software professionals, academicians, researchers, and
entrepreneurs with moral values through quality education in the field of Computer Science and
Engineering.
MISSION
• Enrich the students' knowledge and computing skills through an innovative teaching-learning
process with state-of-the-art infrastructure facilities.
• Encourage the students to become entrepreneurs and employable professionals through adequate
industry-institute interaction.
• Inculcate leadership and professional communication skills, with moral and ethical values, to
serve society, and focus on students' overall development.
PROGRAM EDUCATIONAL OBJECTIVES (PEOs)
PEO I: Graduates shall be professionals with expertise in the fields of Software Engineering,
Networking, Data Mining, and Cloud Computing and shall undertake Software Development,
Teaching and Research.
PEO II: Graduates will analyze problems, design solutions, and develop programs with sound
Domain Knowledge.
PEO III: Graduates shall have professional ethics, team spirit, lifelong learning, good oral and
written communication skills, and adopt the corporate culture, core values and leadership skills.
PROGRAM SPECIFIC OUTCOMES (PSOs)
PSO1: Professional skills: Students shall understand, analyze and develop computer applications
in the field of Data Mining/Analytics, Cloud Computing, Networking, etc., to meet the
requirements of industry and society.
PSO2: Competency: Students shall qualify in State, National, and International level
competitive examinations for employment, higher studies, and research.
PROGRAM OUTCOMES (POs)
1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals, and an engineering specialization to the solution of complex engineering
problems.
2. Problem analysis: Identify, formulate, review research literature, and analyze complex
engineering problems reaching substantiated conclusions using the first principles of
mathematics, natural sciences, and engineering sciences.
3. Design/development of solutions: Design solutions for complex engineering problems and
design system components or processes that meet the specified needs with appropriate
consideration for public health and safety, and the cultural, societal, and environmental
considerations.
4. Conduct investigations of complex problems: Use research-based knowledge and research
methods including design of experiments, analysis, and interpretation of data, and synthesis of
the information to provide valid conclusions.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities
with an understanding of the limitations.
6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess
societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to
the professional engineering practice.
7. Environment and sustainability: Understand the impact of the professional engineering
solutions in societal and environmental contexts, and demonstrate the knowledge of, and need
for sustainable development.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and
norms of the engineering practice.
9. Individual and teamwork: Function effectively as an individual, and as a member or leader
in diverse teams, and multidisciplinary settings.
10. Communication: Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as being able to comprehend and write
effective reports and design documentation, make effective presentations, and give and receive
clear instructions.
11. Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.
12. Life-Long Learning: Recognize the need for and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.
KONGUNADU COLLEGE OF ENGINEERING AND TECHNOLOGY
(AUTONOMOUS)
ANNA UNIVERSITY : CHENNAI 600 025
BONAFIDE CERTIFICATE
Certified that this project report “EXPLORATORY DATA ANALYSIS IN LOAN
APPLICANT APPROVAL” is the bonafide work of “JEGAN S (621319104017),
NAVEEN V (621319104037), VIJAYABARATH D (621319104063)” who carried
out the project work under my supervision.
SIGNATURE
Dr.C.Saravanabhavan, M.Tech., Ph.D.,
HEAD OF THE DEPARTMENT
Department of Computer Science and Engineering,
Kongunadu College of Engineering and Technology, Thottiam, Trichy.

SIGNATURE
Mr.K.Karthick, M.E., (Ph.D.,)
SUPERVISOR
Assistant Professor,
Department of Computer Science and Engineering,
Kongunadu College of Engineering and Technology, Thottiam, Trichy.
Submitted for the Project Viva-Voce examination held on
Internal Examiner External Examiner
ACKNOWLEDGEMENT
We wish to express our sincere thanks to our beloved Chairman
Dr.PSK.R.PERIASWAMY for providing immense facilities in our institution.
We proudly render our thanks to our Principal Dr.R.ASOKAN, M.S.,
M.Tech., Ph.D., for the facilities and the encouragement he gave toward
the progress and completion of our project.
We proudly render our immense gratitude and sincere thanks to our Head
of the Department of Computer Science and Engineering
Dr.C.SARAVANABHAVAN, M.Tech., Ph.D., for his effective leadership,
encouragement, and guidance in the project.
We are highly indebted to our supervisor Mr.K.KARTHICK, M.E., (Ph.D.,)
and offer him our heartfelt thanks for his valuable suggestions during the
execution of our project work, and for his continued encouragement and the
many constructive comments that helped us improve this project report.
We also offer our heartfelt thanks to our project coordinator
Mr.K.KARTHICK, M.E., (Ph.D.,) for his valuable ideas, constant
encouragement, and supportive guidance throughout the project.
We wish to extend our sincere thanks to all teaching and non-teaching staff
of the Computer Science and Engineering department for their valuable
suggestions, cooperation, and encouragement in the successful completion of this
project.
We wish to acknowledge the help received from various departments and
various individuals during the preparation and editing stages of the manuscript.
ABSTRACT
In the conventional loan lending process, loan applications are evaluated manually,
which can lead to errors, discrimination, a lack of transparency, and unfair lending
practices. The proposed computerized solution provides a user-friendly interface
through which loan applicants submit their information, together with a set of
algorithms and rules that assess each applicant's creditworthiness and ability to
repay the loan. The system also integrates with external credit bureaus and
financial institutions to retrieve additional information about the applicant.
Using the XGBoost algorithm, the system can quickly and accurately evaluate a
large volume of loan applications and return a decision (approval or rejection) in
real time. This helps lenders reduce the cost and time of loan processing and
lower the risk of loan defaults. In addition to encrypting applicant data, the
system sends approval or rejection emails to applicants, which streamlines
communication and keeps applicants informed of the status of their loan
applications. Overall, data encryption and automated email communication
improve the security and efficiency of the process, and the XGBoost model
achieves an accuracy of 85.7%.
TABLE OF CONTENTS
CHAPTER NO TITLE
ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
1. INTRODUCTION
1.1 OVERVIEW
1.2 OBJECTIVES AND GOALS
1.3 APPLIED DATA SCIENCE (ADS)
1.4 MACHINE LEARNING (ML)
2. LITERATURE SURVEY
3. SYSTEM ANALYSIS
3.1 EXISTING SYSTEM
3.1.1 Disadvantages
3.2 PROPOSED SYSTEM
3.2.1 Advantages
4. SYSTEM REQUIREMENTS
4.1 HARDWARE REQUIREMENTS
4.2 SOFTWARE REQUIREMENTS
5. SYSTEM DESIGN
5.1 ARCHITECTURE DIAGRAM
6. SYSTEM IMPLEMENTATION
6.1 MODULES
6.2 MODULES DESCRIPTION
6.2.1 Data Collection
6.2.2 Data Preprocessing
6.2.3 Data Visualization
6.2.4 Exploratory Data Analysis
6.2.5 Model Building
6.2.6 Model Deployment
6.2.7 Secured Environment Module
7. SYSTEM TESTING
7.1 TEST PLAN
7.2 TEST CASES
8. SIMULATION RESULTS
8.1 DATA COLLECTION AND PACKAGE IMPORTING
8.2 DATA STORED
8.3 UI DEVELOPMENT (FLASK)
8.4 LOAN APPLICATION STATUS (APPROVED/REJECTED)
8.5 PERFORMANCE METRICS
8.6 MAIL SENDER USING SMTP AND MIME
8.7 AADHAR/PAN VERIFICATION
9. APPENDICES
9.1 SAMPLE CODE
9.2 SCREENSHOTS
10. CONCLUSION AND FUTURE ENHANCEMENT
10.1 CONCLUSION
10.2 FUTURE ENHANCEMENT
REFERENCES
PUBLICATION
LIST OF FIGURES
FIGURE NO NAME OF THE FIGURE
5.1 Architecture diagram
5.2 Cipher Block Chaining (CBC) mode encryption
5.3 Cipher Block Chaining (CBC) mode decryption
5.4 Formulae based Algorithm
6.1 Data Collection (Train and Test Data)
6.2 Data Visualization
6.3 Loading Model to Interface
6.4 Cryptography Fernet Implementation
6.5 Split up Server Architecture
6.6 Servers maintained in Nodejs Servers
6.7 MySQL storage value with hashed process
6.8 Accuracy Score, Recall, Precision, F1 Scores
8.1 Importing Essential Libraries
8.2 Hashed Value Pair of plain text are stored in MySQL
8.3 Key values of the Cipher Text
8.4 Four maintained NodeJs ports using Apache server
8.5 Login/Register Page Using cryptography fernet
8.6 User Interface using HTML, CSS, Js, Session
8.7 Loan Application Form
8.8 Application Status
8.9 Model Evaluation of test data
8.10 Email Loan Application Status
8.11 Mail Sending Module Using SMTP Protocol
8.12 Checking Aadhar from Database
8.13 Invalid Aadhar Intimation
9.1 Login/Register Page Using cryptography fernet
9.2 Online Banking Website
9.3 Application Status Interface with flask
9.4 Performance Metrics
9.5 Fitting the Model with Train and Test data
9.6 Composition of Loan Status by Education
9.7 Composition of Loan Status by dependents
9.8 Loan Application Interface
LIST OF ABBREVIATIONS
AES Advanced Encryption Standard
AI Artificial Intelligence
CBC Cipher Block Chaining
CSS Cascading Style Sheets
DL Deep Learning
HMAC Hash-based Message Authentication Code
HMAC-SHA256 Hash-based Message Authentication Code with Secure Hash Algorithm 256-bit
HTML Hypertext Markup Language
JS JavaScript
JSON JavaScript Object Notation
KDF Key Derivation Function
ML Machine Learning
NLTK Natural Language Toolkit
PAN Permanent Account Number
UI User Interface
XGBoost eXtreme Gradient Boosting Algorithm
CHAPTER 1
INTRODUCTION
1.1 OVERVIEW
In the loan application approval project, the goal is to conduct an Exploratory
Data Analysis (EDA) on loan applicant data to determine the factors that
influence loan approval. The data collected for the EDA will include information
about the loan applicant, such as personal and financial details, as well as
information about the loan, such as the loan amount and purpose. The data will be
cleaned and processed to ensure it is suitable for analysis, after which various data
visualizations and statistical methods will be used to explore the data and uncover
patterns and relationships. The results of the EDA will provide valuable insights
into the loan approval process and potentially improve its accuracy.
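A minimal pandas sketch of such an EDA step is given below. It assumes a DataFrame whose column names (`Education`, `Credit_History`, and `Loan_Status` with values `Y`/`N`) follow the layout of the public loan-prediction dataset; these names and the six inline rows are hypothetical and are not the project's data. The sketch counts missing values per column and computes the approval rate for each level of a candidate factor, which is one simple way to see which factors are associated with approval.

```python
import pandas as pd

# Hypothetical applicant records shaped like a loan-prediction dataset (not the project's data).
df = pd.DataFrame({
    "Education": ["Graduate", "Graduate", "Not Graduate", "Graduate", "Not Graduate", "Graduate"],
    "Credit_History": [1.0, 0.0, 1.0, None, 0.0, 1.0],
    "Loan_Status": ["Y", "N", "Y", "Y", "N", "Y"],
})

# Missing values per column: a first data-quality check before any cleaning.
missing = df.isna().sum()

# Encode the target as 1 (approved) / 0 (rejected) so that its mean is an approval rate.
df["approved"] = (df["Loan_Status"] == "Y").astype(int)


def approval_rate_by(frame: pd.DataFrame, factor: str) -> pd.Series:
    """Share of approved applications for each level of `factor` (missing levels are skipped)."""
    return frame.groupby(factor)["approved"].mean()


print(missing)
print(approval_rate_by(df, "Education"))
print(approval_rate_by(df, "Credit_History"))
```

On real data, a large gap between levels (for example, between applicants with and without a credit history) marks a factor worth examining further with visualizations and statistical tests.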
Data security and privacy are paramount concerns in any project involving
sensitive information. In the loan applicant approval project, sensitive
information from loan applicants is encrypted with Fernet, a symmetric
authenticated-encryption scheme, to ensure privacy and protect against
unauthorized access. Fernet encrypts data with AES-128 in CBC mode and
authenticates the result with HMAC-SHA256, so any token that has been modified
is rejected at decryption. This helps the system meet regulations on data privacy
and security and reduces the risk of penalties.
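As an illustration, the sketch below encrypts and decrypts a single applicant field with the `Fernet` class of the Python `cryptography` package, assuming that package is installed. The field value is a made-up PAN-format string, and the key is generated in place only to keep the example self-contained; a deployed system would load the key from secure storage. The final step flips one bit of a token to show that Fernet's HMAC check rejects modified data.

```python
import base64

from cryptography.fernet import Fernet, InvalidToken

# Generated here only to keep the sketch self-contained; a real system would
# load the key from secure storage and never hard-code it.
key = Fernet.generate_key()
cipher = Fernet(key)


def encrypt_field(value: str) -> bytes:
    """Encrypt one sensitive applicant field into a Fernet token."""
    return cipher.encrypt(value.encode("utf-8"))


def decrypt_field(token: bytes) -> str:
    """Decrypt a Fernet token; raises InvalidToken if it was modified or the key is wrong."""
    return cipher.decrypt(token).decode("utf-8")


token = encrypt_field("ABCDE1234F")  # hypothetical PAN-format value
print(decrypt_field(token))          # -> ABCDE1234F

# Flip one bit of the HMAC tag at the end of the token to simulate tampering.
raw = bytearray(base64.urlsafe_b64decode(token))
raw[-1] ^= 0x01
tampered = base64.urlsafe_b64encode(bytes(raw))
try:
    decrypt_field(tampered)
except InvalidToken:
    print("Tampered token rejected")
```

Because each token carries a random IV and a timestamp, encrypting the same value twice yields different tokens, so equal ciphertexts in storage do not reveal equal plaintexts.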
The Fernet cryptography method not only ensures the privacy and security of
sensitive information, but it also helps to build trust with loan applicants. Loan
applicants are often hesitant to share sensitive information due to concerns about
its security and privacy. By using encryption to secure their information, loan
applicants can be assured that their information will be protected, increasing the
likelihood that they will be willing to share the information necessary for the loan
approval process.
1.2 OBJECTIVES AND GOALS
The primary objectives of the project "Exploratory Data Analysis in Loan
Applicant Approval" are:
1. To create an online loan approval system that is user-friendly and accessible
to both loan applicants and lenders.
2. To implement the Fernet algorithm for secure communication and data
transmission between the loan applicant and lender.
3. To automate the loan approval process and reduce the time and effort
required for manual processing.
4. To ensure the confidentiality and integrity of sensitive information such as
personal details and loan applications.
5. To provide a platform for loan applicants to easily apply for loans and
receive instant loan approval or rejection based on their creditworthiness.
Goals:
The primary goals of the project "Exploratory Data Analysis in Loan
Applicant Approval" are:
1. To increase the accessibility and efficiency of the loan approval process.
2. To improve the security of sensitive information during communication and
data transmission.
3. To reduce the risk of fraud and identity theft.
4. To provide loan applicants with an improved user experience and a quick and
convenient way to apply for loans.
5. To benefit lenders by automating the loan approval process, reducing manual
processing time and effort, and providing a secure platform for loan
applications.
1.3 APPLIED DATA SCIENCE
Applied data science is the process of using statistical and computational
methods to analyze data, extract insights and knowledge, and apply them to real-
world problems or decision-making. It involves a combination of skills and
expertise in mathematics, statistics, programming, and domain-specific
knowledge.
The process of applied data science typically involves the following steps:
• Problem identification: Identify a problem or a question that can be addressed
through data analysis.
• Data collection: Gather relevant data from various sources, such as databases,
APIs, or surveys.
• Data preprocessing: Clean and transform the data into a suitable format for
analysis, including handling missing values, outliers, and encoding categorical
variables.
• Data exploration and visualization: Explore and visualize the data to gain
insights and identify patterns or trends.
• Statistical modeling: Apply statistical and machine learning techniques to
develop models that can explain or predict the target variable.
• Model evaluation: Evaluate the performance of the models using appropriate
metrics and compare them with each other.
• Deployment and monitoring: Deploy the model in production and monitor its
performance over time to ensure its effectiveness and accuracy.
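The data preprocessing step above can be sketched as follows, assuming pandas and a few hypothetical applicant columns (`Gender`, `Married`, `LoanAmount`, `Loan_Status`). Mode imputation for categorical columns and median imputation for numeric ones are common defaults rather than a prescribed method, and one-hot encoding is only one of several ways to encode categorical variables.

```python
import pandas as pd

# Hypothetical raw applicant records; column names are illustrative assumptions.
raw = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "Married": ["Yes", "No", None, "Yes"],
    "LoanAmount": [120.0, None, 66.0, 141.0],
    "Loan_Status": ["Y", "N", "Y", "Y"],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Impute missing values, then encode categorical columns as numbers."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Numeric column: the median is robust to outliers such as very large loans.
            out[col] = out[col].fillna(out[col].median())
        else:
            # Categorical column: fill with the most frequent value (mode).
            out[col] = out[col].fillna(out[col].mode()[0])
    # Binary target becomes 1 (approved) / 0 (rejected).
    out["Loan_Status"] = (out["Loan_Status"] == "Y").astype(int)
    # Remaining categorical columns are one-hot encoded; drop_first avoids a redundant column.
    return pd.get_dummies(out, columns=["Gender", "Married"], drop_first=True, dtype=int)


clean = preprocess(raw)
print(clean)
```

In a real pipeline the medians and modes should be computed on the training split only and then reused on the test split, so that no information leaks from evaluation data into training.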
Applied data science is used in various fields, such as finance,
healthcare, marketing, and the social sciences. Its applications include fraud
detection, customer segmentation, disease diagnosis, image and speech
recognition, and many more.
Applied data science also involves the following:
Communication and interpretation: Communicate the results of the data
analysis to stakeholders in a clear and understandable way. This involves
translating complex statistical concepts and results into actionable insights and
recommendations.
Data ethics: Consider ethical issues and potential biases that may arise from the
data or the models. For example, data privacy, fairness, and transparency are
important ethical considerations in data science.
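One simple way to check a loan model for one of these biases is to compare approval rates across groups defined by a sensitive attribute. The sketch below uses only the Python standard library on hypothetical decisions: it computes per-group approval rates and the ratio of the lowest rate to the highest, a ratio that is often flagged when it falls below 0.8 under the "four-fifths" rule of thumb. This is a basic demographic-parity check, not a complete fairness audit.

```python
from collections import defaultdict


def approval_rates(records, group_key, outcome_key="approved"):
    """Approval rate for each value of a sensitive attribute."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        approved[group] += int(record[outcome_key])
    return {group: approved[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates):
    """Lowest group approval rate divided by the highest; 1.0 means equal rates."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody is approved in any group, so the rates are equal
    return min(rates.values()) / highest


# Hypothetical model decisions (1 = approved) labelled with a sensitive attribute.
decisions = [
    {"gender": "Female", "approved": 1}, {"gender": "Female", "approved": 0},
    {"gender": "Female", "approved": 1}, {"gender": "Female", "approved": 0},
    {"gender": "Male", "approved": 1}, {"gender": "Male", "approved": 1},
    {"gender": "Male", "approved": 1}, {"gender": "Male", "approved": 0},
]
rates = approval_rates(decisions, "gender")
ratio = disparate_impact_ratio(rates)
print(rates)                   # {'Female': 0.5, 'Male': 0.75}
print(f"Ratio: {ratio:.3f}")  # Ratio: 0.667
```

A low ratio does not by itself prove unfair treatment, since groups may differ in legitimate risk factors, but it indicates where the model's decisions deserve closer review.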
Data engineering: Develop and maintain the infrastructure and tools for data
collection, storage, and processing. This involves working with big data
technologies such as Hadoop, Spark, and NoSQL databases.
Collaborative work: Data science projects often involve working in
interdisciplinary teams, including data scientists, domain experts, software
engineers, and business analysts. Effective communication and collaboration are
essential for the success of such projects.
Continuous learning: Data science is a rapidly evolving field, and staying up-to-
date with new techniques, tools, and trends is crucial for success. Continuous
learning and professional development are necessary to keep pace with the
changing landscape of data science.
Feature engineering: This involves selecting and creating relevant features (i.e.,
variables) from the raw data to improve the accuracy of machine learning models.
Feature engineering requires a deep understanding of the data and the problem at
hand.
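For loan data, feature engineering might combine or transform raw columns as in the sketch below. The column names, the assumption that incomes are monthly and that `LoanAmount` is in thousands, and the derived features themselves are illustrative choices, not features the report states it uses. The instalment feature ignores interest, so it is only a rough proxy for a real EMI.

```python
import numpy as np
import pandas as pd

# Hypothetical applicants: incomes assumed monthly, LoanAmount in thousands, term in months.
df = pd.DataFrame({
    "ApplicantIncome": [5000, 3000, 2500],
    "CoapplicantIncome": [0, 1500, 2000],
    "LoanAmount": [120.0, 66.0, 141.0],
    "Loan_Amount_Term": [360, 360, 180],
})

# Household income can be more informative than either income on its own.
df["TotalIncome"] = df["ApplicantIncome"] + df["CoapplicantIncome"]

# log1p compresses the long right tail that loan amounts typically have.
df["LoanAmount_log"] = np.log1p(df["LoanAmount"])

# Rough repayment burden: principal spread evenly over the term (interest ignored),
# as a fraction of monthly household income.
df["MonthlyInstalment"] = df["LoanAmount"] * 1000 / df["Loan_Amount_Term"]
df["Instalment_to_income"] = df["MonthlyInstalment"] / df["TotalIncome"]

print(df.round(3))
```

Ratios such as instalment-to-income often separate risky from safe applicants better than the raw amounts they are built from, which is why domain understanding matters in this step.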
Natural Language Processing (NLP): NLP is a subfield of data science that
focuses on understanding and analyzing human language. NLP techniques can be
used for tasks such as sentiment analysis, text classification, and language
translation.
Computer Vision: Computer vision is another subfield of data science that deals
with the interpretation of images and videos. Computer vision techniques can be
used for tasks such as object recognition, facial recognition, and image
segmentation.
Time series analysis: Time series analysis is a statistical technique that deals with
data that changes over time. Time series analysis can be used for tasks such as
forecasting future values, identifying trends and patterns, and detecting anomalies.
Cloud computing: Cloud computing has become increasingly important in data
science due to the need for large-scale data processing and storage. Cloud-based
platforms such as AWS, Google Cloud, and Microsoft Azure provide scalable and
cost-effective solutions for data science projects.
Data visualization: Data visualization is the process of creating visual
representations of data to communicate insights and findings. Effective data
visualization requires a balance between aesthetics and functionality, and it can
help stakeholders better understand complex data.
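As a sketch of how a composition chart such as "Composition of Loan Status by Education" can be produced, the code below builds a row-normalised crosstab with pandas and draws it as a stacked bar chart with matplotlib, rendering off-screen through the Agg backend and saving the PNG to memory. The eight applicant rows are hypothetical.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical applicants (illustrative values only).
df = pd.DataFrame({
    "Education": ["Graduate"] * 5 + ["Not Graduate"] * 3,
    "Loan_Status": ["Y", "Y", "Y", "Y", "N", "Y", "N", "N"],
})

# Row-normalised crosstab: share of approved (Y) and rejected (N) applications per level.
composition = pd.crosstab(df["Education"], df["Loan_Status"], normalize="index")

fig, ax = plt.subplots(figsize=(5, 3))
composition.plot(kind="bar", stacked=True, ax=ax)
ax.set_ylabel("Share of applications")
ax.set_title("Composition of Loan Status by Education")
fig.tight_layout()

buf = io.BytesIO()
fig.savefig(buf, format="png")  # or fig.savefig("loan_status_by_education.png")
plt.close(fig)
print(composition)
```

Normalising each row to 1 lets groups of very different sizes be compared directly, which raw counts would hide.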
Deep learning: Deep learning is a subfield of machine learning that focuses on
training neural networks with multiple layers. Deep learning techniques have been
used for tasks such as image recognition, speech recognition, and natural language
processing.
1.4 MACHINE LEARNING
Machine learning is a subfield of artificial intelligence that focuses on the
development of algorithms and statistical models that enable computers to "learn"
from and make predictions or decisions without being explicitly programmed. It is
based on the idea that a computer program can learn from data, identify patterns,
and make decisions with minimal human intervention. Machine learning has
become increasingly popular in recent years, as the growth of big data and the
availability of powerful computing resources have made it possible to process
large amounts of data and develop complex models.
There are several types of machine learning, including supervised learning,
unsupervised learning, semi-supervised learning, and reinforcement learning. In
supervised learning, the algorithm is trained on a labeled dataset, where the
correct output is provided for each input. In unsupervised learning, the algorithm
is not given labeled data and must find patterns or structure in the data on its own.
Semi-supervised learning is a combination of supervised and unsupervised
learning, where the algorithm is provided with some labeled data and must find
structure in the remaining unlabeled data. In reinforcement learning, the algorithm
receives feedback in the form of rewards or penalties for its actions, and learns to
make decisions based on this feedback.
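Loan approval is a supervised learning problem: each historical application is labelled approved or rejected. The sketch below trains scikit-learn's `GradientBoostingClassifier` on synthetic data from `make_classification` as a stand-in for XGBoost; `xgboost.XGBClassifier` follows the same scikit-learn `fit`/`predict` interface and could be swapped in if the `xgboost` package is installed. The data and hyperparameters are illustrative and unrelated to the 85.7% accuracy reported in the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic labelled data: rows = applicants, y = 1 (approved) / 0 (rejected).
X, y = make_classification(n_samples=600, n_features=8, n_informative=5, random_state=42)

# Hold out a stratified test set so evaluation uses applicants the model never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Gradient-boosted decision trees; xgboost.XGBClassifier offers the same fit/predict API.
model = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```

Evaluating on a held-out split, rather than on the training data, is what makes the reported accuracy an estimate of performance on new applicants.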
Machine learning is used in a variety of applications, including image
recognition, natural language processing, fraud detection, recommendation
systems, and predictive maintenance. Despite its many advantages, machine
learning also has its challenges, such as the potential for biased results, the
difficulty in interpreting complex models, and the need for large amounts of high-
quality training data. However, with continued advancements in the field, machine
learning is poised to become an even more integral part of our lives and
revolutionize the way we interact with technology.
CHAPTER 2
LITERATURE SURVEY
[1] TITLE: Prediction of Loan Behavior with Machine Learning Models for
Secure Banking (2022)
Author: Anand, Mayank, Arun Velu, and Pawan Whig
Because loan defaults have such a large impact on earnings, predicting them is
one of the most influential problems concerning credit scores that banks and other
financial organizations face. There have been several traditional methods for
mining information about a loan application, as well as some newer machine
learning methods, but most of these methods appear to be failing, as the number of
loan defaults has increased. For loan default prediction, this research work
presents a variety of techniques such as Multiple Logistic Regression, Decision
Tree, Random Forests, Gaussian Naive Bayes, Support Vector Machines, and other
ensemble methods. The prediction is based on loan data from multiple internet
sources such as Kaggle, as well as data sets from the applicant's loan application.
Significant evaluation measures, including the confusion matrix, accuracy, recall,
precision, F1-score, area under the ROC curve, and feature importance, have been
calculated and shown in the results section. The Extra Trees Classifier and
Random Forest achieved the highest accuracy, and the research concludes that
predictive modelling gives effective results for disapproving credit to vulnerable
consumers across a large number of loan applications.
Techniques: Extra Trees Classifier and Random Forest.
Merits: Loan default prediction using machine learning techniques can help
financial organizations to make more informed decisions about loan approval.
[2] TITLE: Comparative Analysis of Customer Loan Approval Prediction
using Machine Learning Algorithms. (2022)
Author: Tumuluru & Praveen
In today’s increasingly competitive market, estimating the risk involved in a
loan application is one of the most crucial challenges for banks’ survival and
profitability. The banks receive many loan applications from their customers and
other individuals daily, and not every applicant is accepted. Most banks use their
own credit scoring and risk assessment procedures to examine loan applications
and make credit approval decisions. Despite this, many people fail to repay loans
or default on them every year, causing financial institutions to lose a significant
amount of money. In this study, Machine Learning (ML) algorithms are used to
extract patterns from a common loan-approval dataset and to forecast future loan
defaulters. Customers' past data, such as their age, income, loan amount, and
tenure of work, is used to conduct the analysis. To determine the most relevant
features, i.e. the factors that have the most impact on the prediction outcome,
various ML algorithms such as Random Forest, Support Vector Machine,
K-Nearest Neighbor and Logistic Regression were used. These algorithms are
evaluated with standard metrics and compared with each other, and the Random
Forest algorithm achieves higher accuracy than the others.
Techniques: Random Forest algorithm, K-Nearest Neighbor algorithm
Merits: Machine learning algorithms can extract patterns from large datasets that
are difficult for humans to recognize.
[3] TITLE: The biometric cardless transaction with shuffling keypad using
proximity sensor (2020)
Author: Adebiyi & Marion O.
Loan approval is an essential factor that decides the losses or gains a financial
institution accrues at the end of a fiscal year. Banks are looking for ways to
ensure that loans are paid back within the specified period. Therefore, this study
develops a loan prediction system using an Artificial Neural Network that
determines whether a loan is good or bad, that is, whether it is a payable debt or a
bad debt. The system can also help predict whether a loan applicant will default on
repayment. The system was designed and implemented using Python as the
programming language, Hypertext Markup Language (HTML) and Cascading
Style Sheets (CSS) for the front end, and PHP for the back end. The system uses
the confusion matrix as the performance metric to evaluate its accuracy. The
results show that the system achieves 92% accuracy, which indicates that it
predicts well whether a loan applicant will default on repayment. Finally, the
system was compared with previous research in terms of accuracy, and it was
concluded that the proposed system performed better than previous approaches.
Techniques: Artificial Neural Network (ANN) algorithm was used to develop the
loan prediction system.
Merits: The system used the confusion matrix as the performance metrics to
evaluate the system's accuracy. This is a widely used method for evaluating the
accuracy of classification models.
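Confusion-matrix evaluation reduces to counting four outcomes. The standard-library sketch below counts true and false positives and negatives for hypothetical labels and derives the usual scores from them: accuracy = (TP + TN) / total, precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 as the harmonic mean of precision and recall. These are the same four scores the report lists for its own model in Figure 6.8.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FP, FN, TN) for a binary classifier."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = approved / repaid, 0 = rejected / defaulted.
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 1, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)  # (5, 1, 1, 3)
print(classification_metrics(*counts))
```

Accuracy alone can mislead on imbalanced loan data, where defaults are rare; precision and recall show separately how many flagged applicants were correct and how many true cases were caught.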
[4] TITLE: OTP based cardless transaction using ATM (2019)
Author: Kadam & Ashwini S
In the banking system, banks have many products to sell, but the main source of
income for any bank is its credit line, since banks earn interest on the loans they
extend. A bank's profit or loss depends to a large extent on loans, i.e. on whether
customers pay back their loans or default. By predicting loan defaulters, a bank
can reduce its Non-Performing Assets, which makes the study of this phenomenon
very important. Previous research in this area has shown that there are many
methods for studying the problem of controlling loan default. But since accurate
predictions are very important for maximizing profits, it is essential to study the
nature of the different methods and compare them. The study applies a predictive
analytics approach consisting of (i) Collection of Data, (ii) Data Cleaning and
(iii) Performance Evaluation to the problem of predicting loan defaulters.
Experimental tests found that the Naïve Bayes model performs better than the
other models in loan forecasting.
Techniques: Naïve Bayes Model.
Merits: The study shows that the Naïve Bayes model performs better than other
models in predicting loan defaulters. This means that the model can accurately
predict whether an applicant is likely to default on their loan or not.
[5] TITLE: Loan Default Prediction using Decision Trees and Random
Forest: A Comparative Study (2021)
Author: Mehul Madaan, Aniket Kumar, Chirag Keshri, Rachna Jain and Preeti
Nagrath.
With the improving banking sector in recent times and the increasing trend of
taking loans, a large population applies for bank loans. But one of the major
problems the banking sector faces in this ever-changing economy is the increasing
rate of loan defaults, and banking authorities are finding it more difficult to
correctly assess loan requests and tackle the risk of people defaulting on loans.
The two most critical questions in the banking industry are (i) How risky is the
borrower? and (ii) Given the borrower's risk, should we lend him/her? In light of
the given problems, this paper proposes two machine learning models to predict
whether an individual should be given a loan by assessing certain attributes and
therefore help the banking authorities by easing their process of selecting suitable
people from a given list of candidates who applied for a loan. This paper does a
comprehensive and comparative analysis between two algorithms (i) Random
Forest, and (ii) Decision Trees. Both the algorithms have been used on the same
dataset and the conclusions have been made with results showing that the Random
Forest algorithm outperformed the Decision Tree algorithm with much higher
accuracy.
Techniques: Random Forest and Decision Trees.
Merits: The study addresses a critical problem faced by the banking industry: the
increasing rate of loan defaults and the need to assess loan requests and borrower
risk.
[6] TITLE: Prediction of Modernized Loan Approval System Based on
Machine Learning Approach (2021)
Author: Vishal Singh, Ayushman Yadav & Rajat Awasthi
Technology has improved the quality of life of humankind. Every day we plan to
create something new and different, and we have machines that support our lives.
In the banking sector, a candidate must provide proofs and supporting documents
before a loan amount is approved, and whether the application is approved
depends on the candidate's historical data held by the system. Every day many
people apply for loans, but a bank has limited funds. In this case, an accurate
prediction made with a classification algorithm, for example logistic regression, a
random forest classifier, or a support vector machine classifier, would be very
beneficial. A bank's profit and loss depend on the amount of its loans, that is, on
whether its clients pay the loans back, so loan recovery is the most important
concern for the banking sector. The improvement process plays an important role
in the banking sector. The historical data of candidates was used to build machine
learning models using different classification algorithms. The main objective of
this paper is to predict whether a new applicant will be granted a loan, using
machine learning models trained on the historical dataset.
Techniques: logistic regression, random forest classifier, and support vector
machine classifier.
Merits: Machine learning algorithms can process large amounts of data quickly
and accurately, allowing banks to make informed decisions about loan
applications in a timely manner.
[7] TITLE: Accurate Loan Approval Prediction Based on Machine Learning
Approach (2020)
Author: J. Tejaswini, T. Mohana Kavya, R. Devi Naga Ramya, P. Sai Triveni,
Venkata Rao Maddumala.
Loan approval is a very important process for banking organizations. The
banking industry always needs more accurate predictive modeling systems for
many issues, and predicting credit defaulters is a difficult task for it. The system
approves or rejects loan applications. Recovery of loans is a major contributing
parameter in a bank's financial statements, and it is very difficult to predict
whether a customer will repay a loan. Machine Learning (ML) techniques are very
useful for predicting outcomes from large amounts of data. In this paper, three
machine learning algorithms, Logistic Regression (LR), Decision Tree (DT) and
Random Forest (RF), are applied to predict the loan approval of customers. The
experimental results show that the accuracy of the Decision Tree algorithm is
better than that of the Logistic Regression and Random Forest approaches.
Techniques: Logistic Regression, Decision Tree, and Random Forest.
Merits: Machine learning algorithms can analyze large amounts of data and
identify patterns that humans may not be able to detect. This leads to more
accurate predictions of loan approvals and defaults.
[8] TITLE: A Federated Learning Based Approach for Loan Defaults
Prediction (2020)
Author: Geet Shingi.
The number of defaults on bank loans has been increasing in recent years.
However, in many banking organizations the process of sanctioning loans is still
done manually, and dependency on human intervention and delays in results have
been the biggest obstacles in this system. When implementing machine learning
models for banking applications, the security of sensitive customer banking data
has always been a crucial concern, and with strict legislation in place, sharing
data with other organizations is not possible. In addition, the loan dataset is
highly imbalanced: there are very few samples of defaults compared to repaid
loans. These problems make it difficult for a default prediction system to learn
the patterns of defaults and thus to predict them. We propose a federated
learning-based approach for predicting loan applications that are less likely to
be repaid. It helps resolve the above-mentioned issues by sharing only the
model weights, which are aggregated at a central server. The federated system is
coupled with the Synthetic Minority Over-sampling Technique (SMOTE) to solve the
problem of imbalanced training data. Further, it uses a weighted aggregation
based on the number of samples and the performance of each worker on its own
dataset to further improve performance. The improved performance of this model
on publicly available real-world data validates the approach.
Techniques: Synthetic Minority Over-sampling Technique (SMOTE).
Merits: By using a machine learning model, the loan approval process can be
automated, reducing the dependency on human intervention to get faster results.
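The paper is summarized here only in prose. Its two core operations can be sketched with the Python standard library:

- The server-side aggregation of worker weights, weighted by each worker's number of samples (the FedAvg rule). The paper's additional performance-based weighting is not reproduced.
- The SMOTE step that creates a synthetic minority sample on the line segment between a sample and a minority-class neighbour. Choosing the k nearest neighbours is left out.

Weight vectors are plain lists, and all names are illustrative assumptions.

```python
import random


def aggregate(worker_weights, sample_counts):
    """Sample-weighted average of the workers' weight vectors (FedAvg-style)."""
    if not worker_weights or len(worker_weights) != len(sample_counts):
        raise ValueError("need one sample count per worker")
    total = sum(sample_counts)
    return [
        # each coordinate is the average of that coordinate, weighted by sample count
        sum(w[i] * n for w, n in zip(worker_weights, sample_counts)) / total
        for i in range(len(worker_weights[0]))
    ]


def smote_point(sample, neighbour, rng):
    """One SMOTE-style synthetic minority sample on the segment sample -> neighbour."""
    gap = rng.random()  # uniform in [0, 1)
    return [a + gap * (b - a) for a, b in zip(sample, neighbour)]
```

For example, `aggregate([[1.0, 0.0], [3.0, 4.0]], [1, 3])` gives `[2.5, 3.0]`. The worker holding three samples pulls the average towards its own weights.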
CHAPTER 3
SYSTEM ANALYSIS
3.1 EXISTING SYSTEM
The existing system for online loan lender approval typically involves a
manual process, where borrowers fill out paper or digital loan application forms,
submit the required documentation, and meet with a loan officer for an in-person
interview. The loan officer then evaluates the borrower's credit history,
employment status, income, and other factors to determine the risk involved in
granting the loan. If the loan is approved, the loan officer provides the borrower
with the necessary loan documents, which must be signed and returned. The loan
funds are then disbursed, usually after a background check and verification of the
borrower's information.
This manual loan approval process can result in long wait times for
borrowers. Additionally, there is a risk of human error, such as missing or
inaccurate information, in the manual loan approval process. Despite these
limitations, the manual loan approval process remains the most common method
used by many financial institutions today.
3.1.1 Disadvantages
The existing manual loan approval system for online lenders has several
disadvantages, including:
1. Time-consuming: The manual loan approval process can be slow and take
several days or even weeks to complete. This can be frustrating for
borrowers who are in need of quick access to funds.
2. Increased risk of human error: The manual loan approval process
involves a significant amount of manual effort, which can result in human
error. For example, loan officers may miss important information or make
incorrect assessments of a borrower's creditworthiness.
3. Lack of transparency: The manual loan approval process can be opaque,
with little visibility into the factors that influence loan approval decisions.
This can make it difficult for borrowers to understand why their loan
applications were approved or rejected.
4. Limited loan options: The manual loan approval process can limit the
number of loan options available to borrowers, as loan officers may only be
able to offer a limited range of loan products.
3.2 PROPOSED SYSTEM
The proposed online loan lender approval system is designed to address the
limitations of the existing manual loan approval process. This system leverages
technology such as cryptography and machine learning algorithms to streamline
the loan approval process, increase transparency, and enhance security.
The proposed system will automate the loan approval process, allowing
borrowers to apply for loans online and receive near-instant loan decisions. This
will significantly reduce the risk of human error and minimize wait times for loan
approval. Additionally, the use of cryptography and encryption algorithms will
ensure the security and privacy of sensitive borrower information, providing
greater peace of mind for borrowers and reducing the risk of security breaches and
identity theft.
In addition, PAN Card verification can be added as a feature to determine the
credit score of loan applicants. This project applies exploratory data analysis
in the loan application approval system, together with the Fernet cryptography
process and PAN Card verification, to improve the accuracy and security of the
loan approval process. The results of this project show that the use of the
Fernet cryptography process and PAN Card verification has improved the security
and accuracy of the loan approval process.
3.2.1 Advantages
The proposed online loan lender approval system offers several advantages
over the existing manual loan approval process, including:
1. Automation: The loan approval process will be fully automated, reducing
the need for manual effort and minimizing the risk of human error.
Borrowers will be able to apply for loans online and receive near-instant
loan decisions.
2. Increased transparency: The use of cryptography and encryption
algorithms will ensure the security and privacy of sensitive borrower
information.
3. Improved user experience: The proposed system improves the loan
approval process by reducing wait times and enhancing the user experience
for borrowers.
4. Increased loan options: The proposed system will allow lenders to offer a
wider range of loan products to borrowers, giving them greater flexibility
and choice in their loan options.
5. Enhanced security: The use of cryptography and encryption algorithms
will protect sensitive borrower information and reduce the risk of security
breaches and identity theft.
CHAPTER 4
SYSTEM REQUIREMENTS
4.1 HARDWARE REQUIREMENTS
• Processor : Multi-core processor with a clock speed of at least 2.5 GHz
• RAM : 8 GB
• Hard disk : Solid-state drive (SSD) of 500 GB or more
• Keyboard : Standard keyboard and mouse
• Monitor : LCD or LED display with a resolution of at least 1920 x 1080
4.2 SOFTWARE REQUIREMENTS
• Tool : Visual Studio Code
• Frontend : HTML, CSS, JavaScript, PHP
• Framework : Python Flask framework
• Language : Python 3 (Python 3.7 or above must be installed)
• Operating system : Windows 10 / macOS / Linux
• Technologies : Python 3.7, Flask, XGBoost, JSON, Node.js, PHP
CHAPTER 5
SYSTEM DESIGN
5.1 ARCHITECTURE DIAGRAM
Fig 5.1 Architecture diagram
The architecture diagram shows the different parts of the loan application
approval system. Users enter loan application data in an Excel sheet, which is
then analyzed and pre-processed by the Exploratory Data Analysis (EDA) module.
The data is then encrypted using the Fernet Cryptography module. The XGBoost
Algorithm uses the encrypted data to build a machine learning model that
predicts loan approvals. The Model Training module trains the model, and the
Data Visualization module generates visualizations of the model's performance.
The Model Deployment module deploys the trained model using Flask and a pickled
(.pkl) model file, and the user receives an approval or rejection decision based
on their loan application data. The system ensures security by encrypting
sensitive data before using it to build the machine learning model. Overall, the
loan application approval system architecture has four main components: the user
interface, the database, the loan approval system, and the cryptography module.
User Interface:
This component is responsible for providing a graphical user interface (GUI)
that enables borrowers to interact with the system. This may include a web-based
or mobile-based interface that provides forms and fields for borrowers to input
information and request loan approvals.
Database:
This component is responsible for storing the data generated by the system.
This may include a relational database management system (RDBMS) such as
MySQL or PostgreSQL, or a NoSQL database such as MongoDB.
Loan Approval System:
This component is responsible for evaluating loan requests and determining
whether to approve or reject a loan application. This may include a combination
of manual review by loan officers and automated decision-making algorithms that
analyze credit scores, income, and other factors.
Cryptography Module:
This component is responsible for securely encrypting and decrypting
sensitive data, such as borrowers' personal information and financial details.
Fig 5.2 Cipher Block Chaining (CBC) mode encryption.
Fig 5.3 Cipher Block Chaining (CBC) mode decryption.
This may include the use of Fernet, a symmetric authenticated-encryption scheme
that uses a single secret key to encrypt and decrypt data. Internally, it uses
AES-128 in CBC mode together with an HMAC-SHA256 authentication tag.
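Figs 5.2 and 5.3 show Cipher Block Chaining (CBC). Each plaintext block is XORed with the previous ciphertext block before it is encrypted: C_i = E_K(P_i XOR C_(i-1)), with C_0 = IV. Decryption reverses this: P_i = D_K(C_i) XOR C_(i-1).

The sketch below shows only the chaining. It assumes a deliberately insecure toy block function (XOR with the key, then a one-byte rotation) in place of AES, because the Python standard library has no AES. The function names are illustrative.

```python
BLOCK = 16  # bytes per block, as in AES


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # NOT a real cipher: XOR with the key, then rotate left by one byte
    mixed = bytes(b ^ k for b, k in zip(block, key))
    return mixed[1:] + mixed[:1]


def toy_block_decrypt(key: bytes, block: bytes) -> bytes:
    # Inverse of toy_block_encrypt: rotate right by one byte, then XOR with the key
    unrotated = block[-1:] + block[:-1]
    return bytes(b ^ k for b, k in zip(unrotated, key))


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def pad(data: bytes) -> bytes:
    # PKCS #7 padding up to a whole number of blocks
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n


def unpad(data: bytes) -> bytes:
    return data[:-data[-1]]  # no padding validation in this sketch


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    data, prev, out = pad(plaintext), iv, []
    for i in range(0, len(data), BLOCK):
        # C_i = E_K(P_i XOR C_(i-1)), starting from C_0 = IV
        prev = toy_block_encrypt(key, xor(data[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    prev, out = iv, []
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        # P_i = D_K(C_i) XOR C_(i-1)
        out.append(xor(toy_block_decrypt(key, block), prev))
        prev = block
    return unpad(b"".join(out))
```

Chaining is the reason two identical plaintext blocks produce different ciphertext blocks. It is also why Fernet needs a fresh random IV for every token.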
XGBoost Algorithm:
The online loan lender approval system can also incorporate a machine learning
model trained with the XGBoost algorithm to improve the accuracy and efficiency
of the loan approval process. XGBoost (eXtreme Gradient Boosting) is a powerful
and widely used machine learning algorithm that provides robust and scalable
solutions for regression and classification problems. When using gradient
boosting for regression, the weak learners are regression trees, and each tree
maps an input data point to one of its leaves, which contains a continuous score.
XGBoost minimizes a regularized (L1 and L2) objective function that combines a
convex loss function (based on the difference between the predicted and target
outputs) and a penalty term for model complexity (in other words, for the
regression tree functions). Training proceeds iteratively: new trees are added
that predict the residuals or errors of the prior trees, and they are combined
with the previous trees to make the final prediction. It is called gradient
boosting because it uses a gradient descent algorithm to minimize the loss when
adding new models. For a classification task such as loan approval, the summed
leaf scores are passed through the logistic (sigmoid) function to give a
probability.
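The regularized objective described above can be written out as follows, in the standard XGBoost formulation. The symbols are:

- K: the number of trees f_k.
- T: the number of leaves in a tree.
- w_j: the score of leaf j, and I_j: the set of training rows that fall into leaf j.
- γ, λ, α: the per-leaf, L2 and L1 penalty weights.
- g_i and h_i: the first and second derivatives of the loss l with respect to the previous round's prediction.

For binary log loss with p_i = σ(ŷ_i), these derivatives are g_i = p_i − y_i and h_i = p_i(1 − p_i). The last line gives the optimal leaf score when the L1 term is zero.

```latex
\begin{aligned}
\hat{y}_i &= \sum_{k=1}^{K} f_k(\mathbf{x}_i) \\
\mathcal{L} &= \sum_{i=1}^{n} l\bigl(y_i, \hat{y}_i\bigr) + \sum_{k=1}^{K} \Omega(f_k) \\
\Omega(f) &= \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2} + \alpha \sum_{j=1}^{T} \lvert w_j \rvert \\
w_j^{*} &= -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda} \qquad (\alpha = 0)
\end{aligned}
```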
Fig 5.4 Formulae based Algorithm
In the context of the loan approval system, the XGBoost algorithm can be
used to train a model on historical loan data, such as loan amount, loan term,
credit score, and other relevant factors, to determine the probability of loan
default.
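To make the boosting loop concrete for a binary approve/reject target, here is a minimal pure-Python sketch of second-order (Newton) gradient boosting with log loss. Each round fits a depth-1 tree (a stump) to the current gradients and hessians and uses the L2-regularized leaf weight w* = -G / (H + λ).

This is not the XGBoost library: there is no L1 term, no column sampling, no deeper trees and no approximate split search. The feature layout (credit history, income) and the sample data are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, g, h, lam):
    """Pick the one-feature threshold split with the best XGBoost-style structure score."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue  # a split must send rows to both sides
            G_L, H_L = sum(g[i] for i in left), sum(h[i] for i in left)
            G_R, H_R = sum(g[i] for i in right), sum(h[i] for i in right)
            # structure score of the two leaves (larger is better)
            score = G_L ** 2 / (H_L + lam) + G_R ** 2 / (H_R + lam)
            if best is None or score > best[0]:
                # optimal leaf weights w* = -G / (H + lambda)
                best = (score, j, t, -G_L / (H_L + lam), -G_R / (H_R + lam))
    if best is None:
        raise ValueError("no feature has two distinct values")
    _, j, t, w_left, w_right = best
    return lambda row: w_left if row[j] <= t else w_right


def boost(X, y, rounds=20, eta=0.3, lam=1.0):
    """Newton boosting for binary log loss; returns a function giving P(approved)."""
    trees, margin = [], [0.0] * len(X)  # raw log-odds; 0.0 means p = 0.5
    for _ in range(rounds):
        p = [sigmoid(m) for m in margin]
        g = [pi - yi for pi, yi in zip(p, y)]  # first derivative of log loss
        h = [pi * (1.0 - pi) for pi in p]      # second derivative of log loss
        tree = fit_stump(X, g, h, lam)
        trees.append(tree)
        # shrink each new tree by the learning rate eta before adding it
        margin = [m + eta * tree(row) for m, row in zip(margin, X)]
    return lambda row: sigmoid(sum(eta * tree(row) for tree in trees))
```

With rows `[credit_history, income]`, the model trained by `boost(X, y)` returns a probability above 0.5 for an applicant with credit history 1 when that feature separates the training labels. `eta`, `lam` and `rounds` play the roles of XGBoost's learning rate, `lambda` and number of trees.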
The specific design of the architecture will depend on the specific
requirements of the project, such as the scale of the project, the complexity of the
loan approval process, and the security and privacy requirements for the system.
CHAPTER 6
SYSTEM IMPLEMENTATION
6.1 MODULES
To effectively implement a Loan Lending system, it is important to
categorize the different stages involved in the process. The different segments of
the system implementation can be categorized as follows:
• Data Collection
• Data Preprocessing
• Data visualization
• Exploratory Data Analysis (EDA)
• Model Building
• Model Deployment (FLASK & PKL Model)
• Secured Environment Module
6.2 MODULE DESCRIPTION
6.2.1 Data Collection
The loan approval prediction system can get loan application data from
Kaggle, a platform that offers many machine learning datasets. This data includes
loan application details, borrower information, and loan outcomes.
Fig 6.1 Data Collection (Train and Test Data)
Kaggle offers a wide range of loan application data, which can improve the
accuracy of the system's predictions. Kaggle also provides valuable information
and feedback that can help users make better decisions.
6.2.2 Data Preprocessing
Once the data has been collected, it needs to be pre-processed to ensure that
it is clean and usable for exploratory data analysis. This step involves handling
missing or erroneous data, converting categorical data into numerical features, and
normalizing or scaling the data.
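The three preprocessing steps can be sketched with the standard library alone:

1. Fill missing values (most frequent value for categorical columns, median for numeric ones).
2. Encode categories as integer codes.
3. Min-max scale numeric columns to the range 0 to 1.

The column names and records are hypothetical, modelled on the applicant attributes listed in Section 7.2 rather than taken from the project's dataset.

```python
from statistics import median, mode

# Hypothetical records; column names are assumptions, not the project's real schema.
RAW = [
    {"Gender": "Male", "Education": "Graduate", "LoanAmount": 120.0, "Credit_History": 1.0},
    {"Gender": None, "Education": "Not Graduate", "LoanAmount": None, "Credit_History": 0.0},
    {"Gender": "Female", "Education": "Graduate", "LoanAmount": 200.0, "Credit_History": None},
    {"Gender": "Male", "Education": "Graduate", "LoanAmount": 160.0, "Credit_History": 1.0},
]
CATEGORICAL = ["Gender", "Education"]
NUMERIC = ["LoanAmount", "Credit_History"]


def preprocess(rows):
    rows = [dict(r) for r in rows]  # copy so the raw records stay untouched
    # 1. Missing values: most frequent value for categorical, median for numeric
    for cols, stat in ((CATEGORICAL, mode), (NUMERIC, median)):
        for col in cols:
            fill = stat([r[col] for r in rows if r[col] is not None])
            for r in rows:
                if r[col] is None:
                    r[col] = fill
    # 2. Categorical -> numeric: one integer code per category, in alphabetical order
    for col in CATEGORICAL:
        codes = {v: i for i, v in enumerate(sorted({r[col] for r in rows}))}
        for r in rows:
            r[col] = codes[r[col]]
    # 3. Min-max scaling of numeric columns to [0, 1]
    for col in NUMERIC:
        lo, hi = min(r[col] for r in rows), max(r[col] for r in rows)
        span = (hi - lo) or 1.0  # avoid dividing by zero for a constant column
        for r in rows:
            r[col] = (r[col] - lo) / span
    return rows
```

After `preprocess(RAW)`, the missing gender becomes the most common value ("Male", code 1) and the missing loan amount becomes the median, 160. The amounts 120, 160, 200 and 160 are then scaled to 0.0, 0.5, 1.0 and 0.5.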
6.2.3 Data Visualization
Data visualization is a crucial aspect of exploratory data analysis that helps in
understanding the patterns and trends in the data. In this loan application approval
system project, we have used several visualization techniques to gain insights into
the data and to identify the important features for model building.
Fig 6.2 Data Visualization
First, we used a correlation matrix to measure the correlation between the
different features of the dataset, which helped us identify the most important
features for loan approval. Second, we created bar graphs and histograms to
visualize the distribution of loan amounts and loan terms.
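A correlation matrix holds the Pearson coefficient for every pair of columns. Values near +1 or −1 mark strongly related features, and values near 0 mark unrelated ones.

A minimal standard-library sketch follows. The column names and values are hypothetical. A plotting library would normally render the resulting matrix as a heatmap.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # undefined (ZeroDivisionError) when either column is constant
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict: matrix[a][b] is the correlation between columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

With `{"Credit_History": [1, 0, 1, 1, 0], "Loan_Status": [1, 0, 1, 1, 0], "LoanAmount": [100, 150, 120, 90, 200]}`, the credit-history and loan-status entry is 1.0. This is the kind of signal that marks credit history as an important feature.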
Finally, we have created a confusion matrix to visualize the performance of
the model. This helped us in identifying the number of true positives, false
positives, true negatives, and false negatives, which is crucial for evaluating the
performance of the model.
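A confusion matrix counts the four outcomes: true positives, false positives, true negatives and false negatives. The evaluation metrics used later in the report (accuracy, precision, recall and F1) follow directly from those four numbers.

A minimal sketch is shown below, treating "approved" as the positive class labelled 1. That labelling is an assumption.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # approved and should have been
            else:
                fp += 1  # approved but should not have been
        elif t == positive:
            fn += 1      # rejected but should have been approved
        else:
            tn += 1      # rejected correctly
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For labels `[1, 1, 1, 0, 0, 1, 0, 1]` and predictions `[1, 0, 1, 0, 1, 1, 0, 1]`, the counts are TP = 4, FP = 1, TN = 2 and FN = 1. That gives an accuracy of 0.75 and a precision of 4 / (4 + 1) = 0.8.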
6.2.4 Exploratory Data Analysis
The next step involves the exploratory data analysis process, which involves
analyzing various features of the loan applications, such as the loan amount, the
loan purpose, and the applicant's credit score. The analysis may include
visualizations such as scatter plots, histograms, and heatmaps to identify patterns
and correlations in the data.
This project develops a loan application approval model that takes an Excel
sheet as input and predicts the likelihood of loan approval. The model analyses
various features of the loan applications, such as the loan amount, credit score,
income, and employment history, and predicts whether each loan should be
approved. The output is presented in an approval section, which displays the
predicted result for each loan application in the Excel sheet.
6.2.5 Model Building
1. Data preparation: The first step in building a model using XGBoost is to
prepare the data. This includes cleaning, pre-processing, and transforming
the data to make it suitable for use in the model.
2. Splitting the Data: The data is divided into training and testing sets. The
training set is used to develop the model, while the testing set is used to
evaluate its performance.
3. Setting the Parameters: The XGBoost algorithm has several parameters
that need to be set before building the model, such as the maximum depth
of the tree, the learning rate, and the number of trees to build. The
parameters are set based on the nature of the problem and the characteristics
of the data.
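import os
import joblib
import pandas as pd

# Directory of this script (assumed to be what current_dir referred to)
current_dir = os.path.dirname(os.path.abspath(__file__))

def ValuePredictor(data: pd.DataFrame):
    # Model name (relative to the application directory)
    model_name = 'bin/xgboostModel.pkl'
    # Full path of the stored model
    model_dir = os.path.join(current_dir, model_name)
    # Load the model; joblib.load takes a path and closes the file itself
    loaded_model = joblib.load(model_dir)
    # Predict and return the result for the first (single) input row
    result = loaded_model.predict(data)
    return result[0]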
4. Training the Model: The XGBoost model is trained on the training dataset.
During the training process, the algorithm builds a set of decision trees
based on the input features and the target variable.
5. Evaluating the Model: After training the model, it is evaluated on the
testing dataset to assess its performance. This is done using metrics such as
accuracy, precision, recall, and F1 score.
6. Tuning the Parameters: The XGBoost model has several hyperparameters
that need to be tuned to optimize its performance. This involves selecting
the best combination of hyperparameters using techniques such as grid
search or random search.
7. Finalizing the Model: Once the optimal hyperparameters have been
identified, the XGBoost model is retrained on the entire dataset using the
optimal hyperparameters. This is done to finalize the model that will be
used for predictions.
8. Saving the Model: The final model is saved to a file using a library like
Pickle so that it can be loaded and used later for making predictions on new
data.
Fig 6.3 Loading Model to Interface
Overall, building a model using XGBoost involves several important steps,
including data preparation, setting parameters, training and evaluating the model,
tuning hyperparameters, finalizing the model, and saving the model. These steps
are crucial for building a robust and accurate machine learning model.
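Steps 2, 6, 7 and 8 can be sketched with the standard library:

- A reproducible train/test split (step 2).
- An exhaustive grid search over hyperparameter combinations (step 6).
- Refitting on all rows with the best combination (step 7).
- Saving and loading the final model with pickle (step 8).

The sketch assumes a hypothetical `ThresholdModel` stand-in (approve when one feature reaches a threshold) so that it runs without XGBoost installed. In the project, the XGBoost model would take its place, and `feature`/`threshold` would become parameters such as `max_depth` and `learning_rate`.

```python
import itertools
import pickle
import random


def train_test_split(rows, labels, test_size=0.25, seed=42):
    """Step 2: shuffle row indices reproducibly, then hold out a test portion."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_size)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train_idx], [rows[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])


class ThresholdModel:
    """Hypothetical stand-in for the XGBoost model: approve if row[feature] >= threshold."""

    def __init__(self, feature, threshold):
        self.feature, self.threshold = feature, threshold

    def predict(self, rows):
        return [1 if r[self.feature] >= self.threshold else 0 for r in rows]


def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


def grid_search(X, y, grid):
    """Step 6: try every parameter combination and keep the most accurate one.
    For brevity it scores on the rows it is given; real code would cross-validate."""
    best_params, best_acc = None, -1.0
    for feature, threshold in itertools.product(grid["feature"], grid["threshold"]):
        acc = accuracy(y, ThresholdModel(feature, threshold).predict(X))
        if acc > best_acc:
            best_params, best_acc = (feature, threshold), acc
    return best_params


def save_model(model, path):
    """Step 8: serialise the finalised model so the web app can load it later."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

Step 7 corresponds to building `ThresholdModel(*best)` again on the full dataset before calling `save_model`. Only load pickle files from a trusted source, because unpickling can execute arbitrary code.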
6.2.6 Model Deployment
Model deployment is an important step in the data science project life cycle
that involves making the model available for use by end-users. Flask is a popular
Python web framework that can be used for deploying machine learning models.
Steps involved in deploying a model using Flask and pickle:
1. Export the trained model: Once the model has been trained, it needs to be
exported to a file format that can be loaded by Flask. The pickle library in
Python can be used to serialize the model object and save it as a .pkl file.
2. Set up the Flask app: Create a new Flask app and define the endpoints that
will handle incoming requests from the user.
3. Load the model: In the Flask app, load the trained model from the .pkl file
using the pickle library.
4. Define the prediction function: Define a function that takes in the user
input, preprocesses it, and uses the loaded model to make a prediction.
5. Create the API endpoint: Create an endpoint that will receive incoming
requests from the user, preprocess the input, and use the prediction function
to return the predicted output.
6. Test the endpoint: Test the API endpoint by sending sample requests and
checking if the predicted output matches the expected output.
7. Deploy the Flask app: Finally, deploy the Flask app on a server that can be
accessed by end-users.
Once the model has been deployed using Flask and pickle, users can access it
through a web interface or API endpoint, and use it to make predictions on new
data. It's important to monitor the performance of the model in production and
update it periodically to ensure that it remains accurate and reliable.
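import requests
from cryptography.fernet import Fernet

# Share servers, in the same order that checkpass() reads them back
a = ["http://localhost:8084/", "http://localhost:8081/",
     "http://localhost:8082/", "http://localhost:8083/"]

def encrypt(message):
    # Generate a fresh random Fernet key and encrypt the message with it.
    # Returns [key, token] as bytes; neither is printed, since both are secrets.
    key = Fernet.generate_key()
    encMessage = Fernet(key).encrypt(message.encode())
    return [key, encMessage]

def sep(value, parts=4):
    # Assumed behaviour of the project's sep() helper (not shown in the report):
    # cut a base64 value into `parts` consecutive pieces that re-join in order.
    text = value.decode()
    size = -(-len(text) // parts)  # ceiling division
    return [text[i * size:(i + 1) * size] for i in range(parts)]

def postdata(url, e, b, c):
    # Send the e-mail, one password share and one key share to a single server
    return requests.post(url, data={"email": e, "password": b, "key": c})

def register(e, r1):
    # Hypothetical wrapper: e is the user's e-mail, r1 the plain-text password
    data = encrypt(r1)
    password = sep(data[1])
    key = sep(data[0])
    for i in range(4):
        postdata(a[i], e, password[i], key[i])
    datainsert(e)  # project helper that records the user in MySQL (not shown)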
6.2.7 Secured Environment Module
To protect the privacy and security of the loan application data, this project
utilizes Fernet cryptography encryption.
Fernet Cryptography Encryption:
This step involves encrypting the loan application data before storing it on
the servers. Fernet is a symmetric encryption algorithm that uses a shared secret
key to encrypt and decrypt data. This step helps to ensure that the data is secure
and cannot be accessed by unauthorized parties.
Fernet encryption combines symmetric encryption with a message authentication
code (MAC) to provide secure, tamper-evident data transmission. A Fernet key K is
32 bytes, stored URL-safe base64 encoded. The first 16 bytes are the signing key
K_s and the last 16 bytes are the encryption key K_e. The encryption process
involves the following steps:
1. Generate a random secret key K = K_s || K_e.
2. Choose a random 128-bit initialization vector IV and record the current time
as a 64-bit timestamp.
3. Pad the data with PKCS #7 and encrypt it using AES-128 in CBC mode with K_e:
ciphertext = AES-128-CBC.encrypt(K_e, IV, data)
4. Compute the MAC over the version byte (0x80), timestamp, IV and ciphertext
using HMAC-SHA256 with K_s:
MAC = HMAC-SHA256(K_s, version || timestamp || IV || ciphertext)
5. Combine the fields into a token:
token = base64.urlsafe_b64encode(version || timestamp || IV || ciphertext || MAC)
Fig 6.4 Cryptography Fernet Implementation
The decryption process involves the following steps:
1. Decode the token back to bytes: original = base64.urlsafe_b64decode(token)
2. Separate the signed part from the MAC:
body, MAC = original[:-32], original[-32:]
3. Verify the MAC using HMAC-SHA256 and K_s: HMAC-SHA256(K_s, body) == MAC,
compared in constant time. If the check fails, the token is rejected without
being decrypted. An optional time-to-live check on the timestamp may also be
applied.
4. Read the IV and ciphertext from body, decrypt with AES-128-CBC and K_e, and
remove the padding: data = AES-128-CBC.decrypt(K_e, IV, ciphertext)
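The token layout in the Fernet specification is a version byte 0x80, a 64-bit big-endian timestamp, a 128-bit IV, the ciphertext and a 32-byte HMAC-SHA256 tag. Its framing and MAC check can be sketched with the Python standard library.

AES-128-CBC itself is left out because the standard library has no AES, so `ciphertext` is treated as opaque bytes. In the project, `cryptography.fernet.Fernet` performs all of these steps; the helper names here are illustrative.

```python
import base64
import hashlib
import hmac
import struct

VERSION = b"\x80"  # Fernet token format version byte


def split_key(key_b64):
    """A Fernet key is 32 bytes: 16-byte signing key followed by 16-byte encryption key."""
    key = base64.urlsafe_b64decode(key_b64)
    if len(key) != 32:
        raise ValueError("Fernet keys are 32 bytes")
    return key[:16], key[16:]


def frame_token(key_b64, iv, ciphertext, timestamp):
    """Assemble version | timestamp | IV | ciphertext, append the HMAC, base64url-encode."""
    signing_key, _ = split_key(key_b64)
    body = VERSION + struct.pack(">Q", timestamp) + iv + ciphertext
    mac = hmac.new(signing_key, body, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(body + mac)


def verify_and_parse(key_b64, token):
    """Check the HMAC before touching the ciphertext; return (timestamp, iv, ciphertext)."""
    signing_key, _ = split_key(key_b64)
    raw = base64.urlsafe_b64decode(token)
    body, mac = raw[:-32], raw[-32:]
    expected = hmac.new(signing_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):  # constant-time comparison
        raise ValueError("invalid token: HMAC mismatch")
    if body[:1] != VERSION:
        raise ValueError("unsupported token version")
    (timestamp,) = struct.unpack(">Q", body[1:9])
    return timestamp, body[9:25], body[25:]
```

Flipping a single bit anywhere in the token makes `verify_and_parse` raise before any decryption is attempted. This is what makes Fernet tokens tamper-evident.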
The Fernet encryption process thus combines symmetric encryption with message
authentication codes to ensure that the data remains secure and has not been
tampered with during transmission.
Distributed Server System:
Fig 6.5 Split up Server Architecture
This project also uses split-up servers to hold the users' protected data. The
data is divided into separate parts and each part is stored on a different
server, so an attacker who compromises one server does not obtain all of the
data at once. Each server holds one share of the encrypted data (the Fernet
token) and one share of the key. The shares from all servers are concatenated to
reconstitute the token and key, and the token is then decrypted to recover the
original data.
The split-up servers were designed to work in parallel, which means that they
can process the data simultaneously and independently of each other. This enables
the system to handle a high volume of data and provide fast and accurate results.
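The `checkpass` function later in this section contacts the four servers one after another. As a sketch of the parallel querying described here, the standard-library `ThreadPoolExecutor` can send all requests at once while keeping the replies in server order, which matters because the shares must be re-joined in sequence.

The `fetch` callable is an assumption. In the project it would wrap a request such as `lambda url: requests.get(url, data=a1).text`.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_shares(urls, fetch):
    """Query every share server concurrently; pool.map keeps results in the order of urls."""
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        replies = list(pool.map(fetch, urls))
    password_parts, key_parts = [], []
    for reply in replies:
        # each server answers "<password-share> <key-share>", as in checkpass()
        password_share, key_share = reply.split(" ")
        password_parts.append(password_share)
        key_parts.append(key_share)
    return "".join(password_parts), "".join(key_parts)
```

The joined strings are the Fernet token and key, ready to be encoded and passed to `Fernet(key).decrypt(token)`.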
Fig 6.6 Servers maintained in Nodejs Servers
Overall, the split-up servers play a critical role in the loan application
approval system, ensuring that the data is processed efficiently, accurately, and
securely.
Database Storage:
Fig 6.7 MySQL storage value with hashed process
At the end of the process, the hashed data is stored in a MySQL database.
The database will contain the hashed version of the original data and keys, making
it difficult for hackers to access the data. The database can also be queried to
extract specific information about loan applications.
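import requests
from cryptography.fernet import Fernet, InvalidToken

def checkpass(a1, b):
    # a1: request data identifying the user (e.g. the e-mail address)
    # b:  plain-text password entered at login
    password = []
    key = []
    # The four Node.js share servers, in the same order used when storing shares
    a = ["http://localhost:8084/", "http://localhost:8081/",
         "http://localhost:8082/", "http://localhost:8083/"]
    for i in a:
        x = requests.get(i, data=a1)
        # Each server replies "<password-share> <key-share>"
        r = x.text.split(" ")
        password.append(r[0])
        key.append(r[1])
    # Re-assemble the Fernet token and key from the four shares
    password = "".join(password).encode()
    key = "".join(key).encode()
    try:
        rp = Fernet(key).decrypt(password).decode()
    except (InvalidToken, ValueError):
        # Tampered token, or shares that do not form a valid key
        return 0
    return 1 if rp == b else 0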
Performance metrics:
The accuracy score of 85.7% indicates that the loan application approval
system has a relatively high level of accuracy in predicting loan approvals.
However, it is important to evaluate the system's performance using other metrics
as well.
One useful metric to consider is precision, which measures the proportion of
predicted loan approvals that are actually correct. This is important because false
positive predictions (incorrect approvals) can result in financial losses for the
lender. To calculate precision, we divide the number of true positives (correct
approvals) by the sum of true positives and false positives (incorrect approvals).
Fig 6.8 Accuracy Score, Recall, Precision, F1 Scores
In addition, we can plot a receiver operating characteristic (ROC) curve to
evaluate the model's performance at different threshold values. The area under the
curve (AUC) can then be calculated to provide an overall measure of the model's
performance. Overall, while the accuracy score of 85.7% is a good starting point,
it is important to evaluate the system's performance using a range of performance
metrics to get a more complete picture of its strengths and weaknesses.
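The ROC curve and its AUC can be computed without a plotting library. The curve is a list of (false-positive rate, true-positive rate) points, one per threshold, obtained by lowering the threshold through each distinct predicted score. The AUC equals the probability that a randomly chosen approved applicant receives a higher score than a randomly chosen rejected one, with ties counted as one half. This pairwise count gives the same value as the area under that curve.

A minimal sketch, assuming labels of 1 for approved and 0 for rejected, with scores such as predicted probabilities:

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs from lowering the threshold through every distinct score."""
    P = sum(1 for y in y_true if y == 1)
    N = len(y_true) - P  # both P and N must be non-zero
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / N, tp / P))
    return points


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A model that ranks every approved applicant above every rejected one scores an AUC of 1.0, and random ranking scores about 0.5. Unlike accuracy, AUC does not depend on a single chosen threshold.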
CHAPTER 7
SYSTEM TESTING
7.1 TEST PLAN
The purpose of this test plan is to describe the approach and procedures that will
be used to test the exploratory data analysis in loan applicant approval, using
the Fernet cryptography process with distributed servers that hold the split
encrypted data.
Functional Testing:
• Testing encryption and decryption process using Fernet.
• Testing the splitting of data across the four servers.
• Testing the communication between the servers to ensure that data is
transmitted correctly.
• Testing the ability of the system to handle different types of loan
applications.
Security Testing:
• Testing the authentication process to ensure that only authorized users can
access the system.
• Testing the authorization process to ensure that users have the appropriate
level of access to data.
• Testing the encryption process to ensure that all data is encrypted and
stored securely.
• Testing the system for potential vulnerabilities or weaknesses.
Performance Testing:
• Testing the system's response time under different load conditions.
• Testing the system's ability to handle a high volume of loan applications.
• Testing the system's ability to handle multiple users simultaneously.
7.2 TEST CASES
A loan application is checked for one of two outcomes:
• Eligible
• Not eligible
The outcome is predicted by the model trained and tested for this application,
using the details entered by the user. These details include:
• Gender
• Status
• Dependants
• Education
• Employ
• Income
• Co-income (additional income)
• Loan amount
• Loan amount term (in days)
• Credit history
• Aadhar/PAN Verification
• Property area (type of location)
Regression testing: the system's existing functionality is tested to ensure that
it has not been affected by any changes.
Test Cases in this Lending Application:
Test Case 1: Functional Testing – Secured Data Storage.
Input : Loan application’s input data with various attributes such as
loan term, annual income, age, and credit score.
Expected Output : The data should be successfully stored in the database with
all the attributes and values retained.
Actual Output : The data was successfully stored in the database with all the
attributes and values retained, and can be retrieved for
further analysis.
Test Case 2: Performance Testing - Prediction of Result using XGBoost
Input : A set of loan application records with various attributes such
as age, income, and credit score.
Expected Output : The XGBoost model should predict the loan approval status
of each application accurately based on the input attributes.
Actual Output : The XGBoost model predicted the loan approval status of
each application accurately based on the input attributes, and
the predictions can be used to inform lending decisions.
Test Case 3: Security Testing - Distributed Server for Data Transaction
Input : A set of loan application records stored in a distributed server
environment with multiple nodes.
Expected Output : The data should be able to be transferred seamlessly between
the nodes and the main database with no loss or corruption
of data.
Actual Output : The data was transferred seamlessly between the nodes and
the main database with no loss or corruption of data,
ensuring the integrity and security of the loan application
data.
CHAPTER 8
SIMULATION RESULTS
The simulation results of this project indicate that the implemented loan approval
system has a high accuracy rate in predicting the loan status of loan applications.
The exploratory data analysis conducted on the loan application data showed that
the majority of approved applications had a higher income, lower debt-to-income
ratio, and a longer credit history compared to the rejected loan applications.
8.1 DATA COLLECTION AND PACKAGE IMPORTING
The data collection process for this loan application project involved gathering
information from various sources, including public datasets, private financial
institutions, and individual loan applicants. We collected data on various factors
such as employment status, income, credit history, loan amount requested, and loan
purpose.
Fig 8.1 Importing Essential Libraries
Additionally, we ensured compliance with relevant data privacy and security
regulations to protect the confidentiality and integrity of the collected data.
Overall, the data collection process was crucial in building a robust and reliable
loan application system that can accurately assess creditworthiness and support
fair lending practices.
8.2 DATA STORED
The data stored in the MySQL database after the user data has been split into
four key shares and four ciphertext shares is crucial for the success of the
project. Splitting the user data in this way protects it from unauthorized access
or manipulation. The use of Fernet cryptography further enhances the security of
the data by encrypting the plain-text user data and generating a secret key for
decryption. This approach ensures that only authorized users can access and
analyze the user data.
Fig 8.2 Hashed Value Pair of plain text are stored in MySQL
Fig 8.3 Key values of the Cipher Text
Fig 8.4 Four maintained NodeJs ports using Apache server
Storing the split user data in the MySQL database also keeps the data easily
accessible for analysis, and the Node.js servers make data transfer and storage
more efficient.
8.3 UI DEVELOPMENT (FLASK)
Fig 8.5 Login/Register Page Using cryptography fernet
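Fernet is a symmetric, authenticated encryption scheme: the same secret key
encrypts and decrypts the data, and an HMAC tag detects any tampering. Fernet does
not derive its key from a user-supplied passphrase. In this project, a new random
key is generated with Fernet.generate_key() each time a password is encrypted.
Fernet is secure, easy to use, and commonly used in secure communication and
authentication systems.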
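Fernet keys are 32 random bytes in URL-safe base64. When a key has to come from a passphrase instead, a separate key-derivation function is applied first.

A minimal standard-library sketch uses PBKDF2-HMAC-SHA256 and encodes the result in Fernet's key format. The function name is illustrative. The 600,000-iteration default is an assumption in line with commonly cited guidance for PBKDF2-HMAC-SHA256.

```python
import base64
import hashlib


def key_from_passphrase(passphrase: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a Fernet-format key (32 bytes, URL-safe base64) with PBKDF2-HMAC-SHA256."""
    raw = hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, iterations, dklen=32)
    return base64.urlsafe_b64encode(raw)
```

A random salt (for example `os.urandom(16)`) would be generated once per user and stored next to the user record. The same passphrase and salt always give the same key, which could then be passed to `Fernet(...)`.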
Fig 8.6 User Interface using HTML, CSS, Js, Session
The landing account page is the first page that the user sees after successfully
logging in.
The loan application form developed for this project (Fig 8.7) uses the XGBoost
algorithm to predict the likelihood of loan approval. The form captures factors
such as the applicant's income, credit score, loan amount, and loan term to
generate the prediction, which can help financial institutions make informed
decisions about loan approvals.
Fig 8.7 Loan Application Form
8.4 LOAN APPLICATION STATUS (APPROVED OR REJECTED)
The application status page is designed to display the loan application status
to the user. The pickled XGBoost model generates the loan prediction: it analyzes
the loan application data provided by the user and returns a prediction of
whether the loan is likely to be approved or rejected.
Fig 8.8 Application Status
The application status page displays this prediction to the user along with a
message indicating whether their loan application has been approved or rejected
based on the prediction. This page also displays other relevant information such as
the loan amount, interest rate, and repayment term.
8.5 PERFORMANCE METRICS
The accuracy score of 85.7% indicates that the loan application approval
system has a relatively high level of accuracy in predicting loan approvals.
However, it is important to evaluate the system's performance using other metrics
as well.
One useful metric to consider is precision, which measures the proportion of
predicted loan approvals that are actually correct. This is important because false
positive predictions (incorrect approvals) can result in financial losses for the
lender. To calculate precision, we divide the number of true positives (correct
approvals) by the sum of true positives and false positives (incorrect approvals).
Fig 8.9 Model Evaluation of test data:
Overall, the accuracy score of 85.7% is a good starting point. The system's performance
should still be evaluated with a range of metrics, such as precision, recall, and F1
score, to get a more complete picture of its strengths and weaknesses.
8.6 MAIL SENDER USING SMTP AND MIME
Fig 8.10 Email Loan Application Status
SMTP (Simple Mail Transfer Protocol) is a protocol used to send email messages from
clients to mail servers and between mail servers. It works through a series of text
commands (such as MAIL FROM, RCPT TO and DATA) and numbered replies. These commands and
replies transfer a message from the sender's email server to the recipient's email server.
SMTP is widely used for sending email messages over the Internet.
MIME (Multipurpose Internet Mail Extensions) is an Internet standard that
extends the format of email messages to support text in character sets other than
ASCII, as well as attachments of audio, video, images, and application programs.
MIME allows email messages to contain multiple parts, with different content
types and encoding methods.
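The project's mail module (Fig 8.11) sends a single text/plain part built with MIMEText. The sketch below illustrates the multi-part structure described above by building a multipart/alternative message that carries plain-text and HTML versions of the status notice. The function name, addresses and wording are placeholders. The resulting message could be sent with server.sendmail(FROM, TO, msg.as_string()), as in the project's code.

```python
import html
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.utils import formataddr


def build_status_email(sender: str, recipient: str, name: str,
                       approved: bool) -> MIMEMultipart:
    """Build a multipart/alternative message with plain-text and HTML parts."""
    status = "approved" if approved else "rejected"
    msg = MIMEMultipart("alternative")
    msg["Subject"] = "Loan Application Status-ONLINE BANKING"
    msg["From"] = formataddr(("Banking.site", sender))
    msg["To"] = recipient
    text_body = f"Dear {name},\n\nYour loan application has been {status}.\n"
    html_body = (f"<p>Dear {html.escape(name)},</p>"
                 f"<p>Your loan application has been <b>{status}</b>.</p>")
    # Each part gets its own Content-Type header; the UTF-8 charset allows
    # non-ASCII names. Mail clients prefer the last alternative they can display.
    msg.attach(MIMEText(text_body, "plain", "utf-8"))
    msg.attach(MIMEText(html_body, "html", "utf-8"))
    return msg


msg = build_status_email("bank@example.com", "user@example.com", "Priyā", True)
print(msg.as_string()[:200])
```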
Fig 8.11 Mail Sending Module Using SMTP Protocol
In the given code, the smtplib library connects to the SMTP server on port 587 and
upgrades the connection to TLS with starttls(). The MIMEText class creates an email
message with a subject, body, and sender/recipient addresses. The msg.as_string()
method converts the message to a string that can be sent over the SMTP connection.
Finally, the server.sendmail() method sends the email from the sender's email address
to the recipient's email address.
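import os
import smtplib
from email.mime.text import MIMEText
from email.utils import formataddr
from flask import session

def S_mail(m):
    # SMTP server settings
    SMTP_SERVER = 'smtp.gmail.com'
    SMTP_PORT = 587  # mail submission port; STARTTLS upgrades it to TLS
    # Sender and recipient email addresses
    FROM = 'jeganjega807@gmail.com'
    TO = m
    # Email message: the body text was stored in the session by the prediction route
    msg = MIMEText(session['body'])
    msg['Subject'] = 'Loan Application Status-ONLINE BANKING'
    msg['From'] = formataddr(('Banking.site', FROM))
    msg['To'] = m
    # Connect to the SMTP server and switch the connection to TLS encryption
    with smtplib.SMTP(SMTP_SERVER, SMTP_PORT) as server:
        server.starttls()  # Enable TLS encryption
        # Read the email account (app) password from the environment instead of
        # hard-coding it; SMTP_PASSWORD is an assumed variable name
        server.login(FROM, os.environ['SMTP_PASSWORD'])
        # Send the email
        server.sendmail(FROM, TO, msg.as_string())
    return 'success'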
8.7 AADHAR/PAN VERIFICATION
Fig 8.12 Checking Aadhar from Database
Aadhaar and PAN are two identification documents used in India for various
purposes, including financial transactions. Aadhaar is a 12-digit unique identity
number issued by the Unique Identification Authority of India (UIDAI) to
residents of India, while PAN (Permanent Account Number) is a ten-digit
alphanumeric number issued by the Income Tax Department.
Aadhaar verification involves checking whether the Aadhaar number entered
by the user is valid and matches the details of the person. In a production system,
this check would go through the verification services provided by UIDAI. In this
project, the number is checked against the aadhar table of a local MySQL database
(Fig 8.12).
Fig 8.13 Invalid Aadhar Intimation
PAN verification involves checking whether the PAN number entered by the
user is valid and matches the details of the person. This can be done by accessing
the Income Tax Department database and verifying the details using the PAN
API. Verification of Aadhaar and PAN can help prevent fraud and ensure that
only legitimate users are allowed to access financial services.
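Before any database or API lookup, the input can be checked for the right shape. The sketch below assumes the commonly documented formats. An Aadhaar number has 12 digits, its last digit is a Verhoeff check digit, and its first digit is not 0 or 1. A PAN has five letters, four digits and one letter. This check only rejects malformed input. It cannot confirm that a number exists or belongs to the applicant. That still requires the issuing authority's verification service or, as in this project, the database lookup. All function names are hypothetical.

```python
import re

# Verhoeff tables: multiplication in the dihedral group D5, the position
# permutation, and the inverse of each element
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
_INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


def verhoeff_valid(digits: str) -> bool:
    """True if the last digit of `digits` is a correct Verhoeff check digit."""
    c = 0
    for i, ch in enumerate(reversed(digits)):
        c = _D[c][_P[i % 8][int(ch)]]
    return c == 0


def verhoeff_check_digit(digits: str) -> str:
    """Compute the check digit to append to `digits`."""
    c = 0
    for i, ch in enumerate(reversed(digits)):
        c = _D[c][_P[(i + 1) % 8][int(ch)]]
    return str(_INV[c])


def is_valid_aadhaar_format(number: str) -> bool:
    digits = number.replace(" ", "")
    # 12 digits, first digit 2-9 (commonly cited rule), valid Verhoeff checksum
    return re.fullmatch(r"[2-9][0-9]{11}", digits) is not None and verhoeff_valid(digits)


def is_valid_pan_format(pan: str) -> bool:
    # Five letters, four digits, one letter, for example "ABCDE1234F"
    return re.fullmatch(r"[A-Z]{5}[0-9]{4}[A-Z]", pan.strip().upper()) is not None
```

The Verhoeff checksum detects every single-digit typing error and every swap of adjacent digits. This makes it a cheap first filter before the database query.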
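import mysql.connector

def checkaadhar(a):
    # Look up the Aadhaar number in the local aadhar table
    mydb = mysql.connector.connect(
        host="localhost",
        user="root",
        password="",
        database="data_protection"
    )
    try:
        mycursor = mydb.cursor()
        # Parameterized query: the driver passes `a` as data, so input such as
        # "x' OR '1'='1" cannot change the SQL statement
        mycursor.execute("SELECT * FROM aadhar WHERE adhrno = %s", (a,))
        resu = mycursor.fetchall()
    finally:
        mydb.close()
    return 1 if len(resu) != 0 else 0

# Determine the output (inside the Flask route that handles the application form)
if checkaadhar(adhrno) != 1:
    return render_template('application.html',
                           error="Invalid Aadhar card number")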
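If the SQL statement is built by concatenating the submitted number, a crafted input can change what the query matches. An invalid number could then pass the Aadhaar check. The sketch below demonstrates this with Python's built-in sqlite3 module as a stand-in for MySQL. In mysql.connector the placeholder is %s instead of ?. The table contents and function names are hypothetical.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the project's MySQL aadhar table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE aadhar (adhrno TEXT)")
    conn.execute("INSERT INTO aadhar VALUES ('234567890123')")  # made-up demo row
    return conn


def check_unsafe(conn: sqlite3.Connection, a: str) -> int:
    # Input is pasted into the SQL text, so quotes in `a` can alter the query
    rows = conn.execute("SELECT * FROM aadhar WHERE adhrno='" + a + "'").fetchall()
    return 1 if rows else 0


def check_safe(conn: sqlite3.Connection, a: str) -> int:
    # Input is sent separately as a bound parameter and is never parsed as SQL
    rows = conn.execute("SELECT * FROM aadhar WHERE adhrno = ?", (a,)).fetchall()
    return 1 if rows else 0


conn = make_demo_db()
attack = "x' OR '1'='1"
print(check_unsafe(conn, attack))  # 1: the injected OR clause matches every row
print(check_safe(conn, attack))    # 0: no stored number equals that literal string
```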
CHAPTER 9
APPENDICES
9.1 SAMPLE CODE
======================Mainserver.py=======================
from flask import Flask,redirect,url_for,request,jsonify, render_template,session
from email.utils import formataddr
# Data manipulation
import pandas as pd
# Matrices manipulation
import numpy as np
import smtplib
from email.mime.text import MIMEText
from cryptography.fernet import Fernet
import mysql.connector
app = Flask(__name__)
app.secret_key = 'xsdhrtsrdj56s5rn7snsr67s'
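def sample(e,r1):
    # Addresses of the four Node.js storage servers
    a = ["http://localhost:8084/", "http://localhost:8081/",
         "http://localhost:8082/", "http://localhost:8083/"]
    # ... (remainder of this function is not shown in the excerpt)
def datainsert(a):
    mydb = mysql.connector.connect(
        # ... (connection settings are not shown in the excerpt)
    )
    mycursor = mydb.cursor()
    # Create an empty row for the new user; the password/key columns are
    # filled in later by the storage servers
    sql = ("INSERT INTO data "
           "(emailid,password1,password2,password3,password4,key1,key2,key3,key4) "
           "VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s)")
    val = (a, "", "", "", "", "", "", "", "")
    mycursor.execute(sql, val)
    mydb.commit()
# Fragment: encrypt a message with a Fernet key (key and message are
# defined in code not shown in the excerpt)
fernet = Fernet(key)
encMessage = fernet.encrypt(message.encode())
print(encMessage)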
@app.route('/', methods=["POST", "GET"])
def hello_world():
    if request.method == "POST":
        user = request.form['emails']
        user1 = request.form['passwords']
        if checkemail(user) == 0:
            # New user: store the credentials and start a session
            sample(user, user1)
            session['id'] = user
            return render_template("index.html")
        elif checkemail(user) == 1:
            # Email already registered
            return redirect("http://localhost/FinalYR/index.php?signid=alr")
    else:
        return "success"
@app.route('/login', methods=["POST", "GET"])
def hello_world1():
    if request.method == "POST":
        user = request.form['email']
        user1 = request.form['password']
        # ... (login check and construction of df, name and adhrno from the
        #      application form are not shown in the excerpt)
        result = ValuePredictor(data=df)
        # Determine the output
        if checkaadhar(adhrno) != 1:
            return render_template('application.html', error="Invalid Aadhar card number")
        elif int(result) == 1 and checkaadhar(adhrno) == 1:
            prediction = 'Dear Mr/Mrs/Ms {name}, your loan is approved!'.format(name=name)
            session['predict'] = prediction
            body = ("Dear {name},\n\nWe are pleased to inform you that your loan application "
                    "has been approved! We understand the importance of the financial support that you "
                    "require and we are thrilled to be able to help.\n\n once again and we wish you all "
                    "the best for your future endeavors!\n\n").format(name=name)
            session['body'] = body
        else:
            prediction = 'Sorry Mr/Mrs/Ms {name}, your loan is rejected!'.format(name=name)
            body = ("Dear {name},\n\nWe are writing to inform you of the status of your loan "
                    "application. After careful consideration and review of your application, we regret "
                    "to inform you that your loan application has been rejected. We wish you the best "
                    "of luck in your financial endeavors.\nSincerely,\nOnline Banking").format(name=name)
            session['body'] = body
        # Email the decision to the address stored in the session
        S_mail(session['id'])
        # redirect(url_for('/sent'))
        # Return the prediction
        return render_template('prediction.html', prediction=prediction)
        # return redirect(url_for('S_mail')), render_template('prediction.html', prediction=prediction)
        # return (redirect(url_for('/sent')), render_template('prediction.html', prediction=prediction))
    # Something error
    else:
        # Return error (no prediction exists for a non-POST request)
        return render_template('error.html')

if __name__ == '__main__':
    app.run(debug=True)
========================Index.html=======================
<!doctype html>
<html lang="en">
<head>
<!-- Required meta tags -->
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>Online Banking | Cryptography Fernet </title>
<link
href="//fonts.googleapis.com/css2?family=Kumbh+Sans:wght@300;400;700&display=swap" rel="stylesheet">
<link rel="stylesheet" href="https://kit.fontawesome.com/7745b6ed41.css"
crossorigin="anonymous">
<link rel="stylesheet" href="path/to/font-awesome/css/font-awesome.min.css">
<link rel="icon" type="image/x-icon" href="/images/favicon.ico">
<i style="color:#614da7" class='fas fa-piggy-bank'></i>
<!-- Template CSS -->
<link rel="stylesheet" href="/static/assets/css/style-starter.css">
</head>
<body>
<!--header-->
<header id="site-header" class="fixed-top">
<div class="container">
<nav class="navbar navbar-expand-lg stroke px-0">
<h1> <a class="navbar-brand" href="landing.html">
<i class="fa fa-briefcase" aria-hidden="true" style="padding: 0 10px 0 0;"></i>Online Banking
</a></h1>
<!-- if logo is image enable this
<a class="navbar-brand" href="#index.html">
<img src="image-path" alt="Your logo" title="Your logo" style="height:35px;" />
</a> -->
<p class="mt-md-4 mt-3">Our Bank is the best option if you are looking for
high-quality and reliable banking services. We provide reliable services for
you
</p><a class="btn btn-style btn-primary mt-sm-5 mt-4 mr-2" style="border-
Read More</a>
</div>
<div class="col-lg-5 col-md-8 img offset-lg-1 mt-lg-0 mt-4">
<img src="/static/assets/images/Terms.png" alt="img"
class="img-fluid radius-image-curve" />
</div></div></div></div></div></li></div></div></div></section>
==========Server1.js, Server2.js, Server3.js & Server4.js ==========
var http = require('http');
var mysql = require('mysql2');
const axios = require('axios');
// ... (creation of the MySQL connection `con` with mysql.createConnection()
//      is not shown in the excerpt)
con.connect(function (err) {
  if (err) {
    throw err;
  } else {
    console.log("Connected!");
  }
});
http.createServer(function (req, res) {
  let data = '';
  // Collect the request body, then handle it once it has fully arrived
  req.on('data', chunk => {
    data += chunk;
  });
  req.on('end', () => {
    if (req.method == "POST") {
      // Body is URL-encoded: email=...&password=...&key=...
      // URLSearchParams decodes every percent-escape, not only %40 and %3D
      const params = new URLSearchParams(String(data));
      const e = params.get('email');
      const s2 = params.get('password');
      const s1 = params.get('key');
      console.log(String(data));
      // Placeholders (?) keep the submitted values from being parsed as SQL
      const sql = "UPDATE data SET password1 = ?, key1 = ? WHERE emailid = ?";
      con.query(sql, [s2, s1, e], function (err, result) {
        if (err) throw err;
        console.log(result.affectedRows + " record(s) updated");
      });
      res.end("sucess");
    } else if (req.method == "GET") {
      // The request body carries the email address to look up
      const str = String(data);
      console.log(str);
      con.query("SELECT password1, key1 FROM data WHERE emailid = ?", [str],
        function (err, result, fields) {
          if (err) throw err;
          if (result.length != 0) {
            res.end(String(result[0].password1) + " " + String(result[0].key1));
          } else {
            res.end("fail");
          }
        });
    }
  });
}).listen(8084);
=====================Application.html======================
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<title>Loan Approval Prediction</title>
<!-- Font Icon -->
<link rel="stylesheet" href="../static/fonts/material-icon/css/material-design-iconic-font.min.css">
<!-- Main css -->
<link rel="stylesheet" href="../static/styles/styleloan.css">
</head>
<body>
<!-- Aadhar card Details -->
<div class="form-group">
<label for="adhrno"><img src="/static/assets/images/aadharIco.png" width="16px" alt=""></label>
<input type="number" name="adhrno" id="adhrno" placeholder="Enter Your Aadhar Card No" required/><!--pattern="[0-9]{12}"-->
{% if error %}
<script>
alert('{{ error }}');
</script>
{% endif %}
</div>
<!-- Birthdate -->
<div class="form-group">
<label for="birthdate"><i class="zmdi zmdi-calendar"></i></label>
<input type="date" name="birthdate" id="birthdate" placeholder="Your Birthdate" required/>
</div>
<!-- Applicant Income per Month -->
<div class="form-group">
<label for="applicant_income"> <img src="/static/assets/images/rupee-indian.png" width="12px" alt=""></label>
<input type="number" min="0" name="applicant_income"
placeholder="Applicant Income per Month (INR)" required/>
</div>
<!-- Co-Applicant Income per Month -->
<div class="form-group">
<label for="coapplicant_income"> <img src="/static/assets/images/rupee-indian.png" width="12px" alt=""></label>
<input type="number" min="0" name="coapplicant_income"
placeholder="Co-Applicant Income per Month (INR)" required/> </div>
<h3>Loan and Credit Description</h3>
<!-- Loan Amount -->
<div class="form-group">
<label for="loan_amount"> <img src="/static/assets/images/rupee-indian.png"
width="12px" alt=""></label>
<input type="number" min="0" name="loan_amount" placeholder="Your Loan Amount (INR)" required/>
</div>
<!-- Loan Amount Term -->
<div class="form-group">
<label for="loan_term"><i class="zmdi zmdi-calendar-check"></i></label>
<input type="number" min="0" name="loan_term" placeholder="Your Loan Term (days)" required/>
</div>
<!-- Credit History -->
<div class="form-group">
<label for="credit_history"> <img src="/static/assets/images/rupee-indian.png"
width="12px" alt=""></label>
<select name="credit_history" id="credit_history" placeholder="Your Credit History" required>
<option value="" disabled selected>Your Credit History</option>
<option value="1">All Debts Paid</option>
<option value="0">Not paid</option>
</select>
</div>
<div class="signup-image">
<figure><img src="../static/images/Loan-Home.jpg" alt="Loan-Home image"></figure>
</div></div></div></section></div>{% if error %}
<script>alert('{{ error }}');</script>
{% endif %}
</body>
</html>
9.2 SCREENSHOTS
9.2.1 Login/Register Page
Fig 9.1 Login/Register Page Using Cryptography Fernet
9.2.2 Online Banking Website
Fig 9.2 Online Banking Website
9.2.3 Application Status Interface
Fig 9.3. Application Status Interface with Flask
9.2.4 Performance Metrics
Fig 9.4. Performance Metrics
9.2.5 Fitting the Model with Train and Test data
Fig 9.5. Fitting the Model with Train and Test data
9.2.6 Loan Status by Education
Fig 9.6. Composition of Loan Status by Education
9.2.7 Loan Status by dependents
Fig 9.7. Composition of Loan Status by dependents
9.2.8 Loan Application Interface
Fig 9.8. Loan Application Interface
CHAPTER 10
CONCLUSION AND FUTURE ENHANCEMENT
10.1 CONCLUSION
In conclusion, the economic growth of a country depends heavily on the
efficiency and effectiveness of its banking system, and the ability to provide secure
and reliable financial transactions is a significant asset to its development.
Modern technologies used in this project, namely cryptography and machine learning
algorithms, allow the banking system to protect sensitive information and improve
the accuracy of loan application approval predictions. This project demonstrates the
potential for innovation in developing secure and efficient financial systems, which
ultimately contribute to the growth and development of a country.
10.2 FUTURE ENHANCEMENT
Integration with additional data sources: Currently, the system relies only on the
data provided in the input Excel sheet. Additional data sources, such as credit scores
or employment history, could be incorporated to improve the accuracy of the model.
Dynamic threshold adjustment: The current threshold for loan approval is fixed at 0.5.
In practice, it may be beneficial to adjust this threshold dynamically based on
factors such as the current economic climate or the financial health of the lending
institution.
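The approval threshold turns a predicted probability of approval into a decision. One way to make it adjustable assumes the model exposes probabilities. XGBoost's scikit-learn interface provides predict_proba, and the second column gives P(approved) when class 1 means approved. The threshold can then be chosen on held-out validation data so that precision meets a target set by the lender. Tightening the target in an adverse economic climate raises the threshold. The function names and toy numbers below are illustrative only. They are not the project's results.

```python
from typing import List, Optional, Sequence


def decide(probabilities: Sequence[float], threshold: float = 0.5) -> List[int]:
    """1 = approve when P(approved) >= threshold, else 0 = reject."""
    return [1 if p >= threshold else 0 for p in probabilities]


def pick_threshold(probabilities: Sequence[float], labels: Sequence[int],
                   target_precision: float) -> Optional[float]:
    """Lowest threshold whose validation precision reaches the target.

    A lower threshold approves more applicants (higher recall), so the first
    candidate that meets the precision target approves the most applicants
    while keeping incorrect approvals within the lender's tolerance.
    """
    for t in sorted(set(probabilities)):
        # Actual outcomes (1 = repaid/approved) of everyone approved at threshold t
        approved = [y for p, y in zip(probabilities, labels) if p >= t]
        if approved and sum(approved) / len(approved) >= target_precision:
            return t
    return None  # no threshold reaches the target on this data


# Toy validation data: model probabilities and actual outcomes
val_probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
val_labels = [1, 1, 0, 1, 0, 0]
t = pick_threshold(val_probs, val_labels, target_precision=0.75)
print(t)                     # 0.6
print(decide(val_probs, t))  # [1, 1, 1, 1, 0, 0]
```

With a stricter target of 0.9, the same data yields a threshold of 0.8, so fewer applicants are approved.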
Multi-party encryption: While the current system uses Fernet cryptography
to encrypt user data, it could be enhanced to support multi-party encryption. This
would allow for the encryption of data by multiple parties, such as the lending
institution, the borrower, and a third-party intermediary.
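The text does not fix a particular scheme. As a simple illustration of requiring several parties, the sketch below swaps in n-of-n XOR secret sharing to split a key. Each party holds one share, and any set of shares short of all n reveals nothing about the key. The key, for example a Fernet key, can only be rebuilt when every share is combined. This is key splitting, not a full multi-party encryption protocol, and the function names are hypothetical.

```python
import secrets
from functools import reduce
from typing import List


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split_key(key: bytes, n: int) -> List[bytes]:
    """Split `key` into n shares; all n are needed to rebuild it."""
    if n < 2:
        raise ValueError("need at least two parties")
    # n-1 shares are pure randomness; the last share is key XOR all of them
    shares = [secrets.token_bytes(len(key)) for _ in range(n - 1)]
    shares.append(reduce(_xor, shares, key))
    return shares


def combine_shares(shares: List[bytes]) -> bytes:
    """XOR all shares together to recover the original key."""
    if len(shares) < 2 or len({len(s) for s in shares}) != 1:
        raise ValueError("need at least two shares of equal length")
    return reduce(_xor, shares)


key = secrets.token_bytes(32)   # stand-in for the raw key material
parts = split_key(key, 3)       # e.g. lender, borrower, intermediary
print(combine_shares(parts) == key)
```

Because every share except the last is uniformly random, a single party's share carries no information about the key on its own.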
REFERENCES
1. Anand, Mayank, Arun Velu, and Pawan Whig. "Prediction of loan behavior
with machine learning models for secure banking." Journal of Computer Science
and Engineering (JCSE) 3.1 (2022): 1-13.
2. Tumuluru, Praveen, et al. "Comparative Analysis of Customer Loan Approval
Prediction using Machine Learning Algorithms." 2022 Second International
Conference on Artificial Intelligence and Smart Energy (ICAIS). IEEE, 2022.
3. Adebiyi, Marion O., et al. "Secured Loan Prediction System Using Artificial
Neural Network." Journal Of Engineering Science and Technology 17.2 (2022):
0854-0873.
4. Kadam, Ashwini S., et al. "Prediction for loan approval using machine learning
algorithm." International Research Journal of Engineering and Technology
(IRJET) 8.04 (2021).
5. Madaan, Mehul, et al. "Loan default prediction using decision trees and
random forest: A comparative study." IOP Conference Series: Materials Science
and Engineering. Vol. 1022. No. 1. IOP Publishing, 2021.
6. Singh, Vishal, et al. "Prediction of modernized loan approval system based on
machine learning approach." 2021 International Conference on Intelligent
Technologies (CONIT). IEEE, 2021.
7. Tejaswini, J., et al. "Accurate loan approval prediction based on machine
learning approach." Journal of Engineering Science 11.4 (2020): 523-532.
8. Shingi, Geet. "A federated learning based approach for loan defaults
prediction." 2020 International Conference on Data Mining Workshops (ICDMW).
IEEE, 2020.
9. Ayushman Yadav and Vishal Singh, “Prediction of Modernized Loan approval
System Based on Machine Learning Approach” IEEE, 2021.
10. Mohammad J. Hamayel and Mohammad More, “Improvement of personal
loans granting methods in banks using machine learning methods and approaches
in Palestine”, IEEE, 2021.
11. Loan Approval Prediction using Machine Learning Algorithms Approach.
2021 [Ebook]. Retrieved from https://ijirt.org/master/publishedpaper/IJIRT151769_PAPER.pdf.
12. Anshika Gupta and Vinay Pant, “Bank Loan Prediction System using Machine
Learning”, IEEE 2020.
13. M. A. Sheikh, A. K. Goel and T. Kumar, "An Approach for Prediction of
Loan Approval using Machine Learning Algorithm," 2020 International
Conference on Electronics and Sustainable Communication Systems (ICESC), pp.
490-494, 2020.
14. P. K. Bansal, A. Gupta, S. Kumar and V. Pant, "Bank Loan Prediction System
using Machine Learning", IEEE 9th International Conference System Modeling
and Advancement in Research Trends, pp. 423-426, December 2020.
15. Tejaswini, J., et al. "Accurate loan approval prediction based on machine
learning approach." Journal of Engineering Science vol. 11, no.4, pp. 523-532.
2020.
PUBLICATION
JEGAN S, NAVEEN V, VIJAYABARATH D, Mr.K.KARTHICK,
‘EXPLORATORY DATA ANALYSIS IN LOAN APPLICANT
APPROVAL’, in the Third International Conference on Artificial Intelligence,
5G Communications and Network Technologies (ICA5NT 2023) at
VELAMMAL INSTITUTE OF TECHNOLOGY held on 23rd and 24th of March 2023.

  • 1. i EXPLORATORY DATA ANALYSIS IN LOAN APPLICANT APPROVAL A PROJECT REPORT Submitted by JEGAN. S NAVEEN. V VIJAYA BARATH. D in partial fulfillment for the award of the degree of BACHELOR OF ENGINEERING in COMPUTER SCIENCE AND ENGINEERING KONGUNADU COLLEGE OF ENGINEERING AND TECHNOLOGY (AUTONOMOUS) ANNA UNIVERSITY :: CHENNAI 600 025 APRIL 2023
  • 2. ii KONGUNADU COLLEGE OF ENGINEERING AND TECHNOLOGY, (AUTONOMOUS) Tholurpatti (Po), Thottiam (Tk), Trichy (Dt) – 621 215 COLLEGE VISION & MISSION STATEMENT VISION “To become an Internationally Renowned Institution in Technical Education, Research and Development by Transforming the Students into Competent Professionals with Leadership Skills and Ethical Values.” MISSION • Providing the Best Resources and Infrastructure. • Creating Learner centric Environment and continuous –Learning. • Promoting Effective Links with Intellectuals and Industries. • Enriching Employability and Entrepreneurial Skills. • Adapting to Changes for Sustainable Development.
  • 3. iii COMPUTER SCIENCE AND ENGINEERING VISION & MISSION STATEMENT VISION To produce competent software professionals, academicians, researchers, and entrepreneurs with moral values through quality education in the field of Computer Science and Engineering. MISSION • Enrich the students' knowledge and computing skills through an innovative teaching- learning process with state- of- art- infrastructure facilities. • Endeavour the students to become an entrepreneur and employable through adequate industry-institute interaction. • Inculcating leadership skills, professional communication skills with moral and ethical values to serve the society and focus on students' overall development PROGRAM EDUCATIONAL OBJECTIVES (PEOs) PEO I: Graduates shall be professionals with expertise in the fields of Software Engineering, Networking, Data Mining, and Cloud Computing and shall undertake Software Development, Teaching and Research. PEO II: Graduates will analyze problems, design solutions, and develop programs with sound Domain Knowledge. PEO III: Graduates shall have professional ethics, team spirit, lifelong learning, good oral and written communication skills, and adopt the corporate culture, core values and leadership skills. PROGRAM SPECIFIC OUTCOMES (PSOs) PSO1: Professional skills: Students shall understand, analyze and develop computer applications in the field of Data Mining/Analytics, Cloud Computing, Networking, etc., to meet the requirements of industry and society. PSO2: Competency: Students shall qualify at the State, National, and International level competitive examination for employment, higher studies, and research.
  • 4. iv PROGRAM OUTCOMES (PO’s) 1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems. 2. Problem analysis: Identity, formulate, review research literature, and analyze complex engineering problems reaching substantiated conclusions using the first principles of mathematics, natural sciences, and engineering sciences. 3. Design/development of solutions: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for public health and safety, and the ural, societal, and environmental considerations. 4. Conduct investigations of complex problems: Use research-based knowledge and research methods including design of experiments, analysis, and interpretation of data, and synthesis of the information to provide valid conclusions. 5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modeling to complex engineering activities with an understanding of the limitations. 6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice. 7. Environment and sustainability: Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable development. 8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice. 9. Individual and teamwork: Function effectively as an individual, and as a member or leader in diverse teams, and multidisciplinary settings. 10. 
Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions. 11. Project management and finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one’s work, as a member and leader in a team, to manage projects and in multidisciplinary environments. 12. Life-Long Learning: Recognize the need for and have the preparation and ability to engage in independent and life-long learning in the broadest context of technological change.
  • 5. v KONGUNADU COLLEGE OF ENGINEERING AND TECHNOLOGY (AUTONOMOUS) ANNA UNIVERSITY : CHENNAI 600 025 BONAFIDE CERTIFICATE Certified that this project report “EXPLORATORY DATA ANALYSIS IN LOAN APPLICANT APPROVAL” is the bonafide work of “JEGAN S (621319104017), NAVEEN V (621319104037), VIJAYABARATH D (621319104063)” who carried out the project work under my supervision. SIGNATURE SIGNATURE Dr.C.Saravanabhavan, M.Tech., Ph.D., Mr.K.Karthick, M.E., (Ph.D.,) HEAD OF THE DEPARTMENT SUPERVISOR Assistant Professor, Department of Computer Science and Department of Computer Science Engineering, and Engineering, Kongunadu College of Engineering and Kongunadu College of Engineering Technology, Thottiam, Trichy. and Technology, Thottiam, Trichy. Submitted for the Project Viva-Voce examination held on Internal Examiner External Examiner
  • 6. vi ACKNOWLEDGEMENT We wish to express our sincere thanks to our beloved Chairman Dr.PSK.R.PERIASWAMY for providing immense facilities in our institution. We proudly render our thanks to our Principal Dr.R.ASOKAN, M.S., M.Tech., Ph.D., for the facilities and the encouragement was given by him to the progress and completion of our project. We proudly render our immense gratitude and sincere thanks to our Head of the Department of Computer Science and Engineering Dr.C.SARAVANABHAVAN, M.Tech., Ph.D., for his effective leadership, encouragement, and, guidance in the project. We are highly indebted to provide our heart full thanks to our supervisor Mr.K.KARTHICK, M.E., (Ph.D.,) for his valuable suggestion during execution of our project work and for continued encouragement in conveying us for making many constructive comments for improving comments the operation of this project report. We are highly indebted to provide our heart full thanks to our project coordinator Mr.K.KARTHICK, M.E., (Ph.D.,) for his valuable ideas, constant encouragement, and supportive guidance throughout the project. We wish to extend our sincere thanks to all teaching and non-teaching staff of the Computer Science and Engineering department for their valuable suggestion, cooperation, and encouragement in the successful completion of this project. We wish to acknowledge the help received from various departments and various individuals during the preparation and editing stages of the manuscript.
  • 7. vii ABSTRACT The loan lending process is the manual evaluation of loan applications which can lead to errors, discrimination, lack of transparency, and lack of fair lending practices. The Solution in computerized system typically includes a user-friendly interface for loan applicants to submit their information, and a set of algorithms and rules to assess the applicant's creditworthiness and ability to repay the loan. The system also integrates with external credit bureaus and financial institutions to retrieve additional information about the applicant. The system can quickly and accurately evaluate a large volume of loan applications, and provide a decision (approval or rejection) in real-time using XGBoost Algorithm. This can help lenders to reduce the cost and time for loan processing and also to decrease the risk of loan defaults. In addition to the encryption of data, the system also includes functionality to send approval/rejection emails to applicants. This helps to streamline the communication process and keep applicants informed of the status of their loan application. Overall, the combination of data encryption and automated email communication helps to improve the security and efficiency of the process and XGBoost Algorithm which results 85.7% Accuracy rate.
TABLE OF CONTENTS

ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
1. INTRODUCTION
   1.1 OVERVIEW
   1.2 OBJECTIVES AND GOALS
   1.3 APPLIED DATA SCIENCE (ADS)
   1.4 MACHINE LEARNING (ML)
2. LITERATURE SURVEY
3. SYSTEM ANALYSIS
   3.1 EXISTING SYSTEM
       3.1.1 Disadvantages
   3.2 PROPOSED SYSTEM
       3.2.1 Advantages
4. SYSTEM REQUIREMENTS
   4.1 HARDWARE REQUIREMENTS
   4.2 SOFTWARE REQUIREMENTS
5. SYSTEM DESIGN
   5.1 ARCHITECTURE DIAGRAM
6. SYSTEM IMPLEMENTATION
   6.1 MODULES
   6.2 MODULES DESCRIPTION
       6.2.1 Data Collection
       6.2.2 Data Preprocessing
       6.2.3 Data Visualization
       6.2.4 Exploratory Data Analysis
       6.2.5 Model Building
       6.2.6 Model Deployment
       6.2.7 Secured Environment Module
7. SYSTEM TESTING
   7.1 TEST PLAN
   7.2 TEST CASES
8. SIMULATION RESULTS
   8.1 DATA COLLECTION AND PACKAGE IMPORTING
   8.2 DATA STORED
   8.3 UI DEVELOPMENT (FLASK)
   8.4 LOAN APPLICATION STATUS (APPROVED/REJECTED)
   8.5 PERFORMANCE METRICS
   8.6 MAIL SENDER USING SMTP AND MIME
   8.7 AADHAR/PAN VERIFICATION
9. APPENDICES
   9.1 SAMPLE CODE
   9.2 SCREENSHOTS
10. CONCLUSION AND FUTURE ENHANCEMENT
   10.1 CONCLUSION
   10.2 FUTURE ENHANCEMENT
REFERENCES
PUBLICATION
LIST OF FIGURES

5.1 Architecture diagram
5.2 Cipher Block Chaining (CBC) mode encryption
5.3 Cipher Block Chaining (CBC) mode decryption
5.4 Formulae based Algorithm
6.1 Data Collection (Train and Test Data)
6.2 Data Visualization
6.3 Loading Model to Interface
6.4 Cryptography Fernet Implementation
6.5 Split up Server Architecture
6.6 Servers maintained in Nodejs Servers
6.7 MySQL storage value with hashed process
6.8 Accuracy Score, Recall, Precision, F1 Scores
8.1 Importing Essential Libraries
8.2 Hashed Value Pair of plain text are stored in MySQL
8.3 Key values of the Cipher Text
8.4 Four maintained NodeJs ports using Apache server
8.5 Login/Register Page Using cryptography fernet
8.6 User Interface using HTML, CSS, Js, Session
8.7 Loan Application Form
8.8 Application Status
8.9 Model Evaluation of test data
8.10 Email Loan Application Status
8.11 Mail Sending Module Using SMTP Protocol
8.12 Checking Aadhar from Database
8.13 Invalid Aadhar Intimation
9.1 Login/Register Page Using cryptography fernet
9.2 Online Banking Website
9.3 Application Status Interface with flask
9.4 Performance Metrics
9.5 Fitting the Model with Train and Test data
9.6 Composition of Loan Status by Education
9.7 Composition of Loan Status by dependents
9.8 Loan Application Interface
LIST OF ABBREVIATIONS

AES            Advanced Encryption Standard
AI             Artificial Intelligence
CBC            Cipher Block Chaining
CSS            Cascading Style Sheet
DL             Deep Learning
HMAC           Hash-based Message Authentication Code
HMAC-SHA256    Hash-based Message Authentication Code with Secure Hash Algorithm 256-bit
HTML           Hypertext Markup Language
JS             JavaScript
JSON           JavaScript Object Notation
KDF            Key Derivation Function
ML             Machine Learning
NLTK           Natural Language Toolkit
PAN            Permanent Account Number
UI             User Interface
XGBoost        eXtreme Gradient Boosting Algorithm
CHAPTER 1
INTRODUCTION

1.1 OVERVIEW

In the loan application approval project, the goal is to conduct an Exploratory Data Analysis (EDA) on loan applicant data to identify the factors that influence loan approval. The data collected for the EDA includes information about each applicant, such as personal and financial details, and about the loan itself, such as the loan amount and purpose. The data is cleaned and processed to make it suitable for analysis, after which data visualizations and statistical methods are used to explore it and uncover patterns and relationships. The results of the EDA provide insight into the loan approval process and can help improve its accuracy.

Data security and privacy are paramount concerns in any project involving sensitive information. In this project, sensitive information from loan applicants is encrypted using the Fernet cryptography method to ensure privacy and protect against unauthorized access. Fernet combines AES encryption in CBC mode with HMAC-SHA256 authentication, so encrypted data can be neither read nor modified without detection by anyone who lacks the key. This helps the system meet regulations on data privacy and security and reduces the risk of penalties.

Beyond privacy and security, Fernet encryption also helps to build trust with loan applicants. Applicants are often hesitant to share sensitive information because of concerns about its security. When their information is encrypted, applicants can be assured that it will be protected, which makes them more willing to share the information needed for the loan approval process.
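One basic EDA comparison of the kind described above (compare the "Composition of Loan Status by Education" figure) is the approval rate within each category of an attribute. The following is a minimal pure-Python sketch of that computation. The column names `Education` and `Loan_Status`, the `"Y"` approval label, and the sample records are assumptions for illustration only; they are not the project's dataset or code.

```python
from collections import defaultdict


def approval_rate_by(records, column, target="Loan_Status", positive="Y"):
    """Return {category: share of records in that category whose target equals positive}."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for row in records:
        key = row[column]
        totals[key] += 1
        if row[target] == positive:
            approved[key] += 1
    return {key: approved[key] / totals[key] for key in totals}


# Illustrative toy records (hypothetical), not the project's data.
sample = [
    {"Education": "Graduate", "Loan_Status": "Y"},
    {"Education": "Graduate", "Loan_Status": "Y"},
    {"Education": "Graduate", "Loan_Status": "N"},
    {"Education": "Not Graduate", "Loan_Status": "Y"},
    {"Education": "Not Graduate", "Loan_Status": "N"},
]

print(approval_rate_by(sample, "Education"))
```

With a real dataset the same grouping would typically be done in pandas and plotted as a stacked bar chart; the plain loop simply makes the arithmetic explicit.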
1.2 OBJECTIVES AND GOALS

The primary objectives of the project "Exploratory Data Analysis in Loan Applicant Approval" are:
1. To create an online loan approval system that is user-friendly and accessible to both loan applicants and lenders.
2. To implement the Fernet algorithm for secure communication and data transmission between the loan applicant and lender.
3. To automate the loan approval process and reduce the time and effort required for manual processing.
4. To ensure the confidentiality and integrity of sensitive information such as personal details and loan applications.
5. To provide a platform for loan applicants to easily apply for loans and receive instant loan approval or rejection based on their creditworthiness.

Goals:
The primary goals of the project "Exploratory Data Analysis in Loan Applicant Approval" are:
1. To increase the accessibility and efficiency of the loan approval process.
2. To improve the security of sensitive information during communication and data transmission.
3. To reduce the risk of fraud and identity theft.
4. To provide loan applicants with an improved user experience and a quick and convenient way to apply for loans.
5. To benefit lenders by automating the loan approval process, reducing manual processing time and effort, and providing a secure platform for loan applications.
1.3 APPLIED DATA SCIENCE

Applied data science is the process of using statistical and computational methods to analyze data, extract insights and knowledge, and apply them to real-world problems or decision-making. It combines skills and expertise in mathematics, statistics, programming, and domain-specific knowledge.

The process of applied data science typically involves the following steps:
• Problem identification: Identify a problem or question that can be addressed through data analysis.
• Data collection: Gather relevant data from various sources, such as databases, APIs, or surveys.
• Data preprocessing: Clean and transform the data into a format suitable for analysis, including handling missing values and outliers and encoding categorical variables.
• Data exploration and visualization: Explore and visualize the data to gain insights and identify patterns or trends.
• Statistical modeling: Apply statistical and machine learning techniques to develop models that can explain or predict the target variable.
• Model evaluation: Evaluate the performance of the models using appropriate metrics and compare them with each other.
• Deployment and monitoring: Deploy the model in production and monitor its performance over time to ensure that it remains effective and accurate.

Applied data science is used in many fields, such as finance, healthcare, marketing, and the social sciences. Its applications include fraud detection, customer segmentation, disease diagnosis, and image and speech recognition.
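The data preprocessing step above can be sketched in plain Python as follows: missing numeric values are filled with the column median, missing categorical values with the most frequent category, and categories are then mapped to integer codes. This is an illustrative sketch, not the project's preprocessing code; the column names `LoanAmount` and `Married` and the choice of median/mode imputation and label encoding are assumptions.

```python
from collections import Counter
from statistics import median


def impute(rows, column, numeric):
    """Fill None values in `column` with the median (numeric) or the mode (categorical)."""
    present = [r[column] for r in rows if r[column] is not None]
    fill = median(present) if numeric else Counter(present).most_common(1)[0][0]
    for r in rows:
        if r[column] is None:
            r[column] = fill
    return fill


def label_encode(rows, column):
    """Replace category strings with integer codes assigned in sorted order."""
    mapping = {value: code for code, value in enumerate(sorted({r[column] for r in rows}))}
    for r in rows:
        r[column] = mapping[r[column]]
    return mapping


# Hypothetical records with one missing value in each column.
rows = [
    {"LoanAmount": 120.0, "Married": "Yes"},
    {"LoanAmount": None, "Married": "No"},
    {"LoanAmount": 100.0, "Married": None},
    {"LoanAmount": 150.0, "Married": "Yes"},
]
impute(rows, "LoanAmount", numeric=True)    # median of 120, 100, 150 is 120
impute(rows, "Married", numeric=False)      # most frequent value is "Yes"
print(label_encode(rows, "Married"))        # {'No': 0, 'Yes': 1}
```

The median is used for amounts because it is less sensitive to outliers than the mean; label encoding suits tree-based models such as XGBoost, whereas linear models usually need one-hot encoding instead.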
Applied data science also involves the following related areas:

Communication and interpretation: Communicate the results of the data analysis to stakeholders in a clear and understandable way. This involves translating complex statistical concepts and results into actionable insights and recommendations.

Data ethics: Consider ethical issues and potential biases that may arise from the data or the models. For example, data privacy, fairness, and transparency are important ethical considerations in data science.

Data engineering: Develop and maintain the infrastructure and tools for data collection, storage, and processing. This involves working with big data technologies such as Hadoop, Spark, and NoSQL databases.

Collaborative work: Data science projects often involve interdisciplinary teams that include data scientists, domain experts, software engineers, and business analysts. Effective communication and collaboration are essential for the success of such projects.

Continuous learning: Data science is a rapidly evolving field, and staying up-to-date with new techniques, tools, and trends is crucial. Continuous learning and professional development are necessary to keep pace with its changing landscape.

Feature engineering: This involves selecting and creating relevant features (i.e., variables) from the raw data to improve the accuracy of machine learning models. Feature engineering requires a deep understanding of the data and the problem at hand.

Natural Language Processing (NLP): NLP is a subfield of data science that focuses on understanding and analyzing human language. NLP techniques can be used for tasks such as sentiment analysis, text classification, and language translation.
Computer vision: Computer vision is another subfield of data science that deals with the interpretation of images and videos. Computer vision techniques can be used for tasks such as object recognition, facial recognition, and image segmentation.

Time series analysis: Time series analysis is a statistical technique for data that changes over time. It can be used for tasks such as forecasting future values, identifying trends and patterns, and detecting anomalies.

Cloud computing: Cloud computing has become increasingly important in data science due to the need for large-scale data processing and storage. Cloud-based platforms such as AWS, Google Cloud, and Microsoft Azure provide scalable and cost-effective solutions for data science projects.

Data visualization: Data visualization is the process of creating visual representations of data to communicate insights and findings. Effective data visualization requires a balance between aesthetics and functionality, and it can help stakeholders better understand complex data.

Deep learning: Deep learning is a subfield of machine learning that focuses on training neural networks with multiple layers. Deep learning techniques have been used for tasks such as image recognition, speech recognition, and natural language processing.
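To make the feature engineering item concrete for loan data, the sketch below derives a few features from raw applicant fields: total household income, its logarithm (to compress a long right tail), an approximate monthly instalment (EMI) from the standard amortisation formula EMI = P r (1 + r)^n / ((1 + r)^n - 1), and a loan-to-income ratio. All column names and the 9% annual interest rate are hypothetical assumptions; the report does not state which derived features, if any, the project used.

```python
import math


def add_features(row, annual_rate=0.09):
    """Return a copy of `row` with derived loan features.

    Column names and `annual_rate` are illustrative assumptions; income and loan
    amount are assumed to be in the same currency unit.
    """
    out = dict(row)
    total_income = row["ApplicantIncome"] + row["CoapplicantIncome"]
    out["TotalIncome"] = total_income
    out["LogTotalIncome"] = math.log1p(total_income)   # log(1 + x) compresses large incomes
    principal = row["LoanAmount"]
    n = row["Loan_Amount_Term"]                        # repayment term in months
    r = annual_rate / 12                               # monthly interest rate
    growth = (1 + r) ** n
    out["EMI"] = principal * r * growth / (growth - 1)  # standard amortised instalment
    out["LoanToIncome"] = principal / total_income if total_income else float("inf")
    return out


features = add_features(
    {"ApplicantIncome": 4000, "CoapplicantIncome": 1000, "LoanAmount": 120000, "Loan_Amount_Term": 360}
)
print(features["TotalIncome"], round(features["EMI"], 2))
```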
1.4 MACHINE LEARNING

Machine learning is a subfield of artificial intelligence that focuses on the development of algorithms and statistical models that enable computers to "learn" from data and make predictions or decisions without being explicitly programmed. It is based on the idea that a computer program can learn from data, identify patterns, and make decisions with minimal human intervention. Machine learning has become increasingly popular in recent years, as the growth of big data and the availability of powerful computing resources have made it possible to process large amounts of data and develop complex models.

There are several types of machine learning, including supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. In supervised learning, the algorithm is trained on a labeled dataset, where the correct output is provided for each input. In unsupervised learning, the algorithm is not given labeled data and must find patterns or structure in the data on its own. Semi-supervised learning is a combination of supervised and unsupervised learning, where the algorithm is provided with some labeled data and must find structure in the remaining unlabeled data. In reinforcement learning, the algorithm receives feedback in the form of rewards or penalties for its actions, and learns to make decisions based on this feedback.

Machine learning is used in a variety of applications, including image recognition, natural language processing, fraud detection, recommendation systems, and predictive maintenance. Despite its many advantages, machine learning also has its challenges, such as the potential for biased results, the difficulty of interpreting complex models, and the need for large amounts of high-quality training data. However, with continued advancements in the field, machine learning is poised to become an even more integral part of our lives and to change the way we interact with technology.
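In supervised learning, a model is fitted on labeled examples and judged on labeled examples it has not seen (the report's figures refer to fitting the model with train and test data). The sketch below shows a reproducible shuffle-and-split in plain Python. It mirrors the idea of scikit-learn's `train_test_split` but is a standalone illustration; the 20% test fraction and the seed value are assumptions, not the project's settings.

```python
import random


def train_test_split(samples, labels, test_fraction=0.2, seed=42):
    """Shuffle paired samples/labels reproducibly and hold out a test set."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)          # fixed seed -> same split every run
    n_test = int(round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [samples[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    X_test = [samples[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test
```

Keeping the test set untouched until final evaluation is what makes a reported accuracy figure an estimate of performance on new applicants rather than on the training data.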
CHAPTER 2
LITERATURE SURVEY

[1] TITLE: Prediction of Loan Behavior with Machine Learning Models for Secure Banking (2022)
Author: Anand, Mayank, Arun Velu, and Pawan Whig

Because loan defaults have such a large impact on earnings, loan default prediction is one of the most important credit-scoring problems that banks and other financial organizations face. Several traditional methods exist for mining information about a loan application, along with some newer machine learning methods, but most of them appear to be falling short, as the number of loan defaults has increased. This research work presents a variety of techniques for loan default prediction, including Multiple Logistic Regression, Decision Tree, Random Forests, Gaussian Naive Bayes, Support Vector Machines, and other ensemble methods. The predictions are based on loan data from internet sources such as Kaggle, as well as data sets from applicants' loan applications. Evaluation measures including the Confusion Matrix, Accuracy, Recall, Precision, F1-Score, ROC area, and Feature Importance are calculated and reported in the results section. The Extra Trees Classifier and Random Forest achieve the highest accuracy, and the research concludes that predictive modelling gives effective results for disapproving loan credit to vulnerable consumers across a large number of loan applications.

Techniques: Extra Trees Classifier and Random Forest.
Merits: Loan default prediction using machine learning techniques can help financial organizations make more informed decisions about loan approval.
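The evaluation measures named in [1] (confusion matrix, accuracy, precision, recall, F1-score) are also the ones this project reports for its own model. The minimal sketch below shows how each follows from the four confusion-matrix counts for a binary approve/reject label. Treating `1` as the positive (approved) class is an assumption made for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts plus accuracy, precision, recall, and F1."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1          # predicted approved, actually approved
        elif p == positive:
            fp += 1          # predicted approved, actually rejected
        elif t == positive:
            fn += 1          # predicted rejected, actually approved
        else:
            tn += 1          # predicted rejected, actually rejected
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


print(binary_metrics([1, 1, 1, 0, 0, 1], [1, 0, 1, 0, 1, 1]))
```

Accuracy alone can look good on an imbalanced loan dataset, which is why precision, recall, and F1 are usually reported alongside it.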
[2] TITLE: Comparative Analysis of Customer Loan Approval Prediction using Machine Learning Algorithms (2022)
Author: Tumuluru & Praveen

In today's increasingly competitive market, estimating the risk involved in a loan application is one of the most crucial challenges for banks' survival and profitability. Banks receive many loan applications from their customers and other individuals daily, and not every applicant is accepted. Most banks employ their own credit scoring and risk assessment procedures to examine loan applications and make credit approval decisions. Despite this, many people fail to repay or default on their loans every year, causing financial institutions to lose a significant amount of money. In this study, Machine Learning (ML) algorithms are used to extract patterns from a common loan approval dataset and to use those patterns to forecast future loan defaulters. Customers' past data, such as their age, income, loan amount, and length of employment, is used to conduct the analysis. To determine the most relevant features, i.e., the factors that have the most impact on the prediction outcome, various ML algorithms such as Random Forest, Support Vector Machine, K-Nearest Neighbor, and Logistic Regression were used. These algorithms are evaluated with standard metrics and compared with each other, and the Random Forest algorithm achieves the best accuracy.

Techniques: Random Forest algorithm, K-Nearest Neighbor algorithm.
Merits: Machine learning algorithms can extract patterns from large datasets that are difficult for humans to recognize.
[3] TITLE: The biometric cardless transaction with shuffling keypad using proximity sensor (2020)
Author: Adebiyi & Marion O.

Loan approval is an essential factor that determines the losses or gains a financial institution accrues at the end of a fiscal year, and banks are looking for ways to ensure that loans are repaid within the specified period. This study therefore develops a loan prediction system using an Artificial Neural Network (ANN) that determines whether a loan is good or bad, that is, whether it is a payable debt or a bad debt. The system can also help predict whether a loan applicant will default on repayment. The system was designed and implemented using Python as the programming language, Hypertext Markup Language (HTML) and Cascading Style Sheets (CSS) for the front end, and PHP for the back end. A confusion matrix was used as the performance metric to evaluate the system's accuracy. The results show that the system achieves 92% accuracy, indicating that it predicts well whether a loan applicant will default on repayment and whether a loan is a bad debt or a payable one. Finally, the system was compared with previous research on the basis of accuracy, and the authors concluded that it performed better than the earlier approaches.

Techniques: An Artificial Neural Network (ANN) algorithm was used to develop the loan prediction system.
Merits: The system used a confusion matrix as the performance metric to evaluate its accuracy, a widely used method for evaluating classification models.
[4] TITLE: OTP based cardless transaction using ATM (2019)
Author: Kadam & Ashwini S

Banks have many products to sell, but the main source of income for any bank is its credit line: banks earn interest on the loans they extend. A bank's profit or loss therefore depends to a large extent on loans, i.e., on whether customers repay the loan or default. By predicting loan defaulters, a bank can reduce its Non-Performing Assets, which makes the study of this phenomenon very important. Previous research in this area has shown that there are many methods for studying the problem of controlling loan default. Because accurate predictions are essential for maximizing profits, it is important to study the nature of the different methods and compare them. The study follows a predictive-analytics approach to the problem of predicting loan defaulters, consisting of (i) data collection, (ii) data cleaning, and (iii) performance evaluation. Experimental tests found that the Naïve Bayes model performs better than the other models in loan forecasting.

Techniques: Naïve Bayes Model.
Merits: The study shows that the Naïve Bayes model performs better than other models in predicting loan defaulters, meaning it can accurately predict whether an applicant is likely to default on a loan.
[5] TITLE: Loan Default Prediction using Decision Trees and Random Forest: A Comparative Study (2021)
Author: Mehul Madaan, Aniket Kumar, Chirag Keshri, Rachna Jain and Preeti Nagrath

With the banking sector improving in recent times and taking loans becoming increasingly common, a large population applies for bank loans. One of the major problems the banking sector faces in this ever-changing economy is the increasing rate of loan defaults, and banking authorities find it increasingly difficult to assess loan requests correctly and manage the risk of people defaulting on loans. The two most critical questions in the banking industry are (i) How risky is the borrower? and (ii) Given the borrower's risk, should we lend to him/her? In light of these problems, this paper proposes two machine learning models that predict whether an individual should be given a loan by assessing certain attributes, helping banking authorities select suitable people from a list of loan applicants. The paper presents a comprehensive comparative analysis of two algorithms: (i) Random Forest and (ii) Decision Trees. Both algorithms were applied to the same dataset, and the results show that the Random Forest algorithm outperformed the Decision Tree algorithm with much higher accuracy.

Techniques: Random Forest and Decision Trees.
Merits: The study addresses a critical problem faced by the banking industry: the increasing rate of loan defaults and the need to assess loan requests and borrower risk.
[6] TITLE: Prediction of Modernized Loan Approval System Based on Machine Learning Approach (2021)
Author: Vishal Singh, Ayushman Yadav & Rajat Awasthi

Technology has improved human existence and the quality of life people live, and machines now support many aspects of daily life. In the banking sector, a candidate must provide proofs and supporting documents before a loan amount is approved, and whether the application is approved depends on the candidate's historical data as evaluated by the system. Many people apply for loans every day, but a bank has limited funds, so accurate predictions made with classification algorithms such as logistic regression, the random forest classifier, and the support vector machine classifier are very beneficial. A bank's profit and loss depend on its loans, that is, on whether clients repay them, so the recovery of loans is the most important concern for the banking sector. The historical data of candidates was used to build machine learning models with different classification algorithms. The main objective of this paper is to predict whether a new applicant will be granted a loan, using machine learning models trained on the historical data set.

Techniques: Logistic regression, random forest classifier, and support vector machine classifier.
Merits: Machine learning algorithms can process large amounts of data quickly and accurately, allowing banks to make informed decisions about loan applications in a timely manner.
[7] TITLE: Accurate Loan Approval Prediction Based on Machine Learning Approach (2020)
Author: J. Tejaswini, T. Mohana Kavya, R. Devi Naga Ramya, P. Sai Triveni, Venkata Rao Maddumala

Loan approval is a very important process for banking organizations, and the banking industry always needs more accurate predictive modeling systems for many issues. Predicting credit defaulters is a difficult task for the banking industry. The system approves or rejects loan applications. Recovery of loans is a major contributing parameter in a bank's financial statements, and it is very difficult to predict whether a customer will repay a loan. Machine Learning (ML) techniques are very useful for predicting outcomes from large amounts of data. In this paper, three machine learning algorithms, Logistic Regression (LR), Decision Tree (DT), and Random Forest (RF), are applied to predict the loan approval of customers. The experimental results show that the Decision Tree algorithm is more accurate than the Logistic Regression and Random Forest approaches.

Techniques: Logistic Regression, Decision Tree, and Random Forest.
Merits: Machine learning algorithms can analyze large amounts of data and identify patterns that humans may not be able to detect, leading to more accurate predictions of loan approvals and defaults.
[8] TITLE: A Federated Learning Based Approach for Loan Defaults Prediction (2020)
Author: Geet Shingi

The number of defaults on bank loans has been increasing in recent years, yet many banking organizations still sanction loans manually. Dependence on human intervention and delays in results have been the biggest obstacles in this system. When machine learning models are implemented for banking applications, the security of sensitive customer banking data is always a crucial concern, and with strong legislative rules in place, sharing data with other organizations is not possible. In addition, loan datasets are highly imbalanced: there are very few samples of defaults compared with repaid loans. These problems make it difficult for a default prediction system to learn the patterns of defaults and therefore to predict them. The paper proposes a federated learning-based approach for predicting loan applications that are less likely to be repaid, which resolves the above-mentioned issues by sharing the model weights, which are aggregated at a central server. The federated system is coupled with the Synthetic Minority Over-sampling Technique (SMOTE) to address the imbalanced training data, and with a weighted aggregation based on the number of samples and the performance of each worker on its own dataset to further improve performance. The improved performance of the model on publicly available real-world data validates the approach.

Techniques: Federated learning and the Synthetic Minority Over-sampling Technique (SMOTE).
Merits: By using a machine learning model, the loan approval process can be automated, reducing dependence on human intervention and producing faster results.
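SMOTE, used in [8] to counter the shortage of default samples, creates synthetic minority examples by interpolating between a minority sample x and one of its k nearest minority neighbours: x_new = x + u (neighbour - x) with u drawn uniformly from [0, 1). The pure-Python sketch below illustrates only that core step. It is not the paper's federated implementation, and the values of `k` and `seed` are assumptions; a maintained implementation is available in the imbalanced-learn library.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (excluding x itself).
        others = sorted((m for j, m in enumerate(minority) if j != i), key=lambda m: euclidean(x, m))
        neighbour = rng.choice(others[:k])
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


print(smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], n_new=3))
```

Because every synthetic point lies on a segment between two real minority samples, oversampling should be applied only to the training split, never to the test data.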
CHAPTER 3
SYSTEM ANALYSIS

3.1 EXISTING SYSTEM

The existing system for online loan lender approval typically involves a manual process: borrowers fill out paper or digital loan application forms, submit the required documentation, and meet with a loan officer for an in-person interview. The loan officer then evaluates the borrower's credit history, employment status, income, and other factors to determine the risk involved in granting the loan. If the loan is approved, the loan officer provides the borrower with the necessary loan documents, which must be signed and returned. The loan funds are then disbursed, usually after a background check and verification of the borrower's information. This manual process can result in long wait times for borrowers, and it carries a risk of human error, such as missing or inaccurate information. Despite these limitations, manual loan approval remains the most common method used by many financial institutions today.

3.1.1 Disadvantages

The existing manual loan approval system for online lenders has several disadvantages, including:
1. Time-consuming: The manual loan approval process can be slow and take several days or even weeks to complete. This can be frustrating for borrowers who need quick access to funds.
2. Increased risk of human error: The manual loan approval process involves a significant amount of manual effort, which can result in human error. For example, loan officers may miss important information or make incorrect assessments of a borrower's creditworthiness.
3. Lack of transparency: The manual loan approval process can be opaque, with little visibility into the factors that influence loan approval decisions. This can make it difficult for borrowers to understand why their loan applications were approved or rejected.
4. Limited loan options: The manual loan approval process can limit the number of loan options available to borrowers, as loan officers may only be able to offer a limited range of loan products.

3.2 PROPOSED SYSTEM

The proposed online loan lender approval system is designed to address the limitations of the existing manual loan approval process. It uses cryptography and machine learning algorithms to streamline the loan approval process, increase transparency, and enhance security. The proposed system automates loan approval, allowing borrowers to apply for loans online and receive near-instant decisions, which significantly reduces the risk of human error and minimizes wait times. In addition, cryptography and encryption algorithms protect the security and privacy of sensitive borrower information, giving borrowers greater peace of mind and reducing the risk of security breaches and identity theft.

PAN Card verification is also added as a feature that helps determine the credit score of loan applicants. This project aims to implement exploratory data analysis in the loan application approval system, together with the Fernet cryptography process and PAN Card verification, to improve the accuracy and security of the loan approval process. The results of this project show that the Fernet cryptography process and PAN Card verification have improved the security and accuracy of the loan approval process.
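Before any PAN lookup, malformed input can be rejected locally. A PAN is 10 characters: five letters, four digits, and one letter (for example `ABCDE1234F`). The sketch below performs only that format check; it does not prove that a PAN exists or belongs to the applicant, which requires checking against an authoritative record. The function and constant names are hypothetical.

```python
import re

# PAN layout: five uppercase letters, four digits, one uppercase letter.
PAN_PATTERN = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")


def is_valid_pan_format(pan: str) -> bool:
    """Return True if `pan` matches the PAN layout (format check only)."""
    normalized = pan.strip().upper()
    return PAN_PATTERN.fullmatch(normalized) is not None


print(is_valid_pan_format("ABCDE1234F"), is_valid_pan_format("ABCD1234F"))
```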
3.2.1 Advantages

The proposed online loan lender approval system offers several advantages over the existing manual loan approval process, including:
1. Automation: The loan approval process will be fully automated, reducing the need for manual effort and minimizing the risk of human error. Borrowers will be able to apply for loans online and receive near-instant loan decisions.
2. Increased transparency: Every application is assessed by the same automated model, and applicants are notified of their application status, so decisions no longer depend on an opaque manual review.
3. Improved user experience: The proposed system reduces wait times and improves the loan application experience for borrowers.
4. Increased loan options: The proposed system will allow lenders to offer a wider range of loan products to borrowers, giving them greater flexibility and choice.
5. Enhanced security: The use of cryptography and encryption algorithms will protect sensitive borrower information and reduce the risk of security breaches and identity theft.
CHAPTER 4
SYSTEM REQUIREMENTS

4.1 HARDWARE REQUIREMENTS
• Processor : Multi-core processor with a clock speed of at least 2.5 GHz
• RAM : 8 GB
• Hard disk : Solid-State Drive (SSD), 500 GB or higher
• Keyboard : Standard keyboard and mouse
• Monitor : LCD or LED display with at least 1920 x 1080 resolution

4.2 SOFTWARE REQUIREMENTS
• Tool : Visual Studio Code
• Frontend : HTML, CSS, JS, PHP
• Framework : Python Flask Framework
• Language : Python 3 (Python 3.7 or above must be installed)
• Operating system : Windows 10 / macOS / Linux
• Technologies : Python 3.7, Flask, XGBoost, JSON, Node.js, PHP
CHAPTER 5
SYSTEM DESIGN

5.1 ARCHITECTURE DIAGRAM

Fig 5.1 Architecture diagram

Fig 5.1 shows the different parts of the loan application approval system. Users enter loan application data in an Excel sheet, which is then analyzed and pre-processed by the Exploratory Data Analysis (EDA) module. The data is then encrypted by the Fernet Cryptography module. The XGBoost algorithm uses the encrypted data to build a machine learning model that predicts loan approvals. The Model Training module trains the model, and the Data Visualization module generates visualizations of the model's performance. The Model Deployment module serves the trained model, saved in PKL (pickle) format, through a Flask application, and the user receives an approval or rejection decision based on their loan application data. The system ensures security by encrypting sensitive data before using it to build the machine learning model.

Overall, the loan application approval system architecture has four main components: the user interface, the database, the loan approval system, and the cryptography module.
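The Model Deployment step above depends on saving the trained model to a PKL file once and loading it again inside the web application. A minimal standard-library sketch of that round trip follows. To keep it self-contained, a plain dictionary of hypothetical parameters stands in for the trained XGBoost model; the feature names, threshold, and file name are assumptions, not the project's artifact.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize a trained model object to a .pkl file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a model saved by save_model.

    Only unpickle files you created yourself: pickle can run arbitrary code on load.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical stand-in for the trained model.
artifact = {"features": ["Credit_History", "TotalIncome"], "threshold": 0.5}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save_model(artifact, path)
restored = load_model(path)
print(restored == artifact)
```

In a Flask application the model would typically be loaded once at startup, not on every request, so that each prediction only runs inference.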
User Interface: This component provides a graphical user interface (GUI) through which borrowers interact with the system. It may be a web-based or mobile interface with forms in which borrowers enter their details and request loan approval.
Database: This component stores the data generated by the system. It may be a relational database management system (RDBMS) such as MySQL or PostgreSQL, or a NoSQL database such as MongoDB; this project uses MySQL.
Loan Approval System: This component evaluates loan requests and decides whether to approve or reject each application. It may combine manual review by loan officers with automated decision-making algorithms that analyse credit scores, income, and other factors.
Cryptography Module: This component encrypts and decrypts sensitive data, such as borrowers' personal information and financial details. This project uses Fernet from the Python cryptography library, a symmetric scheme in which one secret key both encrypts and decrypts the data. Fernet encrypts with AES-128 in Cipher Block Chaining (CBC) mode and authenticates every token with HMAC-SHA256.
Fig 5.2 Cipher Block Chaining (CBC) mode encryption.
Fig 5.3 Cipher Block Chaining (CBC) mode decryption.
XGBoost Algorithm: The online loan lender approval system also incorporates a machine learning model trained with the XGBoost algorithm to improve the accuracy and efficiency of the loan approval process. XGBoost (eXtreme Gradient Boosting) is a widely used machine learning library that provides robust and scalable gradient boosting for regression and classification problems. When gradient boosting is used for regression, the weak learners are regression trees, and each tree maps an input data point to one of its leaves, which holds a continuous score. XGBoost minimizes a regularized (L1 and L2) objective function that combines a convex loss function (based on the difference between the predicted and target outputs) with a penalty term for model complexity (that is, for the regression tree functions). Training proceeds iteratively: each new tree is fitted to the residual errors of the trees built so far, and its output is added to theirs to form the final prediction. The method is called gradient boosting because each new tree is fitted to the gradient of the loss, so adding a tree is a gradient-descent step; XGBoost additionally uses the second-order derivatives of the loss. For a classification task such as loan approval, the summed tree scores are passed through a logistic (sigmoid) function to give a probability.
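The residual-fitting loop described above can be sketched in a few lines. This is plain gradient boosting with depth-1 regression trees ("stumps") on one feature and squared loss, for which the negative gradient is exactly the residual y - F(x); it is an illustration, not XGBoost itself, which adds regularization, second-order gradients, deeper trees, and many features. All function names are hypothetical.

```python
from typing import List, Tuple

# A depth-1 regression tree ("stump"): (threshold, left leaf value, right leaf value)
Stump = Tuple[float, float, float]


def fit_stump(x: List[float], residuals: List[float]) -> Stump:
    """Choose the split threshold that minimises the squared error of the residuals."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    if best is None:  # every x is identical: fall back to a single leaf
        mean = sum(residuals) / len(residuals)
        return (x[0], mean, mean)
    return (best[1], best[2], best[3])


def stump_value(stump: Stump, xi: float) -> float:
    threshold, left_value, right_value = stump
    return left_value if xi <= threshold else right_value


def gradient_boost(x, y, n_trees=50, learning_rate=0.3):
    """Squared loss: each new stump is fitted to the current residuals y - F(x)."""
    base = sum(y) / len(y)  # initial prediction F0 = mean(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Add the new tree's shrunken output to the running prediction
        preds = [pi + learning_rate * stump_value(stump, xi) for pi, xi in zip(preds, x)]
    return base, learning_rate, stumps


def predict(model, xi: float) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_value(s, xi) for s in stumps)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [1, 1, 1, 1, 5, 5, 5, 5]
    model = gradient_boost(x, y)
    print(round(predict(model, 2), 3), round(predict(model, 7), 3))
```

The learning rate shrinks each tree's contribution, so the residual on this toy data falls by a factor of 0.7 per round; the same shrinkage idea appears in XGBoost as the `learning_rate` (`eta`) parameter.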
Fig 5.4 Formulae based Algorithm
In the loan approval system, the XGBoost algorithm is trained on historical loan data, such as loan amount, loan term, credit score, and other relevant factors, to estimate how likely a new application is to be approved. The exact design of the architecture depends on the requirements of the project, such as its scale, the complexity of the loan approval process, and the security and privacy requirements for the system.
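For reference, the regularized objective described in this section is commonly written as follows (standard notation from the XGBoost documentation, not a transcription of Fig 5.4). Here T is the number of leaves in a tree, w_j are its leaf scores, λ and α are the L2 and L1 penalties (the `reg_lambda` and `reg_alpha` parameters), and γ penalizes each extra leaf (`gamma`). The last line shows the logistic loss used for a binary target such as approved/rejected, ignoring the constant base score.

```latex
\mathcal{L} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right),
\qquad
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)

\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2} + \alpha \sum_{j=1}^{T} \lvert w_j \rvert

p_i = \sigma(\hat{y}_i) = \frac{1}{1 + e^{-\hat{y}_i}},
\qquad
l(y_i, \hat{y}_i) = -\left[ y_i \log p_i + (1 - y_i) \log\left(1 - p_i\right) \right]
```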
CHAPTER 6
SYSTEM IMPLEMENTATION
6.1 MODULES
To implement the loan lending system effectively, the work is divided into the following modules:
• Data Collection
• Data Preprocessing
• Data Visualization
• Exploratory Data Analysis (EDA)
• Model Building
• Model Deployment (Flask & PKL Model)
• Secured Environment Module
6.2 MODULE DESCRIPTION
6.2.1 Data Collection
The loan application data for the prediction system is taken from Kaggle, a platform that hosts many machine learning datasets. The data includes loan application details, borrower information, and loan outcomes.
Fig 6.1 Data Collection (Train and Test Data)
Kaggle offers a wide range of loan application data, which can help improve the accuracy of the system's predictions. Kaggle also hosts notebooks and discussions from other users, which can inform decisions about pre-processing and modelling.
6.2.2 Data Preprocessing
Once the data has been collected, it is pre-processed so that it is clean and usable for exploratory data analysis. This step involves handling missing or erroneous values, converting categorical data into numerical features, and normalizing or scaling the data.
6.2.3 Data Visualization
Data visualization is a crucial part of exploratory data analysis because it reveals patterns and trends in the data. In this project, we used several visualization techniques to gain insights into the data and to identify the important features for model building.
Fig 6.2 Data Visualization
First, we used a correlation matrix to measure the correlation between pairs of features, which helped us identify the features most related to loan approval. Second, we created bar graphs and histograms to visualize the distribution of loan amounts and loan terms.
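The pre-processing and correlation steps above can be sketched with the standard library only. The column names (`Gender`, `ApplicantIncome`, `Credit_History`, `Loan_Status`) are hypothetical, chosen to resemble a typical Kaggle loan dataset, and the imputation rules (median for numbers, most frequent value for categories) are a common choice rather than the project's confirmed method. `pearson` computes one cell of the correlation matrix.

```python
import statistics
from collections import Counter
from typing import Dict, List, Optional

Record = Dict[str, Optional[object]]


def impute(records: List[Record], numeric: List[str], categorical: List[str]) -> List[Record]:
    """Fill missing values: median for numeric columns, most frequent value for categorical ones."""
    fills = {}
    for col in numeric:
        fills[col] = statistics.median([r[col] for r in records if r[col] is not None])
    for col in categorical:
        fills[col] = Counter(r[col] for r in records if r[col] is not None).most_common(1)[0][0]
    return [{k: fills[k] if v is None and k in fills else v for k, v in r.items()} for r in records]


def encode(records: List[Record], mappings: Dict[str, Dict[str, int]]) -> List[Record]:
    """Replace categorical strings with integers, e.g. {"Gender": {"Male": 1, "Female": 0}}."""
    return [{k: mappings[k][v] if k in mappings else v for k, v in r.items()} for r in records]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient: one cell of a correlation matrix."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # undefined for a constant column
    return cov / (sx * sy)


if __name__ == "__main__":
    rows = [
        {"Gender": "Male", "ApplicantIncome": 5000, "Credit_History": 1, "Loan_Status": "Y"},
        {"Gender": None, "ApplicantIncome": 3000, "Credit_History": 0, "Loan_Status": "N"},
        {"Gender": "Female", "ApplicantIncome": None, "Credit_History": 1, "Loan_Status": "Y"},
    ]
    clean = impute(rows, ["ApplicantIncome"], ["Gender", "Credit_History"])
    enc = encode(clean, {"Gender": {"Male": 1, "Female": 0}, "Loan_Status": {"Y": 1, "N": 0}})
    print(pearson([r["Credit_History"] for r in enc], [r["Loan_Status"] for r in enc]))
```

In a pandas workflow the same steps map onto `fillna`, `map`, and `DataFrame.corr()`; the stdlib version only makes each step explicit.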
Finally, we created a confusion matrix to visualize the performance of the model. It shows the number of true positives, false positives, true negatives, and false negatives, which are needed to evaluate the model.
6.2.4 Exploratory Data Analysis
The next step is exploratory data analysis, which examines features of the loan applications such as the loan amount, the loan purpose, and the applicant's credit score. The analysis may include visualizations such as scatter plots, histograms, and heatmaps to identify patterns and correlations in the data.
The goal is a loan application approval model that takes an Excel sheet as input and predicts the likelihood of loan approval. The model analyses features of each loan application, such as the loan amount, credit score, income, and employment history, and predicts whether the loan should be approved. The output is presented in an approval section that displays the predicted result for each loan application in the Excel sheet.
6.2.5 Model Building
1. Data preparation: The first step in building a model with XGBoost is to prepare the data: cleaning, pre-processing, and transforming it so that it is suitable for the model.
2. Splitting the data: The data is divided into training and testing sets. The training set is used to develop the model, and the testing set is used to evaluate its performance.
3. Setting the parameters: The XGBoost algorithm has several parameters that must be set before the model is built, such as the maximum depth of each tree, the learning rate, and the number of trees. They are chosen based on the nature of the problem and the characteristics of the data.
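Step 2 can be sketched as a stratified split, which keeps the share of approved and rejected applications the same in the training and testing sets; this matters when one outcome is much rarer than the other. The function name, the 20% test fraction, and the seed are illustrative assumptions; scikit-learn provides the same behaviour through `train_test_split(..., stratify=y)`.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_split(labels: Sequence[Hashable], test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) with each label's share preserved in both sets."""
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    labels = ["Y"] * 70 + ["N"] * 30
    train_idx, test_idx = stratified_split(labels)
    print(len(train_idx), len(test_idx))
```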
import os
import joblib
import pandas as pd

# Directory of this script (assumed); the model file is stored relative to it
current_dir = os.path.dirname(os.path.abspath(__file__))

def ValuePredictor(data: pd.DataFrame):
    # Model name
    model_name = 'bin/xgboostModel.pkl'
    # Full path of the stored model
    model_dir = os.path.join(current_dir, model_name)
    # Load the model; joblib.load accepts a path and closes the file itself
    loaded_model = joblib.load(model_dir)
    # Predict the class of each row (1 = approved, 0 = rejected)
    result = loaded_model.predict(data)
    # The application form submits a single row, so return its prediction
    return result[0]

4. Training the model: The XGBoost model is trained on the training set. During training, the algorithm builds a sequence of decision trees from the input features and the target variable.
5. Evaluating the model: The trained model is evaluated on the testing set using metrics such as accuracy, precision, recall, and F1 score.
6. Tuning the parameters: The XGBoost model has several hyperparameters that must be tuned to optimize its performance. The best combination is selected with techniques such as grid search or random search.
7. Finalizing the model: Once the best hyperparameters have been identified, the XGBoost model is retrained on the entire dataset with those hyperparameters. This produces the final model used for predictions.
8. Saving the model: The final model is serialized to a .pkl file (with pickle, or joblib, which builds on pickle and is used by the loading code above) so that it can be loaded later to make predictions on new data.
Fig 6.3 Loading Model to Interface
Overall, building a model with XGBoost involves data preparation, setting parameters, training and evaluating the model, tuning hyperparameters, finalizing the model, and saving it. Each of these steps contributes to a robust and accurate machine learning model.
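Step 6 (grid search) amounts to trying every combination of candidate hyperparameter values and keeping the best-scoring one. The sketch below assumes a `score` callback that, in the real project, would train an XGBoost classifier with the given parameters and return its validation accuracy; here a toy scoring function stands in for that. The parameter names `max_depth`, `learning_rate`, and `n_estimators` are real XGBClassifier parameters, but the candidate values are only examples. scikit-learn's `GridSearchCV` adds cross-validation on top of the same idea.

```python
import itertools
from typing import Any, Callable, Dict, List, Optional, Tuple


def grid_search(param_grid: Dict[str, List[Any]],
                score: Callable[[Dict[str, Any]], float]) -> Tuple[Optional[Dict[str, Any]], float]:
    """Evaluate every parameter combination and return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        s = score(params)
        if s > best_score:  # strict: on ties, the first combination tried wins
            best_params, best_score = params, s
    return best_params, best_score


if __name__ == "__main__":
    grid = {"max_depth": [3, 4, 6], "learning_rate": [0.05, 0.1, 0.3], "n_estimators": [100, 200]}
    # Toy stand-in for "train the model and return validation accuracy"
    toy_score = lambda p: -abs(p["max_depth"] - 4) - abs(p["learning_rate"] - 0.1)
    print(grid_search(grid, toy_score))
```

The cost grows multiplicatively with each parameter added (here 3 x 3 x 2 = 18 fits), which is why random search is the usual alternative for larger grids.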
6.2.6 Model Deployment
Model deployment is the step in the data science project life cycle that makes the model available to end users. Flask is a popular Python web framework that can be used to deploy machine learning models. The steps involved in deploying a model using Flask and pickle are:
1. Export the trained model: Once the model has been trained, it is exported to a file format that Flask can load. Python's pickle library serializes the model object and saves it as a .pkl file.
2. Set up the Flask app: Create a Flask app and define the endpoints that will handle incoming requests from the user.
3. Load the model: In the Flask app, load the trained model from the .pkl file using the pickle library.
4. Define the prediction function: Define a function that takes the user input, preprocesses it, and uses the loaded model to make a prediction.
5. Create the API endpoint: Create an endpoint that receives incoming requests from the user, preprocesses the input, and uses the prediction function to return the predicted output.
6. Test the endpoint: Send sample requests to the endpoint and check that the predicted output matches the expected output.
7. Deploy the Flask app: Finally, deploy the Flask app on a server that end users can access.
Once the model has been deployed using Flask and pickle, users can access it through a web interface or API endpoint to make predictions on new data. The model's performance in production should be monitored, and the model updated periodically so that it remains accurate and reliable.
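Steps 1, 3, and 4 can be sketched without Flask so that they are easy to test. To keep the sketch self-contained, a small dictionary of rule thresholds stands in for the trained XGBoost object; in the real app the rule would be replaced by a call to the loaded model's `predict()`, as in `ValuePredictor`. The feature names and thresholds are hypothetical. Only unpickle files you created yourself, because loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile
from typing import Dict

# Hypothetical feature names; with a real model they must match the training columns.
FEATURES = ["Credit_History", "ApplicantIncome", "LoanAmount"]


def export_model(model, path: str) -> None:
    """Step 1: serialize the trained model object to a .pkl file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path: str):
    """Step 3: load the model once, when the app starts."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict_from_form(model: Dict[str, float], form: Dict[str, str]) -> str:
    """Step 4: convert raw form strings to numbers and return a decision label."""
    row = {name: float(form[name]) for name in FEATURES}
    # Stand-in decision rule; the real app calls the loaded XGBoost model's predict() here
    approved = (row["Credit_History"] == model["required_credit_history"]
                and row["ApplicantIncome"] >= model["min_income"])
    return "approved" if approved else "rejected"


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "model.pkl")
    export_model({"required_credit_history": 1.0, "min_income": 2500.0}, path)
    model = load_model(path)
    print(predict_from_form(model, {"Credit_History": "1", "ApplicantIncome": "5000", "LoanAmount": "120"}))
```

Inside Flask, `load_model` would run once at start-up and `predict_from_form(model, request.form)` would be called from the route handler, so the file is not re-read on every request.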
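from cryptography.fernet import Fernet
import requests

def encrypt(message):
    # Encrypt `message` with a freshly generated Fernet key; return [key, token]
    data = []
    key = Fernet.generate_key()   # random 32-byte key, base64url-encoded
    data.append(key)
    fernet = Fernet(key)
    encMessage = fernet.encrypt(message.encode())   # Fernet token (bytes)
    data.append(encMessage)
    return data

# Inside sample(e, r1): e is the user's e-mail address, r1 the password, a the server URLs
data = encrypt(r1)
# sep() (not shown) splits a value into four parts, one per server
password = sep(data[1])
key = sep(data[0])

def postdata(url, pwd_part, key_part):
    # Send one password fragment and one key fragment to one server
    response = requests.post(url, {"email": e, "password": pwd_part, "key": key_part})

datainsert(e)   # create the user's row once, before the servers fill in their columns
for i in range(4):
    postdata(a[i], password[i], key[i])

6.2.7 Secured Environment Module
To protect the privacy and security of the loan application data, this project uses Fernet encryption from the Python cryptography library.
Fernet Cryptography Encryption: Sensitive data is encrypted before it is stored on the servers. Fernet is a symmetric scheme: the same secret key encrypts and decrypts the data, so only holders of the key can read it. Fernet combines AES encryption with a message authentication code (MAC), which keeps the data confidential and also detects any tampering during transmission or storage. The encryption process involves the following steps:
1. Generate a random 32-byte secret key K (Fernet.generate_key() returns it base64url-encoded). The first 16 bytes are the signing key and the last 16 bytes are the encryption key.
2. Record the current time as a 64-bit timestamp and generate a random 16-byte initialization vector (IV).
3. Pad the data (PKCS#7) and encrypt it with AES-128 in CBC mode using the encryption key and the IV: ciphertext = AES-128-CBC.encrypt(encryption_key, IV, data)
4. Compute the MAC with HMAC-SHA256 and the signing key over the version byte (0x80), the timestamp, the IV, and the ciphertext: MAC = HMAC-SHA256(signing_key, version || timestamp || IV || ciphertext)
5. Concatenate the fields and the MAC and encode the result as the token: token = base64.urlsafe_b64encode(version || timestamp || IV || ciphertext || MAC)
Fig 6.4 Cryptography Fernet Implementation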
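The token layout in steps 4 and 5, and the matching MAC check done on decryption, can be reproduced with the standard library. This sketch covers only the framing and HMAC layer: the AES-128-CBC step is replaced by random stand-in bytes, so it is not an encryption implementation, and the project should keep using `cryptography.fernet.Fernet`, which performs all of these steps. The helper names are hypothetical.

```python
import base64
import hashlib
import hmac
import os
import struct
import time
from typing import Tuple

VERSION = b"\x80"  # Fernet token version byte
HMAC_LEN = 32      # HMAC-SHA256 output size in bytes
MIN_LEN = 1 + 8 + 16 + HMAC_LEN  # version + timestamp + IV + MAC (ciphertext may follow)


def split_key(key_b64: bytes) -> Tuple[bytes, bytes]:
    """Split a base64url Fernet key into (signing_key, encryption_key), 16 bytes each."""
    raw = base64.urlsafe_b64decode(key_b64)
    if len(raw) != 32:
        raise ValueError("a Fernet key decodes to exactly 32 bytes")
    return raw[:16], raw[16:]


def frame_token(signing_key: bytes, timestamp: int, iv: bytes, ciphertext: bytes) -> bytes:
    """Build version || timestamp || IV || ciphertext || HMAC and base64url-encode it."""
    signed = VERSION + struct.pack(">Q", timestamp) + iv + ciphertext
    mac = hmac.new(signing_key, signed, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(signed + mac)


def open_token(signing_key: bytes, token: bytes) -> Tuple[int, bytes, bytes]:
    """Verify the HMAC in constant time, then return (timestamp, iv, ciphertext)."""
    raw = base64.urlsafe_b64decode(token)
    if len(raw) < MIN_LEN:
        raise ValueError("invalid token: too short")
    signed, mac = raw[:-HMAC_LEN], raw[-HMAC_LEN:]
    expected = hmac.new(signing_key, signed, hashlib.sha256).digest()
    if signed[:1] != VERSION or not hmac.compare_digest(mac, expected):
        raise ValueError("invalid token: wrong version or MAC mismatch")
    timestamp = struct.unpack(">Q", signed[1:9])[0]
    return timestamp, signed[9:25], signed[25:]


if __name__ == "__main__":
    key = base64.urlsafe_b64encode(os.urandom(32))
    signing_key, encryption_key = split_key(key)
    iv = os.urandom(16)
    # Stand-in for AES-128-CBC(encryption_key, iv, padded plaintext); NOT real encryption
    ciphertext = os.urandom(32)
    token = frame_token(signing_key, int(time.time()), iv, ciphertext)
    print(open_token(signing_key, token)[2] == ciphertext)
```

Because the MAC covers the version, timestamp, IV, and ciphertext, flipping any bit of the token makes verification fail before decryption is attempted (encrypt-then-MAC).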
The decryption process involves the following steps:
1. Decode the token: raw = base64.urlsafe_b64decode(token)
2. Separate the MAC from the signed part: signed, MAC = raw[:-32], raw[-32:]
3. Verify the MAC with HMAC-SHA256 and the signing key, using a constant-time comparison: HMAC-SHA256(signing_key, signed) == MAC. If the values differ, the token is rejected.
4. Read the IV and ciphertext from the signed part, decrypt the ciphertext with AES-128-CBC and the encryption key, and remove the padding: data = AES-128-CBC.decrypt(encryption_key, IV, ciphertext)
Because the MAC is checked before decryption, any modification of the token during transmission or storage is detected.
Distributed Server System:
Fig 6.5 Split up Server Architecture
This project also uses split-up servers to hold the encrypted user data. The Fernet token and the key are each divided into four parts, and each part is handled by a different server, so no single server handles the complete token or the complete key. Each server stores one fragment of the token and one fragment of the key; to reconstitute the original data, the fragments are fetched from all four servers and concatenated before decryption.
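The project's `sep()` helper is not shown, so the following is a hypothetical stand-in with the same role: it cuts a string into four contiguous fragments (one per server), and `join_parts` mirrors the `"".join(...)` step in `checkpass`. This is plain splitting, not secret sharing: each fragment reveals its own slice of the token, and the protection comes from the key being split as well, since decryption needs every key fragment.

```python
from typing import List


def split_into_parts(value: str, parts: int = 4) -> List[str]:
    """Split `value` into `parts` contiguous fragments whose lengths differ by at most one."""
    size, extra = divmod(len(value), parts)
    fragments = []
    start = 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)  # first `extra` fragments get one more char
        fragments.append(value[start:end])
        start = end
    return fragments


def join_parts(fragments: List[str]) -> str:
    """Reassemble the fragments in server order."""
    return "".join(fragments)


if __name__ == "__main__":
    token = "gAAAAABk-example-token"
    shares = split_into_parts(token)
    print(shares, join_parts(shares) == token)
```

A Fernet key is 44 base64url characters, so it splits evenly into four 11-character fragments; the order of the server URLs must be the same when storing and when fetching.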
The four servers run independently of each other, each handling requests only for its own fragments.
Fig 6.6 Servers maintained in Nodejs Servers
Overall, the split-up servers play a key role in keeping the users' data in the loan application approval system secure.
Database Storage:
Fig 6.7 MySQL storage value with hashed process
At the end of the process, the fragments are stored in a MySQL database, where each user's row holds the four token fragments and the four key fragments. These values are encrypted rather than hashed, so the original data can be recovered, but only by combining all fragments of both the token and the key. The database can also be queried to extract specific information about loan applications.

import requests
from cryptography.fernet import Fernet

def checkpass(a1, b):
    # a1: the user's e-mail address, b: the password typed at login
    password = []
    key = []
    a = ["http://localhost:8084/", "http://localhost:8081/", "http://localhost:8082/", "http://localhost:8083/"]
    # Fetch this user's token fragment and key fragment from each server
    for i in a:
        x = requests.get(i, data=a1)
        if x.text == "fail":        # no such user on this server
            return 0
        r = x.text.split(" ")       # reply format: "<token fragment> <key fragment>"
        password.append(r[0])
        key.append(r[1])
    # Reassemble the full Fernet token and key
    password = "".join(password).encode()
    key = "".join(key).encode()
    f = Fernet(key)
    rp = f.decrypt(password).decode()   # raises InvalidToken if the token was altered
    # 1 if the stored password matches the one entered, otherwise 0
    return 1 if rp == b else 0
Performance metrics:
The accuracy score of 85.7% indicates that the loan application approval system predicts loan approvals with a relatively high level of accuracy. However, the system's performance should also be evaluated with other metrics. One useful metric is precision, the proportion of predicted loan approvals that are actually correct. Precision matters because false positive predictions (incorrect approvals) can cause financial losses for the lender. Precision is the number of true positives (correct approvals) divided by the sum of true positives and false positives (incorrect approvals): Precision = TP / (TP + FP).
Fig 6.8 Accuracy Score, Recall, Precision, F1 Scores
In addition, a receiver operating characteristic (ROC) curve can be plotted to evaluate the model at different classification thresholds. The area under the curve (AUC) then gives a single measure of the model's performance across all thresholds.
Overall, while an accuracy score of 85.7% is a good starting point, the system should be evaluated with a range of performance metrics to get a more complete picture of its strengths and weaknesses.
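The metrics named above follow directly from the confusion-matrix counts. The sketch below treats 1 (approved) as the positive class, as the project's prediction code does, and computes AUC as the probability that a randomly chosen approved application scores higher than a randomly chosen rejected one, counting ties as one half. With the real model, the scores would come from the classifier's `predict_proba`; scikit-learn's `precision_score`, `recall_score`, `f1_score`, and `roc_auc_score` give the same values. The function names are illustrative.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, TN, FN with 1 = approved (positive class) and 0 = rejected."""
    c = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if p == 1:
            c["tp" if t == 1 else "fp"] += 1
        else:
            c["fn" if t == 1 else "tn"] += 1
    return c


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    accuracy = (c["tp"] + c["tn"]) / total
    precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative), ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 1, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
    print(confusion_counts(y_true, y_pred), classification_metrics(y_true, y_pred))
```

The pairwise AUC loop is quadratic in the number of samples, which is fine for a test set of a few hundred rows; library implementations sort the scores instead.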
CHAPTER 7
SYSTEM TESTING
7.1 TEST PLAN
This test plan describes the approach and procedures used to test the exploratory data analysis in loan applicant approval system, which uses Fernet cryptography and distributed servers that store the split encrypted data.
Functional Testing:
• Testing the encryption and decryption process using Fernet.
• Testing the splitting of data across the four servers.
• Testing the communication between the servers to ensure that data is transmitted correctly.
• Testing the ability of the system to handle different types of loan applications.
Security Testing:
• Testing the authentication process to ensure that only authorized users can access the system.
• Testing the authorization process to ensure that users have the appropriate level of access to data.
• Testing the encryption process to ensure that all data is encrypted and stored securely.
• Testing the system for potential vulnerabilities or weaknesses.
Performance Testing:
• Testing the system's response time under different load conditions.
• Testing the system's ability to handle a high volume of loan applications.
• Testing the system's ability to handle multiple users simultaneously.
7.2 TEST CASES
Each loan application is checked for one of two outcomes:
• Eligible
• Not eligible
The outcome is produced by the model that was trained and tested for this application, using the details entered by the user:
• Gender
• Marital status
• Dependants
• Education
• Employment status
• Income
• Co-applicant income (additional income)
• Loan amount
• Loan amount term (in days)
• Credit history
• Aadhar/PAN verification
• Property area (type of location)
Regression testing is also carried out to ensure that the system's existing functionality has not been affected by any changes.
Test Cases in this Lending Application:
Test Case 1: Functional Testing – Secured Data Storage
Input : Loan application input data with attributes such as loan term, annual income, age, and credit score.
Expected Output : The data should be stored in the database with all attributes and values retained.
Actual Output : The data was stored in the database with all attributes and values retained, and it can be retrieved for further analysis.
Test Case 2: Performance Testing - Prediction of Result using XGBoost
Input : A set of loan application records with attributes such as age, income, and credit score.
Expected Output : The XGBoost model should accurately predict the loan approval status of each application based on the input attributes.
Actual Output : The XGBoost model predicted the loan approval status of each application based on the input attributes, and the predictions can be used to inform lending decisions.
Test Case 3: Security Testing - Distributed Server for Data Transaction
Input : A set of loan application records stored in a distributed server environment with multiple nodes.
Expected Output : The data should be transferred seamlessly between the nodes and the main database with no loss or corruption.
Actual Output : The data was transferred seamlessly between the nodes and the main database with no loss or corruption, preserving the integrity and security of the loan application data.
CHAPTER 8
SIMULATION RESULTS
The simulation results indicate that the implemented loan approval system predicts the status of loan applications with high accuracy (an accuracy score of 85.7%, Section 8.5). The exploratory data analysis of the loan application data showed that, compared with rejected applications, most approved applications had a higher income, a lower debt-to-income ratio, and a longer credit history.
8.1 DATA COLLECTION AND PACKAGE IMPORTING
The data collection process for this loan application project involved gathering information from various sources, including public datasets, private financial institutions, and individual loan applicants. We collected data on factors such as employment status, income, credit history, loan amount requested, and loan purpose.
Fig 8.1 Importing Essential Libraries
We also ensured compliance with relevant data privacy and security regulations to protect the confidentiality and integrity of the collected data. Overall, the data collection process was central to building a robust and reliable loan application system that can assess creditworthiness accurately and support fair lending practices.
8.2 DATA STORED
After a user's data has been split into four key fragments and four ciphertext fragments, the fragments are stored in the MySQL database. Splitting the data in this way protects it from unauthorized access or manipulation, because no single fragment contains the complete key or ciphertext. Fernet cryptography further strengthens security by encrypting the plain-text user data with a secret key that is required for decryption, so only authorized users can access and analyze the user data.
Fig 8.2 Hashed Value Pair of plain text are stored in MySQL
Fig 8.3 Key values of the Cipher Text
Fig 8.4 Four maintained NodeJs ports using Apache server
Storing the split user data in MySQL keeps it accessible for analysis, while each of the four Node.js servers handles the transfer and storage of its own fragments.
8.3 UI DEVELOPMENT (FLASK)
Fig 8.5 Login/Register Page Using cryptography fernet
Fernet is a symmetric authenticated-encryption scheme: the same secret key encrypts and decrypts the data, and every token carries an HMAC so that tampering is detected. In this project the key is not derived from a user-supplied passphrase; Fernet.generate_key() creates a random key, and the user's password is the data that is encrypted. Fernet is easy to use correctly and is widely used in secure communication and authentication systems.
Fig 8.6 User Interface using HTML, CSS, Js, Session
The landing account page is the first page that the user sees after logging in successfully. The loan application form developed for this project (Fig 8.7) uses the XGBoost model to predict the likelihood of loan approval. The form captures factors such as the applicant's income, credit score, loan amount, and loan term, and the resulting prediction can help financial institutions make informed decisions about loan approvals.
Fig 8.7 Loan Application Form
8.4 LOAN APPLICATION STATUS (APPROVED OR REJECTED)
The application status page displays the loan application status to the user. The saved XGBoost model (pickle file) is used to make the loan prediction: it analyzes the loan application data provided by the user and returns the predicted class, approved (1) or rejected (0).
Fig 8.8 Application Status
The application status page shows this prediction to the user, together with a message stating whether the loan application has been approved or rejected. The page also displays other relevant information, such as the loan amount, interest rate, and repayment term.
8.5 PERFORMANCE METRICS
The accuracy score of 85.7% indicates that the loan application approval system predicts loan approvals with a relatively high level of accuracy. However, the system's performance should also be evaluated with other metrics. One useful metric is precision, the proportion of predicted loan approvals that are actually correct. Precision matters because false positive predictions (incorrect approvals) can cause financial losses for the lender. It is calculated by dividing the number of true positives (correct approvals) by the sum of true positives and false positives (incorrect approvals).
Fig 8.9 Model Evaluation of test data
Overall, while an accuracy score of 85.7% is a good starting point, the system should be evaluated with a range of performance metrics to get a more complete picture of its strengths and weaknesses.
8.6 MAIL SENDER USING SMTP AND MIME
Fig 8.10 Email Loan Application Status
SMTP (Simple Mail Transfer Protocol) is the protocol used to send email messages between servers. It transfers a message from the sender's email server to the recipient's email server through a series of commands and responses.
SMTP is widely used for sending email messages over the Internet. MIME (Multipurpose Internet Mail Extensions) is an Internet standard that extends the email message format to support text in character sets other than ASCII, as well as attachments such as audio, video, images, and application programs. MIME allows an email message to contain multiple parts with different content types and encodings.
Fig 8.11 Mail Sending Module Using SMTP Protocol
In the code below, the smtplib library connects to the SMTP server and upgrades the connection to TLS with STARTTLS, and the MIMEText class creates an email message with a subject, body, and sender and recipient addresses. The msg.as_string() method converts the message to the string format sent over the SMTP connection, and the server.sendmail() method sends the email from the sender's address to the recipient's address.

import os
import smtplib
from email.mime.text import MIMEText
from email.utils import formataddr
from flask import session

def S_mail(m):
    # SMTP server settings (Gmail, STARTTLS on port 587)
    SMTP_SERVER = 'smtp.gmail.com'
    SMTP_PORT = 587
    # Sender and recipient email addresses
    FROM = 'jeganjega807@gmail.com'
    TO = m
    # Email message; the body text was stored in the session by the prediction route
    msg = MIMEText(session['body'])
    msg['Subject'] = 'Loan Application Status-ONLINE BANKING'
    msg['From'] = formataddr(('Banking.site', FROM))
    msg['To'] = m
    # Connect to the SMTP server and enable TLS encryption
    with smtplib.SMTP(SMTP_SERVER, SMTP_PORT) as server:
        server.starttls()
        # App password read from an environment variable instead of being hard-coded
        # (the variable name SMTP_APP_PASSWORD is an assumption)
        server.login(FROM, os.environ['SMTP_APP_PASSWORD'])
        # Send the email
        server.sendmail(FROM, TO, msg.as_string())
    return 'success'
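Separating message construction from sending makes the email easy to check without contacting a mail server. The sketch below is a hypothetical refactoring of `S_mail`: `build_status_email` returns the `MIMEText` object, and `send_email` performs the STARTTLS login and uses `smtplib.SMTP.send_message`, which reads the sender and recipients from the message headers. The body wording is shortened from the project's messages for illustration.

```python
import smtplib
from email.mime.text import MIMEText
from email.utils import formataddr


def build_status_email(name: str, approved: bool, sender: str, recipient: str) -> MIMEText:
    """Create the plain-text status email (no network access)."""
    if approved:
        body = (f"Dear {name},\n\nWe are pleased to inform you that your loan application "
                "has been approved!")
    else:
        body = (f"Dear {name},\n\nWe regret to inform you that your loan application "
                "has been rejected.")
    msg = MIMEText(body)  # Content-Type: text/plain
    msg["Subject"] = "Loan Application Status-ONLINE BANKING"
    msg["From"] = formataddr(("Banking.site", sender))
    msg["To"] = recipient
    return msg


def send_email(msg: MIMEText, host: str, port: int, user: str, password: str) -> None:
    """Send a prepared message over SMTP, upgrading the connection with STARTTLS."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)


if __name__ == "__main__":
    message = build_status_email("Applicant", True, "bank@example.com", "applicant@example.com")
    print(message.as_string())
```

Note that `formataddr` quotes a display name containing a dot, so the From header reads `"Banking.site" <bank@example.com>`.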
8.7 AADHAR/PAN VERIFICATION
Fig 8.12 checking Aadhar from Database
Aadhaar and PAN are two identification documents used in India for various purposes, including financial transactions. Aadhaar is a 12-digit unique identity number issued by the Unique Identification Authority of India (UIDAI) to residents of India, while PAN (Permanent Account Number) is a ten-character alphanumeric identifier issued by the Income Tax Department.
Aadhaar verification checks whether the Aadhaar number entered by the user is valid and matches the details of the person. In a production system this is done through the verification services provided by UIDAI; in this project, it is simulated by checking the number against an aadhar table in the local MySQL database (Fig 8.12).
Fig 8.13 Invalid Aadhar Intimation
PAN verification checks whether the PAN entered by the user is valid and matches the details of the person, through the verification services of the Income Tax Department. Verifying Aadhaar and PAN helps prevent fraud and ensures that only legitimate users can access financial services.

import mysql.connector

def checkaadhar(a):
    # Connect to the local database that holds the registered Aadhaar numbers
    mydb = mysql.connector.connect(
        host="localhost",
        user="root",
        password="",
        database="data_protection"
    )
    mycursor = mydb.cursor()
    # Parameterized query: the driver escapes the value, preventing SQL injection
    mycursor.execute("SELECT * FROM aadhar WHERE adhrno = %s", (a,))
    resu = mycursor.fetchall()
    mydb.close()
    # 1 if the number exists in the table, otherwise 0
    return 1 if len(resu) != 0 else 0

# Determine the output (inside the prediction route)
if checkaadhar(adhrno) != 1:
    return render_template('application.html', error="Invalid Aadhar card number")
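Before querying the database, the application could reject input that cannot possibly be an Aadhaar number or PAN. This sketch checks format only: an Aadhaar number as 12 digits (spaces between the 4-digit groups are tolerated) and a PAN as five letters, four digits, and one letter. It implements no checksum and does not prove that a number was ever issued, so the database or official verification step is still required. The function names are hypothetical.

```python
import re

AADHAAR_RE = re.compile(r"[0-9]{12}")
PAN_RE = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")


def is_valid_aadhaar_format(value: str) -> bool:
    """True if the input is exactly 12 digits after removing spaces."""
    return AADHAAR_RE.fullmatch(value.replace(" ", "")) is not None


def is_valid_pan_format(value: str) -> bool:
    """True if the input matches AAAAA9999A (case-insensitive, surrounding spaces ignored)."""
    return PAN_RE.fullmatch(value.strip().upper()) is not None


if __name__ == "__main__":
    print(is_valid_aadhaar_format("1234 5678 9012"), is_valid_pan_format("ABCDE1234F"))
```

Using `fullmatch` rather than `match` ensures that trailing characters such as `"123456789012X"` are rejected.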
CHAPTER 9
APPENDICES
9.1 SAMPLE CODE
======================Mainserver.py=======================
from flask import Flask, redirect, url_for, request, jsonify, render_template, session
from email.utils import formataddr
# Data manipulation
import pandas as pd
# Matrices manipulation
import numpy as np
import smtplib
from email.mime.text import MIMEText
from cryptography.fernet import Fernet
import mysql.connector

app = Flask(__name__)
app.secret_key = 'xsdhrtsrdj56s5rn7snsr67s'

def sample(e, r1):
    # URLs of the four Node.js fragment servers
    a = ["http://localhost:8084/", "http://localhost:8081/", "http://localhost:8082/", "http://localhost:8083/"]

    def datainsert(a):
        mydb = mysql.connector.connect(
        # ... (connection arguments and cursor creation not shown)
        sql = "INSERT INTO data (emailid,password1,password2,password3,password4,key1,key2,key3,key4) VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s)"
        # New row with empty fragment columns; the four servers fill them in
        val = (a, "", "", "", "", "", "", "", "")
        mycursor.execute(sql, val)
        mydb.commit()
        fernet = Fernet(key)
        encMessage = fernet.encrypt(message.encode())
        print(encMessage)
    # ... (lines omitted)

@app.route('/', methods=["POST", "GET"])
def hello_world():
    if request.method == "POST":
        user = request.form['emails']
        user1 = request.form['passwords']
        if checkemail(user) == 0:
            # New e-mail: encrypt the password and store its fragments
            sample(user, user1)
            session['id'] = user
            return render_template("index.html")
        elif checkemail(user) == 1:
            # E-mail already registered
            return redirect("http://localhost/FinalYR/index.php?signid=alr")
        else:
            return "success"

@app.route('/login', methods=["POST", "GET"])
def hello_world1():
    if request.method == "POST":
        user = request.form['email']
        user1 = request.form['password']
        # ... (lines omitted)
        result = ValuePredictor(data=df)
        # Determine the output
        if checkaadhar(adhrno) != 1:
            return render_template('application.html', error="Invalid Aadhar card number")
        elif int(result) == 1 and checkaadhar(adhrno) == 1:
            prediction = 'Dear Mr/Mrs/Ms {name}, your loan is approved!'.format(name=name)
            session['predict'] = prediction
            body = "Dear {name},\n\nWe are pleased to inform you that your loan application has been approved! We understand the importance of the financial support that you require and we are thrilled to be able to help.\n\n once again and we wish you all the best for your future endeavors!\n\n".format(name=name)
            session['body'] = body
        else:
            prediction = 'Sorry Mr/Mrs/Ms {name}, your loan is rejected!'.format(name=name)
            body = "Dear {name},\n\nWe are writing to inform you of the status of your loan application. After careful consideration and review of your application, we regret to inform you that your loan application has been rejected. We wish you the best of luck in your financial endeavors.\nSincerely,\nOnline Banking".format(name=name)
            session['body'] = body
        # Email the decision to the logged-in user
        S_mail(session['id'])
        # redirect(url_for('/sent'))
        # Return the prediction
        return render_template('prediction.html', prediction=prediction)
        # return redirect(url_for('S_mail')), render_template('prediction.html', prediction=prediction)
        # return (redirect(url_for('/sent')), render_template('prediction.html', prediction=prediction))
    # Something error
    else:
        # Return error
        return render_template('error.html', prediction=prediction)

if __name__ == '__main__':
    app.run(debug=True)

========================Index.html=======================
<!doctype html>
<html lang="en">
<head>
<!-- Required meta tags -->
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>Online Banking | Cryptography Fernet </title>
<link href="//fonts.googleapis.com/css2?family=Kumbh+Sans:wght@300;400;700&display=swap" rel="stylesheet">
<link rel="stylesheet" href="https://kit.fontawesome.com/7745b6ed41.css" crossorigin="anonymous">
<link rel="stylesheet" href="path/to/font-awesome/css/font-awesome.min.css">
<link rel="icon" type="image/x-icon" href="/images/favicon.ico">
<i style="color:#614da7" class='fas fa-piggy-bank'></i>
<!-- Template CSS -->
<link rel="stylesheet" href="/static/assets/css/style-starter.css">
</head>
<body>
<!--header-->
<header id="site-header" class="fixed-top">
<div class="container">
<nav class="navbar navbar-expand-lg stroke px-0">
<h1> <a class="navbar-brand" href="landing.html">
<i class="fa fa-briefcase" aria-hidden="true" style="padding: 0 10px 0 0;"></i>Online Banking </a></h1>
<!-- if logo is image enable this
<a class="navbar-brand" href="#index.html">
<img src="image-path" alt="Your logo" title="Your logo" style="height:35px;" />
</a> -->
<p class="mt-md-4 mt-3">Our Bank is the best option if you are looking for high-quality and reliable banking services. We provide reliable services for you </p><a class="btn btn-style btn-primary mt-sm-5 mt-4 mr-2" style="border- Read More</a>
</div>
<div class="col-lg-5 col-md-8 img offset-lg-1 mt-lg-0 mt-4">
<img src="/static/assets/images/Terms.png" alt="img" class="img-fluid radius-image-curve" />
</div></div></div></div></div></li></div></div></div></section>
==========Server1.js, Server2.js, Server3.js & Server4.js ==========
const http = require('http');
const mysql = require('mysql2');

// Connection settings (host, user, password, database) are not shown in the original listing.
const con = mysql.createConnection({ /* host, user, password, database */ });

con.connect(function (err) {
  if (err) { throw err; }
  console.log("Connected!");
});

http.createServer(function (req, res) {
  let data = '';
  // Collect the whole request body before handling it.
  req.on('data', chunk => { data += chunk; });
  req.on('end', () => {
    if (req.method == "POST") {
      // Form-encoded body: email=...&password=...&key=...
      // URLSearchParams decodes every escape (%40, %3D, ...), not just two of them.
      const params = new URLSearchParams(data);
      const e = params.get('email');
      const s2 = params.get('password');
      const s1 = params.get('key');
      // Placeholders (?) keep user input out of the SQL text (no SQL injection).
      const sql = "UPDATE data SET password1 = ?, key1 = ? WHERE emailid = ?";
      con.query(sql, [s2, s1, e], function (err, result) {
        if (err) { res.end("fail"); return; }
        console.log(result.affectedRows + " record(s) updated");
        res.end("success");
      });
    } else if (req.method == "GET") {
      // The request body carries the email address to look up.
      const email = String(data);
      con.query("SELECT password1, key1 FROM data WHERE emailid = ?", [email],
        function (err, result) {
          if (err || result.length == 0) { res.end("fail"); return; }
          res.end(String(result[0].password1) + " " + String(result[0].key1));
        });
    } else {
      res.end("fail");
    }
  });
}).listen(8084);
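The original servers built their SQL by concatenating the email, password and key directly into the query string, which exposes the `data` table to SQL injection. The idea behind the parameterized fix can be sketched in Python with the built-in `sqlite3` module. This is an illustration under assumptions, not the project's code: it uses an in-memory SQLite database instead of MySQL, and the helper names `store_credentials` and `fetch_credentials` are hypothetical; only the table and column names (`data`, `emailid`, `password1`, `key1`) come from the listing.

```python
import sqlite3


def store_credentials(con, email, password_token, key):
    """Update the encrypted password and Fernet key for one user using placeholders."""
    cur = con.execute(
        "UPDATE data SET password1 = ?, key1 = ? WHERE emailid = ?",
        (password_token, key, email),
    )
    con.commit()
    return cur.rowcount  # number of rows actually updated


def fetch_credentials(con, email):
    """Return 'password1 key1' for a user, or 'fail' when the email is unknown."""
    row = con.execute(
        "SELECT password1, key1 FROM data WHERE emailid = ?", (email,)
    ).fetchone()
    return "fail" if row is None else f"{row[0]} {row[1]}"


# Demo on an in-memory database with a hypothetical schema mirroring the listing.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (emailid TEXT PRIMARY KEY, password1 TEXT, key1 TEXT)")
con.execute("INSERT INTO data VALUES ('user@example.com', '', '')")

print(store_credentials(con, "user@example.com", "gAAAA-token", "key-123"))
print(fetch_credentials(con, "user@example.com"))
# An injection attempt is treated as a literal email address and matches nothing.
print(fetch_credentials(con, "' OR '1'='1"))
```

Because the driver sends the values separately from the SQL text, the classic `' OR '1'='1` payload is compared as an ordinary string and the lookup returns `fail`; mysql2's `?` placeholders in `con.query` serve the same purpose in the Node.js servers.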
=====================Application.html======================
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta http-equiv="X-UA-Compatible" content="ie=edge">
  <title>Loan Approval Prediction</title>
  <!-- Font Icon -->
  <link rel="stylesheet" href="../static/fonts/material-icon/css/material-design-iconic-font.min.css">
  <!-- Main css -->
  <link rel="stylesheet" href="../static/styles/styleloan.css">
</head>
<body>
  <!-- Aadhar card Details -->
  <div class="form-group">
    <label for="adhrno"><img src="/static/assets/images/aadharIco.png" width="16px" alt=""></label>
    <!-- type="text" so the 12-digit pattern is enforced (pattern is ignored on type="number") -->
    <input type="text" inputmode="numeric" pattern="[0-9]{12}" name="adhrno" id="adhrno" placeholder="Enter Your Aadhar Card No" required/>
    {% if error %} <script> alert('{{ error }}'); </script> {% endif %}
  </div>
  <!-- Birthdate -->
  <div class="form-group">
    <label for="birthdate"><i class="zmdi zmdi-calendar"></i></label>
    <input type="date" name="birthdate" id="birthdate" placeholder="Your Birthdate" required/>
  </div>
  <!-- Applicant Income per Month -->
  <div class="form-group">
    <label for="applicant_income"><img src="/static/assets/images/rupee-indian.png" width="12px" alt=""></label>
    <input type="number" min="0" name="applicant_income" id="applicant_income" placeholder="Applicant Income per Month (INR)" required/>
  </div>
  <!-- Co-Applicant Income per Month -->
  <div class="form-group">
    <label for="coapplicant_income"><img src="/static/assets/images/rupee-indian.png" width="12px" alt=""></label>
    <input type="number" min="0" name="coapplicant_income" id="coapplicant_income" placeholder="Co-Applicant Income per Month (INR)" required/>
  </div>
  <h3>Loan and Credit Description</h3>
  <!-- Loan Amount -->
  <div class="form-group">
    <label for="loan_amount"><img src="/static/assets/images/rupee-indian.png" width="12px" alt=""></label>
    <input type="number" min="0" name="loan_amount" id="loan_amount" placeholder="Your Loan Amount (INR)" required/>
  </div>
  <!-- Loan Amount Term -->
  <div class="form-group">
    <label for="loan_term"><i class="zmdi zmdi-calendar-check"></i></label>
    <input type="number" min="0" name="loan_term" id="loan_term" placeholder="Your Loan Term (days)" required/>
  </div>
  <!-- Credit History -->
  <div class="form-group">
    <label for="credit_history"><img src="/static/assets/images/rupee-indian.png" width="12px" alt=""></label>
    <select name="credit_history" id="credit_history" required>
      <option value="" disabled selected>Your Credit History</option>
      <option value="1">All Debts Paid</option>
      <option value="0">Not paid</option>
    </select>
  </div>
  <div class="signup-image">
    <figure><img src="../static/images/Loan-Home.jpg" alt="Loan-Home image"></figure>
  </div>
</div></div></div></section></div>
{% if error %} <script>alert('{{ error }}');</script> {% endif %}
</body>
</html>
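The `required` and `min` attributes in Application.html are only checked by the browser and can be bypassed, so the same rules are commonly re-checked on the server before the values reach the model. The Flask handler that receives this form is not shown in full in the listing, so the helper below is a hypothetical sketch: only the field names (`adhrno`, `birthdate`, `applicant_income`, `coapplicant_income`, `loan_amount`, `loan_term`, `credit_history`) come from the form, while the function name and error messages are assumptions.

```python
import math
import re
from datetime import date

# Field names follow the `name` attributes in Application.html.
NUMERIC_FIELDS = ["applicant_income", "coapplicant_income", "loan_amount", "loan_term"]


def validate_application(form):
    """Return a list of error messages; an empty list means the form looks valid.

    `form` can be any mapping of field name -> string, such as Flask's request.form.
    """
    errors = []
    # Aadhar number: exactly 12 ASCII digits, the same rule as the HTML pattern.
    if not re.fullmatch(r"[0-9]{12}", form.get("adhrno", "")):
        errors.append("Aadhar number must be exactly 12 digits")
    # <input type="date"> submits dates as YYYY-MM-DD.
    try:
        date.fromisoformat(form.get("birthdate", ""))
    except ValueError:
        errors.append("Birthdate must be a valid date (YYYY-MM-DD)")
    # Incomes, loan amount and term must be finite, non-negative numbers.
    for field in NUMERIC_FIELDS:
        try:
            value = float(form.get(field, ""))
        except ValueError:
            errors.append(f"{field} must be a number")
            continue
        if not math.isfinite(value) or value < 0:
            errors.append(f"{field} must be a non-negative number")
    # The select only offers "1" (All Debts Paid) and "0" (Not paid).
    if form.get("credit_history") not in ("0", "1"):
        errors.append("credit_history must be 0 or 1")
    return errors


sample = {
    "adhrno": "123456789012", "birthdate": "1995-06-01",
    "applicant_income": "45000", "coapplicant_income": "0",
    "loan_amount": "500000", "loan_term": "360", "credit_history": "1",
}
print(validate_application(sample))  # []
```

A non-empty list could then be passed to the template as `error`, which the `{% if error %}` block already displays as an alert.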
9.2 SCREENSHOTS
9.2.1 Login/Register Page
Fig 9.1 Login/Register Page using Fernet cryptography
9.2.2 Online Banking Website
Fig 9.2 Online Banking Website
9.2.3 Application Status Interface
Fig 9.3 Application Status Interface with Flask
9.2.4 Performance Metrics
Fig 9.4 Performance Metrics
9.2.5 Fitting the Model with Train and Test Data
Fig 9.5 Fitting the Model with Train and Test Data
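Fig 9.4 is a screenshot, and the report does not state which metrics it contains. The sketch below shows the standard definitions of accuracy, precision, recall and F1 for a binary approve/reject classifier, computed from the confusion-matrix counts. The label encoding (1 = approved) and the toy label lists are assumptions for illustration only; they are not the project's test set or results.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = approved)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # approved, predicted approved
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # rejected, predicted rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # rejected, predicted approved
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # approved, predicted rejected
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
```

In practice the same values are available from scikit-learn's `accuracy_score`, `precision_score`, `recall_score` and `f1_score`; writing them out makes clear that precision penalises wrongly approved loans while recall penalises wrongly rejected ones.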
9.2.6 Loan Status by Education
Fig 9.6 Composition of Loan Status by Education
9.2.7 Loan Status by Dependents
Fig 9.7 Composition of Loan Status by Dependents
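Figures 9.6 and 9.7 show how loan status is composed within each education level and each number of dependents. The underlying computation is the approval rate per category, sketched below with only the standard library. The column names (`Education`, `Dependents`, `Loan_Status`) and the `"Y"` approval code are assumptions about the dataset, and the five records are toy data, not the project's data.

```python
from collections import Counter


def approval_rate_by(records, column, status_key="Loan_Status", approved="Y"):
    """Share of approved applications within each category of `column`."""
    totals = Counter()
    approvals = Counter()
    for row in records:
        group = row[column]
        totals[group] += 1
        if row[status_key] == approved:
            approvals[group] += 1
    return {group: approvals[group] / totals[group] for group in totals}


# Toy records for illustration only.
records = [
    {"Education": "Graduate", "Dependents": "0", "Loan_Status": "Y"},
    {"Education": "Graduate", "Dependents": "1", "Loan_Status": "N"},
    {"Education": "Graduate", "Dependents": "0", "Loan_Status": "Y"},
    {"Education": "Not Graduate", "Dependents": "2", "Loan_Status": "N"},
    {"Education": "Not Graduate", "Dependents": "0", "Loan_Status": "Y"},
]
print(approval_rate_by(records, "Education"))
print(approval_rate_by(records, "Dependents"))
```

With pandas, the same table is usually produced by `pd.crosstab(df["Education"], df["Loan_Status"], normalize="index")`, which is a natural input for the stacked bar charts in Figs 9.6 and 9.7.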
9.2.8 Loan Application Interface
Fig 9.8 Loan Application Interface
The loan application form developed for this project uses the XGBoost algorithm to predict the likelihood of loan approval. The form captures factors such as the applicant's and co-applicant's monthly income, credit history, loan amount, and loan term, and passes them to the model to generate a prediction. This prediction can help financial institutions make informed decisions about loan approvals.
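Before the XGBoost model can score an application, the string values submitted by the form have to be turned into a numeric row whose column order matches the training data. The report does not list the exact columns the model was trained on, so the feature order and the derived `age` feature below are hypothetical; only the form field names come from Application.html.

```python
from datetime import date

# Hypothetical feature order; it must match the columns the model was trained on.
FEATURE_ORDER = ["age", "applicant_income", "coapplicant_income",
                 "loan_amount", "loan_term", "credit_history"]


def form_to_features(form, today):
    """Convert submitted form strings into a numeric feature row in FEATURE_ORDER."""
    born = date.fromisoformat(form["birthdate"])
    # Age in completed years as of `today` (subtract 1 if the birthday is still ahead).
    age = today.year - born.year - ((today.month, today.day) < (born.month, born.day))
    values = {
        "age": age,
        "applicant_income": float(form["applicant_income"]),
        "coapplicant_income": float(form["coapplicant_income"]),
        "loan_amount": float(form["loan_amount"]),
        "loan_term": float(form["loan_term"]),
        "credit_history": int(form["credit_history"]),  # 1 = all debts paid, 0 = not paid
    }
    return [values[name] for name in FEATURE_ORDER]


form = {"birthdate": "1995-06-01", "applicant_income": "45000",
        "coapplicant_income": "15000", "loan_amount": "500000",
        "loan_term": "360", "credit_history": "1"}
row = form_to_features(form, today=date(2023, 4, 1))
print(row)
```

Passing `today` explicitly keeps the conversion deterministic and easy to test. With a trained `xgboost.XGBClassifier` named `model`, `model.predict_proba([row])[0][1]` would then give the predicted probability of the positive (approved) class, assuming the positive label was encoded as 1.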
CHAPTER 10
CONCLUSION AND FUTURE ENHANCEMENT
10.1 CONCLUSION
The economic growth of a country depends heavily on the efficiency and effectiveness of its banking system, and the ability to provide secure and reliable financial transactions is a significant asset to that development. Modern technologies such as those used in this project, namely cryptography and machine learning algorithms, allow the banking system to protect sensitive information and to improve the accuracy of loan approval predictions. This project demonstrates how these techniques can be combined to build secure and efficient financial systems, which ultimately contribute to the growth and development of a country.
10.2 FUTURE ENHANCEMENT
• Integration with additional data sources: The system currently relies on the data provided in the input Excel sheet. Additional data sources, such as credit scores or employment history, could be incorporated to improve the accuracy of the model.
• Dynamic threshold adjustment: The threshold for loan approval is currently fixed at 0.5. In practice, it may be beneficial to adjust this threshold dynamically based on factors such as the current economic climate or the financial health of the lending institution.
• Multi-party encryption: The current system uses Fernet cryptography to encrypt user data. It could be extended to support multi-party encryption, allowing data to be encrypted by multiple parties, such as the lending institution, the borrower, and a third-party intermediary.
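The difference between the fixed 0.5 cut-off and the proposed dynamic threshold can be sketched in a few lines. This is a hypothetical illustration, not part of the implemented system: `adjusted_threshold`, its `risk_adjustment` input and the clamping bounds of 0.3 and 0.9 are assumptions chosen only to show the idea, and this sketch approves when the probability is at or above the threshold.

```python
def decide(probability, threshold=0.5):
    """Approve when the predicted approval probability reaches the threshold."""
    return "Approved" if probability >= threshold else "Rejected"


def adjusted_threshold(base=0.5, risk_adjustment=0.0, lower=0.3, upper=0.9):
    """Hypothetical dynamic threshold: shift the base cut-off by a risk adjustment
    (for example, raised during an economic downturn) and clamp it to [lower, upper]."""
    return min(upper, max(lower, base + risk_adjustment))


p = 0.62  # example model output, not a project result
print(decide(p))                                                  # Approved
print(decide(p, adjusted_threshold(risk_adjustment=0.15)))        # Rejected
```

The same applicant is approved under the fixed 0.5 threshold but rejected once a lender raises the cut-off to 0.65, which is the kind of policy change the enhancement describes; clamping keeps a mis-set adjustment from approving or rejecting everyone.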
REFERENCES
1. Anand, Mayank, Arun Velu, and Pawan Whig. "Prediction of loan behavior with machine learning models for secure banking." Journal of Computer Science and Engineering (JCSE) 3.1 (2022): 1-13.
2. Tumuluru, Praveen, et al. "Comparative Analysis of Customer Loan Approval Prediction using Machine Learning Algorithms." 2022 Second International Conference on Artificial Intelligence and Smart Energy (ICAIS). IEEE, 2022.
3. Adebiyi, Marion O., et al. "Secured Loan Prediction System Using Artificial Neural Network." Journal of Engineering Science and Technology 17.2 (2022): 0854-0873.
4. Kadam, Ashwini S., et al. "Prediction for loan approval using machine learning algorithm." International Research Journal of Engineering and Technology (IRJET) 8.04 (2021).
5. Madaan, Mehul, et al. "Loan default prediction using decision trees and random forest: A comparative study." IOP Conference Series: Materials Science and Engineering. Vol. 1022. No. 1. IOP Publishing, 2021.
6. Singh, Vishal, et al. "Prediction of modernized loan approval system based on machine learning approach." 2021 International Conference on Intelligent Technologies (CONIT). IEEE, 2021.
7. Tejaswini, J., et al. "Accurate loan approval prediction based on machine learning approach." Journal of Engineering Science 11.4 (2020): 523-532.
8. Shingi, Geet. "A federated learning based approach for loan defaults prediction." 2020 International Conference on Data Mining Workshops (ICDMW). IEEE, 2020.
9. Yadav, Ayushman, and Vishal Singh. "Prediction of Modernized Loan Approval System Based on Machine Learning Approach." IEEE, 2021.
10. Hamayel, Mohammad J., and Mohammad More. "Improvement of personal loans granting methods in banks using machine learning methods and approaches in Palestine." IEEE, 2021.
11. "Loan Approval Prediction using Machine Learning Algorithms Approach." 2021 [Ebook]. Retrieved from https://ijirt.org/master/publishedpaper/IJIRT151769_PAPER.pdf.
12. Gupta, Anshika, and Vinay Pant. "Bank Loan Prediction System using Machine Learning." IEEE, 2020.
13. Sheikh, M. A., A. K. Goel, and T. Kumar. "An Approach for Prediction of Loan Approval using Machine Learning Algorithm." 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC). IEEE, 2020: 490-494.
14. Bansal, P. K., A. Gupta, S. Kumar, and V. Pant. "Bank Loan Prediction System using Machine Learning." 9th International Conference on System Modeling and Advancement in Research Trends. IEEE, December 2020: 423-426.
15. Tejaswini, J., et al. "Accurate loan approval prediction based on machine learning approach." Journal of Engineering Science 11.4 (2020): 523-532.
PUBLICATION
JEGAN S, NAVEEN V, VIJAYABARATH D, Mr. K. KARTHICK, "EXPLORATORY DATA ANALYSIS IN LOAN APPLICANT APPROVAL", in the Third International Conference on Artificial Intelligence, 5G Communications and Network Technologies (ICA5NT 2023), held at VELAMMAL INSTITUTE OF TECHNOLOGY on 23rd and 24th March 2023.