International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 02 | Feb 2022 www.irjet.net p-ISSN: 2395-0072
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 875
Optical Character Recognition based KYC System
Manasi Bhabal1, Dhruv Desai2, Harsh Desai3, Prof. Sejal D’mello4
1,2,3B.E. Student, Information Technology Engineering, Atharva College of Engineering, Mumbai, India
4Professor, Information Technology Engineering, Atharva College of Engineering, Mumbai, India
-----------------------------------------------------------------------***--------------------------------------------------------------------
Abstract - The aim of this paper is to build a platform using
Machine Learning and Optical Character Recognition to make
it easier for customers to complete their KYC (Know Your
Customer) for banking purposes such as opening a bank
account, applying for a policy, and various other business
purposes, for the sake of convenience and time savings. This
will not only significantly speed up the KYC process but also
reduce data-entry errors. Customers can update their KYC in
our system by scanning and uploading their Aadhaar and PAN
cards. The app uses OCR (Optical Character Recognition) to
reduce typing errors, verify documents, and auto-fill the
form, saving time.
Key Words: KYC (Know your customer), OCR (Optical
Character Recognition), Tesseract, Authentication, Data
Extraction.
1. INTRODUCTION
Know Your Customer (KYC) is one of the most important
processes for any business, and digitalizing it is an urgent
need. Almost everything in the digital era is automated, and
information is stored and communicated in digital formats.
KYC also refers to the regulated banking practices used to
verify clients' identities. The traditional KYC procedure is
tedious and takes a long time to complete. The proposed
system is a software-based and dependable KYC method that
uses OCR (Optical Character Recognition) to extract text
from images, verifies it, and then stores the data in a
database. Auto form filling is also implemented in the
project.
1.1 PURPOSE
Customers typically fill out forms on paper by hand.
Human error, unclear handwriting, and defective writing
materials can lead to many discrepancies. The process also
produces a significant amount of paper waste, considering
that it is adopted worldwide. Next, the staff responsible
for data entry can introduce errors while transcribing the
handwritten form. Even current digital KYC systems require
manual data entry. Furthermore, the way customers enter
their data can make the customer records inconsistent,
creating serious validation issues for a customer whose
primary goal was to authenticate their identity in the
first place.
Customer identification also aids in the control of financial
fraud, the detection of money laundering and suspicious
activities, and the scrutiny and monitoring of large cash
transactions. To avoid these problems, the Reserve Bank of
India (RBI) directed all banks and financial institutions in
India to implement a policy framework that requires them to
know their customers before opening any accounts. This
entails verifying customers' identities and addresses by
requesting documents that are accepted as relevant proof.
KYC norms require proof of identity and proof of residence as
mandatory details. Proof of identity can be a passport, voter's
ID card, Permanent Account Number (PAN) card, or driving
license. Proof of residence can be a ration card, an
electricity or telephone bill, or a letter from the employer or
any recognized public authority certifying the address. A
proof-of-identity document can also serve as proof of
residence if it carries the address.
1.2 OBJECTIVES
• To ease the process of KYC for various
organizations.
• To create an E-KYC system and make it convenient
for users to complete their KYC.
• To extract data from the uploaded documents of
users.
• To verify and authenticate the uploaded documents.
• To auto-fill the user form and complete the process.
• To build a database of the information gathered for
each customer for further use.
• To drive down the cost of operations by automating the
whole process.
• To increase customer satisfaction across a number of
functional operations in the organization.
2. FEASIBILITY
2.1. Technical Feasibility
Technical feasibility focuses on the technical resources
(hardware and software) required to build the project. It also
investigates the specifics of how the product will be
delivered and whether the technical team is capable of
translating the concept into working systems. In our project,
we are creating a website with HTML and CSS and storing the
extracted data in an SQLite database.
In addition, we use Python libraries to implement the
algorithm. Other than a laptop, no additional hardware is
required for this project.
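As an illustration of the SQLite storage mentioned above, the following is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample values are hypothetical and are not taken from the project; they only show how an extracted record might be saved and read back.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS customers (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    full_name TEXT NOT NULL,
    date_of_birth TEXT,
    pan_number TEXT UNIQUE,
    aadhaar_number TEXT
)
"""
COLUMNS = ("full_name", "date_of_birth", "pan_number", "aadhaar_number")


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_customer(conn, record):
    """Insert one extracted record and return its row id."""
    # Named placeholders keep OCR output out of the SQL text itself.
    with conn:
        cur = conn.execute(
            "INSERT INTO customers (full_name, date_of_birth, pan_number, "
            "aadhaar_number) VALUES (:full_name, :date_of_birth, "
            ":pan_number, :aadhaar_number)",
            record,
        )
    return cur.lastrowid


def load_customer(conn, customer_id):
    """Return the stored record as a dict, or None if the id is unknown."""
    row = conn.execute(
        "SELECT full_name, date_of_birth, pan_number, aadhaar_number "
        "FROM customers WHERE id = ?",
        (customer_id,),
    ).fetchone()
    return dict(zip(COLUMNS, row)) if row else None


if __name__ == "__main__":
    conn = open_store()
    new_id = save_customer(conn, {
        "full_name": "SAMPLE NAME",          # dummy values, not real data
        "date_of_birth": "01/01/1990",
        "pan_number": "ABCDE1234F",
        "aadhaar_number": "123456789012",
    })
    print(load_customer(conn, new_id))
```

An in-memory database is used by default so the sketch runs without creating files; passing a file path to open_store would persist the data.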
2.2. Economic Feasibility
The economic feasibility of a project is determined by the
costs incurred in the project's development. The site was
built with HTML and CSS, which are free to use. The
project does not require any additional hardware. As it is
purely a software project, we did not have to spend any
money on it.
2.3. Operational Feasibility
Operational feasibility evaluates how well a proposed system
solves problems and meets the requirements identified during
the requirement analysis phase. The primary goal of this
project is to make the KYC process as simple as possible for
employees while reducing the risks associated with the manual
process. This project implements KYC in a single click with
form automation, which is faster and less error-prone than
traditional KYC methods.
3. PROPOSED SYSTEM
The flowchart in Fig-2.1 illustrates how the system
works. First, the user registers in the system, after which the
home page is displayed. Next, the user crops and uploads
images of an Aadhaar card and a PAN card, and the data is
extracted using OCR. Following data extraction, each
uploaded document is authenticated by a verification
process, the KYC form is automatically filled out, and the
process ends.
Fig-2.1. Flowchart
4. METHODOLOGY
Data extraction, authentication of ID proofs, and auto-filling
KYC forms are the three primary phases in the system. Data
from the customer's documents is extracted and mapped,
then saved in a database for future auto-form filling. As soon
as the customer presents their documents, the system identifies
the relevant elements in the image. To auto-fill the form, the
extracted data from the photos is checked first, then mapped
to the form fields. After the form has been correctly filled, the
customer is notified by e-mail of the successful
completion of KYC.
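The mapping of extracted data to form fields can be sketched as follows. This is only an illustration under stated assumptions: the record keys and the form input names in FORM_FIELD_MAP are hypothetical, and the browser step assumes the Selenium package (which the methodology names for form filling) together with Chrome and a matching driver. It is not the project's actual code.

```python
# Hypothetical mapping from extracted-record keys to the `name` attributes
# of the inputs on the external KYC form; a real form will differ.
FORM_FIELD_MAP = {
    "full_name": "applicant_name",
    "date_of_birth": "dob",
    "pan_number": "pan",
    "aadhaar_number": "aadhaar",
}


def map_to_form_fields(record, field_map=FORM_FIELD_MAP):
    """Return {form input name: value}, skipping missing or empty values."""
    return {
        form_name: str(record[key])
        for key, form_name in field_map.items()
        if record.get(key) not in (None, "")
    }


def fill_form(url, record):
    """Open the form in a browser and type each mapped value into its input.

    Returns the driver so the user can review the form; the caller is
    responsible for calling driver.quit() afterwards.
    """
    # Imported lazily so the mapping logic works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumes Chrome and its driver are available
    driver.get(url)
    for form_name, value in map_to_form_fields(record).items():
        field = driver.find_element(By.NAME, form_name)
        field.clear()
        field.send_keys(value)
    return driver


if __name__ == "__main__":
    sample = {"full_name": "SAMPLE NAME", "pan_number": "ABCDE1234F"}
    print(map_to_form_fields(sample))
```

Keeping the mapping in a plain dictionary separates "which value goes where" from the browser automation, so the mapping can be checked without opening a browser.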
Data is extracted from images using the Tesseract Python
library. The algorithm maps the fields on each image from which
data is to be fetched. Once fetched, the extracted data
is displayed to the user so that they can confirm its accuracy.
Furthermore, the retrieved data is checked to confirm that the
provided documents are authentic. To obtain the API for
verifying Indian Aadhaar cards, an official application should be
submitted to UIDAI at https://www.uidai.gov.in/. The user
clicks the "Fill Form" button after the obtained data has been
verified. The KYC form is then auto-filled; a Selenium
WebDriver connects the external form to our system to
accomplish this.
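Before any request is sent to an external verification service, the extracted identifiers can be screened offline. The sketch below is an assumption about how such a pre-check might look and is not the verification the authors perform: it tests the PAN layout (five letters, four digits, one letter) and applies the Verhoeff checksum, which Aadhaar numbers use for their final digit. Passing these checks only shows that a number is well formed, not that the document is genuine. The function names are hypothetical.

```python
import re

# Verhoeff multiplication table (the dihedral group D5 written out).
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]

# Verhoeff permutation table: row 0 is the identity and each further row
# applies the base permutation once more (eight rows in total).
_P1 = [1, 5, 7, 6, 2, 8, 3, 0, 9, 4]
_P = [list(range(10))]
for _ in range(7):
    _P.append([_P[-1][_P1[j]] for j in range(10)])


def verhoeff_valid(number):
    """Return True if the digit string ends in a correct Verhoeff check digit."""
    if not re.fullmatch(r"[0-9]+", number):
        return False
    c = 0
    # Digits are processed right to left, starting with the check digit.
    for i, ch in enumerate(reversed(number)):
        c = _D[c][_P[i % 8][int(ch)]]
    return c == 0


def aadhaar_checksum_ok(number):
    """Check length (12 digits) and checksum; printed spaces are ignored."""
    cleaned = re.sub(r"\s", "", number)
    return bool(re.fullmatch(r"[0-9]{12}", cleaned)) and verhoeff_valid(cleaned)


def pan_format_ok(pan):
    """Check the PAN layout only; PAN has no public checksum to test here."""
    return bool(re.fullmatch(r"[A-Z]{5}[0-9]{4}[A-Z]", pan))


if __name__ == "__main__":
    print(pan_format_ok("ABCDE1234F"))
    print(aadhaar_checksum_ok("1234 5678 9012"))
```

A failed checksum is a cheap signal that OCR misread a digit, so the user can be asked to correct the field before the authoritative check runs.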
Tesseract is an open-source text recognition (OCR) engine
licensed under the Apache 2.0 license. It can be
used directly or through an API (for programmers) to extract
printed text from images. It supports a wide range of
languages. Tesseract doesn't come with a graphical user
interface, but several are available from the project's
3rdParty page. Tesseract can be used with a variety of
programming languages and frameworks thanks to the available
wrappers. It can be used with its built-in layout analysis to
recognize text inside a large document, or with an external
text detector to recognize text from a single text-line image.
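The two usage modes described above correspond to Tesseract's page segmentation modes. The following minimal sketch assumes the pytesseract wrapper and the Pillow imaging library, neither of which is named in the paper, plus an installed Tesseract binary; the helper names are hypothetical.

```python
def tesseract_config(single_line=False):
    """Build the Tesseract config string for the chosen segmentation mode."""
    # --psm 3: fully automatic page segmentation (Tesseract's default).
    # --psm 7: treat the image as a single text line.
    return "--psm 7" if single_line else "--psm 3"


def ocr_image(path, lang="eng", single_line=False):
    """Return the text Tesseract recognizes in the image at `path`."""
    # Imported lazily: both packages and the Tesseract binary must be
    # installed for this function to run.
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(
            image, lang=lang, config=tesseract_config(single_line)
        )


if __name__ == "__main__":
    print(tesseract_config())
    print(tesseract_config(single_line=True))
```

For a whole card image the default mode would be the natural choice, while the single-line mode suits a crop that contains only one field, such as the PAN number.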
5. SCOPE
KYC is performed using a few required government
documents. This system is not intended for all documents. The
KYC is performed by extracting relevant details from the
predefined documents. Furthermore, the form is predefined,
and the extracted details are eventually mapped to the form.
The system can be improved further to be more secure and to
perform KYC with more documents. Basic data is gathered,
and software generally compares it to lists of individuals
known for corruption, sanctioned, suspected of committing a
crime, or at high risk of bribery or money laundering.
6. RESULTS
Figure 6.1
Figure 6.2
7. CONCLUSIONS
The keyboard is the most common way to enter data into a
computer. However, this isn't always the best or most
efficient solution. The goal of this system is to develop a new
KYC system that allows users to complete KYC in a few clicks
and includes verification and auto-form filling with user data
that is precisely mapped to the form. Although many
processes are still carried out on paper, it is clear that
automatic data recognition technologies are gaining
popularity. During subsequent processing steps, the
document is repeatedly copied and changed, resulting in a
large number of copies. In some cases these copies help humans,
but in others, they are useless. The aim of this system is to
successfully enter data into the form by accurately mapping
every user detail from the database to the form field using
OCR technology without compromising data quality.
8. ACKNOWLEDGEMENT
We would like to express our heartfelt gratitude to our
guide, Professor Sejal D'mello, for her motivation and
guidance throughout the process. We would also like to
thank the faculty of the Information Technology Department
for their valuable assistance with our project.
