International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 02 | Feb 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 516
Image to Text Conversion Using Tesseract
Nisha Pawar1, Zainab Shaikh2, Poonam Shinde3, Prof. Y.P. Warke4
1,2,3,4Dept. of Computer Engineering, Marathwada Mitra Mandal’s Institute of Technology, Maharashtra, India
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - In the current world, there is a great increase in the
utilization of digital technology, and various methods are
available for people to capture images. Such images may
contain important textual data that the user may need to edit
or store digitally. This can be done using Optical Character
Recognition (OCR) with the help of the Tesseract OCR Engine.
OCR is a branch of artificial intelligence used in applications
to recognize text from scanned documents or images. The
recognized text can also be converted to audio format so that
visually impaired people can hear the information they wish
to know.
Key Words: Artificial Intelligence, Optical Character
Recognition, Tesseract, text-to-speech
1. INTRODUCTION
Textual information is available in many resources such as
documents, newspapers, faxes, printed material and written
notes. Many people simply scan documents to store the data
on their computers. When a document is scanned with a
scanner, it is stored in the form of images. But these images
are not editable, and it is very difficult to find what the user
requires, as they have to go through the whole image,
reading each line and word to determine whether it is
relevant to their need. Images also take up more space than
word-processor files on the computer. It is essential to store
this information in such a way that it becomes easier to
search and edit the data. There is a growing demand for
applications that can recognize characters from scanned
documents or captured images and make them editable and
easily accessible.
Artificial intelligence is an area of computer science where a
machine is trained to think and behave like intelligent
human beings. Optical Character Recognition (OCR) is a
branch of artificial intelligence. It is used to detect and
extract characters from scanned documents or images and
convert them to editable form. Earlier methods of OCR used
convolutional neural networks, but these are complicated
and usually best suited to single characters. These methods
also had a higher error rate. The Tesseract OCR Engine makes
use of Long Short-Term Memory (LSTM), a form of recurrent
neural network. It is open-source, is better suited to
handwritten text, and can recognize larger portions of text
rather than single characters. The Tesseract OCR Engine
thus significantly reduces the errors created in the process
of character recognition.
Tesseract assumes that the input image is a binary image,
and processing takes place step by step. The first step is to
recognize connected components. Outlines are nested into
blobs, and these blobs are organized into text lines. Text
lines are then broken according to the pitch. If there is a
fixed pitch between the characters, recognition of the text
takes place as a two-pass process.
An adaptive classifier is used here. Words recognized in the
first pass are given to the classifier so that it can learn from
the data and use that information in the second pass to
recognize the words that were missed in the previous pass.
Words that are joined are chopped, and words that are
broken are recognized with the help of an A* algorithm that
maintains a priority queue of the best candidate characters.
The user can then store this information on their computer
by saving it in word documents or text files that can be
edited at any time.
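The chopping step above can be sketched as a best-first search over a priority queue. The blob contents and per-piece costs below are invented for illustration; Tesseract's real classifier scores glyph shapes, not strings:

```python
import heapq

def chop_costs(blob, start, end):
    # Toy cost table (hypothetical): lower cost = better classifier match.
    piece = blob[start:end]
    known = {"r": 1.0, "n": 1.2, "m": 0.9, "rn": 5.0}
    return known.get(piece, 10.0)

def best_segmentation(blob):
    """Best-first (A*-style) search over chop points of a joined blob.

    The priority queue always expands the cheapest partial segmentation
    first, mirroring how the best candidate chops stay at the front.
    """
    # Heap entries: (cost_so_far, position, characters_so_far)
    heap = [(0.0, 0, [])]
    while heap:
        cost, pos, chars = heapq.heappop(heap)
        if pos == len(blob):
            return chars, cost
        for end in range(pos + 1, len(blob) + 1):
            piece_cost = chop_costs(blob, pos, end)
            heapq.heappush(heap, (cost + piece_cost, end, chars + [blob[pos:end]]))
    return [], float("inf")

# A joined "rn" blob: splitting it beats reading it as one glyph.
chars, cost = best_segmentation("rn")
```

Here the split ["r", "n"] (cost 2.2) wins over the single glyph "rn" (cost 5.0), which is exactly the decision the priority queue makes cheap to find.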
It is difficult for visually impaired people to read textual
information; blind people have to make use of Braille to read.
It would be easier for them to simply listen to an audio form
of the data. This application can be used to convert textual
data to audio format so that it is easier for such users to hear
the information. The Google Text-to-Speech API is used to
convert the recognized text into audio.
2. EXISTING SYSTEM
There are various methods for OCR. Some of them are:
2.1 Connected components based method
This is a well-known method for text detection in images.
The connected components are extracted with a labeling
algorithm, and the resulting components are then partitioned
into clusters. The approach relies on pixel differences
between the text and the background of the image, and it can
extract and recognize the characters as well.
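A minimal illustration of connected-component labeling on a binary image, using a pure-Python breadth-first search with 4-connectivity (real systems typically use optimized library routines):

```python
from collections import deque

def connected_components(grid):
    """Label 4-connected foreground (1) pixels in a binary image.

    Returns the number of components and a label map -- the first step
    such pipelines use to group pixels into character-like blobs.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and labels[r][c] == 0:
                current += 1                      # start a new component
                queue = deque([(r, c)])
                labels[r][c] = current
                while queue:                      # flood-fill its pixels
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return current, labels

# Two separate strokes -> two components
image = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
count, labels = connected_components(image)
```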
2.2 Sliding window based method
This method is also known as the text binarisation process. It
classifies individual pixels as text or background in textual
images, and it acts as a bridge between text localization and
recognition by the OCR engine.
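The pixel-level classification can be sketched as a sliding-window adaptive threshold: a pixel counts as text when it is noticeably darker than its neighbourhood. The window size and bias values below are illustrative, not taken from any particular system:

```python
def binarise(gray, window=1, bias=10):
    """Classify each pixel as text (1) or background (0) by comparing it
    with the mean intensity of its sliding-window neighbourhood."""
    rows, cols = len(gray), len(gray[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Gather the (clipped) window around pixel (r, c)
            vals = [gray[y][x]
                    for y in range(max(0, r - window), min(rows, r + window + 1))
                    for x in range(max(0, c - window), min(cols, c + window + 1))]
            mean = sum(vals) / len(vals)
            # Text pixels are markedly darker than their local mean
            out[r][c] = 1 if gray[r][c] < mean - bias else 0
    return out

# Dark stroke pixel (20) on a light background (200)
gray = [
    [200, 200, 200],
    [200,  20, 200],
    [200, 200, 200],
]
mask = binarise(gray)
```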
Fig -1: Sliding window based method
2.3 Hybrid method
The hybrid method is used for text classification. This
approach detects and recognizes text in CAPTCHA images,
and the strength of a CAPTCHA can be checked with it. The
method efficiently detects and recognizes the text with a low
false-positive rate.
Fig -2: Captcha
2.4 Edge based method
This is an image-processing technique that finds the
boundaries of objects within images by detecting
discontinuities in brightness. The approach is also used for
image segmentation and data extraction in areas such as
image processing, machine vision and computer vision.
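A minimal sketch of detecting such brightness discontinuities: mark a pixel as an edge when the intensity jump to its right or lower neighbour exceeds a threshold (the threshold value is illustrative):

```python
def edge_map(gray, threshold=50):
    """Mark a pixel as an edge (1) when the brightness discontinuity to
    its right or lower neighbour exceeds `threshold` -- a minimal
    gradient-based edge detector."""
    rows, cols = len(gray), len(gray[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Forward differences in x and y (0 at the image border)
            gx = abs(gray[r][c + 1] - gray[r][c]) if c + 1 < cols else 0
            gy = abs(gray[r + 1][c] - gray[r][c]) if r + 1 < rows else 0
            if max(gx, gy) > threshold:
                edges[r][c] = 1
    return edges

# A vertical step from dark to light yields a vertical edge line
gray = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
edges = edge_map(gray)
```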
Fig -3: Edge-based method
2.5 Color based method
The color-based approach relies on clustering. It consists of
two phases: a text detection phase and a text extraction phase.
In the detection phase, two features are considered –
homogeneous color and sharp edges – and color-based
clustering is used to decompose the color edge map of the
image into several edge maps, which makes text detection
more accurate. In the extraction phase, the difference between
the text and the background of the image is considered.
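The color-based decomposition can be approximated by quantizing pixel colors into bins, so that pixels of similar color land in the same layer. This is a simplified stand-in for the clustering described above; the bin size is illustrative:

```python
def colour_layers(pixels, step=64):
    """Decompose an RGB image into per-colour pixel maps by quantising
    each channel into bins of width `step` -- a minimal stand-in for
    the colour clustering that splits one edge map into several."""
    layers = {}
    for r, row in enumerate(pixels):
        for c, (red, green, blue) in enumerate(row):
            key = (red // step, green // step, blue // step)
            layers.setdefault(key, []).append((r, c))
    return layers

# Red text pixels and a near-white background fall into separate layers,
# so each layer can be searched for text independently.
pixels = [
    [(250, 250, 250), (220, 30, 30)],
    [(220, 30, 30), (250, 250, 250)],
]
layers = colour_layers(pixels)
```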
2.6 Texture based method
This is another approach for detecting text in images. It uses
a Support Vector Machine (SVM) to analyze the textural
properties of text, together with the continuously adaptive
mean shift algorithm (CAMSHIFT) for texture analysis.
Combining SVM and CAMSHIFT provides robust and
efficient text detection.
2.7 Corner based method
The corner-based method is used for text extraction. It has
three stages: a) computing the corner response in multi-scale
space and thresholding it to get candidate text regions;
b) verifying candidate regions by combining color and
size-range features; and c) locating the text line using a
bounding box. A corner is a two-dimensional feature point
with high curvature on a region boundary.
2.8 Stroke based method
This approach is used to detect and recognize text in video.
It computes text confidence using edge orientation variance
and an opposite-edge-pair feature. Components are extracted
and grouped into text lines based on the text confidence
maps. It can detect multilingual text in video with high
accuracy.
Fig -4: Stroke-based method
2.9 Semi automatic ground truth generation method
This approach is also used for detecting and recognizing text
in videos. It can detect English and Chinese text of different
orientations, and it records attributes such as line index,
word index, script type, area, content and type of text. It is
among the most efficient methods for detecting text in videos.
3. LITERATURE REVIEW
Text detection from images is useful in many real-world
applications. The amount of data stored as text is huge, and
there is a need to store this data in a manner that can be
searched easily whenever required. Eliminating the use of
paper is one of the steps towards an electronic world. Data
that can be converted to audio form also eases the lives of
visually impaired people.
In [2007], Ray Smith published an overview of the Tesseract
OCR Engine. It stated that Tesseract started as a PhD project
sponsored by HP in 1984. In 1987, a second person was
assigned to help improve it, and in 1988 the HP Labs effort
was joined with a Scanner Division project. In 1990, the
scanner product was cancelled, and four years later the HP
Labs project was cancelled too. From 1995 until 2005,
Tesseract was in its dark ages, but in 2005 it was open-sourced
by HP, and in 2006 Google took it over. In 2008, Tesseract
expanded to support six languages. By 2016, it had been
developed further to make use of LSTM for OCR.
In [2015], Pratik Madhukar Manwatkar and Dr. Kavita R.
Singh published a technical review on text recognition from
images. It emphasizes the growing demand for OCR
applications, as it is necessary in today's world to store
information digitally so that it can be edited whenever
required and, being in digital format, searched easily later.
The system takes an image as input, processes it, and
produces textual data as output.
In [2016], Akhilesh A. Panchal, Shrugal Varde and M.S. Panse
proposed a character detection and recognition system for
visually impaired people. They focused on the needs of
people who are visually impaired, as it is difficult for them to
read text. The system can extract text from shop boards or
direction boards and convey this information to the user in
audio form. The main challenge is the variety of fonts found
in natural scene images.
In [2017], Nada Farhani, Naim Terbeh and Mounir Zrigui
published a paper on conversion between different
modalities. Human beings use modalities such as gesture,
sound, touch and images, and it is vital to be able to convert
information between them. The paper focuses on image-to-text
conversion and on text-to-speech, so that the user can hear
the information whenever required.
In [2017], Azmi Can Özgen, Mandana Fasounaki and Hazım
Kemal Ekenel published a paper on extracting text from both
natural scene images and computer-generated images. They
make use of Maximally Stable Extremal Regions for text
detection and recognition. This method eliminates the
non-text parts of the image so that OCR can be performed
more efficiently.
In [2018], Sandeep Musale and Vikram Ghiye proposed a
smart reader system for visually impaired people. Using this
system, they can convert text information to audio format.
The system has an audio interface that people with visual
problems can use easily, and it uses a combination of Otsu
thresholding and the Canny algorithm for character
recognition.
In [2018], Christian Reul, Uwe Springmann, Christoph Wick
and Frank Puppe proposed a method to reduce the errors
generated during the OCR process. It uses cross-fold training
and voting to recognize words more accurately. With the
introduction of LSTM, it is now easier to recognize words
from old printed books, handwritten words, and blurry or
uneven text with high accuracy. A combination of ground
truths and confidence values is used in this method for
optimal character recognition.
4. CONCLUSION AND FUTURE SCOPE
In this age of technology, there is a huge amount of data, and
it keeps increasing day by day. Even though much of the data
is digital, people still prefer to make use of written
transcripts. However, it is necessary to store this data in
digital format on computers so that it can be accessed and
edited easily by the user. This system can be used for
character recognition from scanned documents so that the
data can be digitized. The data can also be converted to
audio form to help visually impaired people obtain it easily.
In the future, the system can be expanded to recognize more
languages, different fonts and handwritten notes. Various
accents can also be added for the audio output.
REFERENCES
[1] Ray Smith, “An overview of the tesseract OCR engine,”
Ninth International Conference on Document Analysis
and Recognition (ICDAR), 2007.
[2] Pratik Madhukar Manwatkar, Dr. Kavita R. Singh, “A
technical review on text recognition from images,” IEEE
Sponsored 9th International Conference on Intelligent
Systems and Control (ISCO), 2015.
[3] Akhilesh A. Panchal, Shrugal Varde, M.S. Panse,
“Character detection and recognition system for visually
impaired people,” IEEE International Conference on
Recent Trends in Electronics, Information and
Communication Technology, May 20-21, 2016.
[4] Nada Farhani, Naim Terbeh, Mounir Zrigui, “Image to
text conversion: state of the art and extended work,”
IEEE/ACS 14th International Conference on Computer
Systems and Applications, 2017.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 02 | Feb 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 519
[5] Azmi Can Özgen, Mandana Fasounaki, Hazım Kemal
Ekenel, “Text detection in natural and computer-
generated images,” 2017.
[6] Sandeep Musale, Vikram Ghiye, “Smart reader for
visually impaired,” Proceedings of the Second
International Conference on Inventive Systems and
Control (ICISC 2018), IEEE, 2018.
[7] Christian Reul, Uwe Springmann, Christoph Wick, Frank
Puppe, “Improving OCR accuracy on early printed books
by utilizing cross fold training and voting,” 13th IAPR
International Workshop on Document Analysis Systems,
2018.
[8] U. Springmann and A. Lüdeling, “OCR of historical
printings with an application to building diachronic
corpora: A case study using the RIDGES herbal corpus,”
Digital Humanities Quarterly, vol. 11, no. 2, 2017.
[9] J. C. Handley, “Improving OCR accuracy through
combination: A survey,” in Systems, Man, and
Cybernetics, IEEE, 1998.
[10] F. Boschetti, M. Romanello, A. Babeu, D. Bamman, and G.
Crane, “Improving OCR accuracy for classical critical
editions,” Research and Advanced Technology for Digital
Libraries, pp. 156–167, 2009.
[11] Oriol Vinyals, Alexander Toshev, Samy Bengio, and
Dumitru Erhan, “Show and tell: A neural image caption
generator,” CVPR, 2015.
[12] Rafael C. Gonzalez and Richard E. Woods, “Digital Image
Processing,” Pearson Education, Second Edition, 2005.
More Related Content

PPTX
Optical Character Recognition( OCR )
DOCX
Project report of OCR Recognition
PDF
Handwritten Text Recognition and Digital Text Conversion
 
PDF
IRJET - A Review on Text Recognition for Visually Blind People
PDF
CHARACTER RECOGNITION USING NEURAL NETWORK WITHOUT FEATURE EXTRACTION FOR KAN...
PDF
Design and implementation of optical character recognition using template mat...
PDF
Optical Character Recognition Using Python
PPTX
OCR Presentation (Optical Character Recognition)
Optical Character Recognition( OCR )
Project report of OCR Recognition
Handwritten Text Recognition and Digital Text Conversion
 
IRJET - A Review on Text Recognition for Visually Blind People
CHARACTER RECOGNITION USING NEURAL NETWORK WITHOUT FEATURE EXTRACTION FOR KAN...
Design and implementation of optical character recognition using template mat...
Optical Character Recognition Using Python
OCR Presentation (Optical Character Recognition)

What's hot (20)

DOCX
Hand Written Character Recognition Using Neural Networks
PPTX
OCR speech using Labview
PPTX
Automatic handwriting recognition
PDF
OCR Text Extraction
PDF
IRJET- Wearable AI Device for Blind
PPTX
Optical Character Recognition (OCR) based Retrieval
DOCX
Optical character recognition IEEE Paper Study
PDF
Optical Character Recognition (OCR) System
PDF
A STUDY ON OPTICAL CHARACTER RECOGNITION TECHNIQUES
PPTX
Handwriting Recognition Using Deep Learning and Computer Version
PDF
Handwritten character recognition in
 
PPTX
Final Report on Optical Character Recognition
PPTX
Optical Character Recognition
PPTX
Handwriting Recognition
PDF
A SURVEY ON DEEP LEARNING METHOD USED FOR CHARACTER RECOGNITION
PPTX
Handwritten Character Recognition
PDF
Rule based algorithm for handwritten characters recognition
PPTX
OCR (Optical Character Recognition)
PDF
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
PDF
A Comprehensive Study On Handwritten Character Recognition System
Hand Written Character Recognition Using Neural Networks
OCR speech using Labview
Automatic handwriting recognition
OCR Text Extraction
IRJET- Wearable AI Device for Blind
Optical Character Recognition (OCR) based Retrieval
Optical character recognition IEEE Paper Study
Optical Character Recognition (OCR) System
A STUDY ON OPTICAL CHARACTER RECOGNITION TECHNIQUES
Handwriting Recognition Using Deep Learning and Computer Version
Handwritten character recognition in
 
Final Report on Optical Character Recognition
Optical Character Recognition
Handwriting Recognition
A SURVEY ON DEEP LEARNING METHOD USED FOR CHARACTER RECOGNITION
Handwritten Character Recognition
Rule based algorithm for handwritten characters recognition
OCR (Optical Character Recognition)
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
A Comprehensive Study On Handwritten Character Recognition System
Ad

Similar to IRJET- Image to Text Conversion using Tesseract (20)

PDF
Audio computing Image to Text Synthesizer - A Cutting-Edge Content Generator ...
PDF
Text Recognition Using Tesseract OCR Facilitating Multilingualism: A Review
PDF
IRJET- Text Extraction from Text Based Image using Android
PPTX
Team-98 research paper presentation.pptx
PDF
OPTICAL CHARACTER RECOGNITION IN HEALTHCARE
PDF
IRJET - Optical Character Recognition and Translation
PDF
IRJET-MText Extraction from Images using Convolutional Neural Network
PDF
IRJET- A Novel Approach – Automatic paper evaluation system
PDF
IRJET- Photo Optical Character Recognition Model
PDF
IRJET- Optical Character Recognition using Image Processing
PDF
Text Detection and Recognition with Speech Output for Visually Challenged Per...
PPTX
PDF
Handwritten Text Recognition and Translation with Audio
PDF
Entering the Fourth Dimension of OCR with Tesseract
PDF
IRJET- Offline Transcription using AI
PDF
Z04405149151
PDF
Entering the Fourth Dimension of OCR with Tesseract - Talk from Voxxed Days B...
PDF
Optical Recognition of Handwritten Text
PDF
IRJET- Detection and Recognition of Hypertexts in Imagery using Text Reco...
PDF
IRJET- Intelligent Character Recognition of Handwritten Characters
Audio computing Image to Text Synthesizer - A Cutting-Edge Content Generator ...
Text Recognition Using Tesseract OCR Facilitating Multilingualism: A Review
IRJET- Text Extraction from Text Based Image using Android
Team-98 research paper presentation.pptx
OPTICAL CHARACTER RECOGNITION IN HEALTHCARE
IRJET - Optical Character Recognition and Translation
IRJET-MText Extraction from Images using Convolutional Neural Network
IRJET- A Novel Approach – Automatic paper evaluation system
IRJET- Photo Optical Character Recognition Model
IRJET- Optical Character Recognition using Image Processing
Text Detection and Recognition with Speech Output for Visually Challenged Per...
Handwritten Text Recognition and Translation with Audio
Entering the Fourth Dimension of OCR with Tesseract
IRJET- Offline Transcription using AI
Z04405149151
Entering the Fourth Dimension of OCR with Tesseract - Talk from Voxxed Days B...
Optical Recognition of Handwritten Text
IRJET- Detection and Recognition of Hypertexts in Imagery using Text Reco...
IRJET- Intelligent Character Recognition of Handwritten Characters
Ad

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
PDF
Kiona – A Smart Society Automation Project
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
PDF
Breast Cancer Detection using Computer Vision
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Kiona – A Smart Society Automation Project
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
BRAIN TUMOUR DETECTION AND CLASSIFICATION
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Breast Cancer Detection using Computer Vision
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...

Recently uploaded (20)

PPT
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
PPTX
Welding lecture in detail for understanding
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PPTX
UNIT 4 Total Quality Management .pptx
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PPTX
additive manufacturing of ss316l using mig welding
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PDF
Digital Logic Computer Design lecture notes
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
DOCX
573137875-Attendance-Management-System-original
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PPTX
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PPT
Project quality management in manufacturing
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PPTX
Lecture Notes Electrical Wiring System Components
CRASH COURSE IN ALTERNATIVE PLUMBING CLASS
Welding lecture in detail for understanding
CYBER-CRIMES AND SECURITY A guide to understanding
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
UNIT 4 Total Quality Management .pptx
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
additive manufacturing of ss316l using mig welding
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
Digital Logic Computer Design lecture notes
UNIT-1 - COAL BASED THERMAL POWER PLANTS
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
Model Code of Practice - Construction Work - 21102022 .pdf
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
573137875-Attendance-Management-System-original
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
MCN 401 KTU-2019-PPE KITS-MODULE 2.pptx
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
Project quality management in manufacturing
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
Lecture Notes Electrical Wiring System Components

IRJET- Image to Text Conversion using Tesseract

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 02 | Feb 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 516 Image to Text Conversion Using Tesseract Nisha Pawar1, Zainab Shaikh2, Poonam Shinde3, Prof. Y.P. Warke4 1,2,3,4Dept. of Computer Engineering, Marathwada Mitra Mandal’s Institute of Technology, Maharashtra, India ---------------------------------------------------------------------***---------------------------------------------------------------------- Abstract - In the current world, there isagreatincreaseinthe utilization of digital technology and various methods are available for the people to capture images. Such images may contain important textual data that the user may need to edit or store digitally. This can be done using Optical Character Recognition with the help of Tesseract OCR Engine. OCR is a branch of artificial intelligence that is used in applications to recognize text from scanned documents or images. The recognized text can also be converted to audio format to help visually impaired people hear the information that they wish to know. Key Words: Artificial Intelligence, Optical Character Recognition, Tesseract, text-to-speech 1. INTRODUCTION Textual information is available in many resources such as documents, newspapers, faxes, printed information, written notes, etc. Many people simply scan the document to store the data in the computers. When a documentisscannedwith a scanner, it is stored in the form of images. Buttheseimages are not editable and it is very difficult to find what the user requires as they will have to go through the whole image, reading each line and word to determine if it is relevant to their need. Images also take up morespacethanwordfilesin the computer. It is essential to be able to store this information in such a way so that it becomeseasiertosearch and edit the data. 
There is a growing demand for applications that can recognize characters from scanned documents or captured images and make them editable and easily accessible. Artificial intelligence is an area of computer science where a machine is trained to think and behave like intelligent human beings. Optical Character Recognition (OCR) is a branch of artificial intelligence. It is used to detect and extract characters from scanned documents or images and convert them to editable form. Earlier methods of OCR used convolutional neural networks but theyarecomplicatedand usually best suitable for single characters. These methods also had a higher error rate. Tesseract OCR Engine makes use of Long Short Term Memory (LSTM) which is a part of Recurrent Neural Networks. It is open-source and is more suitable for handwritten texts. It is also suitable at recognizing larger portion of text data instead of single characters. Tesseract OCR Engine significantly reduces errors created in the process of character recognition. Tesseract assumes that the input image is a binary image and processing takes place step-by-step. The first step is to recognize connected components. Outlines are nested into blobs. These blobs are organized into text lines. Text lines are broken according to the pitching. If there is a fixed pitch between the characters then recognition of text takes place which is a two-pass process. An adaptive classifier is used here. Words that are recognized in the first pass are given to the classifier so that it can learn from the data and use that information for the second pass to recognize the words that were left out in the previous pass. Words that are joined arechoppedand words that are broken are recognized with the help of A* algorithm that maintains a priority queue which contains the best suitable characters. Then, the usercanstorethisinformation in their computers by saving them in word documents or notepads that can be edited any time they want. 
It is difficult for visually impaired people to read textual information. Blind people havetomakeuseofBrailletoread. It would be easier for them to simply listen to theaudioform of the data. This application can be used to convert textual data to audio format so that it is easier for people to hear the information. Google Text-To-Speech API is used to convert the text information into audio form. 2. EXISTING SYSTEM There are various methods for OCR. Some of them are: 2.1 Connected components based method It is a well known method used for text detection from images. The connected components are extracted with help of algorithm. The resulting components are thenpartitioned into clusters. This approach detects pixel differences between the text and the backgroundofthetextimage.It can extract and recognize the characters too. 2.2 Sliding window based method This method is also known as text binarisation process. It classifies individual pixels as text or background in the textual images. The method acts as bridge between localization and recognization by OCR.
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 02 | Feb 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 517 Fig -1: Sliding window based method 2.3 Hybrid method Hybrid method is used for text classification. This approach detects and recognizes texts in CAPTCHA images. The strength of CAPTCHAcanbe checked.Thismethod efficiently detects and recognizes the text with a low false positive. Fig -2: Captcha 2.4 Edge based method This method is also known as image processing technique, which finds boundaries of the images or any other objects within the images. It works by detecting discontinuities in brightness. This approach is also used for image segmentation and data extraction in areas such as image processing, machine vision and computer vision. Fig -3: Edge-based method 2.5 Color based method Color based approach is used for clustering. It consists of two phases: text detection phase and text extraction phase. In text detection phase, two features are considered – homogeneous color and sharp edges, and color based clustering is used to decompose color edge map of image to several edge maps, which makes text detection more accurate. In text extractionphase,thedifference betweenthe text and background in image is considered. 2.6 Texture based method It is another approach used for detecting texts in images. This approach uses Support Vector Machine (SVM) to analyze the textural properties of texts. This method also uses continuously adaptive mean shift algorithm (CAMSHIFT) that results in texture analysis. It combines both SVM and CAMSHIFT to provide robustand efficienttext detection. 2.7 Corner based method Corner based method is used for text extraction method. 
It has three stages: a) computing the corner response in multi-scale space and thresholding it to get candidate text regions; b) verifying candidate regions by combining color and size-range features; and c) locating the text line using a bounding box. A corner is a two-dimensional feature point that has high curvature on a region boundary.

2.8 Stroke based method

This approach is used to detect and recognize text in video. It computes text confidence using an edge orientation variance and an opposite-edge-pair feature. The components are extracted and grouped into text lines based on the text confidence maps. It can detect multilingual text in video with high accuracy.

Fig -4: Stroke-based method

2.9 Semi automatic ground truth generation method

This approach is also used for detecting and recognizing text in videos. It can detect English and Chinese text of different orientations, and records attributes such as line index, word index, script type, area, content and type of text. It is an efficient method for detecting text in videos.
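Several of the approaches above (notably 2.1 and, after stroke filtering, 2.8) begin by grouping foreground pixels into connected components that are later filtered and clustered into text lines. A minimal pure-Python sketch of 4-connected component labelling over a binary mask follows; it is illustrative only, and real systems would additionally filter components by size and shape before passing them to OCR.

```python
# Minimal connected-components labelling over a binary mask (1 = text
# pixel), using 4-connectivity and an explicit stack flood fill.

def connected_components(mask):
    """Return a list of components, each a list of (row, col) pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                stack, comp = [(y, x)], []
                seen[y][x] = True
                while stack:
                    cy, cx = stack.pop()
                    comp.append((cy, cx))
                    # Visit the four direct neighbours.
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                comps.append(comp)
    return comps

if __name__ == "__main__":
    # Two separate "characters" in a tiny binary mask.
    mask = [
        [1, 1, 0, 0, 1],
        [0, 1, 0, 0, 1],
        [0, 0, 0, 0, 0],
    ]
    print(len(connected_components(mask)))  # two components
```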
3. LITERATURE REVIEW

Text detection from images is useful in many real-world applications. The amount of data stored as text is huge, and there is a need to store it in a manner that can be searched easily whenever required. Eliminating the use of paper is one step towards an electronic world. Data that can be converted to audio form also eases the lives of visually impaired people.

In [2007] Ray Smith published an overview of the Tesseract OCR engine. It states that Tesseract started as a PhD project sponsored by HP in 1984. In 1987, a second person was assigned to help improve it, and in 1988 it became a joint project between HP Labs and HP's Scanner Division. In 1990 the scanner product was cancelled, and four years later the HP Labs project was cancelled too. From 1995 until 2005, Tesseract was in its dark ages, but in 2005 it was open-sourced by HP, and in 2006 Google took over its development. In 2008, Tesseract expanded to support six languages, and by 2016 it had been developed further to make use of LSTM networks for OCR.

In [2015] Pratik Madhukar Manwatkar and Dr. Kavita R. Singh published a technical review on text recognition from images. It emphasizes the growing demand for OCR applications, as it is necessary in today's world to store information digitally so that it can be edited whenever required and searched easily later. The system takes an image as input, processes it, and produces output in the form of textual data.

In [2016] Akhilesh A. Panchal, Shrugal Varde and M.S. Panse proposed a character detection and recognition system for visually impaired people.
They focused on the needs of people who are visually impaired, as it is difficult for them to read text data. The system can be used to extract text from shop boards or direction boards and convey this information to the user in audio form. The main challenge is the variety of fonts found in natural scene images.

In [2017] Nada Farhani, Naim Terbeh and Mounir Zrigui published a paper on the conversion of different modalities. Human beings use modalities such as gesture, sound, touch and images, and it is vital to be able to convert information between them. The paper focuses on image-to-text conversion and on text-to-speech, so that the user can hear the information whenever required.

In [2017] Azmi Can Özgen, Mandana Fasounaki and Hazım Kemal Ekenel published a paper on extracting text from both natural scene images and computer-generated images. They make use of Maximally Stable Extremal Regions (MSER) for text detection and recognition. This method eliminates the non-text parts of the image so that OCR can be performed more efficiently.

In [2018] Sandeep Musale and Vikram Ghiye proposed a smart reader for visually impaired people. Using this system, text information can be converted to audio format. The system has an audio interface that people with visual impairments can use easily, and it uses a combination of the Otsu and Canny algorithms for character recognition.

In [2018] Christian Reul, Uwe Springmann, Christoph Wick and Frank Puppe proposed a method to reduce the errors generated during the OCR process. It includes cross-fold training and voting to recognize words more accurately. With the introduction of LSTM networks, it is now easier to recognize words from old printed books, handwritten words, and blurry or uneven words with high accuracy. A combination of ground truths and confidence values is used in this method for optimal recognition of the characters.
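The Otsu thresholding step used by the smart-reader system reviewed above can be sketched in pure Python: Otsu's method picks the histogram cut that maximises the between-class variance of foreground and background intensities. This is an illustrative implementation, not the authors' code; the function name is our own.

```python
# Illustrative Otsu threshold: choose the intensity cut that maximises
# the between-class variance of the grayscale histogram.

def otsu_threshold(pixels, levels=256):
    """Return a threshold t; pixels <= t are one class, > t the other."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * hist[i] for i in range(levels))
    best_t, best_var = 0, -1.0
    w_bg = sum_bg = 0
    for t in range(levels):
        w_bg += hist[t]           # pixels at or below the candidate cut
        if w_bg == 0:
            continue
        w_fg = total - w_bg       # pixels above the candidate cut
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (total_sum - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

if __name__ == "__main__":
    # Bimodal sample: dark text pixels near 20, background near 200.
    sample = [20, 22, 25, 21, 198, 200, 202, 199, 201, 197]
    print(otsu_threshold(sample))  # a cut between the two modes
```

In the reviewed system, a threshold like this binarises the image before Canny edge detection and character recognition.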
4. CONCLUSION AND FUTURE SCOPE

In this age of technology, there is a huge amount of data, and it keeps increasing day by day. Even though much of the data is digital, people still prefer to make use of written transcripts. However, it is necessary to store this data in digital format on computers so that it can be accessed and edited easily by the user. This system can be used for character recognition from scanned documents so that data can be digitalized. The data can also be converted to audio form to help visually impaired people obtain it easily. In the future, the system can be expanded so that it recognizes more languages, different fonts and also handwritten notes. Various accents can also be added for the audio output.

REFERENCES

[1] Ray Smith, "An overview of the Tesseract OCR engine," Proc. ICDAR, 2007.

[2] Pratik Madhukar Manwatkar, Dr. Kavita R. Singh, "A technical review on text recognition from images," IEEE Sponsored 9th International Conference on Intelligent Systems and Control (ISCO), 2015.

[3] Akhilesh A. Panchal, Shrugal Varde, M.S. Panse, "Character detection and recognition system for visually impaired people," IEEE International Conference on Recent Trends in Electronics, Information and Communication Technology, May 20-21, 2016.

[4] Nada Farhani, Naim Terbeh, Mounir Zrigui, "Image to text conversion: state of the art and extended work," IEEE/ACS 14th International Conference on Computer Systems and Applications, 2017.
[5] Azmi Can Özgen, Mandana Fasounaki, Hazım Kemal Ekenel, "Text detection in natural and computer-generated images," 2017.

[6] Sandeep Musale, Vikram Ghiye, "Smart reader for visually impaired," Proceedings of the Second International Conference on Inventive Systems and Control (ICISC 2018), IEEE, 2018.

[7] Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe, "Improving OCR accuracy on early printed books by utilizing cross fold training and voting," 13th IAPR International Workshop on Document Analysis Systems, 2018.

[8] U. Springmann, A. Lüdeling, "OCR of historical printings with an application to building diachronic corpora: A case study using the RIDGES herbal corpus," Digital Humanities Quarterly, vol. 11, no. 2, 2017.

[9] J. C. Handley, "Improving OCR accuracy through combination: A survey," Systems, Man, and Cybernetics, IEEE, 1998.

[10] F. Boschetti, M. Romanello, A. Babeu, D. Bamman, G. Crane, "Improving OCR accuracy for classical critical editions," Research and Advanced Technology for Digital Libraries, pp. 156-167, 2009.

[11] Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, "Show and tell: A neural image caption generator," CVPR, 2015.

[12] Rafael C. Gonzalez, Richard E. Woods, "Digital Image Processing," Pearson Education, Second Edition, 2005.