OCR speech using Labview

OCR Based Speech Synthesis
Bharat Thakur
Electrical & Electronics Engineering
Panjab University, Chandigarh
Bharat.puchd@gmail.com

Introduction
• Speech is a more efficient and effective mode of communication than text. This project
work discusses an OCR Based Speech Synthesis System built with LabVIEW 2013.
• The OCR application is developed with IMAQ Vision for LabVIEW, a software development tool,
and it uses the digital camera of any Android phone as the image acquisition device.
• The whole project can be divided into 2 parts:
1. Optical character recognition
2. Text to speech conversion

Components of OCR System
• The identity of each symbol is found by comparing the extracted features with descriptions of
the symbol classes obtained through a previous learning phase.
• Finally, contextual information is used to reconstruct the words and numbers of the original
text.
Fig 20. Components of OCR System

System Analysis
An OCR based Speech Synthesis System is a computer-based system that should be able to read text
and give voice output once the text has been scanned and submitted to an Optical Character
Recognition (OCR) system.
Hardware Required:
1. Camera
2. Laptop
3. Speaker
Software Platform:
1. NI LabVIEW
2. NI Vision Assistant

Image Acquisition
• The image is captured with the digital camera of a Redmi Note 3 Android phone.
• The images are transmitted wirelessly to the processor by an Android app named “IP Webcam”,
which streams them over Internet Protocol. The processor connects using the IP address of the stream shown inside the app.
Fig 21. Block Diagram & Front Panel for Image Acquisition

Binarization
Binarization is the process of converting a grayscale image (pixel values 0 to 255) into a binary image
(pixel values 0 or 1) using a threshold value of 175. Pixels lighter than the threshold are turned white and the
remainder black.
Fig 22. Binarization
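The thresholding rule can be sketched outside LabVIEW in a few lines of Python. This is only an illustration of the rule stated on this slide (threshold 175, lighter pixels become white), not the Vision Assistant implementation; the function name binarize and the sample image are assumptions made for the example.

```python
# Illustrative sketch only: the project itself performs this step in NI Vision Assistant.
THRESHOLD = 175  # threshold value quoted on the slide


def binarize(gray, threshold=THRESHOLD):
    """Map a grayscale image (rows of 0-255 values) to a binary image.

    Pixels strictly lighter than the threshold become 1 (white);
    all remaining pixels become 0 (black).
    """
    return [[1 if px > threshold else 0 for px in row] for row in gray]


if __name__ == "__main__":
    # Hypothetical 2x3 grayscale patch used only for demonstration.
    sample = [
        [12, 200, 176],
        [175, 90, 255],
    ]
    for row in binarize(sample):
        print(row)
```

A pixel exactly equal to the threshold is treated as black here, following the wording "lighter than the threshold"; whether the LabVIEW step makes the same choice at the boundary is not stated in the slides.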

Template Matching
In template matching, the written words in the image are segmented into characters, and each character
is then compared against a character set file with the extension .abc.
This character set file is created in NI Vision Assistant itself.
After the character-by-character segmentation, each character image is stored in a structure.
Each stored character then has to be identified against the pre-defined character set.
Fig 23. Template Matching
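The comparison step can be illustrated with a small Python sketch. The slides do not describe how NI Vision scores a character against the .abc character set, so the pixel-agreement score below is an assumed stand-in, and the 3x3 templates and the names match_score and classify are hypothetical.

```python
# Illustrative sketch only: a simple pixel-agreement matcher, not the NI Vision OCR algorithm.

def match_score(glyph, template):
    """Return the fraction of pixels that agree between two equal-sized binary images."""
    total = 0
    same = 0
    for glyph_row, template_row in zip(glyph, template):
        for g, t in zip(glyph_row, template_row):
            total += 1
            if g == t:
                same += 1
    return same / total


def classify(glyph, templates):
    """Return (label, score) of the template that best agrees with the glyph."""
    best = max(templates, key=lambda label: match_score(glyph, templates[label]))
    return best, match_score(glyph, templates[best])


# Hypothetical 3x3 character set standing in for the trained character set file.
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}

if __name__ == "__main__":
    noisy_t = [[1, 1, 1], [0, 1, 0], [0, 0, 0]]  # a "T" with one pixel missing
    print(classify(noisy_t, TEMPLATES))
```

Each segmented character is scored against every entry of the character set and the best-scoring label is taken as the recognized character. A real system would also normalize glyph size before comparing, which this sketch skips.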

Recognition
Fig 24. Recognition

Text to Speech Synthesis
• Speech synthesis is the artificial production of human speech.
• A computer system used for this purpose is called a speech computer or speech synthesizer.
• In the text to speech module, the text recognized by the OCR system is the input of the speech
synthesis system, which converts it into speech that can be heard through an earphone connected to
the laptop or through the built-in speakers.
• ActiveX is the general name for a set of Microsoft technologies that allow you to reuse code.
• Assemblies are implemented in 3 steps:
1. Constructor
2. Property Node
3. Invoke Node

Text to Speech Code
• The input given to the invoke node “Speak” in the last step is the text that gets converted to speech
and is available as output from the speakers of the laptop.
OCR Image → Text to Speech → Speakers
Fig 25. Text to Speech Code
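The three-step ActiveX pattern (create the object, set a property, invoke “Speak”) can be sketched in Python for comparison. The slides do not name the ActiveX class, so this sketch assumes the Microsoft Speech API automation object "SAPI.SpVoice" accessed through the pywin32 package on Windows. The function speak_text and the RecordingVoice stand-in are hypothetical names introduced for the example.

```python
# Illustrative sketch only: assumes Windows, pywin32, and the SAPI.SpVoice automation object.
# The LabVIEW block diagram in the slides may use a different ActiveX class.

class RecordingVoice:
    """Stand-in voice object that records text instead of speaking (for non-Windows runs)."""

    def __init__(self):
        self.Volume = 100
        self.spoken = []

    def Speak(self, text):
        self.spoken.append(text)


def speak_text(text, voice=None, volume=100):
    """Send text to a speech object using the constructor / property / invoke steps."""
    if voice is None:
        # Step 1 (constructor): create the ActiveX/COM object. Assumed class name.
        import win32com.client
        voice = win32com.client.Dispatch("SAPI.SpVoice")
    # Step 2 (property node): set a property on the object (SpVoice volume is 0-100).
    voice.Volume = volume
    # Step 3 (invoke node): call the "Speak" method with the recognized text.
    voice.Speak(text)
    return voice


if __name__ == "__main__":
    demo = speak_text("Text recognized by the OCR stage", voice=RecordingVoice(), volume=80)
    print(demo.Volume, demo.spoken)
```

Passing no voice argument on a Windows machine with pywin32 installed would route the text to the system speech engine; the stand-in object only makes the call sequence visible without audio hardware.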

Final Code
Fig 26. Final Code
Fig 27. Steps inside the Vision Assistant

Results and Discussions
• Experiments suggest that the system detects the text with an accuracy of 75-80%. However,
the performance of the system depends heavily on the font size, and this dependence is still
under investigation.
Fig 28. Front Panel for final code

Future Prospects
The OCR based Speech Synthesis System using LabVIEW is an efficient program that gives good results for specific fonts
and font sizes, but there is room for improvement.
Future prospects:
• Multi-lingual
• Educational purposes
• Translator
• Volume options
• Omni-font
• Font sizes

