AI Based Character Recognition and Speech Synthesis

Seminar on
“AI Based Character Recognition and Speech Synthesis”
Developed by:
Kalyani Hadke, Rani Kubetkar,
Shreya Surjuse, Ankita Jadhao,
Kruttika Sorte
Guided by:
Prof. H. N. Datir

Artificial Intelligence Based Character Recognition and Speech Synthesis

NEED
We face many problems in daily life. For example, when we capture an image, it is sometimes not of proper quality, and the words in it cannot be recognized.
Many people face the problem of illiteracy.
So we want such images to be converted to text for various purposes.
While studying, we do not read text as a regular practice, so we want the text to be converted into audio.
Apart from this, we want any text captured in an image to be converted into audio, as we generally prefer hearing (as we do with songs) to reading.

Introduction to Character Recognition (CR) and Speech Synthesis (SS)
• Optical Character Recognition (OCR) is an electronic or mechanical conversion process.
• OCR converts scanned images of typed, handwritten or printed text into machine-encoded text.
• Speech synthesis is the artificial production of human speech.
• Speech synthesizer: a computer system used for this purpose.
• A TTS (text-to-speech) engine converts:
• Normal language text into speech
• Symbolic linguistic representations (such as phonetic transcriptions) into speech

Overview
Image → OCR → Recognized text → Speech engine → Speech

DFD for Character Recognition System
Pre-processing → Segmentation → Image Digitization → Network Implementation → Training of Learning Network → Recognition Network Testing

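Pre-processing
• De-noising: removes specks and other noise from the scanned image.
• De-skew: rotates the image so that the lines of text are horizontal.
• Binarization: converts the grayscale or colour image into a black-and-white (binary) image.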
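The de-noising and binarization steps above can be sketched in Python. This is a minimal illustration, assuming a grayscale image stored as a list of rows of integers from 0 to 255; it uses a 3×3 median filter for de-noising and Otsu's global threshold for binarization, which are common choices but not necessarily the ones used in this system. De-skew is not shown.

```python
from statistics import median_low


def denoise(img):
    """3x3 median filter; pixels on the border use only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = median_low(window)  # isolated specks are outvoted
    return out


def otsu_threshold(img):
    """Grey level that maximises the between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    w_bg = sum_bg = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]              # pixels with level <= t
        if w_bg == 0:
            continue
        w_fg = total - w_bg          # pixels with level > t
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize(img):
    """Dark ink -> 1 (black), light paper -> 0 (white)."""
    t = otsu_threshold(img)
    return [[1 if p <= t else 0 for p in row] for row in img]
```

Running `denoise` before `binarize` keeps a single dark speck from being turned into a black pixel, while strokes at least three pixels wide survive the filter.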


SEGMENTATION
• Image segmentation:
• Decomposes a sequence of characters into individual symbols.
• Directly affects the recognition rate of the script.
• Locates and identifies the boundaries of the text elements in the image.
1. External segmentation
2. Internal segmentation

Image segmentation is the process of partitioning an image into multiple segments, so as to change the representation of an image into something that is more meaningful and easier to analyze.
1. External segmentation: determines the character lines in the text.

2. Internal segmentation: decomposes the image of a sequence of characters into images of individual symbols.
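External and internal segmentation can both be sketched with projection profiles, a common technique assumed here (the slides do not name a method): blank rows separate text lines, and blank columns separate characters within a line. The sketch assumes a binary image stored as a list of rows with 1 = ink, and characters that do not touch or overlap.

```python
def _runs(profile):
    """(start, end) pairs, end exclusive, of consecutive non-zero entries."""
    runs, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(page):
    """External segmentation: split a binary page into text lines
    using the horizontal projection profile (ink count per row)."""
    row_profile = [sum(row) for row in page]
    return [page[top:bottom] for top, bottom in _runs(row_profile)]


def segment_characters(line):
    """Internal segmentation: split one text line into symbol images
    using the vertical projection profile (ink count per column)."""
    col_profile = [sum(col) for col in zip(*line)]
    return [[row[left:right] for row in line]
            for left, right in _runs(col_profile)]
```

For example, a page whose rows 0 to 1 and row 3 contain ink yields two lines, and each line is then cut wherever a column contains no ink.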

Image Digitization - Matrix Matching
• Maps each symbol image into a corresponding two-dimensional binary matrix.
• Issue: deciding the size of the matrix.
• Requires a sampling strategy for mapping the symbol image onto the matrix.
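One possible sampling strategy is sketched below: the symbol image is divided into a fixed grid of blocks, and a cell is set to 1 if any ink falls in its block. The 7×5 default matrix size and the any-ink rule are assumptions chosen for illustration; choosing the size is exactly the design issue noted above.

```python
def digitize(symbol, rows=7, cols=5):
    """Map a binary symbol image (1 = ink) of any size onto a rows x cols matrix.

    Sampling strategy (an assumption): the image is divided into rows x cols
    blocks, and a cell is 1 if any ink pixel falls inside its block.
    """
    h, w = len(symbol), len(symbol[0])
    grid = []
    for r in range(rows):
        y0 = r * h // rows
        y1 = max((r + 1) * h // rows, y0 + 1)   # at least one source row per cell
        row = []
        for c in range(cols):
            x0 = c * w // cols
            x1 = max((c + 1) * w // cols, x0 + 1)
            ink = any(symbol[y][x] for y in range(y0, y1) for x in range(x0, x1))
            row.append(1 if ink else 0)
        grid.append(row)
    return grid
```

Because every cell covers at least one source pixel, the same function handles symbol images that are smaller than the target matrix (they are upsampled) as well as larger ones.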

Figure: input alphabet ‘a’, its segmented grid, and the binary matrix produced by digitization.

• To feed the matrix data to the network, it must be linearized to a single dimension.
Figure: the binary matrix of ‘a’ linearized into a single row of values (…0 1 1).
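Linearizing is a row-by-row flattening of the matrix. The sketch below also shows simple matrix matching, which compares the flattened vector with stored templates cell by cell; the 3×3 templates are hypothetical, and in the system described here the vector is fed to the neural network instead.

```python
def linearize(matrix):
    """Flatten a 2-D binary matrix row by row into a 1-D input vector."""
    return [value for row in matrix for value in row]


def matrix_match(vector, templates):
    """Return the label of the stored template that agrees with `vector`
    in the largest number of cells (simple matrix matching)."""
    def agreement(label):
        return sum(a == b for a, b in zip(vector, templates[label]))
    return max(templates, key=agreement)


# hypothetical 3x3 templates, already linearized
TEMPLATES = {
    "I": linearize([[0, 1, 0], [0, 1, 0], [0, 1, 0]]),
    "L": linearize([[1, 0, 0], [1, 0, 0], [1, 1, 1]]),
}
```

A slightly damaged "L" (one missing corner pixel) still agrees with the "L" template in 8 of 9 cells, so it is matched correctly.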

Figure: recognizing the word NAME.
• Image of scanned document
• Sub-images of individual letters from the document: N, A, M, E
• Binary representation of the sub-images, e.g. 0 is white and 1 is black: 001110100…, 111010011…, 11001100…, 000111101…
• A supervised neural network that has been trained to recognize images of characters
• Neural network outputs numeric values corresponding to the recognized characters: 14, 1, 13, 5
• File containing the text of the scanned document: NAME
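The numeric outputs in the figure, 14, 1, 13 and 5, are the positions of N, A, M and E in the alphabet. A minimal sketch of turning network outputs back into text, assuming (hypothetically) one output neuron per letter A to Z, with the most active neuron taken as the recognized character:

```python
def to_index(scores):
    """1-based index of the output neuron with the highest activation."""
    return max(range(len(scores)), key=scores.__getitem__) + 1


def decode(indices):
    """Map alphabet positions (1 = A, ..., 26 = Z) back to letters."""
    return "".join(chr(ord("A") + i - 1) for i in indices)
```

`decode([14, 1, 13, 5])` returns `"NAME"`, which is the text that would be written to the output file.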

Network

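• An artificial neural network (ANN) consists of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems, analogous to the biological neurons in the brain.
• Neurons communicate with each other through weighted links.
Figure: two neurons joined by a weighted link. Neuron k multiplies each input (X1 to Xn) by the weight of its link (Wk1, Wk2, …), adds the products (summation) and passes the sum through a sigmoid function to produce its output.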
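The neuron in the figure computes y_k = sigmoid(sum_j W_kj · X_j + b_k), where sigmoid(v) = 1 / (1 + e^(-v)). A minimal Python sketch, assuming a bias term b_k that the figure does not show:

```python
import math


def sigmoid(v):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def neuron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs (summation) followed by the sigmoid."""
    if len(inputs) != len(weights):
        raise ValueError("need one weight per input")
    v = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(v)
```

With all-zero inputs the weighted sum is 0 and the output is exactly 0.5; large positive sums push the output towards 1.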

• Feed-forward neural network
• A multilayer perceptron
• Teaching and adaptation of the ANN
• Implementation of the ANN

Figure: a feed-forward neural network. The input signal passes through the input layer, the first hidden layer, the second hidden layer and the output layer to produce the output signal.
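A forward pass through the layers in the figure can be sketched as follows. The layer sizes (35 inputs for a 7×5 grid, 20 units in each hidden layer, 26 outputs for the letters A to Z) and the random initial weights are assumptions for illustration only.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def make_layer(n_in, n_out, rng):
    """A fully connected layer: n_out neurons, each with n_in weights and a bias."""
    return [([rng.uniform(-1.0, 1.0) for _ in range(n_in)], rng.uniform(-1.0, 1.0))
            for _ in range(n_out)]


def forward(layers, signal):
    """Feed-forward pass: each layer's outputs become the next layer's inputs."""
    for layer in layers:
        signal = [sigmoid(sum(w * x for w, x in zip(weights, signal)) + bias)
                  for weights, bias in layer]
    return signal


# hypothetical sizes: 7x5 input grid, two hidden layers of 20, 26 output letters
SIZES = [35, 20, 20, 26]
rng = random.Random(0)
network = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(SIZES, SIZES[1:])]
```

Signals only flow forward from layer to layer, with no loops, which is what makes this a feed-forward multilayer perceptron.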

Network Implementation → Training of Learning Network → Recognition Network Testing

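Figure: the binary converted image is the input signal of the neural network, and its output signal is the obtained text of the scanned image. During training, the ERROR between the obtained and the expected output is calculated and propagated backwards through the network (back-propagation) to adjust the link weights.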
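The training loop in the figure can be illustrated with a small network trained by back-propagation. To keep the sketch short it assumes one hidden layer instead of two, squared error, sigmoid units, online gradient descent and three hypothetical 3×3 glyphs; a real character recognizer would use the digitized letter matrices and far more training samples.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class TinyMLP:
    """Input layer -> one hidden layer -> output layer, all sigmoid units,
    trained by back-propagating the squared error (online gradient descent)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b)
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def train_step(self, x, target, lr=0.5):
        h, y = self.forward(x)
        # ERROR at each output unit, scaled by the sigmoid derivative y * (1 - y)
        d_out = [(t - yk) * yk * (1 - yk) for t, yk in zip(target, y)]
        # propagate the error back to the hidden units (uses the old w2)
        d_hid = [hj * (1 - hj) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j, hj in enumerate(h)]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.w2[k][j] += lr * dk * hj
            self.b2[k] += lr * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] += lr * dj * xi
            self.b1[j] += lr * dj
        return 0.5 * sum((t - yk) ** 2 for t, yk in zip(target, y))

    def predict(self, x):
        _, y = self.forward(x)
        return max(range(len(y)), key=y.__getitem__)


# hypothetical 3x3 glyphs, linearized row by row (1 = black)
PATTERNS = {
    "I": [0, 1, 0, 0, 1, 0, 0, 1, 0],
    "-": [0, 0, 0, 1, 1, 1, 0, 0, 0],
    "L": [1, 0, 0, 1, 0, 0, 1, 1, 1],
}
LABELS = list(PATTERNS)

net = TinyMLP(n_in=9, n_hidden=6, n_out=len(LABELS), seed=1)
for epoch in range(2000):
    for k, label in enumerate(LABELS):
        target = [1.0 if j == k else 0.0 for j in range(len(LABELS))]
        net.train_step(PATTERNS[label], target)
```

After training, each glyph activates the output neuron of its own label most strongly; the same update rule extends to a second hidden layer by propagating the error back one more step.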


Speech Synthesis
Figure: Input image → Network → File containing the text of the scanned document → TTS engine (NLP → DSP) → Speech

• TTS: Text-to-Speech engine
• A computer-based system that can read any text aloud.
• A TTS engine consists of:
Front-end: NLP (natural language processing)
Back-end: DSP (digital signal processing)

Modules of Text-to-Speech
Figure 1. A simple but general functional diagram of a TTS system: TEXT (input) → Natural language processing (text preprocessing, text analysis, linguistic analysis) → phonemes and prosody → Digital signal processing (speech synthesizer) → SPEECH (output)

Figure: inside the TTS engine, the NLP front-end performs text analysis, automatic phonetization and prosody generation on the text before the DSP back-end produces speech.

NLP Module
• This step is called the high-level, front-end or text-to-phoneme stage.
• It consists of the following parts:
Text analysis
Automatic phonetization
Prosody generation

NLP Module: Text Analysis
• A pre-processing module
• A morphological analysis module
• A contextual analysis module
• A syntactic-prosodic parser
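The pre-processing module typically normalizes the input text, for example by expanding abbreviations and numbers into words that can be pronounced. A minimal sketch, assuming a hypothetical abbreviation table and English number reading up to 999; morphological, contextual and syntactic-prosodic analysis are not shown.

```python
import re

# hypothetical abbreviation table; a real front-end would use a much larger one
ABBREVIATIONS = {"Prof.": "Professor", "Dr.": "Doctor", "etc.": "et cetera"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer 0 <= n <= 999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[rest] if rest else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" and " + number_to_words(rest) if rest else "")


def preprocess(text):
    """Expand abbreviations and 1-3 digit numbers, then normalise whitespace."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    text = re.sub(r"\b\d{1,3}\b", lambda m: number_to_words(int(m.group())), text)
    return " ".join(text.split())
```

For example, `preprocess("Prof.  Datir has 21 students")` gives `"Professor Datir has twenty-one students"`; numbers with more than three digits are left unchanged by this sketch.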

NLP Module: Automatic Phonetization
• Rule-based
• Dictionary-based
• Hybrid approach
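The hybrid approach combines the other two: look a word up in a pronunciation dictionary and fall back to letter-to-sound rules when it is missing. A minimal sketch; the dictionary entries, the ARPAbet-style phoneme symbols and the tiny rule set are hypothetical and far from complete for English.

```python
# hypothetical pronunciation dictionary (ARPAbet-style phoneme symbols)
LEXICON = {
    "speech": ["S", "P", "IY", "CH"],
    "text": ["T", "EH", "K", "S", "T"],
}

# hypothetical, deliberately tiny letter-to-sound rules
DIGRAPHS = {"ch": ["CH"], "sh": ["SH"], "th": ["TH"], "ee": ["IY"], "ck": ["K"]}
LETTERS = {
    "a": ["AE"], "b": ["B"], "c": ["K"], "d": ["D"], "e": ["EH"], "f": ["F"],
    "g": ["G"], "h": ["HH"], "i": ["IH"], "j": ["JH"], "k": ["K"], "l": ["L"],
    "m": ["M"], "n": ["N"], "o": ["AA"], "p": ["P"], "q": ["K"], "r": ["R"],
    "s": ["S"], "t": ["T"], "u": ["AH"], "v": ["V"], "w": ["W"], "x": ["K", "S"],
    "y": ["Y"], "z": ["Z"],
}


def letter_to_sound(word):
    """Rule-based: try two-letter rules first, then single letters."""
    phones, i = [], 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            phones += DIGRAPHS[word[i:i + 2]]
            i += 2
        else:
            phones += LETTERS.get(word[i], [])  # unknown characters are skipped
            i += 1
    return phones


def phonetize(word):
    """Hybrid: dictionary lookup first, letter-to-sound rules as a fallback."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    return letter_to_sound(word)
```

The dictionary handles irregular words exactly, while the rules give a reasonable guess for words it does not contain, such as "chip" → CH IH P.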

Figure: the NLP front-end (text analysis, automatic phonetization, prosody generation) feeds the DSP back-end, which produces speech by formant synthesis or concatenative synthesis.

NLP Module: Prosody Generation
• Pitch
• Intonation
• Rhythm
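Prosody generation attaches timing and pitch targets to the phoneme string. A minimal sketch, assuming a few simple rules that are illustrative rather than taken from any particular system: vowels last longer than consonants, the last phoneme is lengthened, pitch falls linearly across the phrase (declination), and a question ends with a rise. All durations and frequencies are hypothetical.

```python
# hypothetical subset of vowel symbols (ARPAbet-style)
VOWELS = {"AA", "AE", "AH", "EH", "IH", "IY"}


def prosody(phones, question=False, f0_start=120.0, f0_end=90.0):
    """Return (phoneme, duration_ms, pitch_hz) for each phoneme.

    Rhythm: vowels are longer than consonants and the last phoneme is
    lengthened. Intonation: the pitch falls linearly across the phrase
    (declination) and, for a question, rises on the final phoneme.
    All numbers are illustrative assumptions.
    """
    n = len(phones)
    result = []
    for i, p in enumerate(phones):
        duration = 120 if p in VOWELS else 70
        if i == n - 1:
            duration = int(duration * 1.5)      # phrase-final lengthening
        position = i / (n - 1) if n > 1 else 0.0
        pitch = f0_start + (f0_end - f0_start) * position
        if question and i == n - 1:
            pitch = f0_start * 1.3              # final rise for a question
        result.append((p, duration, round(pitch, 1)))
    return result
```

For the phonemes of "speech", the pitch falls from 120 Hz on S to 90 Hz on CH, and the final CH is stretched from 70 ms to 105 ms.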

DSP Component
• Low-level, phoneme-to-speech stage.
• There are two main technologies used for generating synthetic speech waveforms:
• Concatenative synthesis
• Formant synthesis

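Formant Synthesis
• Formant synthesis is also called rule-based synthesis.
• It does not use any human speech samples at runtime.
• The waveform is created from an acoustic model of the human vocal tract, whose parameters (such as fundamental frequency, voicing and noise levels) are varied over time.
• Generates artificial, somewhat robotic-sounding speech.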
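A formant synthesizer can be sketched as a source-filter model: a pulse train at the fundamental frequency (standing in for the glottal source) is passed through a cascade of second-order digital resonators, one per formant. The resonator coefficients follow the form used in Klatt-style formant synthesizers; the formant frequencies are approximate textbook values for the vowel /a/, while the bandwidths, sample rate and duration are assumptions.

```python
import math
import struct
import wave


def resonator(signal, freq, bw, fs):
    """Second-order digital resonator centred on one formant (Klatt-style)."""
    T = 1.0 / fs
    C = -math.exp(-2 * math.pi * bw * T)
    B = 2 * math.exp(-math.pi * bw * T) * math.cos(2 * math.pi * freq * T)
    A = 1 - B - C                       # unity gain at 0 Hz
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = A * x + B * y1 + C * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def synthesize_vowel(formants, f0=120.0, duration=0.3, fs=16000):
    """Pulse train at f0 filtered by a cascade of formant resonators."""
    n = int(duration * fs)
    period = int(fs / f0)
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]   # normalise to [-1, 1]


def save_wav(path, samples, fs=16000):
    """Write mono 16-bit PCM so the result can be played back."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(fs)
        wf.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))


# approximate formants of /a/ as in "father"; bandwidths are assumed
AA = [(730, 60), (1090, 90), (2440, 120)]
samples = synthesize_vowel(AA)
```

Calling `save_wav("aa.wav", samples)` produces a short, clearly synthetic vowel, which illustrates why formant synthesis is described as somewhat robotic.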

Concatenative Synthesis
• Based on the concatenation (stringing together) of segments of recorded speech.
• Generally gives the most natural-sounding synthesized speech.
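Joining recorded segments directly can produce audible clicks at the joins. A minimal sketch of concatenation with a linear cross-fade over a few overlapping samples, assuming each unit is a list of float samples at the same sample rate; real systems use more elaborate smoothing, such as pitch-synchronous overlap-add.

```python
def concatenate(units, overlap=32):
    """Join recorded speech units, cross-fading `overlap` samples at each joint.

    The tail of the output fades out while the head of the next unit fades in,
    which smooths the transition instead of leaving an abrupt jump.
    """
    out = list(units[0])
    for unit in units[1:]:
        k = min(overlap, len(out), len(unit))
        start = len(out) - k
        for i in range(k):
            a = (i + 1) / (k + 1)               # fade-in weight of the new unit
            out[start + i] = (1 - a) * out[start + i] + a * unit[i]
        out.extend(unit[k:])
    return out
```

Joining a constant 1.0 segment to a constant 0.0 segment with a 2-sample overlap gives 1.0, 2/3, 1/3, 0.0 instead of a sudden step.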

Concatenative Synthesis: Subtypes
• Diphone concatenation synthesis: somewhat robotic speech, with sonic glitches
• Unit concatenation synthesis: natural speech

Approaches to Waveform Generation: Concatenative
• Unit Concatenation Synthesis: algorithm
• Break the language down into small units (phonemes, syllables, etc.).
• Create a large database of recorded speech.
• Label each unit with its pitch, duration, prosody, position in the syllable, etc. (labeling is synthesizer-dependent).
• At runtime, build the target utterance by determining the best chain of units (e.g. with HMMs or decision trees).
• Use DSP to smooth the transitions between units.
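The slides mention HMMs and decision trees for choosing units; the sketch below instead shows the dynamic-programming (Viterbi-style) search often used to pick the cheapest chain once a target cost (how well a unit fits its position) and a join cost (how smoothly two units connect) are defined. Both cost functions and the unit records are hypothetical.

```python
def select_units(targets, candidates, target_cost, join_cost):
    """Pick one candidate unit per target position so that the total
    target cost + join cost is minimal (dynamic programming)."""
    # best[i][j] = (cost of the best chain ending in candidates[i][j], back-pointer)
    best = [[(target_cost(targets[0], u), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            prev_cost, prev_j = min(
                (best[i - 1][j][0] + join_cost(p, u), j)
                for j, p in enumerate(candidates[i - 1]))
            row.append((prev_cost + target_cost(targets[i], u), prev_j))
        best.append(row)
    # trace back from the cheapest final unit
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    chain = []
    for i in range(len(targets) - 1, -1, -1):
        chain.append(candidates[i][j])
        j = best[i][j][1]
    return chain[::-1]


# hypothetical costs: pitch mismatch with the target, pitch jump at the join
def pitch_target_cost(target, unit):
    return abs(target["pitch"] - unit["pitch"])


def pitch_join_cost(prev, unit):
    return abs(prev["pitch"] - unit["pitch"])
```

Because each position only keeps the best chain ending in each candidate, the search cost grows linearly with the utterance length instead of exponentially.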


Advantages
• Machine language translation
• Information retrieval
• Visual impairment (difficulty seeing text)
• Motor impairment (difficulty handling a book or paper)
