Seminar on
“AI Based Character Recognition and Speech Synthesis”
Developed By: Kalyani Hadke, Rani Kubetkar, Shreya Surjuse, Ankita Jadhao, Kruttika Sorte
Guided By: Prof. H. N. Datir
NEED
We face many problems in daily life. For example, a captured image is sometimes not clear enough for the words in it to be recognized.
Many people are also affected by illiteracy.
So we wish that the text in such an image could be converted to machine-readable text for various purposes.
While studying, we do not read text as a regular practice, and we generally prefer listening, as we do with songs. So we wish that this text could be converted into audio.
Beyond that, we wish that text captured in an image could be converted into audio.
Introduction to CR and SS
• Optical Character Recognition (OCR) is an electronic or mechanical converter.
• OCR converts scanned images of text into machine-encoded text.
• Speech Synthesis is the artificial production of human speech.
• Speech synthesizer: a computer system used for this purpose.
• A TTS (text-to-speech) engine converts:
  • Language (text) into speech
  • A symbolic linguistic representation into speech
Overview
Image → OCR → Recognized text → Speech engine → Speech
DFD For Character Recognition System
Stages shown in the diagram:
• Pre-Processing
• Segmentation
• Image Digitization
• Network Implementation
• Training of Learning Network
• Recognition
• Network testing
Pre-processing
• De-noising
• De-skew
• Binarization
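The pre-processing steps listed above can be illustrated with a small sketch. It is only an assumption of how binarization and de-noising might look, not the presenters' implementation: the function names, the fixed threshold of 128, and the "isolated pixel" noise rule are hypothetical, and de-skew is left out because it needs an angle estimate that the slides do not describe. The 0 = white, 1 = black convention follows the slides.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 0 = white, 1 = black.

    The fixed threshold is an assumption; real systems often pick it per image.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def remove_isolated_pixels(binary):
    """De-noise by clearing black pixels that have no black 8-neighbour."""
    height, width = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1:
                continue
            neighbours = sum(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                cleaned[y][x] = 0
    return cleaned


gray_image = [
    [250, 250, 250, 250, 250],
    [250, 10, 20, 250, 250],
    [250, 250, 250, 250, 250],
    [250, 250, 250, 250, 30],  # a lone dark pixel, treated as noise
]
clean_image = remove_isolated_pixels(binarize(gray_image))
```

The two dark pixels in the second row survive because they touch each other, while the lone pixel in the last row is cleared.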
SEGMENTATION
• Image segmentation decomposes a sequence of characters into individual symbols.
• It directly affects the recognition rate of the script.
• It locates and identifies boundaries in the image.
• Two types:
1. External segmentation
2. Internal segmentation
Image segmentation is the process of partitioning an image into multiple segments, so as to change the representation of an image into something that is more meaningful and easier to analyze.
External Segmentation: determine the character lines in the text. (Figure: a page of text with its lines numbered 1 to 4.)
Internal Segmentation: decompose an image of a sequence of characters into images of individual symbols. (Figure: the word "Image" split into I, m, a, g, e.)
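One common way to realize external and internal segmentation is with projection profiles: rows that contain no black pixels separate text lines, and empty columns separate symbols within a line. The sketch below assumes this technique and a binary image with 1 = black; the slides do not say which method the presenters used, and the function names are hypothetical.

```python
def runs(profile):
    """Return (start, end) spans of consecutive non-zero entries; end is exclusive."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def external_segmentation(page):
    """Split a binary page into text lines using the row-wise count of black pixels."""
    row_profile = [sum(row) for row in page]
    return [page[top:bottom] for top, bottom in runs(row_profile)]


def internal_segmentation(line):
    """Split one text line into symbols using the column-wise count of black pixels."""
    column_profile = [sum(column) for column in zip(*line)]
    return [[row[left:right] for row in line] for left, right in runs(column_profile)]


page = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
]
lines = external_segmentation(page)
symbols = [internal_segmentation(line) for line in lines]
```

This simple approach fails when characters touch or lines overlap, which is one reason segmentation quality affects the recognition rate so directly.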
Image Digitization - Matrix matching
• Mapping of the symbol image into a corresponding two-dimensional binary matrix
• Issue: deciding the size of the matrix
• Sampling strategy for mapping the symbol image
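As an illustration of the sampling strategy mentioned above, the sketch below maps a symbol image of any size onto a fixed-size binary matrix. It assumes one simple rule (a cell is 1 if any pixel in the corresponding image block is black); the slides do not specify the rule or the matrix size, so `digitize`, `rows`, and `cols` are hypothetical.

```python
def digitize(symbol, rows=5, cols=5):
    """Sample a binary symbol image into a rows x cols binary matrix.

    Each matrix cell covers a rectangular block of the image and is set to 1
    when that block contains at least one black pixel (an assumed rule).
    """
    height, width = len(symbol), len(symbol[0])
    matrix = []
    for r in range(rows):
        top = r * height // rows
        bottom = max((r + 1) * height // rows, top + 1)
        row = []
        for c in range(cols):
            left = c * width // cols
            right = max((c + 1) * width // cols, left + 1)
            block = [symbol[y][x] for y in range(top, bottom) for x in range(left, right)]
            row.append(1 if any(block) else 0)
        matrix.append(row)
    return matrix


symbol_image = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
]
grid = digitize(symbol_image, rows=2, cols=2)
```

A larger matrix keeps more detail but increases the number of network inputs, which is the size trade-off the slide refers to.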
Digitization
(Figure: the input alphabet 'a' is placed on a segmented grid and digitized into a binary matrix of 0s and 1s.)
• To feed the matrix data to the network, it must be linearized to a single dimension.
(Figure: the binary matrix of 0s and 1s is unrolled into a single sequence, …………...0 1 1.)
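Linearizing the matrix can be sketched in a couple of lines. Row-major order (one row after another) is an assumption here; any fixed order works as long as training and recognition use the same one.

```python
def linearize(matrix):
    """Flatten a 2-D binary matrix into a 1-D list, row by row."""
    return [bit for row in matrix for bit in row]


vector = linearize([[0, 1], [1, 1]])
bit_string = "".join(str(bit) for bit in vector)
```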
Example: recognizing the word "NAME"
1. Image of scanned document (NAME).
2. Sub-images of individual letters from the document (N, A, M, E).
3. Binary representation of sub-images, e.g. 0 is white and 1 is black (001110100…., 111010011…., 11001100…., 000111101…..).
4. NEURAL NETWORK: a supervised neural network that has been trained to recognize images of characters.
5. Neural network output: numeric values corresponding to the recognized characters (14, 1, 13, 5).
6. File containing the text of the scanned document (NAME).
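The last step of this example, turning the numeric outputs back into text, can be sketched as follows. The mapping A = 1 … Z = 26 is inferred from the slide's values (14, 1, 13, 5 for NAME) and is an assumption about the encoding; `codes_to_text` is a hypothetical helper.

```python
def codes_to_text(codes):
    """Convert alphabet positions (A = 1 ... Z = 26) into a string."""
    return "".join(chr(ord("A") + code - 1) for code in codes)


recognized = codes_to_text([14, 1, 13, 5])
```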
• An artificial neural network consists of
  • a large number of highly interconnected processing elements (neurons)
  • working in unison to solve specific problems
  • analogous to the biological neurons in the brain.
• Neurons communicate through weighted links.
(Figure: two NEURONs joined by a weighted link; in the neuron model, inputs X1 … Xn are multiplied by weights Wk1 … Wkp, summed, and passed through a sigmoid function to produce the output.)
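The neuron model in the figure (weighted inputs, summation, sigmoid) can be written as a short sketch. The `bias` term is an addition that does not appear on the slide and is included only as a common convention; the names are hypothetical.

```python
import math


def sigmoid(value):
    """Logistic sigmoid, squashing any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs followed by the sigmoid activation."""
    summation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(summation)


resting = neuron_output([0, 0], [0.5, -0.5])
active = neuron_output([1, 1], [2.0, 2.0])
```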
• Feed-forward neural network
• A multilayer perceptron
• Teaching and adaptation of the ANN
• Implementation of the ANN
Neural Network
Input signal → Input layer → First hidden layer → Second hidden layer → Output layer → Output signal
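A forward pass through such a feed-forward network with two hidden layers might look like the sketch below. The layer sizes (25 inputs for a 5×5 matrix, 10 and 10 hidden neurons, 26 outputs for the letters) and the random initial weights are illustrative assumptions, not values from the slides.

```python
import math
import random


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def make_layer(n_inputs, n_neurons, rng):
    """One weight list per neuron; the extra last weight acts as the bias."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs + 1)] for _ in range(n_neurons)]


def forward_layer(layer, inputs):
    # zip stops at the shorter sequence, so the trailing bias weight is added separately.
    return [sigmoid(sum(w * x for w, x in zip(neuron, inputs)) + neuron[-1]) for neuron in layer]


def forward(network, inputs):
    """Propagate the input signal through every layer in turn."""
    signal = inputs
    for layer in network:
        signal = forward_layer(layer, signal)
    return signal


rng = random.Random(0)
sizes = [25, 10, 10, 26]  # input, first hidden, second hidden, output (assumed sizes)
network = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
output_signal = forward(network, [0] * 25)
```

With untrained weights the outputs are meaningless; training is what makes the largest output correspond to the correct character.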
Neural Network: training
• Input signal: binary converted image
• Output signal: obtained text of scanned image
• Back-propagation is used for error calculation (ERROR)
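To make the back-propagation step concrete, here is a minimal sketch that trains a network with one hidden layer on two toy 4-bit patterns using squared error and gradient descent. The patterns, layer sizes, learning rate, and epoch count are all illustrative assumptions; the slides do not give the training setup.

```python
import math
import random


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def forward(w_hidden, w_out, x):
    """Return hidden and output activations; the last weight of each neuron is its bias."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + ws[-1]) for ws in w_hidden]
    output = [sigmoid(sum(w * h for w, h in zip(ws, hidden)) + ws[-1]) for ws in w_out]
    return hidden, output


def train_step(w_hidden, w_out, x, target, rate=0.5):
    """One back-propagation update; returns the squared error before the update."""
    hidden, output = forward(w_hidden, w_out, x)
    # Output deltas: output error times the derivative of the sigmoid.
    d_out = [(o - t) * o * (1 - o) for o, t in zip(output, target)]
    # Hidden deltas: output deltas propagated back through the output weights.
    d_hid = [h * (1 - h) * sum(d * ws[j] for d, ws in zip(d_out, w_out))
             for j, h in enumerate(hidden)]
    for ws, d in zip(w_out, d_out):
        for j, h in enumerate(hidden):
            ws[j] -= rate * d * h
        ws[-1] -= rate * d
    for ws, d in zip(w_hidden, d_hid):
        for i, xi in enumerate(x):
            ws[i] -= rate * d * xi
        ws[-1] -= rate * d
    return sum((o - t) ** 2 for o, t in zip(output, target)) / 2


# Two toy patterns with one-hot targets (hypothetical data).
samples = [([1, 0, 0, 1], [1, 0]), ([0, 1, 1, 0], [0, 1])]
rng = random.Random(1)
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(3)]
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(2)]
errors = [sum(train_step(w_hidden, w_out, x, t) for x, t in samples) for _ in range(500)]
```

The list `errors` holds the total error per epoch, so comparing its first and last entries shows whether training reduced the error.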
Speech Synthesis
Input Image → character recognition (Pre-Processing, Segmentation, Image Digitization, Network Implementation, Training of Learning Network) → File containing text of scanned document → TTS Engine (NLP, DSP): TEXT → SPEECH
Speech Synthesis
• TTS: Text-to-Speech engine
• A computer-based system that reads any text aloud.
• A TTS engine consists of:
  Front-end: NLP
  Back-end: DSP
Modules of Text-to-Speech
TEXT (input) → Natural language processing (Text Preprocessing, Text Analysis, Linguistic Analysis) → Phonemes, Prosody → Digital signal processing (Speech Synthesizer) → SPEECH (output)
Figure 1. A simple but general functional diagram of a TTS system
NLP Module
• This step is called high-level synthesis, the front-end, or text-to-phoneme conversion.
• It consists of the following parts:
  Text analysis
  Automatic Phonetization
  Prosody generation
NLP Module: Text Analysis
Text analysis consists of:
• A pre-processing
• A morphological analysis
• A contextual analysis
• A syntactic-prosodic
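The pre-processing part of text analysis typically turns raw text into a clean list of words. The sketch below is a simplified assumption of what that could involve: the abbreviation table is hypothetical, and digits are read one at a time (so "42" becomes "four two"), which a real system would handle more carefully.

```python
import re

# Hypothetical abbreviation table; a real front-end would use a much larger one.
ABBREVIATIONS = {"Dr.": "Doctor", "Prof.": "Professor", "etc.": "et cetera"}
DIGITS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]


def preprocess(text):
    """Expand abbreviations and digits, then split the text into lowercase words."""
    for abbreviation, expansion in ABBREVIATIONS.items():
        text = text.replace(abbreviation, expansion)
    # Spell out each digit separately (a deliberate simplification).
    text = re.sub(r"\d", lambda match: " " + DIGITS[int(match.group())] + " ", text)
    return re.findall(r"[a-z']+", text.lower())


words = preprocess("Prof. Datir has 2 books.")
```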
NLP Module: Automatic Phonetization
Approaches:
• Rule-based
• Dictionary-based
• Hybrid approach
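The hybrid approach can be sketched as a dictionary lookup with a rule-based fallback for words the dictionary does not contain. Both tables below are tiny, hypothetical stand-ins: the lexicon has two entries and the letter-to-sound rules map each letter to a single ARPAbet-style symbol, which is far cruder than real rule sets.

```python
# Hypothetical pronunciation dictionary (dictionary-based component).
LEXICON = {
    "name": ["N", "EY", "M"],
    "text": ["T", "EH", "K", "S", "T"],
}

# Hypothetical one-letter rules (rule-based component), deliberately simplistic.
LETTER_RULES = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F", "g": "G",
    "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L", "m": "M", "n": "N",
    "o": "AA", "p": "P", "q": "K", "r": "R", "s": "S", "t": "T", "u": "AH",
    "v": "V", "w": "W", "x": "K S", "y": "Y", "z": "Z",
}


def phonetize(word):
    """Return phonemes from the dictionary if present, else apply letter rules."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word])
    phonemes = []
    for letter in word:
        phonemes.extend(LETTER_RULES.get(letter, "").split())
    return phonemes


from_dictionary = phonetize("Name")
from_rules = phonetize("box")
```

The dictionary gives the correct vowel for "name", which the letter rules alone would get wrong; the rules keep the system from failing on unseen words.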
NLP Module: Prosody Generation
• Pitch
• Intonation
• Rhythm
DSP component
• Low-level synthesis: phoneme to speech
• Two main technologies are used for generating synthetic speech waveforms:
  • Concatenative synthesis
  • Formant synthesis
Formant Synthesis
• Formant synthesis is rule-based synthesis.
• It does not use any human speech samples at runtime.
• The waveform is created using an acoustic model of the human vocal tract.
• It generates artificial, somewhat robotic speech.
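A very reduced form of formant synthesis can be sketched as a source-filter model: an impulse train at the pitch frequency stands in for the vocal folds, and a cascade of two-pole resonators stands in for the vocal tract. This is an assumed, textbook-style simplification rather than the method of any particular synthesizer; the formant frequencies roughly match an "a"-like vowel and the bandwidths are illustrative.

```python
import math


def resonator(signal, freq, bandwidth, sample_rate):
    """Two-pole digital resonator: y[n] = x[n] + b1*y[n-1] + b2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth / sample_rate)
    b1 = 2 * r * math.cos(2 * math.pi * freq / sample_rate)
    b2 = -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + b1 * y1 + b2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def synthesize_vowel(formants, pitch=120, duration=0.3, sample_rate=8000):
    """Filter an impulse train through one resonator per (frequency, bandwidth) pair."""
    n_samples = int(duration * sample_rate)
    period = sample_rate // pitch
    wave = [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]
    for freq, bandwidth in formants:
        wave = resonator(wave, freq, bandwidth, sample_rate)
    peak = max(abs(s) for s in wave)
    return [s / peak for s in wave]  # normalize to the range [-1, 1]


# Approximate formants of an "a"-like vowel; bandwidths are illustrative guesses.
samples = synthesize_vowel([(730, 90), (1090, 110), (2440, 170)])
```

No recorded speech is involved: the whole waveform comes from the rules and parameters, which is why the result sounds artificial.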
Concatenative synthesis
• Based on the concatenation of segments of
recorded speech.
• Gives the most natural sounding synthesized
speech.
SUBTYPES of Concatenative Synthesis
• Diphone Concatenation Synthesis: somewhat robotic speech, sonic glitches
• Unit Concatenation Synthesis: natural speech
Approaches To Waveform Generation: Concatenative
• Unit Concatenation Synthesis
– Algorithm
• Break the language down into small units (phonemes, syllables, etc.)
• Create a large database of recorded speech
• Label each unit: pitch, duration, prosody, position in syllable, etc. Labeling is synthesizer-dependent
• Select the target utterance at runtime by determining the best chain of units (HMM, decision tree)
• Use DSP to smooth transitions between units
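Choosing "the best chain of units" can be illustrated with dynamic programming over candidate units, minimizing a target cost (how well a unit matches the requested pitch) plus a join cost (how well neighbouring units fit together). This swaps in a plain cost-based search instead of the HMM or decision-tree methods the slide names; the unit database, the pitch-only costs, and `select_units` are hypothetical, and every target phoneme is assumed to have at least one candidate.

```python
def select_units(targets, database, join_weight=1.0):
    """Pick one database unit per target (phoneme, pitch) with minimal total cost."""
    candidates = [[u for u in database if u["phoneme"] == ph] for ph, _ in targets]
    # best[i][k] = (cost of cheapest chain ending in candidates[i][k], back-pointer)
    best = [[(abs(u["pitch"] - targets[0][1]), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for unit in candidates[i]:
            target_cost = abs(unit["pitch"] - targets[i][1])
            cost, back = min(
                (best[i - 1][k][0] + join_weight * abs(unit["pitch"] - prev["pitch"]), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost, back))
        best.append(row)
    # Trace the cheapest chain back from the last position.
    k = min(range(len(best[-1])), key=lambda j: best[-1][j][0])
    chain = []
    for i in range(len(targets) - 1, -1, -1):
        chain.append(candidates[i][k])
        k = best[i][k][1]
    return chain[::-1]


unit_database = [
    {"id": "n1", "phoneme": "N", "pitch": 100},
    {"id": "n2", "phoneme": "N", "pitch": 140},
    {"id": "ey1", "phoneme": "EY", "pitch": 105},
    {"id": "ey2", "phoneme": "EY", "pitch": 150},
    {"id": "m1", "phoneme": "M", "pitch": 110},
]
chain = select_units([("N", 120), ("EY", 120), ("M", 120)], unit_database)
chain_ids = [unit["id"] for unit in chain]
```

Both "N" units are equally far from the target pitch, so the join cost decides: the chain through the lower-pitched units has smaller pitch jumps between neighbours.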
Advantages
• Machine language translation
• Information retrieval
• Visual issues
(Difficulty seeing text)
• Motor issues
(Difficulty handling a book or paper)
QUESTIONS
????
