A Framework for Bangla Text to Speech Synthesis

Authors
K. M. Azharul Hasan, Muhammad Hozaifa, Sanjoy Dutta, Rafsan Zani Rabbi
Presented By
Sanjoy Dutta
Department of Computer Science & Engineering
Khulna University of Engineering and Technology, Khulna, Bangladesh.

Contents
• Problem Statement
• Factors for Speech Synthesis in Bangla
• Proposed Framework
• Rules and Structure Development
• Syllable Parser Development
• Audio File Selection and Normalization
• Experimental Analysis & Results
• Conclusion

Problem Statement
• Develop a framework for Bangla Text to Speech Synthesis.


Factors for Speech Synthesis in Bangla
• Sequential flow of diphones
A diphone is a pair of adjacent phonemes in which the transition between the
two is modelled, usually from the middle of the first phoneme to the middle
of the second phoneme.
A phoneme is a sound, or a group of different sounds, perceived to have the
same function by speakers of the language or dialect in question. For example,
the English K/C phoneme occurs in "skill" and "school".
• Position vs. Pronunciation
Three kinds of position occur for consonants and vowels:
Consonant-Vowel (CV)
Vowel-Consonant (VC)
Vowel-Consonant-Vowel (VCV)
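
The two ideas above, adjacent-phoneme pairs and consonant/vowel position patterns, can be sketched in a few lines of Python. This is only an illustration: the romanized vowel set `VOWELS` and the helper names `diphones` and `cv_pattern` are assumptions made for the example, not part of the presented framework, which works on Bangla characters.

```python
from typing import List, Tuple

# Hypothetical romanized vowel inventory, used only for illustration.
# A real system would use the Bangla vowel set instead.
VOWELS = {"a", "e", "i", "o", "u"}


def diphones(phonemes: List[str]) -> List[Tuple[str, str]]:
    """Return every pair of adjacent phonemes, in order."""
    return list(zip(phonemes, phonemes[1:]))


def cv_pattern(phonemes: List[str]) -> str:
    """Label each phoneme as C (consonant) or V (vowel)."""
    return "".join("V" if p in VOWELS else "C" for p in phonemes)


if __name__ == "__main__":
    print(diphones(["k", "a", "l", "o"]))
    print(cv_pattern(["k", "a"]))
    print(cv_pattern(["a", "k", "a"]))
```

A pattern string such as `CV`, `VC`, or `VCV` can then be used as a key when deciding which pronunciation rule applies.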


Proposed Framework Structure and Rules
• Text Normalization:
Transforming text into a single standard form.
Used in text-to-speech conversion to handle numbers, dates,
acronyms, and abbreviations.
Text normalization is also applied for position vs. pronunciation.
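
As a rough illustration of this kind of normalization, the sketch below expands abbreviations and digit strings into words before synthesis. Everything in it is an assumption made for the example: the lookup tables hold English placeholder words rather than Bangla spoken forms, the function name `normalize` is hypothetical, and digits are read one by one, which is simpler than the way years are actually spoken.

```python
import re
from typing import Dict, List

# Placeholder tables (assumptions). A real system would map to Bangla
# spoken forms and cover dates, acronyms, and multi-digit numbers.
DIGIT_WORDS: Dict[str, str] = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}
ABBREVIATIONS: Dict[str, str] = {"Dr.": "Doctor"}


def normalize(text: str) -> str:
    """Rewrite abbreviations and ASCII digit strings as plain words."""
    out: List[str] = []
    for token in text.split():
        if token in ABBREVIATIONS:
            out.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"[0-9]+", token):
            # Digit-by-digit reading; a simplification for this sketch.
            out.extend(DIGIT_WORDS[d] for d in token)
        else:
            out.append(token)
    return " ".join(out)


if __name__ == "__main__":
    print(normalize("Dr. Rahman 2016"))
```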

Normalization rules for ‘ ’

Normalization rules for ‘ - - - ’

Syllable Parser Development


Audio File Selection and Normalization
Bangla has a total of 39 consonants and 11 vowels.
After reduction:
28 independent consonants
8 vowels (the vowel ‘ ’ is the exception)
Finally:
224 (28 × 8) audio files for the syllables.
28 consonants against 5 vowels generate 140 (28 × 5) diphones.
In summary, 401 audio files need to be created: 9 vowels, 28 consonants,
224 syllables, and 140 diphones.
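
The file count can be checked with a few lines of arithmetic. The figures come from the slide; the variable names are chosen only for this sketch.

```python
# Counts taken from the slide; variable names are illustrative.
CONSONANTS = 28        # independent consonants after reduction
SYLLABLE_VOWELS = 8    # vowels combined with consonants for syllables
DIPHONE_VOWELS = 5     # vowels combined with consonants for diphones
VOWEL_FILES = 9        # standalone vowel recordings listed in the summary

syllable_files = CONSONANTS * SYLLABLE_VOWELS   # 28 * 8
diphone_files = CONSONANTS * DIPHONE_VOWELS     # 28 * 5
total_files = VOWEL_FILES + CONSONANTS + syllable_files + diphone_files

print(syllable_files, diphone_files, total_files)  # 224 140 401
```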


Experimental Analysis and Results
Strategy of Analysis:
Sample Input Test: Various news articles from news portals
Listener Selection: Anonymous persons chosen randomly
Accuracy Analysis:
\[
\text{Accuracy} = \frac{\text{Words listeners were able to hear clearly on the 1st attempt} \times 100}{\text{Total no. of words in every sample}}
\]
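
A minimal sketch of the accuracy calculation follows. The function name `accuracy` and the word counts in the usage example are hypothetical and are not results from the experiment.

```python
def accuracy(clear_on_first_attempt: int, total_words: int) -> float:
    """Percentage of words heard clearly on the first attempt."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    if not 0 <= clear_on_first_attempt <= total_words:
        raise ValueError("clear_on_first_attempt must be between 0 and total_words")
    return clear_on_first_attempt * 100 / total_words


if __name__ == "__main__":
    # Hypothetical sample: 45 of 50 words understood on the first attempt.
    print(accuracy(45, 50))  # 90.0
```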

Experiment Result
Listening Factors:
• Duration synchronization and merging
• Numerical values such as years
Constraints in Sample 1:
‌ , , ,
, , ,
Constraints in Sample 2:
, , , , ,
,

Limitations and Future Work
Detect noun and adjective words, namely
( ) noun and
( ) adjective.
Both words should follow rule 3(a),
but they do not, and their pronunciation is different.

CONCLUSION
We believe the proposed framework can be useful for Bangla TTS
development, detecting Bangla words with a minimal audio file
requirement.
