A Framework for Bangla Text to Speech Synthesis
Authors
K. M. Azharul Hasan, Muhammad Hozaifa, Sanjoy Dutta, Rafsan Zani Rabbi
Presented By
Sanjoy Dutta
Department of Computer Science & Engineering
Khulna University of Engineering and Technology, Khulna, Bangladesh.
Contents
• Problem Statement
• Factors for Speech Synthesis in Bangla
• Proposed Framework
• Rules and Structure Development
• Syllable Parser Development
• Audio File Selection and Normalization
• Experimental Analysis & Results
• Conclusion
Problem Statement
• Develop a framework for Bangla Text to Speech Synthesis.
Factors for Speech Synthesis in Bangla
• Sequential flow of diphones
A diphone is a pair of adjacent phonemes in which the transition between the
two is modelled, usually from the middle of the first phoneme to the middle
of the second.
A phoneme is a sound, or a group of different sounds, perceived to have the
same function by speakers of the language or dialect in question. For example,
the English K/C phoneme in "skill" and "school".
• Position vs. Pronunciation
Three kinds of position occur for consonants and vowels:
Consonant Vowel (CV)
Vowel Consonant (VC)
Vowel Consonant Vowel (VCV)
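The three positions can be illustrated with a minimal sketch. It assumes a small set of romanized phoneme symbols chosen only for illustration; the symbols and the function names `cv_pattern` and `position_kind` are hypothetical and do not come from the framework itself.

```python
# Hypothetical romanized vowel symbols, used only for illustration.
VOWELS = {"a", "i", "u", "e", "o"}


def cv_pattern(phonemes):
    """Map each phoneme to 'V' (vowel) or 'C' (consonant) and join the marks."""
    return "".join("V" if p in VOWELS else "C" for p in phonemes)


def position_kind(phonemes):
    """Return 'CV', 'VC' or 'VCV' if the sequence matches one of them, else None."""
    pattern = cv_pattern(phonemes)
    return pattern if pattern in {"CV", "VC", "VCV"} else None


print(position_kind(["k", "a"]))       # CV
print(position_kind(["a", "k"]))       # VC
print(position_kind(["a", "k", "a"]))  # VCV
```

Any sequence that does not match one of the three listed positions returns `None` in this sketch.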
Proposed Framework Structure and Rules
• Text Normalization:
Transforming text into a single standard form.
Used when converting text to speech, for example to handle numbers, dates,
acronyms, and abbreviations.
Text normalization is also applied for Position vs. Pronunciation.
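As one illustration of text normalization for numbers, the sketch below spells out Bangla digits one at a time. This is an assumption made for the example, not the rule set of the framework: reading digit by digit is a simplification (years, for instance, are normally read as whole numbers), and the names `DIGIT_WORDS` and `normalize_digits` are hypothetical.

```python
# Bangla digits mapped to their spoken names.
DIGIT_WORDS = {
    "০": "শূন্য", "১": "এক", "২": "দুই", "৩": "তিন", "৪": "চার",
    "৫": "পাঁচ", "৬": "ছয়", "৭": "সাত", "৮": "আট", "৯": "নয়",
}


def normalize_digits(text):
    """Replace every Bangla digit in text with its spoken name."""
    pieces = []
    for ch in text:
        if ch in DIGIT_WORDS:
            # Pad with spaces so adjacent digits become separate words.
            pieces.append(" " + DIGIT_WORDS[ch] + " ")
        else:
            pieces.append(ch)
    # Collapse the extra whitespace introduced by the padding.
    return " ".join("".join(pieces).split())


print(normalize_digits("১২"))
```

Text without digits passes through unchanged apart from whitespace collapsing, so the function can be applied before any later normalization step.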
Normalization rules for ‘ ’
Normalization rules for ‘ - - - ’
Syllable Parser Development
Syllable Parser In Action
Audio File Selection and Normalization
Bangla has 39 consonants and 11 vowels in total.
After reduction:
28 independent consonants
8 vowels (the vowel ‘ ’ is the exception)
Finally
224 (28*8) audio files for the syllables.
28 consonants against 5 vowels generate 140 (28*5) diphones.
In summary, 401 audio files need to be created: 9 vowels, 28 consonants,
224 syllables and 140 diphones.
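The file count above can be checked with a few lines of arithmetic. The constant names are chosen for this sketch only; the numbers are the ones stated on the slide.

```python
CONSONANTS = 28         # independent consonants after reduction
SYLLABLE_VOWELS = 8     # vowels combined with each consonant for syllables
DIPHONE_VOWELS = 5      # vowels combined with each consonant for diphones
STANDALONE_VOWELS = 9   # vowel recordings counted in the summary

syllables = CONSONANTS * SYLLABLE_VOWELS   # 28 * 8
diphones = CONSONANTS * DIPHONE_VOWELS     # 28 * 5
total = STANDALONE_VOWELS + CONSONANTS + syllables + diphones

print(syllables, diphones, total)  # 224 140 401
```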
Experimental Analysis and Results
Strategy of Analysis:
Sample Input Test: Various News Articles from News Portals
Listener Selection: Anonymous Persons Chosen Randomly
Accuracy Analysis:
Accuracy = (Words listeners were able to hear clearly on the 1st attempt × 100) / (Total no. of words in every sample)
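The accuracy measure can be written as a small function. This is a minimal sketch: the function name `accuracy` and the sample counts in the usage line are hypothetical and are not results from the experiment.

```python
def accuracy(words_heard_clearly, total_words):
    """Percentage of words listeners heard clearly on the first attempt."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return words_heard_clearly * 100 / total_words


# Hypothetical sample: 45 of 50 words heard clearly on the first attempt.
print(accuracy(45, 50))  # 90.0
```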
Experiment Result
Listening Factors:
• Duration synchronization and merging
• Numerical values such as years
Constraints in Sample 1:
‌ , , ,
, , ,
Constraints in Sample 2:
, , , , ,
,
Limitations and Future Works
Detect noun and adjective words, namely
( ) Noun and
( ) Adjective.
Both words should follow rule 3(a), but they do not, and their
pronunciations differ.
CONCLUSION
We believe the proposed framework can be useful for Bangla TTS
development to detect Bangla words with a minimum audio file
requirement.
Thank You !!!
