Performance analysis of Bangla
Speech Recognizer model using
Hidden Markov Model (HMM)
Submitted by:
Md. Abdullah-al-MAMUN
1
OUTLINE
• What is speech recognition?
• The Structure of ASR
• Speech Database
• Feature Extraction
• Hidden Markov Model
• Forward algorithm
• Backward algorithm
• Viterbi algorithm
• Training & Recognition
• Result
• Conclusions
• References
2
What is Speech Recognition?
• In computer science, speech recognition is the translation of spoken words into text.
• It is the process of converting an acoustic signal captured by a microphone into a set of words.
• Speech recognition is also known as "Automatic Speech Recognition (ASR)" or "Speech to Text (STT)".
3
Model of Bangla Speech Recognition
4
Fig. 1: Simple model of Bangla Speech Recognition
The Structure of an ASR System
Figure 2: Functional scheme of an ASR system
[Block diagram: speech samples S pass through a signal interface, feature extraction (X), and recognition (Y → W*); the recognizer uses HMMs trained from the speech database.]
5
Speech Database
• A speech database is a collection of recorded speech accessible on a computer and supported with the necessary transcriptions.
• The database collects the observations required for parameter estimation.
• This ASR system uses about 1,200 keywords.
6
Classification of Keywords
[Hierarchy: Bangla word characters are divided into independent and dependent types, covering vowels, consonants, modifier characters, and compound characters.]
7
Database Creation Process
[Flow chart of the speech database creation process.]
8
Speech Signal Analysis
Feature Extraction for ASR:
• The aim is to extract the voice features that distinguish the different phonemes of a language.
9
[Diagram: frames of the speech signal pass through feature extraction to produce a sequence of feature vectors.]
MFCC Extraction
[Pipeline: speech signal x(n) → pre-emphasis x'(n) → windowing xt(n) → DFT Xt(k) → Mel filter banks Yt(m) → log(|·|²) → IDFT → MFCC coefficients yt(m).]
10
MFCC stands for Mel-frequency cepstral coefficients, a representation of the short-term power spectrum of a sound used in audio processing. The MFCCs are the amplitudes of the resulting spectrum.
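Below is a minimal sketch of the MFCC pipeline shown in the diagram, assuming plain NumPy and illustrative parameter values (16 kHz audio, 512-point FFT, 26 Mel filters, 13 coefficients); it is not the exact front end used in this recognizer.

```python
# Sketch of the MFCC pipeline: pre-emphasis -> windowing -> DFT ->
# Mel filter bank -> log -> DCT. All parameter values are illustrative
# assumptions, not taken from the original system.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the Mel scale.
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fbank

def mfcc_frame(frame, sr, n_fft=512, n_filters=26, n_ceps=13):
    # Pre-emphasis: x'(n) = x(n) - 0.97 x(n-1)
    emphasized = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])
    # Hamming window, then power spectrum via the DFT.
    windowed = emphasized * np.hamming(len(emphasized))
    spectrum = np.abs(np.fft.rfft(windowed, n_fft)) ** 2
    # Mel filter bank energies, then log compression.
    energies = mel_filterbank(n_filters, n_fft, sr) @ spectrum
    log_energies = np.log(energies + 1e-10)
    # DCT (the "IDFT" block in the diagram) gives the cepstral coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_filters)
    return dct @ log_energies

# Example: one 25 ms frame of synthetic audio at 16 kHz.
sr = 16000
frame = np.random.randn(int(0.025 * sr))
print(mfcc_frame(frame, sr).shape)  # (13,)
```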
Explanatory Example
[Figure panels: speech waveform of the phoneme "ae"; the signal after pre-emphasis and Hamming windowing; its power spectrum; and the resulting MFCCs.]
11
Feature Vector to P(O|M) via HMM
12
[Diagram: the sequence of feature vectors O is fed to the HMM, which outputs the likelihood P(O|M).]
For each input word O, the HMM assigns a corresponding probability P(O|M), which can be computed with the model.
HMM Model
13
An HMM is specified by a five-tuple λ = (S, O, A, B, Π).
14
Elements of an HMM
1) Set of hidden states S = {1, 2, ..., N}, where N is the number of states.
2) Set of observation symbols O = {o_1, o_2, ..., o_M}, where M is the number of observation symbols.
3) Initial state distribution: Π = {π_i}, π_i = P(s_0 = i), 1 ≤ i ≤ N.
4) State transition probability distribution: A = {a_ij}, a_ij = P(s_t = j | s_{t-1} = i), 1 ≤ i, j ≤ N.
5) Observation symbol probability distribution in state j: B = {b_j(k)}, b_j(k) = P(X_t = o_k | s_t = j), 1 ≤ j ≤ N, 1 ≤ k ≤ M.
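As an illustration of the five elements above, a toy 2-state, 3-symbol HMM can be written down directly; the numbers below are assumptions for demonstration only, not the Bangla recognizer's parameters.

```python
# A minimal sketch of the HMM parameters (pi, A, B) for a toy
# 2-state / 3-symbol model. All numbers are illustrative assumptions.
import numpy as np

N, M = 2, 3                     # N hidden states, M observation symbols
pi = np.array([1.0, 0.0])       # initial state distribution: pi_i = P(s_0 = i)
A = np.array([[0.7, 0.3],       # transitions: a_ij = P(s_t = j | s_{t-1} = i)
              [0.5, 0.5]])
B = np.array([[0.6, 0.3, 0.1],  # emissions: b_j(k) = P(X_t = o_k | s_t = j)
              [0.1, 0.2, 0.7]])

# Each row of A and B must be a valid probability distribution.
assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)
```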
15
Three Basic Problems in HMM
• 1. The Evaluation Problem – Given a model λ = (A, B, π) and a sequence of observations O = (o_1, o_2, o_3, ..., o_M), what is the probability P(O|λ), i.e., the probability that the model generates the observations?
• 2. The Decoding Problem – Given a model λ = (A, B, π) and a sequence of observations O = (o_1, o_2, o_3, ..., o_M), what is the most likely state sequence in the model that produces the observations?
• 3. The Learning Problem – Given a model λ = (A, B, π) and a set of observations O = (o_1, o_2, o_3, ..., o_M), how can we adjust the model parameters λ to maximize the joint probability P(O|λ)?
How to evaluate an HMM? Forward algorithm.
How to decode an HMM? Viterbi algorithm.
How to train an HMM? Baum-Welch algorithm.
16
Calculate Probability P(O|M)
Trellis:
[Trellis diagram for a three-state example (P(up), P(down), P(no-change)) with its transition matrix: each state's forward probability at the next time step is the sum over previous states of (previous probability × transition probability × emission probability), e.g. the column (0.35, 0.02, 0.09) becomes (0.179, 0.036, 0.008). Add the probabilities!]
Forward Calculations – Overview
[Trellis over states S1 and S2 for times 2–4, starting from S0 with initial probabilities π1, π2 and transitions a11=0.7, a12=0.3, a21=0.5, a22=0.5; emission probabilities are attached to each state.]
17
Forward Calculations (t=2)
[Trellis from S0 to states S1 and S2 at time 2, with a11=0.7, a12=0.3, a21=0.5, a22=0.5.]
α1(1) = 1
α2(1) = 0
α1(2) = α1(1) b13 a11 + α2(1) b23 a21 = 0.21
α2(2) = α1(1) b13 a12 + α2(1) b23 a22 = 0.09
NOTE: α1(2) + α2(2) is the likelihood of the observation.
18
Forward Calculations (t=3)
[Trellis extended to time 3; α1(3) and α2(3) are computed from α1(2) and α2(2) in the same way.]
19
Forward Calculations (t=4)
[Trellis extended to time 4; the forward probabilities at time 4 are computed from those at time 3.]
20
Forward Calculation of Likelihood Function
t=1: α1(1) = π1 = 1.0, α2(1) = π2 = 0.0; L(1) = α1(1) + α2(1) = 1.0
t=2: α1(2) = α1(1) a11 b13 + α2(1) a21 b23 = 0.21, α2(2) = α1(1) a12 b13 + α2(1) a22 b23 = 0.09; L(2) = α1(2) + α2(2) = 0.3
t=3: α1(3) = α1(2) a11 b12 + α2(2) a21 b12 = 0.0462, α2(3) = 0.0378; L(3) = α1(3) + α2(3) = 0.084
t=4: α1(4) = 0.021294, α2(4) = 0.010206; L(4) = α1(4) + α2(4) = 0.0315
Here L(t) = p(K1 … Kt) is the likelihood of the observations up to time t.
21
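The forward recursion worked through on the previous slides can be sketched in a few lines of NumPy. The toy model and observation sequence below are illustrative assumptions, not the slide's example; note that this sketch folds the emission probability into the initialisation (α_i(1) = π_i b_i(o_1)), whereas the worked example above initialises with π alone.

```python
# A minimal sketch of the forward algorithm (Problem 1: evaluation)
# for the toy 2-state HMM defined earlier.
import numpy as np

def forward(pi, A, B, obs):
    """Return alpha[t, i] = P(o_1..o_t, s_t = i) and the likelihood P(O | model)."""
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    alpha[0, :] = pi * B[:, obs[0]]            # initialisation
    for t in range(1, T):                      # induction: sum over previous states
        alpha[t, :] = (alpha[t - 1, :] @ A) * B[:, obs[t]]
    return alpha, alpha[-1, :].sum()           # termination: L = sum_i alpha_i(T)

pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3], [0.5, 0.5]])
B = np.array([[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]])
obs = [2, 1, 0, 1]                             # symbol indices o_1..o_4 (illustrative)
alpha, likelihood = forward(pi, A, B, obs)
print(likelihood)                              # P(O | M) for this toy model
```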
Backward Calculations – Overview
[Trellis over states S1 and S2 for times 2–4, with the same transitions a11=0.7, a12=0.3, a21=0.5, a22=0.5; the backward probabilities are computed from right to left.]
22
Backward Calculations (t=3)
[States S1 and S2 at time 3 with their emission probabilities; β1(3) and β2(3) are computed from β1(4) and β2(4).]
23
Backward Calculations (t=2)
[Trellis over S1 and S2 for times 2–4, with a11=0.7, a12=0.3, a21=0.5, a22=0.5.]
β1(4) = 1
β2(4) = 1
β1(3) = 0.6
β2(3) = 0.1
β1(2) = a11 b12 β1(3) + a12 b22 β2(3) = 0.045
β2(2) = a21 b12 β1(3) + a22 b22 β2(3) = 0.245
NOTE: β1(2) + β2(2) relates to the likelihood of the observation/word sequence.
24
Backward Calculations (t=1)
[Trellis from S0 through S1 and S2 for times 2–4; β1(1) and β2(1) are computed from β1(2) and β2(2) and combined with the initial probabilities π1, π2.]
25
Backward Calculation of Likelihood Function
        t=1      t=2      t=3    t=4
β1(t)   0.0315   0.045    0.6    1
β2(t)   0.029    0.245    0.1    1
L(t)    0.0315   0.290    0.7    1
Each βi(t) follows the backward recursion (e.g. β1(t) = a11 b11 β1(t+1) + a12 b21 β2(t+1));
L(t) = p(Kt … KT), with L(1) = π1 β1(1) + π2 β2(1) and L(t) = β1(t) + β2(t) for t = 2, 3.
26
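A matching sketch of the backward recursion, again for the illustrative toy model rather than the slide's worked example; summing π_i b_i(o_1) β_i(1) at the end gives the same likelihood as the forward pass.

```python
# A minimal sketch of the backward algorithm for the toy HMM defined earlier.
import numpy as np

def backward(pi, A, B, obs):
    """Return beta[t, i] = P(o_{t+1}, ..., o_T | s_t = i) and the likelihood."""
    N, T = A.shape[0], len(obs)
    beta = np.zeros((T, N))
    beta[T - 1, :] = 1.0                       # initialisation: beta_i(T) = 1
    for t in range(T - 2, -1, -1):             # recurse from right to left
        beta[t, :] = A @ (B[:, obs[t + 1]] * beta[t + 1, :])
    likelihood = np.sum(pi * B[:, obs[0]] * beta[0, :])
    return beta, likelihood

pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3], [0.5, 0.5]])
B = np.array([[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]])
obs = [2, 1, 0, 1]                             # illustrative symbol indices
beta, p = backward(pi, A, B, obs)
print(p)  # equals the forward likelihood for the same sequence
```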
27
Calculate max_S Prob(state sequence S)
[Trellis diagram for the same three-state example (P(up), P(down), P(no-change)): at each step the best path probability of a state is the maximum over previous states of (previous probability × transition probability × emission probability), e.g. the column (0.35, 0.09, 0.02) leads to (0.147, 0.021, 0.007), and the best path is kept. Select the highest probability!]
Viterbi Algorithm – Overview
[Trellis over states S1 and S2 for times 2–4, starting from S0 with initial probabilities π1, π2 and transitions a11=0.7, a12=0.3, a21=0.5, a22=0.5.]
28
Viterbi Algorithm (Forward Calculations, t=2)
[Trellis from S0 (π1 = 1, π2 = 0) to states S1 and S2 at time 2, with a11=0.7, a12=0.3, a21=0.5, a22=0.5.]
δ1(1) = π1 = 1
δ2(1) = π2 = 0
δ1(2) = max{δ1(1) b13 a11, δ2(1) b23 a21} = 0.21
δ2(2) = max{δ1(1) b13 a12, δ2(1) b23 a22} = 0.09
ψ1(2) = 1
ψ2(2) = 1
29
Viterbi Algorithm (Backtracking, t=2)
[Same trellis as above; the back-pointers ψ1(2) = 1 and ψ2(2) = 1 record that state S1 is the best predecessor of both states at time 2.]
δ1(1) = π1 = 1, δ2(1) = π2 = 0
δ1(2) = max{δ1(1) b13 a11, δ2(1) b23 a21} = 0.21
δ2(2) = max{δ1(1) b13 a12, δ2(1) b23 a22} = 0.09
ψ1(2) = 1, ψ2(2) = 1
30
Viterbi Algorithm (Forward Calculations)
[Trellis extended through times 2–4; δ and ψ are computed at each step.]
31
Viterbi Algorithm (Backtracking)
[The back-pointers ψ are followed from the final time step to recover the best state sequence.]
32
Viterbi Algorithm (Forward Calculations, t=4)
[Trellis completed at time 4; the best final state is the one with the largest δ value.]
33
Viterbi Algorithm (Backtracking to Obtain Labeling)
[Starting from the best final state, the back-pointers are followed back to time 1, giving the state labeling of the observation sequence.]
34
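The Viterbi recursion and the backtracking step on the preceding slides can be sketched as follows; δ and ψ play the roles defined earlier, and the model and observations are the same illustrative assumptions used in the forward/backward sketches.

```python
# A minimal sketch of the Viterbi algorithm (Problem 2: decoding).
import numpy as np

def viterbi(pi, A, B, obs):
    """Return the most likely state sequence and its path probability."""
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))                   # delta[t, j]: best path probability ending in j at t
    psi = np.zeros((T, N), dtype=int)          # psi[t, j]: best predecessor of j at t
    delta[0, :] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1, :, None] * A     # scores[i, j] = delta_i(t-1) * a_ij
        psi[t, :] = scores.argmax(axis=0)      # remember the best previous state
        delta[t, :] = scores.max(axis=0) * B[:, obs[t]]
    # Backtracking: start from the best final state and follow psi.
    path = [int(delta[-1, :].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], delta[-1, :].max()

pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3], [0.5, 0.5]])
B = np.array([[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]])
obs = [2, 1, 0, 1]                             # illustrative symbol indices
states, prob = viterbi(pi, A, B, obs)
print(states, prob)                            # decoded state sequence and its probability
```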
Implementing HMM for Speech Modeling (Training and Recognition)
• Training: build the HMM speech models from the correspondence between the observation sequences Y and the state sequences S.
• Recognition: recognize speech using the stored HMM models Θ and the actual observation Y.
[Block diagram: speech samples S → feature extraction → Y → recognition → W*, with the HMM models Θ produced by training.]
35
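A hedged sketch of one Baum-Welch (forward-backward) re-estimation step for the toy discrete HMM; the real system would instead train word models on feature sequences from the Bangla speech database, and all values below are illustrative assumptions.

```python
# One Baum-Welch re-estimation step (Problem 3: training) for the toy HMM.
import numpy as np

def baum_welch_step(pi, A, B, obs):
    N, M = B.shape
    T = len(obs)
    # Forward and backward passes.
    alpha = np.zeros((T, N)); beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()
    # gamma[t, i]: probability of being in state i at time t.
    gamma = alpha * beta / likelihood
    # xi[t, i, j]: probability of the transition i -> j at time t.
    xi = (alpha[:-1, :, None] * A[None, :, :] *
          B[:, obs[1:]].T[:, None, :] * beta[1:, None, :]) / likelihood
    # Re-estimated parameters.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros((N, M))
    for k in range(M):
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B, likelihood

pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3], [0.5, 0.5]])
B = np.array([[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]])
obs = [2, 1, 0, 1]
for _ in range(5):                 # repeating the step increases P(O | model)
    pi, A, B, L = baum_welch_step(pi, A, B, obs)
    print(L)
```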
Recognition Process
• Let the input speech S = (s1, s2, ..., sT) be the sequence to be recognized.
• Let xt be the feature vector computed at time t, with the feature sequence from time 1 to t denoted X = (x1, x2, ..., xt).
• The recognized states S* are obtained by: S* = argmax_S P(S, X | Φ).
[Diagram: the dynamic structure (search algorithm) combines xt with the static structure Φ, evaluating P(xt, {st} | {st-1}, Φ) to produce S*.]
36
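For isolated words, the rule S* = argmax P(S, X | Φ) reduces to scoring the observation sequence against each stored word model and keeping the best. The vocabulary and model parameters below are hypothetical placeholders, not the actual 1,200-keyword models.

```python
# A minimal sketch of isolated-word recognition by maximum likelihood.
import numpy as np

def forward_likelihood(pi, A, B, obs):
    # Forward algorithm, keeping only the running alpha vector.
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

# Hypothetical stored models Theta: one (pi, A, B) triple per keyword.
models = {
    "word_1": (np.array([1.0, 0.0]),
               np.array([[0.7, 0.3], [0.5, 0.5]]),
               np.array([[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]])),
    "word_2": (np.array([0.5, 0.5]),
               np.array([[0.6, 0.4], [0.3, 0.7]]),
               np.array([[0.2, 0.5, 0.3], [0.4, 0.4, 0.2]])),
}

obs = [2, 1, 0, 1]   # quantised feature sequence for the spoken word (illustrative)
best_word = max(models, key=lambda w: forward_likelihood(*models[w], obs))
print(best_word)     # the recognised word W* = argmax_W P(O | model_W)
```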
Result (Speaker Recognition)
37
Table 1: Speaker recognition result
Result (Isolated SR)
38
Table 2: Result for isolated speech recognition.
Result (Continuous SR)
39
Table 3: Continuous Speech recognition result
Conclusions
• No speech recognizer to date achieves 100% accuracy.
• Avoid poor-quality microphones; consider using a better one.
• One important point is that further training of the recognizer provides an even better experience.
40
Thank You
41