Performance analysis of Bangla
Speech Recognizer model using
Hidden Markov Model (HMM)
Submitted by:
Md. Abdullah-al-MAMUN
1
OUTLINE
• What is speech recognition?
• The Structure of ASR
• Speech Database
• Feature Extraction
• Hidden Markov Model
• Forward algorithm
• Backward algorithm
• Viterbi algorithm
• Training & Recognition
• Result
• Conclusions
• References
2
What is Speech Recognition?
• In computer science, speech recognition is the translation of spoken words into text.
• It is the process of converting an acoustic signal captured by a microphone into a set of words.
• Speech recognition is also known as "Automatic Speech Recognition (ASR)" or "Speech to Text (STT)".
3
Model of Bangla Speech Recognition
4
Fig. 1: Simple model of Bangla Speech Recognition
The Structure of an ASR System
Figure 2: Functional scheme of an ASR system
[Block diagram: speech samples S pass through a signal interface, feature extraction (X), and recognition (Y → W*); the recognizer uses HMMs trained from the speech database.]
5
Speech Database
• A speech database is a collection of recorded speech accessible on a computer and supported with the necessary transcriptions.
• The database collects the observations required for parameter estimation.
• This ASR system uses about 1,200 keywords.
6
Classification of Keywords
[Hierarchy: Bangla word characters are divided into independent and dependent types, covering vowels, consonants, modifier characters, and compound characters.]
7
Database Creation Process
[Flow chart of the speech database creation process.]
8
Speech Signal Analysis
Feature Extraction for ASR:
• The aim is to extract the voice features that distinguish the different phonemes of a language.
9
[Diagram: frames of the speech signal pass through feature extraction to produce a sequence of feature vectors.]
MFCC Extraction
[Pipeline: speech signal x(n) → pre-emphasis x'(n) → windowing xt(n) → DFT Xt(k) → Mel filter banks Yt(m) → log(|·|²) → IDFT → MFCC coefficients yt(m).]
10
MFCC stands for Mel-frequency cepstral coefficients, a representation of the short-term power spectrum of a sound used in audio processing. The MFCCs are the amplitudes of the resulting spectrum.
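Below is a minimal sketch of the MFCC pipeline shown in the diagram, assuming plain NumPy and illustrative parameter values (16 kHz audio, 512-point FFT, 26 Mel filters, 13 coefficients); it is not the exact front end used in this recognizer.

```python
# Sketch of the MFCC pipeline: pre-emphasis -> windowing -> DFT ->
# Mel filter bank -> log -> DCT. All parameter values are illustrative
# assumptions, not taken from the original system.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the Mel scale.
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fbank

def mfcc_frame(frame, sr, n_fft=512, n_filters=26, n_ceps=13):
    # Pre-emphasis: x'(n) = x(n) - 0.97 x(n-1)
    emphasized = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])
    # Hamming window, then power spectrum via the DFT.
    windowed = emphasized * np.hamming(len(emphasized))
    spectrum = np.abs(np.fft.rfft(windowed, n_fft)) ** 2
    # Mel filter bank energies, then log compression.
    energies = mel_filterbank(n_filters, n_fft, sr) @ spectrum
    log_energies = np.log(energies + 1e-10)
    # DCT (the "IDFT" block in the diagram) gives the cepstral coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_filters)
    return dct @ log_energies

# Example: one 25 ms frame of synthetic audio at 16 kHz.
sr = 16000
frame = np.random.randn(int(0.025 * sr))
print(mfcc_frame(frame, sr).shape)  # (13,)
```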
Explanatory Example
[Figure panels: speech waveform of the phoneme "ae"; the signal after pre-emphasis and Hamming windowing; its power spectrum; and the resulting MFCCs.]
11
Feature Vector to P(O|M) via HMM
12
[Diagram: the sequence of feature vectors O is fed to the HMM, which outputs the likelihood P(O|M).]
For each input word O, the HMM assigns a corresponding probability P(O|M), which can be computed with the model.
HMM Model
13
An HMM is specified by a five-tuple λ = (S, O, A, B, Π).
14
Elements of an HMM
1) Set of hidden states S = {1, 2, ..., N}, where N is the number of states.
2) Set of observation symbols O = {o_1, o_2, ..., o_M}, where M is the number of observation symbols.
3) Initial state distribution: Π = {π_i}, π_i = P(s_0 = i), 1 ≤ i ≤ N.
4) State transition probability distribution: A = {a_ij}, a_ij = P(s_t = j | s_{t-1} = i), 1 ≤ i, j ≤ N.
5) Observation symbol probability distribution in state j: B = {b_j(k)}, b_j(k) = P(X_t = o_k | s_t = j), 1 ≤ j ≤ N, 1 ≤ k ≤ M.
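As an illustration of the five elements above, a toy 2-state, 3-symbol HMM can be written down directly; the numbers below are assumptions for demonstration only, not the Bangla recognizer's parameters.

```python
# A minimal sketch of the HMM parameters (pi, A, B) for a toy
# 2-state / 3-symbol model. All numbers are illustrative assumptions.
import numpy as np

N, M = 2, 3                     # N hidden states, M observation symbols
pi = np.array([1.0, 0.0])       # initial state distribution: pi_i = P(s_0 = i)
A = np.array([[0.7, 0.3],       # transitions: a_ij = P(s_t = j | s_{t-1} = i)
              [0.5, 0.5]])
B = np.array([[0.6, 0.3, 0.1],  # emissions: b_j(k) = P(X_t = o_k | s_t = j)
              [0.1, 0.2, 0.7]])

# Each row of A and B must be a valid probability distribution.
assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)
```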
15
Three Basic Problems in HMM
• 1. The Evaluation Problem – Given a model λ = (A, B, π) and a sequence of observations O = (o_1, o_2, o_3, ..., o_M), what is the probability P(O|λ), i.e., the probability that the model generates the observations?
• 2. The Decoding Problem – Given a model λ = (A, B, π) and a sequence of observations O = (o_1, o_2, o_3, ..., o_M), what is the most likely state sequence in the model that produces the observations?
• 3. The Learning Problem – Given a model λ = (A, B, π) and a set of observations O = (o_1, o_2, o_3, ..., o_M), how can we adjust the model parameters λ to maximize the joint probability P(O|λ)?
How to evaluate an HMM? Forward algorithm.
How to decode an HMM? Viterbi algorithm.
How to train an HMM? Baum-Welch algorithm.
16
Calculate Probability P(O|M)
Trellis:
[Trellis diagram for a three-state example (P(up), P(down), P(no-change)) with its transition matrix: each state's forward probability at the next time step is the sum over previous states of (previous probability × transition probability × emission probability), e.g. the column (0.35, 0.02, 0.09) becomes (0.179, 0.036, 0.008). Add the probabilities!]
Forward Calculations – Overview
[Trellis over states S1 and S2 for times 2–4, starting from S0 with initial probabilities π1, π2 and transitions a11=0.7, a12=0.3, a21=0.5, a22=0.5; emission probabilities are attached to each state.]
17
Forward Calculations (t=2)
[Trellis from S0 to states S1 and S2 at time 2, with a11=0.7, a12=0.3, a21=0.5, a22=0.5.]
α1(1) = 1
α2(1) = 0
α1(2) = α1(1) b13 a11 + α2(1) b23 a21 = 0.21
α2(2) = α1(1) b13 a12 + α2(1) b23 a22 = 0.09
NOTE: α1(2) + α2(2) is the likelihood of the observation.
18
Forward Calculations (t=3)
[Trellis extended to time 3; α1(3) and α2(3) are computed from α1(2) and α2(2) in the same way.]
19
Forward Calculations (t=4)
[Trellis extended to time 4; the forward probabilities at time 4 are computed from those at time 3.]
20
Forward Calculation of Likelihood Function
t=1: α1(1) = π1 = 1.0, α2(1) = π2 = 0.0; L(1) = α1(1) + α2(1) = 1.0
t=2: α1(2) = α1(1) a11 b13 + α2(1) a21 b23 = 0.21, α2(2) = α1(1) a12 b13 + α2(1) a22 b23 = 0.09; L(2) = α1(2) + α2(2) = 0.3
t=3: α1(3) = α1(2) a11 b12 + α2(2) a21 b12 = 0.0462, α2(3) = 0.0378; L(3) = α1(3) + α2(3) = 0.084
t=4: α1(4) = 0.021294, α2(4) = 0.010206; L(4) = α1(4) + α2(4) = 0.0315
Here L(t) = p(K1 … Kt) is the likelihood of the observations up to time t.
21
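The forward recursion worked through on the previous slides can be sketched in a few lines of NumPy. The toy model and observation sequence below are illustrative assumptions, not the slide's example; note that this sketch folds the emission probability into the initialisation (α_i(1) = π_i b_i(o_1)), whereas the worked example above initialises with π alone.

```python
# A minimal sketch of the forward algorithm (Problem 1: evaluation)
# for the toy 2-state HMM defined earlier.
import numpy as np

def forward(pi, A, B, obs):
    """Return alpha[t, i] = P(o_1..o_t, s_t = i) and the likelihood P(O | model)."""
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    alpha[0, :] = pi * B[:, obs[0]]            # initialisation
    for t in range(1, T):                      # induction: sum over previous states
        alpha[t, :] = (alpha[t - 1, :] @ A) * B[:, obs[t]]
    return alpha, alpha[-1, :].sum()           # termination: L = sum_i alpha_i(T)

pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3], [0.5, 0.5]])
B = np.array([[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]])
obs = [2, 1, 0, 1]                             # symbol indices o_1..o_4 (illustrative)
alpha, likelihood = forward(pi, A, B, obs)
print(likelihood)                              # P(O | M) for this toy model
```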
Backward Calculations – Overview
[Trellis over states S1 and S2 for times 2–4, with the same transitions a11=0.7, a12=0.3, a21=0.5, a22=0.5; the backward probabilities are computed from right to left.]
22
Backward Calculations (t=3)
[States S1 and S2 at time 3 with their emission probabilities; β1(3) and β2(3) are computed from β1(4) and β2(4).]
23
Backward Calculations (t=2)
[Trellis over S1 and S2 for times 2–4, with a11=0.7, a12=0.3, a21=0.5, a22=0.5.]
β1(4) = 1
β2(4) = 1
β1(3) = 0.6
β2(3) = 0.1
β1(2) = a11 b12 β1(3) + a12 b22 β2(3) = 0.045
β2(2) = a21 b12 β1(3) + a22 b22 β2(3) = 0.245
NOTE: β1(2) + β2(2) relates to the likelihood of the observation/word sequence.
24
Backward Calculations (t=1)
[Trellis from S0 through S1 and S2 for times 2–4; β1(1) and β2(1) are computed from β1(2) and β2(2) and combined with the initial probabilities π1, π2.]
25
Backward Calculation of Likelihood Function
        t=1      t=2      t=3    t=4
β1(t)   0.0315   0.045    0.6    1
β2(t)   0.029    0.245    0.1    1
L(t)    0.0315   0.290    0.7    1
Each βi(t) follows the backward recursion (e.g. β1(t) = a11 b11 β1(t+1) + a12 b21 β2(t+1));
L(t) = p(Kt … KT), with L(1) = π1 β1(1) + π2 β2(1) and L(t) = β1(t) + β2(t) for t = 2, 3.
26
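A matching sketch of the backward recursion, again for the illustrative toy model rather than the slide's worked example; summing π_i b_i(o_1) β_i(1) at the end gives the same likelihood as the forward pass.

```python
# A minimal sketch of the backward algorithm for the toy HMM defined earlier.
import numpy as np

def backward(pi, A, B, obs):
    """Return beta[t, i] = P(o_{t+1}, ..., o_T | s_t = i) and the likelihood."""
    N, T = A.shape[0], len(obs)
    beta = np.zeros((T, N))
    beta[T - 1, :] = 1.0                       # initialisation: beta_i(T) = 1
    for t in range(T - 2, -1, -1):             # recurse from right to left
        beta[t, :] = A @ (B[:, obs[t + 1]] * beta[t + 1, :])
    likelihood = np.sum(pi * B[:, obs[0]] * beta[0, :])
    return beta, likelihood

pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3], [0.5, 0.5]])
B = np.array([[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]])
obs = [2, 1, 0, 1]                             # illustrative symbol indices
beta, p = backward(pi, A, B, obs)
print(p)  # equals the forward likelihood for the same sequence
```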
27
Calculate max_S Prob(state sequence S)
[Trellis diagram for the same three-state example (P(up), P(down), P(no-change)): at each step the best path probability of a state is the maximum over previous states of (previous probability × transition probability × emission probability), e.g. the column (0.35, 0.09, 0.02) leads to (0.147, 0.021, 0.007), and the best path is kept. Select the highest probability!]
Viterbi Algorithm – Overview
[Trellis over states S1 and S2 for times 2–4, starting from S0 with initial probabilities π1, π2 and transitions a11=0.7, a12=0.3, a21=0.5, a22=0.5.]
28
Viterbi Algorithm (Forward Calculations, t=2)
[Trellis from S0 (π1 = 1, π2 = 0) to states S1 and S2 at time 2, with a11=0.7, a12=0.3, a21=0.5, a22=0.5.]
δ1(1) = π1 = 1
δ2(1) = π2 = 0
δ1(2) = max{δ1(1) b13 a11, δ2(1) b23 a21} = 0.21
δ2(2) = max{δ1(1) b13 a12, δ2(1) b23 a22} = 0.09
ψ1(2) = 1
ψ2(2) = 1
29
Viterbi Algorithm (Backtracking, t=2)
[Same trellis as above; the back-pointers ψ1(2) = 1 and ψ2(2) = 1 record that state S1 is the best predecessor of both states at time 2.]
δ1(1) = π1 = 1, δ2(1) = π2 = 0
δ1(2) = max{δ1(1) b13 a11, δ2(1) b23 a21} = 0.21
δ2(2) = max{δ1(1) b13 a12, δ2(1) b23 a22} = 0.09
ψ1(2) = 1, ψ2(2) = 1
30
Viterbi Algorithm (Forward Calculations)
[Trellis extended through times 2–4; δ and ψ are computed at each step.]
31
Viterbi Algorithm (Backtracking)
[The back-pointers ψ are followed from the final time step to recover the best state sequence.]
32
Viterbi Algorithm (Forward Calculations, t=4)
[Trellis completed at time 4; the best final state is the one with the largest δ value.]
33
Viterbi Algorithm (Backtracking to Obtain Labeling)
[Starting from the best final state, the back-pointers are followed back to time 1, giving the state labeling of the observation sequence.]
34
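The Viterbi recursion and the backtracking step on the preceding slides can be sketched as follows; δ and ψ play the roles defined earlier, and the model and observations are the same illustrative assumptions used in the forward/backward sketches.

```python
# A minimal sketch of the Viterbi algorithm (Problem 2: decoding).
import numpy as np

def viterbi(pi, A, B, obs):
    """Return the most likely state sequence and its path probability."""
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))                   # delta[t, j]: best path probability ending in j at t
    psi = np.zeros((T, N), dtype=int)          # psi[t, j]: best predecessor of j at t
    delta[0, :] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1, :, None] * A     # scores[i, j] = delta_i(t-1) * a_ij
        psi[t, :] = scores.argmax(axis=0)      # remember the best previous state
        delta[t, :] = scores.max(axis=0) * B[:, obs[t]]
    # Backtracking: start from the best final state and follow psi.
    path = [int(delta[-1, :].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], delta[-1, :].max()

pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3], [0.5, 0.5]])
B = np.array([[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]])
obs = [2, 1, 0, 1]                             # illustrative symbol indices
states, prob = viterbi(pi, A, B, obs)
print(states, prob)                            # decoded state sequence and its probability
```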
Implementing HMM for Speech Modeling (Training and Recognition)
• Training: build the HMM speech models from the correspondence between the observation sequences Y and the state sequences S.
• Recognition: recognize speech using the stored HMM models Θ and the actual observation Y.
[Block diagram: speech samples S → feature extraction → Y → recognition → W*, with the HMM models Θ produced by training.]
35
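A hedged sketch of one Baum-Welch (forward-backward) re-estimation step for the toy discrete HMM; the real system would instead train word models on feature sequences from the Bangla speech database, and all values below are illustrative assumptions.

```python
# One Baum-Welch re-estimation step (Problem 3: training) for the toy HMM.
import numpy as np

def baum_welch_step(pi, A, B, obs):
    N, M = B.shape
    T = len(obs)
    # Forward and backward passes.
    alpha = np.zeros((T, N)); beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()
    # gamma[t, i]: probability of being in state i at time t.
    gamma = alpha * beta / likelihood
    # xi[t, i, j]: probability of the transition i -> j at time t.
    xi = (alpha[:-1, :, None] * A[None, :, :] *
          B[:, obs[1:]].T[:, None, :] * beta[1:, None, :]) / likelihood
    # Re-estimated parameters.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros((N, M))
    for k in range(M):
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B, likelihood

pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3], [0.5, 0.5]])
B = np.array([[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]])
obs = [2, 1, 0, 1]
for _ in range(5):                 # repeating the step increases P(O | model)
    pi, A, B, L = baum_welch_step(pi, A, B, obs)
    print(L)
```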
Recognition Process
• Let the input speech S = (s1, s2, ..., sT) be the sequence to be recognized.
• Let xt be the feature vector computed at time t, with the feature sequence from time 1 to t denoted X = (x1, x2, ..., xt).
• The recognized states S* are obtained by: S* = argmax_S P(S, X | Φ).
[Diagram: the dynamic structure (search algorithm) combines xt with the static structure Φ, evaluating P(xt, {st} | {st-1}, Φ) to produce S*.]
36
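For isolated words, the rule S* = argmax P(S, X | Φ) reduces to scoring the observation sequence against each stored word model and keeping the best. The vocabulary and model parameters below are hypothetical placeholders, not the actual 1,200-keyword models.

```python
# A minimal sketch of isolated-word recognition by maximum likelihood.
import numpy as np

def forward_likelihood(pi, A, B, obs):
    # Forward algorithm, keeping only the running alpha vector.
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

# Hypothetical stored models Theta: one (pi, A, B) triple per keyword.
models = {
    "word_1": (np.array([1.0, 0.0]),
               np.array([[0.7, 0.3], [0.5, 0.5]]),
               np.array([[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]])),
    "word_2": (np.array([0.5, 0.5]),
               np.array([[0.6, 0.4], [0.3, 0.7]]),
               np.array([[0.2, 0.5, 0.3], [0.4, 0.4, 0.2]])),
}

obs = [2, 1, 0, 1]   # quantised feature sequence for the spoken word (illustrative)
best_word = max(models, key=lambda w: forward_likelihood(*models[w], obs))
print(best_word)     # the recognised word W* = argmax_W P(O | model_W)
```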
Result (Speaker Recognition)
37
Table 1: Speaker recognition result
Result (Isolated SR)
38
Table 2: Result for isolated speech recognition.
Result (Continuous SR)
39
Table 3: Continuous Speech recognition result
Conclusions
• No speech recognizer to date achieves 100% accuracy.
• Avoid poor-quality microphones; consider using a better one.
• One important point is that further training of the recognizer provides an even better experience.
40
Thank You
41