Submitted by: Banzadio Salazaku (ASU2013010100016)
Submitted to: Mrs Kavita Jindal
Overview
Voice recognition is the process of automatically recognizing who is speaking on the
basis of individual information included in speech waves. This technique makes it
possible to use the speaker's voice to verify their identity and control access to services
such as voice dialing, banking by telephone, telephone shopping, database access
services, information services, voice mail, security control for confidential information
areas, and remote access to computers.
This document describes how to build a simple, yet complete and representative
automatic speaker recognition system. Such a speaker recognition system has potential
in many security applications. For example, users have to speak a PIN (Personal
Identification Number) in order to gain access to the laboratory door, or users have to
speak their credit card number over the telephone line to verify their identity. By
checking the voice characteristics of the input utterance, using an automatic speaker
recognition system similar to the one that we will describe, the system is able to add an
extra level of security.
1 Principles of Speaker Recognition
Speaker recognition can be classified into identification and verification. Speaker
identification is the process of determining which registered speaker provides a given
utterance. Speaker verification, on the other hand, is the process of accepting or rejecting
the identity claim of a speaker. Figure 1 shows the basic structures of speaker
identification and verification systems. The system that we will describe is classified as
a text-independent speaker identification system, since its task is to identify the person
who is speaking regardless of what is being said.
At the highest level, all speaker recognition systems contain two main modules (refer
to Figure 1): feature extraction and feature matching. Feature extraction is the process
that extracts a small amount of data from the voice signal that can later be used to
represent each speaker. Feature matching involves the actual procedure to identify the
unknown speaker by comparing extracted features from his/her voice input with the ones
from a set of known speakers. We will discuss each module in detail in later sections.
Figure 1. Basic structures of speaker recognition systems: (a) speaker identification;
(b) speaker verification
All speaker recognition systems have to serve two distinct phases. The first is
referred to as the enrolment or training phase, while the second is referred to as the
operational or testing phase. In the training phase, each registered speaker has to provide
samples of their speech so that the system can build or train a reference model for that
speaker. In the case of speaker verification systems, a speaker-specific threshold is also
computed from the training samples. In the testing phase, the input speech is matched
with the stored reference model(s) and a recognition decision is made.
Speaker recognition is a difficult task. Automatic speaker recognition works on
the premise that a person’s speech exhibits characteristics that are unique to the
speaker. However, this task is made difficult by the high variability of input speech
signals. The principal source of variance is the speaker himself/herself. Speech signals
in training and testing sessions can differ greatly for many reasons: people's voices
change with time, health conditions vary (e.g. the speaker has a cold), speaking rates
vary, and so on. There are also factors beyond speaker variability that challenge speaker
recognition technology, for example acoustical noise and variations in recording
environments (e.g. the speaker uses different telephone handsets).
2 Speech Feature Extraction
2.1 Introduction
The purpose of this module is to convert the speech waveform, using digital signal
processing (DSP) tools, to a set of features (at a considerably lower information rate) for
further analysis. This is often referred to as the signal-processing front end.
The speech signal is a slowly time-varying signal (it is called quasi-stationary). An
example of a speech signal is shown in Figure 2. When examined over a sufficiently short
period of time (between 5 and 100 msec), its characteristics are fairly stationary.
However, over longer periods of time (on the order of 1/5 second or more) the signal
characteristics change to reflect the different speech sounds being spoken. Therefore,
short-time spectral analysis is the most common way to characterize the speech signal.
Figure 2. Example of speech signal
A wide range of possibilities exist for parametrically representing the speech signal
for the speaker recognition task, such as Linear Prediction Coding (LPC), Mel-Frequency
Cepstrum Coefficients (MFCC), and others. MFCC is perhaps the best known and most
popular, and will be described in this paper.
MFCCs are based on the known variation of the human ear’s critical bandwidths
with frequency: filters spaced linearly at low frequencies and logarithmically at high
frequencies are used to capture the phonetically important characteristics of speech.
This is expressed in the mel-frequency scale, which is a linear frequency spacing
below 1000 Hz and a logarithmic spacing above 1000 Hz. The process of computing
MFCCs is described in more detail next.
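A commonly used analytic approximation of the mel scale (an assumption here; the text defines the scale only qualitatively) is mel(f) = 2595·log10(1 + f/700), which is roughly linear below 1 kHz and logarithmic above it. A small Python sketch:

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Convert frequency in Hz to mels (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse mapping, mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# The scale is roughly linear below 1000 Hz and logarithmic above it:
print(hz_to_mel(1000.0))                 # close to 1000 mels at 1 kHz
print(mel_to_hz(hz_to_mel(3500.0)))      # round-trips back to 3500 Hz
```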
2.2 Mel-frequency cepstrum coefficients processor
A block diagram of the structure of an MFCC processor is given in Figure 3. The
speech input is typically recorded at a sampling rate above 10000 Hz. This sampling
frequency is chosen to minimize the effects of aliasing in the analog-to-digital
conversion. Such sampled signals can capture all frequencies up to 5 kHz, which covers
most of the energy of sounds generated by humans. As discussed previously, the
main purpose of the MFCC processor is to mimic the behavior of the human ear. In
addition, MFCCs have been shown to be less susceptible to the variations mentioned
above than the raw speech waveforms themselves.
Figure 3. Block diagram of the MFCC processor
2.2.1 Frame Blocking
In this step the continuous speech signal is blocked into frames of N samples, with
adjacent frames separated by M samples (M < N). The first frame consists of the first N
samples. The second frame begins M samples after the first frame, and overlaps it by
N − M samples, and so on. This process continues until all the speech is accounted for
within one or more frames. Typical values are N = 256 (equivalent to ~30 msec of
windowing, and a convenient size for the radix-2 FFT) and M = 100.
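The blocking step can be sketched as follows (an illustrative Python version, not the project's Matlab code; `block_frames` is a hypothetical helper):

```python
def block_frames(signal, n=256, m=100):
    """Split a 1-D sample sequence into overlapping frames of N samples,
    with consecutive frames starting M samples apart (overlap = N - M)."""
    frames = []
    start = 0
    while start + n <= len(signal):
        frames.append(signal[start:start + n])
        start += m
    return frames

samples = list(range(1000))          # stand-in for 1000 speech samples
frames = block_frames(samples)
print(len(frames))                   # 8 frames: starts at 0, 100, ..., 700
print(frames[1][0], frames[0][100])  # frame 2 overlaps frame 1 by N - M = 156 samples
```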
2.2.2 Windowing
The next step in the processing is to window each individual frame so as to minimize
the signal discontinuities at the beginning and end of each frame. The concept here is to
minimize the spectral distortion by using the window to taper the signal to zero at the
beginning and end of each frame. If we define the window as w(n), 0 ≤ n ≤ N − 1,
where N is the number of samples in each frame, then the result of windowing is the
signal

    y_l(n) = x_l(n) w(n),  0 ≤ n ≤ N − 1

Typically the Hamming window is used, which has the form:

    w(n) = 0.54 − 0.46 cos(2πn / (N − 1)),  0 ≤ n ≤ N − 1
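The windowing step above can be sketched in Python (illustrative only; the project itself uses Matlab's hamming function):

```python
import math

def hamming(n_samples: int):
    """Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)), 0 <= n <= N-1."""
    N = n_samples
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]

w = hamming(256)
print(w[0], w[-1])   # tapers to 0.08 at both ends
print(max(w))        # peaks near 1.0 in the middle of the frame

# Windowing a frame is element-wise multiplication: y(n) = x(n) * w(n)
frame = [1.0] * 256
y = [x * wn for x, wn in zip(frame, w)]
```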
2.2.3 Fast Fourier Transform (FFT)
The next processing step is the Fast Fourier Transform, which converts each frame of
N samples from the time domain into the frequency domain. The FFT is a fast algorithm
to implement the Discrete Fourier Transform (DFT), which is defined on the set of N
samples {x_n}, as follows:

    X_k = Σ_{n=0}^{N−1} x_n e^{−j2πkn/N},  k = 0, 1, 2, ..., N − 1
In general the X_k are complex numbers and we only consider their absolute values
(frequency magnitudes). The resulting sequence {X_k} is interpreted as follows: positive
frequencies 0 ≤ f < Fs/2 correspond to values 0 ≤ n ≤ N/2 − 1, while negative
frequencies −Fs/2 < f < 0 correspond to N/2 + 1 ≤ n ≤ N − 1. Here, Fs denotes the
sampling frequency.
The result after this step is often referred to as the spectrum or periodogram.
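The DFT and its magnitude-spectrum interpretation can be sketched as follows (a pure-Python illustration with an assumed 8 kHz sampling rate; a real implementation would use an FFT library rather than this O(N²) loop):

```python
import cmath
import math

def dft(x):
    """Discrete Fourier Transform: X_k = sum_n x_n * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

fs, N = 8000, 64                                              # assumed values
x = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(N)]  # a 1 kHz tone
mags = [abs(X) for X in dft(x)[:N // 2]]   # keep bins 0..N/2-1 (0 <= f < Fs/2)

peak = mags.index(max(mags))
print(peak, peak * fs / N)                 # bin k maps to frequency k*Fs/N -> 8, 1000.0
```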
2.2.4 Mel-frequency Wrapping
As mentioned above, psychophysical studies have shown that human perception of
the frequency content of sounds for speech signals does not follow a linear scale. Thus
for each tone with an actual frequency, f, measured in Hz, a subjective pitch is measured
on a scale called the ‘mel’ scale. The mel-frequency scale is a linear frequency spacing
below 1000 Hz and a logarithmic spacing above 1000 Hz.
Figure 4. An example of mel-spaced filterbank
One approach to simulating the subjective spectrum is to use a filter bank, spaced
uniformly on the mel scale (see Figure 4). Each filter has a triangular bandpass
frequency response, and the spacing as well as the bandwidth is determined by a constant
mel-frequency interval. The number of mel spectrum coefficients, K, is typically chosen
as 20. Note that this filter bank is applied in the frequency domain, so it simply
amounts to applying the triangular windows shown in Figure 4 to the spectrum. A
useful way of thinking about this mel-wrapping filter bank is to view each filter as a
histogram bin (where bins overlap) in the frequency domain.
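The mel-wrapping step can be sketched as follows (an illustrative Python version; the project's supplied melfb function computes this kind of weight matrix in Matlab, and the 2595·log10 mel formula and the edge handling here are assumptions):

```python
import math

def hz_to_mel(f): return 2595.0 * math.log10(1.0 + f / 700.0)
def mel_to_hz(m): return 700.0 * (10 ** (m / 2595.0) - 1.0)

def mel_filterbank(k_filters, n_fft, fs):
    """K triangular filters with centres spaced uniformly on the mel scale.
    Returns a K x (n_fft//2) matrix of filter weights."""
    top = hz_to_mel(fs / 2)
    edges_mel = [i * top / (k_filters + 1) for i in range(k_filters + 2)]
    edge_bins = [int(mel_to_hz(m) * n_fft / fs) for m in edges_mel]
    bank = [[0.0] * (n_fft // 2) for _ in range(k_filters)]
    for j in range(k_filters):
        lo, mid, hi = edge_bins[j], edge_bins[j + 1], edge_bins[j + 2]
        for b in range(lo, mid):               # rising edge of the triangle
            bank[j][b] = (b - lo) / max(mid - lo, 1)
        for b in range(mid, hi):               # falling edge of the triangle
            bank[j][b] = (hi - b) / max(hi - mid, 1)
    return bank

# Each mel coefficient is a weighted sum over spectrum bins ("histogram bin"):
bank = mel_filterbank(20, 256, 8000)
power = [1.0] * 128                            # flat power spectrum stand-in
mel_spectrum = [sum(w * p for w, p in zip(f, power)) for f in bank]
print(len(mel_spectrum))                       # 20 mel spectrum coefficients
```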
2.2.5 Cepstrum
In this final step, we convert the log mel spectrum back to time. The result is called
the mel-frequency cepstrum coefficients (MFCC). The cepstral representation of the
speech spectrum provides a good representation of the local spectral properties of the
signal for the given frame analysis. Because the mel spectrum coefficients (and so their
logarithms) are real numbers, we can convert them to the time domain using the Discrete
Cosine Transform (DCT). Therefore, if we denote the mel power spectrum coefficients
that result from the last step as S̃_k, k = 1, 2, ..., K, we can calculate the MFCCs, c̃_n, as

    c̃_n = Σ_{k=1}^{K} (log S̃_k) cos[ n (k − 1/2) π / K ],  n = 0, 1, ..., K − 1

Note that we exclude the first component, c̃_0, from the DCT since it represents the
mean value of the input signal, which carries little speaker-specific information.
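The DCT step can be sketched directly from the formula above (illustrative Python; the stand-in log mel values are hypothetical):

```python
import math

def dct_mel_to_cepstrum(log_mel):
    """c_n = sum_{k=1..K} (log S_k) * cos(n * (k - 1/2) * pi / K), n = 0..K-1.
    log_mel holds the K log mel power spectrum coefficients log(S_k)."""
    K = len(log_mel)
    return [sum(log_mel[k] * math.cos(n * (k + 0.5) * math.pi / K)
                for k in range(K))
            for n in range(K)]

log_mel = [math.log(1.0 + k) for k in range(20)]   # hypothetical stand-in values
c = dct_mel_to_cepstrum(log_mel)
coeffs = c[1:]        # drop c_0: it reflects only the mean log energy
print(len(coeffs))    # 19 usable cepstral coefficients
```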
2.3 Summary
By applying the procedure described above, a set of mel-frequency cepstrum
coefficients is computed for each speech frame of around 30 msec (with overlap). This
set is the result of a cosine transform of the logarithm of the short-term power spectrum
expressed on a mel-frequency scale, and is called an acoustic vector. Each input
utterance is therefore transformed into a sequence of acoustic vectors. In the next
section we will see how these acoustic vectors can be used to represent and recognize
the voice characteristics of the speaker.
3 Feature Matching
3.1 Overview
The problem of speaker recognition belongs to a much broader topic in science and
engineering known as pattern recognition. The goal of pattern recognition is to classify
objects of interest into one of a number of categories or classes. The objects of interest
are generically called patterns, and in our case are sequences of acoustic vectors
extracted from input speech using the techniques described in the previous section.
The classes here refer to individual speakers. Since the classification procedure in our
case is applied to extracted features, it can also be referred to as feature matching.
Furthermore, if there exists a set of patterns whose individual classes are already
known, then one has a problem in supervised pattern recognition. These patterns
comprise the training set and are used to derive a classification algorithm. The
remaining patterns are then used to test the classification algorithm; these patterns are
collectively referred to as the test set. If the correct classes of the individual patterns in
the test set are also known, then one can evaluate the performance of the algorithm.
State-of-the-art feature matching techniques used in speaker recognition include
Dynamic Time Warping (DTW), Hidden Markov Modeling (HMM), and Vector
Quantization (VQ). In this project, the VQ approach is used, due to its ease of
implementation and high accuracy. VQ is a process of mapping vectors from a large
vector space to a finite number of regions in that space. Each region is called a cluster
and can be represented by its center, called a codeword. The collection of all codewords
is called a codebook.
Figure 5 shows a conceptual diagram illustrating this recognition process. In the
figure, only two speakers and two dimensions of the acoustic space are shown. The
circles refer to the acoustic vectors from speaker 1, while the triangles are from
speaker 2. In the training phase, using the clustering algorithm described in Section 3.2,
a speaker-specific VQ codebook is generated for each known speaker by clustering
his/her training acoustic vectors. The resulting codewords (centroids) are shown in
Figure 5 as black circles and black triangles for speakers 1 and 2, respectively. The
distance from a vector to the closest codeword of a codebook is called the VQ distortion.
In the recognition phase, an input utterance of an unknown voice is “vector-quantized”
using each trained codebook and the total VQ distortion is computed. The speaker
corresponding to the VQ codebook with the smallest total distortion is identified as the
speaker of the input utterance.
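The distortion-based decision rule can be sketched as follows (illustrative Python with hypothetical 2-D codebooks, mirroring Figure 5):

```python
import math

def vq_distortion(vectors, codebook):
    """Average distance from each vector to its nearest codeword."""
    total = 0.0
    for v in vectors:
        total += min(math.dist(v, c) for c in codebook)
    return total / len(vectors)

def identify(test_vectors, codebooks):
    """Return the speaker whose codebook gives the smallest distortion."""
    return min(codebooks,
               key=lambda spk: vq_distortion(test_vectors, codebooks[spk]))

# Toy 2-D acoustic space, as in Figure 5 (hypothetical values)
codebooks = {
    "speaker1": [(0.0, 0.0), (1.0, 1.0)],
    "speaker2": [(5.0, 5.0), (6.0, 6.0)],
}
test = [(0.2, 0.1), (0.9, 1.1)]
print(identify(test, codebooks))   # -> speaker1
```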
Figure 5. Conceptual diagram illustrating vector quantization codebook formation.
One speaker can be discriminated from another based on the locations of the centroids.
3.2 Clustering the Training Vectors
After the enrolment session, the acoustic vectors extracted from the input speech of
each speaker provide a set of training vectors for that speaker. As described above, the
next important step is to build a speaker-specific VQ codebook for each speaker using
those training vectors. There is a well-known algorithm, namely the LBG algorithm
[Linde, Buzo and Gray, 1980], for clustering a set of L training vectors into a set of M
codebook vectors. The algorithm is formally implemented by the following recursive
procedure:
1. Design a 1-vector codebook; this is the centroid of the entire set of training vectors
(hence, no iteration is required here).
2. Double the size of the codebook by splitting each current codeword y_n according to
the rule

    y_n⁺ = y_n (1 + ε)
    y_n⁻ = y_n (1 − ε)

where n varies from 1 to the current size of the codebook, and ε is a splitting
parameter (we choose ε = 0.01).
3. Nearest-Neighbor Search: for each training vector, find the codeword in the current
codebook that is closest (in terms of similarity measurement), and assign that vector
to the corresponding cell (associated with the closest codeword).
4. Centroid Update: update the codeword in each cell using the centroid of the training
vectors assigned to that cell.
5. Iteration 1: repeat steps 3 and 4 until the average distance falls below a preset
threshold.
6. Iteration 2: repeat steps 2, 3 and 4 until a codebook size of M is designed.
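The six steps above can be sketched as follows (an illustrative Python version of LBG under assumed Euclidean distance and ε = 0.01; the project's vqlbg function plays this role in Matlab):

```python
import math

def nearest(v, codebook):
    """Index of the codeword closest to vector v."""
    return min(range(len(codebook)), key=lambda i: math.dist(v, codebook[i]))

def lbg(vectors, m_codewords, eps=0.01, tol=1e-4):
    """LBG clustering: start from the global centroid, then repeatedly split
    every codeword by (1 +/- eps) and refine with nearest-neighbour search and
    centroid updates until M codewords are reached."""
    dim = len(vectors[0])
    centroid = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    codebook = [centroid]                         # step 1: 1-vector codebook
    while len(codebook) < m_codewords:
        # step 2: split each codeword
        codebook = [[x * (1 + s) for x in c] for c in codebook for s in (eps, -eps)]
        prev_d = float("inf")
        while True:
            cells = [[] for _ in codebook]        # step 3: nearest-neighbour search
            for v in vectors:
                cells[nearest(v, codebook)].append(v)
            for i, cell in enumerate(cells):      # step 4: centroid update
                if cell:
                    codebook[i] = [sum(v[d] for v in cell) / len(cell)
                                   for d in range(dim)]
            d = sum(math.dist(v, codebook[nearest(v, codebook)])
                    for v in vectors) / len(vectors)
            if (prev_d - d) / max(d, 1e-12) < tol:   # step 5: converged?
                break
            prev_d = d
    return codebook                                # step 6: M codewords reached

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(sorted(lbg(pts, 2)))   # two centroids, near (0, 0.5) and (10, 10.5)
```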
Intuitively, the LBG algorithm designs an M-vector codebook in stages. It starts first
by designing a 1-vector codebook, then uses a splitting technique on the codewords to
initialize the search for a 2-vector codebook, and continues the splitting process until the
desired M-vector codebook is obtained.
Figure 6 shows, in a flow diagram, the detailed steps of the LBG algorithm. “Cluster
vectors” is the nearest-neighbor search procedure which assigns each training vector to a
cluster associated with the closest codeword. “Find centroids” is the centroid update
procedure. “Compute D (distortion)” sums the distances of all training vectors in the
nearest-neighbor search so as to determine whether the procedure has converged.
Figure 6. Flow diagram of the LBG algorithm
4 Project
As stated before, in this project we will experiment with building and testing an
automatic speaker recognition system. In order to build such a system, one has to go
through the steps described in the previous sections. The most convenient platform for
this is the Matlab environment, since many of the above tasks are already implemented
in Matlab. The project Web page given at the beginning provides a test database and
several helper functions to ease the development process. We supply two utility
functions, melfb and disteu, and two main functions, train and test. Download all of
these files from the project Web page into your working folder. The first two files can
be treated as black boxes, but the latter two need to be thoroughly understood. In fact,
your tasks are to write two missing functions, mfcc and vqlbg, which will be called
from the given main functions. In order to accomplish that, follow each step in this
section carefully and check your understanding by answering all the questions.
4.1 Speech Data
Download the ZIP file of the speech database from the project Web page. After
unzipping the file, you will find two folders, TRAIN and TEST, each containing 8
files named S1.WAV, S2.WAV, …, S8.WAV; each is labeled after the ID of the
speaker. These files were recorded in Microsoft WAV format. On Windows systems,
you can listen to the recorded sounds by double-clicking on the files.
Our goal is to train a voice model (or, more specifically, a VQ codebook in the MFCC
vector space) for each speaker S1 - S8 using the corresponding sound file in the TRAIN
folder. After this training step, the system has knowledge of the voice characteristics
of each (known) speaker. Next, in the testing phase, the system will be able to identify
the (assumed unknown) speaker of each sound file in the TEST folder.
4.2 Speech Processing
In this phase you are required to write a Matlab function that reads a sound file and
turns it into a sequence of MFCCs (acoustic vectors) using the speech processing steps
described previously. Many of these tasks are already provided by either standard or
supplied Matlab functions. The Matlab functions you will need are wavread, hamming,
fft, dct, and melfb (supplied). Type help function_name at the Matlab prompt for more
information about these functions.
4.3 Vector Quantization
The result of the last section is that speech signals are transformed into vectors in an
acoustic space. In this section, we will apply the VQ-based pattern recognition technique
to build speaker reference models from those vectors in the training phase, and then to
identify any sequence of acoustic vectors uttered by an unknown speaker.
4.4 Simulation and Evaluation
Now for the final part! Use the two supplied programs, train and test (which require
the two functions mfcc and vqlbg that you have just completed), to simulate the training
and testing procedures of the speaker recognition system, respectively.
REFERENCES
[1] L.R. Rabiner and B.H. Juang, Fundamentals of Speech Recognition, Prentice-Hall,
Englewood Cliffs, N.J., 1993.
[2] L.R. Rabiner and R.W. Schafer, Digital Processing of Speech Signals, Prentice-Hall,
Englewood Cliffs, N.J., 1978.
[3] S.B. Davis and P. Mermelstein, “Comparison of parametric representations for
monosyllabic word recognition in continuously spoken sentences”, IEEE
Transactions on Acoustics, Speech, Signal Processing, Vol. ASSP-28, No. 4,
August 1980.
PROGRAM
%% Project: Voice Recognition and Identification system
% By bukasa tshibangu, banzadio salazaku , mutumba maliro
%--------------------------------------------------------------------------
disp('Project: Voice Recognition and Identification system');
disp('By bukasa tshibangu & banzadio salazaku & mutumba maliro ');
disp(' ');
pause(0.5);
disp('LOADING ');
pause(1);
disp('... ');
pause(1);
disp('... ');
pause(1);
disp('... ');
pause(1);
disp('... ');
% Preallocate cell arrays (cell(1,8) rather than {8}, which creates a 1x1 cell)
str = cell(1,8); fstr = cell(1,8); nbtr = cell(1,8);
ste = cell(1,8); fste = cell(1,8); nbte = cell(1,8);
ctr = cell(1,8); dtr = cell(1,8);
cte = cell(1,8); dte = cell(1,8);
code = cell(1,8);
for i = 1:8
% Read audio data from train folder for performing operations
st=strcat('trains',num2str(i),'.wav');
[s1 fs1 nb1]=wavread(st);
str{i} = s1; fstr{i} = fs1; nbtr{i} = nb1;
% Read audio data from test folder for performing operations
st = strcat('tests',num2str(i),'.wav');
[st1 fst1 nbt1] = wavread(st);
ste{i} = st1; fste{i} = fst1; nbte{i} = nbt1;
    % Compute MFCCs of the training audio (TRAIN folder)
    ctr{i} = mfcc(str{i},fstr{i});
    % Compute MFCCs of the test audio (TEST folder)
    cte{i} = mfcc(ste{i},fste{i});
% Compute Vector Quantization of the audio data to be used in Speech
% Processing for Train Folder
dtr{i} = vqlbg(ctr{i},16);
% Compute Vector Quantization of the audio data to be used in Speech
% Processing for Test Folder
dte{i} = vqlbg(cte{i},16);
end
% For making a choice
ch=0;
poss=11;
while ch~=poss
    ch=menu('Speaker Recognition System','1: Human speaker recognition',...
        '2: Technical data of samples',...
        '3: Power Spectrum','4: Power Spectrum with different M and N',...
        '5: Mel-Spaced Filter Bank',...
        '6: Spectrum before and after Mel-Frequency wrapping',...
        '7: 2D plot of acoustic vectors',...
        '8: Plot of VQ codewords','9: Recognition rate of the computer',...
        '10: Test with other speech files','11: Exit');
    disp(' ');
    %----------------------------------------------------------------------
    %% 1: Human speaker recognition
    if ch==1
        disp('> 1: Human speaker recognition');
        disp('Play each sound file in the TRAIN folder.');
        disp('Can you distinguish the voices of those eight speakers?');
        disp('Now play each sound in the TEST folder in a random order without looking at the file name');
        disp('and try to identify the speaker using your knowledge of their voices that you have just heard');
        disp('from the TRAIN folder. This is exactly what the computer will do in our system.');
        disp(' ');
        disp(' ');
        disp('All of us seem to be unable to recognise random people just by listening to their voice.');
        disp('We also realize that we do not identify speakers by the frequencies with which they talk,');
        disp('but rather by other characteristics, like accent, speed, etc.');
        pause(1);
        ch2=0;
        while ch2~=4
            ch2=menu('Select Folder','Train','Test','User','Exit');
            if ch2==1
                ch3=0;
                while ch3~=9
                    ch3=menu('Train :','Signal 1','Signal 2','Signal 3',...
                        'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit');
                    if ch3~=9
                        p=audioplayer(str{ch3},fstr{ch3},nbtr{ch3});
                        play(p);
                    end
                end
            end
            if ch2==2
                ch3=0;
                while ch3~=9
                    ch3=menu('Test :','Signal 1','Signal 2','Signal 3',...
                        'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit');
                    if ch3~=9
                        p=audioplayer(ste{ch3},fste{ch3},nbte{ch3});
                        play(p);
                    end
                end
close all;
end
            if ch2==3
                if (exist('sound_database.dat','file')==2)
                    load('sound_database.dat','-mat');
                    ch32=0;
                    while ch32~=2
                        ch32=menu('Database Information','Database','Exit');
                        if ch32==1
                            st=strcat('Sound Database has : #',num2str(sound_number),' words. Enter a database number : #');
                            prompt = {st};
                            dlg_title = 'Database Information';
                            num_lines = 1;
                            def = {'1'};
                            options.Resize='on';
                            options.WindowStyle='normal';
                            options.Interpreter='tex';
                            an = inputdlg(prompt,dlg_title,num_lines,def);
                            an = cell2mat(an);
                            a = str2double(an);
                            if (~isempty(an))
                                if (a <= sound_number)
                                    st=strcat('u',num2str(an));
                                    [s fs nb]=wavread(st);
                                    p=audioplayer(s,fs,nb);
                                    play(p);
                                else
                                    warndlg('Invalid Word ','Warning');
                                end
                            end
                        end
                    end
                    close all;
                else
                    warndlg('Database is empty.',' Warning ')
                end
            end
        end
    end
    %----------------------------------------------------------------------
%% 2: Technical data of samples
if ch==2
disp('> 2: Technical data of samples');
ch23=0;
while ch23~=4
ch23=menu('Select Folder','Train','Test','User','Exit');
if ch23==1
poss2=9;
ch2=0;
while ch2~=poss2
                ch2=menu('Technical data of samples for :','Signal 1','Signal 2','Signal 3',...
                    'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit');
                if ch2~=9
                    t = 0:1/fstr{ch2}:(length(str{ch2})-1)/fstr{ch2};
                    plot(t, str{ch2}), axis([0, (length(str{ch2})-1)/fstr{ch2} -0.4 0.5]);
                    st=sprintf('Plot of signal s%d.wav',ch2);
                    title(st);
                    xlabel('Time [s]');
                    ylabel('Amplitude (normalized)')
                end
            end
            close all
end
if ch23==2
poss2=9;
ch2=0;
while ch2~=poss2
                ch2=menu('Technical data of samples for :','Signal 1','Signal 2','Signal 3',...
                    'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit');
                if ch2~=9
                    t = 0:1/fste{ch2}:(length(ste{ch2})-1)/fste{ch2};
                    plot(t, ste{ch2}), axis([0, (length(ste{ch2})-1)/fste{ch2} -0.4 0.5]);
                    st=sprintf('Plot of signal s%d.wav',ch2);
                    title(st);
                    xlabel('Time [s]');
                    ylabel('Amplitude (normalized)')
                end
            end
            close all
end
if ch23==3
if (exist('sound_database.dat','file')==2)
load('sound_database.dat','-mat');
ch32=0;
while ch32 ~=2
                    ch32=menu('Database Information','Database','Exit');
                    if ch32==1
                        st=strcat('Sound Database has : #',num2str(sound_number),' words. Enter a database number : #');
                        prompt = {st};
                        dlg_title = 'Database Information';
                        num_lines = 1;
                        def = {'1'};
                        options.Resize='on';
                        options.WindowStyle='normal';
                        options.Interpreter='tex';
                        an = inputdlg(prompt,dlg_title,num_lines,def);
                        an = cell2mat(an);
                        a = str2double(an);
                        if (~isempty(an))
                            if (a <= sound_number)
                                st=strcat('u',num2str(an));
                                [s fs]=wavread(st);
                                t = 0:1/fs:(length(s)-1)/fs;
                                plot(t, s), axis([0, (length(s)-1)/fs -0.4 0.5]);
                                st=sprintf('Plot of signal %s',st);
                                title(st);
                                xlabel('Time [s]');
                                ylabel('Amplitude (normalized)')
                            else
                                warndlg('Invalid Word ','Warning');
                            end
                        end
                    end
                end
close all;
else
warndlg('Database is empty.',' Warning ')
end
end
end
end
    %----------------------------------------------------------------------
%% 3: linear and logarithmic power spectrum plot
if ch==3
M = 100;
N = 256;
disp('> 3: Power Spectrum Plot');
disp(' ');
disp('>Linear and Logarithmic spectrum plot');
ch23=0;
while ch23~=4
ch23=menu('Select Folder','Train','Test','User','Exit');
if ch23==1
poss3=9;
ch3=0;
while ch3~=poss3
                ch3=menu('Linear and Logarithmic Power Spectrum Plot for : ','Signal 1','Signal 2','Signal 3',...
                    'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit');
                if ch3~=9
                    % Linear power spectrum
                    frames = blockFrames(str{ch3}, fstr{ch3}, M, N);
                    t = N / 2;
                    tm = length(str{ch3}) / fstr{ch3};
                    subplot(121);
                    imagesc([0 tm], [0 fstr{ch3}/2], abs(frames(1:t, :)).^2), axis xy;
                    title('Power Spectrum (M = 100, N = 256)');
                    xlabel('Time [s]');
                    ylabel('Frequency [Hz]');
                    colorbar;
                    % Logarithmic power spectrum
                    subplot(122);
                    imagesc([0 tm], [0 fstr{ch3}/2], 20*log10(abs(frames(1:t, :)).^2)), axis xy;
                    title('Logarithmic Power Spectrum (M = 100, N = 256)');
                    xlabel('Time [s]');
                    ylabel('Frequency [Hz]');
                    colorbar;
                    % D=get(gcf,'Position');
                    % set(gcf,'Position',round([D(1)*.5 D(2)*.5 D(3)*2 D(4)*1.3]))
                end
            end
close all
end
if ch23==2
poss3=9;
ch3=0;
while ch3~=poss3
                ch3=menu('Linear and Logarithmic Power Spectrum Plot for : ','Signal 1','Signal 2','Signal 3',...
                    'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit');
                if ch3~=9
                    % Linear power spectrum
                    frames = blockFrames(ste{ch3}, fste{ch3}, M, N);
                    t = N / 2;
                    tm = length(ste{ch3}) / fste{ch3};
                    subplot(121);
                    imagesc([0 tm], [0 fste{ch3}/2], abs(frames(1:t, :)).^2), axis xy;
                    title('Power Spectrum (M = 100, N = 256)');
                    xlabel('Time [s]');
                    ylabel('Frequency [Hz]');
                    colorbar;
                    % Logarithmic power spectrum
                    subplot(122);
                    imagesc([0 tm], [0 fste{ch3}/2], 20*log10(abs(frames(1:t, :)).^2)), axis xy;
                    title('Logarithmic Power Spectrum (M = 100, N = 256)');
                    xlabel('Time [s]');
                    ylabel('Frequency [Hz]');
                    colorbar;
                    % D=get(gcf,'Position');
                    % set(gcf,'Position',round([D(1)*.5 D(2)*.5 D(3)*2 D(4)*1.3]))
                end
            end
close all;
end
if ch23==3
if (exist('sound_database.dat','file')==2)
load('sound_database.dat','-mat');
ch32=0;
while ch32 ~=2
                    ch32=menu('Database Information','Database','Exit');
                    if ch32==1
                        st=strcat('Sound Database has : #',num2str(sound_number),' words. Enter a database number : #');
                        prompt = {st};
                        dlg_title = 'Database Information';
                        num_lines = 1;
                        def = {'1'};
                        options.Resize='on';
                        options.WindowStyle='normal';
                        options.Interpreter='tex';
                        an = inputdlg(prompt,dlg_title,num_lines,def);
                        an = cell2mat(an);
                        a = str2double(an);
                        if (~isempty(an))
                            if (a <= sound_number)
                                st=strcat('u',num2str(an));
                                [s fs]=wavread(st);
                                frames = blockFrames(s, fs, M, N);
                                t = N / 2;
                                tm = length(s) / fs;
                                subplot(121);
                                imagesc([0 tm], [0 fs/2], abs(frames(1:t, :)).^2), axis xy;
                                title('Power Spectrum (M = 100, N = 256)');
                                xlabel('Time [s]');
                                ylabel('Frequency [Hz]');
                                colorbar;
                                % Logarithmic power spectrum
                                subplot(122);
                                imagesc([0 tm], [0 fs/2], 20*log10(abs(frames(1:t, :)).^2)), axis xy;
                                title('Logarithmic Power Spectrum (M = 100, N = 256)');
                                xlabel('Time [s]');
                                ylabel('Frequency [Hz]');
                                colorbar;
                            else
                                warndlg('Invalid Word ','Warning');
                            end
                        end
                    end
                end
close all;
else
warndlg('Database is empty.',' Warning ')
end
end
end
end
    %----------------------------------------------------------------------
%% 4: Plots for different values for N
if ch==4
disp('> 4: Plots for different values for M and N');
lN = [128 256 512];
ch23=0;
while ch23~=4
ch23=menu('Select Folder','Train','Test','User','Exit');
if ch23==1
poss3=9;
ch3=0;
while ch3~=poss3
                ch3=menu('Plots for different values of M and N for :','Signal 1','Signal 2','Signal 3',...
                    'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit');
                if ch3~=9
                    u=220;
                    for i = 1:length(lN)
                        N = lN(i);
                        M = round(N / 3);
                        frames = blockFrames(str{ch3}, fstr{ch3}, M, N);
                        t = N / 2;
                        tm = length(str{ch3}) / fstr{ch3};
                        temp = size(frames);
                        nbframes = temp(2);
                        u=u+1;
                        subplot(u)
                        imagesc([0 tm], [0 fstr{ch3}/2], 20*log10(abs(frames(1:t, :)).^2)), axis xy;
                        title(sprintf('Power Spectrum (M = %i, N = %i, frames = %i)', M, N, nbframes));
                        xlabel('Time [s]');
                        ylabel('Frequency [Hz]');
                        colorbar
                    end
                    % D=get(gcf,'Position');
                    % set(gcf,'Position',round([D(1)*.5 D(2)*.5 D(3)*1.5 D(4)*1.5]))
                end
            end
close all
end
if ch23==2
poss3=9;
ch3=0;
while ch3~=poss3
                ch3=menu('Plots for different values of M and N for :','Signal 1','Signal 2','Signal 3',...
                    'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit');
                if ch3~=9
                    u=220;
                    for i = 1:length(lN)
                        N = lN(i);
                        M = round(N / 3);
                        frames = blockFrames(ste{ch3}, fste{ch3}, M, N);
                        t = N / 2;
                        tm = length(ste{ch3}) / fste{ch3};
                        temp = size(frames);
                        nbframes = temp(2);
                        u=u+1;
                        subplot(u)
                        imagesc([0 tm], [0 fste{ch3}/2], 20*log10(abs(frames(1:t, :)).^2)), axis xy;
                        title(sprintf('Power Spectrum (M = %i, N = %i, frames = %i)', M, N, nbframes));
                        xlabel('Time [s]');
                        ylabel('Frequency [Hz]');
                        colorbar
                    end
                    % D=get(gcf,'Position');
                    % set(gcf,'Position',round([D(1)*.5 D(2)*.5 D(3)*1.5 D(4)*1.5]))
                end
end
close all;
end
if ch23==3
if (exist('sound_database.dat','file')==2)
load('sound_database.dat','-mat');
ch32=0;
while ch32 ~=2
                    ch32=menu('Database Information','Database','Exit');
                    if ch32==1
                        st=strcat('Sound Database has : #',num2str(sound_number),' words. Enter a database number : #');
                        prompt = {st};
                        dlg_title = 'Database Information';
                        num_lines = 1;
                        def = {'1'};
                        options.Resize='on';
                        options.WindowStyle='normal';
                        options.Interpreter='tex';
                        an = inputdlg(prompt,dlg_title,num_lines,def);
                        an = cell2mat(an);
                        a = str2double(an);
                        if (~isempty(an))
                            if (a <= sound_number)
                                st=strcat('u',num2str(an));
                                [s fs]=wavread(st);
                                u=220;
                                for i = 1:length(lN)
                                    N = lN(i);
                                    M = round(N / 3);
                                    frames = blockFrames(s, fs, M, N);
                                    t = N / 2;
                                    tm = length(s) / fs;
                                    temp = size(frames);
                                    nbframes = temp(2);
                                    u=u+1;
                                    subplot(u)
                                    imagesc([0 tm], [0 fs/2], 20*log10(abs(frames(1:t, :)).^2)), axis xy;
                                    title(sprintf('Power Spectrum (M = %i, N = %i, frames = %i)', M, N, nbframes));
                                    xlabel('Time [s]');
                                    ylabel('Frequency [Hz]');
                                    colorbar
                                end
                            else
                                warndlg('Invalid Word ','Warning');
                            end
                        end
                    end
end
close all;
else
24
warndlg('Database is empty.',' Warning ')
end
end
end
end
%----------------------------------------------------------------------
%% 5: Mel Space
if ch==5
    disp('> 5: Mel Space');
    disp(' ');
    disp('The Mel space is a function of the sampling rate; since all signals');
    disp('are recorded at the same sampling rate, they share the same Mel space.');
    ch23=0;
    while ch23~=4
        ch23=menu('Select Folder','Train','Test','User','Exit');
        if ch23==1
            poss3=9;
            ch3=0;
            while ch3~=poss3
                ch3=menu('Mel Space for :','Signal 1','Signal 2','Signal 3',...
                    'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit');
                if ch3~=9
                    plot(linspace(0, (fstr{ch3}/2), 129), (melfb(20, 256, fstr{ch3})));
                    title('Mel-Spaced Filterbank');
                    xlabel('Frequency [Hz]');
                end
            end
            close all
        end
        if ch23==2
            poss3=9;
            ch3=0;
            while ch3~=poss3
                ch3=menu('Mel Space for :','Signal 1','Signal 2','Signal 3',...
                    'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit');
                if ch3~=9
                    plot(linspace(0, (fste{ch3}/2), 129), (melfb(20, 256, fste{ch3})));
                    title('Mel-Spaced Filterbank');
                    xlabel('Frequency [Hz]');
                end
            end
            close all;
        end
        if ch23==3
            if (exist('sound_database.dat','file')==2)
                load('sound_database.dat','-mat');
                ch32=0;
                while ch32 ~=2
                    ch32=menu('Database Information','Database','Exit');
                    if ch32==1
                        st=strcat('Sound Database has : #',num2str(sound_number),' words. Enter a database number : #');
                        prompt = {st};
                        dlg_title = 'Database Information';
                        num_lines = 1;
                        def = {'1'};
                        options.Resize='on';
                        options.WindowStyle='normal';
                        options.Interpreter='tex';
                        an = inputdlg(prompt,dlg_title,num_lines,def);
                        an=cell2mat(an);
                        a=str2double(an);
                        if (~isempty(an))
                            if (a <= sound_number)
                                st=strcat('u',num2str(an));
                                [s fs]=wavread(st);
                                plot(linspace(0, (fs/2), 129), (melfb(20, 256, fs)));
                                title('Mel-Spaced Filterbank');
                                xlabel('Frequency [Hz]');
                            else
                                warndlg('Invalid Word ','Warning');
                            end
                        end
                    end
                end
                close all;
            else
                warndlg('Database is empty.',' Warning ')
            end
        end
end
end
%----------------------------------------------------------------------
%% 6: Modified spectrum
if ch==6
    disp('> 6: Modified spectrum');
    disp(' ');
    disp('Spectrum before and after Mel-frequency wrapping');
    M = 100;
    N = 256;
    n2 = 1 + floor(N / 2);
    ch23=0;
    while ch23~=4
        ch23=menu('Select Folder','Train','Test','User','Exit');
        if ch23==1
            poss3=9;
            ch3=0;
            while ch3~=poss3
                ch3=menu('Modified spectrum for :','Signal 1','Signal 2','Signal 3',...
                    'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit');
                if ch3~=9
                    frames = blockFrames(str{ch3}, fstr{ch3}, M, N);
                    m = melfb(20, N, fstr{ch3});
                    z = m * abs(frames(1:n2, :)).^2;
                    tm = length(str{ch3}) / fstr{ch3};
                    subplot(121)
                    imagesc([0 tm], [0 fstr{ch3}/2], abs(frames(1:n2, :)).^2), axis xy;
                    title('Power Spectrum unmodified');
                    xlabel('Time [s]');
                    ylabel('Frequency [Hz]');
                    colorbar;
                    subplot(122)
                    imagesc([0 tm], [0 20], z), axis xy;
                    title('Power Spectrum modified through Mel Cepstrum filter');
                    xlabel('Time [s]');
                    ylabel('Number of Filter in Filter Bank');
                    % colorbar;D=get(gcf,'Position');
                    % set(gcf,'Position',[0 D(2) D(3)/2 D(4)])
                end
            end
            close all
        end
        if ch23==2
            poss3=9;
            ch3=0;
            while ch3~=poss3
                ch3=menu('Modified spectrum for :','Signal 1','Signal 2','Signal 3',...
                    'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit');
                if ch3~=9
                    % Test branch uses the test signals (ste/fste), not the train ones
                    frames = blockFrames(ste{ch3}, fste{ch3}, M, N);
                    m = melfb(20, N, fste{ch3});
                    z = m * abs(frames(1:n2, :)).^2;
                    tm = length(ste{ch3}) / fste{ch3};
                    subplot(121)
                    imagesc([0 tm], [0 fste{ch3}/2], abs(frames(1:n2, :)).^2), axis xy;
                    title('Power Spectrum unmodified');
                    xlabel('Time [s]');
                    ylabel('Frequency [Hz]');
                    colorbar;
                    subplot(122)
                    imagesc([0 tm], [0 20], z), axis xy;
                    title('Power Spectrum modified through Mel Cepstrum filter');
                    xlabel('Time [s]');
                    ylabel('Number of Filter in Filter Bank');
                    % colorbar;D=get(gcf,'Position');
                    % set(gcf,'Position',[0 D(2) D(3)/2 D(4)])
                end
            end
            close all;
        end
        if ch23==3
            if (exist('sound_database.dat','file')==2)
                load('sound_database.dat','-mat');
                ch32=0;
                while ch32 ~=2
                    ch32=menu('Database Information','Database','Exit');
                    if ch32==1
                        st=strcat('Sound Database has : #',num2str(sound_number),' words. Enter a database number : #');
                        prompt = {st};
                        dlg_title = 'Database Information';
                        num_lines = 1;
                        def = {'1'};
                        options.Resize='on';
                        options.WindowStyle='normal';
                        options.Interpreter='tex';
                        an = inputdlg(prompt,dlg_title,num_lines,def);
                        an=cell2mat(an);
                        a=str2double(an);
                        if (~isempty(an))
                            if (a <= sound_number)
                                st=strcat('u',num2str(an));
                                [s fs]=wavread(st);
                                frames = blockFrames(s, fs, M, N);
                                m = melfb(20, N, fs);
                                z = m * abs(frames(1:n2, :)).^2;
                                tm = length(s) / fs;
                                subplot(121)
                                imagesc([0 tm], [0 fs/2], abs(frames(1:n2, :)).^2), axis xy;
                                title('Power Spectrum unmodified');
                                xlabel('Time [s]');
                                ylabel('Frequency [Hz]');
                                colorbar;
                                subplot(122)
                                imagesc([0 tm], [0 20], z), axis xy;
                                title('Power Spectrum modified through Mel Cepstrum filter');
                                xlabel('Time [s]');
                                ylabel('Number of Filter in Filter Bank');
                                colorbar;
                            else
                                warndlg('Invalid Word ','Warning');
                            end
                        end
                    end
                end
                close all;
            else
                warndlg('Database is empty.',' Warning ')
            end
        end
end
end
%----------------------------------------------------------------------
%% 7: 2D plot of acoustic vectors
if ch==7
    disp('> 7: 2D plot of acoustic vectors');
    ch23=0;
    while ch23~=4
        ch23=menu('Select Folder','Train','Test','User','Exit');
        if ch23==1
            poss3=3;
            ch3=0;
            while ch3~=poss3
                ch3=menu('2D plot of acoustic vectors representation : ','1. One Signal',...
                    '2. Two Signals','3. Exit');
                if ch3==1
                    ch31=0;
                    while ch31~=9
                        ch31=menu('2D plot of acoustic vectors for :','Signal 1','Signal 2','Signal 3',...
                            'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit');
                        if ch31~=9
                            plot(ctr{ch31}(5, :), ctr{ch31}(6, :), 'or');
                            xlabel('5th Dimension');
                            ylabel('6th Dimension');
                            st=sprintf('Signal %d ',ch31);
                            legend(st);
                            title('2D plot of acoustic vectors');
                        end
                    end
                    close all;
                end
                if ch3==2
                    ch32=0;
                    while ch32~=8
                        ch32=menu('2D plot of acoustic vectors for :','Signal 1 & Signal 2',...
                            'Signal 2 & Signal 3','Signal 3 & Signal 4','Signal 4 & Signal 5',...
                            'Signal 5 & Signal 6','Signal 6 & Signal 7','Signal 7 & Signal 8','Exit');
                        if ch32~=8
                            plot(ctr{ch32}(5, :), ctr{ch32}(6, :), 'or');
                            hold on;
                            plot(ctr{ch32+1}(5, :), ctr{ch32+1}(6, :), 'xb');
                            xlabel('5th Dimension');
                            ylabel('6th Dimension');
                            st=sprintf('Signal %d,',ch32);
                            st1=sprintf('Signal %d', (ch32+1) );
                            legend(st,st1);
                            title('2D plot of acoustic vectors');
                            hold off
                        end
                    end
                end
                close all
            end
        end
        if ch23==2
            poss3=3;
            ch3=0;
            while ch3~=poss3
                ch3=menu('2D plot of acoustic vectors representation : ','1. One Signal',...
                    '2. Two Signals','3. Exit');
                if ch3==1
                    ch31=0;
                    while ch31~=9
                        ch31=menu('2D plot of acoustic vectors for :','Signal 1','Signal 2','Signal 3',...
                            'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit');
                        if ch31~=9
                            plot(cte{ch31}(5, :), cte{ch31}(6, :), 'or');
                            xlabel('5th Dimension');
                            ylabel('6th Dimension');
                            st=sprintf('Signal %d ',ch31);
                            legend(st);
                            title('2D plot of acoustic vectors');
                        end
                    end
                    close all;
                end
                if ch3==2
                    ch32=0;
                    while ch32~=8
                        ch32=menu('2D plot of acoustic vectors for :','Signal 1 & Signal 2',...
                            'Signal 2 & Signal 3','Signal 3 & Signal 4','Signal 4 & Signal 5',...
                            'Signal 5 & Signal 6','Signal 6 & Signal 7','Signal 7 & Signal 8','Exit');
                        if ch32~=8
                            plot(cte{ch32}(5, :), cte{ch32}(6, :), 'or');
                            hold on;
                            plot(cte{ch32+1}(5, :), cte{ch32+1}(6, :), 'xb');
                            xlabel('5th Dimension');
                            ylabel('6th Dimension');
                            st=sprintf('Signal %d,',ch32);
                            st1=sprintf('Signal %d', (ch32+1) );
                            legend(st,st1);
                            title('2D plot of acoustic vectors');
                            hold off
                        end
                    end
                end
                close all
            end
        end
        if ch23==3
            if (exist('sound_database.dat','file')==2)
                load('sound_database.dat','-mat');
                ch32=0;
                while ch32 ~=2
                    ch32=menu('Database Information','Database','Exit');
                    if ch32==1
                        st=strcat('Sound Database has : #',num2str(sound_number),' words. Enter a database number : #');
                        prompt = {st};
                        dlg_title = 'Database Information';
                        num_lines = 1;
                        def = {'1'};
                        options.Resize='on';
                        options.WindowStyle='normal';
                        options.Interpreter='tex';
                        an = inputdlg(prompt,dlg_title,num_lines,def);
                        an=cell2mat(an);
                        a=str2double(an);
                        if (~isempty(an))
                            if (a <= sound_number)
                                st=strcat('u',num2str(an));
                                [s fs]=wavread(st);
                                c = mfcc(s, fs);
                                plot(c(5, :), c(6, :), 'or');
                                xlabel('5th Dimension');
                                ylabel('6th Dimension');
                                st1=sprintf('Signal %s.wav',st);
                                legend(st1);
                                title('2D plot of acoustic vectors');
                            else
                                warndlg('Invalid Word ','Warning');
                            end
                        end
                    end
                end
                close all;
            else
                warndlg('Database is empty.',' Warning ')
            end
        end
end
end
%----------------------------------------------------------------------
%% 8: Plot of the 2D trained VQ codewords
if ch==8
    disp('> 8: Plot of the 2D trained VQ codewords');
    ch23=0;
    while ch23~=4
        ch23=menu('Select Folder','Train','Test','User','Exit');
        if ch23==1
            poss3=3;
            ch3=0;
            while ch3~=poss3
                ch3=menu('2D plot of acoustic vectors representation : ','1. One Signal',...
                    '2. Two Signals','3. Exit');
                if ch3==1
                    ch31=0;
                    while ch31~=9
                        ch31=menu('2D plot of acoustic vectors for :','Signal 1','Signal 2','Signal 3',...
                            'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit');
                        if ch31~=9
                            plot(ctr{ch31}(5, :), ctr{ch31}(6, :), 'xr')
                            hold on
                            plot(dtr{ch31}(5, :), dtr{ch31}(6, :), 'vk')
                            xlabel('5th Dimension');
                            ylabel('6th Dimension');
                            st=sprintf('Speaker %d',ch31);
                            st1=sprintf('Codebook %d', (ch31) );
                            legend(st,st1);
                            title('2D plot of acoustic vectors');
                            hold off
                        end
                    end
                    close all
                end
                if ch3==2
                    ch32=0;
                    while ch32~=8
                        ch32=menu('2D plot of acoustic vectors for :','Signal 1 & Signal 2',...
                            'Signal 2 & Signal 3','Signal 3 & Signal 4','Signal 4 & Signal 5',...
                            'Signal 5 & Signal 6','Signal 6 & Signal 7','Signal 7 & Signal 8','Exit');
                        if ch32~=8
                            plot(ctr{ch32}(5, :), ctr{ch32}(6, :), 'xr')
                            hold on
                            plot(dtr{ch32}(5, :), dtr{ch32}(6, :), 'vk')
                            plot(ctr{ch32+1}(5, :), ctr{ch32+1}(6, :), 'xb')
                            plot(dtr{ch32+1}(5, :), dtr{ch32+1}(6, :), '+k')
                            xlabel('5th Dimension');
                            ylabel('6th Dimension');
                            st=sprintf('Speaker %d',ch32);
                            st1=sprintf('Codebook %d',ch32 );
                            st2=sprintf('Speaker %d',(ch32+1) );
                            st3=sprintf('Codebook %d', (ch32+1) );
                            legend(st,st1,st2,st3);
                            title('2D plot of acoustic vectors');
                            hold off
                        end
                    end
                end
                close all
            end
        end
        if ch23==2
            poss3=3;
            ch3=0;
            while ch3~=poss3
                ch3=menu('2D plot of acoustic vectors representation : ','1. One Signal',...
                    '2. Two Signals','3. Exit');
                if ch3==1
                    ch31=0;
                    while ch31~=9
                        ch31=menu('2D plot of acoustic vectors for :','Signal 1','Signal 2','Signal 3',...
                            'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit');
                        if ch31~=9
                            plot(cte{ch31}(5, :), cte{ch31}(6, :), 'xr')
                            hold on
                            plot(dte{ch31}(5, :), dte{ch31}(6, :), 'vk')
                            xlabel('5th Dimension');
                            ylabel('6th Dimension');
                            st=sprintf('Speaker %d',ch31);
                            st1=sprintf('Codebook %d', (ch31) );
                            legend(st,st1);
                            title('2D plot of acoustic vectors');
                            hold off
                        end
                    end
                    close all
                end
                if ch3==2
                    ch32=0;
                    while ch32~=8
                        ch32=menu('2D plot of acoustic vectors for :','Signal 1 & Signal 2',...
                            'Signal 2 & Signal 3','Signal 3 & Signal 4','Signal 4 & Signal 5',...
                            'Signal 5 & Signal 6','Signal 6 & Signal 7','Signal 7 & Signal 8','Exit');
                        if ch32~=8
                            plot(cte{ch32}(5, :), cte{ch32}(6, :), 'xr')
                            hold on
                            plot(dte{ch32}(5, :), dte{ch32}(6, :), 'vk')
                            plot(cte{ch32+1}(5, :), cte{ch32+1}(6, :), 'xb')
                            plot(dte{ch32+1}(5, :), dte{ch32+1}(6, :), '+k')
                            xlabel('5th Dimension');
                            ylabel('6th Dimension');
                            st=sprintf('Speaker %d',ch32);
                            st1=sprintf('Codebook %d',ch32 );
                            st2=sprintf('Speaker %d', (ch32+1) );
                            st3=sprintf('Codebook %d', (ch32+1) );
                            legend(st,st1,st2,st3);
                            title('2D plot of acoustic vectors');
                            hold off
                        end
                    end
                end
                close all
            end
        end
        if ch23==3
            if (exist('sound_database.dat','file')==2)
                load('sound_database.dat','-mat');
                ch32=0;
                while ch32 ~=2
                    ch32=menu('Database Information','Database','Exit');
                    if ch32==1
                        st=strcat('Sound Database has : #',num2str(sound_number),' words. Enter a database number : #');
                        prompt = {st};
                        dlg_title = 'Database Information';
                        num_lines = 1;
                        def = {'1'};
                        options.Resize='on';
                        options.WindowStyle='normal';
                        options.Interpreter='tex';
                        an = inputdlg(prompt,dlg_title,num_lines,def);
                        an=cell2mat(an);
                        a=str2double(an);
                        if (~isempty(an))
                            if (a <= sound_number)
                                st=strcat('u',num2str(an));
                                [s fs]=wavread(st);
                                c = mfcc(s, fs);
                                d = vqlbg(c, 16);
                                plot(c(5, :), c(6, :), 'xr');
                                hold on
                                plot(d(5, :), d(6, :), 'vk');
                                xlabel('5th Dimension');
                                ylabel('6th Dimension');
                                st1=sprintf('Speaker %s',st);
                                st2=sprintf('Codebook %s',st);
                                legend(st1,st2);
                                title('2D plot of acoustic vectors');
                                hold off
                            else
                                warndlg('Invalid Word ','Warning');
                            end
                        end
                    end
                end
                close all;
            else
                warndlg('Database is empty.',' Warning ')
            end
        end
end
end
%----------------------------------------------------------------------
%% 9: Recognition rate of the computer
if ch==9
    disp('> 9: Recognition rate of the computer')
    %------------------------------------------------------------------
    %% 9.1 Loading from Test Folder for Comparison
    % All 8 sample data values are loaded into the file sound_database1.dat.
    for sound_number = 1: 8
        if size(ste{sound_number},2)==2
            ste{sound_number}=ste{sound_number}(:,1);
        end
        ste{sound_number} = double( ste{sound_number} );
        data{sound_number,1} = ste{sound_number};
        data{sound_number,2} = sound_number;
        st = sprintf('s%d.wav',sound_number);
        data{sound_number,3} = st;
        data{sound_number,4} = 'Test';
        fs=fste{sound_number}; %#ok<NASGU>
        nb=nbte{sound_number}; %#ok<NASGU>
        if sound_number == 1
            save('sound_database1.dat','data','sound_number','fs','nb');
        else
            save('sound_database1.dat','data','sound_number','fs','nb','-append');
        end
    end
    disp(' ');
    disp('Sounds from TEST added to database for comparison');
    disp(' ');
    %------------------------------------------------------------------
    %% 9.2 Comparing data from the TRAIN folder one by one
    disp('Comparing data from the TRAIN folder one by one');
    disp(' ');
    load('sound_database1.dat','-mat');
    % Number of centroids required
    k =16;
    for ii=1:sound_number
        % Compute MFCC coefficients for each sound present in the database
        v = mfcc(data{ii,1}, fste{ii});   % database entries are the test signals, recorded at fste
        % Train VQ codebook
        code{ii} = vqlbg(v, k);
    end
    flag1 = 0;
    for classe = 1:8
        st = sprintf('TrainS%d.wav to be compared',classe);
        disp(st);
        pause(0.5);
        if size(str{classe},2)==2
            str{classe}=str{classe}(:,1);
        end
        str{classe} = double(str{classe});
        %----- code for speaker recognition -------
        disp('MFCC coefficient computation and VQ codebook training in progress...');
        disp(' ');
        % Compute MFCC coefficients for the input sound
        v = mfcc(str{classe},fstr{classe});
        % Current distance and sound ID initialization
        distmin = Inf;
        k1 = 0;
        for ii=1:sound_number
            d = disteu(v, code{ii});
            dist = sum(min(d,[],2)) / size(d,1);
            if dist < distmin
                distmin = dist;
                k1 = ii;
            end
        end
        min_index = k1;
        speech_id = data{min_index,2};
        %-----------------------------------------
        disp('Completed.');
        disp('Matching sound:');
        disp(' ');
        message=strcat('File:',data{min_index,3});
        disp(message);
        message=strcat('Location:',data{min_index,4});
        disp(message);
        message = strcat('Recognized speaker ID: ',num2str(speech_id));
        disp(message);
        disp(' ');
        if classe == speech_id
            flag1 = flag1 + 1;
        end
    end
    disp(' ');
    pause(0.5)
    st1 = sprintf('This prototype is %g%% efficient in recognising these 8 different stored sounds in the TEST and TRAIN folders.', 100*flag1/classe);
    msgbox(st1,'Success','help');
end
%----------------------------------------------------------------------
if ch==10
    disp('> 10: Test with other speech files')
    msgbox('P.S. This prototype is for secondary security usage.','NOTE','help');
    pause(2);
    msgbox('Kindly note that this works for the stored databases only: users can add sounds to the database, and recognition is done for the users entered.','NOTE','help')
    pause(2);
    chos=0;
    possibility=5;
    while chos~=possibility
        chos=menu('Speaker Recognition System','Add a new sound from microphone',...
            'Speaker recognition from microphone',...
            'Database Info','Delete database','Exit');
        %--------------------------------------------------------------
        %% 10.1 Add a new sound from microphone
        if chos==1
            if (exist('sound_database.dat','file')==2)
                load('sound_database.dat','-mat');
                classe = input('Insert a class number (sound ID) that will be used for recognition:');
                if isempty(classe)
                    classe = sound_number+1;
                    disp( num2str(classe) );
                end
                message=('The following parameters will be used during recording:');
                disp(message);
                message=strcat('Sampling frequency: ',num2str(samplingfrequency));
                disp(message);
                message=strcat('Bits per sample: ',num2str(samplingbits));
                disp(message);
                durata = input('Insert the duration of the recording (in seconds):');
                if isempty(durata)
                    durata = 3;
                    disp( num2str(durata) );
                end
                micrecorder = audiorecorder(samplingfrequency,samplingbits,1);
                disp('Now, speak into microphone...');
                record(micrecorder,durata);
                while (isrecording(micrecorder)==1)
                    disp('Recording...');
                    pause(0.5);
                end
                disp('Recording stopped.');
                y1 = getaudiodata(micrecorder);
                y = getaudiodata(micrecorder, 'uint8');
                if size(y,2)==2
                    y=y(:,1);
                end
                y = double(y);
                sound_number = sound_number+1;
                data{sound_number,1} = y;
                data{sound_number,2} = classe;
                data{sound_number,3} = 'Microphone';
                data{sound_number,4} = 'Microphone';
                st=strcat('u',num2str(sound_number));
                wavwrite(y1,samplingfrequency,samplingbits,st)
                save('sound_database.dat','data','sound_number','-append');
                msgbox('Sound added to database','Database result','help');
                disp('Sound added to database');
            else
                classe = input('Insert a class number (sound ID) that will be used for recognition:');
                if isempty(classe)
                    classe = 1;
                    disp( num2str(classe) );
                end
                durata = input('Insert the duration of the recording (in seconds):');
                if isempty(durata)
                    durata = 3;
                    disp( num2str(durata) );
                end
                samplingfrequency = input('Insert the sampling frequency (22050 recommended):');
                if isempty(samplingfrequency)
                    samplingfrequency = 22050;
                    disp( num2str(samplingfrequency) );
                end
                samplingbits = input('Insert the number of bits per sample (8 recommended):');
                if isempty(samplingbits)
                    samplingbits = 8;
                    disp( num2str(samplingbits) );
                end
                micrecorder = audiorecorder(samplingfrequency,samplingbits,1);
                disp('Now, speak into microphone...');
                record(micrecorder,durata);
                while (isrecording(micrecorder)==1)
                    disp('Recording...');
                    pause(0.5);
                end
                disp('Recording stopped.');
                y1 = getaudiodata(micrecorder);
                y = getaudiodata(micrecorder, 'uint8');
                if size(y,2)==2
                    y=y(:,1);
                end
                y = double(y);
                sound_number = 1;
                data{sound_number,1} = y;
                data{sound_number,2} = classe;
                data{sound_number,3} = 'Microphone';
                data{sound_number,4} = 'Microphone';
                st=strcat('u',num2str(sound_number));
                wavwrite(y1,samplingfrequency,samplingbits,st)
                save('sound_database.dat','data','sound_number','samplingfrequency','samplingbits');
                msgbox('Sound added to database','Database result','help');
                disp('Sound added to database');
            end
        end
        %--------------------------------------------------------------
        %% 10.2 Voice Recognition from microphone
        if chos==2
            if (exist('sound_database.dat','file')==2)
                load('sound_database.dat','-mat');
                Fs = samplingfrequency;
                durata = input('Insert the duration of the recording (in seconds):');
                if isempty(durata)
                    durata = 3;
                    disp( num2str(durata) );
                end
                micrecorder = audiorecorder(samplingfrequency,samplingbits,1);
                disp('Now, speak into microphone...');
                record(micrecorder,durata);
                while (isrecording(micrecorder)==1)
                    disp('Recording...');
                    pause(0.5);
                end
                disp('Recording stopped.');
                y = getaudiodata(micrecorder);
                st='v';
                wavwrite(y,samplingfrequency,samplingbits,st);
                y = getaudiodata(micrecorder, 'uint8');
                % if the input sound is not mono
                if size(y,2)==2
                    y=y(:,1);
                end
                y = double(y);
                %----- code for speaker recognition -------
                disp('MFCC coefficient computation and VQ codebook training in progress...');
                disp(' ');
                % Number of centroids required
                k =16;
                for ii=1:sound_number
                    % Compute MFCC coefficients for each sound present in the database
                    v = mfcc(data{ii,1}, Fs);
                    % Train VQ codebook
                    code{ii} = vqlbg(v, k);
                    disp('...');
                end
                disp('Completed.');
                % Compute MFCC coefficients for the input sound
                v = mfcc(y,Fs);
                % Current distance and sound ID initialization
                distmin = Inf;
                k1 = 0;
                for ii=1:sound_number
                    d = disteu(v, code{ii});
                    dist = sum(min(d,[],2)) / size(d,1);
                    message=strcat('For User #',num2str(ii),' Dist : ',num2str(dist));
                    disp(message);
                    if dist < distmin
                        distmin = dist;
                        k1 = ii;
                    end
                end
                % 'ronaldo' is the matching-distance threshold, assumed to be
                % defined earlier in the script
                if distmin < ronaldo
                    min_index = k1;
                    speech_id = data{min_index,2};
                    %-----------------------------------------
                    disp('Matching sound:');
                    message=strcat('File:',data{min_index,3});
                    disp(message);
                    message=strcat('Location:',data{min_index,4});
                    disp(message);
                    message = strcat('Recognized speaker ID: ',num2str(speech_id));
                    disp(message);
                    msgbox(message,'Matching result','help');
                    ch3=0;
                    while ch3~=3
                        ch3=menu('Matched result verification:','Recognized Sound','Recorded sound','Exit');
                        if ch3==1
                            st=strcat('u',num2str(speech_id));
                            [s fs nb]=wavread(st);
                            p=audioplayer(s,fs,nb);
                            play(p);
                        end
                        if ch3==2
                            [s fs nb]=wavread('v');
                            p=audioplayer(s,fs,nb);
                            play(p);
                        end
                    end
                else
                    warndlg('Wrong user. No matching result.',' Warning ')
                end
            else
                warndlg('Database is empty. No matching is possible.',' Warning ')
            end
        end
        %--------------------------------------------------------------
        %% 10.3 Database Info
        if chos==3
            if (exist('sound_database.dat','file')==2)
                load('sound_database.dat','-mat');
                message=strcat('Database has #',num2str(sound_number),' words:');
                disp(message);
                disp(' ');
                for ii=1:sound_number
                    message=strcat('File:',data{ii,3});
                    disp(message);
                    message=strcat('Location:',data{ii,4});
                    disp(message);
                    message=strcat('Sound ID:',num2str(data{ii,2}));
                    disp(message);
                    disp('-');
                end
                ch32=0;
                while ch32 ~=2
                    ch32=menu('Database Information','Database','Exit');
                    if ch32==1
                        st=strcat('Sound Database has : #',num2str(sound_number),' words. Enter a database number : #');
                        prompt = {st};
                        dlg_title = 'Database Information';
                        num_lines = 1;
                        def = {'1'};
                        options.Resize='on';
                        options.WindowStyle='normal';
                        options.Interpreter='tex';
                        an = inputdlg(prompt,dlg_title,num_lines,def);
                        an=cell2mat(an);
                        a=str2double(an);
                        if (~isempty(an))
                            if (a <= sound_number)
                                st=strcat('u',num2str(an));
                                [s fs nb]=wavread(st);
                                p=audioplayer(s,fs,nb);
                                play(p);
                            else
                                warndlg('Invalid Word ','Warning');
                            end
                        end
                    end
                end
            else
                warndlg('Database is empty.',' Warning ')
            end
        end
        %--------------------------------------------------------------
        %% 10.4 Delete database
        if chos==4
            %clc;
            close all;
            if (exist('sound_database.dat','file')==2)
                button = questdlg('Do you really want to remove the Database?');
                if strcmp(button,'Yes')
                    load('sound_database.dat','-mat');
                    for ii=1:sound_number
                        st=strcat('u',num2str(ii),'.wav');
                        delete(st);
                    end
                    if (exist('v.wav','file')==2)
                        delete('v.wav');
                    end
                    delete('sound_database.dat');
                    msgbox('Database was successfully removed from the current directory.','Database removed','help');
                end
            else
                warndlg('Database is empty.',' Warning ')
            end
        end
end
end
end
close all;
msgbox('Kindly motivate our efforts. Feel free to provide valuable feedback.','Thank You','help');
end
%----------------------------------------------------------------------
function M3 = blockFrames(s, fs, m, n)
% Split signal s into frames of length n with frame shift m, apply a
% Hamming window to each frame, and return the FFT of every frame as
% the columns of M3. (fs is unused here but kept for a uniform interface.)
l = length(s);
nbFrame = floor((l - n) / m) + 1;
M = zeros(n, nbFrame);
for i = 1:n
    for j = 1:nbFrame
        M(i, j) = s(((j - 1) * m) + i);
    end
end
h = hamming(n);
M2 = diag(h) * M;
M3 = zeros(n, nbFrame);
for i = 1:nbFrame
    M3(:, i) = fft(M2(:, i));
end
end
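For readers without MATLAB at hand, the framing-and-windowing step of blockFrames can be sketched in plain Python. This is an illustration only, not part of the original code; all names in it are invented, and the FFT step is omitted to keep the sketch dependency-free:

```python
import math

def block_frames(signal, shift, frame_len):
    """Split a signal into overlapping Hamming-windowed frames.

    Mirrors the framing/windowing part of blockFrames: 'shift' plays
    the role of m and 'frame_len' the role of n; the FFT of each
    frame (done in the MATLAB version) is left out here.
    """
    n_frames = (len(signal) - frame_len) // shift + 1
    # Symmetric Hamming window: 0.54 - 0.46*cos(2*pi*i/(N-1))
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    frames = []
    for j in range(n_frames):
        start = j * shift
        frames.append([signal[start + i] * window[i]
                       for i in range(frame_len)])
    return frames

frames = block_frames([0.1] * 1000, shift=100, frame_len=256)
print(len(frames), len(frames[0]))  # 8 frames of 256 samples each
```

With the report's values (shift 100, frame length 256), a 1000-sample signal yields floor((1000-256)/100)+1 = 8 frames, matching the nbFrame formula in blockFrames.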
%----------------------------------------------------------------------
%% MFCC Function
function r = mfcc(s, fs)
% Compute mel-frequency cepstral coefficients: frame the signal, take
% the power spectrum of each frame, apply the mel filterbank, then take
% the DCT of the log filterbank energies.
m = 100;                  % frame shift
n = 256;                  % frame length
frame = blockFrames(s, fs, m, n);
m = melfb(20, n, fs);     % 20-filter mel filterbank (m is reused here)
n2 = 1 + floor(n / 2);
z = m * abs(frame(1:n2, :)).^2;
r = dct(log(z));
end
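The melfb helper used above is listed earlier in the report. The mel scale that such filterbanks are conventionally built on is mel(f) = 2595 log10(1 + f/700); a small Python sketch of the conversion and its inverse (illustrative only, not taken from the original code):

```python
import math

def hz_to_mel(f):
    """Common mel-scale mapping: 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse mapping back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# 1000 Hz sits at roughly 1000 mel by construction of the scale
print(round(hz_to_mel(1000.0)))  # ~1000
```

The scale is roughly linear below 1 kHz and logarithmic above it, which is why the filterbank returned by melfb spaces its 20 filters more densely at low frequencies.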
%----------------------------------------------------------------------
function r = vqlbg(d,k)
% LBG vector quantization: train a codebook of k codewords (k a power
% of 2) from the column vectors of d by repeated splitting and
% nearest-neighbour re-centering.
e = .01;                  % splitting/convergence parameter
r = mean(d, 2);           % start from the global centroid
dpr = 10000;
for i = 1:log2(k)
    r = [r*(1+e), r*(1-e)];          % split each codeword in two
    while true
        z = disteu(d, r);
        [m,ind] = min(z, [], 2);     % nearest codeword for each vector
        t = 0;
        for j = 1:2^i
            r(:, j) = mean(d(:, find(ind == j)), 2); %#ok<FNDSB>
            x = disteu(d(:, find(ind == j)), r(:, j)); %#ok<FNDSB>
            for q = 1:length(x)
                t = t + x(q);
            end
        end
        if (((dpr - t)/t) < e)       % stop once distortion stops improving
            break;
        else
            dpr = t;
        end
    end
end
end
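The same split-and-refine idea as vqlbg can be sketched in plain Python. This is a minimal illustration with invented names, assuming k is a power of two and that each input vector is a list of floats (requires Python 3.8+ for math.dist):

```python
import math
import random

def lbg_codebook(vectors, k, eps=0.01):
    """Train a k-codeword LBG codebook, mirroring vqlbg's scheme:
    start from the global centroid, split each codeword into
    (1+eps)/(1-eps) copies, then re-center until the total
    distortion stops improving by a relative eps."""
    dim = len(vectors[0])
    centroid = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    codebook = [centroid]
    while len(codebook) < k:
        # split every codeword into two slightly perturbed copies
        codebook = [[c * (1 + s) for c in cw]
                    for cw in codebook for s in (eps, -eps)]
        prev_dist = float('inf')
        while True:
            # assign each vector to its nearest codeword
            buckets = [[] for _ in codebook]
            total = 0.0
            for v in vectors:
                dists = [math.dist(v, cw) for cw in codebook]
                j = dists.index(min(dists))
                buckets[j].append(v)
                total += dists[j]
            # re-center every non-empty bucket on its mean
            for j, b in enumerate(buckets):
                if b:
                    codebook[j] = [sum(v[d] for v in b) / len(b)
                                   for d in range(dim)]
            if prev_dist - total < eps * total:
                break
            prev_dist = total
    return codebook

random.seed(0)
pts = [[random.gauss(cx, 0.1), random.gauss(cy, 0.1)]
       for cx, cy in [(0, 0), (3, 0), (0, 3), (3, 3)] for _ in range(50)]
cb = lbg_codebook(pts, 4)
print(len(cb))  # 4
```

As in the MATLAB version, the codebook size doubles on every split, so log2(k) splitting rounds produce exactly k codewords.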
%----------------------------------------------------------------------
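The decision rule used in sections 9 and 10 above — score each stored codebook by the average distance from every input MFCC vector to its nearest codeword, then pick the smallest score — can be sketched in plain Python (names and toy data are invented for the example):

```python
import math

def avg_min_distance(vectors, codebook):
    """Average Euclidean distance from each vector to its nearest
    codeword; mirrors sum(min(disteu(v, code), [], 2)) / size(d, 1)."""
    total = 0.0
    for v in vectors:
        total += min(math.dist(v, cw) for cw in codebook)
    return total / len(vectors)

def recognize(vectors, codebooks):
    """Return the index of the codebook with the smallest score,
    i.e. the best-matching enrolled speaker."""
    scores = [avg_min_distance(vectors, cb) for cb in codebooks]
    return scores.index(min(scores))

# toy example: two 2-D codebooks, input drawn near the second one
codebooks = [[[0.0, 0.0], [1.0, 0.0]], [[5.0, 5.0], [6.0, 5.0]]]
utterance = [[5.1, 4.9], [5.9, 5.2], [5.4, 5.0]]
print(recognize(utterance, codebooks))  # 1
```

Section 10.2 additionally rejects the best match if its score exceeds a fixed threshold, which turns this identification rule into a simple open-set verification check.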
45

More Related Content

PPT
Pill camera presentation
PPTX
NOISE FILTERS IN IMAGE PROCESSING
PDF
Emotion Speech Recognition - Convolutional Neural Network Capstone Project
PPT
Speech Recognition in Artificail Inteligence
PPTX
OCR Presentation (Optical Character Recognition)
PPTX
Vector quantization
PPT
Huffman Coding
Pill camera presentation
NOISE FILTERS IN IMAGE PROCESSING
Emotion Speech Recognition - Convolutional Neural Network Capstone Project
Speech Recognition in Artificail Inteligence
OCR Presentation (Optical Character Recognition)
Vector quantization
Huffman Coding

What's hot (20)

PPTX
Image Smoothing using Frequency Domain Filters
PPTX
5. gray level transformation
PPTX
Histogram Specification or Matching Problem
PPT
silent sound technology
PPTX
Digital camera
PPTX
Optical Character Recognition (OCR) based Retrieval
PPTX
Speaker recognition systems
PDF
Speech emotion recognition
PPTX
Digital image processing
PDF
Report of PILL CAMERA
PPTX
Introduction to Image Compression
PPTX
Chapter 9 morphological image processing
PPTX
Face Recognition System for Door Unlocking
PPTX
Digital speech processing lecture1
PPTX
Object Detection & Tracking
PPT
Silent sound-technology ppt final
PPTX
Equalization
PDF
Digital Image Processing: Image Segmentation
PPTX
Subband Coding
PPT
Working of digital camera
Image Smoothing using Frequency Domain Filters
5. gray level transformation
Histogram Specification or Matching Problem
silent sound technology
Digital camera
Optical Character Recognition (OCR) based Retrieval
Speaker recognition systems
Speech emotion recognition
Digital image processing
Report of PILL CAMERA
Introduction to Image Compression
Chapter 9 morphological image processing
Face Recognition System for Door Unlocking
Digital speech processing lecture1
Object Detection & Tracking
Silent sound-technology ppt final
Equalization
Digital Image Processing: Image Segmentation
Subband Coding
Working of digital camera
Ad

Similar to Speaker recognition on matlab (20)

DOCX
Voice biometric recognition
PDF
Speaker Recognition System using MFCC and Vector Quantization Approach
PDF
ASR_final
PPT
Speech Recognition System By Matlab
PPTX
Speaker recognition using MFCC
PDF
Speaker and Speech Recognition for Secured Smart Home Applications
PDF
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
DOC
Speaker recognition.
PDF
Bachelors project summary
PDF
PDF
Classification of Language Speech Recognition System
PDF
Speaker Identification & Verification Using MFCC & SVM
PPTX
Speech based password authentication system on FPGA
PDF
Course report-islam-taharimul (1)
PDF
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
PPT
Automatic speech recognition
PPTX
Speech Signal Processing
PPTX
Voice Identification And Recognition System, Matlab
PDF
Speaker Recognition Using Vocal Tract Features
PDF
Analysis of Suitable Extraction Methods and Classifiers For Speaker Identific...
Voice biometric recognition
Speaker Recognition System using MFCC and Vector Quantization Approach
ASR_final
Speech Recognition System By Matlab
Speaker recognition using MFCC
Speaker and Speech Recognition for Secured Smart Home Applications
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
Speaker recognition.
Bachelors project summary
Classification of Language Speech Recognition System
Speaker Identification & Verification Using MFCC & SVM
Speech based password authentication system on FPGA
Course report-islam-taharimul (1)
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
Automatic speech recognition
Speech Signal Processing
Voice Identification And Recognition System, Matlab
Speaker Recognition Using Vocal Tract Features
Analysis of Suitable Extraction Methods and Classifiers For Speaker Identific...
Ad

More from Arcanjo Salazaku (6)

PPTX
Temperature Meter
PPT
sustainable development introduction, basics and importance
PDF
cop21 at paris
DOCX
Traffic light using plc
DOCX
Summer Training Program Report On Embedded system and robot
DOCX
digital clock atmega16
Temperature Meter
sustainable development introduction, basics and importance
cop21 at paris
Traffic light using plc
Summer Training Program Report On Embedded system and robot
digital clock atmega16

Recently uploaded (20)

PPTX
web development for engineering and engineering
PPTX
Geodesy 1.pptx...............................................
PPTX
additive manufacturing of ss316l using mig welding
PDF
PPT on Performance Review to get promotions
PPTX
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
PPTX
Internet of Things (IOT) - A guide to understanding
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PPTX
bas. eng. economics group 4 presentation 1.pptx
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
PDF
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
PPTX
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PPT
Mechanical Engineering MATERIALS Selection
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PPTX
Foundation to blockchain - A guide to Blockchain Tech
PDF
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf
web development for engineering and engineering
Geodesy 1.pptx...............................................
additive manufacturing of ss316l using mig welding
PPT on Performance Review to get promotions
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
Internet of Things (IOT) - A guide to understanding
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
bas. eng. economics group 4 presentation 1.pptx
CYBER-CRIMES AND SECURITY A guide to understanding
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
UNIT-1 - COAL BASED THERMAL POWER PLANTS
Mechanical Engineering MATERIALS Selection
Model Code of Practice - Construction Work - 21102022 .pdf
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
Foundation to blockchain - A guide to Blockchain Tech
PRIZ Academy - 9 Windows Thinking Where to Invest Today to Win Tomorrow.pdf

Speaker recognition on matlab

  • 1. Submit by: Banzadio salazaku (ASU2013010100016) Submit to: Mrs Kavita Jindal 1
  • 2. 1 Overview Voice recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech waves. This technique makes it possible to use the speaker's voice to verify their identity and control access to services such as voice dialing, banking by telephone, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers. This document describes how to build a simple, yet complete and representative automatic speaker recognition system. Such a speaker recognition system has potential in many security applications. For example, users have to speak a PIN (Personal Identification Number) in order to gain access to the laboratory door, or users have to speak their credit card number over the telephone line to verify their identity. By checking the voice characteristics of the input utterance, using an automatic speaker recognition system similar to the one that we will describe, the system is able to add an extra level of security. 1 Principles of Speaker Recognition 2
  • 3. Speaker recognition can be classified into identification and verification. Speaker identification is the process of determining which registered speaker provides a given utterance. Speaker verification, on the other hand, is the process of accepting or rejecting the identity claim of a speaker. Figure 1 shows the basic structures of speaker identification and verification systems. The system that we will describe is classified as a text-independent speaker identification system, since its task is to identify the person who speaks regardless of what is being said.
At the highest level, all speaker recognition systems contain two main modules (refer to Figure 1): feature extraction and feature matching. Feature extraction is the process that extracts a small amount of data from the voice signal that can later be used to represent each speaker. Feature matching involves the actual procedure of identifying the unknown speaker by comparing the features extracted from his/her voice input with those from a set of known speakers. We will discuss each module in detail in later sections.
[Figure 1: (a) Speaker identification — input speech → feature extraction → similarity against reference models (Speaker #1 … Speaker #N) → maximum selection → identification result (speaker ID). (b) Speaker verification — input speech → feature extraction → similarity against the reference model of the claimed speaker (Speaker #M) → threshold decision → verification result (accept/reject).]
  • 4. Figure 1. Basic structures of speaker recognition systems
All speaker recognition systems have to serve two distinct phases. The first is referred to as the enrolment or training phase, while the second is referred to as the operational or testing phase. In the training phase, each registered speaker has to provide samples of their speech so that the system can build or train a reference model for that speaker. In the case of speaker verification systems, a speaker-specific threshold is also computed from the training samples. In the testing phase, the input speech is matched with the stored reference model(s) and a recognition decision is made.
Speaker recognition is a difficult task. Automatic speaker recognition works on the premise that a person's speech exhibits characteristics that are unique to the speaker. However, this task is challenged by the high variability of input speech signals. The principal source of variance is the speaker himself/herself. Speech signals in training and testing sessions can differ greatly due to many factors: people's voices change with time, health conditions (e.g. the speaker has a cold), speaking rates, and so on. There are also other factors, beyond speaker variability, that present a challenge to speaker recognition technology. Examples are acoustical noise and variations in recording environments (e.g. the speaker uses different telephone handsets).
2 Speech Feature Extraction
2.1 Introduction
The purpose of this module is to convert the speech waveform, using digital signal processing (DSP) tools, to a set of features (at a considerably lower information rate) for further analysis. This is often referred to as the signal-processing front end.
The speech signal is a slowly time-varying signal (it is called quasi-stationary). An example of a speech signal is shown in Figure 2. When examined over a sufficiently short period of time (between 5 and 100 msec), its characteristics are fairly stationary. However, over long periods of time (on the order of 1/5 second or more) the signal
  • 5. characteristics change to reflect the different speech sounds being spoken. Therefore, short-time spectral analysis is the most common way to characterize the speech signal.
Figure 2. Example of a speech signal
A wide range of possibilities exists for parametrically representing the speech signal for the speaker recognition task, such as Linear Prediction Coding (LPC), Mel-Frequency Cepstrum Coefficients (MFCC), and others. MFCC is perhaps the best known and most popular, and will be described in this paper. MFCCs are based on the known variation of the human ear's critical bandwidths with frequency: filters spaced linearly at low frequencies and logarithmically at high frequencies have been used to capture the phonetically important characteristics of speech. This is expressed in the mel-frequency scale, which is a linear frequency spacing below 1000 Hz and a logarithmic spacing above 1000 Hz. The process of computing MFCCs is described in more detail next.
2.2 Mel-frequency cepstrum coefficients processor
A block diagram of the structure of an MFCC processor is given in Figure 3. The speech input is typically recorded at a sampling rate above 10000 Hz. This sampling frequency was chosen to minimize the effects of aliasing in the analog-to-digital conversion. Signals sampled at this rate can capture all frequencies up to 5 kHz, which cover
  • 6. most of the energy of the sounds generated by humans. As discussed previously, the main purpose of the MFCC processor is to mimic the behavior of the human ears. In addition, MFCCs have been shown to be less susceptible to the variations mentioned above than the speech waveforms themselves.
Figure 3. Block diagram of the MFCC processor (continuous speech → Frame Blocking → frame → Windowing → FFT → spectrum → Mel-frequency Wrapping → mel spectrum → Cepstrum → mel cepstrum)
2.2.1 Frame Blocking
In this step the continuous speech signal is blocked into frames of N samples, with adjacent frames separated by M samples (M < N). The first frame consists of the first N samples. The second frame begins M samples after the first frame, and overlaps it by N − M samples, and so on. This process continues until all the speech is accounted for within one or more frames. Typical values for N and M are N = 256 (which is equivalent to ~30 msec of windowing and facilitates the fast radix-2 FFT) and M = 100.
2.2.2 Windowing
The next step in the processing is to window each individual frame so as to minimize the signal discontinuities at the beginning and end of each frame. The concept here is to minimize the spectral distortion by using the window to taper the signal to zero at the beginning and end of each frame. If we define the window as w(n), 0 ≤ n ≤ N − 1, where N is the number of samples in each frame, then the result of windowing is the signal
    y_l(n) = x_l(n) w(n),  0 ≤ n ≤ N − 1.
Typically the Hamming window is used, which has the form:
    w(n) = 0.54 − 0.46 cos(2πn / (N − 1)),  0 ≤ n ≤ N − 1.
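The frame blocking and windowing steps above can be sketched in a few lines. The following is an illustrative pure-Python version (the project itself does this in Matlab); the helper names `block_frames`, `hamming` and `window_frames` are our own choices, not functions from the project:

```python
import math

def block_frames(x, N=256, M=100):
    """Split signal x into overlapping frames of N samples, hop size M."""
    return [x[start:start + N] for start in range(0, len(x) - N + 1, M)]

def hamming(N):
    """Hamming window: w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def window_frames(frames):
    """Taper each frame to reduce the edge discontinuities before the FFT."""
    w = hamming(len(frames[0]))
    return [[s * wn for s, wn in zip(f, w)] for f in frames]
```

With N = 256 and M = 100, a 1000-sample signal yields (1000 − 256)/100 + 1 = 8 frames, and the window endpoints equal 0.54 − 0.46 = 0.08 rather than exactly zero, which is the usual Hamming trade-off.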
  • 7. 2.2.3 Fast Fourier Transform (FFT)
The next processing step is the Fast Fourier Transform, which converts each frame of N samples from the time domain into the frequency domain. The FFT is a fast algorithm to implement the Discrete Fourier Transform (DFT), which is defined on the set of N samples {x_n} as follows:
    X_k = Σ_{n=0}^{N−1} x_n e^(−j2πkn/N),  k = 0, 1, 2, …, N − 1.
In general the X_k are complex numbers, and we consider only their absolute values (frequency magnitudes). The resulting sequence {X_k} is interpreted as follows: positive frequencies 0 ≤ f < F_s/2 correspond to values 0 ≤ n ≤ N/2 − 1, while negative frequencies −F_s/2 < f < 0 correspond to N/2 + 1 ≤ n ≤ N − 1. Here F_s denotes the sampling frequency. The result after this step is often referred to as the spectrum or periodogram.
2.2.4 Mel-frequency Wrapping
As mentioned above, psychophysical studies have shown that human perception of the frequency content of sounds for speech signals does not follow a linear scale. Thus for each tone with an actual frequency f, measured in Hz, a subjective pitch is measured on a scale called the 'mel' scale. The mel-frequency scale is a linear frequency spacing below 1000 Hz and a logarithmic spacing above 1000 Hz.
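The text describes the mel scale only qualitatively (linear below 1 kHz, logarithmic above). In practice it is usually implemented with an analytic approximation; the specific formula below is not given in the document and is our assumption of a common choice, calibrated so that 1000 Hz maps to roughly 1000 mels:

```python
import math

def hz_to_mel(f):
    """A widely used analytic approximation of the mel scale:
    approximately linear below ~1 kHz, logarithmic above."""
    return 2595.0 * math.log10(1.0 + f / 700.0)
```

For example, hz_to_mel(1000) comes out within a fraction of a mel of 1000, while each doubling of frequency above 1 kHz adds progressively fewer mels, reflecting the compressed perception of high frequencies.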
  • 8. Figure 4. An example of a mel-spaced filterbank
One approach to simulating the subjective spectrum is to use a filter bank spaced uniformly on the mel scale (see Figure 4). The filter bank has a triangular bandpass frequency response, and the spacing as well as the bandwidth is determined by a constant mel-frequency interval. The number of mel spectrum coefficients, K, is typically chosen as 20. Note that this filter bank is applied in the frequency domain, so it simply amounts to applying the triangle-shaped windows of Figure 4 to the spectrum. A useful way of thinking about this mel-wrapping filter bank is to view each filter as a histogram bin (where bins have overlap) in the frequency domain.
2.2.5 Cepstrum
In this final step, we convert the log mel spectrum back to time. The result is called the mel-frequency cepstrum coefficients (MFCC). The cepstral representation of the speech spectrum provides a good representation of the local spectral properties of the signal for the given frame analysis. Because the mel spectrum coefficients (and so their logarithms) are real numbers, we can convert them to the time domain using the Discrete Cosine Transform (DCT). Therefore, if we denote the mel power spectrum coefficients that result from the last step by S̃_k, k = 1, 2, …, K, we can calculate the MFCCs, c̃_n, as
    c̃_n = Σ_{k=1}^{K} (log S̃_k) cos[ n (k − 1/2) π / K ],  n = 0, 1, …, K − 1.
Note that we exclude the first component, c̃_0, from the DCT since it represents the mean value of the input signal, which carries little speaker-specific information.
2.3 Summary
By applying the procedure described above, for each speech frame of around 30 msec (with overlap) a set of mel-frequency cepstrum coefficients is computed. These are the result of a cosine transform of the logarithm of the short-term power spectrum expressed on a mel-frequency scale. This set of coefficients is called an acoustic vector. Therefore each input utterance is transformed into a sequence of acoustic vectors. In the next section we will see how those acoustic vectors can be used to represent and recognize the voice characteristics of the speaker.
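The DCT step above can be written directly from the formula. This is an illustrative pure-Python sketch (the actual project computes it in Matlab, e.g. via dct); the function name is ours:

```python
import math

def mfcc_from_mel_spectrum(S):
    """DCT of the log mel spectrum:
    c_n = sum_{k=1..K} log(S_k) * cos(n*(k - 1/2)*pi/K),  n = 0..K-1.
    S is the list of K mel power spectrum coefficients (all > 0)."""
    K = len(S)
    logS = [math.log(s) for s in S]
    # 0-based index k' = k - 1 turns (k - 1/2) into (k' + 1/2)
    return [sum(logS[k] * math.cos(n * (k + 0.5) * math.pi / K) for k in range(K))
            for n in range(K)]
```

As a sanity check on the formula, a perfectly flat mel spectrum puts all of its energy into c_0 (the mean term, which the text then discards) and makes every higher coefficient vanish.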
  • 9. 3 Feature Matching
3.1 Overview
The problem of speaker recognition belongs to a much broader topic in science and engineering known as pattern recognition. The goal of pattern recognition is to classify objects of interest into one of a number of categories or classes. The objects of interest are generically called patterns, and in our case are sequences of acoustic vectors extracted from input speech using the techniques described in the previous section. The classes here refer to individual speakers. Since the classification procedure in our case is applied to extracted features, it can also be referred to as feature matching.
Furthermore, if there exists a set of patterns whose individual classes are already known, then one has a problem in supervised pattern recognition. These patterns comprise the training set and are used to derive a classification algorithm. The remaining patterns are then used to test the classification algorithm; these patterns are collectively referred to as the test set. If the correct classes of the individual patterns in the test set are also known, then one can evaluate the performance of the algorithm.
The state-of-the-art feature matching techniques used in speaker recognition include Dynamic Time Warping (DTW), Hidden Markov Modeling (HMM), and Vector Quantization (VQ). In this project, the VQ approach will be used, due to its ease of implementation and high accuracy. VQ is a process of mapping vectors from a large vector space to a finite number of regions in that space. Each region is called a cluster and can be represented by its center, called a codeword. The collection of all codewords is called a codebook.
Figure 5 shows a conceptual diagram illustrating this recognition process. In the figure, only two speakers and two dimensions of the acoustic space are shown. The circles refer to the acoustic vectors from speaker 1 while the triangles are from speaker 2. In the training phase, using the clustering algorithm described in Section 3.2, a speaker-specific VQ codebook is generated for each known speaker by clustering his/her training acoustic vectors. The resulting codewords (centroids) are shown in Figure 5 by black circles and black triangles for speakers 1 and 2, respectively. The distance from a vector to the closest codeword of a codebook is called the VQ distortion. In the recognition phase, an input utterance of an unknown voice is "vector-quantized" using each trained codebook and the total VQ distortion is computed. The speaker corresponding to the VQ codebook with the smallest total distortion is identified as the speaker of the input utterance.
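The recognition rule just described — quantize the input vectors with each speaker's codebook and pick the codebook with the smallest total VQ distortion — can be sketched as follows. This is an illustrative Python version (in the Matlab project, the supplied disteu plays the distance-computation role); the function names are ours:

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vq_distortion(vectors, codebook):
    """Total distance from each acoustic vector to its nearest codeword."""
    return sum(min(dist2(v, c) for c in codebook) for v in vectors)

def identify(vectors, codebooks):
    """Return the speaker ID whose codebook yields the smallest
    total VQ distortion for the input sequence of acoustic vectors."""
    return min(codebooks, key=lambda spk: vq_distortion(vectors, codebooks[spk]))
```

With toy two-dimensional codebooks for two speakers, input vectors lying near speaker 1's codewords are attributed to speaker 1, mirroring the decision rule of the full system.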
  • 10. [Figure 5 legend: circles — speaker 1 samples; black circles — speaker 1 centroids; triangles — speaker 2 samples; black triangles — speaker 2 centroids; arrow — VQ distortion.]
Figure 5. Conceptual diagram illustrating vector quantization codebook formation. One speaker can be discriminated from another based on the location of centroids.
3.2 Clustering the Training Vectors
After the enrolment session, the acoustic vectors extracted from the input speech of each speaker provide a set of training vectors for that speaker. As described above, the next important step is to build a speaker-specific VQ codebook for each speaker using those training vectors. There is a well-known algorithm, namely the LBG algorithm [Linde, Buzo and Gray, 1980], for clustering a set of L training vectors into a set of M codebook vectors. The algorithm is formally implemented by the following recursive procedure:
1. Design a 1-vector codebook; this is the centroid of the entire set of training vectors (hence, no iteration is required here).
2. Double the size of the codebook by splitting each current codeword y_n according to the rule
    y_n⁺ = y_n(1 + ε)
    y_n⁻ = y_n(1 − ε)
where n varies from 1 to the current size of the codebook, and ε is a splitting parameter (we choose ε = 0.01).
3. Nearest-Neighbor Search: for each training vector, find the codeword in the current codebook that is closest (in terms of the similarity measurement), and assign that vector to the corresponding cell (associated with the closest codeword).
4. Centroid Update: update the codeword in each cell using the centroid of the training vectors assigned to that cell.
5. Iteration 1: repeat steps 3 and 4 until the average distance falls below a preset threshold.
6. Iteration 2: repeat steps 2, 3 and 4 until a codebook of size M is designed.
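The six steps above can be condensed into a short sketch. This is an illustrative pure-Python version under simplifying assumptions (squared Euclidean distance, a distortion-improvement stopping rule standing in for the "average distance below a preset threshold" test); the project's vqlbg would implement the same procedure in Matlab over MFCC vectors:

```python
def centroid(vecs):
    """Mean vector of a non-empty list of equal-length tuples."""
    d = len(vecs[0])
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(d))

def lbg(training, M, eps=0.01, tol=1e-6):
    """LBG: grow a codebook from 1 to M codewords by splitting each
    codeword into y(1+eps) and y(1-eps), then alternating
    nearest-neighbour assignment and centroid update."""
    codebook = [centroid(training)]                       # step 1
    while len(codebook) < M:
        # step 2: split every current codeword
        codebook = [tuple(x * (1 + s) for x in y)
                    for y in codebook for s in (eps, -eps)]
        prev_D = float('inf')
        while True:
            # step 3: assign each training vector to its nearest codeword
            cells = [[] for _ in codebook]
            D = 0.0
            for v in training:
                dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
                i = dists.index(min(dists))
                cells[i].append(v)
                D += dists[i]
            # step 4: centroid update (keep the old codeword for empty cells)
            codebook = [centroid(c) if c else codebook[i]
                        for i, c in enumerate(cells)]
            # step 5: stop when the total distortion no longer improves
            if prev_D - D <= tol * max(D, 1.0):
                break
            prev_D = D
    return codebook
```

On two well-separated clusters of four points each, a 2-vector codebook converges to the two cluster means, which is exactly the behavior Figure 5 illustrates.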
  • 11. Intuitively, the LBG algorithm designs an M-vector codebook in stages. It starts by designing a 1-vector codebook, then uses a splitting technique on the codewords to initialize the search for a 2-vector codebook, and continues the splitting process until the desired M-vector codebook is obtained.
Figure 6 shows, in a flow diagram, the detailed steps of the LBG algorithm. "Cluster vectors" is the nearest-neighbor search procedure which assigns each training vector to the cluster associated with the closest codeword. "Find centroids" is the centroid update procedure. "Compute D (distortion)" sums the distances of all training vectors in the nearest-neighbor search so as to determine whether the procedure has converged.
Figure 6. Flow diagram of the LBG algorithm (find centroid → split each centroid, m = 2·m → cluster vectors → find centroids → compute D (distortion) → if (D′ − D)/D ≥ ε, set D′ = D and repeat the clustering; otherwise, if m < M, split again; stop when m = M)
4 Project
As stated before, in this project we will experiment with building and testing an automatic speaker recognition system. In order to build such a system, one has to go through the steps described in the previous sections. The most convenient platform for this is the Matlab environment, since many of the above tasks have already been implemented in Matlab. The project Web page given at the beginning provides a test database and several helper functions to ease the development process.
We have supplied you with two utility functions, melfb and disteu, and two main functions, train and test. Download all of these files from the project Web page into your working folder. The first two files can be treated as black boxes, but the latter two need to be thoroughly understood. In fact, your task is to write the two missing functions, mfcc and vqlbg, which will be called from the given main functions. In order to accomplish that, follow each
  • 12. step in this section carefully and check your understanding by answering all the questions.
4.1 Speech Data
Download the ZIP file of the speech database from the project Web page. After unzipping the file, you will find two folders, TRAIN and TEST, each containing 8 files named S1.WAV, S2.WAV, …, S8.WAV; each is labeled with the ID of the speaker. These files were recorded in Microsoft WAV format. On Windows systems, you can listen to the recorded sounds by double-clicking the files.
Our goal is to train a voice model (or, more specifically, a VQ codebook in the MFCC vector space) for each speaker S1–S8 using the corresponding sound file in the TRAIN folder. After this training step, the system will have knowledge of the voice characteristics of each (known) speaker. Next, in the testing phase, the system should be able to identify the (assumed unknown) speaker of each sound file in the TEST folder.
4.2 Speech Processing
In this phase you are required to write a Matlab function that reads a sound file and turns it into a sequence of MFCCs (acoustic vectors) using the speech processing steps described previously. Many of those tasks are already provided by either standard or supplied Matlab functions. The Matlab functions you will need are: wavread, hamming, fft, dct and melfb (a supplied function). Type help function_name at the Matlab prompt for more information about these functions.
4.3 Vector Quantization
The result of the last section is that we transform speech signals into vectors in an acoustic space. In this section, we apply the VQ-based pattern recognition technique to build speaker reference models from those vectors in the training phase, so that we can then identify any sequence of acoustic vectors uttered by an unknown speaker.
4.4 Simulation and Evaluation
Now for the final part! Use the two supplied programs, train and test (which require the two functions mfcc and vqlbg that you have just completed), to simulate the training and testing procedures of the speaker recognition system, respectively.
  • 13. REFERENCES
[1] L.R. Rabiner and B.H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, N.J., 1993.
[2] L.R. Rabiner and R.W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, Englewood Cliffs, N.J., 1978.
[3] S.B. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-28, No. 4, August 1980.
[4] Y. Linde, A. Buzo and R.M. Gray, "An algorithm for vector quantizer design", IEEE Transactions on Communications, Vol. COM-28, No. 1, January 1980.
  • 14. PROGRAM
%% Project: Voice Recognition and Identification system
% By bukasa tshibangu, banzadio salazaku, mutumba maliro
%--------------------------------------------------------------------------
disp('Project: Voice Recognition and Identification system');
disp('By bukasa tshibangu & banzadio salazaku & mutumba maliro ');
disp(' ');
pause(0.5);
disp('LOADING ');
pause(1); disp('... ');
pause(1); disp('... ');
pause(1); disp('... ');
pause(1); disp('... ');

% Preallocate cell arrays (one slot per speaker)
str = cell(1,8); fstr = cell(1,8); nbtr = cell(1,8);  % train: samples, rates, bits
ste = cell(1,8); fste = cell(1,8); nbte = cell(1,8);  % test:  samples, rates, bits
ctr = cell(1,8); dtr = cell(1,8);                     % train: MFCCs, VQ codebooks
cte = cell(1,8); dte = cell(1,8);                     % test:  MFCCs, VQ codebooks

for i = 1:8
    % Read audio data from the train folder
    st = strcat('trains', num2str(i), '.wav');
    [s1, fs1, nb1] = wavread(st);
    str{i} = s1; fstr{i} = fs1; nbtr{i} = nb1;
    % Read audio data from the test folder
    st = strcat('tests', num2str(i), '.wav');
    [st1, fst1, nbt1] = wavread(st);
    ste{i} = st1; fste{i} = fst1; nbte{i} = nbt1;
  • 15.     % Compute the MFCCs of the train-folder audio (speech processing)
    ctr{i} = mfcc(str{i}, fstr{i});
    % Compute the MFCCs of the test-folder audio
    cte{i} = mfcc(ste{i}, fste{i});
    % Compute the VQ codebook of the train-folder MFCCs
    dtr{i} = vqlbg(ctr{i}, 16);
    % Compute the VQ codebook of the test-folder MFCCs
    dte{i} = vqlbg(cte{i}, 16);
end

% Main menu loop
ch = 0;
poss = 11;
while ch ~= poss
    ch = menu('Speaker Recognition System', '1: Human speaker recognition', ...
        '2: Technical data of samples', ...
        '3: Power Spectrum', '4: Power Spectrum with different M and N', ...
        '5: Mel-Spaced Filter Bank', ...
        '6: Spectrum before and after Mel-Frequency wrapping', ...
        '7: 2D plot of acoustic vectors', ...
        '8: Plot of VQ codewords', '9: Recognition rate of the computer', ...
        '10: Test with other speech files', '11: Exit');
    disp(' ');
    %----------------------------------------------------------------------
    %% 1: Human speaker recognition
    if ch == 1
        disp('> 1: Human speaker recognition');
        disp('Play each sound file in the TRAIN folder.');
        disp('Can you distinguish the voices of those eight speakers?');
        disp('Now play each sound in the TEST folder in a random order without looking at the file name ');
        disp('and try to identify the speaker using your knowledge of their voices that you have just heard,');
        disp('from the TRAIN folder. This is exactly what the computer will do in our system.');
        disp(' ');
        disp(' ');
        disp('All of us seem to be unable to recognise random people just by listening to their voice. ');
        disp('We also realize that we do not identify speakers by the frequencies with which they use to talk, ');
        disp('but rather by other characteristics, like accent, speed, etc.');
        pause(1);
        ch2 = 0;
        while ch2 ~= 4
            ch2 = menu('Select Folder', 'Train', 'Test', 'User', 'Exit');
            if ch2 == 1
                ch3 = 0;
                while ch3 ~= 9
                    ch3 = menu('Train :', 'Signal 1', 'Signal 2', 'Signal 3', ...
                        'Signal 4', 'Signal 5', 'Signal 6', 'Signal 7', 'Signal 8', 'Exit');
                    if ch3 ~= 9
                        p = audioplayer(str{ch3}, fstr{ch3}, nbtr{ch3});
                        play(p);
                    end
                end
            end
            if ch2 == 2
                ch3 = 0;
                while ch3 ~= 9
                    ch3 = menu('Test :', 'Signal 1', 'Signal 2', 'Signal 3', ...
                        'Signal 4', 'Signal 5', 'Signal 6', 'Signal 7', 'Signal 8', 'Exit');
                    if ch3 ~= 9
                        p = audioplayer(ste{ch3}, fste{ch3}, nbte{ch3});
                        play(p);
                    end
                end
                close all;
            end
            if ch2 == 3
                if (exist('sound_database.dat', 'file') == 2)
                    load('sound_database.dat', '-mat');
                    ch32 = 0;
                    while ch32 ~= 2
                        ch32 = menu('Database Information', 'Database', 'Exit');
                        if ch32 == 1
                            st = strcat('Sound Database has : #', num2str(sound_number), ' words. Enter a database number : #');
                            prompt = {st};
                            dlg_title = 'Database Information';
                            num_lines = 1;
                            def = {'1'};
                            options.Resize = 'on';
                            options.WindowStyle = 'normal';
                            options.Interpreter = 'tex';
                            an = inputdlg(prompt, dlg_title, num_lines, def);
                            an = cell2mat(an);
                            a = str2double(an);
                            if ~isempty(an)
                                if (a <= sound_number)
                                    st = strcat('u', num2str(an));
                                    [s, fs, nb] = wavread(st);
                                    p = audioplayer(s, fs, nb);
                                    play(p);
                                else
                                    warndlg('Invalid Word ', 'Warning');
                                end
                            end
                        end
                    end
                    close all;
                else
                    warndlg('Database is empty.', ' Warning ')
                end
            end
        end
    end
    %----------------------------------------------------------------------
    %% 2: Technical data of samples
    if ch == 2
        disp('> 2: Technical data of samples');
        ch23 = 0;
        while ch23 ~= 4
            ch23 = menu('Select Folder', 'Train', 'Test', 'User', 'Exit');
            if ch23 == 1
                poss2 = 9;
                ch2 = 0;
                while ch2 ~= poss2
                    ch2 = menu('Technical data of samples for :', 'Signal 1', 'Signal 2', 'Signal 3', ...
                        'Signal 4', 'Signal 5', 'Signal 6', 'Signal 7', 'Signal 8', 'Exit');
                    if ch2 ~= 9
                        t = 0:1/fstr{ch2}:(length(str{ch2}) - 1)/fstr{ch2};
                        plot(t, str{ch2}), axis([0, (length(str{ch2}) - 1)/fstr{ch2}, -0.4, 0.5]);
                        st = sprintf('Plot of signal s%d.wav', ch2);
                        title(st);
                        xlabel('Time [s]');
                        ylabel('Amplitude (normalized)')
                    end
                end
                close all
            end
            if ch23 == 2
                poss2 = 9;
                ch2 = 0;
                while ch2 ~= poss2
                    ch2 = menu('Technical data of samples for :', 'Signal 1', 'Signal 2', 'Signal 3', ...
                        'Signal 4', 'Signal 5', 'Signal 6', 'Signal 7', 'Signal 8', 'Exit');
                    if ch2 ~= 9
                        t = 0:1/fste{ch2}:(length(ste{ch2}) - 1)/fste{ch2};
                        plot(t, ste{ch2}), axis([0, (length(ste{ch2}) - 1)/fste{ch2}, -0.4, 0.5]);
                        st = sprintf('Plot of signal s%d.wav', ch2);
                        title(st);
                        xlabel('Time [s]');
                        ylabel('Amplitude (normalized)')
                    end
                end
                close all
            end
            if ch23 == 3
                if (exist('sound_database.dat', 'file') == 2)
                    load('sound_database.dat', '-mat');
                    ch32 = 0;
                    while ch32 ~= 2
                        ch32 = menu('Database Information', 'Database', 'Exit');
                        if ch32 == 1
                            st = strcat('Sound Database has : #', num2str(sound_number), ' words. Enter a database number : #');
                            prompt = {st};
                            dlg_title = 'Database Information';
                            num_lines = 1;
                            def = {'1'};
                            options.Resize = 'on';
                            options.WindowStyle = 'normal';
                            options.Interpreter = 'tex';
                            an = inputdlg(prompt, dlg_title, num_lines, def);
                            an = cell2mat(an);
                            a = str2double(an);
                            if ~isempty(an)
                                if (a <= sound_number)
                                    st = strcat('u', num2str(an));
                                    [s, fs] = wavread(st);
                                    t = 0:1/fs:(length(s) - 1)/fs;
                                    plot(t, s), axis([0, (length(s) - 1)/fs, -0.4, 0.5]);
                                    st = sprintf('Plot of signal %s', st);
                                    title(st);
                                    xlabel('Time [s]');
                                    ylabel('Amplitude (normalized)')
                                else
                                    warndlg('Invalid Word ', 'Warning');
                                end
                            end
                        end
                    end
                    close all;
                else
                    warndlg('Database is empty.', ' Warning ')
                end
            end
        end
    end
    %----------------------------------------------------------------------
    %% 3: Linear and logarithmic power spectrum plot
    if ch == 3
        M = 100;
        N = 256;
        disp('> 3: Power Spectrum Plot');
        disp(' ');
        disp('>Linear and Logarithmic spectrum plot');
        ch23 = 0;
        while ch23 ~= 4
            ch23 = menu('Select Folder', 'Train', 'Test', 'User', 'Exit');
            if ch23 == 1
                poss3 = 9;
                ch3 = 0;
                while ch3 ~= poss3
                    ch3 = menu('Linear and Logarithmic Power Spectrum Plot for : ', 'Signal 1', 'Signal 2', 'Signal 3', ...
  • 20. 'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit'); if ch3~=9 % 3 (linear) frames = blockFrames(str{ch3}, fstr{ch3}, M, N); t = N / 2; tm = length(str{ch3}) / fstr{ch3}; subplot(121); imagesc([0 tm], [0 fstr{ch3}/2], abs(frames(1:t, :)).^2), axis xy; title('Power Spectrum (M = 100, N = 256)'); xlabel('Time [s]'); ylabel('Frequency [Hz]'); colorbar; % 3 (logarithmic) subplot(122); imagesc([0 tm], [0 fstr{ch3}/2], 20 * log10(abs(frames(1:t, :)).^2)), axis xy; title('Logarithmic Power Spectrum (M = 100, N = 256)'); xlabel('Time [s]'); ylabel('Frequency [Hz]'); colorbar; % D=get(gcf,'Position'); % set(gcf,'Position',round([D(1)*.5 D(2)*.5 D(3)*2 D(4)*1.3])) end end close all end if ch23==2 poss3=9; ch3=0; while ch3~=poss3 ch3=menu('Linear and Logarithmic Power Spectrum Plot for : ','Signal 1','Signal 2','Signal 3',... 'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit'); if ch3~=9 % 3 (linear) frames = blockFrames(ste{ch3}, fste{ch3}, M, N); t = N / 2; tm = length(ste{ch3}) / fste{ch3}; subplot(121); imagesc([0 tm], [0 fste{ch3}/2], abs(frames(1:t, :)).^2), axis xy; title('Power Spectrum (M = 100, N = 256)'); xlabel('Time [s]'); ylabel('Frequency [Hz]'); colorbar; % 3 (logarithmic) subplot(122); imagesc([0 tm], [0 fste{ch3}/2], 20 * log10(abs(frames(1:t, :)).^2)), axis xy; title('Logarithmic Power Spectrum (M = 100, N = 256)'); 20
                        xlabel('Time [s]');
                        ylabel('Frequency [Hz]');
                        colorbar;
                    end
                end
                close all;
            end
            if ch23 == 3
                if (exist('sound_database.dat', 'file') == 2)
                    load('sound_database.dat', '-mat');
                    ch32 = 0;
                    while ch32 ~= 2
                        ch32 = menu('Database Information', 'Database', 'Exit');
                        if ch32 == 1
                            st = strcat('Sound Database has : #', num2str(sound_number), ' words. Enter a database number : #');
                            prompt = {st};
                            dlg_title = 'Database Information';
                            num_lines = 1;
                            def = {'1'};
                            options.Resize = 'on';
                            options.WindowStyle = 'normal';
                            options.Interpreter = 'tex';
                            an = inputdlg(prompt, dlg_title, num_lines, def);
                            an = cell2mat(an);
                            a = str2double(an);
                            if ~isempty(an)
                                if (a <= sound_number)
                                    st = strcat('u', num2str(an));
                                    [s, fs] = wavread(st);
                                    frames = blockFrames(s, fs, M, N);
                                    t = N / 2;
                                    tm = length(s) / fs;
                                    subplot(121);
                                    imagesc([0 tm], [0 fs/2], abs(frames(1:t, :)).^2), axis xy;
                                    title('Power Spectrum (M = 100, N = 256)');
                                    xlabel('Time [s]');
                                    ylabel('Frequency [Hz]');
                                    colorbar;
                                    % Logarithmic power spectrum
                                    subplot(122);
                                    imagesc([0 tm], [0 fs/2], 20 * log10(abs(frames(1:t, :)).^2)), axis xy;
                                    title('Logarithmic Power Spectrum (M = 100, N = 256)');
                                    xlabel('Time [s]');
                                    ylabel('Frequency [Hz]');
                                    colorbar;
                                else
                                    warndlg('Invalid Word ', 'Warning');
                                end
                            end
                        end
                    end
                    close all;
                else
                    warndlg('Database is empty.', ' Warning ')
                end
            end
        end
    end
    %----------------------------------------------------------------------
    %% 4: Plots for different values of M and N
    if ch == 4
        disp('> 4: Plots for different values for M and N');
        lN = [128 256 512];
        ch23 = 0;
        while ch23 ~= 4
            ch23 = menu('Select Folder', 'Train', 'Test', 'User', 'Exit');
            if ch23 == 1
                poss3 = 9;
                ch3 = 0;
                while ch3 ~= poss3
                    ch3 = menu('Plots for different values of M and N for :', 'Signal 1', 'Signal 2', 'Signal 3', ...
                        'Signal 4', 'Signal 5', 'Signal 6', 'Signal 7', 'Signal 8', 'Exit');
                    if ch3 ~= 9
                        u = 220;
                        for i = 1:length(lN)
                            N = lN(i);
                            M = round(N / 3);
                            frames = blockFrames(str{ch3}, fstr{ch3}, M, N);
                            t = N / 2;
                            tm = length(str{ch3}) / fstr{ch3};
                            temp = size(frames);
                            nbframes = temp(2);
                            u = u + 1;
                            subplot(u)
                            imagesc([0 tm], [0 fstr{ch3}/2], 20 * log10(abs(frames(1:t, :)).^2)), axis xy;
                            title(sprintf('Power Spectrum (M = %i, N = %i, frames = %i)', M, N, nbframes));
                            xlabel('Time [s]');
                            ylabel('Frequency [Hz]');
                            colorbar
                        end
                    end
                end
                close all
            end
            if ch23 == 2
                poss3 = 9;
                ch3 = 0;
                while ch3 ~= poss3
                    ch3 = menu('Plots for different values of M and N for :', 'Signal 1', 'Signal 2', 'Signal 3', ...
                        'Signal 4', 'Signal 5', 'Signal 6', 'Signal 7', 'Signal 8', 'Exit');
                    if ch3 ~= 9
                        u = 220;
                        for i = 1:length(lN)
                            N = lN(i);
                            M = round(N / 3);
                            frames = blockFrames(ste{ch3}, fste{ch3}, M, N);
                            t = N / 2;
                            tm = length(ste{ch3}) / fste{ch3};
                            temp = size(frames);
                            nbframes = temp(2);
                            u = u + 1;
                            subplot(u)
                            imagesc([0 tm], [0 fste{ch3}/2], 20 * log10(abs(frames(1:t, :)).^2)), axis xy;
                            title(sprintf('Power Spectrum (M = %i, N = %i, frames = %i)', M, N, nbframes));
                            xlabel('Time [s]');
                            ylabel('Frequency [Hz]');
                            colorbar
                        end
                    end
                end
                close all;
            end
            if ch23 == 3
                if (exist('sound_database.dat', 'file') == 2)
                    load('sound_database.dat', '-mat');
                    ch32 = 0;
                    while ch32 ~= 2
                        ch32 = menu('Database Information', 'Database', 'Exit');
                        if ch32 == 1
                            st = strcat('Sound Database has : #', num2str(sound_number), ' words. Enter a database number : #');
                            prompt = {st};
                            dlg_title = 'Database Information';
                            num_lines = 1;
                            def = {'1'};
                            options.Resize = 'on';
                            options.WindowStyle = 'normal';
                            options.Interpreter = 'tex';
                            an = inputdlg(prompt, dlg_title, num_lines, def);
                            an = cell2mat(an);
                            a = str2double(an);
                            if ~isempty(an)
                                if (a <= sound_number)
                                    st = strcat('u', num2str(an));
                                    [s, fs] = wavread(st);
                                    u = 220;
                                    for i = 1:length(lN)
                                        N = lN(i);
                                        M = round(N / 3);
                                        frames = blockFrames(s, fs, M, N);
                                        t = N / 2;
                                        tm = length(s) / fs;
                                        temp = size(frames);
                                        nbframes = temp(2);
                                        u = u + 1;
                                        subplot(u)
                                        imagesc([0 tm], [0 fs/2], 20 * log10(abs(frames(1:t, :)).^2)), axis xy;
                                        title(sprintf('Power Spectrum (M = %i, N = %i, frames = %i)', M, N, nbframes));
                                        xlabel('Time [s]');
                                        ylabel('Frequency [Hz]');
                                        colorbar
                                    end
                                else
                                    warndlg('Invalid Word ', 'Warning');
                                end
                            end
                        end
                    end
                    close all;
                else
                    warndlg('Database is empty.', ' Warning ')
                end
            end
        end
    end
    %----------------------------------------------------------------------
    %% 5: Mel Space
    if ch == 5
        disp('> 5: Mel Space');
        disp(' ');
        disp('The Mel space is a function of the sampling rate, and since all signals ');
        disp('are recorded at the same sampling rate they share the same Mel space.');
        ch23 = 0;
        while ch23 ~= 4
            ch23 = menu('Select Folder', 'Train', 'Test', 'User', 'Exit');
            if ch23 == 1
                poss3 = 9;
                ch3 = 0;
                while ch3 ~= poss3
                    ch3 = menu('Mel Space for :', 'Signal 1', 'Signal 2', 'Signal 3', ...
                        'Signal 4', 'Signal 5', 'Signal 6', 'Signal 7', 'Signal 8', 'Exit');
                    if ch3 ~= 9
                        plot(linspace(0, (fstr{ch3}/2), 129), (melfb(20, 256, fstr{ch3})));
                        title('Mel-Spaced Filterbank');
                        xlabel('Frequency [Hz]');
                    end
                end
                close all
            end
            if ch23 == 2
                poss3 = 9;
                ch3 = 0;
                while ch3 ~= poss3
                    ch3 = menu('Mel Space for :', 'Signal 1', 'Signal 2', 'Signal 3', ...
                        'Signal 4', 'Signal 5', 'Signal 6', 'Signal 7', 'Signal 8', 'Exit');
                    if ch3 ~= 9
                        plot(linspace(0, (fste{ch3}/2), 129), (melfb(20, 256, fste{ch3})));
                        title('Mel-Spaced Filterbank');
                        xlabel('Frequency [Hz]');
                    end
                end
                close all;
  • 26. end
          if ch23==3
              if exist('sound_database.dat','file')==2
                  load('sound_database.dat','-mat');
                  ch32=0;
                  while ch32~=2
                      ch32=menu('Database Information','Database','Exit');
                      if ch32==1
                          st=strcat('Sound Database has : #',num2str(sound_number),' words. Enter a database number : #');
                          prompt={st}; dlg_title='Database Information'; num_lines=1; def={'1'};
                          options.Resize='on'; options.WindowStyle='normal'; options.Interpreter='tex';
                          an=inputdlg(prompt,dlg_title,num_lines,def);
                          an=cell2mat(an); a=str2double(an);
                          if ~isempty(an)
                              if a<=sound_number
                                  st=strcat('u',num2str(an));
                                  [s fs]=wavread(st);
                                  plot(linspace(0,fs/2,129), melfb(20,256,fs));
                                  title('Mel-Spaced Filterbank'); xlabel('Frequency [Hz]');
                              else
                                  warndlg('Invalid Word','Warning');
                              end
                          end
                      end
                  end
                  close all;
              else
                  warndlg('Database is empty.','Warning')
              end
          end
      end
  end
  %----------------------------------------------------------------------
  %% 6: Modified spectrum
  • 27. if ch==6
      disp('> 6: Modified spectrum'); disp(' ');
      disp('Spectrum before and after Mel-frequency warping');
      M=100; N=256; n2=1+floor(N/2);
      ch23=0;
      while ch23~=4
          ch23=menu('Select Folder','Train','Test','User','Exit');
          if ch23==1
              poss3=9; ch3=0;
              while ch3~=poss3
                  ch3=menu('Mel Space for :','Signal 1','Signal 2','Signal 3',...
                      'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit');
                  if ch3~=9
                      frames=blockFrames(str{ch3},fstr{ch3},M,N);
                      m=melfb(20,N,fstr{ch3});
                      z=m*abs(frames(1:n2,:)).^2;
                      tm=length(str{ch3})/fstr{ch3};
                      subplot(121)
                      imagesc([0 tm],[0 fstr{ch3}/2],abs(frames(1:n2,:)).^2), axis xy;
                      title('Power Spectrum unmodified');
                      xlabel('Time [s]'); ylabel('Frequency [Hz]'); colorbar;
                      subplot(122)
                      imagesc([0 tm],[0 20],z), axis xy;
                      title('Power Spectrum modified through Mel Cepstrum filter');
                      xlabel('Time [s]'); ylabel('Number of Filter in Filter Bank');
                      % colorbar; D=get(gcf,'Position');
                      % set(gcf,'Position',[0 D(2) D(3)/2 D(4)])
                  end
              end
              close all
          end
          if ch23==2
              poss3=9; ch3=0;
              while ch3~=poss3
                  ch3=menu('Mel Space for :','Signal 1','Signal 2','Signal 3',...
                      'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit');
                  if ch3~=9
                      % Test-folder arrays ste/fste (the original reused the
                      % Train arrays str/fstr here, a copy-paste slip)
                      frames=blockFrames(ste{ch3},fste{ch3},M,N);
  • 28. m=melfb(20,N,fste{ch3});   % Test-folder fs (the original used fstr here)
                      z=m*abs(frames(1:n2,:)).^2;
                      tm=length(ste{ch3})/fste{ch3};
                      subplot(121)
                      imagesc([0 tm],[0 fste{ch3}/2],abs(frames(1:n2,:)).^2), axis xy;
                      title('Power Spectrum unmodified');
                      xlabel('Time [s]'); ylabel('Frequency [Hz]'); colorbar;
                      subplot(122)
                      imagesc([0 tm],[0 20],z), axis xy;
                      title('Power Spectrum modified through Mel Cepstrum filter');
                      xlabel('Time [s]'); ylabel('Number of Filter in Filter Bank');
                      % colorbar; D=get(gcf,'Position');
                      % set(gcf,'Position',[0 D(2) D(3)/2 D(4)])
                  end
              end
              close all;
          end
          if ch23==3
              if exist('sound_database.dat','file')==2
                  load('sound_database.dat','-mat');
                  ch32=0;
                  while ch32~=2
                      ch32=menu('Database Information','Database','Exit');
                      if ch32==1
                          st=strcat('Sound Database has : #',num2str(sound_number),' words. Enter a database number : #');
                          prompt={st}; dlg_title='Database Information'; num_lines=1; def={'1'};
                          options.Resize='on'; options.WindowStyle='normal'; options.Interpreter='tex';
                          an=inputdlg(prompt,dlg_title,num_lines,def);
                          an=cell2mat(an); a=str2double(an);
                          if ~isempty(an)
                              if a<=sound_number
                                  st=strcat('u',num2str(an));
                                  [s fs]=wavread(st);
                                  frames=blockFrames(s,fs,M,N);
                                  m=melfb(20,N,fs);
  • 29. z=m*abs(frames(1:n2,:)).^2;
                                  tm=length(s)/fs;
                                  subplot(121)
                                  imagesc([0 tm],[0 fs/2],abs(frames(1:n2,:)).^2), axis xy;
                                  title('Power Spectrum unmodified');
                                  xlabel('Time [s]'); ylabel('Frequency [Hz]'); colorbar;
                                  subplot(122)
                                  imagesc([0 tm],[0 20],z), axis xy;
                                  title('Power Spectrum modified through Mel Cepstrum filter');
                                  xlabel('Time [s]'); ylabel('Number of Filter in Filter Bank'); colorbar;
                              else
                                  warndlg('Invalid Word','Warning');
                              end
                          end
                      end
                  end
                  close all;
              else
                  warndlg('Database is empty.','Warning')
              end
          end
      end
  end
  %----------------------------------------------------------------------
  %% 7: 2D plot of acoustic vectors
  if ch==7
      disp('> 7: 2D plot of acoustic vectors');
      ch23=0;
      while ch23~=4
          ch23=menu('Select Folder','Train','Test','User','Exit');
          if ch23==1
              poss3=3; ch3=0;
              while ch3~=poss3
                  ch3=menu('2D plot of acoustic vectors representation :','1. One Signal',...
                      '2. Two Signal','3. Exit');
                  if ch3==1
                      ch31=0;
                      while ch31~=9
                          ch31=menu('2D plot of acoustic vectors for :','Signal 1','Signal 2','Signal 3',...
  • 30. 'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit');
                          if ch31~=9
                              plot(ctr{ch31}(5,:),ctr{ch31}(6,:),'or');
                              xlabel('5th Dimension'); ylabel('6th Dimension');
                              st=sprintf('Signal %d',ch31); legend(st);
                              title('2D plot of acoustic vectors');
                          end
                      end
                      close all;
                  end
                  if ch3==2
                      ch32=0;
                      while ch32~=8
                          ch32=menu('2D plot of acoustic vectors for :','Signal 1 & Signal 2',...
                              'Signal 2 & Signal 3','Signal 3 & Signal 4','Signal 4 & Signal 5',...
                              'Signal 5 & Signal 6','Signal 6 & Signal 7','Signal 7 & Signal 8','Exit');
                          if ch32~=8
                              plot(ctr{ch32}(5,:),ctr{ch32}(6,:),'or'); hold on;
                              plot(ctr{ch32+1}(5,:),ctr{ch32+1}(6,:),'xb');
                              xlabel('5th Dimension'); ylabel('6th Dimension');
                              st=sprintf('Signal %d,',ch32); st1=sprintf('Signal %d',ch32+1);
                              legend(st,st1);
                              title('2D plot of acoustic vectors');
                              hold off
                          end
                      end
                  end
                  close all
              end
          end
          if ch23==2
              poss3=3; ch3=0;
              while ch3~=poss3
                  ch3=menu('2D plot of acoustic vectors representation :','1. One Signal',...
                      '2. Two Signal','3. Exit');
                  if ch3==1
                      ch31=0;
                      while ch31~=9
                          ch31=menu('2D plot of acoustic vectors for :','Signal 1','Signal 2','Signal 3',...
  • 31. 'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit');
                          if ch31~=9
                              plot(cte{ch31}(5,:),cte{ch31}(6,:),'or');
                              xlabel('5th Dimension'); ylabel('6th Dimension');
                              st=sprintf('Signal %d',ch31); legend(st);
                              title('2D plot of acoustic vectors');
                          end
                      end
                      close all;
                  end
                  if ch3==2
                      ch32=0;
                      while ch32~=8
                          ch32=menu('2D plot of acoustic vectors for :','Signal 1 & Signal 2',...
                              'Signal 2 & Signal 3','Signal 3 & Signal 4','Signal 4 & Signal 5',...
                              'Signal 5 & Signal 6','Signal 6 & Signal 7','Signal 7 & Signal 8','Exit');
                          if ch32~=8
                              plot(cte{ch32}(5,:),cte{ch32}(6,:),'or'); hold on;
                              plot(cte{ch32+1}(5,:),cte{ch32+1}(6,:),'xb');
                              xlabel('5th Dimension'); ylabel('6th Dimension');
                              st=sprintf('Signal %d,',ch32); st1=sprintf('Signal %d',ch32+1);
                              legend(st,st1);
                              title('2D plot of acoustic vectors');
                              hold off
                          end
                      end
                  end
                  close all
              end
          end
          if ch23==3
              if exist('sound_database.dat','file')==2
                  load('sound_database.dat','-mat');
                  ch32=0;
                  while ch32~=2
                      ch32=menu('Database Information','Database','Exit');
                      if ch32==1
  • 32. st=strcat('Sound Database has : #',num2str(sound_number),' words. Enter a database number : #');
                          prompt={st}; dlg_title='Database Information'; num_lines=1; def={'1'};
                          options.Resize='on'; options.WindowStyle='normal'; options.Interpreter='tex';
                          an=inputdlg(prompt,dlg_title,num_lines,def);
                          an=cell2mat(an); a=str2double(an);
                          if ~isempty(an)
                              if a<=sound_number
                                  st=strcat('u',num2str(an));
                                  [s fs]=wavread(st);
                                  c=mfcc(s,fs);
                                  plot(c(5,:),c(6,:),'or');
                                  xlabel('5th Dimension'); ylabel('6th Dimension');
                                  st1=sprintf('Signal %s.wav',st); legend(st1);
                                  title('2D plot of acoustic vectors');
                              else
                                  warndlg('Invalid Word','Warning');
                              end
                          end
                      end
                  end
                  close all;
              else
                  warndlg('Database is empty.','Warning')
              end
          end
      end
  end
  %----------------------------------------------------------------------
  %% 8: Plot of the 2D trained VQ codewords
  if ch==8
      disp('> 8: Plot of the 2D trained VQ codewords');
      ch23=0;
      while ch23~=4
          ch23=menu('Select Folder','Train','Test','User','Exit');
          if ch23==1
              poss3=3;
  • 33. ch3=0;
              while ch3~=poss3
                  ch3=menu('2D plot of acoustic vectors representation :','1. One Signal',...
                      '2. Two Signal','3. Exit');
                  if ch3==1
                      ch31=0;
                      while ch31~=9
                          ch31=menu('2D plot of acoustic vectors for :','Signal 1','Signal 2','Signal 3',...
                              'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit');
                          if ch31~=9
                              plot(ctr{ch31}(5,:),ctr{ch31}(6,:),'xr')
                              hold on
                              plot(dtr{ch31}(5,:),dtr{ch31}(6,:),'vk')
                              xlabel('5th Dimension'); ylabel('6th Dimension');
                              st=sprintf('Speaker %d',ch31); st1=sprintf('Codebook %d',ch31);
                              legend(st,st1);
                              title('2D plot of acoustic vectors');
                              hold off
                          end
                      end
                      close all
                  end
                  if ch3==2
                      ch32=0;
                      while ch32~=8
                          ch32=menu('2D plot of acoustic vectors for :','Signal 1 & Signal 2',...
                              'Signal 2 & Signal 3','Signal 3 & Signal 4','Signal 4 & Signal 5',...
                              'Signal 5 & Signal 6','Signal 6 & Signal 7','Signal 7 & Signal 8','Exit');
                          if ch32~=8
                              plot(ctr{ch32}(5,:),ctr{ch32}(6,:),'xr')
                              hold on
                              plot(dtr{ch32}(5,:),dtr{ch32}(6,:),'vk')
                              plot(ctr{ch32+1}(5,:),ctr{ch32+1}(6,:),'xb')
                              plot(dtr{ch32+1}(5,:),dtr{ch32+1}(6,:),'+k')
                              xlabel('5th Dimension'); ylabel('6th Dimension');
                              st=sprintf('Speaker %d',ch32); st1=sprintf('Codebook %d',ch32);
                              st2=sprintf('Speaker %d',ch32+1);
  • 34. st3=sprintf('Codebook %d',ch32+1);
                              legend(st,st1,st2,st3);
                              title('2D plot of acoustic vectors');
                              hold off
                          end
                      end
                  end
                  close all
              end
          end
          if ch23==2
              poss3=3; ch3=0;
              while ch3~=poss3
                  ch3=menu('2D plot of acoustic vectors representation :','1. One Signal',...
                      '2. Two Signal','3. Exit');
                  if ch3==1
                      ch31=0;
                      while ch31~=9
                          ch31=menu('2D plot of acoustic vectors for :','Signal 1','Signal 2','Signal 3',...
                              'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit');
                          if ch31~=9
                              plot(cte{ch31}(5,:),cte{ch31}(6,:),'xr')
                              hold on
                              plot(dte{ch31}(5,:),dte{ch31}(6,:),'vk')
                              xlabel('5th Dimension'); ylabel('6th Dimension');
                              st=sprintf('Speaker %d',ch31); st1=sprintf('Codebook %d',ch31);
                              legend(st,st1);
                              title('2D plot of acoustic vectors');
                              hold off
                          end
                      end
                      close all
                  end
                  if ch3==2
                      ch32=0;
                      while ch32~=8
                          ch32=menu('2D plot of acoustic vectors for :','Signal 1 & Signal 2',...
                              'Signal 2 & Signal 3','Signal 3 & Signal 4','Signal 4 & Signal 5',...
                              'Signal 5 & Signal 6','Signal 6 & Signal 7','Signal 7 & Signal 8','Exit');
                          if ch32~=8
  • 35. plot(cte{ch32}(5,:),cte{ch32}(6,:),'xr')
                              hold on
                              plot(dte{ch32}(5,:),dte{ch32}(6,:),'vk')
                              plot(cte{ch32+1}(5,:),cte{ch32+1}(6,:),'xb')
                              plot(dte{ch32+1}(5,:),dte{ch32+1}(6,:),'+k')
                              xlabel('5th Dimension'); ylabel('6th Dimension');
                              st=sprintf('Speaker %d',ch32); st1=sprintf('Codebook %d',ch32);
                              st2=sprintf('Speaker %d',ch32+1); st3=sprintf('Codebook %d',ch32+1);
                              legend(st,st1,st2,st3);
                              title('2D plot of acoustic vectors');
                              hold off
                          end
                      end
                  end
                  close all
              end
          end
          if ch23==3
              if exist('sound_database.dat','file')==2
                  load('sound_database.dat','-mat');
                  ch32=0;
                  while ch32~=2
                      ch32=menu('Database Information','Database','Exit');
                      if ch32==1
                          st=strcat('Sound Database has : #',num2str(sound_number),' words. Enter a database number : #');
                          prompt={st}; dlg_title='Database Information'; num_lines=1; def={'1'};
                          options.Resize='on'; options.WindowStyle='normal'; options.Interpreter='tex';
                          an=inputdlg(prompt,dlg_title,num_lines,def);
                          an=cell2mat(an); a=str2double(an);
                          if ~isempty(an)
                              if a<=sound_number
                                  st=strcat('u',num2str(an));
                                  [s fs]=wavread(st);
  • 36. c=mfcc(s,fs);
                                  d=vqlbg(c,16);
                                  plot(c(5,:),c(6,:),'xr');
                                  hold on
                                  plot(d(5,:),d(6,:),'vk');
                                  xlabel('5th Dimension'); ylabel('6th Dimension');
                                  st1=sprintf('Speaker %s',st); st2=sprintf('Codebook %s',st);
                                  legend(st1,st2);
                                  title('2D plot of acoustic vectors');
                                  hold off
                              else
                                  warndlg('Invalid Word','Warning');
                              end
                          end
                      end
                  end
                  close all;
              else
                  warndlg('Database is empty.','Warning')
              end
          end
      end
  end
  %----------------------------------------------------------------------
  %% 9: Recognition rate of the computer
  if ch==9
      disp('> 9: Recognition rate of the computer')
      %------------------------------------------------------------------
      %% 9.1 Loading from Test Folder for comparison
      % All 8 sample signals are stored in the file sound_database1.dat.
      for sound_number=1:8
          if size(ste{sound_number},2)==2
              ste{sound_number}=ste{sound_number}(:,1);   % keep one channel if stereo
          end
          ste{sound_number}=double(ste{sound_number});
          data{sound_number,1}=ste{sound_number};
          data{sound_number,2}=sound_number;
          st=sprintf('s%d.wav',sound_number);
          data{sound_number,3}=st;
          data{sound_number,4}='Test';
          fs=fste{sound_number}; %#ok<NASGU>
          nb=nbte{sound_number}; %#ok<NASGU>
          if sound_number==1
              save('sound_database1.dat','data','sound_number','fs','nb');
          else
  • 37. save('sound_database1.dat','data','sound_number','fs','nb','-append');
          end
      end
      disp(' ');
      disp('Sounds from TEST added to database for comparison');
      disp(' ');
      %------------------------------------------------------------------
      %% 9.2 Comparing the TRAIN folder data one by one
      disp('Comparing one by one data from TRAIN FOLDER');
      disp(' ');
      load('sound_database1.dat','-mat');
      k=16;   % number of centroids per codebook
      for ii=1:sound_number
          % Compute MFCC coefficients for each sound in the database
          v=mfcc(data{ii,1},fste{ii});   % test-signal sampling rate (the original used fstr{ii})
          % Train the VQ codebook
          code{ii}=vqlbg(v,k);
      end
      flag1=0;
      for classe=1:8
          st=sprintf('TrainS%d.wav to be compared',classe);
          disp(st); pause(0.5);
          if size(str{classe},2)==2
              str{classe}=str{classe}(:,1);
          end
          str{classe}=double(str{classe});
          %----- code for speaker recognition -------
          disp('MFCC coefficients computation and VQ codebook training in progress...');
          disp(' ');
          % Compute MFCC coefficients for the input sound
          v=mfcc(str{classe},fstr{classe});
          % Initialize the running minimum distance and sound ID
          distmin=Inf; k1=0;
          for ii=1:sound_number
              d=disteu(v,code{ii});
              dist=sum(min(d,[],2))/size(d,1);   % average distortion
              if dist<distmin
                  distmin=dist; k1=ii;
              end
          end
          min_index=k1;
          speech_id=data{min_index,2};
          %-----------------------------------------
          disp('Completed.');
          disp('Matching sound:'); disp(' ');
  • 38. message=strcat('File:',data{min_index,3});
          disp(message);
          message=strcat('Location:',data{min_index,4});
          disp(message);
          message=strcat('Recognized speaker ID: ',num2str(speech_id));
          disp(message); disp(' ');
          if classe==speech_id
              flag1=flag1+1;
          end
      end
      disp(' '); pause(0.5)
      st1=['This prototype is ',num2str(100*flag1/classe),'% efficient in recognising these 8 different stored sounds in the TEST and TRAIN folders.'];
      msgbox(st1,'Success','help');
  end
  %----------------------------------------------------------------------
  %% 10: Test with other speech files
  if ch==10
      disp('> 10: Test with other speech files')
      msgbox('P.S. This prototype is for secondary security usage.','NOTE','help');
      pause(2);
      msgbox('Kindly note this works for the stored databases only. This means that you can add sounds to the database by users and recognition will be done for the users entered.','NOTE','help')
      pause(2);
      chos=0; possibility=5;
      while chos~=possibility
          chos=menu('Speaker Recognition System','Add a new sound from microphone',...
              'Speaker recognition from microphone',...
              'Database Info','Delete database','Exit');
          %--------------------------------------------------------------
          %% 10.1 Add a new sound from microphone
          if chos==1
              if exist('sound_database.dat','file')==2
                  load('sound_database.dat','-mat');
                  classe=input('Insert a class number (sound ID) that will be used for recognition:');
                  if isempty(classe)
                      classe=sound_number+1;
                      disp(num2str(classe));
                  end
  • 39. message='The following parameters will be used during recording:';
                  disp(message);
                  message=['Sampling frequency: ',num2str(samplingfrequency)];
                  disp(message);
                  message=['Bits per sample: ',num2str(samplingbits)];
                  disp(message);
                  durata=input('Insert the duration of the recording (in seconds):');
                  if isempty(durata)
                      durata=3; disp(num2str(durata));
                  end
                  micrecorder=audiorecorder(samplingfrequency,samplingbits,1);
                  disp('Now, speak into microphone...');
                  record(micrecorder,durata);
                  while isrecording(micrecorder)==1
                      disp('Recording...'); pause(0.5);
                  end
                  disp('Recording stopped.');
                  y1=getaudiodata(micrecorder);
                  y=getaudiodata(micrecorder,'uint8');
                  if size(y,2)==2
                      y=y(:,1);   % keep one channel if stereo
                  end
                  y=double(y);
                  sound_number=sound_number+1;
                  data{sound_number,1}=y;
                  data{sound_number,2}=classe;
                  data{sound_number,3}='Microphone';
                  data{sound_number,4}='Microphone';
                  st=strcat('u',num2str(sound_number));
                  wavwrite(y1,samplingfrequency,samplingbits,st)
                  save('sound_database.dat','data','sound_number','-append');
                  msgbox('Sound added to database','Database result','help');
                  disp('Sound added to database');
              else
                  classe=input('Insert a class number (sound ID) that will be used for recognition:');
                  if isempty(classe)
                      classe=1; disp(num2str(classe));
                  end
                  durata=input('Insert the duration of the recording (in seconds):');
                  if isempty(durata)
                      durata=3; disp(num2str(durata));
  • 40. end
                  samplingfrequency=input('Insert the sampling frequency (22050 recommended):');
                  if isempty(samplingfrequency)
                      samplingfrequency=22050; disp(num2str(samplingfrequency));
                  end
                  samplingbits=input('Insert the number of bits per sample (8 recommended):');
                  if isempty(samplingbits)
                      samplingbits=8; disp(num2str(samplingbits));
                  end
                  micrecorder=audiorecorder(samplingfrequency,samplingbits,1);
                  disp('Now, speak into microphone...');
                  record(micrecorder,durata);
                  while isrecording(micrecorder)==1
                      disp('Recording...'); pause(0.5);
                  end
                  disp('Recording stopped.');
                  y1=getaudiodata(micrecorder);
                  y=getaudiodata(micrecorder,'uint8');
                  if size(y,2)==2
                      y=y(:,1);
                  end
                  y=double(y);
                  sound_number=1;
                  data{sound_number,1}=y;
                  data{sound_number,2}=classe;
                  data{sound_number,3}='Microphone';
                  data{sound_number,4}='Microphone';
                  st=strcat('u',num2str(sound_number));
                  wavwrite(y1,samplingfrequency,samplingbits,st)
                  save('sound_database.dat','data','sound_number','samplingfrequency','samplingbits');
                  msgbox('Sound added to database','Database result','help');
                  disp('Sound added to database');
              end
          end
          %--------------------------------------------------------------
          %% 10.2 Voice recognition from microphone
          if chos==2
              if exist('sound_database.dat','file')==2
                  load('sound_database.dat','-mat');
                  Fs=samplingfrequency;
                  durata=input('Insert the duration of the recording (in seconds):');
  • 41. if isempty(durata)
                      durata=3; disp(num2str(durata));
                  end
                  micrecorder=audiorecorder(samplingfrequency,samplingbits,1);
                  disp('Now, speak into microphone...');
                  record(micrecorder,durata);
                  while isrecording(micrecorder)==1
                      disp('Recording...'); pause(0.5);
                  end
                  disp('Recording stopped.');
                  y=getaudiodata(micrecorder);
                  st='v';
                  wavwrite(y,samplingfrequency,samplingbits,st);
                  y=getaudiodata(micrecorder,'uint8');
                  % if the input sound is not mono
                  if size(y,2)==2
                      y=y(:,1);
                  end
                  y=double(y);
                  %----- code for speaker recognition -------
                  disp('MFCC coefficients computation and VQ codebook training in progress...');
                  disp(' ');
                  % Number of centroids required
                  k=16;
                  for ii=1:sound_number
                      % Compute MFCC coefficients for each sound in the database
                      v=mfcc(data{ii,1},Fs);
                      % Train the VQ codebook
                      code{ii}=vqlbg(v,k);
                      disp('...');
                  end
                  disp('Completed.');
                  % Compute MFCC coefficients for the input sound
                  v=mfcc(y,Fs);
                  % Initialize the running minimum distance and sound ID
                  distmin=Inf; k1=0;
                  for ii=1:sound_number
                      d=disteu(v,code{ii});
                      dist=sum(min(d,[],2))/size(d,1);
                      message=strcat('For User #',num2str(ii),' Dist : ',num2str(dist));
                      disp(message);
                      if dist<distmin
                          distmin=dist; k1=ii;
                      end
  • 42. end
                  if distmin < ronaldo   % 'ronaldo' is the acceptance threshold set earlier in the full script
                      min_index=k1;
                      speech_id=data{min_index,2};
                      %-----------------------------------------
                      disp('Matching sound:');
                      message=strcat('File:',data{min_index,3});
                      disp(message);
                      message=strcat('Location:',data{min_index,4});
                      disp(message);
                      message=strcat('Recognized speaker ID: ',num2str(speech_id));
                      disp(message);
                      msgbox(message,'Matching result','help');
                      ch3=0;
                      while ch3~=3
                          ch3=menu('Matched result verification:','Recognized sound','Recorded sound','Exit');
                          if ch3==1
                              st=strcat('u',num2str(speech_id));
                              [s fs nb]=wavread(st);
                              p=audioplayer(s,fs,nb); play(p);
                          end
                          if ch3==2
                              [s fs nb]=wavread('v');
                              p=audioplayer(s,fs,nb); play(p);
                          end
                      end
                  else
                      warndlg('Wrong user. No matching result.','Warning')
                  end
              else
                  warndlg('Database is empty. No matching is possible.','Warning')
              end
          end
          %--------------------------------------------------------------
          %% 10.3 Database Info
          if chos==3
              if exist('sound_database.dat','file')==2
                  load('sound_database.dat','-mat');
                  message=strcat('Database has #',num2str(sound_number),' words:');
                  disp(message); disp(' ');
  • 43. for ii=1:sound_number
                      message=strcat('File:',data{ii,3});      % column 3 holds the file name
                      disp(message);
                      message=strcat('Location:',data{ii,4});  % column 4 holds the location
                      disp(message);
                      message=strcat('Sound ID:',num2str(data{ii,2}));
                      disp(message);
                      disp('-');
                  end
                  ch32=0;
                  while ch32~=2
                      ch32=menu('Database Information','Database','Exit');
                      if ch32==1
                          st=strcat('Sound Database has : #',num2str(sound_number),' words. Enter a database number : #');
                          prompt={st}; dlg_title='Database Information'; num_lines=1; def={'1'};
                          options.Resize='on'; options.WindowStyle='normal'; options.Interpreter='tex';
                          an=inputdlg(prompt,dlg_title,num_lines,def);
                          an=cell2mat(an); a=str2double(an);
                          if ~isempty(an)
                              if a<=sound_number
                                  st=strcat('u',num2str(an));
                                  [s fs nb]=wavread(st);
                                  p=audioplayer(s,fs,nb); play(p);
                              else
                                  warndlg('Invalid Word','Warning');
                              end
                          end
                      end
                  end
              else
                  warndlg('Database is empty.','Warning')
              end
          end
          %--------------------------------------------------------------
          %% 10.4 Delete database
          if chos==4
  • 44. %clc; close all;
              if exist('sound_database.dat','file')==2
                  button=questdlg('Do you really want to remove the Database?');
                  if strcmp(button,'Yes')
                      load('sound_database.dat','-mat');
                      for ii=1:sound_number
                          st=strcat('u',num2str(ii),'.wav');
                          delete(st);
                      end
                      if exist('v.wav','file')==2
                          delete('v.wav');
                      end
                      delete('sound_database.dat');
                      msgbox('Database was successfully removed from the current directory.','Database removed','help');
                  end
              else
                  warndlg('Database is empty.','Warning')
              end
          end
      end
  end
  end
  end
  close all;
  msgbox('Kindly motivate our efforts. Feel free to provide valuable feedback.','Thank You','help');
  end
  %--------------------------------------------------------------------------
  %% Frame-blocking function
  function M3 = blockFrames(s, fs, m, n)
  % Split signal s (sampling rate fs) into frames of length n with hop m,
  % apply a Hamming window, and return the FFT of each windowed frame.
  l = length(s);
  nbFrame = floor((l - n) / m) + 1;
  for i = 1:n
      for j = 1:nbFrame
          M(i, j) = s(((j - 1) * m) + i); %#ok<AGROW>
      end
  end
  h = hamming(n);
  M2 = diag(h) * M;
  for i = 1:nbFrame
      M3(:, i) = fft(M2(:, i)); %#ok<AGROW>
  end
  end
  %--------------------------------------------------------------------------
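For readers following along outside MATLAB, the frame-blocking step above can be sketched in NumPy. This is a minimal sketch: the function name `block_frames` and the vectorized indexing are my own, not part of the project code, and the unused sampling-rate argument of the MATLAB `blockFrames` is dropped.

```python
import numpy as np

def block_frames(s, m, n):
    """Split 1-D signal s into frames of length n with hop m,
    apply a Hamming window, and FFT each frame (one frame per column)."""
    nframes = (len(s) - n) // m + 1
    # (n, nframes) index matrix: column j holds samples j*m .. j*m + n - 1
    idx = np.arange(n)[:, None] + m * np.arange(nframes)[None, :]
    windowed = s[idx] * np.hamming(n)[:, None]
    return np.fft.fft(windowed, axis=0)

frames = block_frames(np.sin(0.1 * np.arange(1000)), 100, 256)
print(frames.shape)  # (256, 8)
```

With the project's defaults (hop m = 100, frame length n = 256) a 1000-sample signal yields floor((1000 - 256)/100) + 1 = 8 frames, matching the MATLAB loop.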
  • 45. %% MFCC function
  function r = mfcc(s, fs)
  % Frame the signal, take the power spectrum, apply the mel filterbank,
  % and return the DCT of the log filterbank energies (the MFCCs).
  m = 100; n = 256;           % frame hop and frame length
  frame = blockFrames(s, fs, m, n);
  m = melfb(20, n, fs);       % 20-filter mel-spaced filterbank
  n2 = 1 + floor(n / 2);
  z = m * abs(frame(1:n2, :)).^2;
  r = dct(log(z));
  end
  %--------------------------------------------------------------------------
  %% LBG vector-quantization codebook training
  function r = vqlbg(d, k)
  % Train a k-codeword VQ codebook from the columns of d using the
  % Linde-Buzo-Gray splitting algorithm (k must be a power of two).
  e = .01;                    % splitting parameter / stopping threshold
  r = mean(d, 2);             % start from the global centroid
  dpr = 10000;
  for i = 1:log2(k)
      r = [r*(1+e), r*(1-e)]; % split every centroid in two
      while (1 == 1)
          z = disteu(d, r);
          [m, ind] = min(z, [], 2);   % nearest centroid for each vector
          t = 0;
          for j = 1:2^i
              r(:, j) = mean(d(:, find(ind == j)), 2); %#ok<FNDSB>
              x = disteu(d(:, find(ind == j)), r(:, j)); %#ok<FNDSB>
              for q = 1:length(x)
                  t = t + x(q);
              end
          end
          if ((dpr - t) / t) < e
              break;
          else
              dpr = t;
          end
      end
  end
  end
  %--------------------------------------------------------------------------
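The `vqlbg` function above and the matching rule used throughout (pick the codebook with the lowest average distortion, `sum(min(d,[],2))/size(d,1)`) can be summarized in a hedged NumPy sketch. The names `pairwise_dist`, `lbg_codebook`, and `avg_distortion` are mine; this illustrates the algorithm under the same conventions as the MATLAB code (feature vectors are columns), not the project's implementation.

```python
import numpy as np

def pairwise_dist(x, y):
    """Euclidean distances between columns of x (d, n) and y (d, m)."""
    diff = x[:, :, None] - y[:, None, :]
    return np.sqrt((diff ** 2).sum(axis=0))   # (n, m)

def lbg_codebook(d, k, eps=0.01):
    """LBG training: double the codebook by splitting each centroid, then
    Lloyd-iterate until the relative distortion drop falls below eps."""
    r = d.mean(axis=1, keepdims=True)          # global centroid
    while r.shape[1] < k:
        r = np.hstack([r * (1 + eps), r * (1 - eps)])  # split step
        prev = np.inf
        while True:
            idx = pairwise_dist(d, r).argmin(axis=1)   # nearest-centroid assignment
            for j in range(r.shape[1]):                # centroid update
                members = d[:, idx == j]
                if members.size:
                    r[:, j] = members.mean(axis=1)
            total = pairwise_dist(d, r).min(axis=1).sum()
            if (prev - total) / total < eps:
                break
            prev = total
    return r

def avg_distortion(v, codebook):
    """Mean distance from each column of v to its nearest codeword --
    the sum(min(d,[],2))/size(d,1) quantity in the MATLAB code."""
    return pairwise_dist(v, codebook).min(axis=1).mean()

# Toy demo: two tight clusters, codebook of two codewords
d = np.array([[0.0, 0.1, 0.2, 10.0, 10.1, 10.2],
              [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]])
cb = lbg_codebook(d, 2)
print(np.sort(cb[0]))  # codewords settle near the cluster means 0.1 and 10.1
```

Identification then reduces to `argmin` of `avg_distortion` over the registered speakers' codebooks, with an acceptance threshold on the winning distortion for the open-set case.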