International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 02 | Feb -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1950
Speaker Identification & Verification Using MFCC & SVM
Ahmed Sajjad1, Ayesha Shirazi2, Nagma Tabassum3, Mohd Saquib4, Naushad Sheikh5
1Professor, Dept. of Electronics & Telecommunication Engineering, Anjuman College of Engineering & Technology,
Nagpur, Maharashtra, India
2Student of Graduation, Dept. of Electronics & Telecommunication Engineering, Anjuman College of Engineering &
Technology, Nagpur, Maharashtra, India
3 Student of Graduation, Dept. of Electronics & Telecommunication Engineering, Anjuman College of Engineering
& Technology, Nagpur, Maharashtra, India
4Student of Graduation, Dept. of Electronics & Telecommunication Engineering, Anjuman College of Engineering &
Technology, Nagpur, Maharashtra, India
5Student of Graduation, Dept. of Electronics & Telecommunication Engineering, Anjuman College of Engineering &
Technology, Nagpur, Maharashtra, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Speaker recognition is a developing source of
security. It has a wide scope in applications such as voice
dialing, database access services, information services,
security control, hospitals, laboratories and industries.
Speaker recognition is the process of automatically
identifying and verifying the person who is speaking, and this
project is used to recognize that person. Speaker recognition
has two major parts: speaker identification and speaker
verification. It can be done using two methods, text dependent
and text independent. This paper presents speaker
identification and verification using the text-dependent
process. In this process the features are first extracted from
the voice, and those features are then matched or verified in
order to recognize the speaker. For feature extraction we use
the MFCC (Mel Frequency Cepstral Coefficient) technique, as it
is robust, accurate, fast and computationally efficient. For
feature matching, SVM (Support Vector Machine) is used.
Key Words: Speaker recognition, speaker identification,
speaker verification, text dependent, text independent,
feature extraction, MFCC, SVM
1. INTRODUCTION
Talking, i.e. speaking to each other, has been the most
convenient way of communication since ancient times.
Whenever we speak to someone we convey information in
the form of words or voice or, more precisely, speech,
which is the subject of this project. When air passes
through the vocal tract of a person while speaking, the
vocal folds vibrate and modulate this air flow, which in
turn produces speech. Hence speech is produced by
vibration in the vocal system of the human body. Since
every human being has a different vocal tract, each
person produces different sounds or speech. The aim of
this project is to identify and hence verify different
speakers. This automatic recognition of a particular
person through his or her speech, using a biometric
device, is done using MFCC and SVM. MFCC and SVM are
used because they give higher accuracy than LPCC, LPC,
HPC, etc. Human pitch (an important characteristic of
the human voice) varies with changes in background noise
(traffic, chirping of birds, unwanted sounds), human
emotions (stress, happiness, envy) and health problems
(cough, cold). These variations are largely eliminated by
MFCC and SVM, giving an accuracy of up to 95%. They are
also easy to work with.
2.1 Speech Production
Speech is produced with the help of the vocal folds. The human
vocal system is responsible for the generation of speech. It
consists of the nasal cavity, lips, teeth, glottis, tongue,
palate, larynx, etc.
Figure 1: Human Speech Production System
“Speech is produced by air pressure waves emanating
(emitting) from the mouth & the nostrils of a speaker”, as
defined by Huang et al. (2001) [1]. In other words, speech is
the ability to express feelings and thoughts through fluent
sounds and gestures.
2.2 Speech Recognition
Speech recognition is nothing but conveying information to a
computer, having it recognize what we are saying, and doing
this in real time.
Speaker recognition has two functions: identification and
verification. Speaker identification is the process of
identifying the speaker from the database. It is a 1:N match:
the voice given at the input is compared with the voices
available in the database until a match is found. If the voice
matches, the speaker is identified from the N entries in the
database; otherwise the output is ‘match not found’. Speaker
verification is the process of accepting or rejecting the
claimed identity of a speaker. It is a 1:1 match: the input
voice is checked against only one database entry, and the
result is obtained as true or false, yes or no.
Figure 2: Speaker identification & verification
process
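The difference between the 1:N and 1:1 match can be sketched in a few lines of Python. This is only an illustration: the cosine-similarity score, the 0.9 threshold, the function names and the toy three-dimensional feature vectors are all assumptions made for the example, whereas the paper itself matches features with SVM.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(features, database, threshold=0.9):
    """1:N match: return the best-matching enrolled speaker, or None."""
    best_name, best_score = None, threshold
    for name, template in database.items():
        score = cosine_similarity(features, template)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name  # None plays the role of 'match not found'

def verify(features, claimed_name, database, threshold=0.9):
    """1:1 match: accept (True) or reject (False) a claimed identity."""
    return cosine_similarity(features, database[claimed_name]) >= threshold

# Hypothetical enrolment database with made-up feature vectors.
database = {
    "speaker1": [1.0, 0.2, 0.1],
    "speaker2": [0.1, 1.0, 0.3],
    "speaker3": [0.2, 0.1, 1.0],
}
probe = [0.9, 0.25, 0.05]

print(identify(probe, database))            # searches all N entries
print(verify(probe, "speaker2", database))  # checks one entry only
```

Identification loops over every enrolled speaker, while verification touches a single entry, which is why the first is described as 1:N and the second as 1:1.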
Speaker recognition can be done using two processes:
Text Dependent: The text must be the same at the time of
feeding (preparing the database) and when giving the input for
recognition. This is known as the text-dependent process. In
this process phrases or PINs can also be used.
Text Independent: The process is said to be text
independent when the text at the time of feeding and at the
time of verification is different. In this case there is no
restriction on the text.
3. FEATURE EXTRACTION
Feature extraction is the first and a very important process
of speaker recognition. It extracts the primary information
from the speech and removes unnecessary data such as
background noise and other disturbances (stress, emotions,
environmental conditions).
3.1 MFCC
Mel Frequency Cepstral Coefficients (MFCC) were introduced by
Davis and Mermelstein in the 1980s. MFCC is the most popular
technique and is commonly used for feature extraction in most
speech-signal applications [2]. We use MFCC because it is
analogous to the human hearing mechanism.
MFCC extraction consists of five major steps: pre-processing,
windowing, FFT (Fast Fourier Transform), mel-frequency
wrapping and cepstrum. The input signal is passed through
these steps and the desired coefficients, known as MFCCs, are
obtained.
Figure 3: Extraction process of MFCC (speech waveform (input signal) → spectrum → mel spectrum → MFCC)
Pre-processing: Pre-processing includes filtering, i.e.
converting the given voice signal into a form that is suitable
for the computer. It also separates the voiced part of the
signal from the unvoiced part.
Windowing: Windowing is used to minimize spectral distortion.
For this we use a Hamming window, with frame blocking set to
20-25 ms so that the signal within each frame can be treated
as stationary. The Hamming window tapers each frame towards
its beginning and end, which provides continuity at the frame
boundaries and reduces spectral leakage. The result of
windowing is given as
Y(n) = X(n) × w(n)
where
Y(n) – output signal
X(n) – input signal
w(n) – Hamming window
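The frame blocking and windowing step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Hamming coefficients 0.54 and 0.46 are the standard textbook definition rather than something stated in this paper, and the 25 ms frame length, 10 ms frame shift, 8 kHz sampling rate and 440 Hz test tone are assumed values chosen for the example.

```python
import math

def hamming(n_samples):
    """Standard Hamming window: w(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def frame_and_window(signal, sample_rate, frame_ms=25, hop_ms=10):
    """Split a signal into overlapping frames and apply Y(n) = X(n) * w(n)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop_len):
        frame = signal[start:start + frame_len]
        frames.append([x * w for x, w in zip(frame, window)])
    return frames

# Assumed test signal: one second of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * t / sample_rate)
          for t in range(sample_rate)]
frames = frame_and_window(signal, sample_rate)
print(len(frames), len(frames[0]))  # number of frames, samples per frame
```

Each frame is short enough for the speech to be treated as stationary, and the taper of w(n) towards 0.08 at both ends is what provides the continuity at the frame boundaries.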
FFT (Fast Fourier Transform): The FFT is the most important
step of MFCC. The fast Fourier transform of each frame is
computed, with a new frame taken from the signal every 10 ms.
The FFT converts each frame of N samples from the time domain
to the frequency domain. Typical FFT sizes are 512, 1024 and
2048. It is used to obtain the magnitude frequency response.
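As an illustration of this step, the sketch below computes the magnitude frequency response of one frame with a small recursive radix-2 FFT written in plain Python. The function names, the zero-padding to 512 points and the 500 Hz test tone at an 8 kHz sampling rate are assumptions made for the example; a practical system would normally call an optimized FFT library instead.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def magnitude_spectrum(frame, n_fft=512):
    """Zero-pad (or truncate) a frame to n_fft samples and return |X[k]|
    for the non-negative frequencies k = 0 .. n_fft / 2."""
    padded = list(frame[:n_fft]) + [0.0] * (n_fft - len(frame))
    spectrum = fft(padded)
    return [abs(c) for c in spectrum[:n_fft // 2 + 1]]

# Assumed test frame: 512 samples of a 500 Hz cosine at 8 kHz.
sample_rate = 8000
tone = [math.cos(2 * math.pi * 500 * n / sample_rate) for n in range(512)]
mags = magnitude_spectrum(tone)
peak_bin = max(range(len(mags)), key=lambda k: mags[k])
print(peak_bin, peak_bin * sample_rate / 512)  # peak bin and its frequency in Hz
```

Bin k of an N-point FFT corresponds to the frequency k × (sampling rate) / N, so a larger FFT size gives a finer frequency grid.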
Mel-Frequency Wrapping: According to psychophysical studies,
human perception of the frequency content of voice or speech
does not follow a linear scale. The mel scale is therefore
used to measure perceived pitch. “One mel is defined as one
thousandth of the pitch of a 1 kHz tone [1].” The mel-scale
frequency can be approximated by the equation:
B(f) = 2595 log10 (1 + f / 700)
where f is the frequency in Hz and B(f) is the corresponding
value in mels. The mel spectrum is computed using a filter
bank of triangular band-pass filters whose positions are
equally spaced on the mel scale.
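The mel mapping and the triangular filter bank can be sketched as below. The formula for B(f) is the one given above; its inverse, the choice of 20 filters, the 512-point FFT and the 8 kHz sampling rate are assumptions for illustration, as are all function names.

```python
import math

def hz_to_mel(f_hz):
    """B(f) = 2595 * log10(1 + f / 700), with f in Hz."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular band-pass filters with centres equally spaced in mels.

    Returns n_filters rows, each holding one weight per FFT bin
    (n_fft // 2 + 1 bins from 0 Hz to half the sampling rate).
    """
    n_bins = n_fft // 2 + 1
    top_mel = hz_to_mel(sample_rate / 2)
    # n_filters + 2 points: lower edge, the filter centres, upper edge.
    edges_hz = [mel_to_hz(top_mel * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, centre, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= centre:
                row.append((f - lo) / (centre - lo))   # rising slope
            elif centre < f < hi:
                row.append((hi - f) / (hi - centre))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mel_spectrum(magnitudes, bank):
    """Weight the magnitude spectrum by each triangular filter and sum."""
    return [sum(w * m for w, m in zip(row, magnitudes)) for row in bank]

bank = mel_filterbank(n_filters=20, n_fft=512, sample_rate=8000)
print(round(hz_to_mel(1000.0), 1))  # close to 1000 mel for a 1 kHz tone
```

Because the centres are equally spaced in mels, the filters are narrow at low frequencies and progressively wider at high frequencies, which mirrors the non-linear pitch perception described above.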
Cepstrum: Cepstrum is the final step of MFCC. In this step the
logarithm of the mel-spectrum coefficients is converted back
into the time domain using the DCT (Discrete Cosine
Transform). The result obtained is the MFCC.
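A minimal sketch of the cepstrum step is given below, assuming an unnormalised DCT-II applied to the natural logarithm of the mel-spectrum values. Keeping the first 13 coefficients and flooring the energies at 1e-10 before the logarithm are common conventions assumed here, not values taken from this paper.

```python
import math

def dct_ii(values):
    """Unnormalised DCT-II: c[k] = sum_n v[n] * cos(pi * k * (n + 0.5) / N)."""
    n_vals = len(values)
    return [sum(v * math.cos(math.pi * k * (n + 0.5) / n_vals)
                for n, v in enumerate(values))
            for k in range(n_vals)]

def mfcc_from_mel_spectrum(mel_energies, n_coeffs=13, floor=1e-10):
    """Log-compress the mel spectrum and keep the first n_coeffs DCT outputs."""
    log_energies = [math.log(max(e, floor)) for e in mel_energies]
    return dct_ii(log_energies)[:n_coeffs]

# A perfectly flat mel spectrum has all its energy in coefficient 0.
flat = [math.e] * 20
coeffs = mfcc_from_mel_spectrum(flat)
print(len(coeffs), round(coeffs[0], 6))
```

The low-order coefficients describe the smooth envelope of the mel spectrum, which is why only the first few DCT outputs are usually kept as the feature vector.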
4. FEATURE MATCHING
Feature matching is the process of matching features between
two similar databases, one known as the source and the other
known as the target.
4.1 SVM
SVM was developed by Vapnik and co-workers in the 1990s. It is
one of the most important developments in pattern recognition
in the last 10 years [3]. The technique gives higher accuracy
than techniques such as neural networks and vector
quantization.
Figure 4: Linear Support Vector Machine
SVM is a simple and effective algorithm. In its basic form it
is a linear binary classifier [4], i.e. it can separate only
two classes at a time. It can also be regarded as a
comparator, since its output is binary: yes or no, accept or
reject, 0 or 1. In this project we use more than two classes
for better efficiency, and hence N SVMs are used.
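One common way to combine N binary SVMs is a one-vs-rest arrangement, sketched below in plain Python. This is an assumption for illustration: the paper does not say how its N SVMs are organised, and the simple sub-gradient training loop, the learning rate, the regularisation constant and the two-dimensional toy features stand in for a real SVM solver and real MFCC vectors.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit w, b by sub-gradient descent on the regularised hinge loss.

    labels must be +1 or -1. A simple stand-in for a real SVM solver.
    """
    dim = len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 shrinkage on every step; hinge update only if margin < 1.
            w = [wi * (1.0 - lr * reg) for wi in w]
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def decision(model, x):
    """Signed distance-like score w . x + b of one binary SVM."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_one_vs_rest(samples, speakers):
    """One binary SVM per speaker: that speaker (+1) against all others (-1)."""
    return {name: train_linear_svm(
                samples, [1 if s == name else -1 for s in speakers])
            for name in sorted(set(speakers))}

def identify(models, x):
    """Identification: the speaker whose SVM gives the largest score."""
    return max(models, key=lambda name: decision(models[name], x))

def verify(models, x, claimed):
    """Verification: binary accept/reject from the claimed speaker's SVM."""
    return decision(models[claimed], x) > 0.0

# Made-up, well-separated 2-D features for three hypothetical speakers.
samples = [[2.0, 0.1], [1.8, -0.2], [2.2, 0.3],
           [0.1, 2.0], [-0.2, 1.9], [0.3, 2.2],
           [-2.0, -1.9], [-1.8, -2.1], [-2.2, -2.0]]
speakers = ["speaker1"] * 3 + ["speaker2"] * 3 + ["speaker3"] * 3
models = train_one_vs_rest(samples, speakers)

print(identify(models, [2.0, 0.0]))
print(verify(models, [2.0, 0.0], "speaker2"))
```

Each individual SVM still gives only a yes or no answer, as described above; the multi-speaker decision comes from comparing the scores of the N binary classifiers.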
5. CONCLUSIONS
This paper describes a procedure for speaker recognition
using MFCC and SVM. MFCC is used for feature extraction,
whereas SVM is used for feature verification. The importance
of MFCC and SVM, and the reasons they are widely used, are
described in this paper. In future, techniques such as GMM
(Gaussian Mixture Model) and HMM (Hidden Markov Model) can be
used instead of SVM, as they are easier to use, require less
data and give better accuracy. Future applications of this
project include voice dialing in mobile phones and
telephones, hands-free dialing in wireless Bluetooth
headsets, biometric login to telephone-aided shopping
systems, and numeric entry modules.
ACKNOWLEDGEMENT
I would like to thank Prof. Dr. Ahmed Sajjad Khan for giving
the idea of this project.
REFERENCES
[1] X. Huang, A. Acero and H. Hon, “Spoken Language
Processing: A Guide to Theory, Algorithm, and System
Development”, Prentice Hall PTR, New Jersey, 2001.
[2] Jyoti B. Ramgire and Prof. Sumati M. Jagdale, “A survey
on speaker recognition with various feature extraction
and classification techniques”, IRJET, Volume 3, Issue 4,
April 2016, pp. 709-712.
[3] Geeta Nijhawan and M.K. Soni, “Speaker recognition
using support vector machine”, International Journal of
Computer Applications, Volume 87, No. 2, February 2014.
[4] Simon Haykin, McMaster University, Hamilton, Ontario,
Canada, “Neural Networks: A Comprehensive Foundation”,
2nd edition, pp. 256-347.
BIOGRAPHIES
Prof. Dr. Ahmed Sajjad Khan
(PhD-Cellular Automata Modelling
and processing speech signal)
Ayesha Shirazi
Graduation Student
Nagpur University
Nagma Tabassum Shekh
Graduation Student
Nagpur University
Mohammad Saquib
Graduation Student
Nagpur University
Naushad Sheikh
Graduation Student
Nagpur University
