International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 02 | Feb 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 1986
A REVIEW ON REPLAY SPOOF DETECTION IN AUTOMATIC SPEAKER
VERIFICATION SYSTEM
Ajila A¹, Smitha K S²
¹M. Tech Student, Dept. of ECE, LBS Institute for Women, Kerala, India
²Assistant Professor, Dept. of ECE, LBS Institute for Women, Kerala, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Spoof detection in Automatic Speaker
Verification (ASV) systems is an essential problem today.
Among spoofing attacks, replay poses a greater threat to ASV
systems. This paper presents a survey of spoofing detection
for the case of replay. Replay attacks degrade the
performance of the entire ASV system. Many methods have
been developed for detecting replay spoofing
in past works. This paper reviews the performance of the best
anti-spoofing techniques used in ASV systems.
Key Words: Automatic Speaker Verification (ASV), Spoof
detection, Replay, Countermeasures.
1. INTRODUCTION
Voice is one of the most important human biometrics used in
everyday communication. Its unique characteristics play a
major role in conveying the identity of an individual. Voice
is considered a behavioral biometric characteristic. The
ASV system consists of two major parts, namely speaker
verification and spoof detection. Speaker verification
accepts or rejects the claimed identity based on a speech
sample, and spoof detection checks whether the
speech sample is genuine or spoofed. Like any other
biometric system, ASV is vulnerable to spoofing attacks [1]. An
ASV system has nine possible attack points, which are
classified into direct attacks, also known as spoofing
attacks, and indirect attacks. For an indirect attack, the
attacker needs access to the inside of the ASV system. There
are five main spoofing attacks, namely impersonation, Voice
Conversion (VC), Speech Synthesis (SS), twins and replay.
In an impersonation attack, a human attacker alters his or
her own voice and tries to imitate the target speaker as
closely as possible. In VC attacks, the attacker tries to
replicate the target speaker's voice by using computer-aided
technologies. SS, often referred to as Text-To-Speech (TTS),
produces speech from input text. In a twin attack, the
attacker tries to fool the system by providing a speech
sample of his/her twin. Twin attacks are relatively rare
compared with other spoofing attacks. A replay attack, in
which a recording of the target speaker is played back to
the system, is the simplest spoofing attack to mount, since
it does not need any computer expertise or complex
algorithms.
The main blocks of an ASV system are pre-processing,
feature extraction, classifier, and decision. In pre-processing,
the raw input signals are processed to increase the efficiency
of the subsequent stages. Commonly used pre-processing
techniques include noise removal and pre-emphasis. In the
feature extraction stage, the raw signals are transformed
into a parametric representation that is easier for the
classifier to work with. Commonly used feature extraction
methods include Constant Q Cepstral Coefficients (CQCC) and
Mel-Frequency Cepstral Coefficients (MFCC). Once the features
are extracted, the next step is to decide whether the input
speech is genuine or spoofed. A classifier performs this task.
Gaussian Mixture Model (GMM), Support Vector Machine
(SVM), and Convolutional Neural Network (CNN) are some of
the classifiers employed. Finally, in the decision stage,
the input signal is accepted or rejected by the ASV system.
This paper studies the works related to the ASVspoof 2017
challenge [2]. The baseline system of the challenge used
CQCC features with a GMM classifier [3].
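The four blocks described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the challenge baseline: the features are one-dimensional toy values standing in for CQCC vectors, the GMM parameters are hypothetical rather than trained, the pre-emphasis coefficient 0.97 is a common default that the source does not specify, and all function names are invented for this sketch.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Pre-processing: first-order filter y[n] = x[n] - alpha * x[n-1]."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def gmm_log_likelihood(frame, gmm):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) tuples, one per component.
    """
    log_terms = []
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for x, mu, var in zip(frame, means, variances):
            log_p += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        log_terms.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def llr_score(frames, genuine_gmm, spoof_gmm):
    """Classifier: average per-frame log-likelihood ratio (higher = more genuine)."""
    total = sum(
        gmm_log_likelihood(f, genuine_gmm) - gmm_log_likelihood(f, spoof_gmm)
        for f in frames
    )
    return total / len(frames)


def decide(score, threshold=0.0):
    """Decision: accept as genuine when the score reaches the threshold."""
    return "genuine" if score >= threshold else "spoofed"


# Hypothetical single-component models over a 1-D toy feature.
GENUINE_GMM = [(1.0, [0.0], [1.0])]
SPOOF_GMM = [(1.0, [3.0], [1.0])]

print(decide(llr_score([[0.1], [-0.2]], GENUINE_GMM, SPOOF_GMM)))
print(decide(llr_score([[2.9], [3.2]], GENUINE_GMM, SPOOF_GMM)))
```

In a real system the feature extraction stage would sit between `pre_emphasis` and `llr_score`, and both GMMs would be trained on labelled genuine and replayed speech.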
2. EXISTING METHODS OF SPEECH SPOOF
DETECTION
In [4], Tharshini Gunendradasan, Buddhi Wickramasinghe,
Phu Ngoc Le, Eliathamby Ambikairajah and Julien Epps
propose spectral-centroid-based Frequency Modulation (FM)
features, which they call Spectral Centroid Deviation (SCD),
for replay attack detection. They also extracted Spectral
Centroid Magnitude Coefficient (SCMC) features from the
front-end of SCD, along with Spectral Centroid Features
(SCF). The work employs GMM as the back-end classifier. They
introduced an FM feature extraction method based on a Linear
Predictive Coefficients (LPC) model and examined the feature
characteristics of genuine and spoofed speech. Equal Error
Rates (EER) of 15.68%, 12.34%, and 11.45% were obtained for
the SCMC-GMM, SCF-GMM, and SCD-GMM systems,
respectively. The score fusion of the above three systems
produced an EER of 9.20%. This work provides an EER
improvement of 60% over the CQCC baseline system.
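Since every result in this review is reported as an EER, a short sketch of how that metric can be computed from detection scores may help. The convention that a higher score means "more likely genuine" and the threshold sweep over observed scores are assumptions of this sketch; the function name is hypothetical, and evaluation toolkits typically interpolate between operating points rather than picking the nearest one.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Return (eer, threshold) at the point where FAR and FRR are closest.

    FAR: fraction of spoofed trials accepted (score >= threshold).
    FRR: fraction of genuine trials rejected (score < threshold).
    """
    best = None
    for t in sorted(set(genuine_scores) | set(spoof_scores)):
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        frr = sum(g < t for g in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the mean of FAR and FRR at the closest crossing point.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With these toy scores one genuine trial falls below one spoofed trial, so a quarter of each class is misclassified at the crossing point.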
Prasad A. Tapkir, Ankur T. Patil, Neil Shah, and Hemant A.
Patil in [5] proposed new feature sets called Magnitude-based
Spectral Root Cepstral Coefficients (MSRCC) and Phase-based
Spectral Root Cepstral Coefficients (PSRCC). They used GMM
and CNN classifiers, and evaluated each configuration on
both the development set and the evaluation set. With a GMM
classifier, MSRCC-GMM obtained an EER of 8.53% on the
development set and 18.61% on the evaluation set, while
PSRCC-GMM obtained 35.53% on the development set
and 24.35% on the evaluation set. The fused system,
MSRCC+PSRCC with GMM, gave an EER of 6.58% on the
development set and 10.65% on the evaluation set. With a CNN
classifier, MSRCC-CNN gave an EER of 3.05% on the
development set and 24.84% on the evaluation set. For
PSRCC-CNN, the EERs on the development set and evaluation
set were 36.21% and 26.81%, respectively. The fused system,
MSRCC+PSRCC with CNN, gave an EER of 2.63% on the
development set and 17.76% on the evaluation set.
Sarfaraz Jelil, Rohan Kumar Das, S. R. M. Prasanna, and Rohit
Sinha, in "Spoof Detection Using Source, Instantaneous
Frequency and Cepstral Features" [6], use a combination of
features such as glottal closure instants, epoch strength and
the peak to sidelobe ratio of the Hilbert envelope of the
linear prediction residual, along with Instantaneous
Frequency Cosine Coefficients (IFCC), CQCC and MFCC. The
system used GMM as the classifier. First, they built five
individual systems, one per feature type. System 1 (S1) was
based on Epoch Features (EF) calculated from the glottal
activity regions. System 2 (S2) used Peak to Side Lobe Ratio
features consisting of Mean and Skewness (PSRMS). System 3
(S3), System 4 (S4) and System 5 (S5) were based on IFCC
features, CQCC features, and MFCC features, respectively. The
EERs on the evaluation set for systems S1, S2, S3, S4,
and S5 are 28.66%, 28.90%, 35.19%, 19.58%, and 23.55%,
respectively. They also evaluated various fusions of the
above five systems. The fusion of all five systems (S1 + S2
+ S3 + S4 + S5) gave an EER of 5.31% on the development set
and 13.95% on the evaluation set.
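Several of the reviewed works report a fused result. The source does not say which fusion rule each paper used, so the following is only a generic sketch of score-level fusion: each system's scores are z-normalized and then combined with a weighted sum. The function names, the equal default weights, and the toy scores are all assumptions made for illustration.

```python
def z_normalize(scores):
    """Shift and scale one system's scores to zero mean and unit variance."""
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / len(scores)
    std = var ** 0.5 or 1.0  # guard against constant scores
    return [(s - mean) / std for s in scores]


def fuse_scores(system_scores, weights=None):
    """Weighted sum of normalized scores.

    system_scores: list of per-system score lists, aligned by trial.
    weights: one weight per system; equal weights when omitted.
    """
    n_systems = len(system_scores)
    if weights is None:
        weights = [1.0 / n_systems] * n_systems
    normalized = [z_normalize(scores) for scores in system_scores]
    n_trials = len(normalized[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, normalized))
        for i in range(n_trials)
    ]


# Two hypothetical systems scoring the same three trials on different scales.
fused = fuse_scores([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
print(fused)
```

Normalizing first keeps a system with a large score range from dominating the sum; in practice the weights are usually tuned on the development set.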
In "Audio Replay Attack Detection Using High-Frequency
Features" [7], Marcin Witkowski, Stanisław Kacprzak, Piotr
Żelasko, Konrad Kowalczyk, and Jakub Gałka proposed a system
that detects replay attacks using cues found in the
high-frequency band of the replayed recordings. Their work
was based on modeling the sub-band spectrum and on deriving
features from linear prediction analysis. High-frequency
features such as Inverse Mel Frequency Cepstral
Coefficients (IMFCC), Linear Prediction Cepstral Coefficients
(LPCC), and Linear Prediction Cepstral Coefficients residual
(LPCCres) were selected for feature extraction, with GMM as
the classifier. The experiments were conducted over various
frequency ranges between 16 and 8000 Hz. An EER of 4.48% was
obtained for IMFCC, 3.38% for cepstrum and 6.37% for
LPCCres on the evaluation set. A relative reduction in EER of
30% was obtained for the evaluation set.
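To illustrate why inverse-mel features emphasize the high-frequency band, the sketch below compares filter center frequencies on the mel scale with a mirrored ("inverted") layout. It assumes the widely used mel formula m = 2595 log10(1 + f/700) and one common construction of the inverted filterbank (mirroring the centers within the band); the source does not state how [7] built its filters, and the function names are hypothetical.

```python
import math


def hz_to_mel(f):
    """Convert frequency in Hz to mels."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Convert mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_centers(n_filters, f_min, f_max):
    """Center frequencies (Hz) equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


def inverted_mel_centers(n_filters, f_min, f_max):
    """Mirror the mel centers within the band so resolution is finest near f_max."""
    return sorted(f_min + f_max - c for c in mel_centers(n_filters, f_min, f_max))


mel = mel_centers(10, 0.0, 8000.0)
inv = inverted_mel_centers(10, 0.0, 8000.0)
print([round(c) for c in mel])
print([round(c) for c in inv])
```

The mel centers crowd together at low frequencies, while the mirrored centers crowd together near the top of the band, which is where the replay cues discussed above are sought.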
Galina Lavrentyeva, Sergey Novoselov, Egor Malykh,
Alexander Kozlov, Oleg Kudashev and Vadim Shchemelinin
proposed an anti-spoofing system in [8]. They
investigated the efficiency of deep learning approaches
such as CNN and Recurrent Neural Network (RNN). The study
was based on an SVM i-vector system, Light Convolutional
Neural Network (LCNN) systems, and the fusion of CNN and RNN
systems. Three LCNN systems were built: LCNN_FFT, which uses
truncated Fast Fourier Transform (FFT) features; LCNN_CQT,
which uses Constant Q Transform (CQT) features; and
LCNN_SW-FFT, which applies a sliding window to the FFT
features.
These systems were used to estimate the GMM likelihood
ratio scores. Among these, LCNN with truncated FFT features
shows the best result with 7.37% EER, and the fused
system provided 6.73% EER on the evaluation set. These
results show a relative improvement of about
72% over the baseline system.
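The relative improvements quoted in this review can be read with the usual definition below. It is given here as an assumption, since the reviewed text does not spell out the formula used by each paper.

```latex
\[
\text{Relative improvement}
  = \frac{\mathrm{EER}_{\text{baseline}} - \mathrm{EER}_{\text{system}}}
         {\mathrm{EER}_{\text{baseline}}} \times 100\%
\]
```

As a purely hypothetical example, a baseline EER of 20% and a system EER of 5% would give a relative improvement of (20 - 5) / 20 × 100% = 75%.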
In [9], Weicheng Cai, Danwei Cai, Wenbo Liu, Gang Li, and
Ming Li proposed a system combining multiple replay spoofing
countermeasures. Using a parametric sound reverberator and a
phase shifter, they converted genuine speech signals into
simulated replay speech signals for data augmentation (DA).
They then replaced the conventional CQCC input with a
spectrogram, which is fed as the input to a deep residual
network (ResNet). A Fully-connected Deep Neural Network
(FDNN) and a Bi-directional Long Short-Term Memory (BLSTM)
network are employed as
classifiers. The BLSTM obtained an EER of 40.08%, and the
score fusion of CQCC-GMM (baseline), DA-CQCC-GMM (augmented
CQCC-GMM) and ResNet gave an EER of 16.39%. This system
shows an improvement of 26% over the baseline system.
3. CONCLUSIONS
Voice biometrics are used for applications like telephone
banking, where security is key. Since they are vulnerable to
various attacks, it is important to maintain efficient
countermeasures. The ASVspoof 2017 challenge mainly
focuses on replay attacks on speech. This work aims to
provide a detailed description of various replay spoof
detection methods. The reviewed studies show that introducing
efficient feature extraction and classifier techniques can
make spoof detection considerably more effective.
REFERENCES
[1] Madhusudan Singh and Debadatta Pati, "Usefulness of
linear prediction residual for replay attack detection," AEU -
International Journal of Electronics and Communications,
vol. 110, 152837, 2019.
[2] T. Kinnunen et al., "The ASVspoof 2017 Challenge:
Assessing the Limits of Replay Spoofing Attack Detection,"
in INTERSPEECH, pp. 2–6, 2017.
[3] T. Kinnunen et al., "ASVspoof 2017: automatic speaker
verification spoofing and countermeasures challenge
evaluation plan," Training, vol. 10, pp. 1508, 2017.
[4] Tharshini Gunendradasan, Buddhi Wickramasinghe, Phu
Ngoc Le, Eliathamby Ambikairajah and Julien Epps,
"Detection of Replay-Spoofing Attacks using Frequency
Modulation Features," in INTERSPEECH, Hyderabad, pp.
636–640, 2018.
[5] Prasad A. Tapkir, Ankur T. Patil, Neil Shah, Hemant A.
Patil, "Novel Spectral Root Cepstral Features for Replay
Spoof Detection," APSIPA Annual Summit and Conference,
Honolulu, Hawaii, USA, pp. 1945–1950, 2018.
[6] S. Jelil, R. K. Das, S. M. Prasanna, and R. Sinha, "Spoof
detection using source, instantaneous frequency and cepstral
features," in INTERSPEECH, Stockholm, Sweden, pp. 22–26,
2017.
[7] M. Witkowski, S. Kacprzak, P. Żelasko, K. Kowalczyk,
and J. Gałka, "Audio replay attack detection using
high-frequency features," in INTERSPEECH, Stockholm, Sweden,
pp. 27–31, 2017.
[8] Galina Lavrentyeva, Sergey Novoselov, Egor Malykh,
Alexander Kozlov, Oleg Kudashev and Vadim Shchemelinin,
"Audio replay attack detection with deep learning
frameworks," in INTERSPEECH, Stockholm, Sweden, 2017.
[9] W. Cai, D. Cai, W. Liu, G. Li, and M. Li, "Countermeasures
for automatic speaker verification replay spoofing attack: On
data augmentation, feature representation, classification and
fusion," in INTERSPEECH, Stockholm, Sweden, pp. 17–21,
2017.
