International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 08 | Aug 2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1212
ADVERSE ENVIRONMENT AND ITS COMPENSATION APPROACHES IN
SPEECH RECOGNITION
Saumya Sah1
1Department of Electronics and Communication Engineering,Birla Institute of Technology, Mesra, Ranchi, India
-----------------------------------------------------------------------***--------------------------------------------------------------------
ABSTRACT: The paper focuses on the causes of the low
performances of the speech recognition system due to the
stressful and noisy environment in real time applications
like car, airplane cockpit, etc. This paper deals with the
major problems like Adverse Environment and other issues
that is noise, distortion of speech recognition system which
causes the failure in real time applications. The adverse
environment is simply the mismatched between the
environments of training and testing of the speech signal.
The paper briefly summarized different approaches to
deals with the adverse environment.
Key Words: Cause of speech variation, Noise, Source of
Noise, Distortion, Adverse Environment.
1. INTRODUCTION
The voice of the human being is the primary mode of
communication among humans. The researchers are
working to improve the accuracy level so that the voice
or speech could be use in real time applications [1].
Speech Recognition is the process of enabling a
computer to identify and respond to the sounds
produced by human in their speech. Speech recognition
system often fails to provide an excellent accuracy level
in real applications whereas it provides a high accuracy
in experimental level. This is mainly occurring due to the
speech recognition system which is trained in normal
and clean environment but applied in stressful and noisy
environment [2]. Most of the speech recognition system
degrades rapidly in their performance due to the
presence of noise and distortion in the speech signal. The
speech recognition systems are designed by keeping in
mind the prediction or assumptions of the ambient
conditions, like background speech style, transducers,
low noise, channels etc. [2, 3]. The different causes of the
variation in speech due to which the distortion occurred
are discussed in below section [4].
1.1 Causes of Speech Variation:
Few causes of speech variations are discussed below.
Mainly there are three parameters over which the
speech of human can be varied from one another and
could produce noise in the speech.
These are:
a) Environment:
It is the acoustic properties of the environment including
the impact on a speaker’s voice [5]. There are two type of
environment noise: Speech Correlated noise and
uncorrelated noise.
b) Speaker
Speaker itself could be the cause of variation in speech.
There may be two reasons for this noise to produce.
These are: Attributes of speakers which includes
dialect, gender, age and the other one is Manner of
speaking which includes breath and lip noise, stress,
Lombard effect, rate of delivery, level and dynamic
range, pitch, cooperativeness.
c) Input Equipment
Input Equipments are used to deliver human voice to a
place distant to the user are one of the cause of speech
variation. These are due to microphone, distance to
microphone, filter, recording equipment and
transmission system which includes channel distortion,
noise, and echo.
2. ADVERSE ENVIRONMENT
The use of the term “adverse environments” implies
unknown, mismatched and often severe differences in
variables and conditions between training and testing. A
speech recognizer often encounters three main causes of
adverse conditions: noise, distortion, and (human)
articulation effects.
2.1 Noise
By nature, noise is usually considered additive. It can be
wideband (assumed to be Gaussian distributed and
white) or coloured, stationary or non-stationary. High
levels of ambient noise are one of the primary concerns
for speech recognizer.
2.2 Source of Noise
The source of additive noise can be found in several
environments including the office, car and the aircraft
cockpit.
The sound pressure level (SPL) in a normal personal
office is around 45-50dBA. In a business office it is
around 15-20dB higher than the personal office. Inside
an automobile, the noise level due to engine, cooling fan,
wind tire and road is usually considerably high,
particularly when the automobile is moving at a high
speed. In the cockpit of a modern jet fighter aircraft, SPLs
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 08 | Aug 2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1213
of 90dB or more across the speech frequency band have
been reported.
There are many type of noises like electric noise and
quantization noise, which are present inside of any
electronic speech recognition system, are at a low level
below the concern threshold. Noise due to transmissions
and switching equipments in a telephone network is a
factor which is affecting recognizer performance. At the
noise level of equal to the noise level of cockpit, the
speech signal is barely intelligible even to a human
listener, not to mention an automatic speech recognition
machine.
2.3 Distortion
The speech signal before recorded and processed for
speech recognition, undergoes a series of spectral
distortions. The microphone transducer can also distort
the speech spectrum depending on its type and
mounting positions. The transducer configuration used
in testing is different from that used during training of
the reference patterns, the mismatch in spectral
distortion becomes one of the major problems.
It was reported that a large vocabulary speech
recognition system with a performance of 85% word
accuracy in a matched transducer condition could only
achieve less than 19% word accuracy when a different
microphone was used during testing.
Telephones channels are characterized by a high
frequency pre emphasis with significantly decrease the
dynamic range of the speech signal and increase the
signal correlation. There is a wide range of variation in
attenuation in the frequency spectrum and this can cause
a spectral mismatch unless the telephone channel is
measured and adapted to beforehand. Figure below
compares the amplitude of frequency response for two
different telephone channels. The distance between the
microphone and the speaker’s mouth can cause a poor
SNR and the acoustics of the room or the microphone
may be differ from the original set up in which the
reference patterns were created.
Fig-1: Frequency Response for two different telephone
channels.
Fig-2: Frequency Response for two different microphone
transducers.
2.4 Articulation Effects
There are many factors which affect the manner of
speaking of individual talker. The main difficultly in
dealing with the articulation effects is the lack of
understanding that how to quantify them.
In terms of spectral characteristics, we can measure or
model the effects to improve the recognizer design with
noise or channel distortion which usually does not vary
as rapidly as the speech itself. For example, a white noise
is usually specified by its power level, where as a channel
is often represented by its frequency response.
2.5 Lombard Effect
Lombard Effect is the characteristics changes in
articulation due to environmental influence. The
Lombard Effect is manifested in such changes as:
1. Louder speech (increases energy).
2. Energy migration in the frequency domain, F1
increases while F2 decreases.
3. Changes in spectral balance.
A speaker dependent isolated word recognizer that had
accuracy of 91% when trained and tested in clean
condition could only achieve accuracy of 62% when the
test utterances contained the Lombard Effect, even
though the test tokens were free from masking noise.
3. DEALING WITH ADVERSE EFFECTS
A speech recognizer that uses reference patterns trained
from speech with the corrupting noise whose
characteristics are approximately unknown performs
more robustly than one that uses clean reference
patterns. The key idea is to train the recognizer with a
multi-style training schemes in which speech signals of
talker of different talking styles are used as the training
data.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 08 | Aug 2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1214
In an isolated word recognition experiment using hidden
Markov models had reported that the error rate of a
recognizer, when tested with the speech of talkers
having various talking styles was reduced by more than a
factor of 2, from 17.5% using only normally spoken
reference pattern to 6.9% with multi style training.
Multi-style training seemed to be most effective with the
speech which exhibits the Lombard effect and with
speech produced under high emotional conditions.
Some of the methods and algorithm that have been
proposed to combat the unknown environment in which
speech recognizer must operate:
1. Signal enhancement pre-processing
When the adverse condition is alone due to the additive
noise, one can use speech enhancement methods to
suppress the noise before applying the recognition
algorithms.
One of the most widely used signal enhancement method
is adaptive noise cancellation using two signal sources.
Fig-3: Schematic of the two input noise cancelling
approach.
In the above figure x is the clean signal whereas y1 and y2
are two noise signals and z is the noisy signal. The
technique adaptively adjusts the filter coefficient so that
on subtracting the filtered (noise) reference input from
the primary (speech plus noise) input leads to an output
signal with minimum energy. This technique requires
noise component in the corrupted signal and the
reference noise have high coherence.
2. Special transducer arrangement
A noise cancelling microphone is a specially designed
microphone in which both sides of the diaphragm are
exposed to the sound field in such a way that sounds
coming from a relatively large distance (far field) are
cancelled because the sound pressure causes virtually no
net force on the diaphragm. For sound sources close to
the Microphone, the back of the diaphragm is effectively
shaded from the sound field and the sound pressure is
received only by the front of the diaphragm. It can be
effective in suppressing low frequency noises and can be
used in Car, automobile or aircraft cockpit environment.
3. Noise masking and adaptive models
In the presence of broadband noise, certain regions of
the speech spectrum which are of lower level will be
more affected by the noise. This makes the calculation of
the spectral distortion difficult as more corrupted
regions represent less reliable spectral measurements.
By recognizing this difficulty, the use of noise masking in
conjunction with a filter-bank analyser is suggested. The
key idea is to choose first, for each channel of the filter-
bank output, the masking noise level is the greater of the
noise level in the reference signal and that in the testing
signal, and then replace that channel output by the mask
value if it is below the corresponding mask level. This
helps prevent spurious distortion accumulation because
those channels that are determined to have been
seriously corrupted by noise will have the same spectral
value in both the training and the testing tokens. The
technique of noise compensation was also employed by
adapting the spectral prototypes to the noise condition
in the autocorrelation domain. The underlying
assumption, unlike the above masking model, was that
the power spectra of speech and noise are additive and
so are the autocorrelations.
4. Stress compensation
The purpose of stress compensation is to provide offset
for the spectral distortion caused by extraordinary
speaking effort due to the talker’s reaction to ambient
conditions. Because of the difficulties in modeling such
characteristic changes, no analytical result is available.
Several proposed heuristic techniques are noteworthy
because of their effectiveness. A traditional template-
based system that incorporates both vector quantization
and dynamic time warping. Another stress compensation
technique is operating in the cepstral domain.
5. Robust distortion measures
The idea of robust distortion measures is to emphasize
selectively and automatically the distortion pertaining to
certain regions of the spectrum that are less corrupted
by noise. A noise compensation scheme can be
interpreted as implicitly defining a robust distortion.
Many distortion measures are there:
1. Cepstral distance
2. Likelihood ratio distortion
3. Weighted Likelihood Ratio distortion
4. Asymmetrically Weighted Likelihood Ratio distortion
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 08 | Aug 2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1215
6. Novel representation of speech
Apart from the above discussed schemes and algorithms,
there are other attempts to compensate the noise
problem by finding characteristic representations of
speech that are invariant or resistant to noise
corruption. In other words, the approach is to find a
robust “front-end” that will reliably produce
measurements of speech even in the presence of noise.
There are two attempts; one is from a signal processing
viewpoint and the other tries to duplicate the human
auditory capability.
IV CONCLUSION
Different approaches and algorithms have been
proposed to cope with ambient noise, distortions and
other noise-induced problems in speech recognition
system. These approaches can be categorized as: signal
enhancement preprocessing, special transducer
arrangements, noise masking, stress compensation,
robust distortion measures and novel representations of
speech. Proper adoption of these methods proves
beneficial when dealing with noisy recognition problems.
V REFERENCES
[1]. Chavan, Karishma, and Ujwalla Gawande.
"Speech recognition in noisy environment,
issues and challenges: A review." In Soft-
Computing and Networks Security (ICSNS),
2015 International Conference on, pp. 1-5. IEEE,
2015.
[2]. Rajasekaran, P., G. Doddington, and J. Picone.
"Recognition of speech under stress and in
noise." In Acoustics, Speech, and Signal
Processing, IEEE International Conference on
ICASSP'86., vol. 11, pp. 733-736. IEEE, 1986.
[3]. Starks, David Ross. Speech recognition in
adverse environments: Improvements to
IMELDA. University of Ottawa (Canada), 1995.
[4]. Rabiner, Lawrence R., and Biing-Hwang Juang.
Fundamentals of speech recognition. Vol. 14.
Englewood Cliffs: PTR Prentice Hall, 1993.
[5]. Shrawankar, Urmila, and Vilas M. Thakare.
"Adverse conditions and ASR techniques for
robust speech user interface." arXiv preprint
arXiv:1303.5515 (2013)
[6]. Juang, B. H. "Speech recognition in adverse
environments." Computer speech & language 5,
no. 3 (1991): 275-294.
[7]. Rahim, Mazin G., Biing-Hwang Juang, Wu Chou,
and Eric Buhrke. "Signal conditioning
techniques for robust speech recognition." IEEE
Signal Processing Letters 3, no. 4 (1996): 107-
109.

More Related Content

PDF
IRJET- Survey on Efficient Signal Processing Techniques for Speech Enhancement
PDF
Analysis of Suitable Extraction Methods and Classifiers For Speaker Identific...
PDF
50120140502007
PDF
Implementation of Digital Hearing AID for Sensory Neural Impairment
PDF
Implementation of a Digital Hearing Aid with User-Settable Frequency Response...
PDF
A Survey On Audio Watermarking Methods
PDF
A NOVEL METHOD FOR OBTAINING A BETTER QUALITY SPEECH SIGNAL FOR COCHLEAR IMPL...
PDF
IRJET- A Review on Microstrip Filter for the Application in Communication System
IRJET- Survey on Efficient Signal Processing Techniques for Speech Enhancement
Analysis of Suitable Extraction Methods and Classifiers For Speaker Identific...
50120140502007
Implementation of Digital Hearing AID for Sensory Neural Impairment
Implementation of a Digital Hearing Aid with User-Settable Frequency Response...
A Survey On Audio Watermarking Methods
A NOVEL METHOD FOR OBTAINING A BETTER QUALITY SPEECH SIGNAL FOR COCHLEAR IMPL...
IRJET- A Review on Microstrip Filter for the Application in Communication System

What's hot (6)

PDF
Coad_M_GoForth2Column
PDF
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
DOC
Types of audio conferencing and its uses
PPTX
Video tele conferencing
PDF
LNCS 5448 - A First Look at Media Conferencing Traffic in the ...
PDF
Adaptive wavelet thresholding with robust hybrid features for text-independe...
Coad_M_GoForth2Column
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
Types of audio conferencing and its uses
Video tele conferencing
LNCS 5448 - A First Look at Media Conferencing Traffic in the ...
Adaptive wavelet thresholding with robust hybrid features for text-independe...
Ad

Similar to IRJET- Adverse Environment and its Compensation Approaches in Speech Recognition (20)

PDF
Review Paper on Noise Reduction Using Different Techniques
PDF
01 8445 speech enhancement
PDF
SPEECH ENHANCEMENT USING KERNEL AND NORMALIZED KERNEL AFFINE PROJECTION ALGOR...
PDF
Kc3517481754
PDF
Effect of Speech enhancement using spectral subtraction on various noisy envi...
PDF
WEARABLE VIBRATION-BASED DEVICE FOR HEARING-IMPAIRED PEOPLE USING ACOUSTIC SC...
PDF
seminar
PDF
Performance enhancement of dct based speaker recognition using wavelet de noi...
PDF
Speaker Recognition System using MFCC and Vector Quantization Approach
PDF
PDF
PDF
IRJET- A Review for Reduction of Noise by Wavelet Transform in Audio Signals
PDF
Development of Algorithm for Voice Operated Switch for Digital Audio Control ...
PDF
A Review on Speech Enhancement System using FIR Filter
PDF
IRJET- Assessment of Vehicular Noise Pollution at Different Squares in Jabalp...
PDF
Dy36749754
PDF
IRJET- Assessment of Noise Pollution in Commercial and Residential Areas and ...
PDF
Teleconferencing plugin for noisy environment
PDF
Nd2421622165
PDF
article_EURASIP
Review Paper on Noise Reduction Using Different Techniques
01 8445 speech enhancement
SPEECH ENHANCEMENT USING KERNEL AND NORMALIZED KERNEL AFFINE PROJECTION ALGOR...
Kc3517481754
Effect of Speech enhancement using spectral subtraction on various noisy envi...
WEARABLE VIBRATION-BASED DEVICE FOR HEARING-IMPAIRED PEOPLE USING ACOUSTIC SC...
seminar
Performance enhancement of dct based speaker recognition using wavelet de noi...
Speaker Recognition System using MFCC and Vector Quantization Approach
IRJET- A Review for Reduction of Noise by Wavelet Transform in Audio Signals
Development of Algorithm for Voice Operated Switch for Digital Audio Control ...
A Review on Speech Enhancement System using FIR Filter
IRJET- Assessment of Vehicular Noise Pollution at Different Squares in Jabalp...
Dy36749754
IRJET- Assessment of Noise Pollution in Commercial and Residential Areas and ...
Teleconferencing plugin for noisy environment
Nd2421622165
article_EURASIP
Ad

More from IRJET Journal (20)

PDF
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
PDF
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
PDF
Kiona – A Smart Society Automation Project
PDF
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
PDF
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
PDF
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
PDF
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
PDF
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
PDF
BRAIN TUMOUR DETECTION AND CLASSIFICATION
PDF
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
PDF
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
PDF
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
PDF
Breast Cancer Detection using Computer Vision
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
PDF
Auto-Charging E-Vehicle with its battery Management.
PDF
Analysis of high energy charge particle in the Heliosphere
PDF
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Kiona – A Smart Society Automation Project
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
BRAIN TUMOUR DETECTION AND CLASSIFICATION
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Breast Cancer Detection using Computer Vision
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
Auto-Charging E-Vehicle with its battery Management.
Analysis of high energy charge particle in the Heliosphere
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...

Recently uploaded (20)

PPTX
Software Engineering and software moduleing
PPTX
Chemical Technological Processes, Feasibility Study and Chemical Process Indu...
PPT
INTRODUCTION -Data Warehousing and Mining-M.Tech- VTU.ppt
PPTX
Feature types and data preprocessing steps
PDF
22EC502-MICROCONTROLLER AND INTERFACING-8051 MICROCONTROLLER.pdf
PPTX
ASME PCC-02 TRAINING -DESKTOP-NLE5HNP.pptx
PDF
EXPLORING LEARNING ENGAGEMENT FACTORS INFLUENCING BEHAVIORAL, COGNITIVE, AND ...
PDF
Design Guidelines and solutions for Plastics parts
PDF
Soil Improvement Techniques Note - Rabbi
PDF
August -2025_Top10 Read_Articles_ijait.pdf
PPTX
Fundamentals of Mechanical Engineering.pptx
PPTX
Information Storage and Retrieval Techniques Unit III
PPTX
Module 8- Technological and Communication Skills.pptx
PDF
Visual Aids for Exploratory Data Analysis.pdf
PDF
August 2025 - Top 10 Read Articles in Network Security & Its Applications
PDF
Exploratory_Data_Analysis_Fundamentals.pdf
PPTX
Management Information system : MIS-e-Business Systems.pptx
PDF
Level 2 – IBM Data and AI Fundamentals (1)_v1.1.PDF
PDF
null (2) bgfbg bfgb bfgb fbfg bfbgf b.pdf
PPTX
CyberSecurity Mobile and Wireless Devices
Software Engineering and software moduleing
Chemical Technological Processes, Feasibility Study and Chemical Process Indu...
INTRODUCTION -Data Warehousing and Mining-M.Tech- VTU.ppt
Feature types and data preprocessing steps
22EC502-MICROCONTROLLER AND INTERFACING-8051 MICROCONTROLLER.pdf
ASME PCC-02 TRAINING -DESKTOP-NLE5HNP.pptx
EXPLORING LEARNING ENGAGEMENT FACTORS INFLUENCING BEHAVIORAL, COGNITIVE, AND ...
Design Guidelines and solutions for Plastics parts
Soil Improvement Techniques Note - Rabbi
August -2025_Top10 Read_Articles_ijait.pdf
Fundamentals of Mechanical Engineering.pptx
Information Storage and Retrieval Techniques Unit III
Module 8- Technological and Communication Skills.pptx
Visual Aids for Exploratory Data Analysis.pdf
August 2025 - Top 10 Read Articles in Network Security & Its Applications
Exploratory_Data_Analysis_Fundamentals.pdf
Management Information system : MIS-e-Business Systems.pptx
Level 2 – IBM Data and AI Fundamentals (1)_v1.1.PDF
null (2) bgfbg bfgb bfgb fbfg bfbgf b.pdf
CyberSecurity Mobile and Wireless Devices

IRJET- Adverse Environment and its Compensation Approaches in Speech Recognition

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 08 | Aug 2018 www.irjet.net p-ISSN: 2395-0072 © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1212 ADVERSE ENVIRONMENT AND ITS COMPENSATION APPROACHES IN SPEECH RECOGNITION Saumya Sah1 1Department of Electronics and Communication Engineering,Birla Institute of Technology, Mesra, Ranchi, India -----------------------------------------------------------------------***-------------------------------------------------------------------- ABSTRACT: The paper focuses on the causes of the low performances of the speech recognition system due to the stressful and noisy environment in real time applications like car, airplane cockpit, etc. This paper deals with the major problems like Adverse Environment and other issues that is noise, distortion of speech recognition system which causes the failure in real time applications. The adverse environment is simply the mismatched between the environments of training and testing of the speech signal. The paper briefly summarized different approaches to deals with the adverse environment. Key Words: Cause of speech variation, Noise, Source of Noise, Distortion, Adverse Environment. 1. INTRODUCTION The voice of the human being is the primary mode of communication among humans. The researchers are working to improve the accuracy level so that the voice or speech could be use in real time applications [1]. Speech Recognition is the process of enabling a computer to identify and respond to the sounds produced by human in their speech. Speech recognition system often fails to provide an excellent accuracy level in real applications whereas it provides a high accuracy in experimental level. This is mainly occurring due to the speech recognition system which is trained in normal and clean environment but applied in stressful and noisy environment [2]. Most of the speech recognition system degrades rapidly in their performance due to the presence of noise and distortion in the speech signal. The speech recognition systems are designed by keeping in mind the prediction or assumptions of the ambient conditions, like background speech style, transducers, low noise, channels etc. [2, 3]. The different causes of the variation in speech due to which the distortion occurred are discussed in below section [4]. 1.1 Causes of Speech Variation: Few causes of speech variations are discussed below. Mainly there are three parameters over which the speech of human can be varied from one another and could produce noise in the speech. These are: a) Environment: It is the acoustic properties of the environment including the impact on a speaker’s voice [5]. There are two type of environment noise: Speech Correlated noise and uncorrelated noise. b) Speaker Speaker itself could be the cause of variation in speech. There may be two reasons for this noise to produce. These are: Attributes of speakers which includes dialect, gender, age and the other one is Manner of speaking which includes breath and lip noise, stress, Lombard effect, rate of delivery, level and dynamic range, pitch, cooperativeness. c) Input Equipment Input Equipments are used to deliver human voice to a place distant to the user are one of the cause of speech variation. These are due to microphone, distance to microphone, filter, recording equipment and transmission system which includes channel distortion, noise, and echo. 2. ADVERSE ENVIRONMENT The use of the term “adverse environments” implies unknown, mismatched and often severe differences in variables and conditions between training and testing. A speech recognizer often encounters three main causes of adverse conditions: noise, distortion, and (human) articulation effects. 2.1 Noise By nature, noise is usually considered additive. It can be wideband (assumed to be Gaussian distributed and white) or coloured, stationary or non-stationary. High levels of ambient noise are one of the primary concerns for speech recognizer. 2.2 Source of Noise The source of additive noise can be found in several environments including the office, car and the aircraft cockpit. The sound pressure level (SPL) in a normal personal office is around 45-50dBA. In a business office it is around 15-20dB higher than the personal office. Inside an automobile, the noise level due to engine, cooling fan, wind tire and road is usually considerably high, particularly when the automobile is moving at a high speed. In the cockpit of a modern jet fighter aircraft, SPLs
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 08 | Aug 2018 www.irjet.net p-ISSN: 2395-0072 © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1213 of 90dB or more across the speech frequency band have been reported. There are many type of noises like electric noise and quantization noise, which are present inside of any electronic speech recognition system, are at a low level below the concern threshold. Noise due to transmissions and switching equipments in a telephone network is a factor which is affecting recognizer performance. At the noise level of equal to the noise level of cockpit, the speech signal is barely intelligible even to a human listener, not to mention an automatic speech recognition machine. 2.3 Distortion The speech signal before recorded and processed for speech recognition, undergoes a series of spectral distortions. The microphone transducer can also distort the speech spectrum depending on its type and mounting positions. The transducer configuration used in testing is different from that used during training of the reference patterns, the mismatch in spectral distortion becomes one of the major problems. It was reported that a large vocabulary speech recognition system with a performance of 85% word accuracy in a matched transducer condition could only achieve less than 19% word accuracy when a different microphone was used during testing. Telephones channels are characterized by a high frequency pre emphasis with significantly decrease the dynamic range of the speech signal and increase the signal correlation. There is a wide range of variation in attenuation in the frequency spectrum and this can cause a spectral mismatch unless the telephone channel is measured and adapted to beforehand. Figure below compares the amplitude of frequency response for two different telephone channels. The distance between the microphone and the speaker’s mouth can cause a poor SNR and the acoustics of the room or the microphone may be differ from the original set up in which the reference patterns were created. Fig-1: Frequency Response for two different telephone channels. Fig-2: Frequency Response for two different microphone transducers. 2.4 Articulation Effects There are many factors which affect the manner of speaking of individual talker. The main difficultly in dealing with the articulation effects is the lack of understanding that how to quantify them. In terms of spectral characteristics, we can measure or model the effects to improve the recognizer design with noise or channel distortion which usually does not vary as rapidly as the speech itself. For example, a white noise is usually specified by its power level, where as a channel is often represented by its frequency response. 2.5 Lombard Effect Lombard Effect is the characteristics changes in articulation due to environmental influence. The Lombard Effect is manifested in such changes as: 1. Louder speech (increases energy). 2. Energy migration in the frequency domain, F1 increases while F2 decreases. 3. Changes in spectral balance. A speaker dependent isolated word recognizer that had accuracy of 91% when trained and tested in clean condition could only achieve accuracy of 62% when the test utterances contained the Lombard Effect, even though the test tokens were free from masking noise. 3. DEALING WITH ADVERSE EFFECTS A speech recognizer that uses reference patterns trained from speech with the corrupting noise whose characteristics are approximately unknown performs more robustly than one that uses clean reference patterns. The key idea is to train the recognizer with a multi-style training schemes in which speech signals of talker of different talking styles are used as the training data.
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 08 | Aug 2018 www.irjet.net p-ISSN: 2395-0072 © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1214 In an isolated word recognition experiment using hidden Markov models had reported that the error rate of a recognizer, when tested with the speech of talkers having various talking styles was reduced by more than a factor of 2, from 17.5% using only normally spoken reference pattern to 6.9% with multi style training. Multi-style training seemed to be most effective with the speech which exhibits the Lombard effect and with speech produced under high emotional conditions. Some of the methods and algorithm that have been proposed to combat the unknown environment in which speech recognizer must operate: 1. Signal enhancement pre-processing When the adverse condition is alone due to the additive noise, one can use speech enhancement methods to suppress the noise before applying the recognition algorithms. One of the most widely used signal enhancement method is adaptive noise cancellation using two signal sources. Fig-3: Schematic of the two input noise cancelling approach. In the above figure x is the clean signal whereas y1 and y2 are two noise signals and z is the noisy signal. The technique adaptively adjusts the filter coefficient so that on subtracting the filtered (noise) reference input from the primary (speech plus noise) input leads to an output signal with minimum energy. This technique requires noise component in the corrupted signal and the reference noise have high coherence. 2. Special transducer arrangement A noise cancelling microphone is a specially designed microphone in which both sides of the diaphragm are exposed to the sound field in such a way that sounds coming from a relatively large distance (far field) are cancelled because the sound pressure causes virtually no net force on the diaphragm. For sound sources close to the Microphone, the back of the diaphragm is effectively shaded from the sound field and the sound pressure is received only by the front of the diaphragm. It can be effective in suppressing low frequency noises and can be used in Car, automobile or aircraft cockpit environment. 3. Noise masking and adaptive models In the presence of broadband noise, certain regions of the speech spectrum which are of lower level will be more affected by the noise. This makes the calculation of the spectral distortion difficult as more corrupted regions represent less reliable spectral measurements. By recognizing this difficulty, the use of noise masking in conjunction with a filter-bank analyser is suggested. The key idea is to choose first, for each channel of the filter- bank output, the masking noise level is the greater of the noise level in the reference signal and that in the testing signal, and then replace that channel output by the mask value if it is below the corresponding mask level. This helps prevent spurious distortion accumulation because those channels that are determined to have been seriously corrupted by noise will have the same spectral value in both the training and the testing tokens. The technique of noise compensation was also employed by adapting the spectral prototypes to the noise condition in the autocorrelation domain. The underlying assumption, unlike the above masking model, was that the power spectra of speech and noise are additive and so are the autocorrelations. 4. Stress compensation The purpose of stress compensation is to provide offset for the spectral distortion caused by extraordinary speaking effort due to the talker’s reaction to ambient conditions. Because of the difficulties in modeling such characteristic changes, no analytical result is available. Several proposed heuristic techniques are noteworthy because of their effectiveness. A traditional template- based system that incorporates both vector quantization and dynamic time warping. Another stress compensation technique is operating in the cepstral domain. 5. Robust distortion measures The idea of robust distortion measures is to emphasize selectively and automatically the distortion pertaining to certain regions of the spectrum that are less corrupted by noise. A noise compensation scheme can be interpreted as implicitly defining a robust distortion. Many distortion measures are there: 1. Cepstral distance 2. Likelihood ratio distortion 3. Weighted Likelihood Ratio distortion 4. Asymmetrically Weighted Likelihood Ratio distortion
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 08 | Aug 2018 www.irjet.net p-ISSN: 2395-0072 © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1215 6. Novel representation of speech Apart from the above discussed schemes and algorithms, there are other attempts to compensate the noise problem by finding characteristic representations of speech that are invariant or resistant to noise corruption. In other words, the approach is to find a robust “front-end” that will reliably produce measurements of speech even in the presence of noise. There are two attempts; one is from a signal processing viewpoint and the other tries to duplicate the human auditory capability. IV CONCLUSION Different approaches and algorithms have been proposed to cope with ambient noise, distortions and other noise-induced problems in speech recognition system. These approaches can be categorized as: signal enhancement preprocessing, special transducer arrangements, noise masking, stress compensation, robust distortion measures and novel representations of speech. Proper adoption of these methods proves beneficial when dealing with noisy recognition problems. V REFERENCES [1]. Chavan, Karishma, and Ujwalla Gawande. "Speech recognition in noisy environment, issues and challenges: A review." In Soft- Computing and Networks Security (ICSNS), 2015 International Conference on, pp. 1-5. IEEE, 2015. [2]. Rajasekaran, P., G. Doddington, and J. Picone. "Recognition of speech under stress and in noise." In Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP'86., vol. 11, pp. 733-736. IEEE, 1986. [3]. Starks, David Ross. Speech recognition in adverse environments: Improvements to IMELDA. University of Ottawa (Canada), 1995. [4]. Rabiner, Lawrence R., and Biing-Hwang Juang. Fundamentals of speech recognition. Vol. 14. Englewood Cliffs: PTR Prentice Hall, 1993. [5]. Shrawankar, Urmila, and Vilas M. Thakare. "Adverse conditions and ASR techniques for robust speech user interface." arXiv preprint arXiv:1303.5515 (2013) [6]. Juang, B. H. "Speech recognition in adverse environments." Computer speech & language 5, no. 3 (1991): 275-294. [7]. Rahim, Mazin G., Biing-Hwang Juang, Wu Chou, and Eric Buhrke. "Signal conditioning techniques for robust speech recognition." IEEE Signal Processing Letters 3, no. 4 (1996): 107- 109.