A Case Study on DSP (Speech Processing)

What is Speech Processing?
Speech processing is the application of digital signal processing (DSP) techniques to
the processing and analysis of speech signals.
Applications of speech processing include:
1. Speech Coding
2. Speech Recognition
3. Speech Verification
4. Speech Enhancement
5. Speech Synthesis

Process of Speech Production
The speech production process begins
when the talker formulates a message
in his/her mind to transmit to the listener
via speech. The next step is the
conversion of the message into a
language code. This corresponds to
converting the message into a set of
phoneme sequences corresponding to
the sounds that make up the words, along
with prosody markers denoting the duration
of sounds, the loudness of sounds, and
the pitch associated with the sounds.
Figure: Schematic diagram of the speech
production/perception process in human beings

Information Rate of the Speech Signal
First Stage
The discrete symbol
information rate in the raw
message text is rather low
(about 50 bits per second,
corresponding to about 8
sounds per second, where
each sound is one of
about 50 distinct
symbols). After the
language code conversion,
with the inclusion of
prosody information, the
information rate rises to
about 200 bps.
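
The first-stage figure can be checked with a short calculation. The sketch below is only an illustration: it assumes all 50 symbols are equally likely, which the slides do not state, and the function name is hypothetical.

```python
import math

SYMBOLS = 50            # distinct sounds (from the text)
SOUNDS_PER_SECOND = 8   # speaking rate (from the text)


def text_information_rate(num_symbols, symbols_per_second):
    """Bits per second, assuming every symbol is equally likely."""
    bits_per_symbol = math.log2(num_symbols)
    return bits_per_symbol * symbols_per_second


rate_bps = text_information_rate(SYMBOLS, SOUNDS_PER_SECOND)
print(f"{rate_bps:.1f} bps")
```

Under that assumption the result is a little over 45 bps, which is consistent with the "about 50 bps" quoted above.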
Second Stage
In the next stage the
representation of the
information in the signal
becomes continuous with
an equivalent rate of about
2000 bps at the
neuromuscular control
level and about
30000-50000 bps at the
acoustic signal level.
Third Stage
The continuous
information rate at the
basilar membrane is in the
range of 30000-50000
bps, while at the neural
transduction stage it is about
2000 bps.
Fourth Stage
The higher-level
processing within the
brain converts the neural
signals to a discrete
representation, which
ultimately is decoded into
a low bit rate message.

Classification of Speech Sound
Type 1: VOICED speech is produced when the vocal cords play an active role in the
production of sound. Typical pitch (fundamental frequency) ranges:
• 50-200 Hz for male speakers
• 150-300 Hz for female speakers
• 200-400 Hz for child speakers
Example: Voiced sounds (A), (E), (I).
Type 2: UNVOICED speech is produced when the vocal cords are inactive.
The vocal cords are held open and air flows continuously through them.
Example: Unvoiced sounds (S), (F).
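
One common way to separate the two classes in practice is the zero-crossing rate: noise-like unvoiced frames change sign far more often than periodic voiced frames. For a voiced frame, the pitch can then be estimated from the peak of the autocorrelation. The sketch below is a minimal illustration, not a method taken from the slides. The 8 kHz sampling rate, the 0.25 threshold, the 40 ms frame, the synthetic test signals, and all function names are assumptions.

```python
import math
import random

FS = 8000  # sampling rate in Hz (assumed)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def is_voiced(frame, zcr_threshold=0.25):
    """Label a frame as voiced when its zero-crossing rate is low (assumed threshold)."""
    return zero_crossing_rate(frame) < zcr_threshold


def estimate_pitch(frame, fs=FS, f_min=50, f_max=400):
    """Return the pitch in Hz at the autocorrelation peak within [f_min, f_max]."""
    best_lag, best_value = None, float("-inf")
    for lag in range(fs // f_max, fs // f_min + 1):
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return fs / best_lag


# Synthetic stand-ins for real speech: a 120 Hz tone and uniform noise.
n_samples = int(0.040 * FS)  # 40 ms, long enough to hold two periods at 50 Hz
voiced_like = [math.sin(2 * math.pi * 120 * n / FS) for n in range(n_samples)]
rng = random.Random(0)
unvoiced_like = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

print(is_voiced(voiced_like), is_voiced(unvoiced_like))
print(round(estimate_pitch(voiced_like)), "Hz")
```

The search range of 50-400 Hz covers the male, female, and child pitch ranges listed above. Because the lag is an integer number of samples, the estimate is only accurate to within a few hertz at this sampling rate.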

Formant Frequencies
Speech normally exhibits about one formant frequency
per 1 kHz of bandwidth. For VOICED speech, the magnitude of
the lower formant frequencies is successively larger
than the magnitude of the higher formant frequencies.
For UNVOICED speech, the magnitude of the higher
formant frequencies is successively larger than the
magnitude of the lower formant frequencies.

Basic Assumption of Speech Processing
Parameters & Speech Sound
1. Phonemes are the smallest segments of speech sounds. For example, /d/ and /b/ are
distinct phonemes, as in "dark" and "bark".
2. It is important to realize that phonemes are abstract linguistic units and may
not be directly observed in the speech signal.
3. Different speakers producing the same string of phonemes convey the same
information yet sound different as a result of differences in dialect and vocal
tract length and shape.
4. There are about 40 phonemes in English.
5. The IPA (International Phonetic Alphabet) table gives the symbol for each
phoneme together with sample words in which it occurs.

Model for Speech Production
To develop an accurate model of how speech is produced, it is necessary to develop a digital
filter-based model of the human speech production mechanism. The model must contain 4 steps:
Steps of Speech Production
Operation of the Vocal Tract
Lip/Nasal Radiation Process
Both Voiced and Unvoiced Speech
Time Frame: 10-20 ms
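
As a rough illustration of how these pieces could fit together, the sketch below generates one 20 ms frame by passing an excitation signal through a vocal tract filter and a radiation filter. It is not the model from the slides. The 8 kHz sampling rate, the impulse-train and noise excitations, the single 500 Hz resonance standing in for the vocal tract, the first-difference radiation filter, and all function names are assumptions chosen to keep the example short.

```python
import math
import random

FS = 8000  # sampling rate in Hz (assumed)


def excitation(n_samples, voiced, pitch_hz=100, seed=0):
    """Impulse train for voiced frames, uniform noise for unvoiced frames."""
    if voiced:
        period = round(FS / pitch_hz)
        return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def vocal_tract(x, formant_hz=500, bandwidth_hz=100):
    """Two-pole resonator: one formant as a stand-in for the vocal tract."""
    r = math.exp(-math.pi * bandwidth_hz / FS)   # pole radius (< 1, so stable)
    theta = 2 * math.pi * formant_hz / FS        # pole angle
    a1, a2 = -2 * r * math.cos(theta), r * r
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        out = sample - a1 * y1 - a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y


def radiation(x):
    """First difference, a simple approximation of lip radiation."""
    y, prev = [], 0.0
    for sample in x:
        y.append(sample - prev)
        prev = sample
    return y


def synthesize_frame(voiced, frame_ms=20):
    """Produce one frame, with all parameters held constant over the frame."""
    n_samples = int(FS * frame_ms / 1000)
    return radiation(vocal_tract(excitation(n_samples, voiced)))


voiced_frame = synthesize_frame(voiced=True)
unvoiced_frame = synthesize_frame(voiced=False)
print(len(voiced_frame), len(unvoiced_frame))
```

The parameters are fixed for the whole frame, which mirrors the 10-20 ms time frame above. A fuller model would use several resonances and update them from frame to frame.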

Overall Speech Production Model
