Unit 6 Speech Signal
DR MINAKSHI PRADEEP ATRE
PVG’S COET & GKPIM PUNE
References
Book: Speech and Audio Processing by Dr. Shaila Apte
PDF document: http://cs.haifa.ac.il/~nimrod/Compression/Speech/S1Basics2010.pdf
For speech samples: https://www.signalogic.com/index.pl?page=speech_codec_wav_samples
Contents
Speech:
1. Basics of speech signal and its features
2. LTI representation of speech signal
3. LTV representation of speech signal
4. Estimation of fundamental frequency
5. Identification of voiced and unvoiced speech
6. Noise removal
Speech
Speech is a naturally produced signal and is therefore random in nature
It is necessary to understand generalized human speech production
Simple linear time-invariant (LTI) model of speech production
Inherently time-varying nature of speech
Introduction to the linear time-varying (LTV) model of speech
Types of speech sounds: consonants, fricatives
Voiced and unvoiced (V/UV) speech
Speech Production Mechanism: Pipeline Model
[Figure: the vocal tract]
Vocal Tract
The vocal tract is the cavity between the vocal cords and the lips. It acts as a resonator that spectrally shapes the periodic input, much like the cavity of a musical wind instrument.
A simple model of a steady-state vowel regards the vocal tract as a linear time-invariant (LTI) filter driven by a periodic, impulse-like input.
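The steady-state vowel model can be sketched in Python. The output of an LTI filter is the convolution of its impulse response with the input, so a periodic impulse train produces an output with the same period. The 8 kHz sampling rate, the 80-sample pitch period and the damped 500 Hz impulse response are illustrative assumptions, not values from the slides.

```python
import math

def impulse_train(num_samples, period):
    """Periodic, impulse-like excitation: a unit impulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(num_samples)]

def convolve(x, h):
    """Linear convolution y[n] = sum_k h[k] * x[n - k], truncated to len(x) samples."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        for k, hk in enumerate(h):
            if n - k < 0:
                break          # x[n - k] is zero for negative indices
            y[n] += hk * x[n - k]
    return y

FS = 8000        # sampling rate in Hz (assumed)
PERIOD = 80      # pitch period: 80 samples at 8 kHz = 10 ms, i.e. 100 Hz (assumed)

# Hypothetical vocal tract impulse response: one damped 500 Hz resonance, 64 samples long.
h = [math.exp(-0.01 * n) * math.cos(2 * math.pi * 500 * n / FS) for n in range(64)]

x = impulse_train(400, PERIOD)
y = convolve(x, h)

# h is shorter than one pitch period, so every period of y is an exact copy of h
# followed by zeros: the output repeats with the period of the excitation.
print(all(y[n] == y[n + PERIOD] for n in range(len(y) - PERIOD)))   # True
```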
What is a speech signal?
It is created at the vocal cords, travels through the vocal tract, and is radiated from the speaker's mouth
It reaches the listener's ear as a pressure wave
It is non-stationary, but can be divided into sound segments that share common acoustic properties over a short time interval
Two major classes of phonemes: vowels and consonants
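Speech is non-stationary, so analysis is usually done on short frames over which its properties are roughly constant. Below is a minimal framing helper. The 8 kHz sampling rate, the 20 ms frame length and the 10 ms hop are assumed, commonly used values; these slides do not fix them.

```python
def frame_signal(x, frame_len, hop):
    """Split x into (possibly overlapping) short-time frames; a trailing partial frame is dropped."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [x[start:start + frame_len]
            for start in range(0, len(x) - frame_len + 1, hop)]

FS = 8000                        # assumed sampling rate (Hz)
frame_len = FS * 20 // 1000      # 20 ms -> 160 samples
hop = FS * 10 // 1000            # 10 ms -> 80 samples (50% overlap)

x = list(range(FS))              # stand-in for 1 s of speech samples
frames = frame_signal(x, frame_len, hop)
print(len(frames), len(frames[0]))   # 99 160
```

Each frame can then be treated as a quasi-stationary segment and analyzed on its own.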
Phonemes
The basic sounds of a language (e.g., the "a" in the word "father") are called phonemes
A typical speech utterance consists of a string of vowel and consonant phonemes whose temporal and spectral characteristics change with time
In addition, the time-varying source and system can interact nonlinearly in complex ways: our simple model is adequate for a steady vowel, but the sounds of speech are not always well represented by linear time-invariant systems.
Vowel Production
In vowel production, air is forced from the lungs by contraction of the muscles around the lung cavity
Air flows through the vocal cords, which are two masses of flesh, and causes them to vibrate periodically; the rate of vibration gives the pitch of the sound
The resulting periodic puffs of air act as the excitation input, or source, to the vocal tract
Typical Vowels
Speech Production
A sound source excites a (vocal tract) filter
◦ Voiced: Periodic source, created by vocal cords
◦ Unvoiced: Aperiodic and noisy source
Pitch is the fundamental frequency of vocal cord vibration (also called F0); it is followed by 4-5 formants (F1-F5) at higher frequencies
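For a vocal tract modeled as a uniform tube about 17.5 cm long, closed at the glottis and open at the lips, the natural frequencies occur at odd multiples of about 500 Hz (500, 1500, 2500 Hz, ...).
These resonant frequencies are called formants.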
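The odd-multiples-of-500-Hz rule comes from treating the tract as a quarter-wavelength resonator, with resonances F_n = (2n - 1) c / (4L). Here is a small sketch. It assumes a speed of sound c = 350 m/s and a tract length L = 17.5 cm, and the function name is hypothetical.

```python
def tube_resonances(length_m, num_formants=5, c=350.0):
    """Resonances of a uniform tube closed at the glottis and open at the lips.

    F_n = (2n - 1) * c / (4 * L): odd multiples of the quarter-wavelength frequency.
    c is the speed of sound in m/s (assumed 350 m/s for warm, moist air).
    """
    base = c / (4.0 * length_m)
    return [(2 * n - 1) * base for n in range(1, num_formants + 1)]

print([round(f) for f in tube_resonances(0.175)])   # [500, 1500, 2500, 3500, 4500]
```

A shorter vocal tract raises every resonance in proportion. This is consistent with the higher adult female formant values in the table below.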
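Typical formant frequencies for selected vowels (Hz):
Vowel | Adult male F1 | Adult male F2 | Adult male F3 | Adult female F1 | Adult female F2 | Adult female F3
(i) | 255 | 2330 | 3000 | 340 | 2610 | 3210
(u) | 290 | 940 | 2180 | 390 | 995 | 2585
(ae) | 735 | 1625 | 2465 | 950 | 1955 | 2900
The table lists the first three formants (F1, F2, F3) of each vowel for adult male and adult female speakers.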
LTI Model for speech production
[Figure: block diagram of the LTI model. An impulse train generator (glottis, periodic) or a random signal generator excites the vocal tract, characterized by its impulse response, to produce the generated speech.]
The impulse train generator is used as the excitation signal when a voiced segment, such as a vowel (e.g., "a"), is produced.
Basic assumption: the source of excitation and the vocal tract system are independent.
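The voiced branch of the LTI model can be sketched as an impulse train (the glottis) driving a vocal tract filter. In this sketch the vocal tract is a cascade of three two-pole resonators placed at the adult male formants of vowel (i) from the table above. The formant bandwidths, the 100 Hz pitch and the 8 kHz sampling rate are all assumptions. A resonator cascade is one common way to realize a formant filter, not necessarily the one used in this course.

```python
import math

def resonator_coeffs(freq_hz, bw_hz, fs):
    """Coefficients (a1, a2) of a two-pole resonator with poles at r*exp(+/-j*theta).

    r = exp(-pi * B / fs) sets the bandwidth B; theta = 2*pi*F / fs sets the centre frequency F.
    """
    r = math.exp(-math.pi * bw_hz / fs)
    theta = 2 * math.pi * freq_hz / fs
    return -2 * r * math.cos(theta), r * r

def all_pole_filter(x, a1, a2):
    """Second-order all-pole filter: y[n] = x[n] - a1*y[n-1] - a2*y[n-2]."""
    y, y1, y2 = [], 0.0, 0.0
    for xn in x:
        yn = xn - a1 * y1 - a2 * y2
        y.append(yn)
        y1, y2 = yn, y1
    return y

def synthesize_voiced(formants, bandwidths, f0, fs, num_samples):
    """Impulse-train excitation passed through a cascade of formant resonators (gain not normalized)."""
    period = round(fs / f0)
    signal = [1.0 if n % period == 0 else 0.0 for n in range(num_samples)]
    for f, b in zip(formants, bandwidths):
        a1, a2 = resonator_coeffs(f, b, fs)
        signal = all_pole_filter(signal, a1, a2)
    return signal

FS = 8000
# F1-F3 of vowel (i), adult male, from the table; bandwidths (60, 100, 120 Hz) and F0 are assumed.
speech = synthesize_voiced([255, 2330, 3000], [60, 100, 120], f0=100, fs=FS, num_samples=FS // 2)
```

Because r < 1 for any positive bandwidth, each resonator is stable. Swapping in the F1-F3 values of another row of the table changes the vowel, while the excitation stays the same.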
LTI Model for speech production (continued)
The random signal generator is used as the excitation signal when an unvoiced segment, such as the consonant "s", is produced.
The LTI model is applied to a short segment of speech (about 10 ms) over which the vocal tract parameters can be assumed constant.
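The switch between the two generators can be sketched as follows, together with the 10 ms segment length mentioned above. Three choices here are assumptions: the 8 kHz sampling rate, the 80-sample pitch period, and white Gaussian noise as the random generator. The function name is hypothetical. Either excitation would then be passed through the vocal tract filter.

```python
import random

def excitation(voiced, num_samples, period=80, seed=None):
    """Excitation switch of the LTI model.

    voiced=True  -> impulse train with the given pitch period (in samples)
    voiced=False -> zero-mean, unit-variance white Gaussian noise
    """
    if voiced:
        return [1.0 if n % period == 0 else 0.0 for n in range(num_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_samples)]

FS = 8000                       # assumed sampling rate (Hz)
SEG = FS * 10 // 1000           # one 10 ms analysis segment = 80 samples at 8 kHz

voiced_exc = excitation(True, 4 * SEG)              # e.g. vowel "a": impulses at 0, 80, 160, 240
unvoiced_exc = excitation(False, 4 * SEG, seed=1)   # e.g. "s": no periodic structure
```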
Nature of Speech Signal
Speech is generated by components such as the vocal cords and the vocal tract
A speech signal does not arise on its own; it depends on these components, and speech is a random signal
Speech can be described by an essentially unlimited number of features, like the parable of the blind men who each touch a different part of an elephant and describe it differently, so characterizing speech is a complex problem
Humans can utter different words because they can change the resonant modes of the vocal cavity and can also stretch the vocal cords to some extent to modify the pitch period for different vowels
This is why a linear time-varying (LTV) model is needed
Linear Time-Varying Model: Speech Production
[Figure: block diagram of the LTV model. An impulse train generator or a random signal generator, scaled by an amplitude gain, excites the vocal tract impulse response to produce the generated speech.]
Pitch period is variable
Impulse response is variable
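In the LTV model, both the excitation and the filter change from segment to segment. The sketch below shows only the excitation side: an impulse train whose pitch period is updated every frame. The function name, the 160-sample frame length and the period contour are hypothetical. The vocal tract coefficients could be switched per frame in the same way.

```python
def ltv_impulse_train(frame_periods, frame_len):
    """Impulse train whose pitch period may change from frame to frame.

    After each pulse, the next pulse is placed one pitch period later, using the
    period of the frame that contains the current pulse.
    """
    total = frame_len * len(frame_periods)
    x = [0.0] * total
    pos = 0
    while pos < total:
        x[pos] = 1.0
        pos += frame_periods[pos // frame_len]
    return x

# Hypothetical contour: periods of 80, 90 and 100 samples, i.e. 100 Hz, ~89 Hz and 80 Hz at 8 kHz.
x = ltv_impulse_train([80, 90, 100], frame_len=160)
pulses = [n for n, v in enumerate(x) if v]
print(pulses)   # [0, 80, 160, 250, 340, 440]
```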
Speech Sound Categories
Periodic (Sonorants, Voiced)
Noisy (Fricatives, Unvoiced)
Impulsive (Plosive)
Example:
In the word “shop,” the “sh,” “o,” and “p” are generated from a
noisy, periodic, and impulsive source, respectively
Frequency Range
Speech:
Pitch frequency:
◦ male ~ 85-155 Hz;
◦ female ~ 165-255 Hz;
Singers' vocal range, from bass to soprano: 80-1100 Hz
Pitch
Pitch period: The time duration of one glottal cycle
Pitch (fundamental frequency): The reciprocal of the pitch period.
Remember: pitch is calculated only for voiced segments
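Pitch is the reciprocal of the pitch period, so the pitch ranges listed above translate directly into a range of periods in samples. That range is what a pitch detector actually searches. A sketch assuming an 8 kHz sampling rate (function names are hypothetical):

```python
import math

def pitch_from_period(period_s):
    """Pitch (fundamental frequency, Hz) is the reciprocal of the pitch period (seconds)."""
    return 1.0 / period_s

def lag_range(f0_min, f0_max, fs):
    """Shortest and longest pitch period, in samples, for a pitch range [f0_min, f0_max] Hz."""
    return math.floor(fs / f0_max), math.ceil(fs / f0_min)

print(pitch_from_period(0.010))     # 100.0  (a 10 ms glottal cycle)
print(lag_range(85, 155, 8000))     # male range   -> (51, 95)
print(lag_range(165, 255, 8000))    # female range -> (31, 49)
```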
Pitch Detection
The pitch period and V/UV decisions are fundamental to many speech coders
Many methods exist for estimating the pitch, for example:
◦ Autocorrelation function
◦ Zero-crossing rate (ZCR)
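A minimal autocorrelation pitch detector picks the lag with the strongest autocorrelation inside the plausible pitch range. The zero-crossing rate is included here as a voiced/unvoiced cue, which is how it is most often used. Several pieces are hypothetical choices: the thresholds in `is_voiced`, the 80-400 Hz search range, the function names and the synthetic test frames. Practical detectors add refinements such as center clipping or normalization.

```python
import math
import random

def autocorr(frame, lag):
    """Short-time autocorrelation r(k) = sum_n x[n] * x[n + k] over the frame."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))

def estimate_pitch(frame, fs, f0_min=80.0, f0_max=400.0):
    """Autocorrelation pitch estimate: the lag with the largest r(k) in the allowed range."""
    min_lag = math.floor(fs / f0_max)
    max_lag = min(math.ceil(fs / f0_min), len(frame) - 1)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda k: autocorr(frame, k))
    return fs / best_lag

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def is_voiced(frame, zcr_max=0.3, energy_min=0.01):
    """Hypothetical V/UV rule: voiced means a low zero-crossing rate and enough energy."""
    energy = sum(v * v for v in frame) / len(frame)
    return zero_crossing_rate(frame) < zcr_max and energy > energy_min

FS = 8000
# Synthetic 50 ms "voiced" frame: a decaying pulse repeated every 80 samples (100 Hz).
voiced = [0.9 ** (n % 80) for n in range(400)]
# Synthetic "unvoiced" frame: white Gaussian noise.
rng = random.Random(0)
unvoiced = [rng.gauss(0.0, 1.0) for _ in range(400)]

print(estimate_pitch(voiced, FS))   # 100.0
print(is_voiced(voiced))            # True
print(is_voiced(unvoiced))
```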
Features or categorization of speech sounds
Speech sounds are studied and classified from the following
perspectives:
1) The nature of the source: periodic, noisy, or impulsive, and
combinations of the three
2) The shape of the vocal tract
3) The time-domain waveform, which gives the pressure change with time at the lips
4) The time-varying spectral characteristics revealed through the
spectrogram
Spectrogram
Time-varying spectral characteristics of the speech signal can be displayed graphically as a two-dimensional pattern
Vertical axis: frequency; horizontal axis: time
The pseudo-color of the pattern is proportional to signal energy (red: high energy)
The resonance frequencies of the vocal tract show up as "energy bands"
Voiced intervals have a striated appearance (due to the periodicity of the signal)
Unvoiced intervals are more solidly filled in
In the example spectrogram, the formants appear in yellow
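A spectrogram is a sequence of short-time magnitude spectra. The sketch below computes one with a Hann window and a direct DFT, so it runs with the standard library alone. The 256-sample frame (32 ms at 8 kHz), the 128-sample hop and the test tone are assumptions. A real implementation would use an FFT, for example `numpy.fft.rfft`.

```python
import cmath
import math

def hann(N):
    """Periodic Hann window of length N."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]

def dft_magnitude(frame):
    """|X[k]| for k = 0..N/2 using a direct O(N^2) DFT (fine for a small demo)."""
    N = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
            for k in range(N // 2 + 1)]

def spectrogram(x, fs, frame_len=256, hop=128):
    """Short-time magnitude spectra in dB: one row per frame (time), one column per bin (frequency)."""
    w = hann(frame_len)
    rows = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [x[start + n] * w[n] for n in range(frame_len)]
        rows.append([20 * math.log10(m + 1e-12) for m in dft_magnitude(frame)])
    freqs = [k * fs / frame_len for k in range(frame_len // 2 + 1)]
    return rows, freqs

FS = 8000
tone = [math.sin(2 * math.pi * 1000 * n / FS) for n in range(1024)]   # stand-in for speech
S, freqs = spectrogram(tone, FS)
peak_bin = max(range(len(S[0])), key=lambda k: S[0][k])
print(len(S), freqs[peak_bin])   # 7 1000.0
```

Plotting the rows against time and the columns against frequency gives the pattern described above. A frame of about 32 ms tends to resolve individual harmonics. A much shorter frame of a few milliseconds is typically used to see the vertical striations of voiced speech.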
Most common manners of articulation
Plosive, or oral stop, where there is complete occlusion (blockage) of both the oral and nasal cavities of the vocal tract, and therefore no airflow. Examples include English /p, t, k/ (voiceless) and /b, d, g/ (voiced).
Nasal stop, where there is complete occlusion of the oral cavity and the air passes instead through the nose. The shape and position of the tongue determine the resonant cavity that gives different nasal stops their characteristic sounds. Examples include English /m, n/.
Fricative, sometimes called spirant, where there is continuous frication (turbulent and noisy airflow) at the place of articulation. Examples include English /f, s/ (voiceless) and /v, z/ (voiced).
Sibilants are a type of fricative where the airflow is guided by a groove in the tongue toward the teeth, creating a high-pitched and very distinctive sound. These are by far the most common fricatives. English sibilants include /s/ and /z/.
Affricate, which begins like a plosive, but the stop releases into a fricative rather than having a separate release of its own. The English sounds spelled "ch" and "j" are affricates.
Trill, in which the articulator (usually the tip of the tongue) is held in place, and the airstream
causes it to vibrate. The double "r" of Spanish "perro" is a trill.
Approximant, where there is very little obstruction. Examples include English /w/ and /r/. Lateral
approximants, usually shortened to lateral, are a type of approximant pronounced with the side
of the tongue. English /l/ is a lateral.
Time for MATLAB Program
THANK YOU
