Dynamic Time Warping (DTW)
Dynamic time warping (DTW) is a well-known technique for finding an optimal alignment
between two given (time-dependent) sequences under certain restrictions (Fig. 01). Intuitively,
the sequences are warped in a nonlinear fashion to match each other. Originally, DTW was
used to compare different speech patterns in automatic speech recognition. In fields such as
data mining and information retrieval, DTW has been successfully applied to cope
automatically with time deformations and the different speeds associated with time-dependent
data. Fig. 01 shows two speech signals aligned with DTW.
Fig. 01: Two speech signals aligned using DTW
Fig. 02: Time alignment of two time-dependent sequences. Aligned points are indicated by
the arrows.
The distance between two points, x = [x1, x2, ..., xn] and y = [y1, y2, ..., yn],
in an n-dimensional space can be computed via the Euclidean distance:
dist(x, y) = ‖x − y‖ = √((x1 − y1)² + (x2 − y2)² + ⋯ + (xn − yn)²)
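As a quick sanity check, the formula can be evaluated directly. Here is a minimal NumPy sketch; the vectors x and y are made-up example data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 8.0])

# Euclidean distance: square root of the sum of squared component differences.
dist = np.sqrt(np.sum((x - y) ** 2))   # equivalent to np.linalg.norm(x - y)
print(dist)                            # 7.0710... = sqrt(9 + 16 + 25)
```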
However, if the length of x differs from that of y, we cannot use the above formula to
compute the distance. Instead, we need a more flexible method that can find the best mapping
from elements in x to those in y in order to compute the distance.
The goal of dynamic time warping (DTW for short) is to find the best mapping with the
minimum distance by the use of dynamic programming (DP). The method is called "time warping" since both x and
y are usually time-series vectors, and we need to compress or expand them in time in order to
find the best mapping. We shall give the formula for DTW in this section.
Let t and r be two vectors of lengths m and n, respectively. The goal of DTW is to find a
mapping path {(p1, q1), (p2, q2), ..., (pk, qk)} such that the distance along this mapping path,
∑ |t(pi) − r(qi)| for i = 1, …, k,
is minimized, subject to the following constraints:
 Boundary conditions: (p1,q1)=(1,1), (pk,qk)=(m,n). This is a typical example of
"anchored beginning" and "anchored end".
 Local constraint: for any given node (i, j) in the path, the possible fan-in nodes are
restricted to (i−1, j), (i, j−1), and (i−1, j−1). This local constraint guarantees that the
mapping path is monotonically non-decreasing in its first and second arguments.
Moreover, for any given element in t, we should be able to find at least one
corresponding element in r, and vice versa. A small checker for these constraints is
sketched below.
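The two constraints above are easy to verify mechanically. Here is a minimal sketch; the function name is_valid_path and the example paths are made up for illustration:

```python
def is_valid_path(path, m, n):
    """Check the DTW boundary and local constraints on a candidate mapping path.

    `path` is a list of 1-based (p, q) pairs; m and n are the vector lengths.
    """
    if path[0] != (1, 1) or path[-1] != (m, n):   # anchored beginning and end
        return False
    for (p0, q0), (p1, q1) in zip(path, path[1:]):
        # Each step must be one of the allowed fan-in moves.
        if (p1 - p0, q1 - q0) not in {(1, 0), (0, 1), (1, 1)}:
            return False
    return True

print(is_valid_path([(1, 1), (2, 2), (2, 3), (3, 3)], 3, 3))  # True
print(is_valid_path([(1, 1), (3, 3)], 3, 3))                  # False: illegal jump
```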
How can we find the optimum mapping path in DTW? An obvious choice is forward DP,
which can be summarized in the following three steps:
1. Optimum-value function: Define D(i,j) as the DTW distance between t(1:i) and r(1:j),
with the mapping path starting from (1,1) to (i,j).
2. Recursion: D(i, j) = |t(i) − r(j)| + min{ D(i−1, j), D(i, j−1), D(i−1, j−1) },
with the initial condition D(1, 1) = |t(1) − r(1)|.
3. Final answer: D(m,n).
In practice, we need to construct a matrix D of dimensions m×n first and fill in the value of
D(1,1) by using the initial condition. Then by using the recursive formula, we fill the whole
matrix one element at a time, by following a column-by-column or row-by-row order. The
final answer will be available as D(m,n), with a computational complexity of O(mn).
If we want to know the optimum mapping path in addition to the minimum distance, we may
want to keep the optimum fan-in of each node. Then, at the end of DP, we can quickly
backtrack to find the optimum mapping path between the two input vectors; a minimal sketch is given below.
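The following self-contained Python sketch implements the forward DP described above, including predecessor bookkeeping for backtracking. The function name dtw and the toy inputs are illustrative, not from the original text:

```python
import numpy as np

def dtw(t, r):
    """Forward-DP DTW with |.| local cost and fan-in (i-1,j), (i,j-1), (i-1,j-1)."""
    m, n = len(t), len(r)
    D = np.full((m, n), np.inf)
    prev = np.zeros((m, n, 2), dtype=int)   # optimum fan-in of each node
    D[0, 0] = abs(t[0] - r[0])              # initial condition D(1,1) = |t(1)-r(1)|
    for i in range(m):
        for j in range(n):
            if i == 0 and j == 0:
                continue
            # Candidate fan-in nodes allowed by the local constraint.
            cands = [(i - 1, j), (i, j - 1), (i - 1, j - 1)]
            cands = [(a, b) for a, b in cands if a >= 0 and b >= 0]
            a, b = min(cands, key=lambda ab: D[ab])
            prev[i, j] = (a, b)
            D[i, j] = abs(t[i] - r[j]) + D[a, b]
    # Backtrack from (m,n) to (1,1) to recover the optimum mapping path.
    path, ij = [], (m - 1, n - 1)
    while ij != (0, 0):
        path.append(ij)
        ij = tuple(prev[ij])
    path.append((0, 0))
    return D[m - 1, n - 1], path[::-1]

dist, path = dtw([0, 1, 2, 3], [0, 1, 1, 2, 3, 3])
print(dist)   # 0.0: the second sequence is a time-stretched copy of the first
print(path)   # 0-based mapping path from (0,0) to (m-1,n-1)
```

Filling the m×n matrix one element at a time gives the O(mn) complexity mentioned above.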
We can also have the backward DP for DTW, as follows:
1. Optimum-value function: Define D(i,j) as the DTW distance between t(i:m) and r(j:n),
with the mapping path from (i,j) to (m,n).
2. Recursion: D(i, j) = |t(i) − r(j)| + min{ D(i+1, j), D(i, j+1), D(i+1, j+1) },
with the initial condition D(m, n) = |t(m) − r(n)|.
3. Final answer: D(1,1).
The answer obtained by the backward DP should be the same as that obtained by the forward
DP.
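As a quick check, the backward pass can be written the same way; on the toy inputs from the forward sketch above it should print the same distance:

```python
import numpy as np

def dtw_backward(t, r):
    """Backward-DP DTW: D(i,j) is the DTW distance between t(i:m) and r(j:n)."""
    m, n = len(t), len(r)
    D = np.full((m, n), np.inf)
    D[m - 1, n - 1] = abs(t[m - 1] - r[n - 1])   # initial condition D(m,n)
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if i == m - 1 and j == n - 1:
                continue
            # The fan-out nodes mirror the forward fan-in constraint.
            cands = [(i + 1, j), (i, j + 1), (i + 1, j + 1)]
            cands = [(a, b) for a, b in cands if a < m and b < n]
            D[i, j] = abs(t[i] - r[j]) + min(D[a, b] for a, b in cands)
    return D[0, 0]                               # final answer D(1,1)

print(dtw_backward([0, 1, 2, 3], [0, 1, 1, 2, 3, 3]))   # 0.0, same as forward DP
```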
Another commonly used local path constraint is to restrict the fan-in to the 27°, 45°, and 63°
directions only, i.e., to the predecessors (i−1, j−2), (i−1, j−1), and (i−2, j−1).
Advantages
 Works well for a small number of templates (<20)
 Language independent
 Speaker specific
 Easy to train (end user controls it)
Disadvantages
 Limited number of templates
 Need actual training examples
Applications
 Spoken word recognition
Due to different speaking rates, a nonlinear fluctuation occurs in a speech pattern along the
time axis, which needs to be eliminated. Given any two speech patterns, we can get
rid of their timing differences by warping the time axis of one so that the maximum
coincidence is attained with the other.
 Correlation Power Analysis
Unstable clocks are used to defeat naive power analysis. Several techniques are used to
counter this defense, one of which is dynamic time warping.
Vector Quantization
Quantization is the process of mapping an infinite set of scalar or vector quantities to a finite set
of scalar or vector quantities. Quantization has applications in the areas of signal processing,
speech processing, and image processing. In speech coding, quantization is required to reduce
the number of bits used to represent a sample of the speech signal. When fewer bits are
used to represent a sample, the bit-rate, complexity, and memory requirements are reduced.
Quantization results in a loss in the quality of the speech signal, which is undesirable, so a
compromise must be made between the reduction in bit-rate and the quality of the speech signal.
Two types of quantization techniques exist: scalar quantization and vector
quantization. Scalar quantization deals with the quantization of samples on a sample-by-sample
basis, while vector quantization deals with quantizing the samples in groups called
vectors. Vector quantization improves the performance of a quantizer at the cost of increased
computational complexity and memory requirements.
Shannon's theory states that quantizing a vector is more effective than quantizing individual
scalar values in terms of spectral distortion. According to Shannon, the chosen dimension of a vector
greatly affects the performance of quantization. Vectors of larger dimension produce
better quality than vectors of smaller dimension; with vectors of smaller dimension,
the transparency of the quantization is not good at a given bit-rate. This is because
with vectors of smaller dimension the correlation that exists between the samples is lost, and
scalar quantization itself destroys the correlation that exists between successive samples, so
the quality of the quantized speech signal is degraded. Therefore, quantizing correlated data
requires techniques that preserve the correlation between the samples, which is achieved by the
vector quantization (VQ) technique. Vector quantization is a generalization of scalar
quantization. Vectors of larger dimension produce transparency in quantization at a given
bit-rate. In vector quantization the data is quantized in the form of contiguous blocks
called vectors rather than as individual samples. Later, with the development of better coding
techniques, it became possible to achieve transparency in quantization even for
vectors of smaller dimension. In this thesis, quantization is performed on vectors of full length
and on vectors of smaller dimension for a given bit-rate.
Fig. 03: Two-dimensional vector quantizer
An example of a two-dimensional vector quantizer is shown in Fig. 03. The two-dimensional
region shown in Fig. 03 is partitioned into a number of small hexagonal Voronoi regions. The
hexagonal regions defined by the red borders are called the encoding regions. The green dots
represent the vectors to be quantized, which fall in different hexagonal regions, and the blue
circles represent the codewords (centroids). The vectors (green dots) falling in a particular
hexagonal region are best represented by the codeword (blue circle) falling in that region.
The vector quantization technique became a practical tool with the development of non-variational
design algorithms such as the Linde-Buzo-Gray (LBG) algorithm. On the other hand, besides
spectral distortion, the vector quantizer has its own limitations, namely the computational
complexity and the memory required for searching and storing the codebooks.
For applications requiring higher bit-rates, the computational complexity and memory
requirements increase exponentially. The block diagram of a vector quantizer is shown in
Fig. 04.
Fig. 04: Block diagram of a vector quantizer
Let s_k = [s1, s2, ..., sN]^T be an N-dimensional vector with real-valued samples, 1 ≤ k ≤ N,
where the superscript T denotes the transpose of the vector. In vector
quantization, a real-valued N-dimensional input vector s_k is matched against the real-valued
N-dimensional codewords of a codebook of size L = 2^b. The codeword that best matches the input
vector, i.e., with the lowest distortion, is selected, and the input vector is replaced by it. The
codebook consists of a finite set of codewords C = {C_i}, 1 ≤ i ≤ L, where
C_i = [C_1i, C_2i, ..., C_Ni]^T denotes the i-th codeword, C is the codebook, and L is the
size of the codebook.
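To make the encoding rule concrete, here is a minimal NumPy sketch of nearest-codeword search, followed by a few LBG-style (k-means) centroid refinement iterations. The dimension, bit budget, and training data are made-up illustrative values, not from the original text:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 2          # vector dimension
b = 4          # bits per vector
L = 2 ** b     # codebook size: L = 2^b codewords

# Made-up training vectors standing in for speech feature vectors.
train = rng.normal(size=(1000, N))

# Initialize the codebook with L distinct random training vectors.
codebook = train[rng.choice(len(train), size=L, replace=False)].copy()

def encode(vectors, codebook):
    """Return the index of the nearest codeword for each input vector."""
    # Squared Euclidean distance from every vector to every codeword.
    d2 = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# LBG-style refinement: assign vectors, then recompute each centroid.
for _ in range(10):
    idx = encode(train, codebook)
    for i in range(L):
        members = train[idx == i]
        if len(members) > 0:
            codebook[i] = members.mean(axis=0)

# Quantize: each input vector is replaced by its best-matching codeword.
s = rng.normal(size=(5, N))
reconstructed = codebook[encode(s, codebook)]
print(np.mean((s - reconstructed) ** 2))   # average distortion
```

Encoding each vector costs L = 2^b distance evaluations, which is the unstructured search cost referred to under the disadvantages below.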
Advantages
Advantages of vector quantization compared to scalar quantization:
 Can utilize the memory of the source.
 The distortion at a given rate will always be lower when increasing the number of
dimensions, even for a memoryless source.
Disadvantages
Disadvantages of vector quantization compared to scalar quantization:
 Both the storage space and the time needed to perform the quantization grow
exponentially with the number of dimensions. Since there is (in the general case) no
structure to the codebook, we have to compare each signal vector with every
reconstruction vector in the codebook to find the closest one.
Applications
A few classical examples of applications include:
 Medical image storage (e.g. Magnetic Resonance Imaging). e.g. Magnetic resonance
image compression using scalar-vector quantization, or Compression of skin tumor
images;
 Satellite image storage and transmission (e.g. Remote Sensing). e.g. A vector
quantization-based coding scheme for television transmission via satellite;
 Transmission of audio signals through old noisy radio mobile communication channels.
e.g. A study of vector quantization for noisy channels; see also Competitive learning
algorithms for robust vector quantization for more examples in transmission
applications; etc.
More recent applications have been integrating VQ in several machine learning tasks such as:
 Speaker identification. e.g. A discriminative training algorithm for VQ-based speaker
identification;
 Image Steganography. e.g. High-capacity image hiding scheme based on vector
quantization, or Steganography using overlapping codebook partition; etc.
Linear predictive coding (LPC)
Linear predictive coding (LPC) is a method for signal source modeling in speech signal
processing. It is often used by linguists as a formant extraction tool, and it has wide application
in other areas. LPC analysis is usually most appropriate for modeling vowels, which are periodic
(except nasalized vowels). LPC is based on the source-filter model of the speech signal.
Envelope Calculation
The LPC method is quite close to the FFT. The envelope is calculated from a number of
formants or poles specified by the user.
 The formants are estimated by removing their effects from the speech signal and
estimating the intensity and frequency of the remaining buzz. The removal process is
called inverse filtering, and the remaining signal is called the residue.
 The speech signal (the source) is synthesized from the buzz parameters and the residue.
The source is run through the filter (the formants), resulting in speech.
 The process is iterated several times a second, on "frames". A rate of 30 to 50 frames
per second yields intelligible speech; a minimal sketch of the analysis step is given below.
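The following is a minimal NumPy sketch of LPC analysis via the autocorrelation method and the Levinson-Durbin recursion. The sampling rate, model order, and synthetic test frame are made-up values for demonstration, not from the original text:

```python
import numpy as np

def lpc(frame, order):
    """LPC by the autocorrelation method with the Levinson-Durbin recursion.

    Returns the all-pole coefficients a = [1, a1, ..., ap] of A(z) and the
    final prediction-error power (the squared gain of the model).
    """
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:])
                  for k in range(order + 1)])              # autocorrelation, lags 0..p
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err   # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]          # order-update of A(z)
        err *= 1.0 - k * k                                  # shrink the error power
    return a, err

# Made-up voiced-like frame: a windowed 500 Hz sinusoid plus a little noise.
fs, f0 = 8000, 500
n = np.arange(400)
rng = np.random.default_rng(0)
frame = (np.sin(2 * np.pi * f0 * n / fs) * np.hanning(len(n))
         + 0.001 * rng.normal(size=len(n)))

a, err = lpc(frame, order=10)

# The spectral envelope is gain / |A(e^{jw})|; its peaks sit at the model's poles.
w = np.linspace(1e-3, np.pi, 512)
A = np.exp(-1j * np.outer(w, np.arange(len(a)))) @ a
envelope = np.sqrt(err) / np.abs(A)
print(w[np.argmax(envelope)] * fs / (2 * np.pi))   # ~500 Hz: the resonance found
```

In these terms, the residue mentioned above is obtained by inverse filtering, i.e., running the frame through A(z).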
Advantages
Its main advantage comes from the reference to a simplified vocal tract model and the analogy
of the source-filter model with the speech production system. It is a useful method for encoding
speech at a low bit rate.
Limitations
The LPC performance is limited by the method itself and by the local characteristics of the
signal.
 The harmonic spectrum sub-samples the spectral envelope, which produces spectral
aliasing. These problems are especially manifest in voiced and high-pitched signals,
affecting the first harmonics of the signal, which bear on the perceived speech quality
and formant dynamics.
 A correct all-pole model for the signal spectrum can hardly be obtained.
 The desired spectral information, the spectral envelope, is not represented: the fit gets
too close to the original spectrum. LPC follows the curve of the spectrum down to the
residual noise level in the gaps between harmonics or partials spaced too far apart.
This is not the information we want to model, since we are interested in fitting the
spectral envelope as closely as possible, not the original spectrum. The spectral
envelope should be a smooth function passing through the prominent peaks of the
spectrum, not following the "valleys" between the harmonic peaks.
Fig. 05: Comparing several envelope estimation methods
Applications
1. LPC, a statistical method for predicting future values of a waveform on the basis of its
past values, is often used to obtain a spectral envelope.
2. LPC differs from formant tracking in that:
 the waveform remains in the time domain; resonances are described by the
coefficients of an all-pole filter.
 altering resonances is difficult, since editing IIR filter coefficients can result in an
unstable filter.
 analysis may be applied to a wide range of sounds.
3. LPC is often used to determine the filter in a source-filter model of speech, which:
 characterizes the response of the vocal tract.
 reconstitutes the speech waveform when driven by the correct source.