Av-738
Adaptive Filter Theory
Lecture 1- Introduction
Dr. Bilal A. Siddiqui
Air University (PAC Campus)
Spring 2018
Course Outline
• Introduction/Background material
• Linear Optimum Filtering
• Wiener filter
• Kalman filter
• Linear Adaptive Filtering
• Method of steepest descent
• Least Mean Square (LMS) filter
• Recursive Least Squares (RLS) filter
• Tracking time-varying systems
• Nonlinear Adaptive Filtering
• Back-propagation learning (BP) neural networks
• Radial basis function (RBF) neural networks
• Applications to student voted areas of interest
Grading and then some
• Grades and absentee policies follow those in force at AU.
• One homework every week.
• Difficulty level will increase with each week.
• A 5–10 minute quiz each week, so be prepared.
• Term project for the course will also carry a major chunk of the grade.
• We will review some background material but basic knowledge of random
processes, calculus, discrete transforms and complex algebra is assumed.
• I don’t fail people. I abhor it. But getting caught cheating is an automatic F.
• Be regular. The course is fast paced; there is much to lose if you don’t
attend.
• Try to tailor the course project to your thesis. This keeps you interested.
• Try to publish from your course project. This keeps me interested!
Text Books
• Simon Haykin, Adaptive Filter Theory, 4th ed., Prentice Hall, 2002.
• Ali H. Sayed, Fundamentals of Adaptive Filtering, John Wiley & Sons, 2003.
• Behrouz Farhang-Boroujeny, Adaptive Filters: Theory and Applications, 2nd ed., John Wiley & Sons, 2013.
• Paulo S. R. Diniz, Adaptive Filtering: Algorithms and Practical Implementation, 3rd ed., Springer, 2008.
• Poularikas and Ramadan, Adaptive Filtering Primer with MATLAB, CRC Press, 2006.
What is this course about?
• Data is often corrupted by and coupled with noise.
• Sources of noise include
• Sensor noise and bias
• Quantization error due to digital conversion
• Corruption due to communication channel
• To make sense of it all, one needs to filter the chaff from the grain.
• A filter is a device (hardware or software) applied to noisy, corrupted
data to extract the information we desire.
• There is a difference between data and information!
What does a filter do?
• A filter performs one of the following three basic information-processing tasks:
• Filtering: extraction of desired information at the current time t from data
collected up to (and including) t
• Smoothing: information is desired at some past time t0 = t − τ, using data
acquired both before and after t0, up to and including the current time t.
Since t0 < t, there is a delay in obtaining the desired information. Since data
before and after t0 is used, we expect smoothing to be more accurate than filtering
• Predicting: derive what the desired information will be at some future time
t + τ, using data measured up to (and including) the current time t
• A filter is linear if its output is a linear function of the measured data.
• It is a nonlinear filter otherwise.
• Linear filters are easier to implement, understand and design, but
nonlinear filters can be more accurate (and considerably more “difficult”).
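To make the three tasks concrete, here is a minimal sketch (my own illustration, not from the slides; the window sizes are arbitrary) estimating a noisy sine by causal filtering, smoothing with delay τ, and one-step prediction:

```python
# Minimal illustration of filtering vs. smoothing vs. prediction on a noisy
# sine, using plain moving averages.
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(200)
x = np.sin(2 * np.pi * t / 50)                 # clean signal
u = x + 0.3 * rng.standard_normal(t.size)      # noisy measurements

n = 100                                        # "current" time

# Filtering: estimate x(n) from data up to and including time n (causal).
x_filt = u[n - 4 : n + 1].mean()

# Smoothing: estimate x(n - tau) using data on both sides of n - tau,
# still only up to the current time n; the price is a delay of tau samples.
tau = 3
x_smooth = u[n - 2 * tau : n + 1].mean()       # window centered on n - tau

# Prediction: estimate x(n + 1) from data up to time n (crude extrapolation).
x_pred = 2 * u[n] - u[n - 1]

print(x_filt, x_smooth, x_pred, x[n])
```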
An example (communications)
• Transmitter converts digital (1-0) signal into a waveform suitable for transmission
• Channel suffers from two types of impairments:
• Intersymbol interference: channel transmission is “good” at some frequencies and not at
others, which causes some messages to be “smeared”.
• Noise: any internal or external (usually additive) interference, e.g. electronic
noise due to thermal variations.
• The net result is a received signal which is both a noisy and distorted version of
transmitted message.
• The receiver therefore needs to be equipped with a filter to remove the noise and
recover the original message. Think of how clear videos on YouTube are, for
example.
(Block diagram: digital source of info → transmitter → channel → receiver → user of info)
Another example (aerospace)
• Generally the equation describing system evolution (F = ma) is not entirely accurate. This is
called “process noise”. Wind and turbulence may be another source of system error.
• Errors in measurement may be additive zero-mean noise, or a constant (or slowly varying)
bias.
• State vector x(t) consists of positions and velocities.
• It may not be possible to measure each state individually.
• In other words, measurements may be functions of various states (e.g.
V_vi = V sin(θ − α)), and it is a pain in the neck to measure α
• Measurement may be through some radar which adds noise
• Prior information is assumed knowledge of statistical parameters (mean and correlation)
of the process and noise. This again is a problem: how do we know the signal statistics?
(Block diagram: system errors and disturbances act on the aircraft (state x(t)); sensors with measurement errors produce observations y(t); the filter combines y(t) with prior information to yield the state estimate x̂(t))
Optimum Filters
• Optimal means “best”. Here it means best in some statistical sense.
• The requirement is to design the best (optimum) filter, one which takes noisy data as
input and minimizes the effect of noise according to some statistical criterion.
• A widely used statistical criterion is minimization of mean square of the error
signal. Squaring makes the “error surface” smooth, something which makes
optimization easy.
• If the process is “stationary” (mean and variance do not vary with time), the
resulting filter is called the Wiener Filter. The filter parameters will also be fixed
(not varying with time).
• In the more general case of a non-stationary process or noise, the filter parameters
will also vary with time. The optimum solution for a dynamic process and
non-stationary noise is called the Kalman filter.
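As a concrete anchor for later lectures, here is a minimal sketch (my own, real-valued for simplicity; the function name and setup are assumptions, not course material) of the FIR Wiener solution w_o = R⁻¹p, with the statistics estimated by time averaging over stationary data:

```python
# Sketch of the FIR Wiener solution w_o = R^{-1} p for real-valued,
# stationary data, estimating R = E[u(n) u(n)^T] and p = E[u(n) d(n)]
# by time averages.
import numpy as np

def wiener_fir(u, d, M):
    """Length-M Wiener filter for input u and desired response d."""
    N = len(u)
    # Rows are tap-input vectors u(n) = [u(n), u(n-1), ..., u(n-M+1)]
    U = np.array([u[n - M + 1 : n + 1][::-1] for n in range(M - 1, N)])
    D = np.asarray(d)[M - 1 : N]
    R = U.T @ U / len(U)            # correlation-matrix estimate
    p = U.T @ D / len(U)            # cross-correlation estimate
    return np.linalg.solve(R, p)    # normal equations: R w_o = p
```

For stationary data the weights returned are fixed; the Kalman filter generalizes this to the time-varying case.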
A disadvantage of Wiener filter
• The Wiener filter is time-invariant since the noise and process statistics are stationary (or very slowly varying).
• For time-invariant filters, parameters and the structure of the filter are fixed. Once prescribed specifications are
given, the design of time-invariant linear filters entails three basic steps:
1. the approximation of signal statistics
2. the choice of an appropriate structure defining the filter
3. the choice of the filter parameters, depending on the outcomes of steps 1 and 2.
• The Wiener filter requires a priori knowledge of the statistics of the data to be processed! The filter is optimum only when
the actual statistics of the input data match the a priori information on which the filter was designed.
• If a priori info is not available, it is not possible to design the Wiener filter.
• One way around this is to wait for sufficient data to be collected before designing the Wiener filter. This is inefficient
and cannot be applied in real time.
• By real time, we mean an operation in which the filter estimate is based on data available now!
• To mitigate this disadvantage, we design “adaptive filters”. Adaptive filters perform steps 1 and 3 above online (in real
time). Sometimes, step 2 is also automated for online design.
Adaptive Filters
• An adaptive filter is “self-designing” in the sense that the algorithm recursively
updates the filter parameters (or structure) every time new data is available.
• Filter parameters are therefore “data dependent” and time varying.
• This makes it possible for the filter to perform satisfactorily when complete signal
statistics are not available.
• An adaptive filter is also required when performance specifications cannot be
satisfied by time-invariant filters.
• The algorithm starts from initial conditions representing the “best guess” of signal
statistics we think represent the environment we operate the filter in.
• In a stationary environment (the process is not dynamic and the noise statistics are
time-invariant), the adaptive filter eventually converges to the Wiener filter after
“some” iterations!
• In other words, instead of finding the optimum filter parameters and performance
in one shot, we hit the optimum by “learning” in a “trial and error” manner.
• In a non-stationary environment, adaptive filters can “track” sufficiently slow
variations in signal statistics (depending on “filter bandwidth”).
Is the adaptive filter linear?
• What does linearity mean?
• Superposition holds: the sum of two inputs produces the sum of their individual
outputs, i.e. y(x1 + x2) = y(x1) + y(x2)
• Homogeneity holds: scaling the input by a factor scales the output by the
same factor, i.e. y(a·x) = a·y(x)
• Adaptive filters are time-varying since their parameters are
continually changing in order to meet a performance requirement.
• Strictly speaking, an adaptive filter is a nonlinear filter since its
characteristics are dependent on the input signal and consequently
the homogeneity and superposition conditions are not satisfied.
• However, if we freeze the filter parameters at a given instant
of time, most adaptive filters considered in this course are linear in
the sense that their output signals are linear functions of their input
signals.
• Are neural networks linear filters?
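A quick numerical check (my own, with arbitrary weights) that a frozen-weight filter satisfies both conditions:

```python
# Superposition and homogeneity hold for a filter with frozen weights.
import numpy as np

w = np.array([0.5, 0.3, 0.2])                  # frozen filter weights
y = lambda x: np.convolve(x, w)[: len(x)]      # fixed FIR filter

rng = np.random.default_rng(1)
x1 = rng.standard_normal(50)
x2 = rng.standard_normal(50)
a = 2.7

assert np.allclose(y(x1 + x2), y(x1) + y(x2))  # superposition
assert np.allclose(y(a * x1), a * y(x1))       # homogeneity
```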
Choice of Adaptive Filter algorithms
• A wide variety of adaptive filtering algorithms have been developed
• Choice of one algorithm over another depends on the following factors
• Rate of convergence (in how many iterations does the algo converge “close”
to the performance of Wiener filter)
• Misadjustment (how “far” was the final estimate from the Wiener estimate)
• Tracking (ability to track variations in signal statistics)
• Robustness (disturbances should not produce large errors in the estimate)
• Computational Cost (number of FLOPs and memory requirements)
• Modularity (processes can be cascaded (in series). This favors VLSI
implementation)
• Parallelization (not all algorithms can be parallelized to use GPU)
• Numerical accuracy (robustness against finite ADC word length, i.e. quantization
errors)
Applications of Adaptive filtering
• A wide variety of applications in
• Digital signal processing (DSP)
• Control systems (particularly nonlinear control)
• Adaptive filters have found wide application in
• Communications
• Radar/Sonar
• Seismology
• Biomedical engineering
• Aerospace and mechanical engineering etc.
• There is something common in all of the above, though.
• There is an input, a desired response and the error
between them to adjust filter parameters.
• There are four classes of adaptive filtering applications
Application Class I
System Identification
• In this class, the adaptive filter is used to identify a mathematical model
which represents the “best fit” to the behavior of the unknown system
(plant)
• Plant and filter are driven by the same input.
• Therefore, the input is also designed to excite all possible behaviors of
the plant. This is an important point: the input should be “sufficiently
exciting”.
• The resulting model can also be used for control design and prediction.
• System ID = channel estimation in communications engineering.
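A minimal system-identification sketch (the plant, noise level and model order are assumptions of mine; batch least squares stands in for the adaptive update introduced later):

```python
# Fit an FIR model to an unknown plant driven by a sufficiently exciting
# (white-noise) input; least squares recovers the plant coefficients.
import numpy as np

rng = np.random.default_rng(2)
w_plant = np.array([1.0, -0.6, 0.25])          # unknown plant (FIR here)
u = rng.standard_normal(1000)                  # white noise excites all modes
d = np.convolve(u, w_plant)[: len(u)]          # plant output = desired response
d += 0.01 * rng.standard_normal(len(u))        # small measurement noise

M = 3
U = np.array([u[n - M + 1 : n + 1][::-1] for n in range(M - 1, len(u))])
w_hat, *_ = np.linalg.lstsq(U, d[M - 1 :], rcond=None)
print(w_hat)    # close to w_plant when the input is persistently exciting
```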
Application Class II
Inverse Modeling / Channel Equalization
• In this class, the adaptive filter is used to provide an inverse model that
represents the “best” cancellation of the unknown plant’s dynamics.
• Ideally, for a linear plant, the transfer function of the filter will be the
reciprocal of the plant’s transfer function.
• In communications engineering, this is the task of channel equalization (the
combination of the plant (channel) and its inverse represents an ideal
transmission medium).
• A delayed version of the system input serves as the desired response. In some
cases no delay is used.
Application Class III
Prediction
• In this class, the adaptive filter is used to provide the best prediction (in a
statistical sense) of the present (or future) value of a random signal (original
signal corrupted with noise)
• Past values of the signal serve as inputs to the filter
• Depending on the application, either the filter output or the prediction error
serves as the system output.
• When the filter output is taken, the system operates as a predictor.
• When the prediction error is taken, it operates as a prediction-error filter. This
error is used to drive some other process. This is used in speech recognition.
Application Class IV
Interference Cancellation
• In this class, the adaptive filter is used to cancel unknown interference
in the primary signal
• Primary signal = sensor measurement = information carrying signal +
unknown interference
• A reference signal is used as input to excite the filter
• The purpose is to cancel noise and interference (slowly varying biases)
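A minimal interference-cancellation sketch (signal, interference path and step size are all assumptions of mine; the weight update is the LMS rule covered later in the course):

```python
# Adaptive noise cancellation: primary = signal + filtered interference;
# a reference sensor sees the interference source only. The adaptive filter
# learns the interference path, and the error output recovers the signal.
import numpy as np

rng = np.random.default_rng(3)
N, M, mu = 5000, 4, 0.01
s = np.sin(2 * np.pi * 0.01 * np.arange(N))        # information-bearing signal
v = rng.standard_normal(N)                         # interference source
primary = s + np.convolve(v, [0.8, -0.3])[:N]      # signal + interference
ref = v                                            # reference input

w = np.zeros(M)
out = np.zeros(N)
for n in range(M - 1, N):
    x = ref[n - M + 1 : n + 1][::-1]               # reference tap vector
    e = primary[n] - w @ x                         # canceller output = error
    out[n] = e                                     # error ~ recovered signal
    w += mu * e * x                                # LMS-style adaptation
print(np.mean((out[1000:] - s[1000:]) ** 2))       # small residual after converging
```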
Applications in Brief
Class of Adaptive Filtering → Applications
1. Identification: system identification (control design); channel estimation; layered-earth modeling
2. Inverse modeling: channel equalization; blind deconvolution
3. Prediction: linear predictive coding (speech recognition); spectrum analysis; signal detection
4. Interference cancellation: noise and echo cancellation (ECG); adaptive beamforming (radar)
Some historical notes
(Prof. Ali Sayed, UCLA)
Adaptive Filter Structure
• Operation of linear adaptive filters requires two basic processes:
• A filtering process which produces an output response to a sequence of input
data
• An adaptation process which recursively adjusts parameters of the filter
• These two processes are interactive.
• Choice of structure of the filter obviously has a profound effect on the
operation of the algorithm
• Linear adaptive filters generally have two forms:
• Infinite impulse response (IIR) filters (infinite but fading memory)
• Finite impulse response (FIR) filters (finite memory)
Filter Structure:
Transversal FIR Filters
• Filters with finite memory (or those which discard measurements beyond a certain
point in history) are of two basic types.
• The first type is “transversal filter”, aka “tapped delay line”
• Consists of (1) unit-delay elements, (2) multipliers and (3) adders
• The number of delay elements (M) determines the filter order (the filter length is M + 1)
y(n) = Σ_{k=0}^{M} w_k* u(n − k)
where * denotes complex conjugation, since in general the signal may be complex
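A direct rendering of the tapped-delay-line equation (my own sketch; zero initial conditions assumed):

```python
# y(n) = sum_{k=0}^{M} w_k^* u(n - k): order-M transversal (FIR) filter,
# written for complex signals as noted above.
import numpy as np

def transversal_filter(w, u):
    M = len(w) - 1                        # filter order = number of delays
    y = np.zeros(len(u), dtype=complex)
    for n in range(len(u)):
        for k in range(min(M, n) + 1):    # zero initial conditions
            y[n] += np.conj(w[k]) * u[n - k]
    return y
```

The double loop is a literal transcription of the equation; in practice the same output is obtained with np.convolve(np.conj(w), u)[:len(u)].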
Filter Structure:
Lattice FIR Filter
• Lattice predictors have a modular structure: a number of cascaded stages.
• Each stage “looks” like a lattice cell, hence the name.
• Each stage is represented by a pair of input-output relations
f_m(n) = f_{m−1}(n) + k_m* b_{m−1}(n − 1)
b_m(n) = b_{m−1}(n − 1) + k_m f_{m−1}(n)
• Here, f is the forward prediction error, b is the backward prediction error, and k_m is the reflection coefficient of stage m.
• The filter is initialized as
f_0(n) = b_0(n) = u(n)
Each stage is solved sequentially.
A linear combination of backward prediction errors can be used to provide prediction of
some desired response.
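A minimal sketch of the stage recursion over a whole sequence (my own code; the variable names and complex-valued convention are assumptions):

```python
# M-stage FIR lattice: stage m maps (f_{m-1}, b_{m-1}) to (f_m, b_m) using
# reflection coefficients k[0..M-1]; initialized with f_0(n) = b_0(n) = u(n).
import numpy as np

def lattice_fir(u, k):
    f = np.asarray(u, dtype=complex)      # f_0(n) = u(n)
    b = f.copy()                          # b_0(n) = u(n)
    for km in k:
        b_delayed = np.concatenate(([0.0], b[:-1]))   # b_{m-1}(n - 1)
        f, b = f + np.conj(km) * b_delayed, b_delayed + km * f
    return f, b   # forward/backward prediction errors of the final stage
```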
Filter Structure: FIR Lattice Predictor
Filter Structure:
Recursive or IIR filters
• FIR filters have only feed-forward
elements. There is no danger of
instability!
• IIR filters on the other hand have
feedback elements. Feedback gives
infinite memory. But feedback can
also destabilize if it is “too much”.
• IIR can become unstable if the filter
parameters are not chosen correctly.
• Nevertheless IIR filters have their
uses and there are methods of
tuning them and keeping them
stable
Adaptation Techniques
• There is no unique way of adapting the filter parameters (weights)
• It is important to understand capabilities and limitations of all
available adaptation algorithms.
• This understanding will allow us to select the right algorithm for the application at hand.
• Two of the more popular techniques for adaptation are
• Stochastic gradient adaptation algorithms
• Least squares adaptation algorithms
• We discuss each briefly, next.
Adaptation Algorithms
Stochastic Gradient Approach
• Structure used for implementation is transversal (tapped-delay line)
• The cost function to be minimized is the mean-squared error.
• Cost function is shaped like a multidimensional paraboloid, with a
unique minimum (which corresponds to Wiener solution), which we
try to reach by the “method of steepest descent”
• The cost function is a second order function of filter weights
• The recursive algorithm which results is called the celebrated Least
Mean Squares (LMS) algorithm
Adaptation Algorithms
Stochastic Gradient Approach
• The tap weights (filter weights) are updated with the following law: w(n+1) = w(n) + μ u(n) e*(n), where e(n) = d(n) − wᴴ(n) u(n) is the estimation error and μ is the step size
• In a non-stationary environment the error-performance surface changes
continuously, so the LMS must continually track the bottom of this surface
• Changes in the input statistics must be slow compared with the LMS
learning rate for tracking to occur
• It also converges very slowly and is sensitive to the “quality” of input data.
• For lattice filter structure, the stochastic gradient approach produces the
“Gradient Adaptive Lattice” (GAL) algorithm.
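A minimal LMS sketch (my own implementation of the standard update; the zero initialization and step size are assumptions):

```python
# LMS: w(n+1) = w(n) + mu * u(n) * conj(e(n)), with e(n) = d(n) - w^H(n) u(n).
import numpy as np

def lms(u, d, M, mu):
    w = np.zeros(M, dtype=complex)         # initial "best guess": zeros
    e = np.zeros(len(u), dtype=complex)
    for n in range(M - 1, len(u)):
        un = u[n - M + 1 : n + 1][::-1]    # tap-input vector u(n)
        e[n] = d[n] - np.vdot(w, un)       # estimation error; vdot gives w^H u(n)
        w = w + mu * un * np.conj(e[n])    # stochastic-gradient update
    return w, e
```

For stationary inputs and a small enough μ, the weights converge in the mean toward the Wiener solution discussed earlier.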
Adaptation Algorithms
Least Squares Estimation Approach
• These algorithms adapt filter weights based on the method of least
squares developed by Gauss.
• Cost function consists of weighted sum of squared errors.
• One popular algorithm of this kind is the Recursive Least Squares (RLS)
algorithm, which can be viewed as a special case of the Kalman filter.
• In a Kalman filter, a “state” is updated using new information coming into
the filter (called “innovation”)
Adaptation Algorithms
Least Squares Estimation Approach
• Stochastic gradient algorithms are model independent. They have good
tracking performance.
• Least squares algorithms are model dependent and may perform worse than
LMS algorithms if the model structure is not accurate.
• However, RLS converges more rapidly than LMS.
• Three basic classes of recursive least squares algorithms exist:
• Standard recursive least squares (RLS): relies on the matrix-inversion lemma;
converges rapidly but has rather high computational complexity and lacks
numerical robustness
• Square-root RLS: a numerically robust form of the standard RLS
• Fast RLS: a cheaper (hence faster) version, but numerically less stable
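A minimal standard-RLS sketch (my own code in the usual matrix-inversion-lemma form; the forgetting factor λ and initialization constant δ are assumed values):

```python
# Standard RLS with forgetting factor lam; P approximates the inverse of the
# exponentially weighted input correlation matrix, initialized as delta * I.
import numpy as np

def rls(u, d, M, lam=0.99, delta=100.0):
    w = np.zeros(M, dtype=complex)
    P = delta * np.eye(M, dtype=complex)
    for n in range(M - 1, len(u)):
        un = u[n - M + 1 : n + 1][::-1]    # tap-input vector u(n)
        Pu = P @ un
        k = Pu / (lam + np.vdot(un, Pu))   # gain vector (inversion lemma)
        xi = d[n] - np.vdot(w, un)         # a priori error d(n) - w^H u(n)
        w = w + k * np.conj(xi)            # weight update
        P = (P - np.outer(k, np.conj(un)) @ P) / lam   # Riccati update
    return w
```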
How to choose adaptive filtering algorithms?
• When choosing an adaptive filter, the practical issues that matter are:
• Computational cost
• Performance
• Robustness
• Adaptive filter algorithms generally assume the input data is in
baseband form; for bandpass signals this means complex baseband
following frequency translation:
u(n) = u_I(n) + j u_Q(n)
Algorithms are thus typically developed in complex form, with
the real form being a special case
• The use of computer simulation is very useful as a first step in
the evaluation process
Term Projects
• Floor is open for discussion