Numerical Solution of Linear Systems
Chen Greif
Department of Computer Science
The University of British Columbia
Vancouver B.C.
Tel Aviv University
December 17, 2008
1
Outline
1 Direct Solution Methods
Classification of Methods
Gaussian Elimination and LU Decomposition
Special Matrices
Ordering Strategies
2 Conditioning and Accuracy
Upper bound on the error
The Condition Number
3 Iterative Methods
Motivation
Basic Stationary Methods
Nonstationary Methods
Preconditioning
A Dense Matrix
3
Top Ten Algorithms in Science (Dongarra and Sullivan,
2000)
1 Metropolis Algorithm (Monte Carlo method)
2 Simplex Method for Linear Programming
3 Krylov Subspace Iteration Methods
4 The Decompositional Approach to Matrix Computations
5 The Fortran Optimizing Compiler
6 QR Algorithm for Computing Eigenvalues
7 Quicksort Algorithm for Sorting
8 Fast Fourier Transform
9 Integer Relation Detection Algorithm
10 Fast Multipole Method
Red: Algorithms within the exclusive domain of NLA research.
Blue: Algorithms strongly (though not exclusively) connected
to NLA research.
4
Outline
1 Direct Solution Methods
Classification of Methods
Gaussian Elimination and LU Decomposition
Special Matrices
Ordering Strategies
2 Conditioning and Accuracy
Upper bound on the error
The Condition Number
3 Iterative Methods
Motivation
Basic Stationary Methods
Nonstationary Methods
Preconditioning
Two types of methods
Numerical methods for solving linear systems of equations can
generally be divided into two classes:
Direct methods. In the absence of roundoff error such
methods would yield the exact solution within a finite number
of steps.
Iterative methods. These are methods that are useful for
problems involving special, very large matrices.
6
Gaussian Elimination and LU Decomposition
Assumptions:
The given matrix A is real, n × n and nonsingular.
The problem Ax = b therefore has a unique solution x for any
given vector b in Rn.
The basic direct method for solving linear systems of equations is
Gaussian elimination. The bulk of the algorithm involves only the
matrix A and amounts to its decomposition into a product of two
matrices that have a simpler form. This is called an LU
decomposition.
7
How to
solve linear equations when A is in upper triangular form. The
algorithm is called backward substitution.
transform a general system of linear equations into an upper
triangular form, where backward substitution can be applied.
The algorithm is called Gaussian elimination.
8
Backward Substitution
Start easy:
If A is diagonal:
$$A = \begin{pmatrix} a_{11} & & & \\ & a_{22} & & \\ & & \ddots & \\ & & & a_{nn} \end{pmatrix},$$
then the linear equations are uncoupled and the solution is
obviously
$$x_i = \frac{b_i}{a_{ii}}, \qquad i = 1, 2, \ldots, n.$$
9
Triangular Systems
An upper triangular matrix
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ & a_{22} & \cdots & a_{2n} \\ & & \ddots & \vdots \\ & & & a_{nn} \end{pmatrix},$$
where all elements below the main diagonal are zero:
aij = 0, ∀i > j.
Solve backwards: the last row reads a_{nn} x_n = b_n, so x_n = b_n / a_{nn}. Next, now that we know x_n, the row before last can be written as a_{n-1,n-1} x_{n-1} = b_{n-1} − a_{n-1,n} x_n, so x_{n-1} = (b_{n-1} − a_{n-1,n} x_n) / a_{n-1,n-1}. The previous row can then be dealt with, and so on. We obtain the backward
substitution algorithm.
10
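As an illustration, here is a minimal Matlab sketch of backward substitution for an upper triangular matrix U and right-hand side b (the function name and signature are ours, not from the slides):

function x = backsub(U, b)
% Solve U*x = b by backward substitution; U is upper triangular and nonsingular.
n = length(b);
x = zeros(n, 1);
x(n) = b(n) / U(n, n);
for i = n-1:-1:1
    x(i) = (b(i) - U(i, i+1:n) * x(i+1:n)) / U(i, i);
end
end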
Computational Cost (Naive but Useful)
What is the cost of this algorithm? In a simplistic way we just
count each floating point operation (such as + and ∗) as a flop.
The number of flops required here is
$$1 + \sum_{k=1}^{n-1} \bigl((n-k-1) + (n-k) + 2\bigr) \;\approx\; 2 \sum_{k=1}^{n-1} (n-k) \;=\; 2\,\frac{(n-1)n}{2} \;\approx\; n^2.$$
Simplistic but not ridiculously so: doesn’t take into account data
movement between elements of the computer’s memory hierarchy.
In fact, concerns of data locality can be crucial to the execution of
an algorithm. The situation is even more complex on
multiprocessor machines. But still: gives an idea...
11
LU Decomposition
The Gaussian elimination procedure decomposes A into a product
of a unit lower triangular matrix L and an upper triangular matrix
U. This is the famous LU decomposition. Together with the
ensuing backward substitution the entire solution algorithm for
Ax = b can therefore be described in three steps:
1 Decomposition:
A = LU
2 Forward substitution: solve
Ly = b.
3 Backward substitution: solve
Ux = y.
12
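For instance, the three steps in Matlab using the built-in lu factorization (a sketch; A and b are assumed given, and note that lu also returns a row permutation P, anticipating the pivoting discussed on the next slide):

[L, U, P] = lu(A);      % decomposition: P*A = L*U, with L unit lower triangular
y = L \ (P*b);          % forward substitution
x = U \ y;              % backward substitution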
Pivoting
In a nutshell, perform permutations to increase numerical stability.
Trivial but telling examples:
For
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \qquad \text{or} \qquad A_\varepsilon = \begin{pmatrix} \varepsilon & 1 \\ 1 & 0 \end{pmatrix}$$
G.E. will fail (for A) or perform poorly (for Aε).
Nothing wrong with the problem, it’s the algorithm to blame!
Partial pivoting (not always stable but standard)
Complete pivoting (stable but too expensive)
Rook pivoting (I like it!)
13
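A small Matlab experiment along these lines (a sketch; the value ε = 1e-17 and the right-hand side are illustrative choices):

ep = 1e-17;  A = [ep 1; 1 0];  b = [1; 1];    % exact solution is [1; 1-ep]
m = A(2,1)/A(1,1);                            % elimination without a row swap: huge multiplier
U = [A(1,:); A(2,:) - m*A(1,:)];  c = [b(1); b(2) - m*b(1)];
x2 = c(2)/U(2,2);  x1 = (c(1) - U(1,2)*x2)/U(1,1);
[x1; x2]                                      % first component is lost to roundoff
A \ b                                         % backslash pivots and returns an accurate answer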
Special Matrices
Symmetric Positive Definite. A matrix A is symmetric
positive definite (SPD) if A = Aᵀ and xᵀAx > 0 for any
vector x ≠ 0. (All SPD matrices necessarily have
positive eigenvalues.)
In the context of linear systems – Cholesky Decomposition:
A = FFᵀ.
No pivoting required
Half the storage and work. (But still O(n²) and O(n³), respectively.)
14
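A hedged Matlab sketch (chol returns an upper triangular R with A = RᵀR, so F = Rᵀ in the notation above; A and b are assumed given):

R = chol(A);        % errors out if A is not SPD
y = R' \ b;         % forward substitution with F = R'
x = R \ y;          % backward substitution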
Special Matrices (Cont.)
Narrow Banded.
$$A = \begin{pmatrix}
a_{11} & \cdots & a_{1q} & & \\
\vdots & \ddots & & \ddots & \\
a_{p1} & & \ddots & & a_{n-q+1,n} \\
 & \ddots & & \ddots & \vdots \\
 & & a_{n,n-p+1} & \cdots & a_{nn}
\end{pmatrix}.$$
Significant savings, if bandwidth is small: O(n) work and
storage.
15
A Useful Numerical Tip
Never invert a matrix explicitly unless your life depends on it.
In Matlab, choose backslash over inv.
Reasons:
More accurate and efficient
For banded matrices, great saving in storage
16
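In other words (a sketch; A and b assumed given):

x = A \ b;         % preferred: factors A and solves, exploiting structure
x = inv(A) * b;    % avoid: forms the inverse explicitly, slower and less accurate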
LU vs. Gaussian Elimination (why store L?)
If you did all the work, might as well record it!
One good reason: solving linear systems with multiple right hand
sides.
17
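For example, a factor-once, solve-many sketch in Matlab (B is a hypothetical matrix whose columns are the right-hand sides):

[L, U, P] = lu(A);                 % factor once: O(n^3)
X = zeros(size(B));
for j = 1:size(B, 2)               % each additional solve costs only O(n^2)
    X(:, j) = U \ (L \ (P * B(:, j)));
end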
Permutations and Reordering Strategies
Riddle: Which matrix is better to work with?
$$A = \begin{pmatrix}
\times & \times & \times & \times & \times \\
\times & \times & 0 & 0 & 0 \\
\times & 0 & \times & 0 & 0 \\
\times & 0 & 0 & \times & 0 \\
\times & 0 & 0 & 0 & \times
\end{pmatrix},
\qquad
B = \begin{pmatrix}
\times & 0 & 0 & 0 & \times \\
0 & \times & 0 & 0 & \times \\
0 & 0 & \times & 0 & \times \\
0 & 0 & 0 & \times & \times \\
\times & \times & \times & \times & \times
\end{pmatrix}.$$
B is a matrix obtained by swapping the first and the last row and
column of A.
18
Permutation Matrices
(PAPᵀ)(Px) = Pb.
Look at Px rather than x, as per the performed permutation.
If A is symmetric then so is PAPᵀ. We can define the latter
matrix as B and rewrite the linear system as
By = c,
where y = Px and c = Pb.
In the example B = PAPᵀ, where P is the permutation matrix
associated with the vector p = (n, 2, 3, 4, . . . , n − 2, n − 1, 1)ᵀ.
19
Graphs
At least two possible ways of aiming to reduce the storage and
computational work:
Reduce the bandwidth of the matrix.
Reduce the expected fill-in in the decomposition stage.
One of the most commonly used tools for doing it is graph theory.
20
In the process of Gaussian elimination, each stage can be
described as follows in terms of the matrix graph: upon
zeroing out the (i, j) entry of the matrix (that is, with entry
(j, j) being the current pivot in question), all the vertices that
are the ‘neighbors’ of vertex j will cause the creation of an
edge connecting them to vertex i, if such an edge does not
already exist.
This shows why working with B is preferable over working
with A in the example: for instance, when attempting to zero
out entry (5, 1) using entry (1, 1) as pivot, in the graph of
A(1) all vertices j connected to vertex 1 will generate an edge
(5, j). For A, this means that new edges (2, 5), (3, 5), (4, 5)
are now generated, whereas for B no new edge is generated
because there are no edges connected to vertex 1 other than
vertex 5 itself.
21
Edges and Vertices
The degree of a vertex is the number of edges emanating from
the vertex. It is in our best interest to postpone dealing with
vertices of a high degree as much as possible.
For the matrix A in the example all the vertices except vertex
1 have degree 1, but vertex 1 has degree 4 and we start off
the Gaussian elimination by eliminating it; this results in
disastrous fill-in.
On the other hand for the B matrix we have that all vertices
except vertex 5 have degree 1, and vertex 5 is the last vertex
we deal with. Until we hit vertex 5 there is no fill, because the
latter is the only vertex that is connected to the other
vertices. When we deal with vertex 5 we identify vertices that
should hypothetically be generated, but they are already in
existence to begin with, so we end up with no fill whatsoever.
22
Optimality criteria
How to assure that the amount of work for determining the
ordering does not dominate the computation. As you may
already sense, determining an ‘optimal’ graph may be quite a
costly adventure.
How to deal with ‘tie breaking’ situations. For example, if we
have an algorithm based on the degrees of vertices, what if
two or more of them have the same degree: which one should
be labeled first?
23
Ordering strategies
Reverse Cuthill–McKee (RCM): aims at minimizing the bandwidth.
Multiple minimum degree (MMD) or approximate minimum degree
(AMD): aims at minimizing the expected fill-in.
24
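A brief Matlab sketch (assuming A is sparse and SPD; symrcm and symamd are the built-in RCM and AMD-type orderings):

p = symrcm(A);                     % reverse Cuthill-McKee ordering
q = symamd(A);                     % approximate minimum degree ordering
spy(chol(A(p,p)))                  % fill in the Cholesky factor under RCM
figure, spy(chol(A(q,q)))          % ... and under AMD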
25
26
Outline
1 Direct Solution Methods
Classification of Methods
Gaussian Elimination and LU Decomposition
Special Matrices
Ordering Strategies
2 Conditioning and Accuracy
Upper bound on the error
The Condition Number
3 Iterative Methods
Motivation
Basic Stationary Methods
Nonstationary Methods
Preconditioning
Suppose that, using some algorithm, we have computed an
approximate solution x̂. We would like to be able to evaluate the
absolute error ‖x − x̂‖, or the relative error ‖x − x̂‖ / ‖x‖.
We do not know the error; seek an upper bound, and rely on
computable quantities, such as the residual
r = b − Ax̂.
A stable Gaussian elimination variant will deliver a residual
with a small norm. The question is, what can we conclude
from this about the error in x?
28
How does the residual r relate to the error in x̂ in general?
r = b − Ax̂ = Ax − Ax̂ = A(x − x̂).
So
x − x̂ = A⁻¹r.
Then
‖x − x̂‖ = ‖A⁻¹r‖ ≤ ‖A⁻¹‖ ‖r‖.
This gives a bound on the absolute error in x̂ in terms of ‖A⁻¹‖.
But usually the relative error is more meaningful. Since
‖b‖ ≤ ‖A‖ ‖x‖ implies 1/‖x‖ ≤ ‖A‖/‖b‖, we have
$$\frac{\|x - \hat{x}\|}{\|x\|} \le \|A^{-1}\| \, \|r\| \, \frac{\|A\|}{\|b\|}.$$
29
Condition Number
We therefore define the condition number of the matrix A as
κ(A) = ‖A‖ ‖A⁻¹‖
and write the bound obtained on the relative error as
$$\frac{\|x - \hat{x}\|}{\|x\|} \le \kappa(A)\,\frac{\|r\|}{\|b\|}.$$
In words, the relative error in the solution is bounded by the
condition number of the matrix A times the relative residual ‖r‖/‖b‖.
30
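A quick Matlab sanity check of this bound (a sketch; the test matrix and the perturbation are arbitrary illustrative choices):

n = 100;  A = randn(n) + n*eye(n);  b = randn(n,1);   % illustrative nonsingular test problem
x  = A \ b;
xh = x + 1e-8*randn(n,1);                             % pretend this is the computed solution
r  = b - A*xh;
rel_err = norm(x - xh)/norm(x)
bound   = cond(A)*norm(r)/norm(b)                     % the bound from the slide; rel_err should not exceed it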
Properties (and myths)
Range of values:
1 = ‖I‖ = ‖A⁻¹A‖ ≤ κ(A),
(i.e. a matrix is ideally conditioned if its condition number
equals 1), and κ(A) = ∞ for a singular matrix.
Orthogonal matrices are perfectly conditioned.
If A is SPD, κ₂(A) = λ₁/λₙ, the ratio of its largest to
smallest eigenvalue.
The condition number is defined for any (even non-square)
matrix via the singular values of the matrix.
When something goes wrong with the numerical solution -
blame the condition number! (and hope for the best)
One of the most important areas of research: preconditioning.
(To be discussed later.)
What’s a well-conditioned matrix and what’s an
ill-conditioned matrix?
31
Outline
1 Direct Solution Methods
Classification of Methods
Gaussian Elimination and LU Decomposition
Special Matrices
Ordering Strategies
2 Conditioning and Accuracy
Upper bound on the error
The Condition Number
3 Iterative Methods
Motivation
Basic Stationary Methods
Nonstationary Methods
Preconditioning
Trouble in Paradise
The Gaussian elimination algorithm and its variants, such as the
LU decomposition, the Cholesky method, and adaptations to banded
systems, form the approach of choice for many problems. There
are situations, however, that require a different treatment.
33
Drawbacks of Direct Solution Methods
The Gaussian elimination (or LU decomposition) process may
introduce fill-in, i.e. L and U may have nonzero elements in
locations where the original matrix A has zeros. If the amount
of fill-in is significant then applying the direct method may
become costly. This in fact occurs often, in particular when
the matrix is banded and is sparse within the band.
Sometimes we do not really need to solve the system exactly.
(e.g. nonlinear problems.) Direct methods cannot accomplish
this because by definition, to obtain a solution the process
must be completed; there is no notion of an early termination
or an inexact (yet acceptable) solution.
34
Drawbacks of Direct Solution Methods (Cont.)
Sometimes we have a pretty good idea of an approximate
guess for the solution. For example, in time dependent
problems (warm start with previous time solution). Direct
methods cannot make good use of such information.
Sometimes only matrix-vector products are given. In other
words, the matrix is not available explicitly or is very
expensive to compute. For example, in digital signal
processing applications it is often the case that only input and
output signals are given, without the transformation itself
explicitly formulated and available.
35
Motivation for Iterative Methods
What motivates us to use iterative schemes is the possibility that
inverting A may be very difficult, to the extent that it may be
worthwhile to invert a much ‘easier’ matrix several times, rather
than inverting A directly only once.
36
A Common Example: Laplacian (Poisson Equation)
$$A = \begin{pmatrix}
J & -I & & & \\
-I & J & -I & & \\
 & \ddots & \ddots & \ddots & \\
 & & -I & J & -I \\
 & & & -I & J
\end{pmatrix},$$
where J is a tridiagonal N × N matrix
$$J = \begin{pmatrix}
4 & -1 & & & \\
-1 & 4 & -1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & -1 & 4 & -1 \\
 & & & -1 & 4
\end{pmatrix}$$
and I denotes the identity matrix of size N.
37
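A sketch of building this matrix in Matlab with Kronecker products (a standard construction; N is the number of grid points per dimension):

N = 3;
e = ones(N,1);
T = spdiags([-e 2*e -e], -1:1, N, N);    % 1D second-difference matrix
I = speye(N);
A = kron(I, T) + kron(T, I);             % diagonal blocks J = T + 2I, off-diagonal blocks -I
full(A)                                  % for N = 3 this reproduces the 9-by-9 example below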
A Small Example
For instance, if N = 3 then
$$A = \begin{pmatrix}
4 & -1 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\
-1 & 4 & -1 & 0 & -1 & 0 & 0 & 0 & 0 \\
0 & -1 & 4 & 0 & 0 & -1 & 0 & 0 & 0 \\
-1 & 0 & 0 & 4 & -1 & 0 & -1 & 0 & 0 \\
0 & -1 & 0 & -1 & 4 & -1 & 0 & -1 & 0 \\
0 & 0 & -1 & 0 & -1 & 4 & 0 & 0 & -1 \\
0 & 0 & 0 & -1 & 0 & 0 & 4 & -1 & 0 \\
0 & 0 & 0 & 0 & -1 & 0 & -1 & 4 & -1 \\
0 & 0 & 0 & 0 & 0 & -1 & 0 & -1 & 4
\end{pmatrix}.$$
38
Sparsity Pattern
39
Stationary Methods
Given Ax = b, we can rewrite as x = (I − A)x + b, which yields
the iteration
xk+1 = (I − A)xk + b.
From this we can generalize: for a given splitting A = M − N, we
have Mx = Nx + b, which leads to the fixed point iteration
Mxk+1 = Nxk + b.
40
The Basic Mechanism
Suppose that A = M − N is a splitting, and that Mz = r is much
easier to solve than Ax = b. Given an initial guess x0,
e0 = x − x0
is the error and
Ae0 = b − Ax0 := r0.
Notice that r0 is computable whereas e0 is not, because x is not
available. Since x = x0 + e0 = x0 + A⁻¹r0, set
Mẽ = r0,
and then
x1 = x0 + ẽ
is our new guess. x1 is hopefully closer to x than x0.
41
The Tough Task of Finding a Good M
The matrix M should satisfy two contradictory requirements:
It should be close to A in some sense (or rather, M⁻¹ should
be close to A⁻¹).
It should be much easier to invert than A.
42
Jacobi, Gauss-Seidel, and Friends
It all boils down to the choice of M. If A = D + E + F is split into
its diagonal D, strictly lower triangular part E, and strictly upper
triangular part F, then:
Jacobi: M = D.
Gauss-Seidel: M = D + E.
SOR: a parameter-dependent ‘improvement’ of Gauss-Seidel.
43
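A minimal Matlab sketch of one such splitting iteration, here Jacobi (A, b, tol, and maxit are assumed given; the stopping test is illustrative):

M = diag(diag(A));  N = M - A;               % Jacobi splitting A = M - N
x = zeros(size(b));
for k = 1:maxit
    x = M \ (N*x + b);                       % M x_{k+1} = N x_k + b
    if norm(b - A*x) <= tol*norm(b), break, end
end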
Convergence
It is easy to show that (asymptotic) convergence is governed
(barring ‘nasty’ matrices) by the eigenvalues of the iteration
matrix T = M⁻¹N = I − M⁻¹A. They must be smaller than 1
in magnitude, and the smaller they are, the faster we converge.
Jacobi and Gauss-Seidel are terribly slow methods. SOR is
not so bad, but requires a choice of a parameter.
But these methods also have merits.
Convergence analysis is easy to perform and can give an idea.
Jacobi is beautifully parallelizable, which is why it has been
taken out of its grave since HPC has become important.
Gauss-Seidel has beautiful smoothing properties and is used in
Multigrid.
44
Nonstationary Methods: Conjugate Gradients and Friends
The trouble with stationary schemes is that they do not make use
of information that has accumulated throughout the iteration.
How about trying to optimize something throughout the iteration?
For example,
xk+1 = xk + αkrk.
Multiplying both sides by A and subtracting from b:
b − Axk+1 = b − Axk − αkArk.
It is possible to find αk that minimizes the residual. Notice:
rk = pk(A)r0, where pk is a polynomial of degree k.
Modern methods work hard at finding ‘the best’ pk.
45
Nonstationary Methods as an Optimization Problem
The methods considered here can all be written as
xk+1 = xk + αkpk,
where the vector pk is the search direction and the scalar αk is the
step size. The simplest such non-stationary scheme is obtained by
setting pk = rk, i.e. Mk = αkI, with I the identity matrix. The
resulting method is called gradient descent.
The step size αk may be chosen so as to minimize the ℓ₂ norm of
the residual rk. But there are other options too.
46
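For instance, with pk = rk, minimizing the 2-norm of the next residual over αk is a short calculation (a standard derivation, filled in here for completeness):

$$r_{k+1} = r_k - \alpha_k A r_k, \qquad
\|r_{k+1}\|_2^2 = \|r_k\|_2^2 - 2\alpha_k\, r_k^T A r_k + \alpha_k^2 \|A r_k\|_2^2
\;\Longrightarrow\;
\alpha_k = \frac{r_k^T A r_k}{\|A r_k\|_2^2}.$$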
Conjugate Gradients (for SPD matrices)
Our problem Ax = b is equivalent to the problem of finding a
vector x that minimizes
$$\phi(x) = \tfrac{1}{2}\, x^T A x - b^T x.$$
The Conjugate Gradient Method defines search directions that are
A-conjugate, and minimizes $\|e_k\|_A = \sqrt{e_k^T A e_k}$. Note that this is
well defined only if A is SPD.
47
The Conjugate Gradient Algorithm
Given an initial guess x0 and a tolerance tol, set at first
r0 = b − Ax0, δ0 = ⟨r0, r0⟩, δ_b = ⟨b, b⟩, k = 0 and p0 = r0. Then:
While δk > tol² δ_b:
  sk = A pk
  αk = δk / ⟨pk, sk⟩
  xk+1 = xk + αk pk
  rk+1 = rk − αk sk
  δk+1 = ⟨rk+1, rk+1⟩
  pk+1 = rk+1 + (δk+1/δk) pk
  k = k + 1.
48
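A direct Matlab transcription of this algorithm (a sketch, without preconditioning; the function name is ours):

function x = cg(A, b, tol, maxit)
% Conjugate gradient method for SPD A, following the algorithm above.
x = zeros(size(b));  r = b - A*x;  p = r;
delta = r'*r;  deltab = b'*b;
for k = 1:maxit
    if delta <= tol^2 * deltab, break, end
    s = A*p;
    alpha = delta / (p'*s);
    x = x + alpha*p;
    r = r - alpha*s;
    deltanew = r'*r;
    p = r + (deltanew/delta)*p;
    delta = deltanew;
end
end

For example, x = cg(A, ones(size(A,1),1), 1e-8, 500); could be used with the Poisson matrix built earlier.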
Krylov Subspace Methods
We are seeking to find a solution within the Krylov subspace
$$x_k \in x_0 + \mathcal{K}_k(A; r_0) \equiv x_0 + \mathrm{span}\{r_0, Ar_0, A^2 r_0, \ldots, A^{k-1} r_0\}.$$
Find a good basis for the space (riddle: ‘good’ means
what??): Lanczos or Arnoldi will help here.
Optimality condition:
Require that the norm of the residual ‖b − Axk‖₂ is minimal
over the Krylov subspace.
Require that the residual is orthogonal to the subspace.
50
Well Known Methods
CG for SPD matrices
GMRES for nonsymmetric, MINRES for symmetric indefinite
BiCG-Stab (non-optimal: based on the two-sided Lanczos biorthogonalization)
QMR: Quasi-Minimal Residuals.
A few more...
51
Preconditioning — Motivation
Convergence rate typically depends on two factors:
Distribution/clustering of eigenvalues (crucial!)
Condition number (the less important factor!)
Idea: Since A is given and is beyond our control, define a matrix
M such that the above properties are better for M⁻¹A, and solve
M⁻¹Ax = M⁻¹b rather than Ax = b.
Requirements: To produce an effective method the preconditioner
matrix M must be easily invertible. At the same time it is desirable
to have at least one of the following properties hold:
κ(M⁻¹A) ≪ κ(A), and/or the eigenvalues of M⁻¹A are much
better clustered compared to those of A.
52
Basically, Two Types of Preconditioners
Algebraic, general purpose (arguably, frequently needed in
continuous optimization)
Specific to the problem (arguably, frequently needed in PDEs)
53
Important Classes of Preconditioners
Preconditioning is a combination of art and science...
Stationary preconditioners, such as Jacobi, Gauss-Seidel, SOR.
Incomplete factorizations.
Multigrid and multilevel preconditioners. (Advanced)
Preconditioners tailored to the problem in hand, that rely for
example on the properties of the underlying differential
operators.
54
Incomplete Factorizations
Given the matrix A, construct an LU decomposition or a Cholesky
decomposition (if A is symmetric positive definite) that follows
precisely the same steps as the usual decomposition algorithms,
except that a nonzero entry of a factor is generated only if the
matching entry of A is nonzero!
55
56
Matlab
R = cholinc(A,'0');             % incomplete Cholesky with no fill, IC(0); ichol(A) in newer Matlab releases
x = pcg(A,b,tol,maxit,R',R);    % preconditioned CG with preconditioner M = R'*R
... and many other commands and features.
57
Problem-Tailored Preconditioners
Schur complement based methods, e.g. for Stokes and Maxwell.
To be discussed (time permitting).
58
The END
59