Johns Hopkins Studies in the Mathematical Sciences
in association with the Department of Mathematical Sciences,
The Johns Hopkins University
Matrix Computations
Fourth Edition
Gene H. Golub
Department of Computer Science
Stanford University
Charles F. Van Loan
Department of Computer Science
Cornell University
The Johns Hopkins University Press
Baltimore
© 1983, 1989, 1996, 2013 The Johns Hopkins University Press
All rights reserved. Published 2013
Printed in the United States of America on acid-free paper
9 8 7 6 5 4 3 2 1
First edition 1983
Second edition 1989
Third edition 1996
Fourth edition 2013
The Johns Hopkins University Press
2715 North Charles Street
Baltimore, Maryland 21218-4363
www.press.jhu.edu
Library of Congress Control Number: 2012943449
ISBN 13: 978-1-4214-0794-4 (hc)
ISBN 10: 1-4214-0794-9 (hc)
ISBN 13: 978-1-4214-0859-0 (eb)
ISBN 10: 1-4214-0859-7 (eb)
A catalog record for this book is available from the British Library.
MATLAB® is a registered trademark of The Mathworks Inc.
Special discounts are available for bulk purchases of this book. For more information, please
contact Special Sales at 410-516-6936 or specialsales@press.jhu.edu.
The Johns Hopkins University Press uses environmentally friendly book materials, including
recycled text paper that is composed of at least 30 percent post-consumer waste, whenever
possible.
To
ALSTON S. HOUSEHOLDER
AND
JAMES H. WILKINSON
Contents
Preface xi
Global References xiii
Other Books xv
Useful URLs xix
Common Notation xxi
1 Matrix Multiplication 1
1.1 Basic Algorithms and Notation 2
1.2 Structure and Efficiency 14
1.3 Block Matrices and Algorithms 22
1.4 Fast Matrix-Vector Products 33
1.5 Vectorization and Locality 43
1.6 Parallel Matrix Multiplication 49
2 Matrix Analysis 63
2.1 Basic Ideas from Linear Algebra 64
2.2 Vector Norms 68
2.3 Matrix Norms 71
2.4 The Singular Value Decomposition 76
2.5 Subspace Metrics 81
2.6 The Sensitivity of Square Systems 87
2.7 Finite Precision Matrix Computations 93
3 General Linear Systems 105
3.1 Triangular Systems 106
3.2 The LU Factorization 111
3.3 Roundoff Error in Gaussian Elimination 122
3.4 Pivoting 125
3.5 Improving and Estimating Accuracy 137
3.6 Parallel LU 144
4 Special Linear Systems 153
4.1 Diagonal Dominance and Symmetry 154
4.2 Positive Definite Systems 159
4.3 Banded Systems 176
4.4 Symmetric Indefinite Systems 186
4.5 Block Tridiagonal Systems 196
4.6 Vandermonde Systems 203
4.7 Classical Methods for Toeplitz Systems 208
4.8 Circulant and Discrete Poisson Systems 219
5 Orthogonalization and Least Squares 233
5.1 Householder and Givens Transformations 234
5.2 The QR Factorization 246
5.3 The Full-Rank Least Squares Problem 260
5.4 Other Orthogonal Factorizations 274
5.5 The Rank-Deficient Least Squares Problem 288
5.6 Square and Underdetermined Systems 298
6 Modified Least Squares Problems and Methods 303
6.1 Weighting and Regularization 304
6.2 Constrained Least Squares 313
6.3 Total Least Squares 320
6.4 Subspace Computations with the SVD 327
6.5 Updating Matrix Factorizations 334
7 Unsymmetric Eigenvalue Problems 347
7.1 Properties and Decompositions 348
7.2 Perturbation Theory 357
7.3 Power Iterations 365
7.4 The Hessenberg and Real Schur Forms 376
7.5 The Practical QR Algorithm 385
7.6 Invariant Subspace Computations 394
7.7 The Generalized Eigenvalue Problem 405
7.8 Hamiltonian and Product Eigenvalue Problems 420
7.9 Pseudospectra 426
8 Symmetric Eigenvalue Problems 439
8.1 Properties and Decompositions 440
8.2 Power Iterations 450
8.3 The Symmetric QR Algorithm 458
8.4 More Methods for Tridiagonal Problems 467
8.5 Jacobi Methods 476
8.6 Computing the SVD 486
8.7 Generalized Eigenvalue Problems with Symmetry 497
9 Functions of Matrices 513
9.1 Eigenvalue Methods 514
9.2 Approximation Methods 522
9.3 The Matrix Exponential 530
9.4 The Sign, Square Root, and Log of a Matrix 536
10 Large Sparse Eigenvalue Problems 545
10.1 The Symmetric Lanczos Process 546
10.2 Lanczos, Quadrature, and Approximation 556
10.3 Practical Lanczos Procedures 562
10.4 Large Sparse SVD Frameworks 571
10.5 Krylov Methods for Unsymmetric Problems 579
10.6 Jacobi-Davidson and Related Methods 589
11 Large Sparse Linear System Problems 597
11.1 Direct Methods 598
11.2 The Classical Iterations 611
11.3 The Conjugate Gradient Method 625
11.4 Other Krylov Methods 639
11.5 Preconditioning 650
11.6 The Multigrid Framework 670
12 Special Topics 681
12.1 Linear Systems with Displacement Structure 681
12.2 Structured-Rank Problems 691
12.3 Kronecker Product Computations 707
12.4 Tensor Unfoldings and Contractions 719
12.5 Tensor Decompositions and Iterations 731
Index 747
Preface
My thirty-year book collaboration with Gene Golub began in 1977 at a matrix
computation workshop held at Johns Hopkins University. His interest in my
work at the start of my academic career prompted the writing of GVL1. Sadly,
Gene died on November 16, 2007. At the time we had only just begun to
talk about GVL4. While writing these pages, I was reminded every day of his
far-reaching impact and professional generosity. This edition is a way to thank
Gene for our collaboration and the friendly research community that his unique
personality helped create.
It has been sixteen years since the publication of the third edition - a power-of-two
reminder that what we need to know about matrix computations is growing
exponentially! Naturally, it is impossible to provide in-depth coverage of all the great new
advances and research trends. However, with the relatively recent publication of so
many excellent textbooks and specialized volumes, we are able to complement our
brief treatments with useful pointers to the literature. That said, here are the new
features of GVL4:
Content
The book is about twenty-five percent longer. There are new sections on fast
transforms (§1.4), parallel LU (§3.6), fast methods for circulant systems and discrete
Poisson systems (§4.8), Hamiltonian and product eigenvalue problems (§7.8),
pseudospectra (§7.9), the matrix sign, square root, and logarithm functions (§9.4), Lanczos
and quadrature (§10.2), large-scale SVD (§10.4), Jacobi-Davidson (§10.6), sparse direct
methods (§11.1), multigrid (§11.6), low displacement rank systems (§12.1),
structured-rank systems (§12.2), Kronecker product problems (§12.3), tensor contractions (§12.4),
and tensor decompositions (§12.5).
New topics at the subsection level include recursive block LU (§3.2.11), rook
pivoting (§3.4.7), tournament pivoting (§3.6.3), diagonal dominance (§4.1.1), recursive block
structures (§4.2.10), band matrix inverse properties (§4.3.8), divide-and-conquer
strategies for block tridiagonal systems (§4.5.4), the cross product and various point/plane
least squares problems (§5.3.9), the polynomial eigenvalue problem (§7.7.9), and the
structured quadratic eigenvalue problem (§8.7.9).
Substantial upgrades include our treatment of floating-point arithmetic (§2.7),
LU roundoff error analysis (§3.3.1), LS sensitivity analysis (§5.3.6), the generalized
singular value decomposition (§6.1.6 and §8.7.4), and the CS decomposition (§8.7.6).
References
The annotated bibliographies at the end of each section remain. Because of
space limitations, the master bibliography that was included in previous editions is
now available through the book website. References that are historically important
have been retained because old ideas have a way of resurrecting themselves. Plus, we
must never forget the 1950's and 1960's! As mentioned above, we have the luxury of
being able to draw upon an expanding library of books on matrix computations. A
mnemonic-based citation system has been incorporated that supports these connections
to the literature.
Examples
Non-illuminating, small-n numerical examples have been removed from the text.
In their place is a modest suite of MATLAB demo scripts that can be run to provide
insight into critical theorems and algorithms. We believe that this is a much more
effective way to build intuition. The scripts are available through the book website.
Algorithmic Detail
It is important to have an algorithmic sense and an appreciation for high-performance
matrix computations. After all, it is the clever exploitation of advanced
architectures that account for much of the field's soaring success. However, the algorithms
that we "formally" present in the book must never be considered as even prototype
implementations. Clarity and communication of the big picture are what determine
the level of detail in our presentations. Even though specific strategies for specific
machines are beyond the scope of the text, we hope that our style promotes an ability
to reason about memory traffic overheads and the importance of data locality.
Acknowledgements
I would like to thank everybody who has passed along typographical errors and
suggestions over the years. Special kudos to the Cornell students in CS 4220, CS 6210,
and CS 6220, where I used preliminary versions of GVL4. Harry Terkelson earned big
bucks through my ill-conceived $5-per-typo program!
A number of colleagues and students provided feedback and encouragement during
the writing process. Others provided inspiration through their research and books.
Thank you all: Diego Accame, David Bindel, Ake Bjorck, Laura Bolzano, Jim Demmel,
Jack Dongarra, Mark Embree, John Gilbert, David Gleich, Joseph Grcar, Anne
Greenbaum, Nick Higham, Ilse Ipsen, Bo Kagstrom, Vel Kahan, Tammy Kolda, Amy
Langville, Julian Langou, Lek-Heng Lim, Nicola Mastronardi, Steve McCormick, Mike
McCourt, Volker Mehrmann, Cleve Moler, Dianne O'Leary, Michael Overton, Chris
Paige, Beresford Parlett, Stefan Ragnarsson, Lothar Reichel, Yousef Saad, Mike Saunders,
Rob Schreiber, Danny Sorensen, Pete Stewart, Gil Strang, Francoise Tisseur,
Nick Trefethen, Raf Vandebril, and Jianlin Xia.
Chris Paige and Mike Saunders were especially helpful with the editing of Chapters 10 and 11.
Vincent Burke, Jennifer Mallet, and Juliana McCarthy at Johns Hopkins University
Press provided excellent support during the production process. Jennifer Slater
did a terrific job of copy-editing. Of course, I alone am responsible for all mistakes and
oversights.
Finally, this book would have been impossible to produce without my great family
and my 4AM writing companion: Henry the Cat!
Charles F. Van Loan
Ithaca, New York
July, 2012
Global References
A number of books provide broad coverage of the field and are cited multiple times.
We identify these global references using mnemonics. Bibliographic details are given
in the Other Books section that follows.
AEP       Wilkinson: Algebraic Eigenvalue Problem
ANLA      Demmel: Applied Numerical Linear Algebra
ASNA      Higham: Accuracy and Stability of Numerical Algorithms, second edition
EOM       Chatelin: Eigenvalues of Matrices
FFT       Van Loan: Computational Frameworks for the Fast Fourier Transform
FOM       Higham: Functions of Matrices
FMC       Watkins: Fundamentals of Matrix Computations
IMC       Stewart: Introduction to Matrix Computations
IMK       van der Vorst: Iterative Krylov Methods for Large Linear Systems
IMSL      Greenbaum: Iterative Methods for Solving Linear Systems
ISM       Axelsson: Iterative Solution Methods
IMSLE     Saad: Iterative Methods for Sparse Linear Systems, second edition
LCG       Meurant: The Lanczos and Conjugate Gradient Algorithms ...
MA        Horn and Johnson: Matrix Analysis
MABD      Stewart: Matrix Algorithms: Basic Decompositions
MAE       Stewart: Matrix Algorithms Volume II: Eigensystems
MEP       Watkins: The Matrix Eigenvalue Problem: GR and Krylov Subspace Methods
MPT       Stewart and Sun: Matrix Perturbation Theory
NLA       Trefethen and Bau: Numerical Linear Algebra
NMA       Ipsen: Numerical Matrix Analysis: Linear Systems and Least Squares
NMLE      Saad: Numerical Methods for Large Eigenvalue Problems, revised edition
NMLS      Bjorck: Numerical Methods for Least Squares Problems
NMSE      Kressner: Numerical Methods for General and Structured Eigenvalue Problems
SAP       Trefethen and Embree: Spectra and Pseudospectra
SEP       Parlett: The Symmetric Eigenvalue Problem
SLAS      Forsythe and Moler: Computer Solution of Linear Algebraic Systems
SLS       Lawson and Hanson: Solving Least Squares Problems
TMA       Horn and Johnson: Topics in Matrix Analysis
LAPACK LAPACK Users' Guide, third edition
E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra,
J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen.
scaLAPACK ScaLAPACK Users' Guide
L.S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon,
J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker,
and R. C. Whaley.
LIN_TEMPLATES Templates for the Solution of Linear Systems ...
R. Barrett, M.W. Berry, T.F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout,
R. Pozo, C. Romine, and H. van der Vorst.
EIG_TEMPLATES Templates for the Solution of Algebraic Eigenvalue Problems ...
Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst.
Other Books
The following volumes are a subset of a larger, ever-expanding library of textbooks and mono­
graphs that are concerned with matrix computations and supporting areas. The list of refer­
ences below captures the evolution of the field and its breadth. Works that are more specialized
are cited in the annotated bibliographies that appear at the end of each section in the chapters.
Early Landmarks
V.N. Faddeeva (1959). Computational Methods of Linear Algebra, Dover, New York.
E. Bodewig (1959). Matrix Calculus, North-Holland, Amsterdam.
J.H. Wilkinson (1963). Rounding Errors in Algebraic Processes, Prentice-Hall, Englewood
Cliffs, NJ.
A.S. Householder (1964). Theory of Matrices in Numerical Analysis, Blaisdell, New York.
Reprinted in 1974 by Dover, New York.
L. Fox (1964). An Introduction to Numerical Linear Algebra, Oxford University Press, Oxford.
J.H. Wilkinson (1965). The Algebraic Eigenvalue Problem, Clarendon Press, Oxford.
General Textbooks on Matrix Computations
G.W. Stewart (1973). Introduction to Matrix Computations, Academic Press, New York.
R.J. Goult, R.F. Hoskins, J.A. Milner, and M.J. Pratt (1974). Computational Methods in
Linear Algebra, John Wiley and Sons, New York.
W.W. Hager (1988). Applied Numerical Linear Algebra, Prentice-Hall, Englewood Cliffs, NJ.
P.G. Ciarlet (1989). Introduction to Numerical Linear Algebra and Optimisation, Cambridge
University Press, Cambridge.
P.E. Gill, W. Murray, and M.H. Wright (1991). Numerical Linear Algebra and Optimization,
Vol. 1, Addison-Wesley, Reading, MA.
A. Jennings and J.J. McKeowen (1992). Matrix Computation, second edition, John Wiley and
Sons, New York.
L.N. Trefethen and D. Bau III (1997). Numerical Linear Algebra, SIAM Publications, Philadel­
phia, PA.
J.W. Demmel (1997). Applied Numerical Linear Algebra, SIAM Publications, Philadelphia,
PA.
A.J. Laub (2005). Matrix Analysis for Scientists and Engineers, SIAM Publications, Philadel­
phia, PA.
B.N. Datta (2010). Numerical Linear Algebra and Applications, second edition, SIAM Publi­
cations, Philadelphia, PA.
D.S. Watkins (2010). Fundamentals of Matrix Computations, John Wiley and Sons, New
York.
A.J. Laub (2012). Computational Matrix Analysis, SIAM Publications, Philadelphia, PA.
Linear Equation and Least Squares Problems
G.E. Forsythe and C.B. Moler (1967). Computer Solution of Linear Algebraic Systems,
Prentice-Hall, Englewood Cliffs, NJ.
A. George and J.W-H. Liu (1981). Computer Solution of Large Sparse Positive Definite
Systems. Prentice-Hall, Englewood Cliffs, NJ.
I.S. Duff, A.M. Erisman, and J.K. Reid (1986). Direct Methods for Sparse Matrices, Oxford
University Press, New York.
R.W. Farebrother (1987). Linear Least Squares Computations, Marcel Dekker, New York.
C.L. Lawson and R.J. Hanson (1995). Solving Least Squares Problems, SIAM Publications,
Philadelphia, PA.
A. Bjorck (1996). Numerical Methods for Least Squares Problems, SIAM Publications, Philadel­
phia, PA.
G.W. Stewart (1998). Matrix Algorithms: Basic Decompositions, SIAM Publications, Philadel­
phia, PA.
N.J. Higham (2002). Accuracy and Stability of Numerical Algorithms, second edition, SIAM
Publications, Philadelphia, PA.
T.A. Davis (2006). Direct Methods for Sparse Linear Systems, SIAM Publications, Philadelphia, PA.
I.C.F. Ipsen (2009). Numerical Matrix Analysis: Linear Systems and Least Squares, SIAM Publications, Philadelphia, PA.
Eigenvalue Problems
A.R. Gourlay and G.A. Watson (1973). Computational Methods for Matrix Eigenproblems,
John Wiley & Sons, New York.
F. Chatelin (1993). Eigenvalues of Matrices, John Wiley & Sons, New York.
B.N. Parlett (1998). The Symmetric Eigenvalue Problem, SIAM Publications, Philadelphia,
PA.
G.W. Stewart (2001). Matrix Algorithms Volume II: Eigensystems, SIAM Publications, Phila­
delphia, PA.
L. Komzsik (2003). The Lanczos Method: Evolution and Application, SIAM Publications, Philadelphia, PA.
D. Kressner (2005). Numerical Methods for General and Structured Eigenvalue Problems, Springer, Berlin.
D.S. Watkins (2007). The Matrix Eigenvalue Problem: GR and Krylov Subspace Methods, SIAM Publications, Philadelphia, PA.
Y. Saad (2011). Numerical Methods for Large Eigenvalue Problems, revised edition, SIAM
Publications, Philadelphia, PA.
Iterative Methods
R.S. Varga (1962). Matrix Iterative Analysis, Prentice-Hall, Englewood Cliffs, NJ.
D.M. Young (1971). Iterative Solution of Large Linear Systems, Academic Press, New York.
L.A. Hageman and D.M. Young (1981). Applied Iterative Methods, Academic Press, New
York.
J. Cullum and R.A. Willoughby (1985). Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. I Theory, Birkhauser, Boston.
J. Cullum and R.A. Willoughby (1985). Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. II Programs, Birkhauser, Boston.
W. Hackbusch (1994). Iterative Solution of Large Sparse Systems of Equations, Springer-Verlag, New York.
O. Axelsson (1994). Iterative Solution Methods, Cambridge University Press.
A. Greenbaum (1997). Iterative Methods for Solving Linear Systems, SIAM Publications,
Philadelphia, PA.
Y. Saad (2003). Iterative Methods for Sparse Linear Systems, second edition, SIAM Publica-
tions, Philadelphia, PA.
H. van der Vorst (2003). Iterative Krylov Methods for Large Linear Systems, Cambridge
University Press, Cambridge, UK.
G. Meurant (2006). The Lanczos and Conjugate Gradient Algorithms: From Theory to Finite
Precision Computations, SIAM Publications, Philadelphia, PA.
Special Topics/Threads
L.N. Trefethen and M. Embree (2005). Spectra and Pseudospectra-The Behavior of Nonnor­
mal Matrices and Operators, Princeton University Press, Princeton and Oxford.
R. Vandebril, M. Van Barel, and N. Mastronardi (2007). Matrix Computations and Semisep­
arable Matrices I: Linear Systems, Johns Hopkins University Press, Baltimore, MD.
R. Vandebril, M. Van Barel, and N. Mastronardi (2008). Matrix Computations and Semisepa­
rable Matrices II: Eigenvalue and Singular Value Methods, Johns Hopkins University Press,
Baltimore, MD.
N.J. Higham (2008). Functions of Matrices, SIAM Publications, Philadelphia, PA.
Collected Works
R.H. Chan, C. Greif, and D.P. O'Leary, eds. (2007). Milestones in Matrix Computation:
Selected Works of G.H. Golub, with Commentaries, Oxford University Press, Oxford.
M.E. Kilmer and D.P. O'Leary, eds. (2010). Selected Works of G. W. Stewart, Birkhauser,
Boston, MA.
Implementation
B.T. Smith, J.M. Boyle, Y. Ikebe, V.C. Klema, and C.B. Moler (1970). Matrix Eigensystem
Routines: EISPACK Guide, second edition, Lecture Notes in Computer Science, Vol. 6,
Springer-Verlag, New York.
J.H. Wilkinson and C. Reinsch, eds. (1971). Handbook for Automatic Computation, Vol. 2,
Linear Algebra, Springer-Verlag, New York.
B.S. Garbow, J.M. Boyle, J.J. Dongarra, and C.B. Moler (1972). Matrix Eigensystem Rou­
tines: EISPACK Guide Extension, Lecture Notes in Computer Science, Vol. 51, Springer­
Verlag, New York.
J.J Dongarra, J.R. Bunch, C.B. Moler, and G.W. Stewart (1979). LINPACK Users' Guide,
SIAM Publications, Philadelphia, PA.
K. Gallivan, M. Heath, E. Ng, B. Peyton, R. Plemmons, J. Ortega, C. Romine, A. Sameh,
and R. Voigt (1990). Parallel Algorithms for Matrix Computations, SIAM Publications,
Philadelphia, PA.
R. Barrett, M.W. Berry, T.F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo,
C. Romine, and H. van der Vorst (1993). Templates for the Solution of Linear Systems:
Building Blocks for Iterative Methods, SIAM Publications, Philadelphia, PA.
L.S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Ham­
marling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R.C. Whaley (1997). ScaLAPACK
Users' Guide, SIAM Publications, Philadelphia, PA.
J.J. Dongarra, I.S. Duff, D.C. Sorensen, and H.A. van der Vorst (1998). Numerical Linear
Algebra on High-Performance Computers, SIAM Publications, Philadelphia, PA.
E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A.
Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen (1999). LAPACK Users'
Guide, third edition, SIAM Publications, Philadelphia, PA.
Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst (2000). Templates for
the Solution of Algebraic Eigenvalue Problems: A Practical Guide, SIAM Publications,
Philadelphia, PA.
V.A. Barker, L.S. Blackford, J. Dongarra, J. Du Croz, S. Hammarling, M. Marinova, J. Was­
niewski, and P. Yalamov (2001). LAPACK95 Users' Guide, SIAM Publications, Philadel­
phia.
MATLAB
D.J. Higham and N.J. Higham (2005). MATLAB Guide, second edition, SIAM Publications,
Philadelphia, PA.
R. Pratap (2006). Getting Started with Matlab 7, Oxford University Press, New York.
C.F. Van Loan and D. Fan (2009). Insight Through Computing: A Matlab Introduction to
Computational Science and Engineering, SIAM Publications, Philadelphia, PA.
Matrix Algebra and Analysis
R. Horn and C. Johnson (1985). Matrix Analysis, Cambridge University Press, New York.
G.W. Stewart and J. Sun (1990). Matrix Perturbation Theory, Academic Press, San Diego.
R. Horn and C. Johnson (1991). Topics in Matrix Analysis, Cambridge University Press, New
York.
D.S. Bernstein (2005). Matrix Mathematics, Theory, Facts, and Formulas with Application to
Linear Systems Theory, Princeton University Press, Princeton, NJ.
L. Hogben (2006). Handbook of Linear Algebra, Chapman and Hall, Boca Raton, FL.
Scientific Computing/Numerical Analysis
G.W. Stewart (1996). Afternotes on Numerical Analysis, SIAM Publications, Philadelphia,
PA.
C.F. Van Loan (1997). Introduction to Scientific Computing: A Matrix-Vector Approach Using
Matlab, Prentice Hall, Upper Saddle River, NJ.
G.W. Stewart (1998). Afternotes on Numerical Analysis: Afternotes Goes to Graduate School,
SIAM Publications, Philadelphia, PA.
M.T. Heath (2002). Scientific Computing: An Introductory Survey, second edition, McGraw-
Hill, New York.
C.B. Moler (2008). Numerical Computing with MATLAB, revised reprint, SIAM Publications,
Philadelphia, PA.
G. Dahlquist and A. Bjorck (2008). Numerical Methods in Scientific Computing, Vol. 1,
SIAM Publications, Philadelphia, PA.
U. Ascher and C. Greif (2011). A First Course in Numerical Methods, SIAM Publications,
Philadelphia, PA.
Useful URLs
GVL4
MATLAB demo scripts and functions, master bibliography, list of errata.
http://www.cornell.edu/cv/GVL4
Netlib
Huge repository of numerical software including LAPACK.
http://www.netlib.org/index.html
Matrix Market
Test examples for matrix algorithms.
http://math.nist.gov/MatrixMarket/
Matlab Central
Matlab functions, demos, classes, toolboxes, videos.
http://www.mathworks.com/matlabcentral/
University of Florida Sparse Matrix Collections
Thousands of sparse matrix examples in several formats.
http://www.cise.ufl.edu/research/sparse/matrices/
Pseudospectra Gateway
Graphical tools for pseudospectra.
http://www.cs.ox.ac.uk/projects/pseudospectra/
ARPACK
Software for large sparse eigenvalue problems.
http://www.caam.rice.edu/software/ARPACK/
Innovative Computing Laboratory
State-of-the-art high performance matrix computations.
http://icl.cs.utk.edu/
Common Notation
ℝ, ℝ^n, ℝ^{m×n}             set of real numbers, vectors, and matrices (p. 2)
ℂ, ℂ^n, ℂ^{m×n}             set of complex numbers, vectors, and matrices (p. 13)
a_ij, A(i,j), [A]_ij        (i,j) entry of a matrix (p. 2)
u                           unit roundoff (p. 96)
fl(·)                       floating point operator (p. 96)
||x||_p                     p-norm of a vector (p. 68)
||A||_p, ||A||_F            p-norm and Frobenius norm of a matrix (p. 71)
length(x)                   dimension of a vector (p. 236)
κ_p(A)                      p-norm condition (p. 87)
|A|                         absolute value of a matrix (p. 91)
A^T, A^H                    transpose and conjugate transpose (p. 2, 13)
house(x)                    Householder vector (p. 236)
givens(a, b)                cosine-sine pair (p. 240)
x_LS                        minimum-norm least squares solution (p. 260)
ran(A)                      range of a matrix (p. 64)
null(A)                     nullspace of a matrix (p. 64)
span{v_1,...,v_n}           span defined by vectors (p. 64)
dim(S)                      dimension of a subspace (p. 64)
rank(A)                     rank of a matrix (p. 65)
det(A)                      determinant of a matrix (p. 66)
tr(A)                       trace of a matrix (p. 327)
vec(A)                      vectorization of a matrix (p. 28)
reshape(A, p, q)            reshaping a matrix (p. 28)
Re(A), Im(A)                real and imaginary parts of a matrix (p. 13)
diag(d_1,...,d_n)           diagonal matrix (p. 18)
I_n                         n-by-n identity matrix (p. 19)
e_i                         ith column of the identity matrix (p. 19)
ℰ_n, 𝒟_n, 𝒫_{p,q}           exchange, downshift, and perfect shuffle permutations (p. 20)
σ_i(A)                      ith largest singular value (p. 77)
σ_max(A), σ_min(A)          largest and smallest singular value (p. 77)
dist(S_1, S_2)              distance between two subspaces (p. 82)
sep(A_1, A_2)               separation between two matrices (p. 360)
λ(A)                        set of eigenvalues (p. 66)
λ_i(A)                      ith largest eigenvalue of a symmetric matrix (p. 66)
λ_max(A), λ_min(A)          largest and smallest eigenvalue of a symmetric matrix (p. 66)
ρ(A)                        spectral radius (p. 349)
𝒦(A, q, j)                  Krylov subspace (p. 548)
Matrix Computations
Chapter 1
Matrix Multiplication
1.1 Basic Algorithms and Notation
1.2 Structure and Efficiency
1.3 Block Matrices and Algorithms
1.4 Fast Matrix-Vector Products
1.5 Vectorization and Locality
1.6 Parallel Matrix Multiplication
The study of matrix computations properly begins with the study of various
matrix multiplication problems. Although simple mathematically, these calculations
are sufficiently rich to develop a wide range of essential algorithmic skills.
In §1.1 we examine several formulations of the matrix multiplication update prob­
lem C = C + AB. Partitioned matrices are introduced and used to identify linear
algebraic "levels" of computation.
If a matrix has special properties, then various economies are generally possible.
For example, a symmetric matrix can be stored in half the space of a general matrix.
A matrix-vector product may require much less time to execute if the matrix has many
zero entries. These matters are considered in §1.2.
A block matrix is a matrix whose entries are themselves matrices. The "lan­
guage" of block matrices is developed in §1.3. It supports the easy derivation of matrix
factorizations by enabling us to spot patterns in a computation that are obscured at
the scalar level. Algorithms phrased at the block level are typically rich in matrix­
matrix multiplication, the operation of choice in many high-performance computing
environments. Sometimes the block structure of a matrix is recursive, meaning that
the block entries have an exploitable resemblance to the overall matrix. This type of
connection is the foundation for "fast" matrix-vector product algorithms such as vari­
ous fast Fourier transforms, trigonometric transforms, and wavelet transforms. These
calculations are among the most important in all of scientific computing and are dis­
cussed in §1.4. They provide an excellent opportunity to develop a facility with block
matrices and recursion.
The last two sections set the stage for effective, "large-n" matrix computations. In
this context, data locality affects efficiency more than the volume of actual arithmetic.
Having an ability to reason about memory hierarchies and multiprocessor computation
is essential. Our goal in §1.5 and §1.6 is to build an appreciation for the attendant
issues without getting into system-dependent details.
Reading Notes
The sections within this chapter depend upon each other as follows:
    §1.1 → §1.2 → §1.3 → §1.4
                    ↓
                  §1.5 → §1.6
Before proceeding to later chapters, §1.1, §1.2, and §1.3 are essential. The fast
transform ideas in §1.4 are utilized in §4.8 and parts of Chapters 11 and 12. The reading of
§1.5 and §1.6 can be deferred until high-performance linear equation solving or
eigenvalue computation becomes a topic of concern.
1.1 Basic Algorithms and Notation
Matrix computations are built upon a hierarchy of linear algebraic operations. Dot
products involve the scalar operations of addition and multiplication. Matrix-vector
multiplication is made up of dot products. Matrix-matrix multiplication amounts to
a collection of matrix-vector products. All of these operations can be described in
algorithmic form or in the language of linear algebra. One of our goals is to show
how these two styles of expression complement each other. Along the way we pick up
notation and acquaint the reader with the kind of thinking that underpins the matrix
computation area. The discussion revolves around the matrix multiplication problem,
a computation that can be organized in several ways.
1.1.1 Matrix Notation
Let ℝ designate the set of real numbers. We denote the vector space of all m-by-n real
matrices by ℝ^{m×n}:

    A ∈ ℝ^{m×n}  ⟺  A = (a_ij),  a_ij ∈ ℝ,  i = 1:m,  j = 1:n.

If a capital letter is used to denote a matrix (e.g., A, B, Δ), then the corresponding
lower case letter with subscript ij refers to the (i,j) entry (e.g., a_ij, b_ij, δ_ij). Sometimes
we designate the elements of a matrix with the notation [A]_ij or A(i,j).
1.1.2 Matrix Operations
Basic matrix operations include transposition (ℝ^{m×n} → ℝ^{n×m}),

    C = A^T   ⟹   c_ij = a_ji,

addition (ℝ^{m×n} × ℝ^{m×n} → ℝ^{m×n}),

    C = A + B   ⟹   c_ij = a_ij + b_ij,

scalar-matrix multiplication (ℝ × ℝ^{m×n} → ℝ^{m×n}),

    C = aA   ⟹   c_ij = a·a_ij,

and matrix-matrix multiplication (ℝ^{m×p} × ℝ^{p×n} → ℝ^{m×n}),

    C = AB   ⟹   c_ij = Σ_{k=1}^{p} a_ik b_kj.

Pointwise matrix operations are occasionally useful, especially pointwise multiplication
(ℝ^{m×n} × ℝ^{m×n} → ℝ^{m×n}),

    C = A .* B   ⟹   c_ij = a_ij·b_ij,

and pointwise division (ℝ^{m×n} × ℝ^{m×n} → ℝ^{m×n}),

    C = A ./ B   ⟹   c_ij = a_ij / b_ij.

Of course, for pointwise division to make sense, the "denominator matrix" must have
nonzero entries.
1.1.3 Vector Notation
Let ℝ^n denote the vector space of real n-vectors:

    x ∈ ℝ^n  ⟺  x = [x_1; ...; x_n],  x_i ∈ ℝ.

We refer to x_i as the ith component of x. Depending upon context, the alternative
notations [x]_i and x(i) are sometimes used.
Notice that we are identifying ℝ^n with ℝ^{n×1} and so the members of ℝ^n are
column vectors. On the other hand, the elements of ℝ^{1×n} are row vectors:

    x ∈ ℝ^{1×n}  ⟺  x = [x_1, ..., x_n].

If x is a column vector, then y = x^T is a row vector.
1.1.4 Vector Operations
Assume that a ∈ ℝ, x ∈ ℝ^n, and y ∈ ℝ^n. Basic vector operations include scalar-vector
multiplication,

    z = ax   ⟹   z_i = a·x_i,

vector addition,

    z = x + y   ⟹   z_i = x_i + y_i,

and the inner product (or dot product),

    c = x^T y   ⟹   c = Σ_{i=1}^{n} x_i y_i.

A particularly important operation, which we write in update form, is the saxpy:

    y = ax + y   ⟹   y_i = a·x_i + y_i.

Here, the symbol "=" is used to denote assignment, not mathematical equality. The
vector y is being updated. The name "saxpy" is used in LAPACK, a software package
that implements many of the algorithms in this book. "Saxpy" is a mnemonic for
"scalar a x plus y." See LAPACK.
Pointwise vector operations are also useful, including vector multiplication,

    z = x .* y   ⟹   z_i = x_i y_i,

and vector division,

    z = x ./ y   ⟹   z_i = x_i / y_i.
1.1.5 The Computation of Dot Products and Saxpys
Algorithms in the text are expressed using a stylized version of the MATLAB language.
Here is our first example:
Algorithm 1.1.1 (Dot Product) If x, y ∈ ℝ^n, then this algorithm computes their dot
product c = x^T y.

    c = 0
    for i = 1:n
        c = c + x(i)y(i)
    end

It is clear from the summation that the dot product of two n-vectors involves n multi-
plications and n additions. The dot product operation is an "O(n)" operation, meaning
that the amount of work scales linearly with the dimension. The saxpy computation is
also O(n):

Algorithm 1.1.2 (Saxpy) If x, y ∈ ℝ^n and a ∈ ℝ, then this algorithm overwrites y
with y + ax.

    for i = 1:n
        y(i) = y(i) + ax(i)
    end
We stress that the algorithms in this book are encapsulations of important computa­
tional ideas and are not to be regarded as "production codes."
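As a point of reference, both algorithms can be transcribed into executable MATLAB and checked against the built-in vector operations. The test dimension, the scalar a, and the random data in this sketch are our own choices for a small demonstration, not part of the text:

    % Dot product and saxpy: loop forms versus the vectorized equivalents.
    n = 5;  a = 2;  x = rand(n,1);  y = rand(n,1);

    c = 0;                            % Algorithm 1.1.1: dot product
    for i = 1:n
        c = c + x(i)*y(i);
    end
    fprintf('dot product error: %g\n', abs(c - x'*y))

    z = y;                            % Algorithm 1.1.2: saxpy, overwrites z
    for i = 1:n
        z(i) = z(i) + a*x(i);
    end
    fprintf('saxpy error:       %g\n', norm(z - (a*x + y)))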
1.1.6 Matrix-Vector Multiplication and the Gaxpy
Suppose A ∈ ℝ^{m×n} and that we wish to compute the update

    y = y + Ax

where x ∈ ℝ^n and y ∈ ℝ^m are given. This generalized saxpy operation is referred to as
a gaxpy. A standard way that this computation proceeds is to update the components
one-at-a-time:

    y_i = y_i + Σ_{j=1}^{n} a_ij x_j,    i = 1:m.

This gives the following algorithm:
Algorithm 1.1.3 (Row-Oriented Gaxpy) If A ∈ ℝ^{m×n}, x ∈ ℝ^n, and y ∈ ℝ^m, then this
algorithm overwrites y with Ax + y.

    for i = 1:m
        for j = 1:n
            y(i) = y(i) + A(i,j)x(j)
        end
    end

Note that this involves O(mn) work. If each dimension of A is doubled, then the
amount of arithmetic increases by a factor of 4.
An alternative algorithm results if we regard Ax as a linear combination of A's
columns, e.g.,

    [1 2; 3 4; 5 6]·[7; 8] = [1·7+2·8; 3·7+4·8; 5·7+6·8] = 7·[1; 3; 5] + 8·[2; 4; 6] = [23; 53; 83].

Algorithm 1.1.4 (Column-Oriented Gaxpy) If A ∈ ℝ^{m×n}, x ∈ ℝ^n, and y ∈ ℝ^m, then
this algorithm overwrites y with Ax + y.

    for j = 1:n
        for i = 1:m
            y(i) = y(i) + A(i,j)·x(j)
        end
    end
Note that the inner loop in either gaxpy algorithm carries out a saxpy operation. The
column version is derived by rethinking what matrix-vector multiplication "means" at
the vector level, but it could also have been obtained simply by interchanging the order
of the loops in the row version.
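The two orientations are easy to compare side by side in MATLAB against the built-in update y + A*x; the small test dimensions below are ours, not from the text:

    % Row-oriented (Algorithm 1.1.3) versus column-oriented (Algorithm 1.1.4) gaxpy.
    m = 4;  n = 3;
    A = rand(m,n);  x = rand(n,1);  y0 = rand(m,1);

    y = y0;                           % row-oriented: m dot products
    for i = 1:m
        for j = 1:n
            y(i) = y(i) + A(i,j)*x(j);
        end
    end

    z = y0;                           % column-oriented: n saxpys
    for j = 1:n
        for i = 1:m
            z(i) = z(i) + A(i,j)*x(j);
        end
    end

    norm(y - (y0 + A*x)), norm(z - (y0 + A*x))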
1.1.7 Partitioning a Matrix into Rows and Columns
Algorithms 1.1.3 and 1.1.4 access the data in A by row and by column, respectively. To
highlight these orientations more clearly, we introduce the idea of a partitioned matrix.
From one point of view, a matrix is a stack of row vectors:

    A ∈ ℝ^{m×n}  ⟺  A = [r_1^T; ...; r_m^T],  r_k ∈ ℝ^n.    (1.1.1)

This is called a row partition of A. Thus, if we row partition

    A = [1 2; 3 4; 5 6],

then we are choosing to think of A as a collection of rows with

    r_1^T = [1 2],  r_2^T = [3 4],  r_3^T = [5 6].

With the row partitioning (1.1.1), Algorithm 1.1.3 can be expressed as follows:

    for i = 1:m
        y_i = y_i + r_i^T x
    end

Alternatively, a matrix is a collection of column vectors:

    A ∈ ℝ^{m×n}  ⟺  A = [c_1 | ··· | c_n],  c_k ∈ ℝ^m.    (1.1.2)

We refer to this as a column partition of A. In the 3-by-2 example above, we thus
would set c_1 and c_2 to be the first and second columns of A, respectively:

    c_1 = [1; 3; 5],  c_2 = [2; 4; 6].

With (1.1.2) we see that Algorithm 1.1.4 is a saxpy procedure that accesses A by
columns:

    for j = 1:n
        y = y + x_j c_j
    end

In this formulation, we appreciate y as a running vector sum that undergoes repeated
saxpy updates.
1.1.8 The Colon Notation
A handy way to specify a column or row of a matrix is with the "colon" notation. If
A ∈ ℝ^{m×n}, then A(k, :) designates the kth row, i.e.,

    A(k, :) = [a_k1, ..., a_kn].

The kth column is specified by

    A(:, k) = [a_1k; ...; a_mk].

With these conventions we can rewrite Algorithms 1.1.3 and 1.1.4 as

    for i = 1:m
        y(i) = y(i) + A(i, :)·x
    end

and

    for j = 1:n
        y = y + x(j)·A(:, j)
    end
respectively. By using the colon notation, we are able to suppress inner loop details
and encourage vector-level thinking.
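For instance, here is a small MATLAB illustration of our own that uses the 3-by-2 matrix from §1.1.6; the names r, c, and y are our choices:

    A = [1 2; 3 4; 5 6];
    r = A(2,:);                  % second row:   [3 4]
    c = A(:,1);                  % first column: [1; 3; 5]
    x = [7; 8];  y = zeros(3,1);
    for i = 1:3
        y(i) = y(i) + A(i,:)*x;  % row-oriented gaxpy: one dot product per row
    end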
1.1.9 The Outer Product Update
As a preliminary application of the colon notation, we use it to understand the outer
product update

    A = A + xy^T,    A ∈ ℝ^{m×n}, x ∈ ℝ^m, y ∈ ℝ^n.

The outer product operation xy^T "looks funny" but is perfectly legal, e.g.,

    [1; 2; 3]·[4 5] = [4 5; 8 10; 12 15].

This is because xy^T is the product of two "skinny" matrices and the number of columns
in the left matrix x equals the number of rows in the right matrix y^T. The entries in
the outer product update are prescribed by

    for i = 1:m
        for j = 1:n
            a_ij = a_ij + x_i y_j
        end
    end

This involves O(mn) arithmetic operations. The mission of the j loop is to add a
multiple of y^T to the ith row of A, i.e.,

    for i = 1:m
        A(i, :) = A(i, :) + x(i)·y^T
    end

On the other hand, if we make the i-loop the inner loop, then its task is to add a
multiple of x to the jth column of A:

    for j = 1:n
        A(:, j) = A(:, j) + y(j)·x
    end
Note that both implementations amount to a set of saxpy computations.
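A minimal MATLAB sketch of the two orientations, checked against the built-in rank-1 update (the dimensions are our own test choices):

    % Outer product update A = A + x*y', row-at-a-time and column-at-a-time.
    m = 3;  n = 2;
    A0 = rand(m,n);  x = rand(m,1);  y = rand(n,1);

    A = A0;
    for i = 1:m                       % add x(i)*y' to the ith row
        A(i,:) = A(i,:) + x(i)*y';
    end

    B = A0;
    for j = 1:n                       % add y(j)*x to the jth column
        B(:,j) = B(:,j) + y(j)*x;
    end

    norm(A - (A0 + x*y')), norm(B - (A0 + x*y'))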
1.1.10 Matrix-Matrix Multiplication
Consider the 2-by-2 matrix-matrix multiplication problem. In the dot product formu-
lation, each entry is computed as a dot product:

    [1 2; 3 4]·[5 6; 7 8] = [1·5+2·7  1·6+2·8; 3·5+4·7  3·6+4·8].

In the saxpy version, each column in the product is regarded as a linear combination
of left-matrix columns:

    [1 2; 3 4]·[5 6; 7 8] = [5·[1; 3] + 7·[2; 4]  |  6·[1; 3] + 8·[2; 4]].

Finally, in the outer product version, the result is regarded as the sum of outer products:

    [1 2; 3 4]·[5 6; 7 8] = [1; 3]·[5 6] + [2; 4]·[7 8].

Although equivalent mathematically, it turns out that these versions of matrix multi-
plication can have very different levels of performance because of their memory traffic
properties. This matter is pursued in §1.5. For now, it is worth detailing the various
approaches to matrix multiplication because it gives us a chance to review notation
and to practice thinking at different linear algebraic levels. To fix the discussion, we
focus on the matrix-matrix update computation:

    C = C + AB,    C ∈ ℝ^{m×n}, A ∈ ℝ^{m×r}, B ∈ ℝ^{r×n}.

The update C = C + AB is considered instead of just C = AB because it is the more
typical situation in practice.
1.1.11 Scalar-Level Specifications
The starting point is the familiar triply nested loop algorithm:

Algorithm 1.1.5 (ijk Matrix Multiplication) If A ∈ ℝ^{m×r}, B ∈ ℝ^{r×n}, and C ∈ ℝ^{m×n}
are given, then this algorithm overwrites C with C + AB.

    for i = 1:m
        for j = 1:n
            for k = 1:r
                C(i,j) = C(i,j) + A(i,k)·B(k,j)
            end
        end
    end
This computation involves O(mnr) arithmetic. If the dimensions are doubled, then
work increases by a factor of 8.
Each loop index in Algorithm 1.1.5 has a particular role. (The subscript i names
the row, j names the column, and k handles the dot product.) Nevertheless, the
ordering of the loops is arbitrary. Here is the (mathematically equivalent) jki variant:
    for j = 1:n
        for k = 1:r
            for i = 1:m
                C(i,j) = C(i,j) + A(i,k)B(k,j)
            end
        end
    end
Altogether, there are six (= 3!) possibilities:
ijk, jik, ikj, jki, kij, kji.
Each features an inner loop operation (dot product or saxpy) and each has its own
pattern of data flow. For example, in the ijk variant, the inner loop oversees a dot
product that requires access to a row of A and a column of B. The jki variant involves
a saxpy that requires access to a column of C and a column of A. These attributes are
summarized in Table 1.1.1 together with an interpretation of what is going on when
Loop Inner Inner Two Inner Loop
Order Loop Loops Data Access
ijk dot vector x matrix A by row, B by column
jik dot matrix x vector A by row, B by column
ikj saxpy row gaxpy B by row, C by row
jki saxpy column gaxpy A by column, C by column
kij saxpy row outer product B by row, C by row
kji saxpy column outer product A by column, C by column
Table 1.1.1. Matrix multiplication: loop orderings and properties
the middle and inner loops are considered together. Each variant involves the same
amount of arithmetic, but accesses the A, B, and C data differently. The ramifications
of this are discussed in §1.5.
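To see that the loop orderings are interchangeable, two of the six variants can be run next to each other and checked against C + A*B; the ijk version has a dot product inner loop while jki has a saxpy inner loop. The dimensions below are our own small test values:

    m = 4;  r = 3;  n = 5;
    A = rand(m,r);  B = rand(r,n);  C0 = rand(m,n);

    C = C0;                           % ijk variant
    for i = 1:m
        for j = 1:n
            for k = 1:r
                C(i,j) = C(i,j) + A(i,k)*B(k,j);
            end
        end
    end

    D = C0;                           % jki variant
    for j = 1:n
        for k = 1:r
            for i = 1:m
                D(i,j) = D(i,j) + A(i,k)*B(k,j);
            end
        end
    end

    norm(C - (C0 + A*B)), norm(D - (C0 + A*B))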
1.1.12 A Dot Product Formulation
The usual matrix multiplication procedure regards A·B as an array of dot products to
be computed one at a time in left-to-right, top-to-bottom order. This is the idea behind
Algorithm 1.1.5 which we rewrite using the colon notation to highlight the mission of
the innermost loop:

Algorithm 1.1.6 (Dot Product Matrix Multiplication) If A ∈ ℝ^{m×r}, B ∈ ℝ^{r×n}, and
C ∈ ℝ^{m×n} are given, then this algorithm overwrites C with C + AB.

    for i = 1:m
        for j = 1:n
            C(i,j) = C(i,j) + A(i, :)·B(:, j)
        end
    end

In the language of partitioned matrices, if

    A = [a_1^T; ...; a_m^T]    and    B = [b_1 | ··· | b_n],

then Algorithm 1.1.6 has this interpretation:

    for i = 1:m
        for j = 1:n
            c_ij = c_ij + a_i^T b_j
        end
    end

Note that the purpose of the j-loop is to compute the ith row of the update. To
emphasize this we could write

    for i = 1:m
        c_i^T = c_i^T + a_i^T B
    end

where

    C = [c_1^T; ...; c_m^T]

is a row partitioning of C. To say the same thing with the colon notation we write

    for i = 1:m
        C(i, :) = C(i, :) + A(i, :)·B
    end
Either way we see that the inner two loops of the ijk variant define a transposed gaxpy
operation.
1.1.13 A Saxpy Formulation
Suppose A and C are column-partitioned as follows:

    A = [a_1 | ··· | a_r],    C = [c_1 | ··· | c_n].

By comparing jth columns in C = C + AB we see that

    c_j = c_j + Σ_{k=1}^{r} a_k b_kj,    j = 1:n.

These vector sums can be put together with a sequence of saxpy updates.

Algorithm 1.1.7 (Saxpy Matrix Multiplication) If the matrices A ∈ ℝ^{m×r}, B ∈ ℝ^{r×n},
and C ∈ ℝ^{m×n} are given, then this algorithm overwrites C with C + AB.

    for j = 1:n
        for k = 1:r
            C(:, j) = C(:, j) + A(:, k)·B(k, j)
        end
    end

Note that the k-loop oversees a gaxpy operation:

    for j = 1:n
        C(:, j) = C(:, j) + A·B(:, j)
    end
1.1.14 An Outer Product Formulation
Consider the kij variant of Algorithm 1.1.5:

    for k = 1:r
        for j = 1:n
            for i = 1:m
                C(i,j) = C(i,j) + A(i,k)·B(k,j)
            end
        end
    end

The inner two loops oversee the outer product update

    C = C + a_k b_k^T

where

    A = [a_1 | ··· | a_r]    and    B = [b_1^T; ...; b_r^T]    (1.1.3)

with a_k ∈ ℝ^m and b_k ∈ ℝ^n. This renders the following implementation:

Algorithm 1.1.8 (Outer Product Matrix Multiplication) If the matrices A ∈ ℝ^{m×r},
B ∈ ℝ^{r×n}, and C ∈ ℝ^{m×n} are given, then this algorithm overwrites C with C + AB.

    for k = 1:r
        C = C + A(:, k)·B(k, :)
    end
Matrix-matrix multiplication is a sum of outer products.
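The three colon-notation formulations (row gaxpy, column gaxpy, and outer product) can be verified against the built-in product in a few lines of MATLAB; the test dimensions are our own choices:

    m = 4;  r = 3;  n = 5;
    A = rand(m,r);  B = rand(r,n);  C0 = rand(m,n);

    C1 = C0;                          % row gaxpys (Algorithm 1.1.6, compressed)
    for i = 1:m
        C1(i,:) = C1(i,:) + A(i,:)*B;
    end

    C2 = C0;                          % column gaxpys (Algorithm 1.1.7, compressed)
    for j = 1:n
        C2(:,j) = C2(:,j) + A*B(:,j);
    end

    C3 = C0;                          % sum of outer products (Algorithm 1.1.8)
    for k = 1:r
        C3 = C3 + A(:,k)*B(k,:);
    end

    norm(C1 - (C0 + A*B)), norm(C2 - (C0 + A*B)), norm(C3 - (C0 + A*B))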
1.1.15 Flops
One way to quantify the volume of work associated with a computation is to count flops.
A flop is a floating point add, subtract, multiply, or divide. The number of flops in a
given matrix computation is usually obtained by summing the amount of arithmetic
associated with the most deeply nested statements. For matrix-matrix multiplication,
e.g., Algorithm 1.1.5, this is the 2-flop statement

    C(i,j) = C(i,j) + A(i,k)·B(k,j).

If A ∈ ℝ^{m×r}, B ∈ ℝ^{r×n}, and C ∈ ℝ^{m×n}, then this statement is executed mnr times.
Table 1.1.2 summarizes the number of flops that are required for the common operations
detailed above.

    Operation       Dimension                                       Flops
    a = x^T y       x, y ∈ ℝ^n                                      2n
    y = y + ax      a ∈ ℝ, x, y ∈ ℝ^n                               2n
    y = y + Ax      A ∈ ℝ^{m×n}, x ∈ ℝ^n, y ∈ ℝ^m                   2mn
    A = A + yx^T    A ∈ ℝ^{m×n}, x ∈ ℝ^n, y ∈ ℝ^m                   2mn
    C = C + AB      A ∈ ℝ^{m×r}, B ∈ ℝ^{r×n}, C ∈ ℝ^{m×n}           2mnr

    Table 1.1.2. Important flop counts
1.1.16 Big-Oh Notation/Perspective
In certain settings it is handy to use the "Big-Oh" notation when an order-of-magnitude
assessment of work suffices. (We did this in §1.1.1.) Dot products are O(n), matrix-
vector products are O(n^2), and matrix-matrix products are O(n^3). Thus, to make
efficient an algorithm that involves a mix of these operations, the focus should typically
be on the highest order operations that are involved as they tend to dominate the overall
computation.
1.1.17 The Notion of "Level" and the BLAS
The dot product and saxpy operations are examples of level-1 operations. Level-1
operations involve an amount of data and an amount of arithmetic that are linear in
the dimension of the operation. An m-by-n outer product update or a gaxpy operation
involves a quadratic amount of data (O(mn)) and a quadratic amount of work (O(mn)).
These are level-2 operations. The matrix multiplication update C = C + AB is a level-3
operation. Level-3 operations are quadratic in data and cubic in work.
Important level-1, level-2, and level-3 operations are encapsulated in the "BLAS,"
an acronym that stands for Basic Linear Algebra Subprograms. See LAPACK. The design
of matrix algorithms that are rich in level-3 BLAS operations is a major preoccupation
of the field for reasons that have to do with data reuse (§1.5).
1.1.18 Verifying a Matrix Equation
In striving to understand matrix multiplication via outer products, we essentially
established the matrix equation

    AB = Σ_{k=1}^{r} a_k b_k^T,    (1.1.4)

where the a_k and b_k are defined by the partitionings in (1.1.3).
Numerous matrix equations are developed in subsequent chapters. Sometimes
they are established algorithmically as above and other times they are proved at the
ij-component level, e.g.,

    [ Σ_{k=1}^{r} a_k b_k^T ]_{ij} = Σ_{k=1}^{r} [ a_k b_k^T ]_{ij} = Σ_{k=1}^{r} a_ik b_kj = [AB]_{ij}.

Scalar-level verifications such as this usually provide little insight. However, they are
sometimes the only way to proceed.
1.1.19 Complex Matrices
On occasion we shall be concerned with computations that involve complex matrices.
The vector space of m-by-n complex matrices is designated by ℂ^{m×n}. The scaling,
addition, and multiplication of complex matrices correspond exactly to the real case.
However, transposition becomes conjugate transposition:

    C = A^H   ⟹   c_ij = ā_ji.

The vector space of complex n-vectors is designated by ℂ^n. The dot product of complex
n-vectors x and y is prescribed by

    s = x^H y = Σ_{i=1}^{n} x̄_i y_i.

If A = B + iC ∈ ℂ^{m×n}, then we designate the real and imaginary parts of A by
Re(A) = B and Im(A) = C, respectively. The conjugate of A is the matrix Ā = (ā_ij).
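In MATLAB the prime operator performs conjugate transposition while .' is the plain transpose, so the complex dot product convention above can be checked directly; the small matrices and vectors here are our own example:

    A = [1+2i, 3; 4i, 5-1i];
    C = A';                     % conjugate transpose: C(i,j) = conj(A(j,i))
    T = A.';                    % plain transpose, no conjugation

    x = [1+1i; 2];  y = [3; 4-2i];
    s = x'*y;                   % x'*y conjugates x, i.e. s = x^H y
    abs(s - sum(conj(x).*y))    % agrees with the componentwise definition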
Problems
P1.1.1 Suppose A ∈ ℝ^{n×n} and x ∈ ℝ^r are given. Give an algorithm for computing the first column
of M = (A - x_1 I)···(A - x_r I).
P1.1.2 In a conventional 2-by-2 matrix multiplication C = AB, there are eight multiplications: a_11 b_11,
a_11 b_12, a_21 b_11, a_21 b_12, a_12 b_21, a_12 b_22, a_22 b_21, and a_22 b_22. Make a table that indicates the order that
these multiplications are performed for the ijk, jik, kij, ikj, jki, and kji matrix multiplication
algorithms.
P1.1.3 Give an O(n^2) algorithm for computing C = (xy^T)^k where x and y are n-vectors.
P1.1.4 Suppose D = ABC where A ∈ ℝ^{m×n}, B ∈ ℝ^{n×p}, and C ∈ ℝ^{p×q}. Compare the flop count of
an algorithm that computes D via the formula D = (AB)C versus the flop count for an algorithm that
computes D using D = A(BC). Under what conditions is the former procedure more flop-efficient
than the latter?
P1.1.5 Suppose we have real n-by-n matrices C, D, E, and F. Show how to compute real n-by-n
matrices A and B with just three real n-by-n matrix multiplications so that

    A + iB = (C + iD)(E + iF).

Hint: Compute W = (C + D)(E - F).
P1.1.6 Suppose W ∈ ℝ^{n×n} is defined by

    w_ij = Σ_{p=1}^{n} Σ_{q=1}^{n} x_ip y_pq z_qj

where X, Y, Z ∈ ℝ^{n×n}. If we use this formula for each w_ij then it would require O(n^4) operations to
set up W. On the other hand,

    w_ij = Σ_{p=1}^{n} x_ip ( Σ_{q=1}^{n} y_pq z_qj ) = Σ_{p=1}^{n} x_ip u_pj

where U = YZ. Thus, W = XU = XYZ and only O(n^3) operations are required.
Use this methodology to develop an O(n^3) procedure for computing the n-by-n matrix A defined
by

    a_ij = Σ_{k1=1}^{n} Σ_{k2=1}^{n} Σ_{k3=1}^{n} E(k1, i) F(k1, i) G(k2, k1) H(k2, k3) F(k2, k3) G(k3, j)

where E, F, G, H ∈ ℝ^{n×n}. Hint: Transposes and pointwise products are involved.
Notes and References for §1.1
For an appreciation of the BLAS and their foundational role, see:
C.L. Lawson, R.J. Hanson, D.R. Kincaid, and F.T. Krogh (1979). "Basic Linear Algebra Subprograms
for FORTRAN Usage," ACM Trans. Math. Softw. 5, 308-323.
J.J. Dongarra, J. Du Croz, S. Hammarling, and R.J. Hanson (1988). "An Extended Set of Fortran
Basic Linear Algebra Subprograms," ACM Trans. Math. Softw. 14, 1-17.
J.J. Dongarra, J. Du Croz, I.S. Duff, and S.J. Hammarling (1990). "A Set of Level 3 Basic Linear
Algebra Subprograms," ACM Trans. Math. Softw. 16, 1-17.
B. Kagstrom, P. Ling, and C. Van Loan (1991). "High-Performance Level-3 BLAS: Sample Routines
for Double Precision Real Data," in High Performance Computing II, M. Durand and F. El Dabaghi
(eds.), North-Holland, Amsterdam, 269-281.
L.S. Blackford, J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman,
A. Lumsdaine, A. Petitet, R. Pozo, K. Remington, and R.C. Whaley (2002). "An Updated Set of
Basic Linear Algebra Subprograms (BLAS)," ACM Trans. Math. Softw. 28, 135-151.
The order in which the operations in the matrix product A_1···A_r are carried out affects the flop
count if the matrices vary in dimension. (See P1.1.4.) Optimization in this regard requires dynamic
programming, see:
T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein (2001). Introduction to Algorithms, MIT
Press and McGraw-Hill, 331-339.
1.2 Structure and Efficiency
The efficiency of a given matrix algorithm depends upon several factors. Most obvious
and what we treat in this section is the amount of required arithmetic and storage. How
to reason about these important attributes is nicely illustrated by considering exam­
ples that involve triangular matrices, diagonal matrices, banded matrices, symmetric
matrices, and permutation matrices. These are among the most important types of
structured matrices that arise in practice, and various economies can be realized if they
are involved in a calculation.
1.2.1 Band Matrices
A matrix is sparse if a large fraction of its entries are zero. An important special case
is the band matrix. We say that A ∈ ℝ^{m×n} has lower bandwidth p if a_ij = 0 whenever
i > j + p and upper bandwidth q if j > i + q implies a_ij = 0. Here is an example of an
8-by-5 matrix that has lower bandwidth 1 and upper bandwidth 2:

        x x x 0 0
        x x x x 0
        0 x x x x
    A = 0 0 x x x
        0 0 0 x x
        0 0 0 0 x
        0 0 0 0 0
        0 0 0 0 0

The x's designate arbitrary nonzero entries. This notation is handy to indicate the
structure of a matrix and we use it extensively. Band structures that occur frequently
are tabulated in Table 1.2.1.
    Type of Matrix      Lower Bandwidth     Upper Bandwidth
    Diagonal            0                   0
    Upper triangular    0                   n - 1
    Lower triangular    m - 1               0
    Tridiagonal         1                   1
    Upper bidiagonal    0                   1
    Lower bidiagonal    1                   0
    Upper Hessenberg    1                   n - 1
    Lower Hessenberg    m - 1               1

    Table 1.2.1. Band terminology for m-by-n matrices
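The bandwidth parameters can be read directly off a matrix's nonzero pattern. The following small MATLAB utility and its test matrix are our own illustration, not part of the text; for the 5-by-4 example below it reports lower bandwidth 1 and upper bandwidth 2:

    A = [1 2 3 0; 4 5 6 7; 0 8 9 1; 0 0 2 3; 0 0 0 4];
    [i, j] = find(A);           % row/column indices of the nonzero entries
    p = max(i - j);             % lower bandwidth: largest i-j with a_ij nonzero
    q = max(j - i);             % upper bandwidth: largest j-i with a_ij nonzero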
1.2.2 Triangular Matrix Multiplication
To introduce band matrix "thinking" we look at the matrix multiplication update
problem C = C + AB where A, B, and C are each n-by-n and upper triangular. The
3-by-3 case is illuminating:

    AB = [ a_11 b_11    a_11 b_12 + a_12 b_22    a_11 b_13 + a_12 b_23 + a_13 b_33
              0             a_22 b_22                a_22 b_23 + a_23 b_33
              0                 0                        a_33 b_33             ]

It suggests that the product is upper triangular and that its upper triangular entries
are the result of abbreviated inner products. Indeed, since a_ik b_kj = 0 whenever k < i
or j < k, we see that the update has the form

    c_ij = c_ij + Σ_{k=i}^{j} a_ik b_kj

for all i and j that satisfy i ≤ j. This yields the following algorithm:

Algorithm 1.2.1 (Triangular Matrix Multiplication) Given upper triangular matrices
A, B, C ∈ ℝ^{n×n}, this algorithm overwrites C with C + AB.

    for i = 1:n
        for j = i:n
            for k = i:j
                C(i,j) = C(i,j) + A(i,k)·B(k,j)
            end
        end
    end
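A quick MATLAB check of Algorithm 1.2.1 against the full product, using random upper triangular test data of our own choosing (the product of upper triangular matrices is itself upper triangular, so no masking is needed):

    n = 5;
    A = triu(rand(n));  B = triu(rand(n));  C0 = triu(rand(n));

    C = C0;
    for i = 1:n
        for j = i:n
            for k = i:j
                C(i,j) = C(i,j) + A(i,k)*B(k,j);
            end
        end
    end

    norm(C - (C0 + A*B))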
1.2.3 The Colon Notation-Again
The dot product that the k-loop performs in Algorithm 1.2.1 can be succinctly stated
if we extend the colon notation introduced in §1.1.8. If A ∈ ℝ^{m×n} and the integers p,
q, and r satisfy 1 ≤ p ≤ q ≤ n and 1 ≤ r ≤ m, then

    A(r, p:q) = [a_rp | ··· | a_rq] ∈ ℝ^{1×(q-p+1)}.

Likewise, if 1 ≤ p ≤ q ≤ m and 1 ≤ c ≤ n, then

    A(p:q, c) = [a_pc; ...; a_qc] ∈ ℝ^{q-p+1}.

With this notation we can rewrite Algorithm 1.2.1 as

    for i = 1:n
        for j = i:n
            C(i,j) = C(i,j) + A(i, i:j)·B(i:j, j)
        end
    end

This highlights the abbreviated inner products that are computed by the innermost
loop.
1.2.4 Assessing Work
Obviously, upper triangular matrix multiplication involves less arithmetic than full
matrix multiplication. Looking at Algorithm 1.2.1, we see that c_ij requires 2(j - i + 1)
flops if i ≤ j. Using the approximations

    Σ_{p=1}^{q} p = q(q+1)/2 ≈ q²/2    and    Σ_{p=1}^{q} p² = q³/3 + q²/2 + q/6 ≈ q³/3,

we find that triangular matrix multiplication requires one-sixth the number of flops as
full matrix multiplication:

    Σ_{i=1}^{n} Σ_{j=i}^{n} 2(j - i + 1) = Σ_{i=1}^{n} Σ_{j=1}^{n-i+1} 2j ≈ Σ_{i=1}^{n} (n - i + 1)² = Σ_{i=1}^{n} i² ≈ n³/3.

We throw away the low-order terms since their inclusion does not contribute to what
the flop count "says." For example, an exact flop count of Algorithm 1.2.1 reveals
that precisely n³/3 + n² + 2n/3 flops are involved. For large n (the typical situation
of interest) we see that the exact flop count offers no insight beyond the simple n³/3
accounting.
Flop counting is a necessarily crude approach to the measurement of program
efficiency since it ignores subscripting, memory traffic, and other overheads associated
with program execution. We must not infer too much from a comparison of flop
counts. We cannot conclude, for example, that triangular matrix multiplication is six
times faster than full matrix multiplication. Flop counting captures just one dimension
of what makes an algorithm efficient in practice. The equally relevant issues of
vectorization and data locality are taken up in §1.5.
1.2.5 Band Storage
Suppose A ∈ ℝ^{n×n} has lower bandwidth p and upper bandwidth q and assume that p
and q are much smaller than n. Such a matrix can be stored in a (p + q + 1)-by-n array
A.band with the convention that

    a_ij = A.band(i - j + q + 1, j)    (1.2.1)

for all (i, j) that fall inside the band, e.g.,

    a11 a12 a13  0   0   0
    a21 a22 a23 a24  0   0              *   *  a13 a24 a35 a46
     0  a32 a33 a34 a35  0      =>      *  a12 a23 a34 a45 a56
     0   0  a43 a44 a45 a46            a11 a22 a33 a44 a55 a66
     0   0   0  a54 a55 a56            a21 a32 a43 a54 a65  *
     0   0   0   0  a65 a66

Here, the "*" entries are unused. With this data structure, our column-oriented gaxpy
algorithm (Algorithm 1.1.4) transforms to the following:

Algorithm 1.2.2 (Band Storage Gaxpy) Suppose A ∈ ℝ^{n×n} has lower bandwidth p
and upper bandwidth q and is stored in the A.band format (1.2.1). If x, y ∈ ℝ^n, then
this algorithm overwrites y with y + Ax.

    for j = 1:n
        α1 = max(1, j - q),  α2 = min(n, j + p)
        β1 = max(1, q + 2 - j),  β2 = β1 + α2 - α1
        y(α1:α2) = y(α1:α2) + A.band(β1:β2, j)·x(j)
    end
Notice that by storing A column by column in A.band, we obtain a column-oriented
saxpy procedure. Indeed, Algorithm 1.2.2 is derived from Algorithm 1.1.4 by recognizing
that each saxpy involves a vector with a small number of nonzeros. Integer
arithmetic is used to identify the location of these nonzeros. As a result of this careful
zero/nonzero analysis, the algorithm involves just 2n(p + q + 1) flops with the
assumption that p and q are much smaller than n.
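One way to exercise the band format and Algorithm 1.2.2 in MATLAB is the sketch below; the test values n = 6, p = 1, q = 2 and the array name Aband are our own choices:

    n = 6;  p = 1;  q = 2;
    A = triu(tril(rand(n), q), -p);           % a random matrix with the given band structure

    Aband = zeros(p+q+1, n);                  % pack A into the (1.2.1) layout
    for j = 1:n
        for i = max(1, j-q):min(n, j+p)
            Aband(i - j + q + 1, j) = A(i,j);
        end
    end

    x = rand(n,1);  y = rand(n,1);  y0 = y;
    for j = 1:n                               % Algorithm 1.2.2
        a1 = max(1, j-q);   a2 = min(n, j+p);
        b1 = max(1, q+2-j); b2 = b1 + a2 - a1;
        y(a1:a2) = y(a1:a2) + Aband(b1:b2, j)*x(j);
    end
    norm(y - (y0 + A*x))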
1.2.6 Working with Diagonal Matrices
Matrices with upper and lower bandwidth zero are diagonal. If D ∈ ℝ^{m×n} is diagonal,
then we use the notation

    D = diag(d_1, ..., d_q),  q = min{m, n}   ⟺   d_i = d_ii.

Shortcut notations when the dimension is clear include diag(d) and diag(d_i). Note
that if D = diag(d) ∈ ℝ^{n×n} and x ∈ ℝ^n, then Dx = d .* x. If A ∈ ℝ^{m×n}, then
pre-multiplication by D = diag(d_1, ..., d_m) ∈ ℝ^{m×m} scales rows,

    B = DA   ⟹   B(i, :) = d_i·A(i, :),  i = 1:m,

while post-multiplication by D = diag(d_1, ..., d_n) ∈ ℝ^{n×n} scales columns,

    B = AD   ⟹   B(:, j) = d_j·A(:, j),  j = 1:n.

Both of these special matrix-matrix multiplications require mn flops.
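A short MATLAB illustration of these scalings, with data of our own choosing:

    d = [2; 3];  e = [1; 10; 100];
    A = rand(2,3);
    B1 = diag(d)*A;                 % row scaling:    B1(i,:) = d(i)*A(i,:)
    B2 = A*diag(e);                 % column scaling: B2(:,j) = e(j)*A(:,j)
    x = rand(3,1);
    norm(diag(e)*x - e.*x)          % Dx = d .* x when D = diag(d)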
1.2.7 Symmetry
A matrix A ∈ ℝ^{n×n} is symmetric if A^T = A and skew-symmetric if A^T = -A. Likewise,
a matrix A ∈ ℂ^{n×n} is Hermitian if A^H = A and skew-Hermitian if A^H = -A. Here
are some examples:

    Symmetric:       [ 1  2  3 ]        Hermitian:       [  1     2-3i   4-5i ]
                     [ 2  4  5 ]                         [ 2+3i    6     7-8i ]
                     [ 3  5  6 ]                         [ 4+5i   7+8i    9   ]

    Skew-Symmetric:  [  0  -2   3 ]     Skew-Hermitian:  [   i    -2+3i  -4+5i ]
                     [  2   0  -5 ]                      [ 2+3i     6i   -7+8i ]
                     [ -3   5   0 ]                      [ 4+5i    7+8i    9i  ]

For such matrices, storage requirements can be halved by simply storing the lower
triangle of elements, e.g.,

    A = [ 1  2  3 ]
        [ 2  4  5 ]     =>     A.vec = [ 1 2 3 4 5 6 ].
        [ 3  5  6 ]

For general n, we set

    A.vec((n - j/2)(j - 1) + i) = a_ij,    1 ≤ j ≤ i ≤ n.    (1.2.2)
Here is a column-oriented gaxpy with the matrix A represented in A.vec.

Algorithm 1.2.3 (Symmetric Storage Gaxpy) Suppose A ∈ ℝ^{n×n} is symmetric and
stored in the A.vec style (1.2.2). If x, y ∈ ℝ^n, then this algorithm overwrites y with
y + Ax.

    for j = 1:n
        for i = 1:j-1
            y(i) = y(i) + A.vec((i-1)n - i(i-1)/2 + j)·x(j)
        end
        for i = j:n
            y(i) = y(i) + A.vec((j-1)n - j(j-1)/2 + i)·x(j)
        end
    end

This algorithm requires the same 2n² flops that an ordinary gaxpy requires.
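A MATLAB sketch of the packing (1.2.2) and of Algorithm 1.2.3, using a random symmetric test matrix of our own choosing (the vector name Avec stands in for A.vec):

    n = 4;
    A = rand(n);  A = A + A';                 % a symmetric test matrix
    Avec = zeros(n*(n+1)/2, 1);
    for j = 1:n                               % pack the lower triangle, column by column
        for i = j:n
            Avec((j-1)*n - j*(j-1)/2 + i) = A(i,j);
        end
    end

    x = rand(n,1);  y = rand(n,1);  y0 = y;
    for j = 1:n                               % Algorithm 1.2.3
        for i = 1:j-1
            y(i) = y(i) + Avec((i-1)*n - i*(i-1)/2 + j)*x(j);
        end
        for i = j:n
            y(i) = y(i) + Avec((j-1)*n - j*(j-1)/2 + i)*x(j);
        end
    end
    norm(y - (y0 + A*x))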
1.2.8 Permutation Matrices and the Identity
We denote the n-by-n identity matrix by I_n, e.g.,

    I_4 = [ 1 0 0 0 ]
          [ 0 1 0 0 ]
          [ 0 0 1 0 ]
          [ 0 0 0 1 ].

We use the notation e_i to designate the ith column of I_n. If the rows of I_n are reordered,
then the resulting matrix is said to be a permutation matrix, e.g.,

    P = [ 0 1 0 0 ]
        [ 0 0 0 1 ]
        [ 0 0 1 0 ]
        [ 1 0 0 0 ].    (1.2.3)

The representation of an n-by-n permutation matrix requires just an n-vector of integers
whose components specify where the 1's occur. For example, if v ∈ ℝ^n has the
property that v_i specifies the column where the "1" occurs in row i, then y = Px implies
that y_i = x_{v_i}, i = 1:n. In the example above, the underlying v-vector is v = [2 4 3 1].
1.2.9 Specifying Integer Vectors and Submatrices
For permutation matrix work and block matrix manipulation (§1.3) it is convenient to
have a method for specifying structured integer vectors of subscripts. The MATLAB
colon notation is again the proper vehicle and a few examples suffice to show how it
works. If n = 8, then
v = 1:2:n ==:} v = [ 1 3 5 7] ,
v = n:-1:1 ==:} v = [ s 1 6 5 4 3 2 1 J ,
v = [ (1:2:n) (2:2:n) ] ==:} v = [ 1 3 5 7 2 4 6 8 J .
20 Chapter 1. Matrix Multiplication
Suppose A E Rmxn and that v E R'. and w E R8 are integer vectors with the
property that 1 ;; Vi � m and 1 � Wi � n. If B = A(v, w), then B E wxs is the
matrix defined by bi; = av,,wi for i = l:r and j = l:s. Thus, if A E R8x8, then
1.2.10 Working with Permutation Matrices
Using the colon notation, the 4-by-4 permutation matrix in (1.2.3) is defined by P =
I4(v, :) where v = [ 2 4 3 l ]. In general, if v E Rn is a permutation of the vector
l:n = [1, 2, . . . , n] and P = In(v, :), then
y = Px ===? y = x(v) ===? Yi = Xv;i i = l:n
y = pTx ===? y(v) = x ===? Yv; = Xi, i = l:n
The second result follows from the fact that Vi is the row index of the "l" in column i
of pT_ Note that PT(Px) = x. The inverse of a permutation matrix is its transpose.
The action of a permutation matrix on a given matrix A E Rmxn is easily de­
scribed. If P = Im(v, :) and Q = In(w, :), then PAQT = A(v,w). It also follows that
In(v, :) · In(w, :) = In(w(v), :). Although permutation operations involve no fl.ops, they
move data and contribute to execution time, an issue that is discussed in §1.5.
1.2.11 Three Famous Permutation Matrices
The exchange permutation en turns vectors upside down, e.g.,
In general, if v = n: - 1:1, then the n-by-n exchange permutation is given by en =
In(v,:). No change results if a vector is turned upside down twice and thus, eJ,en =
e� = In.
The downshift permutation 'Dn pushes the components of a vector down one notch
with wraparound, e.g.,
In general, if v = [ (2:n) 1 ], then the n-by-n downshift permutation is given by 'Dn =
In(v, :). Note that V'f:. can be regarded as an upshift permutation.
The mod-p perfect shuffie permutation 'Pp,r treats the components of the input
vector x E Rn, n = pr, as cards in a deck. The deck is cut into p equal "piles" and
1.2. Structure and Efficiency 21
reassembled by taking one card from each pile in turn. Thus, if p = 3 and r = 4, then
the piles are x(1:4), x(5:8), and x(9:12) and
x(2:4:12)
y = P3,4x = Ipr([ 1 5 9 2 6 10 3 7 11 4 8 12 ], :)x =
x(3:4:l2)
[x(1:4:12) l
In general, if n = pr, then
Pp,r
and it can be shown that
In([ (l:r:n) (2:r:n) · · · (r:r:n)], :)
P;,r = In ([ (l:p:n) (2:p:n) · · · (p:p:n) ], :).
x(4:4:12)
(1.2.4)
Continuing with the card deck metaphor, P'{;,r reassembles the card deck by placing all
the Xi having i mod p = 1 first, followed by all the Xi having i mod p = 2 second, and
so on.
Problems
Pl.2.1 Give an algorithm that overwrites A with A2 where A E Ir x n. How much extra storage is
required? Repeat for the case when A is upper triangular.
Pl.2.2 Specify an algorithm that computes the first column of the matrix M = (A - >.1!) ···(A - >.rl)
where A E Rn x n is upper Hessenberg and >.1 , . . . , >.,. are given scalars. How many flops are required
assuming that r « n?
Pl.2.3 Give a column saxpy algorithm for the n-by-n matrix multiplication problem C = C + AB
where A is upper triangular and B is lower triangular.
Pl.2.4 Extend Algorithm 1.2.2 so that it can handle rectangular band matrices. Be sure to describe
the underlying data structure.
Pl.2.5 If A = B + iC is Hermitian with B E R'' x n, then it is easy to show that BT = B and
er = -C. Suppose we represent A in an array A.herm with the property that A.herm(i,j) houses
b;j if i ::=:: j and Cij if j > i. Using this data structure, write a matrix-vector multiply function that
computes Re(z) and lm(z) from Re(x) and lm(x) so that z = Ax.
Pl.2.6 Suppose X E R'' x p and A E R'' x n arc given and that A is symmetric. Give an algorithm for
computing B = xrAX assuming that both A and B are to be stored using the symmetric storage
scheme presented in §1.2.7.
Pl.2.7 Suppose a E Rn is given and that A E Rn x n has the property that a;j = ali-il+l· Give an
algorithm that overwrites y with y + Ax where x, y E Rn are given.
Pl.2.8 Suppose a E Rn is given and that A E Rn x n has the property that a;j = a((i+j-l) mod n)+l·
Give an algorithm that overwrites y with y + Ax where x, y E Rn are given.
Pl.2.9 Develop a compact storage scheme for symmetric band matrices and write the corresponding
gaxpy algorithm.
Pl.2.10 Suppose A E Rn x n, u E Rn , and v E Rn are given and that k � n is an integer. Show how to
compute X E R'' x k and Y E R'' x k so that (A + uvT)k = Ak + XYT. How many flops are required?
Pl.2.11 Suppose x E Rn . Write a single-loop algorithm that computes y = V�x where k is a positive
integer and 'Dn is defined in §1.2.11.
22 Chapter 1. Matrix Multiplication
Pl.2.12 (a) Verify (1.2.4). (b) Show that P'I,r = 'Pr,p ·
Pl.2.13 The number of n-by-n permutation matrices is n!. How many of these are symmetric?
Notes and References for §1.2
See LAPACK for a discussion about appropriate data structures when symmetry and/or handedness is
present in addition to
F.G. Gustavson (2008). "The Relevance of New Data Structure Approaches for Dense Linear Al­
gebra in the New Multi-Core/Many-Core Environments," in Proceedings of the 7th international
Conference on Parallel Processing and Applied Mathematics, Springer-Verlag, Berlin, 618-621.
The exchange, downshift, and perfect shuffle permutations are discussed in Van Loan (FFT).
1.3 Block Matrices and Algorithms
A block matrix is a matrix whose entries are themselves matrices. It is a point of view.
For example, an 8-by-15 matrix of scalars can be regarded as a 2-by-3 block matrix
with 4-by-5 entries. Algorithms that manipulate matrices at the block level are often
more efficient because they are richer in level-3 operations. The derivation of many
important algorithms is often simplified by using block matrix notation.
1.3.1 Block Matrix Terminology
Column and row partitionings (§1.1.7) are special cases of matrix blocking. In general,
we can partition both the rows and columns of an m-by-n matrix A to obtain
A
[ Au
Aq1
where m1 + · · · + mq = m, n1 + · · · + nr = n, and A0,e designates the (a, /3) block
(submatrix). With this notation, block A0,e has dimension m0-by-n,e and we say that
A = (A0,e) is a q-by-r block matrix.
Terms that we use to describe well-known band structures for matrices with scalar
entries have natural block analogs. Thus,
[
Au 0 0
l
diag(A11, A22, A33) 0 A22 0
0 0 A33
is block diagonal while the matrices
[Lu 0
Ll u �
[Uu U12 u,,
l
[Tu T12
:,
,],
L = L21 L22 0 U22 U23 , T = �t T22
L31 L32 0 0 U33 T32 T33
are, respectively, block lower triangular, block upper triangular, and block tridiagonal.
The blocks do not have to be square in order to use this block sparse terminology.
1.3. Block Matrices and Algorithms
1.3.2 Block Matrix Operations
Block matrices can be scaled and transposed:
=
[µA11
µA21
µA31
23
Note that the transpose ofthe original (i, j) block becomes the (j, i) block of the result.
Identically blocked matrices can be added by summing the corresponding blocks:
Block matrix multiplication requires more stipulations about dimension. For example,
if
[Au A12 l [ B B ] [AuBu+A12B21 AuB12+A12B22 l
A21 A22 B
u
B
12 = A21B11+A22B21 A21B12+A22B22
A31 A32 21 22 A31B11 +A32B21 A31B12+A32B22
is to make sense, then the column dimensions of A11 , A21, and A31 must each be equal
to the row dimension of both Bu and Bi2- Likewise, the column dimensions of A12,
A22, and A32 must each be equal to the row dimensions of both B21 and B22·
Whenever a block matrix addition or multiplication is indicated, it is assumed
that the row and column dimensions of the blocks satisfy all the necessary constraints.
In that case we say that the operands are partitioned conformably as in the following
theorem.
Theorem 1 . 3 . 1 . If
A =
[ Au
Aq1
B =
P1
and we partition the product C = AB as follows,
c
[ Cu
Cq1
n 1 nr
..
[ Bu
Bs1
then for a = l:q and (3 = l:r we have Ca.{3 = L Aa.-yB-yf3·
-r=l
l
P1
Ps
n,.
24 Chapter 1. Matrix Multiplication
Proof. The proof is a tedious exercise in subscripting. Suppose 1 $ a $ q and
1 $ f3 $ r. Set M = m1 + · · · + m0_1 and N = nl + · · · n/3-1· It follows that if
1 $ i $ m0 and 1 $ j $ n13 then
P1+..·P• s P1+··+p.,,
[Ca/3tj = L aM+i,kbk,N+i L:: L::
k=l
s p.,,
')'=1 k=p1+···+P.,,-1+1
s
= L L [Aa'Y]ik [B,,13jkj = L [Aa,,B,,13jij
')'=1 k=l ')'=I
Thus, Ca/3 = A0,1B1,13 + · · · + A0,sBs,/3· D
If you pay attention to dimension and remember that matrices do not commute, i.e.,
AuBu + A12B21 #- B11A11 + B21A12, then block matrix manipulation is just ordinary
matrix manipulation with the aii's and bii's written as Aii's and Bii's!
1.3.3 Submatrices
Suppose A E 1Rmxn. If a = [o:i, . . . , 0:8] and f3 = [{31, . . . , f3t] are integer vectors with
distinct components that satisfy 1 $ ai $ m, and 1 $ !3i $ n, then
[aa1,/31
A(a, {3) = :
ao.,/31
. . . a
"' l
01 ,JJt
. .
. .
. . . aa. ,/3.
is an s-by-t submatrix of A. For example, if A E JR8 x6, a = [2 4 6 8], and f3 = [4 5 6],
then
[::: ::: :::l
A(a,{3) = .
a64 a65 a66
as4 as5 as6
If a = {3, then A(a,{3) is a principal submatrix. If a = f3 = l:k and 1 $ k $ min{m, n},
then A(a,{3) is a leading principal submatrix.
If A E 1Rmxn and
then the colon notation can be used to specify the individual blocks. In particular,
Aii = A(r + l:r + rni,µ + l:µ + ni)
where T = m1 + · · · + mi-1 and µ = ni + · · · + ni-1· Block matrix notation is valuable
for the way in which it hides subscript range expressions.
1.3. Block Matrices and Algorithms
1.3.4 The Blocked Gaxpy
25
As an exercise in block matrix manipulation and submatrix designation, we consider
two block versions of the gaxpy operation y y + Ax where A E JRmxn, x E JRn, and
y E JRm. If
then
and we obtain
a = O
for i = l:q
A
idx = a+l : a+mi
y(idx) = y(idx) + A(idx, :)·x
a = a + mi
end
and y
The assignment to y(idx) corresponds to Yi = Yi + Aix. This row-blocked version of
the gaxpy computation breaks the given gaxpy into q "shorter" gaxpys. We refer to
Ai as the ith block row of A.
Likewise, with the partitionings
A =
we see that
and we obtain
{J = 0
for j = l:r
n1 n,.
jdx = {J+l : {J+ni
end
y = y + A(:,jdx) ·x(jdx)
{J = {J + ni
and x
r
y + L AjXj
j=l
The assignment to y corresponds to y = y + AjXj · This column-blocked version of the
gaxpy computation breaks the given gaxpy into r "thinner" gaxpys. We refer to Aj as
the jth block column of A.
26 Chapter 1. Matrix Multiplication
1 .3.5 Block Matrix Multiplication
Just as ordinary, scalar-level matrix multiplication can be arranged in several possible
ways, so can the multiplication of block matrices. To illustrate this with a minimum
of subscript clutter, we consider the update
C = C + AB
where we regard A = (Aaf:I), B = (Baf3), and C = (Caf3) as N-by-N block matrices
with t'-by-t' blocks. From Theorem 1.3.1 we have
N
Caf3 = Ca(3 + L Aa-yB-y(3,
-y=l
a = l:N, {3 = l:N.
If we organize a matrix multiplication procedure around this summation, then we
obtain a block analog of Algorithm 1.1.5:
for a = l:N
i = (a - l)t' + l:at'
for {3 = l:N
j = ({3 - l)t' + 1:{3£
for 'Y = l:N
k = (7 - l)t' + 1:7£
C(i, j) = C(i, j) + A(i, k) ·B(k, j)
end
end
end
Note that, if t' = 1, then a = i, {3 = j, and 7 = k and we revert to Algorithm 1.1.5.
Analogously to what we did in §1.1, we can obtain different variants ofthis proce­
dure by playing with loop orders and blocking strategies. For example, corresponding
to
where Ai E Rlxn and Bi E Rnxl, we obtain the following block outer product compu­
tation:
for i = l:N
end
for j = l:N
Cii = Cii + AiBi
end
1.3. Block Matrices and Algorithms
1.3.6 The Kronecker Product
27
It is sometimes the case that the entries in a block matrix A are all scalar multiples of
the same matrix. This means that A is a Kronecker product. Formally, if B E IRm1 xni
and C E 1Rm2xn2, then their Kronecker product B ® C is an m1-by-n1 block matrix
whose (i,j) block is the m2-by-n2 matrix bijC. Thus, if
A =
then
bnc11
bitC21
buc31
b21Cu
A =
�1C21
�iC31
b31C11
b31C21
b31C31
[bn
b21
b31
b,,
l
[Cn
b22 @ C21
b32 C31
bnc12 bnc13 bi2c11
bnc22 bnc23 bi2C21
buc32 buc33 bi2C31
b21C12 b21C13 b22C11
b21C22 b21C23 b22C21
�iC32 b21C33 �2C31
b31C12 b31C13 b32C11
b31C22 b31C23 bJ2C21
b31C32 b31C33 b32C31
C12 C13
l
C22 C23
C32 C33
bi2C12 bi2C13
bi2C22 bi2C23
bi2C32 bi2C33
b22C12 b22C13
b22C22 b22C23
b22C32 �2C33
b32C12 b32C13
b32C22 b32C23
b32C32 bJ2C33
This type of highly structured blocking occurs in many applications and results in
dramatic economies when fully exploited.
Note that if B has a band structure, then B ® C "inherits" that structure at the
block level. For example, if
{diagonal }
B . tridiagonal
lS
1
.
1
ower tnangu ar
upper triangular
{block diagonal }
then B ® C is
block tridiago�al
.
block lower triangular
block upper triangular
Important Kronecker product properties include:
(B ® C)T = BT ® Cr,
(B ® C)(D ® F) = BD ® CF,
(B ® c)-1 = B-1 ® c-1 ,
B ® (C ® D) = (B ® C) ® D.
(1.3.1)
(1.3.2)
(1.3.3)
(1.3.4)
Of course, the products BD and CF must be defined for (1.3.2) to make sense. Like­
wise, the matrices B and C must be nonsingular in (1.3.3).
In general, B ® C =f. C ® B. However, there is a connection between these two
matrices via the perfect shuffle permutation that is defined in §1.2.11. If B E IRm1xni
and C E 1Rm2xn2, then
P(B ® C)QT = C ® B (1.3.5)
where P
28 Chapter 1. Matrix Multiplication
1.3.7 Reshaping Kronecker Product Expressions
A matrix-vector product in which the matrix is a Kronecker product is "secretly" a
matrix-matrix-matrix product. For example, if B E ll3x2, C E llmxn, and x1,x2 E lln,
then
where y1, y2, y3 E ll"'. On the other hand, if we define the matrices
then Y = CXBT.
x = [ X1 X2 l and y = [ Y1 Y2 Y3 ] ,
To be precise about this reshaping, we introduce the vec operation. If X E llmxn,
then vec(X) is an nm-by-1 vector obtained by "stacking" X's columns:
[X(:, 1) l
vec(X) = : .
X(:, n)
Y = CXBT *> vec(Y) = (B © C)vec(X). (1.3.6)
Note that if B, C, X E llnxn, then Y = CXBT costs O(n3) to evaluate while the
disregard of Kronecker structure in y = (B © C)x leads to an O(n4) calculation. This
is why reshaping is central for effective Kronecker product computation. The reshape
operator is handy in this regard. If A E llmxn and m1n1 = mn, then
B = reshape(A, m1 , ni)
is the m1-by-n1 matrix defined by vec(B) = vec(A). Thus, if A E ll3x4, then
reshape(A, 2, 6) = [ au a31 a22 a13 a33 a24 ] .
a21 ai2 a32 a23 ai4 a34
1.3.8 Multiple Kronecker Products
Note that A = B © C ® D can be regarded as a block matrix whose entries are block
matrices. In particular, bijCktD is the (k, l) block of A's (i, j) block.
As an example of a multiple Kronecker product computation, let us consider the
calculation of y = (B © C © D)x where B, C, D E llnxn and x E llN with N = n3
•
Using (1.3.6) it follows that
reshape(y, n2, n) = (C © D) · reshape(x, n2, n) · BT.
1.3. Block Matrices and Algorithms
Thus, if
F = reshape(x, n2, n) · BT,
2
then G = (C ® D)F E Rn xn can computed column-by-column using (1.3.6):
G(:, k) = reshape(D · reshape(F(:, k), n, n) · CT, n2, 1) k = l:n.
29
It follows that y = reshape(G, N, 1). A careful accounting reveals that 6n4 flops are
required. Ordinarily, a matrix-vector product of this dimension would require 2n6 flops.
The Kronecker product has a prominent role to play in tensor computations and
in §13.1 we detail more of its properties.
1.3.9 A Note on Complex Matrix Multiplication
Consider the complex matrix multiplication update
where all the matrices are real and i2 = -1. Comparing the real and imaginary parts
we conclude that
Thus, complex matrix multiplication corresponds to a structured real matrix multipli­
cation that has expanded dimension.
1.3.10 Hamiltonian and Symplectic Matrices
While on the topic of 2-by-2 block matrices, we identify two classes of structured
matrices that arise at various points later on in the text. A matrix M E R2nx2n is a
Hamiltonian matrix if it has the form
where A, F, G E Rnxn and F and G are symmetric. Hamiltonian matrices arise in
optimal control and other application areas. An equivalent definition can be given in
terms of the permutation matrix
In particular, if
JMJT = -MT,
then M is Hamiltonian. A related class of matrices are the symplectic matrices. A
matrix S E R2nx2n is symplectic if
30 Chapter l . Matrix Multiplication
If
S = [S11 S12 l
S21 S22
where the blocks are n-by-n, then it follows that both S'fiS21 and SfiS12 are symmetric
and S'fiS22 = In + SiiS12·
1.3.11 Strassen Matrix Multiplication
We conclude this section with a completely different approach to the matrix-matrix
multiplication problem. The starting point in the discussion is the 2-by-2 block matrix
product
where each block is square. In the ordinary algorithm, Cij = Ai1B11 + Ai2B21· There
are 8 multiplies and 4 adds. Strassen {1969) has shown how to compute C with just 7
multiplies and 18 adds:
P1 = (A11 + A22)(B11 + B22),
P2 = (A21 + A22)B11,
P3 Au(B12 - B22),
P4 A22(B21 - Bu),
P5 (Au + Ai2)B22,
p6 (A21 - Au)(Bu + Bi2),
P1 {A12 - A22)(B21 + B22),
Cu P1 + P4 - P5 + P1,
C12 P3 + P5,
C21 P2 + P4,
C22 Pi + P3 - P2 + P5.
These equations are easily confirmed by substitution. Suppose n = 2m so that the
blocks are m-by-m. Counting adds and multiplies in the computation C = AB, we find
that conventional matrix multiplication involves {2m)3 multiplies and {2m)3 - {2m)2
adds. In contrast, if Strassen's algorithm is applied with conventional multiplication
at the block level, then 7m3 multiplies and 7m3 + llm2 adds are required. If m » 1,
then the Strassen method involves about 7/8 the arithmetic of the fully conventional
algorithm.
Now recognize that we can recur on the Strassen idea. In particular, we can apply
the Strassen algorithm to each of the half-sized block multiplications associated with
the Pi. Thus, if the original A and B are n-by-n and n = 2q, then we can repeatedly
apply the Strassen multiplication algorithm. At the bottom "level,'' the blocks are
l-by-1.
Of course, there is no need to recur down to the n = 1 level. When the block
size gets sufficiently small, (n ::; nmin), it may be sensible to use conventional matrix
multiplication when finding the Pi. Here is the overall procedure:
1.3. Block Matrices and Algorithms 31
Algorithm 1.3.1 (Strassen Matrix Multiplication) Suppose n = 2q and that A E R.nxn
and B E R.nxn. If nmin = 2d with d ::S: q, then this algorithm computes C = AB by
applying Strassen procedure recursively.
function C = strass(A, B, n, nmin)
if n ::S: nmin
else
C = AB (conventionally computed)
m = n/2; u = l:m; v = m + l:n
Pi = strass(A{u, u) + A(v, v), B(u, u) + B(v, v), m, nmin)
P2 = strass(A{v, u) + A(v, v), B(u, u), m, nmin)
P3 = strass(A(u, u), B(u, v) - B(v, v), m, nmin)
P4 = strass(A(v, v), B(v, u) - B(u, u), m, nmin)
Ps = strass(A(u, u) + A(u, v), B(v, v), m, nmin)
P6 = strass(A(v, u) - A(u, u), B(u, u) + B(u, v), m, nmin)
P1 = strass(A(u, v) - A(v, v), B(v, u) + B(v, v), m, nmin)
C(u, u) = Pi + P4 - Ps + P1
C(u, v) = P3 + Ps
C(v, u) = P2 + P4
C(v, v) = P1 + P3 - P2 + P6
end
Unlike any of our previous algorithms, strass is recursive. Divide and conquer algo­
rithms are often best described in this fashion. We have presented strass in the style
of a MATLAB function so that the recursive calls can be stated with precision.
The amount of arithmetic associated with strass is a complicated function ofn and
nmin· If nmin » 1, then it suffices to count multiplications as the number of additions
is roughly the same. If we just count the multiplications, then it suffices to examine the
deepest level of the recursion as that is where all the multiplications occur. In strass
there are q - d subdivisions and thus 7q-d conventional matrix-matrix multiplications
to perform. These multiplications have size nmin and thus strass involves about s =
(2d)37q-d multiplications compared to c = {2q)3, the number of multiplications in the
conventional approach. Notice that
s
- (2d )3
-d - (7)q-d
- - - 7q - -
c 2q 8
If d = 0 , i.e., we recur on down to the l-by-1 level, then
8 = (7/8)q c = 7q = nlog2 7 �
n2.807 .
Thus, asymptotically, the number of multiplications in Strassen's method is O(n2·807).
However, the number of additions (relative to the number of multiplications) becomes
significant as nmin gets small.
32 Chapter 1. Matrix Multiplication
Problems
Pl.3.1 Rigorously prove the following block matrix equation:
Pl.3.2 Suppose M E Rnxn is Hamiltonian. How many flops are required to compute N = M2?
Pl.3.3 What can you say about the 2-by-2 block structure of a matrix A E R2nx2n that satisfies
£2nA£2n = AT where £2n is the exchange permutation defined in §1.2.11. Explain why Ais symmetric
about the "antidiagonal" that extends from the (2n, 1) entry to the (1, 2n) entry.
Pl.3.4 Suppose
A = [ :T � ]
where B E Rnxn is upper bidiagonal. Describe the structure of T = PAPT where P = P2.n is the
perfect shuffle permutation defined in §1.2.11.
Pl.3.5 Show that if B and C are each permutation matrices, then B18> C is also a permutation matrix.
Pl.3.6 Verify Equation (1.3.5).
Pl.3.7 Verify that if x E R"' and y E Rn, then y 18> x = vec(xyT).
Pl.3.8 Show that if B E Jl!' x P, C E Rqxq, and
then
x
[::l
xT (B © C) x
p p
LL)ij (xTCxj) .
i=l j=l
Pl.3.9 Suppose A(k) E Rnkxni. for k = l:r and that x E Rn where n = n1 · · · nr. Give an efficient
algorithm for computing y = (A<1"> 18> • • • 18> A(2) 18> A<1>)x.
Pl.3.10 Suppose n is even and define the following function from Rn to R:
n/2
f(x) = x(1:2:n)Tx(2:2:n) = Lx2;-1x2; .
i=l
(a) Show that if x, y E Rn then
n/2
xTy = L:(x2;-1 + Y2;)(x2; + Y2i-J) - f(x) - f(y).
i=l
(b) Now consider the n-by-n matrix multiplication C = AB. Give an algorithm for computing this
product that requires n3/2 multiplies once f is applied to the rows of A and the columns of B. See
Winograd (1968) for details.
Pl.3.12 Adapt strass so that it can handle square matrix multiplication of any order. Hint: If the
"current" A has odd dimension, append a zero row and column.
Pl.3.13 Adapt strass so that it can handle nonsquare products, e.g., C = AB where A E Rmx•· and
B E wxn. Is it better to augment A and B with zeros so that they become square and equal in size
or to "tile" A and B with square submatrices?
Pl.3.14 Let Wn be the number of flops that strass requires to compute an n-by-n product where n is
a power of 2. Note that W2 = 25 and that for n 2: 4
Wn = 7Wn/2 + 18(n/2)2
1.4. Fast Matrix-Vector Products 33
Show that for every e > 0 there is a constant c, so W
n
$ c, nw+• where w = log2 7 and n is any power
of two.
Pl.3.15 Suppose B E Rm 1 x ni , C E irn2 x n2 , and D E Rma x na. Show how to compute the vector
y = (B ® C ® D)x where :1: E Rn and n = n1n2n:i is given. Is the order of operations important from
the flop point of view?
Notes and References for §1.3
Useful references for the Kronecker product include Horn and Johnson (TMA, Chap. 4), Van Loan
(FFT), and:
C.F. Van Loan (2000). "The Ubiquitous Kronecker Product," J. Comput. Appl. Math., 129, 85-100.
For quite some time fast methods for matrix multiplication have attracted a lot of attention within
computer science, see:
S. Winograd (1968). "A New Algorithm for Inner Product," IEEE TI-ans. Comput. C-1 7, 693-694.
V. Strassen (1969) . "Gaussian Elimination is not Optimal," Numer. Math. 19, 354-356.
V. Pan (1984). "How Can We Speed Up Matrix Multiplication?," SIAM Review 26, 393-416.
I. Kaporin (1999). "A Practical Algorithm for Faster Matrix Multiplication," Num. Lin. Alg. 6,
687-700.
.
H. Cohn, R. Kleinberg, B. Szegedy, and C. Umans (2005). "Group-theoretic Algorithms for Matrix
Multiplication," Proceeedings of the 2005 Conference on the Foundations of Computer Science
(FOGS), 379-388.
J. Demmel, I. Dumitriu, 0. Holtz, and R. Kleinberg (2007). "Fast Matrix Multiplication is Stable,"
Numer. Math. 1 06, 199- 224.
P. D'Alberto and A. Nicolau (2009). "Adaptive Winograd's Matrix Multiplication," ACM TI-ans.
Math. Softw. 96, Article 3.
At first glance, many of these methods do not appear to have practical value. However, this has proven
not to be the case, see:
D. Bailey (1988). "Extra High Speed Matrix Multiplication on the Cray-2," SIAM J. Sci. Stat.
Comput. 9, 603-607.
N.J. Higham (1990). "Exploiting Fast Matrix Multiplication within the Level 3 BLAS," ACM TI-an.�.
Math. Softw. 16, 352-368.
C.C. Douglas, M. Heroux, G. Slishman, and R.M. Smith (1 994). "GEMMW: A Portable Level 3 BLAS
Winograd Variant of Strassen's Matrix-Matrix Multiply Algorithm," J. Comput. Phys. 11 O, 1-10.
Strassen's algorithm marked the beginning of a search for the fastest possible matrix multiplication
algorithm from the complexity point of view. The exponent of matrix multiplication is the smallest
number w such that, for all e > 0, O(nw+•) work suffices. The best known value of w has decreased
over the years and is currently around 2.4. It is interesting to speculate on the existence of an O(n2+•)
procedure.
1 .4 Fast Matrix-Vector Products
In this section we refine our ability to think at the block level by examining some
matrix-vector products y = Ax in which the n-by-n matrix A is so highly structured
that the computation can be carried out with many fewer than the usual O(n2) flops.
These results are used in §4.8.
1.4.1 The Fast Fourier Transform
The discrete Fourier transform (DFT) of a vector x E <Cn is a matrix-vector product
34 Chapter 1. Matrix Multiplication
where the DFT matrix Fn = (fki) E mnxn is defined by
fkj = w!,
k-l)(j-1)
with
Wn = exp(-27ri/n) = cos(27r/n) - i · sin(27r/n).
Here is an example:
[� �4 �l �! l =
[�
1 w� wt w� 1
1 wi w� w� 1
1
-i
-1
i
1
-J
-1
1
-1 -i
(1.4.1)
(1.4.2)
The DFT is ubiquitous throughout computational science and engineering and one
reason has to do with the following property:
If n is highly composite, then it is possible to carry out the DFT
in many fewer than the O(n2) fl.ops required by conventional
matrix-vector multiplication.
To illustrate this we set n = 2t and proceed to develop the radix-2 fast Fourier trans-
form.
The starting point is to examine the block structure of an even-order DFT matrix
after its columns are reordered so that the odd-indexed columns come first. Consider
the case
1 1 1 1 1 1 1 1
1 w w2 w3 w4 ws w6 w1
1 w2 w4 w6 1 w2 w4 w6
Fs =
1 w3 w6 w w4 w1 w2 ws
(w = ws).
1 w4 1 w4 1 w4 1 w4
1 ws w2 w1 w4 w w6 w3
1 w6 w4 w2 1 w6 w4 w2
1 w1 w6 w5 w4 w3 w2 w
(Note that ws is a root of unity so that high powers simplify, e.g., [Fsk1 = w3·6 =
w18 = w2.) If cols = [1 3 5 7 2 4 6 8], then
1 1 1 1 1 1 1 1
1 w2 w4 w6 w w3 ws w1
1 w4 1 w4 w2 we w2 w6
Fs(:, cols) =
1 w6 w4 w2 w3 w w1 w5
1 1 1 1 -1 -1 -1 -1
1 w2 w4 w6 -w -w3 -w5 -w7
1 w4 1 w4 -w2 -w6 -w2 -w6
1 w6 w4 w2 -w3 -w -w7 -ws
The lines through the matrix are there to help us think of Fs(:, cols) as a 2-by-2 matrix
with 4-by-4 blocks. Noting that w2 = w� = w4, we see that
Fs(:, cols) =
[ F4 04 F4 ]
F4 -04F4
1.4. Fast Matrix-Vector Products 35
where !14 = diag(l,w8, w�, w�). It follows that if x E JR8, then
Thus, by simple scalings we can obtain the 8-point DFT y = Fgx from the 4-point
DFTs YT = F4 ·x(1:2:8) and YB = F4 ·x(2:2:8). In particular,
where
y(l:4) = YT + d ·* Ya,
y(5:8) = YT - d ·* Ya
More generally, if n = 2m, then y = Fnx is given by
y(l:m) = Yr + d ·* Ya,
y(m + l:n) = Yr - d ·* Yn
where d = [ 1, Wn , . . . , w�n-l ] T and
YT = Fmx(1:2:n),
Yn = Fmx(2:2:n).
For n = 2t, we can recur on this process until n = 1, noting that F1x = x.
Algorithm 1.4.l If x E {!n and n = 2t, then this algorithm computes the discrete
Fourier transform y = Fnx.
function y = fft(x, n)
if n = 1
y = x
else
end
m = n/2
YT = fft(x(1:2:n), m)
Ya = fft(x(2:2:n), m)
w = exp(-27ri/n)
d = [ 1 w · · · wm-l ]T
' ' '
z = d .* Yn
= [ YT + Z ]
y
y -
r - <-
36 Chapter 1. Matrix Multiplication
The flop analysis of fft requires an assessment of complex arithmetic and the solution
of an interesting recursion. We first observe that the multiplication of two complex
numbers involves six (real) flops while the addition of two complex numbers involves
two flops. Let fn be the number of flops that fft needs to produce the DFT of x E <l::n.
Scrutiny of the method reveals that
where n = 2m. Thus,
!Yr ) !fm flops )
Y
dn fm flops
requires 6m flops
z 6m fl.ops
y 2n flops
fn = 2fm + Sn (!1 = 0).
Conjecturing that fn = c· nlog2(n) for some constant c, it follows that
fn = c· nlog2(n) = 2c· m log2(m) + Sn = c· n(log2(n) - 1) + Sn,
from which we conclude that c = S. Thus, fft requires Snlog2(n) flops. Appreciate
the speedup over conventional matrix-vector multiplication. If n = 220, it is a factor
of about 10,000. We mention that the fft flop count can be reduced to 5n log2(n) by
. n/2-1 S Pl 4 1
precomputmg Wn, . . . ,wn . ee . . .
1.4.2 Fast Sine and Cosine Transformations
In the discrete sine transform (DST) problem, we are given real values x1 , . . . ,Xm-l
and compute
m-1 (k·
)
Yk = L sin J7r Xj
i=l
m
(1.4.3)
for k = l:m - 1. In the discrete cosine transform (DCT) problem, we are given real
values xo,X1, . . . , Xm and compute
Xo � (kj7r) (-l)kxm
Yk = -
2
+ L..J cos - Xj +
i=l
m 2
(1.4.4)
for k= O:m. Note that the sine and cosine evaluations "show up" in the DFT matrix.
Indeed, for k = 0:2m - 1 and j= 0:2m -1 we have
k ' (kj7r) . . (kj7r)
[F2m]k+l,j+l = w2� = cos
m
- i sm
m
. (1.4.5)
This suggests (correctly) that there is an exploitable connection between each of these
trigonometric transforms and the DFT. The key observation is to block properly the
real and imaginary parts of F2m· To that end, define the matrices Sr E wxr and
Cr E wxr by
= sin
( kj7r ),
r + l
cos
( kj7r ),
r + l
k= l:r, j= l:r. (1.4.6)
t.4. Fast Matrix-Vector Products
Recalling from §1.2.11 the definition of the exchange permutation £n, we have
Theorem 1.4.1. Let m be a positive integer and define the vectors e and v by
eT = ( 1, 1, . . . , 1 ),
---...-­
m-1
VT = ( -1, 1, . . . , (-l)m-l ).
If E = Cm-11 C = Cm-1 , and S = Bm-11 then
[1
e C - iS
F2m =
1 V
T
e E(C + iS)
1
v
(-l)m
Ev
m-1
eT l
(C + iS)E
vTE
.
E(C - iS)E
37
(1.4.7)
Proof. It is clear from (1.4.5) that F2m(:, 1), F2m(l, :1), F2m(:, m+l), and F2m(m+l, :)
are correctly specified. It remains for us to show that equation (1.4.7) holds in blocks
positions (2,2), (2,4), (4,2), and (4,4). The (2,2) verification is straightforward:
[F2m(2:m, 2:m)]kj = cos (k
�n
) - i sin (k
�n
)
= [C - iS]ki·
A little trigonometry is required to verify correctness in the (2,4) position:
(k(m + j)n
)
[F2m(2:m, m + 2:2m)]kj = cos
m
(kjrr
)
= cos
m
+ krr
( kjn
)
= cos -m + krr
((k(m - j)n
)
= cos
m
= [(C + iS)E]ki .
. .
(k(m + j)n
)
- i sm
m
. .
(kjn
k )
- i sm
m
+ rr
. . ( kjn
k )
+ i sm -
m
+ rr
. .
(k(m - j)rr
)
+ i sm
m
We used the fact that post-multiplying a matrix by the permutation E = Cm-l has
the effect of reversing the order of its columns. The recipes for F2m(m + 2:2m, 2:m)
and F2m(m + 2:2m, m + 2:2m) are derived similarly. D
Using the notation of the theorem, we see that the sine transform (1.4.3) is a
matrix-vector product
y(l:m- 1) = DST(m- l) · x(l:m- 1)
38 Chapter 1. Matrix Multiplication
where
DST(m-1) = Sm+
If x = x(l:m- 1) and
Xsin [ � lE R2m,
=
-�x
then since eTE = e and E2 = E we have
[1
i i e C - iS
2F2mXsin = 2 l VT
e E(C + iS)
1
v
[_iJ
Thus, the DST of x(l:m- 1) is a scaled subvector of F2mXsin·
(1.4.8)
(1.4.9)
Algorithm 1.4.2 The following algorithm assigns the DST of xi, . . . , Xm-l to y.
Set up the vector Xsin defined by (1.4.9).
Use fft (e.g., Algorithm 1.4.1) to compute ii = F2mXsin
y = i · y(2:m)/2
This computation involves O(m log2(m)) flops. We mention that the vector Xsin is real
and highly structured, something that would be exploited in a truly efficient imple­
mentation.
Now let us consider the discrete cosine transform defined by (1.4.4). Using the
notation from Theorem 1.4.1, the DCT is a matrix-vector product
y(O:m) = DCT(m + 1) · x(O:m)
where
[1/2 eT
DCT(m + 1) = e/2 Cm-1
1/2 VT
If x = x(l:m - 1) and
1/2 l
v/2
(-1r12
(1.4.10)
(1.4.11)
1.4. Fast Matrix-Vector Products
then
1 1 e
eT
C - iS
VT
1
v
(-1
r
[I
2F2mXcos = 2 :
E(C + iS) Ev
� [
(xo/2) + eTX +
(xo/2)e + Cx +
(xo/2) + VTX +
(xo/2)e + ECx +
eT
(C + iS)E x
][Xo l
vTE Xm
E(C - iS)E Ex
(xm/2) l
(xm/2)v
(-1
r
(xm/2)
·
(xm/2)Ev
39
Notice that the top three components of this block vector define the DCT of x(O:m).
Thus, the DCT is a scaled subvector of F2mXcos·
Algorithm 1.4.3 The following algorithm assigns to y E Rm+l the DCT of x0, . . . , Xm·
Set up the vector Xcos E R2m defined by (1.4.11).
Use fft (e.g., Algorithm 1.4.1) to compute y = F2mXcos
y = y(l:m + 1)/2
This algorithm requires O(m logm) fl.ops, but as with Algorithm 1.4.2, it can be more
efficiently implemented by exploiting symmetries in the vector Xcos·
We mention that there are important variants of the DST and the DCT that can
be computed fast:
DST-II: Yk
DST-III: Yk
DST-IV: Yk =
DCT-II: Yk =
DCT-III: Yk =
DCT-IV: Yk =
f: . ( k(2j - 1)7r )
sm
2
Xj ,
i= l
m
t . c2k _ l)j7r )
sm
2
Xj ,
i= l
m
t . c2k _ 1)(2j _ l)7r )
sm
2
Xj ,
i= l
m
m-l
( k(2j - l)7r )
L cos
2
Xj ,
j=O
m
Xo
2
m-1
c2k - l)j7r )
L cos
2
Xj ,
i= I
m
y: ( (2k - 1)(2j - 1)7r )
COS
2
Xj ,
j=O
m
k = l:m,
k = l:m,
k = l:m,
(1.4.12)
k = O:m- 1,
k = O:m- 1,
k = O:m - 1 .
For example, if y E R2m-I is the DST of x = [ Xi , 0, x2, 0, . . . , 0, Xm-1 ' Xm ]T, then
fj(l:m) is the DST-II of x E Rm. See Van Loan (FFT) for further details.
40 Chapter 1. Matrix Multiplication
1.4.3 The Haar Wavelet Transform
If n = 2t, then the Haar wavelet transform y = W11x is a matrix-vector product in
which the transform matrix Wn E JR" x n
is defined recursively:
W. �
{ [ Wm 0 ( : ) I Im 0 ( _: ) ] if n = 2m,
[ 1 ] if n = l.
Here are some examples:
W2 = [ � 1 -� ],
H
1 1
�]'
W4 = 1 -1
-1 0
-1 0 -1
1 1 1 0 1 0 0 0
1 1 1 0 -1 0 0 0
1 1 -1 0 0 1 0 0
Wa = 1 1 -1 0 0 -1 0 0
1 -1 0 1 0 0 1 0
1 -1 0 1 0 0 -1 0
1 -1 0 -1 0 0 0 1
1 -1 0 -1 0 0 0 -1
An interesting block pattern emerges if we reorder the rows of Wn so that the odd­
indexed rows come first:
(1.4.13)
Thus, if x E 1Rn, xT = x(l:m), and Xn = x(rn + l:n), then
_ _
[Im Im l [Wm 0 l [XT l
y - W,.x - P2,m Im -Im 0 Im Xs
In other words,
y(1:2:n) = WmXT + Xn, y(2:2:n) = WmxT - X u .
This points the way to a fast recursive procedure for computing y = Wnx.
1.4. Fast Matrix-Vector Products 41
Algorithm 1.4.4 (Haar Wavelet Transform) Ifx E ne and n = 2t, then this algorithm
computes the Haar transform y = Wnx.
function y = fht(x, n)
if n = 1
else
end
y = x
m = n/2
z = fht(x(l:m), m)
y(1:2:m) = z + x(m + l:n)
y(2:2:m) = z - x(m + l:n)
It can be shown that this algorithm requires 2n flops.
Problems
Pl.4.1 Suppose w = [1, W
n, wa,...' w�/2-1 ] where n = 2t. Using the colon notation, express
[ 2 r/2-1 ]
1, Wr· , Wr,..., Wr
as a subvector of w where r = 2q, q = l:t. Rewrite Algorithm 1.4.1 with the assumption that w is
precomputed. Show that this maneuver reduces the flop count to 5n log2 n.
Pl.4.2 Suppose n = 3m and examine
G = [ Fn(:,1:3:n - 1) I Fn(:,2:3:n - 1) I Fn(:,3:3:n - 1) ]
as a 3-by-3 block matrix, looking for scaled copies of Fm. Based on what you find, develop a recursive
radix-3 FFT analogous to the radix-2 implementation in the text.
Pl.4.3 If n = 2t, then it can be shown that Fn = (AtI't)· ··(A1I'1) where for q = l:t
I'q = 'P2,rq ® lLq- 1 •
!l d" (1
Lq- 1 - l
)
q = iag , WL
q
, . . . ,wLq ·
Note that with this factorization, the DFT y= Fnx can be computed as follows:
y =x
for q = l:t
y = Aq(I'qy)
end
Fill in the details associated with the y updates and show that a careful implementation requires
5nlog2(n) flops.
Pl.4.4 What fraction of the components of Wn are zero?
Pl.4.5 Using (1.4.13), verify by induction that if n = 2t, then the Haar tranform matrix Wn has the
factorization Wn = Ht···Hi where
H _
[ 'P2, L.
q - 0
0 ] [ W2 ® h.
l
n-L 0
0
l
n-L
Thus, the computation of y = Wnx may proceed as follows:
42
y = x
for q = l:t
y = Hqy
end
Chapter 1. Matrix Multiplication
Fill in the details associated with the update y = Hqy and confirm that H'11 x costs 2n flops.
Pl.4.6 Using ( 1 .4.13), develop an O(n) procedure for solving WnY = x where x E R" is given and
n = 2t.
Notes and References for §1.4
In Van Loan (FFT) the FFT family of algorithms is described in the language of matrix-factorizations.
A discussion of various fast trigonometric transforms is also included. See also:
W.L. Briggs and V.E. Henson (1995) . The DFT: An Owners ' Manual for the Discrete FouT'ier
Transform, SIAM Publications, Philadelphia, PA.
The design of a high-performance FFT is a nontrivial task. An important development in this regard
is a software tool known as "the fastest Fourier transform in the west" :
M. Frigo and S.G. Johnson (2005). "The Design and Implementation of FFTW3" , Proc;eedings of the
IEEE, 93, 216- 231.
It automates the search for the "right" FFT given the underlying computer architecture. FFT rcfor­
enccs that feature interesting factorization and approximation ideas include:
A. Edelman, P. l'vlcCorquodale, and S. Toledo ( 1 998). "The f1ture Fast Fourier Transform?,'' SIAM
J. Sci. Comput. 20, 1094 1 1 14.
A. Dutt and and V. Rokhlin (1993). "Fast Fourier Transforms for Nonequally Spaced Data,'' SIAM
J. Sci. Comput. 14, 1368 - 1393.
A. F. Ware ( 1998) . "Fast Approximate Fourier Transforms for Irregularly Spaced Data," SIAM Review
40, 838 - 856.
N. Nguyen and Q.H. Liu ( 1999). "The Regular Fourier Matrices and Nonuniform Fast Fourier Trans­
forms,'' SIAM J. Sci. Comput. 21, 283-2!)3.
A. Nieslony and G. Steidl (2003). "Approximate Factorizations of Fourier Matrices with Nonequis­
paced Knots,'' Lin. Alg. Applic. 366, 337 35 1.
L. Greengard and J. -Y. Lee (2004). "Accelerating the Nonuniform Fast Fourier Transform," SIAM
Review 46, 443-454.
K. Ahlander and H. Munthe-Kaas (2005). ;'Applications of the Generalized Fourier Transform in
Numerical Linear Algebra," BIT 45, 819 850.
The fast multipole method and the fast Gauss transform represent another type of fast transform that
is based on a combination of clever blocking and approximation.
L. Grecngard and V. Rokhliu ( 1987) . "A Fast Algorithm for Particle Simulation," J. Comput. Phys.
73, 325-348.
X. Sun and N.P. Pitsianis (2001). "A Matrix Version of the Fast Multipole Method,'' SIAM Review
43, 289-300.
L. Grecngarcl and J. Strain ( 1991). ·'The Fast Gauss Transform," SIAM J. Sci. Sttit. Comput. 12,
79-94.
M. Spivak, S.K. Veerapaneni, and L. Greengard (2010). "The Fast Generalized Gauss Transform,"
SIA M J. Sci. Comput. 32, 3092-3107.
X. Sun and Y. Bao (2003). "A Kronecker Product Representation of the Fast Gauss Transform,"
SIAM J. MatT'ix Anal. Applic. 24, 768-786.
The Haar transform is a simple example of a wavelet transform. The wavelet idea has had a pro­
found impact throughout computational science and engineering. In many applications, wavelet basis
functions work better than the sines and cosines that underly the DFT. Excellent monographs on this
subject include
I Daubechies (1992). Ten Lectures on Wrwelets, SIAl'vl Publications, Philadelphia, PA.
G. Strang (1993) . "Wavelet Transforms Versus Fourier Transforms," Bull. AMS 28, 288-305.
G. Strang and T. Nguyan ( 1996). Wavelets and Filter Banks, Wellesley-Cambridge Press.
1.5. Vectorization and Locality 43
1.5 Vectorization and Locality
When it comes to designing a high-performance matrix computation, it is not enough
simply to minimize flops. Attention must be paid to how the arithmetic units interact
with the underlying memory system. Data structures arc an important part of the
picture because not all matrix layouts are "architecture friendly." Our aim is to build
a practical appreciation for these issues by presenting various simplified models of
execution. These models are qualitative and are just informative pointers to complex
implementation issues.
1.5.1 Vector Processing
An individual floating point operation typically requires several cycles to complete. A
3-cycle addition is depicted in Figure 1.5.1. The input scalars x and y proceed along
:i:
11
Adjust
Exponents 1---+1
Add 1---+1 Normalize
Figure 1.5.1. A 3-Cycle adder
z
a computational "assembly line," spending one cycle at each of three work "stations."
The sum z emerges after three cycles. Note that, during the execution of a single, "free
standing" addition, only one of the three stations would be active at any particular
instant.
Vector processors exploit the fact that a vector operation is a very regular se­
quence of scalar operations. The key idea is pipelining, which we illustrate using
the vector addition computation z = x + y. With pipelining, the x and y vectors
are streamed through the addition unit. Once the pipeline is filled and steady state
reached, a z-vector component is produced every cycle, as shown in Figure 1.5.2. In
· • • X10
• · • Y10
Adjust
Exponents Add Normalize
Figure 1.5.2. Pipelined addition
this case, we would anticipate vector processing to proceed at about three times the
rate of scalar processing.
A vector processor comes with a repertoire of vector instructions, such as vector
add, vector multiply, vector scale, dot product, and saxpy. These operations take
place in vector re,gisters with input and output handled by vector load and vector store
instructions. An important attribute of a vector processor is the length vL of the
vector registers that carry out the vector operations. A length-n vector operation must
be broken down into subvector operations of length vL or less. Here is how such a
partitioning might be managed for a vector addition z = x + y where x and y are
n-vectors:
44 Chapter 1. Matrix Multiplication
first = 1
while first :::; n
end
last = min{n, first + vL - 1}
Vector load: ri +-- x(first:last)
Vector load: r2 +-- y(first:last)
Vector add: ri = ri + r2
Vector store: z(first:last) +-- r1
first = last + 1
(1.5.1)
The vector addition is a register-register operation while the "flopless" movement of
data to and from the vector registers is identified with the left arrow "+--" . Let us
model the number of cycles required to carry out the various steps in (1.5.1). For
clarity, assume that n is very large and an integral multiple of vL, thereby making it
safe to ignore the final cleanup pass through the loop.
Regarding the vectorized addition r1 = r1 + r2, assume it takes Tadd cycles to fill
the pipeline and that once this happens, a component of z is produced each cycle. It
follows that
Narith = (:)(Tadd + vL) = (T�:d + 1)n
accounts for the total number cycles that (1.5.1) requires for arithmetic.
For the vector loads and stores, assume that Tdata + vL cycles are required to
transport a length-vL vector from memory to a register or from a register to memory,
where Tdata is the number of cycles required to fill the data pipeline. With these
assumptions we see that
specifies the number of cycles that are required by (1.5.1) to get data to and from the
registers.
The arithmetic-to-data-motion ratio
and the total cycles sum
N N. =
(Tarith +
VL
3Tdata + 4
)n
arith + data
are illuminating statistics, but they are not necessarily good predictors of performance.
In practice, vector loads, stores, and arithmetic are "overlapped" through the chaining
together of various pipelines, a feature that is not captured by our model. Nevertheless,
our simple analysis is a preliminary reminder that data motion is an important factor
when reasoning about performance.
1.5. Vectorization and Locality 45
1.5.2 Gaxpy versus Outer Product
Two algorithms that involve the same number of flops can have substantially different
data motion properties. Consider the n-by-n gaxpy
y = y + Ax
and the n-by-n outer product update
Both of these level-2 operations involve 2n2 flops. However, if we assume (for clarity)
that n = vL , then we see that the gaxpy computation
rx +- x
ry +- y
for j = l:n
ra +- A(:,j)
ry = ry + rarx(j)
end
Y +- ry
requires (3 + n) load/store operations while for the outer product update
rx +- x
ry +- y
for j = l:n
end
ra +- A(:,j)
ra = ra + ryrx(j)
A(:,j) +- ra
the corresponding count is (2 + 2n). Thus, the data motion overhead for the outer
product update is worse by a factor of 2, a reality that could be a factor in the design
of a high-performance matrix computation.
1.5.3 The Relevance of Stride
The time it takes to load a vector into a vector register may depend greatly on how
the vector is laid out in memory, a detail that we did not consider in §1.5.1. Two
concepts help frame the issue. A vector is said to have unit stride if its components
are contiguous in memory. A matrix is said to be stored in column-major order if its
columns have unit stride.
Let us consider the matrix multiplication update calculation
C = C + AB
where it is assumed that the matrices C E 1Rmxn, A E JR'nxr, and B E IE(xn are stored
in column-major order. Suppose the loading of a unit-stride vector proceeds much more
quickly than the loading of a non-unit-stride vector. If so, then the implementation
46
for j = l:n
for k = l :r
end
C(:,j) = C(:,j) + A(:, k)·B(k,j)
end
Chapter 1. Matrix Multiplication
which accesses C, A, and B by column would be preferred to
for i = l:m
end
for j = l:n
C(i,j) = C(i,j) + A(i, :)·B(:,j)
end
which accesses C and A by row. While this example points to the possible importance
of stride, it is important to keep in mind that the penalty for non-unit-stride access
varies from system to system and may depend upon the value of the stride itself.
1.5.4 Blocking for Data Reuse
Matrices reside in memory but rnemoT'IJ has levels. A typical arrangement is depicted
in Figure 1.5.3. The cache is a relatively small high-speed memory unit that sits
Functional Units
Cache
Main Memory
Disk
Figure 1.5.3. A memory hiemrchy
just below the functional units where the arithmetric is carried out. During a matrix
computation, matrix elements move up and down the memory hierarchy. The cache,
which is a small high-speed memory situated in between the functional units and main
memory, plays a particularly critical role. The overall design of the hierarchy varies
from system to system. However, two maxims always apply:
• Each level in the hierarchy has a limited capacity and for economic reasons this
capacity usually becomes smaller as we ascend the hierarchy.
• There is a cost, sometimes relatively great, associated with the moving of data
between two levels in the hierarchy.
The efficient implementation of a matrix algorithm requires an ability to reason about
the ftow of data between the various levels of storage.
1.5. Vectorization and Locality 47
To develop an appreciation for cache utilization we again consider the update
C = C + AB where each matrix is n-by-n and blocked as follows:
C - . . . A - . . . B - . . . .
_
[c
_
11 :• • c
_
1r l _
[A_11 :· · A_11, l _
[n_u :· · B
.
1" l
. . . . . . . . .
Cqr . . . c,,,. Aqr Aqp Bpr . . . Bpr
Assume that these three matrices reside in main memory and that we plan to update
C block by block:
1'
Cij = Cij + L A.;kBkj.
k=l
The data in the blocks must be brought up to the functional units via the cache which
we assume is large enough to hold a C-block, an A-block, mid a B-block. This enables
us to structure the computation as follows:
for i = l:q
end
for j = l:r
Load Ci.i from main memory into cache
for k = l:p
end
Load Aik from main memory into cache
Load Bkj from main memory into cache
CiJ = CiJ + A.;kBkJ
Store Cij in main memory.
end
(1.5.4)
The question before us is how to choose the blocking parameters q, r, and p so as to
minimize memory traffic to and from the cache. Assume that the cache can hold 11
floating point numbers and that 11 « 3n2, thereby forcing us to block the computation.
We assume that
Cij } {(n/q)-by-(n/r)
Aik is roughly (n/q)-by-(n/p) .
B�,i (n/p)-by-(n/r)
We say "roughly" because if q, r, or p does not divide n, then the blocks are not quite
uniformly sized, e.g.,
x x x x x x
x x x x x x
x x x x x x
x x x x x x
A
x x x x x x
x x x x x x
x x x x x x
x x x x x x
x x x x x x
x x x x x x
x x x
x x x
x x x
x x x
x x x
x x x
x x x
x x x
x x x
x x x
x
x
x
x
x
x
x
x
x
x
n = 10,
q = 3,
p = 4.
48 Chapter 1. Matrix Multiplication
However, nothing is lost in glossing over this detail since our aim is simply to develop
an intuition about cache utilization for large-n problems. Thus, we are led to impose
the following constraint on the blocking parameters:
(%) (�) + (%) (�) + (�) (�) � M.
{1.5.5)
Proceeding with the optimization, it is reasonable to maximize the amount ofarithmetic
associated with the update Ci; = Ci; + AikBki· After all, we have moved matrix
data from main memory to cache and should make the most of the investment. This
leads to the problem of maximizing 2n3/(qrp) subject to the constraint {1.5.5). A
straightforward Lagrange multiplier argument leads us to conclude that
fn2
qopt = Popt = Topt � y 3.M" {1.5.6)
That is, each block of C, A, and B should be approximately square and occupy about
one-third of the cache.
Because blocking affects the amount of memory traffic in a matrix computation,
it is of paramount importance when designing a high-performance implementation. In
practice, things are never as simple as in our model example. The optimal choice of
q0pt , ropt , and Popt will also depend upon transfer rates between memory levels and
upon all the other architecture factors mentioned earlier in this section. Data structures
are also important; storing a matrix by block rather than in column-major order could
enhance performance.
Problems
Pl.5.1 Suppose A E R"' x n is tridiagonal and that the elements along its subdiagonal, diagonal, and
superdiagonal are stored in vectors e(l:n - 1), d(l:n), and /(2:n). Give a vectorized implementation
of the n-by-n gaxpy y = y + Ax. Hint: Make use of the vector multiplication operation.
Pl.5.2 Give an algorithm for computing C = C + ATBA where A and B are n-by-n and B is
symmetric. Innermost loops should oversee unit-stride vector operations.
Pl.5.3 Suppose A E w x n is stored in column-major order and that m = m1 M and n = n1 N.
Regard A as an M-by-N block matrix with m1-by-n1 blocks. Give an algorithm for storing A in a
vector A.block(l:mn) with the property that each block Aij is stored contiguously in column-major
order.
Notes and References for §1.5
References that address vector computation include:
J.J. Dongarra, F.G. Gustavson, and A. Karp ( 1984) . "Implementing Linear Algebra Algorithms for
Dense Matrices on a Vector Pipeline Machine,'' SIAM Review 26, 91-1 12.
B.L. Buzbee (1986) "A Strategy for Vectorization," Parallel Comput. 3, 187-192.
K. Gallivan, W. Jalby, U. Meier, and A.H. Sameh (1988). "Impact of Hierarchical Memory Systems
on Linear Algebra Algorithm Design,'' Int. J. Supercomput. Applic. 2, 12-48.
J.J. Dongarra and D. Walker (1995). "Software Libraries for Linear Algebra Computations on High
Performance Computers," SIAM Review 37, 151-180.
One way to realize high performance in a matrix computation is to design algorithms that are rich in
matrix multiplication and then implement those algorithms using an optimized level-3 BLAS library.
For details on this philosophy and its effectiveness, see:
1.6. Parallel Matrix Multiplication 49
B. Kagstrom, P. Ling, and C. Van Loan (1998). "GEMM-based Level-3 BLAS: High-Performance
Model Implementations and Performance Evaluation Benchmark," ACM '.lrans. Math. Softw. 24,
268-302.
M.J. Dayde and LS. Duff (1999). "The RISC BLAS: A Blocked Implementation of Level 3 BLAS for
RISC Processors," ACM '.lrans. Math. Softw. 25, 316-340.
E. Elmroth, F. Gustavson, I. Jonsson, and B. Kagstrom (2004). "Recursive Blocked Algorithms and
Hybrid Data Structures for Dense Matrix Library Software,'' SIAM Review 46, 3-45.
K. Goto and R. Van De Geign (2008). "Anatomy of High-Performance Matrix Multiplication,'' ACM
'.lrans. Math. Softw. 34, 12:1-12:25.
Advanced data structures that support high performance matrix computations are discussed in:
F.G. Gustavson (1997). "Recursion Leads to Automatic Variable Blocking for Dense Linear Algebra
Algorithms," IBM J. Res. Dev. 41, 737-755.
V. Valsalam and A. Skjellum (2002). "A Framework for High-Performance Matrix Multiplication
Based on Hierarchical Abstractions, Algorithms, and Optimized Low-Level Kernels," Concurrency
Comput. Pract. E:r:per. 14, 805-839.
S.R. Chatterjee, P. Patnala, and M. Thottethodi (2002). "Recursive Array Layouts and Fast Matrix
Multiplication," IEEE '.lrans. Parallel. Distrib. Syst. 13, 1105-1123.
F.G. Gustavson (2003). "High-Performance Linear Algebra Algorithms Using Generalized Data Struc­
tures for Matrices,'' IBM J. Res. Dev. 47, 31-54.
N. Park, B. Hong, and V.K. Prasanna (2003). "Tiling, Block Data Layout, and Memory Hierarchy
Performance," IEEE '.lrans. Parallel Distrib. Systems, 14, 640 -654.
J.A. Gunnels, F.G. Gustavson, G.M. Henry, and R.A. van de Geijn (2005). "A Family of High­
Performance Matrix Multiplication Algorithms," PARA 2004, LNCS 3732, 256-265.
P. D'Alberto and A. Nicolau (2009). "Adaptive Winograd's Matrix Multiplications," A CM '.lrans.
Math. Softw. 36, 3:1-3:23.
A great deal of effort has gone into the design of software tools that automatically block a matrix
computation for high performance, e.g.,
S. Carr and R.B. Lehoucq (1997) "Compiler Blockability of Dense Matrix Factorizations," A CM '.lrans.
Math. Softw. 23, 336· ·361.
J.A. Gunnels, F. G. Gustavson, G.M. Henry, and R. A. van de Geijn (2001). "FLAME: Formal Linear
Algebra Methods Environment," ACM '.lrans. Math. Softw. 27, 422-455.
P. Bientinesi, J.A. Gunnels, M.E. Myers, E. Quintana-Orti, and R.A. van de Geijn (2005). "The
Science of Deriving Dense Linear Algebra Algorithms," ACM '.lrans. Math. Softw. 31, 1-26.
J. Demmel, J. Dongarra, V. Eijkhout, E. Fuentes, A. Petitet, R. Vuduc, R.C. Whaley, and K. Yelick
(2005). "Self-Adapting Linear Algebra Algorithms and Software," , Proc. IEEE 93, 293-312.
K. Yotov, X.Li, G. Ren, M. Garzaran, D. Padua, K. Pingali, and P. Stodghill (2005). "Is Search
Really Necessary to Generate High-Performance BLAS?," Proc. IEEE 93, 358-386.
For a rigorous treatment of communication lower bounds in matrix computations, see:
G. Ballard, J. Demmel, 0. Holtz, and 0. Schwartz (2011). "Minimizing Communication in Numerical
Linear Algebra," SIAM J. Matrix Anal. Applic. 32, 866-901.
1.6 Parallel Matrix Multiplication
The impact ofmatrix computation research in many application areas depends upon the
development of parallel algorithms that scale. Algorithms that scale have the property
that they remain effective as problem size grows and the number of involved processors
increases. Although powerful new programming languages and related system tools
continue to simplify the process of implementing a parallel matrix computation, being
able to "think parallel" is still important. This requires having an intuition about load
balancing, communication overhead, and processor synchronization.
50 Chapter 1. Matrix Multiplication
1.6.1 A Model Computation
To illustrate the major ideas associated with parallel matrix computations, we consider
the following model computation:
Given C E Rnxn, A E 1R
mxr, and B E m;xn, effectively compute
the matrix multiplication update C = C + AB assuming the
availability of p processors. Each processor has its own local
memory and executes its own local program.
The matrix multiplication update problem is a good choice because it is a.ii inherently
parallel computation and because it is at the heart of many important algorithms that
we develop in later chapters.
The design of a parallel procedure begins with the breaking up of the given
problem into smaller parts that exhibit a measure of independence. In our problem we
assume the blocking
[Cu
C = : · .
C�11 · · : (1.6.1)
m = m1 M, r = r1 R,
with Cij E 1R
m1 xni , Aij E 1R
m1 x1•1 , and Bij E IR1"1 xni. It follows that the C + AB
update partitions nicely into MN smaller tasks:
n
Task(i, j): Cij = Cij + L AikBkj . (1 .6.2)
k=l
Note that the block-block products AiJ,,Bkj arc all the same size.
Because the tasks are naturally double-indexed, we double index the available
processors as well. Assume that p = ProwPcol and designate the (i, j)th processor by
Proc(i, j) for i = l:Prow and j = l:Pcol· The double indexing of the processors is just a
notation and is not a statement about their physical connectivity.
1.6.2 Load Balancing
An effective parallel program equitably partitions the work among the participating
processors. Two subdivision strategies for the model computation come to mind. The
2-dimensional block distribution assigns contiguous block updates to each processor.
See Figure 1.6.1. Alternatively, we can have Proc(µ, T) oversee the update of Cii
for i = µ: Prow :M and j = T: Pcol :N. This is called the 2-dimensional block-cyclic
distribution. See Figure 1.6.2. For the displayed exan1ple, both strategics assign twelve
Cii updates to each processor and each update involves R block-block multiplications,
i.e., 12(2m1n1r1 ) fl.ops. Thus, from the fl.op point of view, both strategies are load
balanced, by which we mean that the amount of arithmetic computation assigned to
each processor is roughly the same.
1.6. Parallel Matrix Multiplication
{
{
{
{
Proc{l,1) Proc{l,2) Proc{l,3)
C1 1 C12 C13
} {
C14 C15 c16
} {
C11 C1s C19
C21 C22 C23 C24 C2s C25 C21 C2s C29
C31 C32 C33 C34 C3s c36 C31 C3s C39
C41 C42 C43 C44 C4s c46 C41 C4s C49
Proc{2,1) Proc{2,2) Proc{2,3)
Cs1 Cs2 Cs3
} {
Cs4 C5s Cs6
} {
Cs1 Css Csg
C61 C62 C53 c64 c6s c66 c61 C6s c69
C11 C12 C73 C14 C1s C16 C11 C1s C1g
Cs1 Cs2 Cs3 Cs4 Css Cs6 Cs1 Css Csg
Figure 1.6.1. The block distribution of tasks
(M = 8, Prow = 2, N = 9, and Pcol = 3).
Proc{l,1) Proc{l,2) Proc{l,3)
C11 C14 C11
} {
C12 C15 C1s
} {
C13 Crn C19
C31 C34 C31 C32 C35 C3s C33 c36 C39
C51 Cs4 Cs1 Cs2 Css Css Cs3 Cs6 Csg
C11 C14 C11 C12 C1s C1s C73 C16 C1g
Proc{2,1) Proc{2,2) Proc{2,3)
C21 C24 C21
} {
C22 C2s C2s
} {
C23 c26 C29
C41 C44 C41 C12 C4s C4s C43 c46 C49
C61 CM c61 C62 c6s c6s c63 c66 c69
Cs1 Cs1 Cs1 Cs2 Css Cs11 Cs3 Cs6 Csg
Figure 1.6.2. The block-cyclic distribution of tasks
(Al = 8, Prow = 2, N = 9, and Pcol = 3).
51
}
}
}
}
52 Chapter 1. Matrix Multiplication
If M is not a multiple ofProw or if N is not a multiple ofPeat . then the distribution
of work among processors is no longer balanced. Indeed, if
M = 0:1Prow + /31,
N = 0:2Peol + /32'
0 � /31 < Prow •
0 � /32 < Peat .
then the number of block-block multiplications per processor can range from 0:10:2R to
(o:1 + 1)(0:2 + l)R. However, this variation is insignificant in a large-scale computation
with M » Prow and N » Peal :
We conclude that both the block distribution and the block-cyclic distribution strate­
gies are load balanced for the general C + AB update.
This is not the case for certain block-sparse situations that arise in practice. If
A is block lower triangular and B is block upper triangular, then the amount of work
associated with Task(i, j) depends upon i and j. Indeed from (1.6.2) we have
min{i,j,R}
Cij = Cij + L AikBkj·
k=l
A very uneven allocation of work for the block distribution can result because the
number of flops associated with Task(i,j) increases with i and j. The tasks assigned
to Proc(Prow , Peoi ) involve the most work while the tasks assigned to Proc(l,1) involve
the least. To illustrate the ratio of workloads, set M = N = R = NI and assume that
Prow = Peal = p divides M. It can be shown that
Flops assigned to Proc(jj, p)
= O(p)
Flops assigned to Proc(l, 1)
(1.6.3)
if we assume M/p » 1. Thus, load balancing does not depend on problem size and
gets worse as the number of processors increase.
This is not the case for the block-cyclic distribution. Again, Proc(l,1) and
Proc(jj,jj) are the least busy and most busy processors. However, now it can be verified
that
Flops assigned to Proc(jj, p) _
1 0 (p
)
Flops assigned to Proc(l, 1) -
+
M '
(1.6.4)
showing that the allocation of work becomes increasingly balanced as the problem size
grows.
Another situation where the block-cyclic distribution of tasks is preferred is the
case when the first q block rows of A are zero and the first q block columns of B are
zero. This situation arises in several important matrix factorization schemes. Note from
Figure 1.6.1 that if q is large enough, then some processors have absolutely nothing
to do if tasks are assigned according to the block distribution. On the other hand,
the block-cyclic distribution is load balanced, providing further justification for this
method of task distribution.
1.6. Parallel Matrix Multiplication 53
1.6.3 Data Motion Overheads
So far the discussion has focused on load balancing from the flop point of view. We now
turn our attention to the costs associated with data motion and processor coordination.
How does a processor get hold of the data it needs for an assigned task? How does a
processor know enough to wait if the data it needs is the output of a computation being
performed by another processor? What are the overheads associated with data transfer
and synchronization and how do they compare to the costs of the actual arithmetic?
The importance of data locality is discussed in §1.5. However, in a parallel com­
puting environment, the data that a processor needs can be "far away," and if that is
the case too often, then it is possible to lose the multiprocessor advantage. Regarding
synchronization, time spent waiting for another processor to finish a calculation is time
lost. Thus, the design of an effective parallel computation involves paying attention
to the number of synchronization points and their impact. Altogether, this makes it
difficult to model performance, especially since an individual processor can typically
compute and communicate at the same time. Nevertheless, we forge ahead with our
analysis of the model computation to dramatize the cost of data motion relative to
flops. For the remainder of this section we assume:
(a) The block-cyclic distribution of tasks is used to ensure that arithmetic is load
balanced.
(b) Individual processors can perform the computation Cii = Cii + AikBki at a
rate of F flops per second. Typically, a processor will have its own local memory
hierarchy and vector processing capability, so F is an attempt to capture in a
single number all the performance issues that we discussed in §1.5.
(c) The time required to move 'f'/ floating point numbers into or out of a processor
is a + f3TJ. In this model, the parameters a and /3 respectively capture the latency
and bandwidth attributes associated with data transfer.
With these simplifications we can roughly assess the effectiveness of assigning p pro­
cessors to the update computation C = C + AB.
Let Tarith(p) be the time that each processor must spend doing arithmetic as it
carries out its share of the computation. It follows from assumptions (a) and (b) that
2mnr
Tarith (p) �
pF
·
(1.6.5)
Similarly, let Tdata(P) be the time that each processor must spend acquiring the data
it needs to perform its tasks. Ordinarily, this quantity would vary significantly from
processor to processor. However, the implementation strategies outlined below have the
property that the communication overheads are roughly the same for each processor.
It follows that if Tarith (P) + Tdata(P) approximates the total execution time for the
p-.processor solution, then the quotient
S(p) =
Tarith{l)
=
Tarith (p) + Tdata(p)
p
l + Tdata(P)
Tarith {p)
(1.6.6)
54 Chapter 1. Matrix Multiplication
is a reasonable measure of speedup. Ideally, the assignment of p processors to the
C = C + AB update would reduce the single-processor execution time by a factor
of p. However, from {l.6.6) we see that S(p) < p with the compute-to-communicate
ratio Tdata{p)/Tarith(P) explaining the degradation. To acquire an intuition about this
all-important quotient, we need to examine more carefully the data transfer properties
associated with each task.
1 .6.4 Who Needs What
If a processor carries out Task{i, j), then at some time during the calculation, blocks
cij, Ail, . . . ' AiR• B13, . . . ' BRj must find their way into its local memory. Given as­
sumptions (a) and (c), Table 1.6.1 summarizes the associated data transfer overheads
for an individual processor:
Required Blocks Data Transfer Time per Block
Ci3 i = µ:prow:M j = 'T:Pco1:N a + f3m1n1
Aii i = µ:pr0w:N/ j = l:R a + f3m1r1
Bii i = l:R j = 'T:Pco1:N a + f3r1n1
TABLE 1. 6. 1 . Communication overheads for Proc(µ, r)
It follows that if
then
'Ye = total number of required C-block transfers,
'YA = total number of required A-block transfers,
'Yo = total number of required B-block transfers,
Tdata(P) ::::::: 'Ye(a + f:Jm1n1) + 'YA(a + f3m1r1) + 'YB(a + f3r1n1),
and so from from {1.6.5) we have
Tdata(P)
:::::::
Fp (a'Ye + 'YA + 'Yo a ( 'Ye 'YA 'YB ))
T. ( ) 2 + JJ MNr + MnR + mNR
.
arith P mnr
{l.6.7)
{1.6.8)
{l.6.9)
{l.6.10)
To proceed further with our analysis, we need to estimate the 7-factors {l.6.7)-(1.6.9),
and that requires assumptions about how the underlying architecture stores a.nd ac­
cesses the matrices A, B, and C.
1.6.5 The Shared-Memory Paradigm
In a shared-memory system each processor ha.i;; access to a common, global memory.
See Figure 1.6.3. During program execution, data flows to and from the global memory
and this represents a significant overhead that we proceed to assess. Assume that the
matrices C, A, and B are in global memory at the start and that Proc(µ, r) executes
the following:
1.6. Parallel Matrix Multiplication
Global Memory
Figure 1.6.3. A four-processor shared-memory system
for i = µ:Prow:A/
end
for j = T:Pco1 :N
C(loc) +- C;j
for k = l :R
end
A(loc) +- Aik
B(loc) +- B1.;j
c<JocJ = c<toc) +
A(toe)n<tocJ
Cij +- c<toc)
end
55
( Method 1)
As a reminder ofthe interactions between global and local memory, we use the "+-" no­
tation to indicate data transfers between these memory levels and the "loc" superscript
to designate matrices in local memory. The block transfer statistics (1.6.7)-(1.6.9) for
Method 1 are given by
and so from (1.6.10) we obtain
'Ye � 2(MN/p),
'YA � R(!l'!N/p),
'Yn � R(MN/p),
(1.6.11)
By substituting this result into (1.6.6) we conclude that (a) speed-up degrades as the
flop rate F increases and (b) speedup improves if the communication parameters a and
/3 decrease or the block dimensions m1, n1 , and r1 increase. Note that the communicate­
to-compute ratio (1.6.11) for Method 1 does not depend upon the number of processors.
56 Chapter 1. Matrix Multiplication
Method 1 has the property that it is only necessary to store one C-block, one A­
block, and one B-block in local memory at any particular instant, i.e., C(loc), A(loc), and
B(loc). Typically, a processor's local memory is much smaller than global memory, so
this particular solution approach is attractive for problems that are very large relative
to local memory capacity. However, there is a hidden cost associated with this economy
because in Method 1, each A-block is loaded N/Pcol times and each B-block is loaded
M/Prow times. This redundancy can be eliminated if each processor's local memory
is large enough to house simultaneously all the C-blocks, A-blocks, and B-blocks that
are required by its assigned tasks. Should this be the case, then the following method
involves much less data transfer:
for k = l:R
end
A(loc) L_
A·
ik ...---- ik
B(loc) L_
B .
kj ...----
k3
for i = µ:prow:M
for j = r:Pco1:N
C(loc) +--- Cij
for k = l:R
(i = /L:Prow:M)
(j = T:Pcol:N)
C(loc) =
C(loc) + A��oc)Bk�oc)
end
end
Cij t- C(loc)
end
(Method 2)
The block transfer statistics "{�, "{�, and "{� , for Method 2 are more favorable than for
Method 1. It can be shown that
I
'Ye = "fc, (1.6.12)
where the quotients fcol = Pcoi/N and /row = Prow/M are typically much less than
unity. As a result, the communicate-to-compute ratio for Method 2 is given by
Tdata(P) ......, F ( 2 + R (fcol + /row) /3 (� 2_J. 2_ I ))
( )
......, a + + col + Jrow '
Tarith p 2 m1n1r r ni m1
(1.6.13)
which is an improvement over (1.6.11). Methods 1 and 2 showcase the trade-off that
frequently exists between local memory capacity and the overheads that are associated
with data transfer.
1.6.6 Barrier Synchronization
The discussion in the previous section assumes that C, A, and B are available in global
memory at the start. If we extend the model computation so that it includes the
1.6. Parallel Matrix Multiplication 57
multiprocessor initialization of these three matrices, then an interesting issue arises.
How does a processor "know" when the initialization is complete and it is therefore
safe to begin its share of the C = C + AB update?
Answering this question is an occasion to introduce a very simple synchronization
construct known as the barrier. Suppose the C-matrix is initialized in global memory
by assigning to each processor some fraction of the task. For example, Proc(µ, T) could
do this:
for i = µ:prow:!v/
end
for j = T:Pco1:N
end
Compute the (i, j) block of C and store in C(loc).
Cij +-- C(loc)
Similar approaches can be taken for the setting up of A = (Aij) and B = (Bi3). Even
if this partitioning of the initialization is load balanced, it cannot be assumed that each
processor completes its share of the work at exactly the same time. This is where the
barrier synchronization is handy. Assume that Proc(µ, T) executes the following:
Initialize Cij , i = µ : Prow : }ii
[, j = T :Pcol : N
Initialize Aij , i = µ : Prow : M, j = T : Pcol : R
Initialize Bij , i = µ : Prow : R, j = T : Pcol : N (1.6.14)
barrier
Update Ci3, i = µ : prow : 1'vf, j = T : Pco1 : N
To understand the barrier command, regard a processor as being either "blocked" or
"free." Assume in (1.6.14) that all processors are free at the start. When it executes the
barrier command, a processor becomes blocked and suspends execution. After the last
processor is blocked, all the processors return to the free state and resume execution.
In (1.6.14), the barrier does not allow the Ci3 updating via Methods 1 or 2 to begin
until all three matrices are fully initialized in global memory.
1.6.7 The Distributed-Memory Paradigm
In a distributed-memory system there is no global memory. The data is collectively
housed in the local memories of the individual processors which are connected to form
a network. There arc many possible network topologies. An example is displayed in
Figure 1.6.4. The cost associated with sending a message from one processor to another
is likely to depend upon how "close" they are in the network. For example, with the
torus in Figure 1.6.4, a message from Proc(l,1) to Proc(l,4) involves just one "hop"
while a message from Proc(l,1) to Proc(3,3) would involve four.
Regardless, the message-passing costs in a distributed memory system have a
serious impact upon performance just as the interactions with global memory affect
performance in a shared memory system. Our goal is to approximate these costs as
they might arise in the model computation. For simplicity, we make no assumptions
about the underlying network topology.
58 Chapter 1. Matrix Multiplication
Proc(l , l ) Proc(l,4)
Proc(4,l) Proc(4,4)
Figure 1.6.4. A 2-Dimensional Torus
Let us first assume that M = N = R = Prow = Pcol = 2 and that the C, A, and
B matrices are distributed as follows:
Proc(l,l} Proc(l,2}
Cn, Au, Bn
Proc(2,1} Proc(2,2}
Assume that Proc(i, j) oversees the update of Cij and notice that the required data for
this computation is not entirely local. For example, Proc(l,1) needs to receive a copy of
Ai2 from Proc(l,2) and a copy of B21 from Proc(2,1) before it can complete the update
C11 = C11 + AuB11 + Ai2B21· Likewise, it must send a copy of Au to Proc(l,2) and
a copy of Bu to Proc(2,1) so that they can carry out their respective updates. Thus,
the local programs executing on each processor involve a mix of computational steps
and message-passing steps:
1.6. Parallel Matrix Multiplication 59
Proc(l,1) Proc(l,2)
Send a copy of Au to Proc(l,2) Send a copy of A12 to Proc(l,1)
Receive a copy of A12 from Proc(l,2) Receive a copy of Au from Proc(l,1)
Send a copy of Bu to Proc(2,1) Send a copy of B12 to Proc(2,2)
Receive a copy of B21 from Proc(2,1) Receive a copy of B22 from Proc(2,2)
Cu = Cu + A1 1 B1 1 + A12B2 1 C12 = C12 + A1 1 B1 2 + A12B22
Proc(2,1) Proc(2,2)
Send a copy of A21 to Proc(2,2) Send a copy of A22 to Proc(2,1)
Receive a copy of A22 from Proc(2,2) Receive a copy of A21 from Proc(2,1)
Send a copy of B21 to Proc(l,1) Send a copy of B22 to Proc(l,2)
Receive a copy of B1 1 from Proc(l,1) Receive a copy of B12 from Proc(l,2)
C21 = C2 1 + A2 1 B1 1 + A22B21 C22 = C22 + A21 B12 + A22B22
This informal specification of the local programs does a good job delineating the duties
ofeach processor, but it hides several important issues that have to do with the timeline
of execution. (a) Messages do not necessarily arrive at their destination in the order
that they were sent. How will a receiving processor know if it is an A-block or a B­
block? (b) Receive-a-message commands can block a processor from proceeding with
the rest of its calculations. As a result, it is possible for a processor to wait forever for
a message that its neighbor never got around to sending. (c) Overlapping computation
with communication is critical for performance. For example, after Au arrives at
Proc(l,2), the "halr' update C12 = C12 + AuB12 can be carried out while the wait for
B22 continues.
As can be seen, distributed-memory matrix computations are quite involved and
require powerful systems to manage the packaging, tagging, routing, and reception of
messages. The discussion of such systems is outside the scope of this book. Neverthe­
less, it is instructive to go beyond the above 2-by-2 example and briefly anticipate the
data transfer overheads ·for the general model computation. Assume that Proc(µ, r )
houses these matrices:
Cii• i = µ :prow : M, j = T : Pco1 : N,
Aij' i = /L =Prow : Jvf, j = T : pcol : R,
Bij, i = µ :Prow : R, j = T : Pcol : N.
From Table 1.6.1 we conclude that if Proc(/.t, T) is to update Cij for i = µ : Prow : M
and j = r:pcol : N, then it must
(a) For i = µ : prow : M and j = r : Pcol : R, send a copy of Aij to
Proc(µ, 1), . . . , Proc(µ, T - 1), Proc(µ, T + 1), . . . , Proc(µ,pc01).
Data transfer time � (Pcol - l)(M/Prow)(R/Pcol) (a + /3m1r1)
(b) For i = µ : Prow : R and j = T : Pcol : N, send a copy of Bij to
Proc(l, r), . . . , Proc(µ - 1), r), Proc(µ + 1, r ), . . . , Proc(Prow 1 r).
Data transfer time � (prow - l)(R/Prow)(N/Pcol) (a + /3r1n1)
60 Chapter 1. Matrix Multiplication
(c) Receive copies of the A-blocks that are sent by processors
Proc(µ, 1), . . . , Proc(µ, r - 1), Proc(µ, r + 1), . . . , Proc(µ, Pcoi).
Data transfer time � (Pcol - l)(M/Prow)(R/Pcol) (a + f3m1r1)
(d) Receive copies of the E-blocks that are sent by processors
Proc(l, r), . . . , Proc(µ - 1), r), Proc(µ + 1, r ), . . . , Proc(Prow, r ).
Data transfer time � (Prow - 1)(R/Prow)(N/Peal) (a + f3r1ni)
Let Tdata be the summation of these data transfer overheads and recall that Tarith =
(2mnr)/(Fp) since arithmetic is evenly distributed around the processor network. It
follows that
Tdata(P)
� F (a(�+ Prow ) + {3 (Pcol + Prow)).
Tarith(P) m1r1n mr1n1 n m
(1.6.15)
Thus, as problem size grows, this ratio tends to zero and speedup approaches p accord­
ing to (1.6.6).
1.6.8 Cannon's Algorithm
We close with a brief description of the Cannon (1969) matrix multiplication scheme.
The method is an excellent way to showcase the toroidal network displayed in Figure
1.6.4 together with the idea of "nearest-neighbor" thinking which is quite important in
distributed matrix computations. For clarity, let us assume that A = (Aij), B = (Eij),
and C = (Cij) are 4-by-4 block matrices with n1-by-n1 blocks. Define the matrices
[An
A{ll A22
A33
A44
[A,.
A(2) A21
A32
A43
[Arn
A(3) A24
A31
A42
[A12
A(4) =
A23
A34
A41
and note that
cij
Ai2 Ai3 A,.
1
[Bn E22 E33
A23 A24 A21 n<1J E21 E32 E43
A34 A31 A32 ' =
E31 B42 E13
A41 A42 A43 E41 B12 E23
A11 Ai2 Arn
1
[B41 Ei2 E23
A22 A23 A24 n<2J Ell E22 E33
A33 A34 A31 ' =
E21 E32 E43
A44 A41 A42 E31 E42 E13
Ai4 A11 A.,
1
[B31 E42 B13
A21 A22 A23 E(3) E41 Ei2 E23
A32 A33 A34 ' =
Ell E22 B33
A43 A44 A41 E21 E32 E43
A13 Ai4 An
1
[B21 E32 E43
A24 A21 A22 B(4) =
E31 E42 E13
A31 A32 A33 ' E41 E12 E23
A42 A43 A44 Ell B22 E33
A(1)n<tl + A(2)E(2) + A3)E(3)
•J •J •J •J •J •J
+ A(�)E<4l
•J •J
.
B44
]·
Ei4
E24
E34
E34
]·
E44
B14
B24
Bu
1
B34
B44 '
B14
B,.
1
E24
E34 '
E44
(1.6.16)
1.6. Parallel Matrix Multiplication 61
Refer to Figure 1.6.4 and assume that Proc(i, j) is in charge of computing Cij and that
at the start it houses both Ai}) and B�?. The message passing required to support
the updates
Cij = Cij + Ai})BU),
Cij = Cij + AiJlBg),
cij = cij + AWB�l ,
Cij = Cij + AWB�),
(1.6.17)
(1.6.18)
(1.6.19)
(1.6.20)
involves communication with Proc(i, j)'s four neighbors in the toroidal network. To
see this, define the block downshift permutation
1]
and observe that A(k+l) = A(k)pT and B(k+l) = PB(k)_ That is, the transition from
A(k) to A(k+l) involves shifting A-blocks to the right one column (with wraparound)
while the transition from B(k) to B(k+l) involves shifting the B-blocks down one row
(with wraparound). After each update (1.6.17)-(1.6.20), the housed A-block is passed
to Proc(i, j)'s "east" neighbor and the next A-block is received from its "west" neigh­
bor. Likewise, the housed B-block is sent to its "south" neighbor and the next B-block
is received from its "north" neighbor.
Of course, the Cannon algorithm can be implemented on any processor network.
But we see from the above that it is particularly well suited when there are toroidal
connections for then communication is always between adjacent processors.
Problems
Pl.6.1 Justify Equations (1.6.3) and (1.6.4).
Pl.6.2 Contrast the two task distribution strategics in §1.6.2 for the case when the first q block rows
of A are zero and the first q block columns of B are zero.
Pl.6.3 Verify Equations (1.6.13) and (1.6.15).
Pl.6.4 Develop a shared memory method for overwriting A with A2 where it is assumed that A E Rn x n
resides in global memory at the start.
Pl.6.5 Develop a shared memory method for computing B = ATA where it is assumed that A E Rmx n
resides in global memory at the start and that B is stored in global memory at the end.
Pl.6.6 Prove (1.6.16) for general N. Use the block downshift matrix to define A(i) and B(il .
Notes and References for §1.6
To learn more about the practical implementation of parallel matrix multiplication, see scaLAPACK as
well as:
L. Cannon (1969). "A Cellular Computer to Implement the Kalman Filter Algorithm," PhD Thesis,
Montana State University, Bozeman, MT.
62 Chapter 1. Matrix Multiplication
K. Gallivan, W. Jalby, and U. Meier (1987). "The Use of BLAS3 in Linear Algebra on a Parallel
Processor with a Hierarchical Memory," SIAM J. Sci. Stat. Comput. 8, 1079-1084.
P. Bj121rstad, F. Manne, T.S121revik, and M. Vajtersic (1992}. "Efficient Matrix Multiplication on SIMD
Computers," SIAM J. Matrix Anal. Appl. 19, 386-401.
S.L. Johnsson (1993). "Minimizing the Communication Time for Matrix Multiplication on Multipro­
cessors," Parallel Comput. 19, 1235--1257.
K. Mathur and S.L. Johnsson (1994). "Multiplication of Matrices of Arbitrary Shape on a Data
Parallel Computer," Parallel Comput. 20, 919-952.
J. Choi, D.W. Walker, and J. Dongarra (1994} "Pumma: Parallel Universal Matrix Multiplication
Algorithms on Distributed Memory Concurrent Computers," Concurrnncy: Pmct. Exper. 6, 543-
570.
R.C. Agarwal, F.G. Gustavson, and M. Zubair (1994). "A High-Performance Matrix-Multiplication
Algorithm on a Distributed-Memory Parallel Computer, Using Overlapped Communication,'' IBM
J. Res. Devel. 98, 673-681.
D. Irony, S. Toledo, and A. Tiskin (2004). "Communication Lower Bounds for Distributed Memory
Matrix Multiplication," J. Parallel Distrib. Comput. 64, 1017-1026.
Lower bounds for communication overheads are important as they establish a target for implementers,
see:
G. Ballard, J. Demmel, 0. Holtz, and 0. Schwartz (2011). "Minimizing Communication in Numerical
Linear Algebra," SIAM. J. Matrix Anal. Applic. 92, 866-901.
Matrix transpose in a distributed memory environment is surprisingly complex. The study of this
central, no-flop calculation is a reminder of just how important it is to control the costs of data
motion. See
S.L. Johnsson and C.T. Ho (1988). "Matrix Transposition on Boolean N-cube Configured Ensemble
Architectures,'' SIAM J. Matrix Anal. Applic. 9, 419-454.
J. Choi, J.J. Dongarra, and D.W. Walker (1995). "Parallel Matrix Transpose Algorithms on Dis-
tributed Memory Concurrent Computers,'' Parallel Comput. 21, 1387-1406.
The parallel matrix computation literature is a vast, moving target. Ideas come and go with shifts
in architectures. Nevertheless, it is useful to offer a small set of references that collectively trace the
early development of the field:
D. Heller (1978). "A Survey of Parallel Algorithms in Numerical Linear Algebra,'' SIAM Review 20,
740-777.
J.M. Ortega and R.G. Voigt (1985). "Solution of Partial Differential Equations on Vector and Parallel
Computers,'' SIAM Review 27, 149-240.
D.P. O'Leary and G.W. Stewart (1985). "Data Flow Algorithms for Parallel Matrix Computations,''
Commun. A CM 28, 841-853.
J.J. Dongarra and D.C. Sorensen (1986). "Linear Algebra on High Performance Computers,'' Appl.
Math. Comput. 20, 57-88.
M.T. Heath, ed. (1987}. Hypercube Multiprocessors, SIAM Publications, Philadelphia, PA.
Y. Saad and M.H. Schultz (1989}. "Data Communication in Parallel Architectures,'" J. Dist. Parallel
Comput. 11, 131-150.
J.J. Dongarra, I. Duff, D. Sorensen, and H. van der Vorst (1990). Solving Linear Systems on Vector
and Shared Memory Computers, SIAM Publications, Philadelphia, PA.
K.A. Gallivan, R.J. Plemmons, and A.H. Sameh (1990). "Parallel Algorithms for Dense Linear Algebra
Computations,'' SIAM Review 92, 54-135.
J.W. Demmel, M.T. Heath, and H.A. van der Vorst (1993). "Parallel Numerical Linear Algebra,'' in
Acta Numerica 1.999, Cambridge University Press.
A. Edelman (1993). "Large Dense Numerical Linear Algebra in 1993: The Parallel Computing Influ­
ence,'' Int. J. Supercomput. Applic. 7, 113 -128.
Chapter 2
Matrix Analysis
2.1 Basic Ideas from Linear Algebra
2.2 Vector Norms
2.3 Matrix Norms
2.4 The Singular Value Decomposition
2.5 Subspace Metrics
2.6 The Sensitivity of Square Systems
2.7 Finite Precision Matrix Computations
The analysis and derivation of algorithms in the matrix computation area requires
a facility with linear algebra. Some of the basics are reviewed in §2.1. Norms are
particularly important, and we step through the vector and matrix cases in §2.2 and
§2.3. The ubiquitous singular value decomposition is introduced in §2.4 and then
used in the next section to define the CS decomposition and its ramifications for the
measurement of subspace separation. In §2.6 we examine how the solution to a linear
system Ax = b changes if A and b are perturbed. It is the ideal setting for introducing
the concepts of problem sensitivity, backward error analysis, and condition number.
These ideas are central throughout the text. To complete the chapter we develop a
model of finite-precision floating point arithmetic based on the IEEE standard. Several
canonical examples of roundoff error analysis are offered.
Reading Notes
Familiarity with matrix manipulation consistent with §1.1-§1.3 is essential. The
sections within this chapter depend upon each other as follows:
§2.5
t
§2.1 --+ §2.2 --+ §2.3 --+ §2.4
.i
§2.6 --+ §2.7
63
64 Chapter 2. Matrix Analysis
Complementary references include Forsythe and Moler (SLAS), Stewart (IMC), Horn
and Johnson (MA), Stewart (MABD), Ipsen (NMA), and Watkins (FMC). Funda­
mentals of matrix analysis that are specific to least squares problems and eigenvalue
problems appear in later chapters.
2.1 Basic Ideas from Linear Algebra
This section is a quick review of linear algebra. Readers who wish a more detailed
coverage should consult the references at the end of the section.
2.1.1 Independence, Subspace, Basis, and Dimension
A set of vectors {a1, . . . ' an} in nr is linearly independent if 2::7=1 Djaj = 0 implies
a(l:n) = 0. Otherwise, a nontrivial combination of the ai is zero and {a1, . . . , an} is
said to be linearly dependent.
A subspace of IR.m is a subset that is also a vector space. Given a collection of
vectors a1, . . . , an E IR.m, the set of all linear combinations of these vectors is a subspace
referred to as the span of {a1, . . . , an}:
n
span{a1, . . . , an} = {L.Biai : .Bi E IR} .
j=l
If {a1, . . . , an} is independent and b E span{a1, . . . , an}, then b is a unique linear com­
bination of the aj.
If si , . . . , sk are subspaces of IR.m' then their sum is the subspace defined by
S = { a1 + a2 + · · · + ak : ai E Si, i = l:k }. S is said to be a direct sum if each
v E S has a unique representation v = a1 + · · · + ak with ai E Si. In this case we write
S = S1 EB · · · EB Sk. The intersection of the Si is also a subspace, S = S1 n S2 n · · · n Sk·
The subset {ai1 , • • • , ah } is a maximal linearly independent subset of {a1 , . . . , an}
if it is linearly independent and is not properly contained in any linearly indepen­
dent subset of {a1, . . . , an}· If {ai1 , • • • , aik} is maximal, then span{a1 , . . . , an} =
span{ai, , . . . , aik} and {ail ' . . . , aik } is a basis for span{ai, . . . , an}· If S � IR.m is a
subspace, then it is possible to find independent basic vectors a1, . . . , ak E S such that
S = span{a1, . . . , ak}. All bases for a subspace S have the same number of clements.
This number is the dimension and is denoted by dim(S).
2.1.2 Range, Null Space, and Rank
There are two important subspaces associated with an m-by-n matrix A. The range
of A is defined by
ran(A) = {y E IR.m : y = Ax for some x E IR.n}
and the nullspace of A is defined by
null(A) = {x E lEe : Ax = O}.
If A = [ ai I· · · Ian ] is a column partitioning, then
ran(A) = span{a1, . . . , an}·
2.1. Basic Ideas from Linear Algebra
The rank of a matrix A is defined by
rank(A) = dim (ran{A)) .
If A E IRmxn, then
dim{null{A)) + rank{A) = n.
65
We say that A E IRmxn is rank deficient if rank{A) < min{m, n}. The rank of a matrix
is the maximal number of linearly independent columns (or rows).
2.1.3 Matrix Inverse
If A and X are in IR.nxn and satis(y AX = I, then X is the inverse of A and is
denoted by A-1• If A-1 exists, then A is said to be nonsingular. Otherwise, we say A
is singular. The inverse of a product is the reverse product of the inverses:
{2.1.1)
Likewise, the transpose of the inverse is the inverse of the transpose:
(2.1.2)
2.1.4 The Sherman-Morrison-Woodbury Formula
The identity
{2.1.3)
shows how the inverse changes ifthe matrix changes. The Sherman-Morrison- Woodbury
formula gives a convenient expression for the inverse of the matrix (A + UVT) where
A E IR.nxn and U and V are n-by-k:
{2.1.4)
A rank-k correction to a matrix results in a rank-k correction ofthe inverse. In {2.1.4)
we assume that both A and (I + VTA-1U) are nonsingular.
The k = 1 case is particularly useful. If A E IR.nxn is nonsingular, u, v E IRn, and
a = 1 + vTA-1u =J 0, then
(A + uvT)-1 =
A-1 - .!_A-1uvTA-1.
a
This is referred to as the Sherman-Morrison formula.
2.1.5 Orthogonality
{2.1.5)
A set of vectors {x1, . . • , Xp} in IRm is orthogonal if x[xj = 0 whenever i =J j and
orthonormal if x[Xj = Oij· Intuitively, orthogonal vectors are maximally independent
for they point in totally different directions.
A collection of subspaces S1, • . . , Sp in IRm is mutually orthogonal if x
Ty = 0
whenever x E Si and y E Sj for i =J j. The orthogonal complement of a subspace
S <;; IRm is defined by
SJ. = {y E IRm : YTx = 0 for all x E S}.
66 Chapter 2. Matrix Analysis
It is not hard to show that ran (A).L = null(AT) . The vectors v1 , . . . , Vk form an or­
thonormal basis for a subspace S � Rm if they are orthonormal and span S.
A matrix Q E Rmxm is said to be orthogonal if QTQ = I. If Q = [ Q1 I · · · I Qm ]
is orthogonal, then the Qi form an orthonormal basis for Rm. It is always possible to
extend such a basis to a full orthonormal basis {v1 , • . . , Vm} for Rm:
Theorem 2.1.1. IfVi E lR�xr has orthonormal columns, then there exists V2 E Rnx(n-r)
such that
V = [ Vi l V2 ]
is orthogonal. Note that ran(Vi).L = ran(V2).
Proof. This is a standard result from introductory linear algebra. It is also a corollary
of the QR factorization that we present in §5.2. D
2.1.6 The Determinant
If A = (a) E R1x1, then its determinant is given by det(A) = a. The determinant of
A E Rnxn is defined in terms of order-(n- 1) determinants:
n
det(A) = L:)-1)i+la1idet(A1j)·
j=l
Here, A1j is an (n - 1)-by-(n - 1) matrix obtained by deleting the first row and jth col­
umn of A. Well-known properties of the determinant include det(AB) = det(A)det(B),
det(AT) = det(A), and det(cA) = cndet(A) where A, B E Rnxn and c E R. In addition,
det(A) =/; 0 if and only if A is nonsingular.
2.1.7 Eigenvalues and Eigenvectors
Until we get to the main eigenvalue part of the book (Chapters 7 and 8), we need
a handful of basic properties so that we can fully appreciate the singular value de­
composition (§2.4), positive definiteness (§4.2), and various fast linear equation solvers
(§4.8).
The eigenvalues of A E <Cnxn arc the zeros of the characteristic polynomial
p(x) = det(A - xl).
Thus, every n-by-n matrix has n eigenvalues. We denote the set of A's eigenvalues by
>.(A) = { x : det(A - xl) = 0 }.
If the eigenvalues of A are real, then we index them from largest to smallest as follows:
In this case, we sometimes use the notation >.max(A) and >.min(A) to denote >.1 (A) and
>.n(A) respectively.
2.1. Basic Ideas from Linear Algebra 67
If X E ccnxn is nonsingular and B = x-1AX, then A and B are similar. If two
matrices are similar, then they have exactly the same eigenvalues.
If .. E ..(A), then there exists a nonzero vector x so that Ax = .Xx. Such a vector
is said to be an eigenvector for A associated with ... If A E ccnxn has n independent
eigenvectors X1, . . . , Xn and Axi = AiXi for i = 1 :n, then A is diagonalizable. The
terminology is appropriate for if
then
x-1AX = diag(..i, . . . , .Xn)·
Not all matrices are diagonalizable. However, if A E Rnxn is symmetric, then there
exists an orthogonal Q so that
{2.1.6)
This is called the Schur decomposition. The largest and smallest eigenvalues of a
symmetric matrix satisfy
and
2.1.8 Differentiation
xTAx
Amax{A) = max -
T
-
x�O X X
Amin{A)
xTAx
= min --.
x�o xTx
{2.1.7)
{2.1.8)
Suppose o is a scalar and that A(o) is an m-by-n matrix with entries aij{o). If aij(o)
is a differentiable function of o for all i and j, then by A(o) we mean the matrix
. d ( d )
A(o) =
do
A(o) =
do
aii (o) = (aii (o)).
Differentiation is a useful tool that can sometimes provide insight into the sensitivity
of a matrix problem.
Problems
P2.1.1 Show that if A E Rm x n has rank p, then there exists an X E R"' x p and a Y E R'' xp such that
A = XYT, where rank(X) = rank(Y) = p.
P2.1.2 Suppose A(a) E Rm x r and B(a) E wx n are matrices whose entries are differentiable functions
of the scalar a. (a) Show
�
[A(a)B(a)] = [�A(a)] B(a) + A(a) [�B(a)]
da da da
(b) Assuming A(a) is always nonsingular, show
! (A(a)-1] = -A(a)-1 [!A(a)] A(a)-1•
P2.1.3 Suppose A E R" x n
, b E Rn and that <f>(x) = �xTAx - xTb. Show that the gradient of <f> is
given by 'V<f>(x) = �(AT + A)x - b.
68 Chapter 2. Matrix Analysis
P2.1.4 Assume that both A and A + uvT are nonsingular where A E Rn x n and u, v E Rn . Show
that if x solves (A + uvT)x = b, then it also solves a perturbed right-hand-side problem of the form
Ax = b + cm. Give an expression for a in terms of A, u, and v.
P2.1.5 Show that a triangular orthogonal matrix is diagonal.
P2.1.6 Suppose A E Rn x n is symmetric and nonsingular and define
A = A + a(uuT + vvT) + {3(uvT + vuT)
where u, v E Rn and a, {3 E R. Assuming that A is nonsingular, use the Sherman-Morrison-Woodbury
formula to develop a formula for A-l.
P2.1.7 Develop a symmetric version of the Sherman-Morrison-Woodbury formula that characterizes
the inverse of A + USUT where A E Rn x n and S E Rk x k are symmetric and U E Rn x k.
P2.l.8 Suppose Q E Rn x n is orthogonal and z E Rn . Give an efficient algorithm for setting up an
m-by-m matrix A = (aij) defined by aij = vT(Qi)T(Qi)v.-
P2.1.9 Show that if S is real and sT = -S, then I - S is nonsingular and the matrix (I - s)-1 (I + S)
is orthogonal. This is known as the Cayley transform of S.
P2.1.10 Refer to §1.3.10. (a) Show that if S E R2n x 2n is symplectic, then s-1 exists and is also
symplectic. (b) Show that if M E R2n x 2n is Hamiltonian and S E R2n x 2n is symplectic, then the
matrix M1 = s-1 MS is Hamiltonian.
P2.1.11 Use (2.1.6) to prove (2.1.7) and (2.1.8).
Notes and References for §2.1
In addition to Horn and Johnson (MA) and Horn and Johnson (TMA), the following introductory
applied linear algebra texts are highly recommended:
R. Bellman (1997). Introduction to Matrix Analysis, Second Edition, SIAM Publications, Philadel-
phia, PA.
C. Meyer (2000). Matrix Analysis and Applied Linear Algebra, SIAM Publications, Philadelphia, PA.
D. Lay (2005). Linear Algebra and Its Applications, Third Edition, Addison-Wesley, Reading, MA.
S.J. Leon (2007). Linear Algebra with Applications, Seventh Edition, Prentice-Hall, Englewood Cliffs,
NJ.
G. Strang (2009). Introduction to Linear Algebra, Fourth Edition, SIAM Publications, Philadelphia,
PA.
2.2 Vector Norms
A norm on a vector space plays the same role as absolute value: it furnishes a distance
measure. More precisely, Rn together with a norm on Rn defines a metric space
rendering the familiar notions of neighborhood, open sets, convergence, and continuity.
2.2.1 Definitions
A vector norm on R" is a function f:Rn -t R that satisfies the following properties:
f(x) 2: 0,
f(x+y) � f(x)+f(y),
f(ax) = lal/(x),
x E Rn, (f(x) = 0, iff x= 0),
x,y E Rn,
a E R,xE Rn.
We denote such a function with a double bar notation: f(x) = II x11- Subscripts on
the double bar are used to distinguish between various norms. A useful class of vector
2.2. Vector Norms
norms are the p-norms defined by
p :'.:: 1 .
The 1-, 2-, and oo- norms are the most important:
II x 111
11 x 112
11 x 1100
lx1 I +· · · +lxnl,
1 I
(lx1 12 + . . · +lxnl2) 2 = (xTx) 2 ,
max lxil·
1:5i:5n
69
(2.2.1)
A unit vector with respect to the norm II · II is a vector x that satisfies IIx II=1.
2.2.2 Some Vector Norm Properties
A classic result concerning p-norms is the Holder inequality:
1 1
- + - = l.
p q
A very important special case of this is the Cauchy-Schwarz inequality:
(2.2.2)
(2.2.3)
All norms on R.n are equivalent , i.e., if 11 · 110 and 11 · 11,a are norms on R.n, then
there exist positive constants c1 and c2 such that
for all x E R.n. For example, if x E R.n, then
IIx 112 :::; II x 111 < v'n IIx 112,
11 x lloo :::; 11 x 112 < v'n 11 x 1100,
ll x ll00 :s; ll x ll1 < n ll x ll00•
(2.2.4)
(2.2.5)
(2.2.6)
(2.2.7)
Finally, we mention that the 2-norm is preserved under orthogonal transformation.
Indeed, if Q E R.nxn is orthogonal and x E R.n, then
2.2.3 Absolute and Relative Errors
Suppose x E R.n is an approximation to x E R.n. For a given vector norm II · II we say
that
€abs = II x - x II
is the absolute error in x. If x =/:- 0, then
€rel = ll x - x ll
II x ii
70 Chapter 2. Matrix Analysis
prescribes the relative error in x. Relative error in the oo-norm can be translated into
a statement about the number of correct significant digits in x. In particular, if
II x - xlloo ,...,
10-11
II xlloc
,...,
'
then the largest component of x has approximately p correct significant digits. For
example, if x = [ 1.234 .05674 ]T and x = [ 1.235 .05128 ]T, then II x - x1100/ll x1100 �
.0043 � 10-3. Note than x1 has about three significant digits that arc correct while
only one significant digit in x2 is correct.
2.2.4 Convergence
We say that a sequence {x(k)} of n-vectors converges to x if
lim II x(k) -xII = 0.
k-+oo
Because of (2.2.4), convergence in any particular norm implies convergence in all norms.
Problems
P2.2.1 Show that if x E Rn , then limp-+oo II x llp = II x 1100•
P2.2.2 By considering the inequality 0 � (ax + by)T(a.-i: + by) for suitable scalars a and b , prove
(2.2.3).
P2.2.3 Verify that II · 111 , II · 112, and II · 1100 are vector norms.
P2.2.4 Verify (2.2.5)-(2.2.7). When is equality achieved in each result'!
P2.2.5 Show that in Rn , x!i) --+ x if and only if xii) --+ Xk for k = l:n.
P2.2.6 Show that for any vector norm on Rn that I II x II - 11 y 11 I � 11 x - y 11·
P2.2.7 Let II · II be a vector norm on R!" and assume A E Rm x n . Show that if rank(A) = n, then
II x llA = 11 Ax II is a vector norm on R" .
P2.2.8 Let x and y be in R" and define 1/J:R --+ R by 'l/l(a) = I I x - ay 112· Show that 'I/I is minimized
if ct = xTyjyTy.
P2.2.9 Prove or disprove:
l + y'n 2
v E Rn => ll v ll i ll v lloo � -2
- ll v lb·
P2.2.10 If x E R3 and y E R:i then it can be shown that lxTyl = II x 11211 y 1121 cos(8)1 where 8 is the
angle between x and y. An analogous result exists for the cross product defined by
x x y =
[::�: = ;:�� l·
XJY2 - x2y1
In particular, II x x y 112 = II x 112 11 y 1121 sin(B}I. Prove this.
P2.2.11 Suppose x E Rn and y E Rrn . Show that
II x ® Y llv = II x llvll Y 1111
for p = 1, 2, and co .
Notes and References for §2.2
Although a vector norm is "just" a generalization of the absolute value concept, there are some
noteworthy subtleties:
J.D. Pryce (1984). "A New Measure of Relative Error for Vectors," SIAM .J. Numer. Anal. 21,
202-221.
2.3. Matrix Norms 71
2.3 Matrix Norms
The analysis of matrix algorithms requires use of matrix norms. For example, the
quality of a linear system solution may be poor if the matrix of coefficients is "nearly
singular." To quanti(y the notion of near-singularity, we need a measure of distance on
the space of matrices. Matrix norms can be used to provide that measure.
2.3.l Definitions
Since Rmxn is isomorphic to Rmn, the definition of a matrix norm should be equivalent
to the definition of a vector norm. In particular, /:Rmxn --+ R is a matrix norm if the
following three properties hold:
f(A) ;::: 0,
I(A + B) � I(A) + J(B),
f(aA) = lodf(A),
A E Rmxn, (f(A) = 0 iff A = 0)
A, B E Rm
xn,
a E 111, A E 1Rmxn.
As with vector norms, we use a double bar notation with subscripts to designate matrix
norms, i.e., 11 A II = f(A).
The most frequently used matrix norms in numerical linear algebra are the Frobe­
nius norm
and the p-norms
m n
ll A llF = :L:L laijl2
i= l j=l
II Ax llP
sup
II · ii .
x;i!O X p
(2.3.1)
(2.3.2)
Note that the matrix p-norms are defined in terms of the vector p-norms discussed in
the previous section. The verification that (2.3.1) and (2.3.2) are matrix norms is left
as an exercise. It is clear that II A llP is the p-norm of the largest vector obtained by
applying A to a unit p-norm vector:
max II A:i: llP .
llx llp= l
It is important to understand that (2.3.2) defines a family of norms-the 2-norm
on R3x2 is a different function from the 2-norm on 1R5x6• Thus, the easily verified
inequality
(2.3.3)
is really an observation about the relationship between three different norms. Formally,
we say that norms Ji , /2, and '3 on Rmxq, Rmxn, and Rnxq are mutnally consistent
if for all matrices A E Rmxn and B E Rnxq we have ft (AB) � h(A)f3(B), or, in
subscript-free norm notation:
II AB II � II A 11 11 B 11. (2.3.4)
72 Chapter 2. Matrix Analysis
Not all matrix norms satisfy this property. For example, if II AIla = max laiil and
A=B= [ � � ] .
then IIABIla > IIAllall BIla· For the most part, we work with norms that satisfy
(2.3.4).
The p-norms have the important property that for every A E Rmxn and xE Rn
we have
IIAxllP $ II AllPllxllp·
More generally, for any vector norm II · Ila.on Rn and II · 11.Bon Rm we have II Ax11.B $
II Alla.,.BII xIla. where 11Alla.,.B is a matrix norm defined by
II Ax11.B
11Alla.,.B = sup
II II . (2.3.5)
x;il'O X °'
We say that II · lla.,.B is subordinate to the vector norms II · Ila. and II · 11.B· Since the
set {x E Rn : IIxIla. = 1} is compact and II · 11.B is continuous, it follows that
II Alla.,,B = max II Ax11.B = II Ax* 11.B
llxll.,=1
(2.3.6)
for some x* E Rn having unit a-norm.
2.3.2 Some Matrix Norm Properties
The Frobenius and p-norms (especially p = 1, 2, oo) satisfy certain inequalities that
are frequently used in the analysis of a matrix computation. If A E Rmxn we have
max laiil $ II A112 $ .;mii, max laijl,
iJ iJ
Jn II Alloo $ II A 112 $ rm IIAlloo'
1
rm llAll1 $ llAll2 $ vnllAll1·
(2.3.7)
(2.3.8)
(2.3.9)
(2.3.10)
(2.3.11)
(2.3.12)
(2.3.13)
2.3. Matrix Norms 73
The proofs of these relationships are left as exercises. We mention that a sequence
{A(k)} E Rmxn converges if there exists a matrix A E 1R.mxn such that
lim II A(k) - A II = 0.
k-+oo
The choice of norm is immaterial since all norms on 1R.mxn are equivalent.
2.3.3 The Matrix 2-Norm
A nice feature of the matrix 1-norm and the matrix oo-norm is that they are easy, O(n2)
computations. (See (2.3.9) and (2.3.10).) The calculation of the 2-norm is considerably
more complicated.
Theorem 2.3.1. If A E 1R.mxn, then there exists a unit 2-norm n-vector z such that
ATAz = µ2z where µ = II A 112·
Proof. Suppose z E Rn is a unit vector such that II Az 112 = II A 112· Since z maximizes
the function
( ) 1 II Ax 11� 1 xTATAx
9 x = 2 11 x 11�
= 2 xrx
it follows that it satisfies Vg(z) = 0 where Vg is the gradient of g. A tedious differen­
tiation shows that for i = 1:n
In vector notation this says that ATAz = (zTATAz)z. The theorem follows by setting
µ = II Az 112· D
The theorem implies that II A II� is a zero ofp(.X) = det(ATA - A/). In particular,
We have much more to say about eigenvalues in Chapters 7 and 8. For now, we merely
observe that 2-norm computation is itl:!rative and a more involved calculation than
those of the matrix 1-norm or oo-norm. Fortunately, if the object is to obtain an
order-of-magnitude estimate of II A 112, then (2.3.7), {2.3.8), {2.3.11), or {2.3.12) can be
used.
As another example of norm analysis, here is a handy result for 2-norm estimation.
Corollary 2.3.2. If A E 1R.mxn, then II A 112 � Jll A 111 11 A 1100 •
Proof. If z =F 0 is such that ATAz = µ2z with µ = II A 112, then µ2 11 z 111 =
ll ATAz ll1 � ll AT ll1 ll A ll1 ll z ll1 = ll A ll00ll A ll1 ll z ll1· D
74 Chapter 2. Matrix Analysis
2.3.4 Perturbations and the Inverse
We frequently use norms to quantify the effect of perturbations or to prove that a
sequence of matrices converges to a specified limit. As an illustration of these norm
applications, let us quantify the change in A-1 as a function of change in A.
Lemma 2.3.3. If P E 1Rnxn and IIP llP < 1, then I - P is nonsingular and
00
(I - P)-1
= Lpk
k=O
with
Proof. Suppose I- P is singular. It follows that (I - P)x = 0 for some nonzero x. But
then IIx llP = II Px llP implies II F llP � 1, a contradiction. Thus, I - P is nonsingular.
To obtain an expression for its inverse consider the identity
(tpk)(I - P) = I - PN+l.
k=O
Since II P llP < 1 it follows that lim pk=0 because II pkllP
�
IIF 11;. Thus,
k-+oo
(lim tpk)(I - P) = I.
N-+ook=O
N
It follows that (I - F)-1 = lim Lpk.From this it is easy to show that
N-+ook=O 00
II (I - F)-l llp
�
L II F 11;
k=O 1
completing the proof of the theorem. 0
Note that II (I - F)-1 - I llP
�
IIF llP/(1 - IIP llP) is a consequence of the lemma.
Thus, if f « 1, then 0(€) perturbations to the identity matrix induce O(E) perturba­
tions in the inverse. In general, we have
Theorem 2.3.4. IfA is nonsingular and r = II A-1E llP < 1, then A+E is nonsingular
and
II(A + E)-1 - A-1 II
� II E llp II A-l
II�
P 1 - r
Proof. Note that A + E = (I + P)A where P = -EA-1• Since IIP llP =r < 1, it
follows from Lemma 2.3.3 that I + P is nonsingular and II (I + P)-1 llP � 1/(1 - r).
2.3. Matrix Norms
Thus, (A + E)-1 = A-1(! + F)-1 is nonsingular and
The theorem follows by taking norms. D
2.3.5 Orthogonal Invariance
If A E R.mxn and the matrices Q E R.mxm and Z E R.nxnare orthogonal, then
and
II QAZ 112 = II A 112 .
75
(2.3.14)
(2.3.15)
These properties readily follow from the orthogonal invariance of the vector 2-norm.
For example,
n n
II QA 11�- = L II QA(:,j) II� L ll A(:,j) ll� = ll A ll!
j=l j=l
and so II Q(AZ) II! = II (AZ) II! = II zTAT II! = II AT II! = II A II!·
Problems
P2.3.1 Show II AB llp � II A llp ll B llp where 1 � p � oo.
P2.3.2 Let B be any submatrix of A. Show that II B llp � 11 A llp·
P2.3.3 Show that if D = diag(µi , . . . , µk) E RTn x n with k = min{m, n}, then II D llP = max lµ; I .
P2.3.4 Verify (2.3.7) and (2.3.8).
P2.3.5 Verify (2.3.9) and (2.3.10).
P2.3.6 Verify (2.3.11) and (2.3.12).
P2.3.7 Show that if 0 f. .� E Rn and E E w x n, then
II Es ll�
- --;T';- .
P2.3.B Suppose u E RTn and v E Rn . Show that if E = uvT, then II E ll F = II E lb = II u 11211 v 112 and
II E llCX> � II u ll CX> ll v 111 ·
P2.3.9 Suppose A E RTn x n, y E RTn , and 0 f. s E Rn . Show that E = (y - As)sT/sTs has the
smallest 2-norm of all m-by-n matrices E that satisfy (A + E)s = y.
P2.3.10 Verify that there exists a scalar c > 0 such that
II A 116.,c = max claij I
i, j
satisfies the submultiplicative property (2.3.4) for matrix norms on Rn x n. What is the smallest value
for such a constant? Referring to this value as c. , exhibit nonzero matrices B and C with the property
that II BC 116.,c. = II B 116.,c. II C 116.,c. ·
P2.3.11 Show that if A and B are matrices, then 11 A ® B ll F = II A 11"" 11 B llF ·
76 Chapter 2. Matrix Analysis
Notes and References for §2.3
For further discussion of matrix norms, see Stewart (IMC) as well as:
F.L. Bauer and C.T. Fike (1960). "Norms and Exclusion Theorems," Numer. Math. 2, 137-144.
L. Mirsky (1960). "Symmetric Gauge Functions and Unitarily Invariant Norms," Quart. J. Math. 11,
50-59.
A.S. Householder (1964). The Theory of Matrices in Numerical Analysis, Dover Publications, New
York.
N.J. Higham (1992). "Estimating the Matrix p-Norm," Numer. Math. 62, 539-556.
2.4 The Singular Value Decomposition
It is fitting that the first matrix decomposition that we present in the book is the
singular value decomposition (SVD). The practical and theoretical importance of the
SVD is hard to overestimate. It has a prominent role to play in data analysis and in
the characterization of the many matrix "nearness problems."
2.4.1 Derivation
The SVD is an orthogonal matrix reduction and so the 2-norm and Frobenius norm
figure heavily in this section. Indeed, we can prove the existence of the decomposition
using some elementary facts about the 2-norm developed in the previous two sections.
Theorem 2.4.1 (Singular Value Decomposition ). If A is a real m-by-n matrix,
then there exist orthogonal matrices
such that
U = [ U1 I · · · I Um ] E 1Rmxm and V = [ V1 I · · · I Vn ] E JRnxn
UTAV = E = diag(ui . . . . , up) E 1Rmxn,
where u1 � u2 � • . • � O'p � 0.
p = min{m, n},
Proof. Let x E 1Rn and y E 1Rm be unit 2-norm vectors that satisfy Ax = uy with
O' = II A 112· From Theorem 2.1.1 there exist V2 E 1Rnx(n-l) and U2 E 1Rmx(m-l) so
V = [ x I '2 ] E 1Rnxn and U = [ y I U2 ] E 1Rmxm are orthogonal. It is not hard to show
that
where w E Rn-l and B E R(m-l)x(n-1>. Since
we have 11 A1 11� � (u2+wTw). But u2 = II A II� = II A1 II�, and so we must have w = 0.
An obvious induction argument completes the proof of the theorem. D
The O'i are the singular values of A, the Ui are the left singular vectors of A, and the
Vi are right singular vectors of A. Separate visualizations of the SVD are required
2.4. The Singular Value Decomposition 77
depending upon whether A has more rows or columns. Here are the 3-by-2 and 2-by-3
examples:
[U11 U12 U13 lT [au
U21 U22 U23 a21
U31 U32 U33 ll31
[ 'U11
U21
a12 l [ v
a u
22 v
a32 21
V12 Q
]
[0"1 0
l
V22 =
0 �
2
'
] [V11 V12 V13 l [
V21 V22 V23 = �l
V31 V32 V33
0 0 ]
0"2 0
.
Jn later chapters, the notation cri(A) is used to designate the ith largest singular value
of a matrix A. The largest and smallest singular values are important and for them we
also have a special notation:
O"max(A) = the largest singular value of matrix A,
O"min(A) = the smallest singular value of matrix A.
2.4.2 Properties
We establish a number of important corollaries to the SVD that are used throughout
the book.
Corollary 2.4.2. !JUTAV = E is the SVD ofA E JRmxn and m 2:: n, then for i = l:n
Avi = O"iUi and ATUi = O"(Vi.
Proof. Compare columns in AV = UE and ATU = VET. D
There is a nice geometry behind this result. The singular values of a matrix A are the
lengths of the semiaxes of the hyperellipsoid E defined by E = { Ax : II x 112 = 1 }. The
semiaxis directions are defined by the 'Ui and their lengths are the singular values.
It follows immediately from the corollary that
ATAvi = cr'fvi,
AATUi = cr'fui
(2.4.1)
(2.4.2)
for i = l:n. This shows that there is an intimate connection between the SVD of A
and the eigensystems of the symmetric matrices ATA and AAT. See §8.6 and §10.4.
The 2-norm and the Frobenius norm have simple SVD characterizations.
Corollary 2.4.3. If A E JRmxn, then
where p = min{m, n}.
Proof. These results follow immediately from the fact that II UTAV II = II E II for both
the 2-norm and the Frobenius norm. D
78 Chapter 2. Matrix Analysis
We show in §8.6 that if A is perturbed by a matrix E, then no singular value can move
by more than II E 112 . The following corollary identifies two useful instances of this
result.
Corollary 2.4.4. If A E IRmxn and E E IRmxn, then
O"max(A + E) < O"max(A) + II E 112,
O"min(A + E) > O"min(A) - II E 112·
Proof. Using Corollary 2.4.2 it is easy to show that
O"min(A) · II X 112 � II Ax 112 � O"max(A) · II X 112·
The required inequalities follow from this result. D
If a column is added to a matrix, then the largest singular value increases and the
smallest singular value decreases.
Corollary 2.4.5. If A E IRmxn, m > n, and z E IRm, then
O"max ( [ A I Z ] ) > O"max(A),
O"min ( [ A I Z ] ) < O"min(A).
Proof. Suppose A = UEVT is the SVD of A and let x = V(:, 1) and A = [ A I z ].
Using Corollary 2.4.4, we have
O"max(A) = II Ax 112 =
IIA [ � ] 112
< O"max(A).
The proof that O"min(A) 2: O"min(A) is similar. D
The SVD neatly characterizes the rank of a matrix and orthonormal bases for
both its nullspace and its range.
Corollary 2.4.6. If A has r positive singular values, then rank(A) = r and
null(A) = span{Vr+li . . . , Vn},
ran(A) = span{u1 , . . . , Ur}.
Proof. The rank of a diagonal matrix equals the number of nonzero diagonal entries.
Thus, rank(A) = rank(E) = r. The assertions about the nullspace and range follow
from Corollary 2.4.2. D
2.4. The Singular Value Decomposition 79
If A has rank r , then it can be written as the sum of r rank-I matrices. The SVD
gives us a particularly nice choice for this expansion.
Corollary 2.4.7. If A E :nrxn and rank(A) = r, then
r
A = I >iUiVr.
i=l
Proof. This is an exercise in partitioned matrix multiplication:
r
[vf l
(UE) VT = ([ a1u1 I a2u2 I . . · I UrUr I 0 I · · · I 0 ] )
v
� = L:aiuivf.
i=l
D
The intelligent handling of rank degeneracy is an important topic that we discuss in
Chapter 5. The SVD has a critical role to play because it can be used to identify
nearby matrices of lesser rank.
Theorem 2.4.8 (The Eckhart-Young Theorem). If k < r = rank(A) and
k
then
Ak = Luiuiv[,
i=l
min II A - B 112 = II A - Ad2
rank(B)=k
(2.4.3)
{2.4.4)
Proof. Since UTAkV = diag(a1, . . . , ak, O, . . . , O) it follows that Ak is rank k. More-
over, UT(A - Ak )V = diag(O, . . . , 0, O"k+i. . . . ,up) and so II A - Ak 112 = O"k+l·
Now suppose rank(B) = k for some B E Rmxn. It follows that we can find
orthonormal vectors x1, . . . ,Xn-k so null(B) = span{x1, . . . ,Xn-k}· A dimension argu­
ment shows that
span{x1, . . . , Xn-d n span{v1, . . . , Vk+i} '# {O}.
Let z be a unit 2-norm vector in this intersection. Since Bz = 0 and
k+l
Az = L ui(v[z)ui,
we have
i=l
II A - B II� � II (A - B)z II� = II Az II�
completing the proof of the theorem. D
k+l
L: ut(v[z)2 > a�+l•
i=l
Note that this theorem says that the smallest singular value of A is the 2-norm distance
of A to the set of all rank-deficient matrices. We also mention that the matrix Ak
defined in (2.4.3) is the closest rank-k matrix to A in the Frobenius norm.
80 Chapter 2. Matrix Analysis
2.4.3 The Thin SVD
If A = UEVT E 1Rmxn is the SYD of A and m � n, then
A = U1E1VT
where
U1 = U(:, l:n) = [ U1 I · · · I Un ] E 1Rmxn
and
E1 = E(l:n, l:n)
We refer to this abbreviated version of the SYD as the thin SVD.
2.4.4 Unitary Matrices and the Complex SVD
Over the complex field the unitary matrices correspond to the orthogonal matrices.
In particular, Q E <Cnxn is unitary if QHQ = QQH = In. Unitary transformations
preserve both the 2-norm and the Frobenius norm. The SYD of a complex matrix
involves unitary matrices. If A E ccmxn, then there exist unitary matrices U E ccmxm
and V E <Cnxn such that
p = min{m , n}
where a1 � a2 � • • • � up � 0. All of the real SYD properties given above have obvious
complex analogs.
Problems
P2.4.1 Show that if Q = Q1 + iQ2 is unitary with Q1 , Q2 E Rn xn, then the 2n-by-2n real matrix
is orthogonal.
P2.4.2 Prove that if A E Rmxn, then
Umax(A) max
y E Rm
x e nn
P2.4.3 For the 2-by-2 matrix A = [ � � ], derive expressions for Umax(A) and Umin(A) that are
functions of w, x, y, and z.
P2.4.4 Show that any matrix in Rmx " is the limit of a sequence of full rank matrices.
P2.4.5 Show that if A E R"'xn has rank n, then II A(ATA)-1AT 112 = 1.
P2.4.6 What is the nearest rank-1 matrix to
in the Frobenius norm?
A -
[ 1 M ]
- 0 1
P2.4.7 Show that if A E R"' xn, then II A llF � Jrank{A) II A 112, thereby sharpening {2.3.7).
P2.4.8 Suppose A E R'xn. Give an SVD solution to the following problem:
min ll A - B ll F·
det(B)=idet(A)I
P2.4.9 Show that if a nonzero row is added to a matrix, then both the largest and smallest singular
values increase.
P2.4.10 Show that if θ_u and θ_v are real numbers and

    A = [ cos(θ_u)  sin(θ_u) ]
        [ cos(θ_v)  sin(θ_v) ],

then U^T A V = Σ where

    U = [ cos(π/4)  −sin(π/4) ]        V = [ cos(a)  −sin(a) ]
        [ sin(π/4)   cos(π/4) ],           [ sin(a)   cos(a) ],

and Σ = diag( √2·cos(b), √2·sin(b) ) with a = (θ_v + θ_u)/2 and b = (θ_v − θ_u)/2.
Notes and References for §2.4
Forsythe and Moler (SLAS) offer a good account of the SVD's role in the analysis of the Ax = b
problem. Their proof of the decomposition is more traditional than ours in that it makes use of the
eigenvalue theory for symmetric matrices. Historical SVD references include:
E. Beltrami (1873). "Sulle Funzioni Bilineari," Giornale di Matematiche 11, 98-106.
C. Eckart and G. Young (1939). "A Principal Axis Transformation for Non-Hermitian Matrices," Bull.
AMS 45, 118-121.
G.W. Stewart (1993). "On the Early History of the Singular Value Decomposition," SIAM Review
35, 551-566.
One of the most significant developments in scientific computation has been the increased use of the
SVD in application areas that require the intelligent handling of matrix rank. This work started with:
C. Eckart and G. Young (1936). "The Approximation of One Matrix by Another of Lower Rank,"
Psychometrika 1, 211-218.
For generalizations of the SVD to infinite dimensional Hilbert space, see:
I.C. Gohberg and M.G. Krein (1969). Introduction to the Theory of Linear Non-Self Adjoint Opera­
tors, Amer. Math. Soc., Providence, RI.
F. Smithies (1970). Integral Equations, Cambridge University Press, Cambridge.
Reducing the rank of a matrix as in Theorem 2.4.8 when the perturbing matrix is constrained is
discussed in:
J.W. Demmel (1987). "The Smallest Perturbation of a Submatrix which Lowers the Rank and Con-
strained Total Least Squares Problems," SIAM J. Numer. Anal. 24, 199-206.
G.H. Golub, A. Hoffman, and G.W. Stewart (1988). "A Generalization of the Eckart-Young-Mirsky
Approximation Theorem." Lin. Alg. Applic. 88/89, 317-328.
G.A. Watson (1988). "The Smallest Perturbation of a Submatrix which Lowers the Rank of the
Matrix," IMA J. Numer. Anal. 8, 295-304.
2.5 Subspace Metrics
If the object of a computation is to compute a matrix or a vector, then norms are
useful for assessing the accuracy of the answer or for measuring progress during an
iteration. If the object of a computation is to compute a subspace, then to make
similar comments we need to be able to quantify the distance between two subspaces.
Orthogonal projections are critical in this regard. After the elementary concepts are
established we discuss the CS decomposition. This is an SVD-like decomposition that
is handy when we have to compare a pair of subspaces.
2.5.1 Orthogonal Projections
Let S ⊆ R^n be a subspace. P ∈ R^{n×n} is the orthogonal projection onto S if ran(P) = S,
P^2 = P, and P^T = P. From this definition it is easy to show that if x ∈ R^n, then
Px ∈ S and (I − P)x ∈ S^⊥.
If P_1 and P_2 are each orthogonal projections, then for any z ∈ R^n we have

    || (P_1 − P_2)z ||_2^2 = (P_1 z)^T (I − P_2)z + (P_2 z)^T (I − P_1)z.

If ran(P_1) = ran(P_2) = S, then the right-hand side of this expression is zero, show-
ing that the orthogonal projection for a subspace is unique. If the columns of V =
[ v_1 | ··· | v_k ] are an orthonormal basis for a subspace S, then it is easy to show that
P = VV^T is the unique orthogonal projection onto S. Note that if v ∈ R^n, then
P = vv^T / v^T v is the orthogonal projection onto S = span{v}.
2.5.2 SVD-Related Projections
There are several important orthogonal projections associated with the singular value
decomposition. Suppose A = UΣV^T ∈ R^{m×n} is the SVD of A and that r = rank(A).
If we have the U and V partitionings

    U = [ U_r | Ũ_r ],        V = [ V_r | Ṽ_r ],
          r     m−r                 r     n−r

then

    V_r V_r^T  =  projection onto null(A)^⊥ = ran(A^T),
    Ṽ_r Ṽ_r^T  =  projection onto null(A),
    U_r U_r^T  =  projection onto ran(A),
    Ũ_r Ũ_r^T  =  projection onto ran(A)^⊥ = null(A^T).
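These projections are easy to form once the SVD is in hand; the NumPy sketch below
(an illustrative assumption, not from the text) checks two of the identities:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # 6-by-4 with rank 2
    U, s, Vt = np.linalg.svd(A)
    r = np.sum(s > 1e-12 * s[0])                # numerical rank
    Ur, Vr = U[:, :r], Vt[:r, :].T
    P_ran  = Ur @ Ur.T                          # projection onto ran(A)
    P_null = np.eye(4) - Vr @ Vr.T              # projection onto null(A)
    print(np.allclose(P_ran @ A, A))            # columns of A lie in ran(A)
    print(np.allclose(A @ P_null, 0))           # A annihilates null(A)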
2.5.3 Distance Between Subspaces
The one-to-one correspondence between subspaces and orthogonal projections enables
us to devise a notion of distance between subspaces. Suppose S_1 and S_2 are subspaces
of R^n and that dim(S_1) = dim(S_2). We define the distance between these two spaces
by

    dist(S_1, S_2) = || P_1 − P_2 ||_2,                                     (2.5.1)

where P_i is the orthogonal projection onto S_i. The distance between a pair of subspaces
can be characterized in terms of the blocks of a certain orthogonal matrix.

Theorem 2.5.1. Suppose

    W = [ W_1 | W_2 ],        Z = [ Z_1 | Z_2 ]
          k     n−k                 k     n−k

are n-by-n orthogonal matrices. If S_1 = ran(W_1) and S_2 = ran(Z_1), then

    dist(S_1, S_2) = || W_1^T Z_2 ||_2 = || Z_1^T W_2 ||_2.
Proof. We first observe that

    dist(S_1, S_2) = || W_1 W_1^T − Z_1 Z_1^T ||_2
                   = || W^T ( W_1 W_1^T − Z_1 Z_1^T ) Z ||_2
                   = || [    0         W_1^T Z_2 ]
                        [ −W_2^T Z_1      0      ] ||_2 .

Note that the matrices W_1^T Z_2 and W_2^T Z_1 are submatrices of the orthogonal matrix

    Q = W^T Z = [ Q_11  Q_12 ]  =  [ W_1^T Z_1   W_1^T Z_2 ]                (2.5.2)
                [ Q_21  Q_22 ]     [ W_2^T Z_1   W_2^T Z_2 ].

Our goal is to show that || Q_21 ||_2 = || Q_12 ||_2. Since Q is orthogonal it follows from

    Q [ x ]  =  [ Q_11 x ]
      [ 0 ]     [ Q_21 x ]

that 1 = || Q_11 x ||_2^2 + || Q_21 x ||_2^2 for all unit 2-norm x ∈ R^k. Thus,

    || Q_21 ||_2^2 = max_{||x||_2=1} || Q_21 x ||_2^2 = 1 − min_{||x||_2=1} || Q_11 x ||_2^2 = 1 − σ_min(Q_11)^2.

Analogously, by working with Q^T (which is also orthogonal) it is possible to show that

    || Q_12 ||_2^2 = 1 − σ_min(Q_11)^2,

and therefore || Q_21 ||_2 = || Q_12 ||_2. □
Note that if S_1 and S_2 are subspaces in R^n with the same dimension, then

    0 ≤ dist(S_1, S_2) ≤ 1.

It is easy to show that

    dist(S_1, S_2) = 0   ⟹   S_1 = S_2,
    dist(S_1, S_2) = 1   ⟹   S_1 ∩ S_2^⊥ ≠ {0}.
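The two characterizations of the distance can be compared numerically; the NumPy sketch
below (an illustrative assumption, not from the text) evaluates both (2.5.1) and the block
formula of Theorem 2.5.1:

    import numpy as np

    rng = np.random.default_rng(2)
    n, k = 6, 2
    W, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthogonal W = [W1 | W2]
    Z, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthogonal Z = [Z1 | Z2]
    W1, W2, Z1, Z2 = W[:, :k], W[:, k:], Z[:, :k], Z[:, k:]
    P1, P2 = W1 @ W1.T, Z1 @ Z1.T                      # projections onto S1, S2
    d_proj   = np.linalg.norm(P1 - P2, 2)              # definition (2.5.1)
    d_blocks = np.linalg.norm(W1.T @ Z2, 2)            # Theorem 2.5.1
    print(d_proj, d_blocks)                            # the two values agree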
A more refined analysis of the blocks of the matrix Q in (2.5.2) sheds light on the
difference between a pair of subspaces. A special, SVD-like decomposition for orthogonal
matrices is required.
2.5.4 The CS Decomposition
The blocks of an orthogonal matrix partitioned into 2-by-2 form have highly related
SVDs. This is the gist of the CS decomposition. We prove a very useful special case
first.
Theorem 2.5.2 (The CS Decomposition (Thin Version)). Consider the matrix

    Q = [ Q_1 ]        Q_1 ∈ R^{m_1×n_1},   Q_2 ∈ R^{m_2×n_1},
        [ Q_2 ],

where m_1 ≥ n_1 and m_2 ≥ n_1. If the columns of Q are orthonormal, then there exist
orthogonal matrices U_1 ∈ R^{m_1×m_1}, U_2 ∈ R^{m_2×m_2}, and V_1 ∈ R^{n_1×n_1} such that

    [ U_1   0  ]^T [ Q_1 ] V_1  =  [ C_0 ]
    [  0   U_2 ]   [ Q_2 ]         [ S_0 ]

where

    C_0 = diag( cos(θ_1), ..., cos(θ_{n_1}) ) ∈ R^{m_1×n_1},
    S_0 = diag( sin(θ_1), ..., sin(θ_{n_1}) ) ∈ R^{m_2×n_1},

and 0 ≤ θ_1 ≤ ··· ≤ θ_{n_1} ≤ π/2.
Proof. Since || Q_1 ||_2 ≤ || Q ||_2 = 1, the singular values of Q_1 are all in the interval
[0, 1]. Let

    U_1^T Q_1 V_1 = C_0 = diag(c_1, ..., c_{n_1})

be the SVD of Q_1 where we assume

    1 = c_1 = ··· = c_t > c_{t+1} ≥ ··· ≥ c_{n_1} ≥ 0.

To complete the proof of the theorem we must construct the orthogonal matrix U_2. If

    Q_2 V_1 = [ W_1 | W_2 ]
                 t    n_1−t

then

    [ U_1^T Q_1 V_1 ]  =  [     C_0     ]
    [    Q_2 V_1    ]     [ W_1 | W_2 ]

has orthonormal columns. Since the columns of this matrix have unit 2-norm, W_1 = 0. The columns of W_2 are
nonzero and mutually orthogonal because

    W_2^T W_2 = I_{n_1−t} − diag(c_{t+1}^2, ..., c_{n_1}^2) = diag(1 − c_{t+1}^2, ..., 1 − c_{n_1}^2)
is nonsingular. If s_k = sqrt(1 − c_k^2) for k = 1:n_1, then the columns of

    Z = W_2 · diag(1/s_{t+1}, ..., 1/s_{n_1})

are orthonormal. By Theorem 2.1.1 there exists an orthogonal matrix U_2 ∈ R^{m_2×m_2}
with U_2(:, t+1:n_1) = Z. It is easy to verify that

    U_2^T Q_2 V_1 = diag(s_1, ..., s_{n_1}) = S_0.

Since c_k^2 + s_k^2 = 1 for k = 1:n_1, it follows that these quantities are the required cosines
and sines. □
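The essential content of the theorem is easy to observe numerically. The NumPy sketch
below (an illustrative assumption, not from the text) splits a matrix with orthonormal
columns into Q_1 and Q_2 and checks that their singular values are cosines and sines of a
common set of angles:

    import numpy as np

    rng = np.random.default_rng(3)
    m1, m2, n1 = 5, 4, 3
    Q, _ = np.linalg.qr(rng.standard_normal((m1 + m2, n1)))   # orthonormal columns
    Q1, Q2 = Q[:m1, :], Q[m1:, :]
    c = np.linalg.svd(Q1, compute_uv=False)     # cosines, decreasing (angles increasing)
    s = np.linalg.svd(Q2, compute_uv=False)     # sines, decreasing (angles decreasing)
    # Reversing one ordering lines up matching angles, so c_k^2 + s_k^2 = 1
    print(np.allclose(c**2 + s[::-1]**2, 1.0))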
By using the same techniques it is possible to prove the following, more general version
of the decomposition:
Theorem 2.5.3 (CS Decomposition). Suppose

    Q = [ Q_11  Q_12 ]        Q_11 ∈ R^{m_1×n_1},   Q_22 ∈ R^{m_2×n_2},
        [ Q_21  Q_22 ],

is a square orthogonal matrix and that m_1 ≥ n_1 and m_1 ≥ m_2. Define the nonnegative
integers p and q by p = max{0, n_1 − m_2} and q = max{0, m_2 − n_1}. There exist
orthogonal U_1 ∈ R^{m_1×m_1}, U_2 ∈ R^{m_2×m_2}, V_1 ∈ R^{n_1×n_1}, and V_2 ∈ R^{n_2×n_2} such that if

    U = [ U_1   0  ]        and        V = [ V_1   0  ]
        [  0   U_2 ]                       [  0   V_2 ],

then

                [ I  0   0   0  0 ]   p
                [ 0  C   S   0  0 ]   n_1−p
    U^T Q V  =  [ 0  0   0   0  I ]   m_1−n_1
                [ 0  S  −C   0  0 ]   n_1−p
                [ 0  0   0   I  0 ]   q
                  p  n_1−p  n_1−p  q  m_1−n_1

where

    C = diag( cos(θ_{p+1}), ..., cos(θ_{n_1}) ) = diag(c_{p+1}, ..., c_{n_1}),
    S = diag( sin(θ_{p+1}), ..., sin(θ_{n_1}) ) = diag(s_{p+1}, ..., s_{n_1}),

and 0 ≤ θ_{p+1} ≤ ··· ≤ θ_{n_1} ≤ π/2.
Proof. See Paige and Saunders (1981) for details. D
We made the assumptions m_1 ≥ n_1 and m_1 ≥ m_2 for clarity. Through permutation and
transposition, any 2-by-2 block orthogonal matrix can be put into the form required
by the theorem. Note that the blocks in the transformed Q, i.e., the U_i^T Q_ij V_j, are
diagonal-like but not necessarily diagonal. Indeed, as we have presented it, the CS
decomposition gives us four unnormalized SVDs. If Q21 has more rows than columns,
then p = 0 and the reduction looks like this (for example):

                [ c_1  0   s_1  0   0  0  0 ]
                [ 0   c_2  0   s_2  0  0  0 ]
                [ 0    0   0    0   0  1  0 ]
    U^T Q V  =  [ 0    0   0    0   0  0  1 ]
                [ s_1  0  −c_1  0   0  0  0 ]
                [ 0   s_2  0  −c_2  0  0  0 ]
                [ 0    0   0    0   1  0  0 ].
On the other hand, if Q_21 has more columns than rows, then q = 0 and the decompo-
sition has the form

                [ 1  0    0   0    0  ]
                [ 0  c_2  0   s_2  0  ]
    U^T Q V  =  [ 0  0   c_3  0   s_3 ]
                [ 0  s_2  0  −c_2  0  ]
                [ 0  0   s_3  0  −c_3 ].
Regardless of the partitioning, the essential message of the CS decomposition is that
the SVDs of the Q-blocks are highly related.
Problems
P2.5.1 Show that if P is an orthogonal projection, then Q = I − 2P is orthogonal.
P2.5.2 What are the singular values of an orthogonal projection?
P2.5.3 Suppose S_1 = span{x} and S_2 = span{y}, where x and y are unit 2-norm vectors in R^2.
Working only with the definition of dist(·,·), show that dist(S_1, S_2) = sqrt(1 − (x^T y)^2), verifying that
the distance between S_1 and S_2 equals the sine of the angle between x and y.
P2.5.4 Refer to §1.3.10. Show that if Q ∈ R^{2n×2n} is orthogonal and symplectic, then Q has the form

    Q = [  Q_1   Q_2 ]
        [ −Q_2   Q_1 ].

P2.5.5 Suppose P ∈ R^{n×n} and P^2 = P. Show that || P ||_2 > 1 if null(P) is not a subspace of ran(P)^⊥.
Such a matrix is called an oblique projector. See Stewart (2011).
Notes and References for §2.5
The computation of the CS decomposition is discussed in §8.7.6. For a discussion of its analytical
properties, see:
C. Davis and W. Kahan (1970). "The Rotation of Eigenvectors by a Perturbation. III," SIAM J.
Numer. Anal. 7, 1-46.
G.W. Stewart (1977). "On the Perturbation of Pseudo-Inverses, Projections and Linear Least Squares
Problems," SIAM Review 19, 634-662.
C.C. Paige and M. Saunders (1981). "Toward a Generalized Singular Value Decomposition," SIAM
J. Numer. Anal. 18, 398-405.
C.C. Paige and M. Wei (1994). "History and Generality of the CS Decomposition," Lin. Alg. Applic.
208/209, 303-326.
A detailed numerical discussion of oblique projectors (P2.5.5) is given in:
G.W. Stewart (2011). "On the Numerical Analysis of Oblique Projectors,'' SIAM J. Matrix Anal.
Applic. 32, 309-348.
2.6 The Sensitivity of Square Systems
We use tools developed in previous sections to analyze the linear system problem Ax = b
where A ∈ R^{n×n} is nonsingular and b ∈ R^n. Our aim is to examine how perturbations
in A and b affect the solution x. Higham (ASNA) offers a more detailed treatment.
2.6.1 An SVD Analysis
If

    A = UΣV^T = Σ_{i=1}^n σ_i u_i v_i^T

is the SVD of A, then

    x = A^{−1}b = (UΣV^T)^{−1} b = Σ_{i=1}^n ( u_i^T b / σ_i ) v_i.          (2.6.1)

This expansion shows that small changes in A or b can induce relatively large changes
in x if σ_n is small.
It should come as no surprise that the magnitude of σ_n should have a bearing
on the sensitivity of the Ax = b problem. Recall from Theorem 2.4.8 that σ_n is the
2-norm distance from A to the set of singular matrices. As the matrix of coefficients
approaches this set, it is intuitively clear that the solution x should be increasingly
sensitive to perturbations.
2.6.2 Condition
A precise measure of linear system sensitivity can be obtained by considering the
parameterized system

    (A + εF) x(ε) = b + εf,        x(0) = x,

where F ∈ R^{n×n} and f ∈ R^n. If A is nonsingular, then it is clear that x(ε) is differen-
tiable in a neighborhood of zero. Moreover, ẋ(0) = A^{−1}(f − Fx) and so the Taylor
series expansion for x(ε) has the form

    x(ε) = x + ε ẋ(0) + O(ε^2).

Using any vector norm and consistent matrix norm we obtain

    || x(ε) − x || / || x ||  ≤  |ε| { || A^{−1} || || F || + || A^{−1} || || f || / || x || } + O(ε^2).   (2.6.2)

For square matrices A define the condition number κ(A) by

    κ(A) = || A || || A^{−1} ||                                             (2.6.3)

with the convention that κ(A) = ∞ for singular A. From || b || ≤ || A || || x || and
(2.6.2) it follows that

    || x(ε) − x || / || x ||  ≤  κ(A) ( ρ_A + ρ_b ) + O(ε^2),               (2.6.4)
where

    ρ_A = |ε| || F || / || A ||        and        ρ_b = |ε| || f || / || b ||

represent the relative errors in A and b, respectively. Thus, the relative error in x can
be κ(A) times the relative error in A and b. In this sense, the condition number κ(A)
quantifies the sensitivity of the Ax = b problem.
Note that κ(·) depends on the underlying norm and subscripts are used accord-
ingly, e.g.,

    κ_2(A) = || A ||_2 || A^{−1} ||_2 = σ_max(A) / σ_min(A).                (2.6.5)

Thus, the 2-norm condition of a matrix A measures the elongation of the hyperellipsoid
{Ax : || x ||_2 = 1}.
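A brief NumPy check (illustrative, not from the text) of the equivalent ways to evaluate κ_2(A):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
    s = np.linalg.svd(A, compute_uv=False)
    kappa_svd  = s[0] / s[-1]                                   # sigma_max / sigma_min
    kappa_norm = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
    print(kappa_svd, kappa_norm, np.linalg.cond(A, 2))          # all three agree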
We mention two other characterizations of the condition number. For p-norm
condition numbers, we have

    1 / κ_p(A)  =  min_{A+ΔA singular}  || ΔA ||_p / || A ||_p.             (2.6.6)

This result may be found in Kahan (1966) and shows that κ_p(A) measures the relative
p-norm distance from A to the set of singular matrices.
For any norm, we also have

    κ(A)  =  lim_{ε→0}  sup_{|| ΔA || ≤ ε || A ||}  || (A + ΔA)^{−1} − A^{−1} || / ( ε || A^{−1} || ).   (2.6.7)

This imposing result merely says that the condition number is a normalized Fréchet
derivative of the map A → A^{−1}. Further details may be found in Rice (1966). Recall
that we were initially led to κ(A) through differentiation.
If κ(A) is large, then A is said to be an ill-conditioned matrix. Note that this is
a norm-dependent property.¹ However, any two condition numbers κ_α(·) and κ_β(·) on
R^{n×n} are equivalent in that constants c_1 and c_2 can be found for which

    c_1 κ_α(A) ≤ κ_β(A) ≤ c_2 κ_α(A),        A ∈ R^{n×n}.

For example, on R^{n×n} we have

    (1/n) κ_2(A) ≤ κ_1(A) ≤ n κ_2(A).                                       (2.6.8)

Thus, if a matrix is ill-conditioned in the α-norm, it is ill-conditioned in the β-norm
modulo the constants c_1 and c_2 above.
For any of the p-norms, we have κ_p(A) ≥ 1. Matrices with small condition num-
bers are said to be well-conditioned. In the 2-norm, orthogonal matrices are perfectly
conditioned because if Q is orthogonal, then κ_2(Q) = || Q ||_2 || Q^T ||_2 = 1.
¹It also depends upon the definition of "large." The matter is pursued in §3.5.
2.6.3 Determinants and Nearness to Singularity
It is natural to consider how well determinant size measures ill-conditioning. If det(A) =
0 is equivalent to singularity, is det(A) ≈ 0 equivalent to near singularity? Unfortu-
nately, there is little correlation between det(A) and the condition of Ax = b. For
example, the matrix B_n defined by

           [ 1 −1 ··· −1 ]
           [ 0  1 ··· −1 ]
    B_n =  [ ·  ·  ··   · ]  ∈ R^{n×n}                                     (2.6.9)
           [ 0  0 ···  1 ]

has unit determinant, but κ_∞(B_n) = n·2^{n−1}. On the other hand, a very well-
conditioned matrix can have a very small determinant. For example,

    D_n = diag(10^{−1}, ..., 10^{−1}) ∈ R^{n×n}

satisfies κ_p(D_n) = 1 although det(D_n) = 10^{−n}.
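The point is easy to check numerically; the NumPy sketch below (illustrative, not from
the text) builds B_n and D_n for n = 10:

    import numpy as np

    n = 10
    Bn = np.triu(-np.ones((n, n))) + 2.0 * np.eye(n)      # 1's on the diagonal, -1's above
    Dn = np.diag(np.full(n, 0.1))
    print(np.linalg.det(Bn), np.linalg.cond(Bn, np.inf))  # det = 1, cond_inf = n*2^(n-1) = 5120
    print(np.linalg.det(Dn), np.linalg.cond(Dn, np.inf))  # det = 1e-10, cond_inf = 1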
2.6.4 A Rigorous Norm Bound
Recall that the derivation of (2.6.4) was valuable because it highlighted the connection
between κ(A) and the rate of change of x(ε) at ε = 0. However, it is a little unsatisfying
because it is contingent on ε being "small enough" and because it sheds no light on
the size of the O(ε^2) term. In this and the next subsection we develop some additional
Ax = b perturbation theorems that are completely rigorous.
We first establish a lemma that indicates in terms of κ(A) when we can expect a
perturbed system to be nonsingular.

Lemma 2.6.1. Suppose

    Ax = b,        (A + ΔA)y = b + Δb,        ΔA ∈ R^{n×n},   Δb ∈ R^n,

with || ΔA || ≤ ε || A || and || Δb || ≤ ε || b ||. If ε κ(A) = r < 1, then A + ΔA is
nonsingular and

    || y || / || x ||  ≤  (1 + r) / (1 − r).
Proof. Since || A^{−1}ΔA || ≤ ε || A^{−1} || || A || = r < 1 it follows from Theorem 2.3.4
that (A + ΔA) is nonsingular. Using Lemma 2.3.3 and the equality

    (I + A^{−1}ΔA) y = x + A^{−1}Δb

we find

    || y || ≤ || (I + A^{−1}ΔA)^{−1} || ( || x || + ε || A^{−1} || || b || )
            ≤ (1/(1 − r)) ( || x || + ε || A^{−1} || || b || )  =  (1/(1 − r)) ( || x || + r || b || / || A || ).
Since || b || = || Ax || ≤ || A || || x || it follows that

    || y ||  ≤  (1/(1 − r)) ( || x || + r || x || )

and this establishes the required inequality. □
We are now set to establish a rigorous Ax = b perturbation bound.

Theorem 2.6.2. If the conditions of Lemma 2.6.1 hold, then

    || y − x || / || x ||  ≤  ( 2ε / (1 − r) ) κ(A).                        (2.6.10)

Proof. Since

    y − x = A^{−1}Δb − A^{−1}ΔA y                                           (2.6.11)

we have

    || y − x ||  ≤  ε || A^{−1} || || b || + ε || A^{−1} || || A || || y ||.

Thus,

    || y − x || / || x ||  ≤  ε κ(A) || b || / ( || A || || x || ) + ε κ(A) || y || / || x ||
                           ≤  ε κ(A) ( 1 + (1 + r)/(1 − r) )  =  ( 2ε / (1 − r) ) κ(A),

from which the theorem readily follows. □
A small example helps put this result in perspective. The Ax = b problem

    [ 1      0    ] [ x_1 ]   [ 1      ]
    [ 0   10^{−6} ] [ x_2 ] = [ 10^{−6} ]

has solution x = [ 1, 1 ]^T and condition κ_∞(A) = 10^6. If Δb = [ 10^{−6}, 0 ]^T, ΔA = 0,
and (A + ΔA)y = b + Δb, then y = [ 1 + 10^{−6}, 1 ]^T and the inequality (2.6.10) says

    10^{−6} = || x − y ||_∞ / || x ||_∞  ≪  ( || Δb ||_∞ / || b ||_∞ ) κ_∞(A) = 10^{−6}·10^6 = 1.

Thus, the upper bound in (2.6.10) can be a gross overestimate of the error induced by
the perturbation.
On the other hand, if Δb = [ 0, 10^{−6} ]^T, ΔA = 0, and (A + ΔA)y = b + Δb, then
this inequality says that

    || x − y ||_∞ / || x ||_∞  =  ( || Δb ||_∞ / || b ||_∞ ) κ_∞(A)  =  1.

Thus, there are perturbations for which the bound in (2.6.10) is essentially attained.
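The following NumPy sketch (illustrative; it assumes the diagonal example reconstructed
above) reproduces both perturbations and compares the observed relative error with the
κ_∞-based bound:

    import numpy as np

    A = np.diag([1.0, 1e-6])
    b = np.array([1.0, 1e-6])
    x = np.linalg.solve(A, b)                       # x = [1, 1]
    kappa = np.linalg.cond(A, np.inf)               # 1e6
    for db in (np.array([1e-6, 0.0]), np.array([0.0, 1e-6])):
        y = np.linalg.solve(A, b + db)
        rel_err = np.linalg.norm(x - y, np.inf) / np.linalg.norm(x, np.inf)
        bound   = kappa * np.linalg.norm(db, np.inf) / np.linalg.norm(b, np.inf)
        print(rel_err, bound)                       # 1e-6 vs 1, then 1 vs 1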
2.6.5 More Refined Bounds
An interesting refinement of Theorem 2.6.2 results if we extend the notion of absolute
value to matrices: if A ∈ R^{m×n}, then |A| is the m-by-n matrix whose (i,j) entry is |a_ij|.
This notation together with a matrix-level version of "≤" makes it easy to specify
componentwise error bounds. If F, G ∈ R^{m×n}, then

    |F| ≤ |G|    ⟺    |f_ij| ≤ |g_ij|

for all i and j. Also note that if F ∈ R^{m×q} and G ∈ R^{q×n}, then |FG| ≤ |F|·|G|. With
these definitions and facts we obtain the following refinement of Theorem 2.6.2.

Theorem 2.6.3. Suppose

    Ax = b,        (A + ΔA)y = b + Δb,        ΔA ∈ R^{n×n},   Δb ∈ R^n,

and that |ΔA| ≤ ε|A| and |Δb| ≤ ε|b|. If ε κ_∞(A) = r < 1, then (A + ΔA) is nonsingular
and

    || y − x ||_∞ / || x ||_∞  ≤  ( 2ε / (1 − r) ) || |A^{−1}| |A| ||_∞.    (2.6.12)
Proof. Since || ΔA ||_∞ ≤ ε || A ||_∞ and || Δb ||_∞ ≤ ε || b ||_∞, the conditions of Lemma
2.6.1 are satisfied in the infinity norm. This implies that A + ΔA is nonsingular and

    || y ||_∞ / || x ||_∞  ≤  (1 + r) / (1 − r).

Now using (2.6.11) we find

    |y − x|  ≤  |A^{−1}| |Δb| + |A^{−1}| |ΔA| |y|
             ≤  ε |A^{−1}| |b| + ε |A^{−1}| |A| |y|  ≤  ε |A^{−1}| |A| ( |x| + |y| ).

If we take norms, then

    || y − x ||_∞  ≤  ε || |A^{−1}| |A| ||_∞ ( || x ||_∞ + ((1 + r)/(1 − r)) || x ||_∞ ).

The theorem follows upon division by || x ||_∞. □
The quantity || |A^{−1}| |A| ||_∞ is known as the Skeel condition number and there are
examples where it is considerably less than κ_∞(A). In these situations, (2.6.12) is
more informative than (2.6.10).
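A quick NumPy comparison (illustrative, not from the text) of the Skeel condition number
with κ_∞ for a badly row-scaled matrix, where the two quantities differ dramatically:

    import numpy as np

    D = np.diag([1.0, 1e8])
    A = D @ np.array([[2.0, 1.0],
                      [1.0, 3.0]])                          # poor row scaling
    Ainv = np.linalg.inv(A)
    skeel = np.linalg.norm(np.abs(Ainv) @ np.abs(A), np.inf)   # || |A^-1||A| ||_inf
    kappa = np.linalg.cond(A, np.inf)
    print(skeel, kappa)                                     # Skeel ~ 2.6, kappa_inf ~ 2.4e8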
Norm bounds are frequently good enough when assessing error, but sometimes it
is desirable to examine error at the component level. Oettli and Prager (1964) have
an interesting result that indicates if an approximate solution x ∈ R^n to the n-by-n
system Ax = b satisfies a perturbed system with prescribed structure. Consider the
problem of finding ΔA ∈ R^{n×n}, Δb ∈ R^n, and ω ≥ 0 such that

    (A + ΔA)x = b + Δb,        |ΔA| ≤ ω|E|,   |Δb| ≤ ω|f|,                  (2.6.13)

where E ∈ R^{n×n} and f ∈ R^n are given. With proper choice of E and f, the perturbed
system can take on certain qualities. For example, if E = A and f = b and ω is small,
then x satisfies a nearby system in the componentwise sense. The authors show that
for a given A, b, x, E, and f the smallest ω possible in (2.6.13) is given by

    ω_min  =  max_i  |Ax − b|_i / ( |E|·|x| + |f| )_i .

If Ax = b, then ω_min = 0. On the other hand, if ω_min = ∞, then x does not satisfy
any system of the prescribed perturbation structure.
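A small NumPy sketch (illustrative, not from the text) evaluating ω_min for the common
choice E = A, f = b, i.e., the componentwise backward error of an approximate solution:

    import numpy as np

    A = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
    b = np.array([5.0, 4.0])
    x = np.array([1.2, 1.39])                   # an approximate solution (exact is [1.2, 1.4])
    E, f = A, b                                 # the "nearby system" choice E = A, f = b
    omega_min = np.max(np.abs(A @ x - b) / (np.abs(E) @ np.abs(x) + np.abs(f)))
    print(omega_min)                            # smallest omega in (2.6.13); roughly 2.5e-3 here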
Problems
P2.6.1 Show that if || I || ≥ 1, then κ(A) ≥ 1.
P2.6.2 Show that for a given norm, κ(AB) ≤ κ(A)κ(B) and that κ(αA) = κ(A) for all nonzero α.
P2.6.3 Relate the 2-norm condition of X ∈ R^{m×n} (m ≥ n) to the 2-norm condition of the matrices
[ · ] and C = [ · ].
P2.6.4 Suppose A ∈ R^{n×n} is nonsingular. Assume for a particular i and j that there is no way to
make A singular by changing the value of a_ij. What can you conclude about A^{−1}? Hint: Use the
Sherman-Morrison formula.
P2.6.5 Suppose A ∈ R^{n×n} is nonsingular, b ∈ R^n, Ax = b, and C = A^{−1}. Use the Sherman-Morrison
formula to show that
Notes and References for §2.6
The condition concept is thoroughly investigated in:
J. Rice (1966). "A Theory of Condition," SIAM J. Numer. Anal. 3, 287-310.
W. Kahan (1966). "Numerical Linear Algebra," Canadian Math. Bull. 9, 757-801.
References for componentwise perturbation theory include:
W. Oettli and W. Prager (1964). "Compatibility of Approximate Solutions of Linear Equations with
Given Error Bounds for Coefficients and Right Hand Sides," Numer. Math. 6, 405-409.
J.E. Cope and B.W. Rust (1979). "Bounds on Solutions of Systems with Accurate Data," SIAM J.
Numer. Anal. 16, 950-963.
R.D. Skeel (1979). "Scaling for Numerical Stability in Gaussian Elimination," J. ACM 26, 494-526.
J.W. Demmel (1992). "The Componentwise Distance to the Nearest Singular Matrix," SIAM J.
Matrix Anal. Applic. 13, 10-19.
D.J. Higham and N.J. Higham (1992). "Componentwise Perturbation Theory for Linear Systems with
Multiple Right-Hand Sides," Lin. Alg. Applic. 174, 111-129.
N.J. Higham (1994). "A Survey of Componentwise Perturbation Theory in Numerical Linear Algebra,"
in Mathematics of Computation 1943-1993: A Half Century of Computational Mathematics, W.
Gautschi (ed.), Volume 48 of Proceedings of Symposia in Applied Mathematics, American Mathe­
matical Society, Providence, RI.
S. Chandrasekaran and I.C.F. Ipsen (1995). "On the Sensitivity of Solution Components in Linear
Systems of Equations," SIAM J. Matrix Anal. Applic. 16, 93-112.
S.M. Rump (1999). "Ill-Conditioned Matrices Are Componentwise Near to Singularity," SIAM Review
41, 102-112.
The reciprocal of the condition number measures how near a given Ax = b problem is to singularity.
The importance of knowing how near is a given problem to a difficult or insoluble problem has come
to be appreciated in many computational settings, see:
A. Laub (1985). "Numerical Linear Algebra Aspects of Control Design Computations," IEEE Trans.
Autom. Control. AC-30, 97-108.
J.W. Demmel (1987). "On the Distance to the Nearest Ill-Posed Problem,'' Numer. Math. 51,
251-289.
N.J. Higham (1989). "Matrix Nearness Problems and Applications," in Applications of Matrix Theory,
M.J.C. Gover and S. Barnett (eds.), Oxford University Press, Oxford, UK, 1-27.
Much has been written about problem sensitivity from the statistical point of view, see:
J.W. Demmel (1988). "The Probability that a Numerical Analysis Problem is Difficult," Math. Com­
put. 50, 449-480.
G.W. Stewart (1990). "Stochastic Perturbation Theory," SIAM Review 32, 579-610.
C.S. Kenney, A.J. Laub, and M.S. Reese (1998). "Statistical Condition Estimation for Linear Sys-
tems," SIAM J. Sci. Comput. 19, 566-583.
The problem of minimizing κ_2(A + UV^T) where UV^T is a low-rank matrix is discussed in:
C. Greif and J.M. Varah (2006). "Minimizing the Condition Number for Small Rank Modifications,"
SIAM J. Matrix Anal. Applic. 29, 82-97.
2.7 Finite Precision Matrix Computations
Rounding errors are part of what makes the field of matrix computations so challenging.
In this section we describe a model of floating point arithmetic and then use it to
develop error bounds for floating point dot products, saxpys, matrix-vector products,
and matrix-matrix products.

2.7.1 A 3-digit Calculator
Suppose we have a base-10 calculator that represents nonzero numbers in the following
style:

    ± d_0 . d_1 d_2 × 10^e

where

    1 ≤ d_0 ≤ 9,   0 ≤ d_1 ≤ 9,   0 ≤ d_2 ≤ 9,   −9 ≤ e ≤ 9.
Let us call these numbers floating point numbers. After playing around a bit we make
a number of observations:
• The precision of the calculator has to do with the "length" of the significand
d_0.d_1d_2. For example, the number π would be represented as 3.14 × 10^0, which
has a relative error approximately equal to 10^{−3}.
• There is not enough "room" to store exactly the results from most arithmetic
operations between floating point numbers. Sums and products like

    (1.23 × 10^6) + (4.56 × 10^4) = 1275600,
    (1.23 × 10^1) * (4.56 × 10^2) = 5608.8

involve more than three significant digits. Results must be rounded in order
to "fit" the 3-digit format, e.g., round(1275600) = 1.28 × 10^6, round(5608.8) =
5.61 × 10^3.
• If zero is to be a floating point number (and it must be), then we need a special
convention for its representation, e.g., 0.00 × 10^0.
• In contrast to the real numbers, there is a smallest positive floating point number
(N_min = 1.00 × 10^{−9}) and there is a largest positive floating point number (N_max =
9.99 × 10^9).
• Some operations yield answers whose exponents exceed the 1-digit allocation,
e.g., (1.23 × 10^4) * (4.56 × 10^7) and (1.23 × 10^{−2})/(4.56 × 10^8).
• The set of floating point numbers is finite. For the toy calculator there are
2 × 9 × 10 × 10 × 19 + 1 = 34201 floating point numbers.
• The spacing between the floating point numbers varies. Between 1.00 × 10^e and
1.00 × 10^{e+1} the spacing is 10^{e−2}.
The careful design and analysis of a floating point computation requires an understand­
ing of these inexactitudes and limitations. How are results rounded? How accurate
is floating point arithmetic? What can we say about a sequence of floating point
operations?
2.7.2 I EEE Floating Point Arithmetic
To build a solid, practical understanding of finite precision computation, we set aside
our toy, motivational base-10 calculator and consider the key ideas behind the widely
accepted IEEE floating point standard. The IEEE standard includes a 32-bit single
format and a 64-bit double format. We will illustrate concepts using the latter as an
example because typical accuracy requirements make it the format of choice.
The importance of having a standard for floating point arithmetic that is upheld
by hardware manufacturers cannot be overstated. After all, floating point arithmetic
is the foundation upon which all of scientific computing rests. The IEEE standard pro­
motes software reliability and enables numerical analysts to make rigorous statements
about computed results. Our discussion is based on the excellent book by Overton
(2001).
The 64-bit double format allocates a single bit for the sign of the floating point
number, 52 bits for the mantissa, and eleven bits for the exponent:

    x  =  [ ± | a_1 a_2 ··· a_11 | b_1 b_2 ··· b_52 ].                      (2.7.1)

The "formula" for the value of this representation depends upon the exponent bits:
If a_1 ... a_11 is neither all 0's nor all 1's, then x is a normalized floating point
number with value

    x  =  ± (1.b_1 b_2 ··· b_52)_2 × 2^{(a_1···a_11)_2 − 1023}.             (2.7.2)

The "1023 bias" in the exponent supports the graceful inclusion of various "unnormal-
ized" floating point numbers which we describe shortly. Several important quantities capture
the finiteness of the representation. The machine epsilon is the gap between 1 and the
next largest floating point number. Its value is 2^{−52} ≈ 2.2 × 10^{−16} for the double format.
Among the positive normalized floating point numbers, N_min = 2^{−1022} ≈ 10^{−308} is the
smallest and N_max = (2 − 2^{−52})·2^{1023} ≈ 10^{308} is the largest. A real number x is within
the normalized range if N_min ≤ |x| ≤ N_max.
If a_1 ... a_11 is all 0's, then the value of the representation (2.7.1) is

    x  =  ± (0.b_1 b_2 ··· b_52)_2 × 2^{−1022}.                             (2.7.3)

This includes 0 and the subnormal floating point numbers. This feature creates a
uniform spacing of the floating point numbers between −N_min and +N_min.
If a_1 ... a_11 is all 1's, then the encoding (2.7.1) represents inf for +∞, -inf for
−∞, or NaN for "not-a-number." The determining factor is the value of the b_i. (If the b_i
are not all zero, then the value of x is NaN.) Quotients like 1/0, −1/0, and 0/0 produce
these special floating point numbers instead of prompting program termination.
There are four rounding modes: round down (toward −∞), round up (toward
+∞), round-toward-zero, and round-toward-nearest. We focus on round-toward-nearest
since it is the mode almost always used in practice.
If a real number x is outside the range of the normalized floating point numbers,
then

    round(x) = { −∞   if x < −N_max,
               { +∞   if x >  N_max.

Otherwise, the rounding process depends upon its floating point "neighbors":

    x_−  is the nearest floating point number to x that is ≤ x,
    x_+  is the nearest floating point number to x that is ≥ x.

Define d_− = x − x_− and d_+ = x_+ − x and let "lsb" stand for "least significant bit." If
N_min ≤ |x| ≤ N_max, then

    round(x) = { x_−   if d_− < d_+, or d_− = d_+ and lsb(x_−) = 0,
               { x_+   if d_+ < d_−, or d_+ = d_− and lsb(x_+) = 0.

The tie-breaking criterion is well-defined because x_− and x_+ are adjacent floating point
numbers and so must differ in their least significant bit.
Regarding the accuracy of the round-to-nearest strategy, suppose x is a real num-
ber that satisfies N_min ≤ |x| ≤ N_max. Thus,

    |round(x) − x|  ≤  (2^{−52}/2)·2^e  ≤  (2^{−52}/2)·|x|,

which says that the relative error is bounded by half of the machine epsilon:

    |round(x) − x| / |x|  ≤  2^{−53}.
The IEEE standard stipulates that each arithmetic operation be correctly rounded,
meaning that the computed result is the rounded version of the exact result. The
implementation of correct rounding is far from trivial and requires registers that are
equipped with several extra bits of precision.
We mention that the IEEE standard also requires correct rounding in the square
root operation, the remainder operation, and various format conversion operations.
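These quantities can be inspected directly; the sketch below (Python with NumPy, an
illustrative assumption and not part of the text) prints the double-format machine epsilon,
the normalized range, and the behavior of the special values:

    import numpy as np

    fi = np.finfo(np.float64)
    print(fi.eps)                          # machine epsilon: 2**-52 ~ 2.22e-16
    print(fi.tiny, fi.max)                 # N_min = 2**-1022 ~ 1e-308, N_max ~ 1.8e308
    print(np.nextafter(1.0, 2.0) - 1.0)    # gap between 1 and the next float equals eps
    with np.errstate(divide='ignore', invalid='ignore'):
        print(np.float64(1.0)/0.0, np.float64(-1.0)/0.0, np.float64(0.0)/0.0)  # inf -inf nan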
2.7.3 The "fl" Notation
With intuition gleaned from the toy calculator example and an understanding of IEEE
arithmetic, we are ready to move on to the roundoff analysis of some basic algebraic
calculations. The challenge when presenting the effects of finite precision arithmetic
in this section and throughout the book is to communicate essential behavior without
excessive detail. To that end we use the notation fl(· ) to identify a floating point
storage and/or computation. Unless exceptions are a critical part of the picture, we
freely invoke the fl notation without mentioning "-oo," "oo," "NaN," etc.
If x ∈ R, then fl(x) is its floating point representation and we assume that

    fl(x) = x(1 + δ),        |δ| ≤ u,                                       (2.7.4)

where u is the unit roundoff defined by

    u  =  (1/2) × (gap between 1 and the next largest floating point number).   (2.7.5)

The unit roundoff for IEEE single format is about 10^{−7} and for double format it is
about 10^{−16}.
If x and y are floating point numbers and "op" is any of the four arithmetic oper-
ations, then fl(x op y) is the floating point result from the floating point op. Following
Trefethen and Bau (NLA), the fundamental axiom of floating point arithmetic is that

    fl(x op y) = (x op y)(1 + δ),        |δ| ≤ u,                           (2.7.6)

where x and y are floating point numbers and the "op" inside the fl operation means
"floating point operation." This shows that there is small relative error associated with
individual arithmetic operations:

    |fl(x op y) − (x op y)| / |x op y|  ≤  u,        x op y ≠ 0.
Again, unless it is particularly relevant to the discussion, it will be our habit not to
bring up the possibilities of an exception arising during the floating point operation.
2.7.4 Become a Floating Point Thinker
It is a good idea to have a healthy respect for the subtleties of floating point calculation.
So before we proceed with our first serious roundoff error analysis we offer three maxims
to keep in mind when designing a practical matrix computation. Each reinforces the
distinction between computer arithmetic and exact arithmetic.
Maxim 1 . Order is Important.
Floating point arithmetic is not associative. For example, suppose

    x = 1.24 × 10^0,    y = −1.23 × 10^0,    z = 1.00 × 10^{−3}.

Using toy calculator arithmetic we have

    fl( fl(x + y) + z ) = 1.10 × 10^{−2}

while

    fl( x + fl(y + z) ) = 1.00 × 10^{−2}.
A consequence of this is that mathematically equivalent algorithms may produce dif­
ferent results in floating point.
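The same effect is visible in IEEE double precision; a minimal Python illustration (not
from the text):

    x, y = 1.0, 1e-16
    print((x + y) + y == x)    # True: each addition of 1e-16 to 1.0 rounds away
    print(x + (y + y) == x)    # False: 2e-16 exceeds half the gap at 1.0, so the sum moves
    # Mathematically equal expressions, different floating point results.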
Maxim 2. Larger May Mean Smaller.
Suppose we want to compute the derivative of f(x) = sin(x) using a divided
difference. Calculus tells us that d = (sin(x+h) − sin(x))/h satisfies |d − cos(x)| = O(h),
which argues for making h as small as possible. On the other hand, any roundoff error
sustained in the sine evaluations is magnified by 1/h. By setting h = √u, the sum of
the calculus error and the roundoff error is approximately minimized. In other words, a
value of h much greater than u renders a much smaller overall error. See Overton (2001,
pp. 70-72).
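A small Python experiment (illustrative, not from the text) showing that h ≈ √u beats a
much smaller h:

    import math

    x = 1.0
    u = 2.0**-53                                   # unit roundoff for IEEE double
    for h in (math.sqrt(u), 1e-14):
        d = (math.sin(x + h) - math.sin(x)) / h    # divided-difference estimate of cos(x)
        print(h, abs(d - math.cos(x)))             # error ~1e-8 for h = sqrt(u), far worse for h = 1e-14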
Maxim 3. A Math Book Is Not Enough.
The explicit coding of a textbook formula is not always the best way to design an
effective computation. As an example, we consider the quadratic equation x^2 − 2px − q = 0
where both p and q are positive. Here are two methods for computing the smaller
(necessarily real) root:

    Method 1:   r_min = p − sqrt(p^2 + q),

    Method 2:   r_min = −q / ( p + sqrt(p^2 + q) ).

The first method is based on the familiar quadratic formula while the second uses the
fact that −q is the product of r_min and the larger root. Using IEEE double format
arithmetic with input p = 12345678 and q = 1 we obtain these results:

    Method 1:   r_min = −4.097819328308106 × 10^{−8},
    Method 2:   r_min = −4.050000332100021 × 10^{−8}   (correct).

Method 1 produces an answer that has almost no correct significant digits. It attempts
to compute a small number by subtracting a pair of nearly equal large numbers. Al-
most all correct significant digits in the input data are lost during the subtraction, a
phenomenon known as catastrophic cancellation. In contrast, Method 2 produces an
answer that is correct to full machine precision. It computes a small number as a
division of one number by a much larger number. See Forsythe (1970).
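The two methods are easy to compare directly; a short Python sketch (not from the text):

    import math

    p, q = 12345678.0, 1.0
    r1 = p - math.sqrt(p*p + q)            # Method 1: cancellation of nearly equal numbers
    r2 = -q / (p + math.sqrt(p*p + q))     # Method 2: no cancellation
    print(r1)                              # ~ -4.0978e-08, almost no correct digits
    print(r2)                              # ~ -4.0500003321e-08, correct to full precision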
Keeping these maxims in mind does not guarantee the production of accurate,
reliable software, but it helps.
2.7.5 Application: Storing a Real Matrix
Suppose A ∈ R^{m×n} and that we wish to quantify the errors associated with its floating
point representation. Denoting the stored version of A by fl(A), we see that

    [fl(A)]_ij = fl(a_ij) = a_ij (1 + ε_ij),        |ε_ij| ≤ u,             (2.7.7)

for all i and j, i.e.,

    |fl(A) − A| ≤ u|A|.

A relation such as this can be easily turned into a norm inequality, e.g.,

    || fl(A) − A ||_1 ≤ u || A ||_1.
However, when quantifying the rounding errors in a matrix manipulation, the absolute
value notation is sometimes more informative because it provides a comment on each
entry.
2.7.6 Roundoff in Dot Products
We begin our study of finite precision matrix computations by considering the rounding
errors that result in the standard dot product algorithm:

    s = 0
    for k = 1:n
        s = s + x_k y_k                                                     (2.7.8)
    end

Here, x and y are n-by-1 floating point vectors.
In trying to quantify the rounding errors in this algorithm, we are immediately
confronted with a notational problem: the distinction between computed and exact
quantities. If the underlying computations are clear, we shall use the fl(·) operator to
signify computed quantities. Thus, fl(x^T y) denotes the computed output of (2.7.8).
Let us bound |fl(x^T y) − x^T y|. If

    s_p = fl( Σ_{k=1}^{p} x_k y_k ),

then s_1 = x_1 y_1 (1 + δ_1) with |δ_1| ≤ u and for p = 2:n

    s_p = fl( s_{p−1} + fl(x_p y_p) )
        = ( s_{p−1} + x_p y_p (1 + δ_p) ) (1 + ε_p),        |δ_p|, |ε_p| ≤ u.

A little algebra shows that

    fl(x^T y) = s_n = Σ_{k=1}^{n} x_k y_k (1 + γ_k)                         (2.7.9)

where

    (1 + γ_k) = (1 + δ_k) Π_{j=k}^{n} (1 + ε_j)

with the convention that ε_1 = 0. Thus,

    |fl(x^T y) − x^T y|  ≤  Σ_{k=1}^{n} |x_k y_k| |γ_k|.                    (2.7.10)
To proceed further, we must bound the quantities |γ_k| in terms of u. The following
result is useful for this purpose.

Lemma 2.7.1. If (1 + α) = Π_{k=1}^{n} (1 + α_k) where |α_k| ≤ u and nu ≤ .01, then |α| ≤ 1.01nu.

Proof. See Higham (ASNA, p. 75). □

Application of this result to (2.7.10) under the "reasonable" assumption nu ≤ .01 gives

    |fl(x^T y) − x^T y|  ≤  1.01 nu |x|^T |y|.                              (2.7.11)

Notice that if |x^T y| ≪ |x|^T |y|, then the relative error in fl(x^T y) may not be small.
2.7.7 Alternative Ways to Quantify Roundoff Error
An easier but less rigorous way of bounding α in Lemma 2.7.1 is to say |α| ≤ nu + O(u^2).
With this convention we have

    |fl(x^T y) − x^T y|  ≤  nu |x|^T |y| + O(u^2).                          (2.7.12)

Other ways of expressing the same result include

    |fl(x^T y) − x^T y|  ≤  φ(n) u |x|^T |y|                                (2.7.13)

and

    |fl(x^T y) − x^T y|  ≤  c n u |x|^T |y|,                                (2.7.14)

where φ(n) is a "modest" function of n and c is a constant of order unity.
We shall not express a preference for any of the error bounding styles shown in
(2.7.11)-(2.7.14). This spares us the necessity of translating the roundoff results that
appear in the literature into a fixed format. Moreover, paying overly close attention to
the details of an error bound is inconsistent with the "philosophy" of roundoff analysis.
As Wilkinson (1971, p. 567) says,
There is still a tendency to attach too much importance to the precise error
bounds obtained by an a priori error analysis. In my opinion, the bound
itself is usually the least important part of it. The main object of such an
analysis is to expose the potential instabilities, if any, of an algorithm so
that hopefully from the insight thus obtained one might be led to improved
algorithms. Usually the bound itself is weaker than it might have been
because of the necessity of restricting the mass of detail to a reasonable
level and because of the limitations imposed by expressing the errors in
terms of matrix norms. A priori bounds are not, in general, quantities
that should be used in practice. Practical error bounds should usually be
determined by some form of a posteriori error analysis, since this takes
full advantage of the statistical distribution of rounding errors and of any
special features, such as sparseness, in the matrix.
It is important to keep these perspectives in mind.
2.7.8 Roundoff in Other Basic Matrix Computations
It is easy to show that if A and B are floating point matrices and α is a floating point
number, then

    fl(αA) = αA + E,        |E| ≤ u|αA|,                                    (2.7.15)

and

    fl(A + B) = (A + B) + E,        |E| ≤ u|A + B|.                         (2.7.16)

As a consequence of these two results, it is easy to verify that computed saxpy's and
outer product updates satisfy

    fl(y + αx) = y + αx + z,        |z| ≤ u( |y| + 2|αx| ) + O(u^2),         (2.7.17)
    fl(C + uv^T) = C + uv^T + E,    |E| ≤ u( |C| + 2|uv^T| ) + O(u^2).       (2.7.18)

Using (2.7.11) it is easy to show that a dot-product-based multiplication of two floating
point matrices A and B satisfies

    fl(AB) = AB + E,        |E| ≤ nu|A||B| + O(u^2).                        (2.7.19)

The same result applies if a gaxpy or outer product based procedure is used. Notice
that matrix multiplication does not necessarily give small relative error since |AB| may
be much smaller than |A||B|, e.g.,

    [ 1  1 ] [  1    0 ]   =   [ .01  0 ]
    [ 0  0 ] [ −.99  0 ]       [ 0    0 ].

It is easy to obtain norm bounds from the roundoff results developed thus far. If we
look at the 1-norm error in floating point matrix multiplication, then it is easy to show
from (2.7.19) that

    || fl(AB) − AB ||_1  ≤  nu || A ||_1 || B ||_1 + O(u^2).                (2.7.20)
2.7.9 Forward and Backward Error Analyses
Each roundoff bound given above is the consequence of a forward error analysis. An
alternative style of characterizing the roundoff errors in an algorithm is accomplished
through a technique known as backward error analysis. Here, the rounding errors are
related to the input data rather than the answer. By way of illustration, consider the
n = 2 version of triangular matrix multiplication. It can be shown that:

    fl(AB) = [ a_11 b_11 (1 + ε_1)    ( a_11 b_12 (1 + ε_2) + a_12 b_22 (1 + ε_3) )(1 + ε_4) ]
             [ 0                      a_22 b_22 (1 + ε_5)                                    ]

where |ε_i| ≤ u, for i = 1:5. However, if we define

    Â = [ a_11   a_12 (1 + ε_3)(1 + ε_4) ]        B̂ = [ b_11 (1 + ε_1)   b_12 (1 + ε_2)(1 + ε_4) ]
        [ 0      a_22 (1 + ε_5)          ],           [ 0                b_22                     ],

then it is easily verified that fl(AB) = ÂB̂. Moreover,

    Â = A + E,        |E| ≤ 2u|A| + O(u^2),
    B̂ = B + F,        |F| ≤ 2u|B| + O(u^2),

which shows that the computed product is the exact product of slightly perturbed A
and B.
2.7.10 Error in Strassen Multiplication
In §1.3.11 we outlined a recursive matrix multiplication procedure due to Strassen. It is
instructive to compare the effect of roundoff in this method with the effect of roundoff
in any of the conventional matrix multiplication methods of §1.1.
It can be shown that the Strassen approach (Algorithm 1.3.1) produces a Ĉ =
fl(AB) that satisfies an inequality of the form (2.7.20). This is perfectly satisfactory in
many applications. However, the Ĉ that Strassen's method produces does not always
satisfy an inequality of the form (2.7.19). To see this, suppose that

    A = B = [ .99    .0010 ]
            [ .0010  .99   ]
and that we execute Algorithm 1.3.1 using 2-digit floating point arithmetic. Among
other things, the following quantities are computed:
    P_3 = fl( .99(.001 − .99) ) = −.98,
    P_5 = fl( (.99 + .001)(.99) ) = .98,
    ĉ_12 = fl( P_3 + P_5 ) = 0.0.
In exact arithmetic c_12 = 2(.001)(.99) = .00198 and thus Algorithm 1.3.1 produces a
ĉ_12 with no correct significant digits. The Strassen approach gets into trouble in this
example because small off-diagonal entries are combined with large diagonal entries.
Note that in conventional matrix multiplication the sums b_12 + b_22 and a_11 + a_12 do not
arise. For that reason, the contribution of the small off-diagonal elements is not lost
in this example. Indeed, for the above A and B a conventional matrix multiplication
gives ĉ_12 = .0020.
Failure to produce a componentwise accurate Ĉ can be a serious shortcoming in
some applications. For example, in Markov processes the a_ij, b_ij, and c_ij are transition
probabilities and are therefore nonnegative. It may be critical to compute c_ij accurately
if it reflects a particularly important probability in the modeled phenomenon. Note
that if A ≥ 0 and B ≥ 0, then conventional matrix multiplication produces a product
Ĉ that has small componentwise relative error:

    |Ĉ − C|  ≤  nu|A| |B| + O(u^2)  =  nu|C| + O(u^2).
This follows from (2.7.19). Because we cannot say the same for the Strassen approach,
we conclude that Algorithm 1.3.1 is not attractive for certain nonnegative matrix mul-
tiplication problems if relatively accurate c_ij are required.
Extrapolating from this discussion we reach two fairly obvious but important
conclusions:
• Different methods for computing the same quantity can produce substantially
different results.
• Whether or not an algorithm produces satisfactory results depends upon the type
of problem solved and the goals of the user.
These observations are clarified in subsequent chapters and are intimately related to
the concepts of algorithm stability and problem condition. See §3.4.10.
2.7.11 Analysis of an Ideal Equation Solver
A nice way to conclude this chapter and to anticipate the next is to analyze the quality
of a "make-believe" Ax = b solution process in which all floating point operations are
performed exactly except the storage of the matrix A and the right-hand side b. It
follows that the computed solution x̂ satisfies

    (A + E)x̂ = (b + e),        || E ||_∞ ≤ u || A ||_∞,   || e ||_∞ ≤ u || b ||_∞,   (2.7.21)

where

    fl(b) = b + e,        fl(A) = A + E.

If u κ_∞(A) ≤ 1/2 (say), then by Theorem 2.6.2 it can be shown that

    || x̂ − x ||_∞ / || x ||_∞  ≤  4 u κ_∞(A).                              (2.7.22)

The bounds (2.7.21) and (2.7.22) are "best possible" norm bounds. No general ∞-
norm error analysis of a linear equation solver that requires the storage of A and b can
render sharper bounds. As a consequence, we cannot justifiably criticize an algorithm
for returning an inaccurate x̂ if A is ill-conditioned relative to the unit roundoff, e.g.,
u κ_∞(A) ≈ 1. On the other hand, we have every "right" to pursue the development
of a linear equation solver that renders the exact solution to a nearby problem in the
style of (2.7.21).
Problems
P2.7.1 Show that if (2.7.8) is applied with y = x, then fl(x^T x) = x^T x(1 + α) where |α| ≤ nu + O(u^2).
P2.7.2 Prove (2.7.4) assuming that fl(x) is the nearest floating point number to x ∈ R.
P2.7.3 Show that if E ∈ R^{m×n} with m ≥ n, then || |E| ||_2 ≤ √n || E ||_2. This result is useful when
deriving norm bounds from absolute value bounds.
P2.7.4 Assume the existence of a square root function satisfying fl(√x) = √x (1 + ε) with |ε| ≤ u.
Give an algorithm for computing || x ||_2 and bound the rounding errors.
P2.7.5 Suppose A and B are n-by-n upper triangular floating point matrices. If Ĉ = fl(AB) is
computed using one of the conventional §1.1 algorithms, does it follow that Ĉ = ÂB̂ where Â and B̂
are close to A and B?
P2.7.6 Suppose A and B are n-by-n floating point matrices and that || |A^{−1}| |A| ||_∞ = τ. Show that
if Ĉ = fl(AB) is obtained using any of the §1.1 algorithms, then there exists a B̂ so that Ĉ = AB̂ and
|| B̂ − B ||_∞ ≤ nuτ || B ||_∞ + O(u^2).
P2.7.7 Prove (2.7.19).
P2.7.8 For the IEEE double format, what is the largest power of 10 that can be represented exactly?
What is the largest integer that can be represented exactly?
P2.7.9 For k = 1:62, what is the largest power of 10 that can be stored exactly if k bits are
allocated for the mantissa and 63 − k are allocated for the exponent?
P2.7.10 Consider the quadratic equation
This quadratic has two real roots r_1 and r_2. Assume that |r_1 − z| ≥ |r_2 − z|. Give an algorithm that
computes r_1 to full machine precision.
Notes and References for §2.7
For an excellent, comprehensive treatment of IEEE arithmetic and its implications, see:
M.L. Overton (2001). Numerical Computing with IEEE Arithmetic, SIAM Publications, Philadelphia,
PA.
The following basic references are notable for the floating point insights that they offer: Wilkinson
(AEP), Stewart (IMC), Higham (ASNA), and Demmel (ANLA). For high-level perspectives we rec­
ommend:
J.H. Wilkinson (1963). Rounding Errors in Algebraic Processes, Prentice-Hall, Englewood Cliffs, NJ.
G.E. Forsythe (1970). "Pitfalls in Computation or Why a Math Book is Not Enough," Amer. Math.
Monthly 77, 931-956.
J.H. Wilkinson (1971). "Modern Error Analysis," SIAM Review 13, 548-68.
U.W. Kulisch and W.L. Miranker (1986). "The Arithmetic of the Digital Computer," SIAM Review
28, 1-40.
F. Chaitin-Chatelin and V. Frayssé (1996). Lectures on Finite Precision Computations, SIAM Pub-
lications, Philadelphia, PA.
The design of production software for matrix computations requires a detailed understanding of finite
precision arithmetic, see:
J.W. Demmel (1984). "Underflow and the Reliability of Numerical Software," SIAM J. Sci. Stat.
Comput. 5, 887-919.
W.J. Cody (1988). "ALGORITHM 665 MACHAR: A Subroutine to Dynamically Determine Machine
Parameters," ACM Trans. Math. Softw. 14, 303-311.
D. Goldberg (1991). "What Every Computer Scientist Should Know About Floating Point Arith-
metic," ACM Surveys 23, 5-48.
Other developments in error analysis involve interval analysis, the building of statistical models of
roundoff error, and the automating of the analysis itself:
J. Larson and A. Sameh (1978). "Efficient Calculation of the Effects of Roundoff Errors," ACM Trans.
Math. Softw. 4, 228-36.
W. Miller and D. Spooner (1978). "Software for Roundoff Analysis, II," ACM Trans. Math. Softw.
4, 369-90.
R.E. Moore (1979). Methods and Applications of Interval Analysis, SIAM Publications, Philadelphia,
PA.
J.M. Yohe (1979). "Software for Interval Arithmetic: A Reasonable Portable Package," ACM Trans.
Math. Softw. 5, 50-63.
The accuracy of floating point summation is detailed in:
S.M. Rump, T. Ogita, and S. Oishi (2008). "Accurate Floating-Point Summation Part I: Faithful
Rounding," SIAM J. Sci. Comput. 31, 189-224.
S.M. Rump, T. Ogita, and S. Oishi (2008). "Accurate Floating-Point Summation Part II: Sign, K-fold
Faithful and Rounding to Nearest," SIAM J. Sci. Comput. 31, 1269-1302.
For an analysis of the Strassen algorithm and other "fast" linear algebra procedures, see:
R.P. Brent (1970). "Error Analysis of Algorithms for Matrix Multiplication and Triangular Decom­
position Using Winograd's Identity," Numer. Math. 16, 145-156.
W. Miller (1975). "Computational Complexity and Numerical Stability," SIAM J. Comput. 4, 97-107.
N.J. Higham (1992). "Stability of a Method for Multiplying Complex Matrices with Three Real Matrix
Multiplications," SIAM J. Matrix Anal. Applic. 13, 681-687.
J.W. Demmel and N.J. Higham (1992). "Stability of Block Algorithms with Fast Level-3 BLAS,"
ACM Trans. Math. Softw. 18, 274-291.
B. Dumitrescu (1998). "Improving and Estimating the Accuracy of Strassen's Algorithm," Numer.
Math. 79, 485-499.
The issue of extended precision has received considerable attention. For example, a superaccurate
dot product results if the summation can be accumulated in a register that is "twice as wide" as the
floating representation of vector components. The overhead may be tolerable in a given algorithm if
extended precision is needed in only a few critical steps. For insights into this topic, see:
R.P. Brent (1978). "A Fortran Multiple Precision Arithmetic Package," ACM Trans. Math. Softw.
4, 57-70.
R.P. Brent (1978). "Algorithm 524 MP, a Fortran Multiple Precision Arithmetic Package," ACM
Trans. Math. Softw. 4, 71-81.
D.H. Bailey (1993). "Algorithm 719: Multiprecision Translation and Execution of FORTRAN Pro-
grams," ACM Trans. Math. Softw. 19, 288-319.
X.S. Li, J.W. Demmel, D.H. Bailey, G. Henry, Y. Hida, J. Iskandar, W. Kahan, S.Y. Kang, A. Kapur,
M.C. Martin, B.J. Thompson, T. Tung, and D.J. Yoo (2002). "Design, Implementation and Testing
of Extended and Mixed Precision BLAS," ACM Trans. Math. Softw. 28, 152-205.
J.W. Demmel and Y. Hida (2004). "Accurate and Efficient Floating Point Summation," SIAM J. Sci.
Comput. 25, 1214-1248.
M. Baboulin, A. Buttari, J. Dongarra, J. Kurzak, J. Langou, J. Langou, P. Luszczek, and S. Tomov
(2009). "Accelerating Scientific Computations with Mixed Precision Algorithms," Comput. Phys.
Commun. 180, 2526-2533.
Chapter 3
General Linear Systems
3.1 Triangular Systems
3.2 The LU Factorization
3.3 Roundoff Error in Gaussian Elimination
3.4 Pivoting
3.5 Improving and Estimating Accuracy
3.6 Parallel LU
The problem of solving a linear system Ax = b is central to scientific computation.
In this chapter we focus on the method of Gaussian elimination, the algorithm of
choice if A is square, dense, and unstructured. Other methods are applicable if A
does not fall into this category, see Chapter 4, Chapter 11, §12.1, and §12.2. Solution
procedures for triangular systems are discussed first. These are followed by a derivation
of Gaussian elimination that makes use of Gauss transformations. The process of
eliminating unknowns from equations is described in terms of the factorization A = LU
where L is lower triangular and U is upper triangular. Unfortunately, the derived
method behaves poorly on a nontrivial class of problems. An error analysis pinpoints
the difficulty and sets the stage for a discussion of pivoting, a permutation strategy
that keeps the numbers "nice" during the elimination. Practical issues associated with
scaling, iterative improvement, and condition estimation are covered. A framework for
computing the LU factorization in parallel is developed in the final section.
Reading Notes
Familiarity with Chapter 1, §§2.1-2.5, and §2.7 is assumed. The sections within
this chapter depend upon each other as follows:
                          §3.5
                           ↑
    §3.1 → §3.2 → §3.3 → §3.4
                           ↓
                          §3.6
Useful global references include Forsythe and Moler (SLAS), Stewart (MABD), Higham
(ASNA), Watkins (FMC), Trefethen and Bau (NLA), Demmel (ANLA), and Ipsen
(NMA).
3.1 Triangular Systems
Traditional factorization methods for linear systems involve the conversion ofthe given
square system to a triangular system that has the same solution. This section is about
the solution of triangular systems.
3.1.1 Forward Substitution
Consider the following 2-by-2 lower triangular system:

    [ ℓ_11    0   ] [ x_1 ]   [ b_1 ]
    [ ℓ_21  ℓ_22 ] [ x_2 ] = [ b_2 ].

If ℓ_11 ℓ_22 ≠ 0, then the unknowns can be determined sequentially:

    x_1 = b_1 / ℓ_11,
    x_2 = ( b_2 − ℓ_21 x_1 ) / ℓ_22.

This is the 2-by-2 version of an algorithm known as forward substitution. The general
procedure is obtained by solving the ith equation in Lx = b for x_i:

    x_i = ( b_i − Σ_{j=1}^{i−1} ℓ_ij x_j ) / ℓ_ii.

If this is evaluated for i = 1:n, then a complete specification of x is obtained. Note
that at the ith stage the dot product of L(i, 1:i − 1) and x(1:i − 1) is required. Since
b_i is involved only in the formula for x_i, the former may be overwritten by the latter.
Algorithm 3.1.1 (Row-Oriented Forward Substitution) If L E Rnxn is lower trian­
gular and b E Rn, then this algorithm overwrites b with the solution to Lx = b. L is
assumed to be nonsingular.
b(l) = b(l)/L(l, 1)
for i = 2:n
b(i) = (b(i) - L(i, l:i - l) ·b(l:i - 1))/L(i, i)
end
This algorithm requires n^2 flops. Note that L is accessed by row. The computed
solution x̂ can be shown to satisfy

    (L + F)x̂ = b,        |F| ≤ nu|L| + O(u^2).                              (3.1.1)

For a proof, see Higham (ASNA, pp. 141-142). It says that the computed solution
exactly satisfies a slightly perturbed system. Moreover, each entry in the perturbing
matrix F is small relative to the corresponding element of L.
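A direct translation of Algorithm 3.1.1 into NumPy (an illustrative sketch, not part of the
text):

    import numpy as np

    def forward_substitution(L, b):
        """Row-oriented forward substitution: solve Lx = b for nonsingular lower triangular L."""
        n = len(b)
        x = b.astype(float).copy()
        x[0] = x[0] / L[0, 0]
        for i in range(1, n):
            x[i] = (x[i] - L[i, :i] @ x[:i]) / L[i, i]   # dot product with known unknowns
        return x

    L = np.array([[2.0, 0.0], [1.0, 3.0]])
    b = np.array([4.0, 5.0])
    print(forward_substitution(L, b), np.linalg.solve(L, b))   # both give [2, 1]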
3.1.2 Back Substitution
The analogous algorithm for an upper triangular system Ux = b is called back substi-
tution. The recipe for x_i is prescribed by

    x_i = ( b_i − Σ_{j=i+1}^{n} u_ij x_j ) / u_ii,

and once again b_i can be overwritten by x_i.

Algorithm 3.1.2 (Row-Oriented Back Substitution) If U ∈ R^{n×n} is upper triangular
and b ∈ R^n, then the following algorithm overwrites b with the solution to Ux = b. U
is assumed to be nonsingular.

    b(n) = b(n)/U(n, n)
    for i = n − 1: −1:1
        b(i) = (b(i) − U(i, i + 1:n)·b(i + 1:n))/U(i, i)
    end

This algorithm requires n^2 flops and accesses U by row. The computed solution x̂
obtained by the algorithm can be shown to satisfy

    (U + F)x̂ = b,        |F| ≤ nu|U| + O(u^2).                              (3.1.2)
3.1.3 Column-Oriented Versions
Column-oriented versions of the above procedures can be obtained by reversing loop
orders. To understand what this means from the algebraic point of view, consider
forward substitution. Once x1 is resolved, it can be removed from equations 2 through
n leaving us with the reduced system
L(2:n, 2:n)x(2:n) = b(2:n) - x(l) ·L(2:n, 1).
We next compute x_2 and remove it from equations 3 through n, etc. Thus, if this
approach is applied to

    [ 2  0  0 ] [ x_1 ]   [ 6 ]
    [ 1  5  0 ] [ x_2 ] = [ 2 ]
    [ 7  9  8 ] [ x_3 ]   [ 5 ],

we find x_1 = 3 and then deal with the 2-by-2 system

    [ 5  0 ] [ x_2 ]   [ 2 − 3·1 ]   [  −1 ]
    [ 9  8 ] [ x_3 ] = [ 5 − 3·7 ] = [ −16 ].
Here is the complete procedure with overwriting.
Algorithm 3.1.3 (Column-Oriented Forward Substitution) If the matrix L E Rnxn
is lower triangular and b E Rn, then this algorithm overwrites b with the solution to
Lx = b. L is assumed to be nonsingular.
for j = l:n - 1
b(j) = b(j)/L(j, j)
b(j + l:n) = b(j + l:n) - b(j) ·L(j + l:n,j)
end
b(n) = b(n)/L(n, n)
It is also possible to obtain a column-oriented saxpy procedure for back substitution.
Algorithm 3.1.4 (Column-Oriented Back Substitution) If U E 1Rnxn is upper trian­
gular and b E 1Rn, then this algorithm overwrites b with the solution to Ux = b. U is
assumed to be nonsingular.
for j = n: - 1:2
b(j) = b(j)/U(j,j)
b(l:j - 1) = b(l:j - 1) - b(j) ·U(l:j - 1, j)
end
b(l) = b(l)/U(l, 1)
Note that the dominant operation in both Algorithms 3.1.3 and 3.1.4 is the saxpy
operation. The roundoff behavior of these implementations is essentially the same as
for the dot product versions.
3.1.4 Multiple Right-Hand Sides
Consider the problem of computing a solution X ∈ R^{n×q} to LX = B where L ∈ R^{n×n}
is lower triangular and B ∈ R^{n×q}. This is the multiple-right-hand-side problem and
it amounts to solving q separate triangular systems, i.e., LX(:, j) = B(:, j), j = 1:q.
Interestingly, the computation can be blocked in such a way that the resulting algorithm
is rich in matrix multiplication, assuming that q and n are large enough. This turns
out to be important in subsequent sections where various block factorization schemes
are discussed.
It is sufficient to consider just the lower triangular case as the derivation of block
back substitution is entirely analogous. We start by partitioning the equation LX = B
as follows:

    [ L_11    0    ···    0   ] [ X_1 ]   [ B_1 ]
    [ L_21  L_22   ···    0   ] [ X_2 ]   [ B_2 ]
    [  ·      ·     ··     ·  ] [  ·  ] = [  ·  ]                           (3.1.3)
    [ L_N1  L_N2   ···  L_NN  ] [ X_N ]   [ B_N ].

Assume that the diagonal blocks are square. Paralleling the development of Algorithm
3.1.3, we solve the system L_11 X_1 = B_1 for X_1 and then remove X_1 from block equations
2 through N:

    [ L_22   ···    0   ] [ X_2 ]   [ B_2 − L_21 X_1 ]
    [  ·      ··     ·  ] [  ·  ] = [        ·       ]
    [ L_N2   ···  L_NN  ] [ X_N ]   [ B_N − L_N1 X_1 ].

Continuing in this way we obtain the following block forward elimination scheme:
    for j = 1:N
        Solve L_jj X_j = B_j
        for i = j + 1:N
            B_i = B_i − L_ij X_j
        end
    end                                                                     (3.1.4)

Notice that the i-loop oversees a single block saxpy update of the form

    [ B_{j+1} ]   [ B_{j+1} ]   [ L_{j+1,j} ]
    [    ·    ] = [    ·    ] − [     ·     ] X_j.
    [   B_N   ]   [   B_N   ]   [  L_{N,j}  ]
To realize level-3 performance, the submatrices in (3.1.3) must be sufficiently large in
dimension.
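A NumPy sketch of the block forward elimination scheme (3.1.4) (an illustrative assumption,
not part of the text; it uses a fixed block size r that divides n and solves the diagonal blocks
densely):

    import numpy as np

    def block_forward_solve(L, B, r):
        """Block forward elimination: solve LX = B for lower triangular L, block size r | n."""
        n = L.shape[0]
        X = B.astype(float).copy()
        for j in range(0, n, r):
            # Solve the r-by-r lower triangular system on the diagonal block
            X[j:j+r] = np.linalg.solve(L[j:j+r, j:j+r], X[j:j+r])
            # Block saxpy update of the remaining right-hand-side blocks (level-3 work)
            X[j+r:] -= L[j+r:, j:j+r] @ X[j:j+r]
        return X

    rng = np.random.default_rng(4)
    n, q, r = 8, 3, 2
    L = np.tril(rng.standard_normal((n, n))) + n * np.eye(n)   # well-conditioned lower triangular
    B = rng.standard_normal((n, q))
    print(np.allclose(block_forward_solve(L, B, r), np.linalg.solve(L, B)))   # True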
3.1.5 The Level-3 Fraction
It is handy to adopt a measure that quantifies the amount of matrix multiplication in
a given algorithm. To this end we define the level-3fraction of an algorithm to be the
fraction of flops that occur in the context of matrix multiplication. We call such flops
level-3 flops.
Let us determine the level-3 fraction for (3.1.4) with the simplifying assumption
that n = rN. (The same conclusions hold with the unequal blocking described above.)
Because there are N applications of r-by-r forward elimination (the level-2 portion of
the computation) and n^2 flops overall, the level-3 fraction is approximately given by

    1 − N r^2 / n^2  =  1 − 1/N.
Thus, for large N almost all flops are level-3 flops. It makes sense to choose N as
large as possible subject to the constraint that the underlying architecture can achieve
a high level of performance when processing block saxpys that have width r = n/N or
greater.
3.1.6 Nonsquare Triangular System Solving
The problem of solving nonsquare, m-by-n triangular systems deserves some attention.
Consider the lower triangular case when m ≥ n, i.e.,

    [ L_11 ]       [ b_1 ]        L_11 ∈ R^{n×n},        b_1 ∈ R^n,
    [ L_21 ] x  =  [ b_2 ],       L_21 ∈ R^{(m−n)×n},    b_2 ∈ R^{m−n}.

Assume that L_11 is lower triangular and nonsingular. If we apply forward elimination
to L_11 x = b_1, then x solves the system provided L_21(L_11^{−1} b_1) = b_2. Otherwise, there
is no solution to the overall system. In such a case least squares minimization may be
appropriate. See Chapter 5.
Now consider the lower triangular system Lx = b when the number of columns
n exceeds the number of rows m. We can apply forward substitution to the square
system L(1:m, 1:m) x(1:m) = b and prescribe an arbitrary value for x(m + 1:n).
See §5.6 for additional comments on systems that have more unknowns than equations.
The handling of nonsquare upper triangular systems is similar. Details are left to the
reader.
3.1.7 The Algebra of Triangular Matrices
A unit triangular matrix is a triangular matrix with l's on the diagonal. Many of the
triangular matrix computations that follow have this added bit of structure. It clearly
poses no difficulty in the above procedures.
For future reference we list a few properties about products and inverses of tri-
angular and unit triangular matrices.
• The inverse of an upper (lower) triangular matrix is upper (lower) triangular.
• The product of two upper (lower) triangular matrices is upper (lower) triangular.
• The inverse of a unit upper (lower) triangular matrix is unit upper (lower) trian­
gular.
• The product of two unit upper (lower) triangular matrices is unit upper (lower)
triangular.
Problems
P3.l.1 Give an algorithm for computing a nonzero z E Rn such that Uz = 0 where U E Rnxn is
upper triangular with Unn = 0 and uu · · · Un-1,n-l ¥: 0.
P3.1.2 Suppose L = In - N is unit lower triangular where N E Rnxn. Show that
L-1 = In + N + N2 + · · · + Nn-1.
What is the value of II L-1 llF if Nii = 1 for all i > j?
P3.l.3 Write a detailed version of (3.1.4). Do not assume that N divides n.
P3.l.4 Prove all the facts about triangular matrices that are listed in §3.1.7.
P3.l.5 Suppose S, T E Rnxn are upper triangular and that (ST - ),,I)x = b is a nonsingular system.
Give an O(n2) algorithm for computing x. Note that the explicit formation of ST - ),,I requires O(n3)
flops. Hint: Suppose
S+ = [ � T+ = [ � � ] , b+ = [ � ] ,
where S+ = S(k - l:n, k - l:n), T+ = T(k - l:n, k - l:n), b+ = b(k - l:n), and u, T, f3 E R. Show that
if we have a vector Xe such that
and We = Texe is available, then
/3 - uvTXe - UTWe
"( =
UT - ),,
solves (S+T+ - ),,I)x+ = b+· Observe that x+ and w+ = T+x+ each require O(n - k) flops.
P3.l.6 Suppose the matrices Ri , . . . , Rp E Rnxn are all upper triangular. Give an O(pn2) algorithm
for solving the system (R1 · · · Rp - ),,I)x = b assuming that the matrix of coefficients is nonsingular.
Hint. Generalize the solution to the previous problem.
P3.l.7 Suppose L, K E R"'xn are lower triangular and B E Rnxn. Give an algorithm for computing
X E Rnxn so that LXK = B.
3.2. The LU Factorization 111
Notes and References for §3.1
The accuracy of a computed solution to a triangular system is often surprisingly good, see:
N.J. Higham (1989). "The Accuracy of Solutions to Triangular Systems,'' SIAM J. Numer. Anal. 26,
1252-1265.
Solving systems of the form (Tp · · · Ti - >.I)x = b where each Ti is triangular is considered in:
C.D. Martin and C.F. Van Loan (2002). "Product Triangular Systems with Shift," SIAM J. Matrix
Anal. Applic. 24, 292-301.
The trick to obtaining an O(pn2) procedure that does not involve any matrix-matrix multiplications
is to look carefully at the back-substitution recursions. See P3.1.6.
A survey of parallel triangular system solving techniques and their stabilty is given in:
N.J. Higham (1995). "Stability of Parallel Triangular System Solvers,'' SIAM J. Sci. Comput. 16,
400-413.
3.2 The LU Factorization
Triangular system solving is an easy O(n2) computation. The idea behind Gaussian
elimination is to convert a given system Ax = b to an equivalent triangular system.
The conversion is achieved by taking appropriate linear combinations of the equations.
For example, in the system
3x1 + 5x2 = 9,
6x1 + 7x2 = 4,
ifwe multiply the first equation by 2 and subtract it from the second we obtain
3x1 + 5x2 = 9,
-3x2 = -14.
This is n = 2 Gaussian elimination. Our objective in this section is to describe the
procedure in the language of matrix factorizations. This means showing that the algo­
rithm computes a unit lower triangular matrix L and an upper triangular matrix U so
that A = LU, e.g.,
[ � � ] = [ ; � ] [ � -� ] .
The solution to the original Ax = b problem is then found by a two-step triangular
solve process:
Ly = b, Ux = y Ax = LUx = Ly = b. (3.2.1)
The LU factorization is a "high-level" algebraic description of Gaussian elimination.
Linear equation solving is not about the matrix vector product A-1b but about com­
puting LU and using it effectively; see §3.4.9. Expressing the outcome of a matrix
algorithm in the "language" of matrix factorizations is a productive exercise, one that
is repeated many times throughout this book. It facilitates generalization and high­
lights connections between algorithms that can appear very different at the scalar level.
112 Chapter 3. General Linear Systems
3.2.1 Gauss Transformations
To obtain a factorization description of Gaussian elimination as it is traditionally pre­
sented, we need a matrix description of the zeroing process. At the n = 2 level, if
vi # 0 and T = 112/v1 , then
[ -� �] [ �� ] [ � ] .
More generally, suppose v E nr· with Vk # o. If
TT = [ 0, .. . , 0, Tk+b · · · , Tn], Vi i = k + l:n,
Ti ,
� Vk
k
and we define
Mk = In - rer, (3.2.2)
then
1 0 0 0 Vt VJ
Nhv
0 1 0 0 Vk Vk
0 0 = 0
-Tk+l 1 Vk+l
0 -Tn 0 1 Vn 0
A matrix of the form Mk = In - ref E Rnxn is a Gauss transformation if the first k
components ofT E Rn are zero. Such a matrix is unit lower triangular. The components
of r(k + l:n) are called multipliers. The vector r is called the Gauss vector.
3.2.2 Applying Gauss Transformations
Multiplication by a Gauss transformation is particularly simple. If C E Rnxr and
Mk = In - ref is a Gauss transformation, then
is an outer product update. Since r(l:k) = 0 only C(k + l:n, :) is affected and the
update C = lvhC can be computed row by row as follows:
for i = k + l:n
C(i, :) = C(i, :) - Ti·C(k, :)
end
This computation requires 2(n - k)r flops. Here is an example:
C = [� ! �], r = [ �i
3 6 10 - 1
(I - re[)C = [4
1
1
i �l·
10 17
3.2. The LU Factorization 113
3.2.3 Roundoff Properties of Gauss Transformations
If f is the computed version of an exact Gauss vector r, then it is easy to verify that
f = r + e, lei ::; ujrj.
If f is used in a Gauss transform update and H((In - fe'f}C) denotes the computed
result, then
fl ((In - fe'f)C) = (l - rek}C + E ,
where
IEI � 3u (ICI + lrllC(k, :)I) + O(u2).
Clearly, if r has large components, then the errors in the update may be large in
comparison to ICI· For this reason, care must be exercised when Gauss transformations
are employed, a matter that is pursued in §3.4.
3.2.4 Upper Triangularizing
Assume that A E R."'xn. Gauss transformations M1, . . . , Mn-1 can usually be found
such that Mn-1 · · · M2111A = U is upper triangular. To see this we first look at the
n = 3 case. Suppose
and note that
M1 = [-� � �l
-3 0 1
Likewise, in the second step we have
[1 0 0
l
M2 = 0 1 0
0 -2 1
=>
=>
4
7l
5 8
6 10
-!
-
� l
-6 -11
M2(M1A) = 0 -3 -6 .
[1 4
7l
0 0 1
Extrapolating from this example to the general n case we conclude two things.
• At the start of the kth step we have a matrix A(k-l} = Mk-l · · · M1A that is
upper triangular in columns 1 through k - 1.
• The multipliers in the kth Gauss transform Mk are based on A(k-l} (k + l:n, k)
and ai�-l} must be nonzero in order to proceed.
Noting that complete upper triangularization is achieved after n - 1 steps, we obtain
the following rough draft of the overall process:
A<1> = A
for k = l:n - 1
end
For i = k + l:n, determine the multipliers ri(k} = a�Z>/ak�.
Apply Mk = I - r<k>ef to obtain A(k+i) = MkA(k}.
(3.2.3)
114 Chapter 3. General Linear Systems
For this process to be well-defined, the matrix entries ag>,a��, . . . ,a��--;_��-l must be
nonzero. These quantities are called pivots.
3.2.5 Existence
If no zero pivots are encountered in (3.2.3), then Gauss transformations Mi , . . . , Mn-
l
are generated such that Mn-1 · · · M1A = U is upper triangular. It is easy to check
that if Mk = In - r<k>eI, then its inverse is prescribed by M;;1 = In + r <k>eI and so
A = LU (3.2.4)
where
(3.2.5)
It is clear that L is a unit lower triangular matrix because each M;;1 is unit lower
triangular. The factorization (3.2.4) is called the LUfactorization.
The LU factorization may not exist. For example, it is impossible to find lij and
Uij so
[1 2 3 l [ 1 0 0 l [U11 U12 U13 l
2 4 7 = f21 1 Q Q U22 U23 .
3 5 3 f31 f32 1 Q Q U33
To see this, equate entries and observe that we must have u11 = 1, u12 = 2, £21 = 2,
u22 = 0, and £31 = 3. But then the (3,2) entry gives us the contradictory equation
5 = f31U12 + f32U22 = 6. For this example, the pivot a��) = a22 - (a2i/au)a12 is zero.
It turns out that the kth pivot in (3.2.3) is zero if A(l:k, l:k) is singular. A
submatrix of the form A(l:k, l:k) is called a leading principal submatrix.
Theorem 3.2.1. (LU Factorization). If A E Rnxn and det(A(l:k, l:k)) # 0 for
k = l:n- 1, then there exists a unit lower triangular L E Rnxn and an upper triangular
U E Rnxn such that A = LU. If this is the case and A is nonsingular, then the
factorization is unique and det(A) = uu · · · Unn·
Proof. Suppose k - 1 steps in (3.2.3) have been executed. At the beginning of step k
the matrix A has been overwritten by Mk-l · · · M1 A = A(k-l). Since Gauss transfor­
mations are unit lower triangular, it follows by looking at the leading k-by-k portion
of this equation that
( ( )) (k-1) (k-1)
det A l:k, l:k = a11 • • • akk .
Thus, if A(l:k, l:k) is nonsingular, then the kth pivot ai�-l) is nonzero.
(3.2.6)
As for uniqueness, if A = LiUi and A = L2U2 are two LU factorizations of a
nonsingular A, then L"2iLi = U2U11
. Since L21Li is unit lower triangular and U2U11
is upper triangular, it follows that both of these matrices must equal the identity.
Hence, Li = L2 and U1 = U2. Finally, if A = LU, then
det(A) = det(LU) = det(L)det(U) = det(U).
It follows that det(A) = uu · · · Unn· D
3.2. The LU Factorization
3.2.6 L Is the Matrix of Multipliers
115
It turns out that the construction of L is not nearly so complicated as Equation (3.2.5)
suggests. Indeed,
L M-1 M-1
= 1 · · · n-1
= (In - T(l)er)-l
· · · (In - T(n-l)e�-l)-l
= (In + T(l)er) · · · (In + T(n-l)e�_1)
n-1
= In + L: r(k>er
k=l
showing that
L(k + l:n, k) = r(k)(k + l:n) k = l:n - 1. (3.2.7)
In other words, the kth column of L is defined by the multipliers that arise in the k-th
step of (3.2.3). Consider the example in §3.2.4:
3.2.7 The Outer Product Point of View
Since the application of a Gauss transformation to a matrix involves an outer product,
we can regard (3.2.3) as a sequence of outer product updates. Indeed, if
A = [ a WT ] 1
v B n-1
n-1
then the first step in Gaussian elimination results in the decomposition
[Q
WT l [1 0 l [1 0
l [Q
WT l
z B
=
z/a In-1 0 B - zwT/a 0 In-1 .
Steps 2 through n - 1 compute the LU factorization
for then
116 Chapter 3. General Linear Systems
3.2.8 Practical Implementation
Let us consider the efficient implementation of (3.2.3). First, because zeros have already
been introduced in columns 1 through k - 1, the Gauss transformation update need
only be applied to columns k through n. Of course, we need not even apply the kth
Gauss transform to A(:, k) since we know the result. So the efficient thing to do is
simply to update A(k + l:n, k + l:n). Also, the observation (3.2.7) suggests that we
can overwrite A(k + l:n, k) with L(k + l:n, k) since the latter houses the multipliers
that are used to zero the former. Overall we obtain:
Algorithm 3.2.1 (Outer Product LU) Suppose A E 1Rnxn has the property that
A(l:k, l:k) is nonsingular for k = l:n - 1. This algorithm computes the factorization
A = LU where L is unit lower triangular and U is upper triangular. For i = l:n - 1,
A(i, i:n) is overwritten by U(i, i:n) while A(i + l:n, i) is overwritten by L(i + l:n, i).
for k = l:n - 1
p = k + l:n
A(p, k) = A(p, k)/A(k, k)
A(p, p) = A(p, p) - A(p, k) ·A(k, p)
end
This algorithm involves 2n3/3 flops and it is one of several formulations of Gaussian
elimination. Note that the k-th step involves an (n - k)-by-(n - k) outer product.
3.2.9 Other Versions
Similar to matrix-matrix multiplication, Gaussian elimination is a triple-loop procedure
that can be arranged in several ways. Algorithm 3.2.1 corresponds to the "kij'' version
of Gaussian elimination if we compute the outer product update row by row:
for k = l:n - 1
end
A(k + l:n, k) = A(k + l:n, k)/A(k, k)
for i = k + l:n
end
for j = k + l:n
A(i, j) = A(i, j) - A(i, k) ·A(k, j)
end
There are five other versions: kji, ikj, ijk, jik, and jki. The last of these results in
an implementation that features a sequence of gaxpys and forward eliminations which
we now derive at the vector level.
The plan is to compute the jth columns of L and U in step j. If j = 1, then by
comparing the first columns in A = LU we conclude that
L(2:n, j) = A(2:n, 1)/A(l, 1)
and U(l, 1) = A(l, 1). Now assume that L(:, l:j - 1) and U(l:j - 1, l:j - 1) are known.
To get the jth columns of L and U we equate the jth columns in the equation A = LU
3.2. The LU Factorization 117
and infer from the vector equation A(:,j) = LU(:,j) that
A(l:j - l,j) = L(l:j - 1, l:j - l) ·U(l:j - l,j)
and j
A(j:n, j) = L, L(j:n, k) ·U(k,j).
k= l
The first equation is a lower triangular linear system that can be solved for the vector
U(l:j - 1, j). Once this is accomplished, the second equation can be rearranged to
produce recipes for U(j,j) and L(j + l:n, j). Indeed, if we set
j- 1
v(j:n) =A(j:n, j) - L, L(j:n, k)U(k, j)
k= l
= A(j:n, j) - L(j:n, l:j - l) ·U(l:j - 1, j),
then L(j + l:n, j) = v(j + l:n)/v(j) and U(j, j) = v(j). Thus, L(j + l:n,j) is a scaled
gaxpy and we obtain the following alternative to Algorithm 3.2.1:
Algorithm 3.2.2 (Gaxpy LU) Suppose A E m_nxn has the property that A(l:k, l:k) is
nonsingular for k = l:n - 1. This algorithm computes the factorization A = LU where
L is unit lower triangular and U is upper triangular.
Initialize L to the identity and U to the zero matrix.
for j = l:n
end
if j = 1
else
v = A(:, l)
ii = A(:, j)
Solve L(l:j - 1, l:j - l) ·z = ii(l:j - 1) for z E ]Ri-1.
U(l:j - 1, j) = z
v(j:n) = ii(j:n) - L(j:n, l:j - l) ·z
end
U(j, j) = v(j)
L(j+l:n, j) = v(j+l:n)/v(j)
(We chose to have separate arrays for L and U for clarity; it is not necessary in practice.)
Algorithm 3.2.2 requires 2n3/3 flops, the same volume of floating point work required
by Algorithm 3.2.1. However, from §1.5.2 there is less memory traffic associated with a
gaxpy than with an outer product, so the two implementations could perform differently
in practice. Note that in Algorithm 3.2.2, the original A(:, j) is untouched until step j.
The terms right-looking and left-looking are sometimes applied to Algorithms
3.2.1 and 3.2.2. In the outer-product implementation, after L(k:n, k) is determined,
the columns to the right of A(:, k) are updated so it is a right-looking procedure. In
contrast, subcolumns to the left of A(:, k) are accessed in gaxpy LU before L(k+ l:n, k)
is produced so that implementation left-looking.
118 Chapter 3. General Linear Systems
3.2.10 The LU Factorization of a Rectangular Matrix
The LU factorization of a rectangular matrix A E IRnxr can also be performed. The
n > r case is illustrated by
while
[! !] � [! !] [ � -� l
[ ! � : ] = [ ! � ] [ � -� _: ]
depicts the n < r situation. The LU factorization of A E IRnxr is guaranteed to exist
if A(l:k, l:k) is nonsingular for k = l:min{n, r}.
The square LU factorization algorithms above needs only minor alterations to
handle the rectangular case. For example, if n > r, then Algorithm 3.2.1 modifies to
the following:
for k = l:r
end
p = k + l:n
A(p, k) = A(p, k)/A(k, k)
if k < r (3.2.8)
µ = k + l:r
A(p, µ) = A(p, µ) - A(p, k) ·A(k, µ)
end
This calculation requires nr2 - r3/3 flops. Upon completion, A is overwritten by
the strictly lower triangular portion of L E IRnxr and the upper triangular portion of
U E Ilfxr.
3.2.11 Block LU
It is possible to organize Gaussian elimination so that matrix multiplication becomes
the dominant operation. Partition A E IRnxn as follows:
A =
r n-r
where r is a blocking parameter. Suppose we compute the LU factorization
[�:: l =
[�:: lUn.
Here, Ln E m;xr is unit lower triangular and U11 E m;xr is upper triangular and
assumed to be nonsingular. If we solve LnU12 = Ai2 for U12 E wxn-r, then
[An Ai2 ] = [Ln
A21 A22 L21
0
l [Ir 0
l [U11 U12 l
ln-r 0 A 0 ln-r
'
3.2. The LU Factorization
where
A = A22 - L21U12 = A22 - A21AilAi2
is the Schur complement of An in A. Note that if
A = L22U22
is the LU factorization of A, then
[Lu
A =
L21
119
(3.2.9)
is the LU factorization of A. This lays the groundwork for a recursive implementation.
Algorithm 3.2.3 (Recursive Block LU) Suppose A E Rnxn has an LU factorization
and r is a positive integer. The following algorithm computes unit lower triangular
L E Rnxn and upper triangular U E Rnxn so A = LU.
function [L, U] = BlockLU{A, n, r)
if n :5 r
else
end
end
Compute the LU factorization A = LU using (say) Algorithm 3.2.1.
Use (3.2.8) to compute the LU factorization A(:, l:r) = [ f�� ]Uu.
Solve LuU12 = A{l:r, r + l:n) for U12·
A = A(r + l:n, r + l:n) - L21U12
[L22, U22) = BlockLU{A, n - r, r)
L = [ f�� L2� ]'
U = [ Ui
b g�: ]
The following table explains where the flops come from:
Activity Flops
Lu , L2i. Uu
U12
.A
nr2 - r3/3
(n - r)r2
2{n - r)2
If n » r, then there are a total of about 2n3/3 flops, the same volume of atithmetic
as Algorithms 3.2.1 and 3.2.2. The vast majority of these flops are the level-3 flops
associated with the production of A.
The actual level-3 fraction, a concept developed in §3.1.5, is more easily derived
from a nonrecursive implementation. Assume for clarity that n = Nr where N is a
positive integer and that we want to compute
(3.2.10)
120 Chapter 3. General Linear Systems
where all blocks are r-by-r. Analogously to Algorithm 3.2.3 we have the following.
Algorithm 3.2.4 (Nonrecursive Block LU) Suppose A E Rnxn has an LU factoriza­
tion and r is a positive integer. The following algorithm computes unit lower triangular
L E Rnxn and upper triangular U E Rnxn so A = LU.
for k = l:N
end
Rectangular Gaussian elimination:
[A�k l [L�k l
: = : ukk
ANk LNk
Multiple right hand side solve:
Lkk [ uk,k+1 I . . . I ukN ] = [ Ak,k+i I . . . I AkN ]
Level-3 updates:
Aii = Aii - LikUki• i = k + I:N, j = k + l:N
Here is the flop situation during the kth pass through the loop:
Activity Flops
Gaussian elimination (N - k + l)r3 - r3/3
Multiple RHS solve (N - k)r3
Level-3 updates 2(N - k)2r2
Summing these quantities for k = l:N we find that the level-3 fraction is approximately
2n3/3 _
1 _ �
2n3/3 + n2r - 2N "
Thus, for large N almost all arithmetic takes place in the context of matrix multipli­
cation. This ensures a favorable amount of data reuse as discussed in §1.5.4.
Problems
P3.2.1 Verify Equation (3.2.6}.
P3.2.2 Suppose the entries of A(E} E E'xn are continuously differentiable functions of the scalar E.
Assume that A = A(O) and all its principal submatrices are nonsingular. Show that for sufficiently
small E, the matrix A(E} has an LU factorization A(E} = L(E)U(E} and that L(E) and U(E} are both
continuously differentiable.
P3.2.3 Suppose we partition A E Rnxn
A =
[ An Ai2 ]
A21 A22
where An is r-by-r and nonsingular. Let S be the Schur complement of An in A as defined in (3.2.9).
Show that after r steps of Algorithm 3.2.1, A(r + l:n, r + l:n) houses S. How could S be obtained
after r steps of Algorithm 3.2.2?
3.2. The LU Factorization 121
p3.2.4 Suppose A E R"x" has an LU factorization. Show how Ax = b can be solved without storing
the multipliers by computing the LU factorization of the n-by-(n + 1) matrix (A b].
p3.2.5 Describe a variant of Gaussian elimination that introduces zeros into the columns of A in the
order, n: - 1:2 and which produces the factorization A = UL where U is unit upper triangular and L
is lower triangular.
p3.2.6 Matrices in Rnxn of the form N(y, k) = I - yef where y E R" are called Gauss-Jordan
transformations. (a) Give a formula for N(y, k)-1 assuming it exists. (b) Given x E Rn, under what
conditions can y be found so N(y, k)x = ek? (c) Give an algorithm using Gauss-Jordan transformations
that overwrites A with A-1 . What conditions on A ensure the success of your algorithm?
P3.2.7 Extend Algorithm 3.2.2 so that it can also handle the case when A has more rows than
columns.
P3.2.8 Show how A can be overwritten with L and U in Algorithm 3.2.2. Give a 3-loop specification
so that unit stride access prevails.
P3.2.9 Develop a version of Gaussian elimination in which the innermost of the three loops oversees
a dot product.
Notes and References for §3.2
The method of Gaussian elimination has a long and interesting history, see:
J.F. Grear (2011). "How Ordinary Elimination Became Gaussian Elimination," Historica Mathemat-
ica, 98, 163-218.
J.F. Grear (2011). "Mathematicians of Gaussian Elimination," Notices of the AMS 58, 782--792.
Schur complements (3.2.9) arise in many applications. For a survey of both practical and theoretical
interest, see:
R.W. Cottle (1974). "Manifestations of the Schur Complement," Lin. Alg. Applic. 8, 189-211.
Schur complements are known as "Gauss transforms" in some application areas. The use of Gauss­
Jordan transformations (P3.2.6) is detailed in Fox (1964). See also:
T. Dekker and W. Hoffman (1989). "Rehabilitation of the Gauss-Jordan Algorithm," Numer. Math.
54, 591-599.
AB we mentioned, inner product versions of Gaussian elimination have been known and used for some
time. The names of Crout and Doolittle are associated with these techniques, see:
G.E. Forsythe (1960). "Crout with Pivoting," Commun. ACM 9, 507-508.
W.M. McKeeman (1962). "Crout with Equilibration and Iteration," Commun. A CM. 5, 553-555.
Loop orderings and block issues in LU computations are discussed in:
J.J. Dongarra, F.G. Gustavson, and A. Karp (1984). "Implementing Linear Algebra Algorithms for
Dense Matrices on a Vector Pipeline Machine," SIAM Review 26, 91-112.
.J.M. Ortega (1988). "The ijk Forms of Factorization Methods I: Vector Computers," Parallel Comput.
7, 135-147.
D.H. Bailey, K.Lee, and H.D. Simon (1991). "Using Strassen's Algorithm to Accelerate the Solution
of Linear Systems," J. Supercomput. 4, 357-371.
J.W. Demmel, N.J. Higham, and R.S. Schreiber (1995). "Stability ofBlock LU Factorization," Numer.
Lin. Alg. Applic. 2, 173-190.
Suppase A = LU and A+AA = (L+AL)(U+AU) are LU factorizations. Bounds on the perturbations
j,.L and AU in terms of AA are given in:
G.W. Stewart (1997). "On the Perturbation of LU and Cholesky Factors," IMA J. Numer. Anal. 17,
1-6.
X.-W. Chang and C.C. Paige (1998). "On the Sensitivity of the LU factorization," BIT 98, 486-501.
122 Chapter 3. General Linear Systems
In certain limited domains, it is possible to solve linear systems exactly using rational arithmetic. For
a snapshot of the challenges, see:
P. Alfeld and D.J. Eyre (1991). "The Exact Analysis of Sparse Rectangular Linear Systems," ACM
'.lrans. Math. Softw. 1 7, 502-518.
P. Alfeld (2000). "Bivariate Spline Spaces and Minimal Determining Sets," J. Comput. Appl. Math.
119, 13-27.
3.3 Roundoff Error in Gaussian Elimination
We now assess the effect of rounding errors when the algorithms in the previous two
sections are used to solve the linear system Ax = b. A much more detailed treatment
of roundoff error in Gaussian elimination is given in Higham (ASNA).
3.3.1 Errors in the LU Factorization
Let us see how the error bounds for Gaussian elimination compare with the ideal
bounds derived in §2.7.11. We work with the infinity norm for convenience and focus
our attention on Algorithm 3.2.1, the outer product version. The error bounds that
we derive also apply to the gaxpy formulation (Algorithm 3.2.2). Our first task is to
quantify the roundoff errors associated with the computed triangular factors.
Theorem 3.3.1. Assume that A is an n-by-n matrix offloating point numbers. Ifno
zero pivots are encountered during the execution ofAlgorithm 3.2.1, then the computed
triangular matrices L and 0 satisfy
i,0 = A+H,
JHJ :::; 2(n - l)u (JAi+JLJJOJ) + O(u2) .
(3.3.1)
(3.3.2)
Proof. The proof is by induction on n. The theorem obviously holds for n = 1.
Assume that n � 2 and that the theorem holds for all (n -1)-by-(n - 1) floating point
matrices. If A is partitioned as follows
A = WT ] 1
B n-1
n-1
then the first step in Algorithm 3.2.1 is to compute
z = fl(v/a),
from which we conclude that
z = v/a +f,
lfl < uJv/aJ,
A.1 = fl(B - C),
(3.3.3)
(3.3.4)
(3.3.5)
(3.3.6)
3.3. Roundoff Error in Gaussian Elimination
A
T
A1 = B - (zw + F1) + F2,
IF2I :::; u (IBI + lzl lwTI) + O(u2),
IA1 I :::; IBI + lzllwTI + O(u).
123
(3.3.7)
(3.3.8)
(3.3.9)
The algorithm proceeds to compute the LU factorization of A1 . By induction, the
computed factors L1 and (Ji satisfy
where
If
(; =
[a:
w__Tl
0 U1 '
then it is easy to verify that
LU = A + H
where
To prove the theorem we must verify (3.3.2), i.e.,
Considering (3.3.12), this is obviously the case if
Using (3.3.9) and (3.3.11) we have
IH1 I :::; 2(n - 2)u (IBI + lzl lwTI + IL1 llU1 I) + O(u2),
while (3.3.6) and (3.3.8) imply
IF1 I + IF2I :::; u(IBI + 2lzllwl) + O(u2).
These last two results establish (3.3.13) and therefore the theorem. D
(3.3.10)
(3.3.11)
(3.3.12)
(3.3.13)
We mention that if A is m-by-n, then the theorem applies with n replaced by the
smaller of n and m in Equation 3.3.2.
124 Chapter 3. General Linear Systems
3.3.2 Triangular Solving with Inexact Triangles
We next examine the effect of roundoff error when L and 0are used by the triangular
system solvers of §3.1.
Theorem 3.3.2. Let L and 0 be the computed LU factors obtained by Algorithm 3.2.1
when it is applied to an n-by-n floating point matrix A. If the methods of§3.1 are used
to produce the computed solution if to Ly = b and the computed solution x to 0x = if,
then (A + E)x = b with
Proof. From (3.1.1) and (3.1.2) we have
(L + F)iJ = b,
(0+ e)x = iJ,
and thus
IFI < nulLI + O(u2),
1e1 < nulOI + O(u2),
(L + F)(O + e)x = (LO + FO + Le + Fe)x = b.
If follows from Theorem 3.3.1 that LO = A + H with
IHI :S 2(n - l)u(IAI + ILllOI) + O(u2),
and so by defining
E = H + F0 + Le + Fe
we find (A + E)x = b. Moreover,
IEI < IHI + IFI 101 + ILi 1e1 + O(u2)
< 2nu (IAI + ILllOI) + 2nu (ILllOI) + O(u2),
completing the proof of the theorem. D
(3.3.14)
If it were not for the possibility of a large ILllOI term, (3.3.14) would compare favorably
with the ideal bound (2.7.21). (The factor n is of no consequence, cf. the Wilkinson
quotation in §2.7.7.) Such a possibility exists, for there is nothing in Gaussian elimi­
nation to rule out the appearance of small pivots. If a small pivot is encountered, then
we can expect large numbers to be present in L and 0.
We stress that small pivots are not necessarily due to ill-conditioning as the
example
A =
[� � l = [l�E � l [� _;jE l
shows. Thus, Gaussian elimination can give arbitrarily poor results, even for well­
conditioned problems. The method is unstable. For example, suppose 3-digit floating
point arithmetic is used to solve
[.001 1.00 l [ X1 ] =
[1.00 l·
1.00 2.00 X2 3.00
3.4. Pivoting
{See §2.7.1.) Applying Gaussian elimination we get
L -
. [ 1
1000 � l 0 = [
and a calculation shows that
to = [.001
1
.001
0 -1�00 l·
A + H.
125
If we go on to solve the problem using the triangular system solvers of §3.1, then using
the same precision arithmetic we obtain a computed solution x =[O , 1jT. This is in
contrast to the exact solution x =[1.002 . . . , .
99
8 . . .jT.
Problems
P3.3.1 Show that if we drop the assumption that A is a floating point matrix in Theorem 3.3. 1 , then
Equation 3.3.2 holds with the coefficient "2"replaced by "3."
P3.3.2 Suppose A is an n-by-n matrix and that L and 0 are produced by Algorithm 3.2.1. (a) How
many flops are required to compute II ILi IUl lloo? (b) Show fl(ILl lUI) � (1 + 2nu) ILl lUI + O(u2).
Notes and References for §3.3
The original roundoff analysis of Gaussian elimination appears in:
J.H. Wilkinson (1961). "Error Analysis of Direct Methods of Matrix Inversion," J. ACM 8, 281-330.
Various improvements and insights regarding the bounds and have been ma.de over the years, see:
8.A. Chartres and J.C. Geuder (1967). "Computable Error Bounds for Direct Solution of Linear
Equations," J. A CM 14, 63-71.
J.K. Reid (1971). "A Note on the Stability of Gaussian Elimination," J. Inst. Math. Applic. 8,
374-75.
C.C. Paige (1973). "An Error Analysis of a Method for Solving Matrix Equations,'' Math. Comput.
27, 355-59.
H.H. Robertson (1977). "The Accuracy ofError Estimates for Systems ofLinear Algebraic Equations,''
J. Inst. Math. Applic. 20, 409-14.
J.J. Du Croz and N.J. Higham (1992). "Stability of Methods for Matrix Inversion,'' IMA J. Numer.
Anal. 12, 1-19.
J.M. Banoczi, N.C. Chiu, G.E. Cho, and l.C.F. Ipsen (1998). "The Lack of Influence ofthe Right-Hand
Side on the Accuracy of Linear System Solution,'' SIAM J. Sci. Comput. 20, 203-227.
P. Amodio and F. Mazzia (1999). "A New Approach to Backward Error Analysis of LU Factorization
BIT 99, 385-402.
An interesting account ofvon Neuman's contributions to the numerical analysis ofGaussian elimination
is detailed in:
J.F. Grear (2011). "John von Neuman's Analysis of Gaussian Elimination and the Origins of Modern
Numerical Analysis,'' SIAM Review 59, 607 · 682.
3.4 Pivoting
The analysis in the previous section shows that we must take steps to ensure that no
large entries appear in the computed triangular factors L and 0. The example
A = [ .0001
1 � ] [ 1
10000 �] [ .0001
0
-9�99 ] = LU
126 Chapter 3. General Linear Systems
correctly identifies the source of the difficulty: relatively small pivots. A way out of
this difficulty is to interchange rows. For example, if P is the permutation
p = [ � � ]
then
pA = [ .0�01 � ] = [ .0�01 �] [ � .9�99 ] = LU.
Observe that the triangular factors have modestly sized entries.
In this section we show how to determine a permuted version of A that has a
reasonably stable LU factorization. There arc several ways to do this and they each
corresponds to a different pivoting strategy. Partial pivoting, complete pivoting, and
rook pivoting are considered. The efficient implementation of these strategies and their
properties are discussed. We begin with a few comments about permutation matrices
that can be used to swap rows or columns.
3.4.1 Interchange Permutations
The stabilizations of Gaussian elimination that are developed in this section involve
data movements such as the interchange of two matrix rows. In keeping with our
desire to describe all computations in "matrix terms," we use permutation matrices
to describe this process. (Now is a good time to review §1.2.8-§1.2.11.) Interchange
permutations are particularly important. These are permutations obtained by swapping
two rows in the identity, e.g.,
rr� [n ! i]
Interchange permutations can be used to describe row and column swapping. If
A E R4x4, then II·A is A with rows 1 and 4 interchanged while A·II is A with columns
1 and 4 swapped.
If P = IIm · · · II1 and each Ilk is the identity with rows k and piv(k) interchanged,
then piv(l:m) encodes P. Indeed, x E Rn can be overwritten by Px as follows:
for k = l:m
x(k) tt x(piv(k))
end
Here, the "tt" notation means "swap contents." Since each Ilk is symmetric, we have
pT = II1 · · · IIm. Thus, the piv representation can also be used to overwrite x with
pTX:
for k = m: - 1:1
x(k) tt x(piv(k))
end
We remind the reader that although no floating point arithmetic is involved in a per­
mutation operation, permutations move data and have a nontrivial effect upon perfor­
mance.
3.4. Pivoting 127
3.4.2 Partial Pivoting
Interchange permutations can be used in LU computations to guarantee that no mul­
tiplier is greater than 1 in absolute value. Suppose
[
3 17 10
l
A = 2 4 -2 .
6 18 -12
To get the smallest possible multipliers in the first Gauss transformation, we need au
to be the largest entry in the first column. Thus, if II1 is the interchange permutation
II1 =
[�
then
[�
II1A =
It follows that
[ 1 0 0
l
Mi = -1/3 1 0 =}
-1/2 0 1
0
1
0
18
4
17
�l
-12
l
-2 .
10
M,IT,A �
[� 18 -12
l
-2 2 .
8 16
To obtain the smallest possible multiplier in M2, we need to swap rows 2 and 3. Thus,
if
and
then
[�
[
1 0
M2 = 0 1
0 1/4
1
� -�� l·
0 6
For general n we have
for k = l:n - 1
Find an interchange permutation Ilk E Rnxn that swaps
A(k, k) with the largest element in IA(k:n, k)j.
A = IIkA
Determine the Gauss transformation lvh = In - rCklef such that if
v is the kth column of MkA, then v(k + l:n) = 0.
A = MkA
end
(3.4.1)
This particular row interchange strategy is called partial pivoting and upon completion,
we have
(3.4.2)
where U is upper triangular. As a consequence of the partial pivoting, no multiplier is
larger than one in absolute value.
128 Chapter 3. General Linear Systems
3.4.3 Where is L?
It turns out that (3.4.1) computes the factorization
PA = LU (3.4.3)
where P = IIn-l · · · 111, U is upper triangular, and L is unit lower triangular with
lli; I � 1. We show that L(k + l:n, k) is a permuted version of Mk's multipliers. From
(3.4.2) it can be shown that
where
Jiih = (IIn-1 . . . IIk+i)Mk(IIk+l . . . IIn-1)
for k = l:n - 1. For example, in the n = 4 case we have
since the Ili are symmetric. Moreover,
(3.4.4)
(3.4.5)
with f(k) = IIn-l · · · Ilk+lT(k). This shows that Nh is a Gauss transformation. The
transformation from T(k) to f(k) is easy to implement in practice.
Algorithm 3.4.1 (Outer Product LU with Partial Pivoting) This algorithm computes
the factorization PA = LU where P is a permutation matrix encoded by piv(l:n - 1),
L is unit lower triangular with lli; I $ 1, and U is upper triangular. For i = l:n,
A(i, i:n) is overwritten by U(i, i:n) and A(i + l:n, i) is overwritten by L(i + l:n, i). The
permutation P is given by P = IIn-1 · · · 111 where Ilk is an interchange permutation
obtained by swapping rows k and piv(k) of In.
for k = l:n - 1
end
Determine µ with k $ µ $ n so IA(µ, k) I = II A(k:n, k) lloo
piv(k) = µ
A(k, :) H A(µ, :)
if A(k, k) =f 0
p = k + l:n
A(p, k) = A(p, k)/A(k, k)
A(p, p) = A(p, p) - A(p, k)A(k, p)
end
The floating point overhead a..'isociated with partial pivoting is minimal from the stand­
point of arithmetic as there are only O(n2) comparisons associated with the search for
the pivots. The overall algorithm involves 2n3/3 flops.
3.4. Pivoting
If Algorithm 3.4.1 is applied to
then upon completion
A �
u
A = [1/�
1
� -��l
1/3 -1/4 6
129
and piv = [3 , 3] . These two quantities encode all the information associated with the
reduction:
[
�
0 0 l[0 0 1 l [ 1 0 0 l [6 18 -12 l
0 1 0 1 O A = 1/2 1 0 O 8 16 .
1 0 1 0 0 1/3 -1/4 1 0 0 6
To compute the solution to Ax = b after invoking Algorithm 3.4.1, we solve
Ly = Pb for y and Ux = y for x. Note that b can be overwritten by Pb as follows
for k = l:n - 1
b(k) +-t b(piv(k))
end
We mention that if Algorithm 3.4.1 is applied to the problem,
using 3-digit floating point arithmetic, then
P =
[o
l o
1 l·
L =
[i.oo o l·
.001 1.00
u
=
[1.00 2.00 l
0 1.00 '
and x = [LOO, .996jT. Recall from §3.3.2 that if Gaussian elimination without pivoting
is applied to this problem, then the computed solution has 0(1) error.
We mention that Algorithm 3.4.1 always runs to completion. If A(k:n, k) = 0 in
step k, then Mk = In.
3.4.4 The Gaxpy Version
In §3.2 we developed outer product and gaxpy schemes for computing the LU factor­
ization. Having just incorporated pivoting in the outer product version, it is equally
straight forward to do the same with the gaxpy approach. Referring to Algorithm
3.2.2, we simply search the vector lv(j:n)I in that algorithm for its maximal element
and proceed accordingly.
130 Chapter 3. General Linear Systems
Algorithm 3.4.2 (Gaxpy LU with Partial Pivoting) This algorithm computes the
factorization PA = LU where P is a permutation matrix encoded by piv(l:n - 1),
L is unit lower triangular with lliil $ 1, and U is upper triangular. For i = l:n,
A(i, i:n) is overwritten by U(i, i:n) and A(i + l:n, i) is overwritten by L(i + l:n, i). The
permutation P is given by P = IIn-1 · · · II1 where Ilk is an interchange permutation
obtained by swapping rows k and piv(k) of In.
Initialize L to the identity and U to the zero matrix.
for j = l:n
ifj = 1
v = A(:, 1)
else
end
ii = IIj-1 · · · II1A(:,j)
Solve L(l:j - 1, l:j - l)z = ii(l:j - 1) for z E R?-1
U(l:j- 1, j) = z, v(j:n) = ii(j:n) - L(j:n, l:j - 1) · z
Determine µ with j $ µ $ n so lv(µ)I = 11 v(j:n) 1100 and set piv(j) = µ
v(j) ++ v(µ), L(j, l:j - 1) ++ L(µ, l:j - 1), U(j, j) = v(j)
end
if v(j) "# 0
L(j+l:n, j) = v(j+l:n)/v(j)
end
As with Algorithm 3.4.1, this procedure requires 2n3/3 flops and O(n2) comparisons.
3.4.5 Error Analysis and the Growth Factor
We now examine the stability that is obtained with partial pivoting. This requires
an accounting of the rounding errors that are sustained during elimination and during
the triangular system solving. Bearing in mind that there are no rounding errors
associated with permutation, it is not hard to show using Theorem 3.3.2 that the
computed solution x satisfies (A + E)x = b where
(3.4.6)
Here we are assuming that P, l, and (J are the computed analogs of P, L, and U as
produced by the above algorithms. Pivoting implies that the elements of l are bounded
by one. Thus II l 1100 $ n and we obtain the bound
II E lloo $ nu (211 A lloo + 4nll (J lloo) + O(u2).
The problem now is to bound II (J 1100• Define the growth factor p by
p = max
i,j,k
(3.4.7)
(3.4.8)
3,4. Pivoting 131
where _A(k) is the computed version of the matrix A(k) = MkIIk · · · M1II1A. It follows
that
(3.4.9)
Whether or not this compares favorably with the ideal bound (2.7.20) hinges upon the
size of the growth factor of p. (The factor n3 is not an operating factor in practice and
may be ignored in this discussion.)
The growth factor measures how large the A-entries become during the process
ofelimination. Whether or not we regard Gaussian elimination with partial pivoting is
safe to use depends upon what we can say about this quantity. From an average-case
point of view, experiments by Trefethen and Schreiber (1990) suggest that p is usually
in the vicinity of n213. However, from the worst-case point of view, p can be as large
as 2n-1. In particular, if A E lRnxn is defined by
{ 1 �f � = � or j = n,
aii = -1 1f i > J,
0 otherwise,
then there is no swapping of rows during Gaussian elimination with partial pivoting.
We emerge with A = LU and it can be shown that Unn = 2n-1. For example,
[-
�
-1
-1
0 0
1 0
-1 1
-1 -1 �l [-� � � �l[� � � ; l
1 - 1 -1 1 0 0 0 1 4
1 -1 -1 -1 1 0 0 0 8
Understanding the behavior of p requires an intuition about what makes the U­
factor large. Since PA = LU implies U = L-IPA it would appear that the size of L-1
is relevant. However, Stewart (1997) discusses why one can expect the £-factor to be
well conditioned.
Although there is still more to understand about p, the consensus is that serious
element growth in Gaussian elimination with partial pivoting is extremely rare. The
method can be used with confidence.
3.4.6 Complete Pivoting
Another pivot strategy called complete pivoting has the property that the associated
growth factor bound is considerably smaller than 2n-1. Recall that in partial pivoting,
the kth pivot is determined by scanning the current subcolumn A(k:n, k). In complete
pivoting, the largest entry in the current submatrix A(k:n, k:n) is permuted into the
(k, k) position. Thus, we compute the upper triangularization
Mn-1IIn-1 · · · J!fiII1Af1 · · · fn-1 = U.
In step k we are confronted with the matrix
A<k-l) = Mk-1IIk-1 · · · M1II1Af1 · · · fk-1
and determine interchange permutations Ilk and rk such that
132 Chapter 3. General Linear Systems
Algorithm 3.4.3 (Outer Product LU with Complete Pivoting) This algorithm com­
putes the factorization PAQT = LU where P is a permutation matrix encoded by
piv(l:n - 1), Q is a permutation matrix encoded by colpiv(l:n - 1), L is unit lower
triangular with l£i;I ::;: 1, and U is upper triangular. For i = l:n, A(i, i:n) is overwritten
by U(i, i:n) and A(i + l:n, i) is overwritten by L(i+ l:n, i). The permutation P is given
by P = Iln-l · · · Il1 where Ilk is an interchange permutation obtained by swapping
rows k and rowpiv(k) of In. The permutation Q is given by Q = rn-1 · · · ri where rk
is an interchange permutation obtained by swapping rows k and colpiv(k) of In.
for k = l:n - 1
end
Determine µ with k ::;: µ ::;: n and ..X with k ::;: A ::;: n so
IA(µ, .X)I = max{ IA(i, j)I : i = k:n, j = k:n }
rowpiv(k) = µ
A(k, l:n) ++ A(µ, l:n)
colpiv(k) = A
A(l:n, k) ++ A(l:n, .X)
if A(k, k) =f: 0
p = k + l:n
A(p, k) = A(p, k)/A(k, k)
A(p, p) = A(p, p) - A(p, k)A(k, p)
end
This algorithm requires 2n3/3 fl.ops and O(n3) comparisons. Unlike partial pivoting,
complete pivoting involves a significant floating point arithmetic overhead because of
the two-dimensional search at each stage.
With the factorization PAQT = LU in hand the solution to Ax = b proceeds as
follows:
Step 1. Solve Lz = Pb for z.
Step 2. Solve Uy = z for y.
Step 3. Set x = QTy.
The rowpiv and colpiv representations can be used to form Pb and Qy, respectively.
Wilkinson (1961) has shown that in exact arithmetic the elements of the matrix
A(k) = MkITk · · · M1Il1Ar1 · · · rk satisfy
(3.4.10)
The upper bound is a rather slow-growing function of k. This fact coupled with vast
empirical evidence suggesting that p is always modestly sized (e.g, p = 10) permit us to
conclude that Gaussian elimination with complete pivoting is stable. The method solves
a nearby linear system (A + E)x = b in the sense of (2.7.21). However, in general there
is little reason to choose complete pivoting over partial pivoting. A possible exception
is when A is rank deficient. In principal, complete pivoting can be used to reveal the
rank of a matrix. Suppose rank(A) = r < n. It follows that at the beginning of step
3.4. Pivoting 133
r + 1, A(r+ l:n, r+ l:n) = 0. This implies that Ilk = rk = Mk = I for k = r + l:n
and so the algorithm can be terminated after step r with the following factorization in
band:
p
AQT = LU = [ Lu 0 ] [ Uu U12 ] .
L21 ln-r 0 0
Here, Lu and Uu are r-by-r and L21 and U'h are (n - r)-by-r. Thus, Gaussian
elimination with complete pivoting can in principle be used to determine the rank of a
matrix. Nevertheless, roundoff errors make the probability of encountering an exactly
zero pivot remote. In practice one would have to "declare" A to have rank k if the
pivot element in step k+ 1 was sufficiently small. The numerical rank determination
problem is discussed in detail in §5.5.
3.4.7 Rook Pivoting
A third type of LU stablization strategy called rook pivoting provides an interesting
alternative to partial pivoting and complete pivoting. As with complete pivoting,
it computes the factorization PAQ = LU. However, instead of choosing as pivot
the largest value in IA(k:n, k:n)I, it searches for an element of that submatrix that is
maximal in both its row and column. Thus, if
[24 36 13 61 l
42 67 72 50
A(k:n, k:n) = 38 11 36 43 '
52 37 48 16
then "72" would be identified by complete pivoting while "52," "72," or "61" would
be acceptable with the rook pivoting strategy. To implement rook pivoting, the scan­
and-swap portion of Algorithm 3.4.3 is changed to
µ = k, .. = k, T = laµA I , s = 0
while T < 11 (A(k:n, ..) II.xi V T < II (A(µ, k:n) 1100
if mod(s, 2) = 0
end
Update µ so that laµA I = II (A(k:n, ..) 1100 with k � µ � n.
else
Update .. so that laµA I = II (A(µ, k:n) 1100 with k � .. � n.
end
s = s + l
rowpiv(k) = µ, A(k, :) ++ A(µ, :) colpiv(k) = .., A(:, k) ++ A(:, ..)
The search for a larger laµA I involves alternate scans of A(k:n, ..) and A(µ, k:n). The
value of T is monotone increasing and that ensures termination of the while-loop.
In theory, the exit value of s could be O(n - k)2), but in practice its value is 0(1).
See Chang (2002). The bottom line is that rook pivoting represents the same O(n2)
overhead as partial pivoting, but that it induces the same level of reliability as complete
pivoting.
134 Chapter 3. General Linear Systems
3.4.8 A Note on Underdetermined Systems
If A E nmxn with m < n, rank(A) = m, and b E nm, then the linear system Ax = b
is said to be underdetermined. Note that in this case there are an infinite number
of solutions. With either complete or rook pivoting, it is possible to compute an LU
factorization of the form
(3.4.11)
where P and Q are permutations, L E nmxm is unit lower triangular, and U1 E nmxm
is nonsingular and upper triangular. Note that
where c = Pb and
[ ;� ] = Qx.
This suggests the following solution procedure:
Step 1. Solve Ly = Pb for y E nm.
Step 2. Choose Z2 E nn-m and solve U1Z1 = y - U2z2 for Z1.
Step 3. Set
Setting z2 = 0 is a natural choice. We have more to say about underdetermined systems
in §5.6.2.
3.4.9 The LU Mentality
We offer three examples that illustrate how to think in terms of the LU factorization
when confronted with a linear equation situation.
Example 1. Suppose A is nonsingular and n-by-n and that B is n-by-p. Consider
the problem of finding X (n-by-p) so AX = B. This is the multiple right hand side
problem. If X = [ X1 I · · · I Xp ] and B = [ bi I · · · I bp ] are column partitions, then
Compute PA = LU
for k = l:p
Solve Ly = Pbk and then UXk = y.
end
If B = In, then we emerge with an approximation to A-1 •
(3.4.12)
Example 2. Suppose we want to overwrite b with the solution to Akx = b where
A E nnxn, b E nn, and k is a positive integer. One approach is to compute C = Ak
and then solve Cx = b. However, the matrix multiplications can be avoided altogether:
3.4. Pivoting
Compute PA = LU.
for j = l:k
end
Overwrite b with the solution to Ly = Pb.
Overwrite b with the solution to Ux = b.
As in Example 1, the idea is to get the LU factorization "outside the loop."
135
(3.4.13)
Example 3. Suppose we are given A E IRnxn, d E IRn, and c E IRn and that we
want to compute s = cTA-1d. One approach is to compute X = A-1 as discussed in
(i) and then compute s = cTXd. However, it is more economical to proceed as follows:
Compute PA = LU.
Solve Ly = Pd and then Ux = y.
S = CTX
An "A-1" in a formula almost always means "solve a linear system" and almost never
means "compute A-1."
3.4.10 A Model Problem for Numerical Analysis
We are now in possession of a very important and well-understood algorithm (Gaus­
sian elimination) for a very important and well-understood problem (linear equations).
Let us take advantage of our position and formulate more abstractly what we mean
by ''problem sensitivity" and "algorithm stability." Our discussion follows Higham
(ASNA, §1.5-1.6), Stewart (MA, §4.3), and Trefethen and Bau (NLA, Lectures 12, 14,
15, and 22).
A problem is a function /:D --+ S from "data/input space" D to "solution/output
space" S. A problem instance is f together with a particular d E D. We assume D
and S are normed vector spaces. For linear systems, D is the set of matrix-vector pairs
(A, b) where A E IRnxn is nonsingular and b E IRn. The function f maps (A, b) to A-1b,
an element of S. For a particular A and b, Ax = b is a problem instance.
A perturbation theory for the problem f sheds light on the difference between f(d)
and f(d + Ad) where d E D and d + Ad E D. For linear systems, we discussed in §2.6
the difference between the solution to Ax = b and the solution to (A + AA)(x + Ax) =
(b + Ab). We bounded II Ax 11/11 x II in terms of II AA 11/11 A II and II Ab 11/11 b II .
The conditioning of a problem refers to the behavior of f under perturbation
at d. A condition number of a problem quantifies the rate of change of the solution
with respect to the input data. If small changes in d induce relatively large changes
in f(d), then that problem instance is ill-conditioned. If small changes in d do not
induce relatively large changes in f(d), then that problem instance is well-conditioned.
Definitions for "small" and "large" are required. For linear systems we showed in
§2.6 that the magnitude of the condition number K(A) = II A 11 11 A-1 II determines
whether an Ax = b problem is ill-conditioned or well-conditioned. One might say that
a linear equation problem is well-conditioned if K(A) :::::: 0(1) and ill-conditioned if
11:(A) :::::: 0(1/u).
An algorithm for computing f(d) produces an approximation f(d). Depending
on the situation, it may be necessary to identify a particular software implementation
136 Chapter 3. General Linear Systems
of the underlying method. The j function for Gaussian elimination with partial pivot­
ing, Gaussian elimination with rook pivoting, and Gaussian elimination with complete
pivoting are all different.
An algorithm for computing f(d) is stable if for some small Ad, the computed
solution j(d) is close to f(d + Ad). A stable algorithm nearly solves a nearby problem.
An algorithm for computing f(d) is backward stable if for some small Ad, the computed
solution j(d) satisfies j(d) = f(d + Ad). A backward stable algorithm exactly solves a
nearby problem. Applied to a given linear system Ax = b, Gaussian elimination with
complete pivoting is backward stable because the computed solution x satisfies
(A + A)x = b
and II A 11/11 A II � O(u). On the other hand, if b is specified by a matrix-vector product
b = Mv, then
(A + A)x = Mv + 8
where II A 11/11 A II � O(u) and 8/(11 M 11 11 v II) � O(u). Here, the underlying f is
defined by f:(A, M, v) � A-1 (Mv). In this case the algorithm is stable but not
backward stable.
Problems
P3.4.l Let A = LU be the LU factorization of n-by-n A with liii l � 1. Let af and uf denote the
ith rows of A and U, respectively. Verify the equation
i-1
uf = af - Li,iu]
j=l
and use it to show that II U lloo � 2n-l ll A lloo . {Hint: Take norms and use induction.)
P3.4.2 Show that if PAQ = LU is obtained via Gaussian elimination with complete pivoting, then
no element of U(i, i:n) is larger in absolute value than luiil· Is this true with rook pivoting?
P3.4.3 Suppose A E Rnxn has an LU factorization and that L and U are known. Give an algorithm
which can compute the {i, j) entry of A-1 in approximately (n - j)2 + (n - i)2 flops.
P3.4.4 Suppose X is thecomputed inverseobtained via (3.4.12). Give an upper bound for II AX - I llr
P3.4.5 Extend Algorithm 3.4.3 so that it can produce the factorization (3.4.11). How many flops are
required?
Notes and References for §3.4
Papers concerned with element growth and pivoting include:
C.W. Cryer {1968). "Pivot Size in Gaussian Elimination," Numer. Math. 12, 335-345.
J.K. Reid {1971). "A Note on the Stability of Gaussian Elimination," .!.Inst. Math. Applic. 8,
374-375.
P.A. Businger (1971). "Monitoring the Numerical Stability of Gaussian Elimination," Numer. Math.
16, 360-361.
A.M. Cohen (1974). "A Note on Pivot Size in Gaussian Elimination," Lin. Alg. Applic. 8, 361-68.
A.M. Erisman and J.K. Reid {1974). "Monitoring the Stability of the Triangular Factorization of a
Sparse Matrix," Numer. Math. 22, 183-186.
J. Day and B. Peterson {1988). "Growth in Gaussian Elimination,'' Amer. Math. Monthly 95,
489-513.
N.J. Higham and D.J. Higham {1989). "Large Growth Factors in Gaussian Elimination with Pivoting,"
SIAM J. Matrix Anal. Applic. 10, 155-164.
L.N. Trefethen and R.S. Schreiber {1990). "Average-Case Stability of Gaussian Elimination,'' SIAM
J. Matrix Anal. Applic. 11, 335-360.
3.5. Improving and Estimating Accuracy 137
N. Gould (1991). "On Growth in Gaussian Elimination with Complete Pivoting," SIAM J. Matrix
Anal. Applic. 12, 354-361.
A. Edelman (1992). "The Complete Pivoting Conjecture for Gaussian Elimination is False," Mathe­
matica J. 2, 58-61.
S.J. Wright (1993). "A Collection of Problems for Which Gaussian Elimination with Partial Pivoting
is Unstable," SIAM J. Sci. Stat. Comput. 14, 231-238.
L.V. Foster (1994). "Gaussian Elimination with Partial Pivoting Can Fail in Practice," SIAM J.
Matrix Anal. Applic. 15, 1354-1362.
A. Edelman and W. Mascarenhas (1995). "On the Complete Pivoting Conjecture for a Hadamard
Matrix of Order 12," Lin. Multilin. Alg. 38, 181-185.
J.M. Pena (1996). "Pivoting Strategies Leading to Small Bounds of the Errors for Certain Linear
Systems," IMA J. Numer. Anal. 16, 141-153.
J.L. Barlow and H. Zha (1998). "Growth in Gaussian Elimination, Orthogonal Matrices, and the
2-Norm," SIAM J. Matrix Anal. Applic. 19, 807-815.
P. Favati, M. Leoncini, and A. Martinez (2000). "On the Robustness of Gaussian Elimination with
Partial Pivoting,'' BIT 40, 62-73.
As we mentioned, the size of L-1 is relevant to the growth factor. Thus, it is important to have an
understanding of triangular matrix condition, see:
D. Viswanath and L.N. Trefethen (1998). "Condition Numbers of Random Triangular Matrices,"
SIAM J. Matrix Anal. Applic. 19, 564-581.
The connection between small pivots and near singularity is reviewed in:
T.F. Chan (1985). "On the Existence and Computation of LU Factorizations with Small Pivots,''
Math. Comput. 42, 535-548.
A pivot strategy that we did not discuss is pairwise pivoting. In this approach, 2-by-2 Gauss trans­
formations are used to zero the lower triangular portion of A. The technique is appealing in certain
multiprocessor environments because only adjacent rows are combined in each step, see:
D. Sorensen (1985). "Analysis of Pairwise Pivoting in Gaussian Elimination,'' IEEE Trans. Comput.
C-34, 274· 278.
A related type of pivoting called tournament pivoting that is of interest in distributed memory com­
puting is outlined in §3.6.3. For a discussion of rook pivoting and its properties, see:
L.V. Foster (1997). "The Growth Factor and Efficiency of Gaussian Elimination with Rook Pivoting,"
J. Comput. Appl. Math., 86, 177-194.
G. Poole and L. Neal (2000). "The Rook's Pivoting Strategy," J. Comput. Appl. Math. 123, 353-369.
X-W Chang (2002) "Some Features of Gaussian Elimination with Rook Pivoting,'' BIT 42, 66-83.
3.5 Improving and Estimating Accuracy
Suppose we apply Gaussian elimination with partial pivoting to the n-by-n system
Ax = b and that IEEE double precision arithmetic is used. Equation (3.4.9) essentially
says that if the growth factor is modest then the computed solution x satisfies
(A + E)x = b, II E lloo � ull A lloo· (3.5.1)
In this section we explore the practical ramifications of this result. We begin by stress­
ing the distinction that should be made between residual size and accuracy. This is
followed by a discussion of scaling, iterative improvement, and condition estimation.
See Higham (ASNA) for a more detailed treatment of these topics.
We make two notational remarks at the outset. The infinity norm is used through­
out since it is very handy in roundoff error analysis and in practical error estimation.
Second, whenever we refer to "Gaussian elimination" in this section we really mean
Gaussian elimination with some stabilizing pivot strategy such as partial pivoting.
138 Chapter 3. General Linear Systems
3.5.1 Residual Size versus Accuracy
The residual of a computed solution x to the linear system Ax = b is the vector
b - Ax. A small residual means that Ax effectively "predicts" the right hand side b.
From Equation 3.5.1 we have II b - Ax 1100 � ull A 110011 x lloo and so we obtain
Heuristic I. Gaussian elimination produces a solution x with a relatively small resid­
ual.
Small residuals do not imply high accuracy. Combining Theorem 2.6.2 and (3.5.1), we
see that
11 x - x lloo
II X lloo
� Ull:oo(A) •
This justifies a second guiding principle.
(3.5.2)
Heuristic II. If the unit roundoff and condition satisfy u � 10-d and 11:00(A) � lOq,
then Gaussian elimination produces a solution x that has about d - q correct
decimal digits.
If u 11:00(A) is large, then we say that A is ill-conditioned with respect to the machine
precision.
As an illustration of the Heuristics I and II, consider the system
[.986 .579 l [X1 l = [.235l
.409 .237 X2 .107
in which 11:00(A) � 700 and x = [ 2, -3 ]T. Here is what we find for various machine
precisions:
u x1 x2
10-3 2.11 -3.17
10-4 1.986 -2.975
10-5 2.0019 -3.0032
10-6 2.00025 -3.00094
II x - x lloo
II x lloo
5 . 10-2
8 . 10-3
1 · 10-3
3 . 10-4
II b - Ax lloo
II A lloo ll X lloo
2.0 . 10-3
1.5 . 10-4
2.1 . 10-6
4.2 . 10-7
Whether or not to be content with the computed solution x depends on the require­
ments of the underlying source problem. In many applications accuracy is not im­
portant but small residuals are. In such a situation, the x produced by Gaussian
elimination is probably adequate. On the other hand, if the number of correct dig­
its in x is an issue, then the situation is more complicated and the discussion in the
remainder of this section is relevant.
3.5.2 Scaling
Let β be the machine base (typically β = 2) and define the diagonal matrices D1 =
diag(β^r1, ..., β^rn) and D2 = diag(β^c1, ..., β^cn). The solution to the n-by-n linear
system Ax = b can be found by solving the scaled system (D1^-1 A D2)y = D1^-1 b using
Gaussian elimination and then setting x = D2y. The scalings of A, b, and y require
only O(n²) flops and may be accomplished without roundoff. Note that D1 scales
equations and D2 scales unknowns.
It follows from Heuristic II that if x̂ and ŷ are the computed versions of x and y,
then

    ‖x̂ − x‖_D2 / ‖x‖_D2  ≈  u κ∞(D1^-1 A D2).        (3.5.3)

Thus, if κ∞(D1^-1 A D2) can be made considerably smaller than κ∞(A), then we might
expect a correspondingly more accurate x̂, provided errors are measured in the "D2"
norm defined by ‖z‖_D2 = ‖D2^-1 z‖∞. This is the objective of scaling. Note that it
encompasses two issues: the condition of the scaled problem and the appropriateness
of appraising error in the D2-norm.
An interesting but very difficult mathematical problem concerns the exact minimization
of κ_p(D1^-1 A D2) for general diagonal D1, D2 and various p. Such results as there
are in this direction are not very practical. This is hardly discouraging, however, when
we recall that (3.5.3) is a heuristic result; it makes little sense to minimize exactly a
heuristic bound. What we seek is a fast, approximate method for improving the quality
of the computed solution x̂.
One technique of this variety is simple row scaling. In this scheme D2 is the
identity and D1 is chosen so that each row in D1^-1 A has approximately the same ∞-norm.
Row scaling reduces the likelihood of adding a very small number to a very large
number during elimination, an event that can greatly diminish accuracy.
Slightly more complicated than simple row scaling is row-column equilibration.
Here, the object is to choose D1 and D2 so that the ∞-norm of each row and column
of D1^-1 A D2 belongs to the interval [1/β, 1] where β is the base of the floating point
system. For work along these lines, see McKeeman (1962).
It cannot be stressed too much that simple row scaling and row-column equilibration
do not "solve" the scaling problem. Indeed, either technique can render a worse
x̂ than if no scaling whatever is used. The ramifications of this point are thoroughly
discussed in Forsythe and Moler (SLE, Chap. 11). The basic recommendation is that
the scaling of equations and unknowns must proceed on a problem-by-problem basis.
General scaling strategies are unreliable. It is best to scale (if at all) on the basis of
what the source problem proclaims about the significance of each a_ij. Measurement
units and data error may have to be considered.
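The following Python/NumPy sketch shows simple row scaling with exact powers of the base β = 2 (D2 = I). The function name row_scale and the test matrix are invented for the illustration, and, per the warning above, the technique is shown only as an example of the mechanics, not as a general recommendation.

import numpy as np

def row_scale(A, b, beta=2.0):
    """Return D1^{-1} A and D1^{-1} b, where D1 = diag(beta**r_i) is chosen so
    that every row of the scaled matrix has infinity-norm in [1, beta)."""
    r = np.floor(np.log(np.abs(A).max(axis=1)) / np.log(beta))   # row exponents
    d = beta ** r                                                # scaling is exact in floating point
    return A / d[:, None], b / d, d

A = np.array([[1.0e6, 2.0e6],
              [3.0e-6, 4.0e-6]])
b = np.array([3.0e6, 7.0e-6])          # exact solution is x = (1, 1)

As, bs, d = row_scale(A, b)
y = np.linalg.solve(As, bs)            # D2 = I, so x = y
print(y)
print(np.linalg.cond(A, np.inf), np.linalg.cond(As, np.inf))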
3.5.3 Iterative Improvement
Suppose Ax = b has been solved via the partial pivoting factorization PA = LU and
that we wish to improve the accuracy of the computed solution x̂. If we execute

    r = b − Ax̂
    Solve Ly = Pr.
    Solve Uz = y.                        (3.5.4)
    x_new = x̂ + z

then in exact arithmetic Ax_new = Ax̂ + Az = (b − r) + r = b. Unfortunately, the naive
floating point execution of these formulae renders an x_new that is no more accurate
than x̂. This is to be expected since r̂ = fl(b − Ax̂) has few, if any, correct significant
digits. (Recall Heuristic I.) Consequently, ẑ = fl(A^-1 r̂) ≈ A^-1·noise ≈ noise is
a very poor correction from the standpoint of improving the accuracy of x̂. However,
Skeel (1980) has an error analysis that indicates when (3.5.4) gives an improved x_new
from the standpoint of backward error. In particular, if the quantity
is not too big, then (3.5.4) produces an x_new such that (A + E)x_new = b for very
small E. Of course, if Gaussian elimination with partial pivoting is used, then the
computed x already solves a nearby system. However, this may not be the case for
certain pivot strategies used to preserve sparsity. In this situation, the fixed precision
iterative improvement step (3.5.4) can be worthwhile and cheap. See Arioli, Demmel,
and Duff (1988).
In general, for (3.5.4) to produce a more accurate x, it is necessary to compute
the residual b - Ax with extended precision floating point arithmetic. Typically, this
means that if t-digit arithmetic is used to compute PA = LU, x, y, and z, then 2t-digit
arithmetic is used to form b − Ax̂. The process can be iterated. In particular, once we
have computed PA = LU and initialize x = 0, we repeat the following:
r = b - Ax (higher precision)
Solve Ly = Pr for y and Uz = y for z. (3.5.5)
x = x + z
We refer to this process as mixed-precision iterative improvement. The original A
must be used in the high-precision computation of r. The basic result concerning the
performance of (3.5.5) is summarized in the following heuristic:
Heuristic III. If the machine precision u and condition satisfy u ≈ 10^-d and κ∞(A) ≈
10^q, then after k executions of (3.5.5), x̂ has approximately min{d, k(d − q)} correct
digits if the residual computation is performed with precision u².

Roughly speaking, if uκ∞(A) ≤ 1, then iterative improvement can ultimately produce
a solution that is correct to full (single) precision. Note that the process is relatively
cheap. Each improvement costs O(n2), to be compared with the original O(n3) invest­
ment in the factorization PA = LU. Of course, no improvement may result if A is
badly conditioned with respect to the machine precision.
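Here is a Python/NumPy sketch of mixed-precision iterative improvement in the spirit of (3.5.5). Because NumPy has no native quadruple precision, the sketch reverses the roles: the factorization and correction solves are done in single precision while the residual is accumulated in double precision; the function name refine and the random test problem are chosen only for the example.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def refine(A64, b64, steps=5):
    A32 = A64.astype(np.float32)
    lu, piv = lu_factor(A32)                 # PA = LU computed in single precision
    x = np.zeros_like(b64)                   # the solution is accumulated in double precision
    for k in range(steps):
        r = b64 - A64 @ x                    # residual with the original A, higher precision
        z = lu_solve((lu, piv), r.astype(np.float32)).astype(np.float64)
        x = x + z
    return x

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n))
x_true = rng.standard_normal(n)
b = A @ x_true
x = refine(A, b)
print(np.linalg.norm(x - x_true, np.inf) / np.linalg.norm(x_true, np.inf))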
3.5.4 Condition Estimation
Suppose that we have solved Ax = b via PA = LU and that we now wish to ascertain
the number of correct digits in the computed solution x̂. It follows from Heuristic II that
in order to do this we need an estimate of the condition κ∞(A) = ‖A‖∞‖A^-1‖∞.
Computing ‖A‖∞ poses no problem as we merely use the O(n²) formula (2.3.10).
The challenge is with respect to the factor ‖A^-1‖∞. Conceivably, we could estimate
this quantity by ‖X̂‖∞, where X̂ = [ x̂1 | · · · | x̂n ] and x̂i is the computed
solution to Ax_i = e_i. (See §3.4.9.) The trouble with this approach is its expense:
κ̂∞ = ‖A‖∞‖X̂‖∞ costs about three times as much as x̂.
The central problem of condition estimation is how to estimate reliably the condition
number in O(n²) flops assuming the availability of PA = LU or one of the
factorizations that are presented in subsequent chapters. An approach described in
Forsythe and Moler (SLE, p. 51) is based on iterative improvement and the heuristic

    uκ∞(A) ≈ ‖z‖∞/‖x̂‖∞

where z is the first correction of x̂ in (3.5.5).
Cline, Moler, Stewart, and Wilkinson (1979) propose an approach to the condition
estimation problem that is based on the implication

    Ay = d   ⟹   ‖A^-1‖∞ ≥ ‖y‖∞/‖d‖∞.

The idea behind their estimator is to choose d so that the solution y is large in norm
and then set

    κ̂∞ = ‖A‖∞‖y‖∞/‖d‖∞.

The success of this method hinges on how close the ratio ‖y‖∞/‖d‖∞ is to its maximum
value ‖A^-1‖∞.
Consider the case when A = T is upper triangular. The relation between d and
y is completely specified by the following column version of back substitution:

    p(1:n) = 0
    for k = n:−1:1
        Choose d(k).
        y(k) = (d(k) − p(k))/T(k, k)                    (3.5.6)
        p(1:k−1) = p(1:k−1) + y(k)·T(1:k−1, k)
    end
Normally, we use this algorithm to solve a given triangular system Ty = d. However,
in the condition estimation setting we are free to pick the right-hand side d subject to
the "constraint" that y is large relative to d.
One way to encourage growth in y is to choose d(k) from the set {−1, +1} so as
to maximize |y(k)|. If p(k) ≥ 0, then set d(k) = −1. If p(k) < 0, then set d(k) = +1.
In other words, (3.5.6) is invoked with d(k) = −sign(p(k)). Overall, the vector d has
the form d(1:n) = [±1, ..., ±1]^T. Since this is a unit vector in the ∞-norm, we obtain
the estimate

    κ̂∞ = ‖T‖∞‖y‖∞.
A more reliable estimator results if d(k) ∈ {−1, +1} is chosen so as to encourage
growth both in y(k) and the running sum update p(1:k−1) + T(1:k−1, k)y(k). In
particular, at step k we compute

    y(k)+ = (1 − p(k))/T(k, k),
    s(k)+ = |y(k)+| + ‖p(1:k−1) + T(1:k−1, k)·y(k)+‖₁,
    y(k)− = (−1 − p(k))/T(k, k),
    s(k)− = |y(k)−| + ‖p(1:k−1) + T(1:k−1, k)·y(k)−‖₁,

and set

    y(k) = y(k)+   if s(k)+ ≥ s(k)−,
    y(k) = y(k)−   if s(k)+ < s(k)−.
This gives the following procedure.
Algorithm 3.5.1 (Condition Estimator) Let T ∈ R^{n×n} be a nonsingular upper
triangular matrix. This algorithm computes a unit ∞-norm y and a scalar κ̂ so that
‖Ty‖∞ ≈ 1/‖T^-1‖∞ and κ̂ ≈ κ∞(T).

    p(1:n) = 0
    for k = n:−1:1
        y(k)+ = (1 − p(k))/T(k, k)
        y(k)− = (−1 − p(k))/T(k, k)
        p(k)+ = p(1:k−1) + T(1:k−1, k)·y(k)+
        p(k)− = p(1:k−1) + T(1:k−1, k)·y(k)−
        if |y(k)+| + ‖p(k)+‖₁ ≥ |y(k)−| + ‖p(k)−‖₁
            y(k) = y(k)+
            p(1:k−1) = p(k)+
        else
            y(k) = y(k)−
            p(1:k−1) = p(k)−
        end
    end
    κ̂ = ‖y‖∞‖T‖∞
    y = y/‖y‖∞
The algorithm involves several times the work of ordinary back substitution.
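A zero-based Python/NumPy transcription of Algorithm 3.5.1 might look as follows (the function name condest_upper and the random test matrix are chosen for the sketch; the returned estimate is, by construction, a lower estimate of κ∞(T)).

import numpy as np

def condest_upper(T):
    n = T.shape[0]
    p = np.zeros(n)
    y = np.zeros(n)
    for k in range(n - 1, -1, -1):
        yp = (1.0 - p[k]) / T[k, k]
        ym = (-1.0 - p[k]) / T[k, k]
        pp = p[:k] + T[:k, k] * yp
        pm = p[:k] + T[:k, k] * ym
        if abs(yp) + np.abs(pp).sum() >= abs(ym) + np.abs(pm).sum():
            y[k], p[:k] = yp, pp
        else:
            y[k], p[:k] = ym, pm
    ynorm = np.abs(y).max()
    kappa_hat = ynorm * np.abs(T).sum(axis=1).max()   # ||y||_inf * ||T||_inf
    return y / ynorm, kappa_hat

rng = np.random.default_rng(1)
T = np.triu(rng.standard_normal((8, 8))) + 5 * np.eye(8)
print(condest_upper(T)[1], np.linalg.cond(T, np.inf))   # estimate vs. true kappa_inf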
We are now in a position to describe a procedure for estimating the condition of
a square nonsingular matrix A whose PA = LU factorization is available:
Step 1. Apply the lower triangular version of Algorithm 3.5.1 to U^T and
obtain a large-norm solution to U^T y = d.
Step 2. Solve the triangular systems L^T r = y, Lw = Pr, and Uz = w.
Step 3. Set κ̂∞ = ‖A‖∞‖z‖∞/‖r‖∞.

Note that ‖z‖∞ ≤ ‖A^-1‖∞‖r‖∞. The method is based on several heuristics. First,
if A is ill-conditioned and PA = LU, then it is usually the case that U is correspondingly
ill-conditioned. The lower triangle L tends to be fairly well-conditioned. Thus, it is
more profitable to apply the condition estimator to U than to L. The vector r, because
it solves A^T P^T r = d, tends to be rich in the direction of the left singular vector
associated with σ_min(A). Right-hand sides with this property render large solutions to
the problem Az = r.
In practice, it is found that the condition estimation technique that we have
outlined produces adequate order-of-magnitude estimates of the true condition number.
Problems

P3.5.1 Show by example that there may be more than one way to equilibrate a matrix.
P3.5.2 Suppose P(A + E) = LU, where P is a permutation, L is lower triangular with |ℓij| ≤ 1, and
U is upper triangular. Show that κ∞(A) ≥ ‖A‖∞/(‖E‖∞ + µ) where µ = min |uii|. Conclude that
if a small pivot is encountered when Gaussian elimination with pivoting is applied to A, then A is
ill-conditioned. The converse is not true. (Hint: Let A be the matrix Bn defined in (2.6.9).)
P3.5.3 (Kahan (1966)) The system Ax = b where

    A = [  2       −1        1     ]        [ 2(1 + 10^-10) ]
        [ −1    10^-10    10^-10   ],   b = [   −10^-10     ]
        [  1    10^-10    10^-10   ]        [    10^-10     ]

has solution x = [ 10^-10, −1, 1 ]^T. (a) Show that if (A + E)y = b and |E| ≤ 10^-8|A|, then |x − y| ≤
10^-7|x|. That is, small relative changes in A's entries do not induce large changes in x even though
κ∞(A) = 10^10. (b) Define D = diag(10^-5, 10^5, 10^5). Show that κ∞(DAD) ≤ 5. (c) Explain what is
going on using Theorem 2.6.3.
P3.5.4 Consider the matrix:
0
1
0
0
M
-M
1
0
-M
l
1 M E R .
What estimate of κ∞(T) is produced when (3.5.6) is applied with d(k) = −sign(p(k))? What estimate
does Algorithm 3.5.1 produce? What is the true κ∞(T)?
P3.5.5 What does Algorithm 3.5.1 produce when applied to the matrix Bn given in (2.6.9)?
Notes and References for §3.5
The following papers are concerned with the scaling of Ax = b problems:
F.L. Bauer (1963). "Optimally Scaled Matrices," Numer. Math. 5, 73-87.
P.A. Businger (1968). "Matrices Which Can Be Optimally Scaled,'' Numer. Math. 12, 346-48.
A. van der Sluis (1969). "Condition Numbers and Equilibration Matrices," Numer. Math. 14, 14-23.
A. van der Sluis (1970). "Condition, Equilibration, and Pivoting in Linear Algebraic Systems," Numer.
Math. 15, 74-86.
C. McCarthy and G. Strang (1973). "Optimal Conditioning of Matrices," SIAM J. Numer. Anal. 10,
370-388.
T. Fenner and G. Loizou (1974). "Some New Bounds on the Condition Numbers of Optimally Scaled
Matrices," J. ACM 21, 514-524.
G.H. Golub and J.M. Varah (1974). "On a Characterization of the Best L2-Scaling of a Matrix,"
SIAM J. Numer. Anal. 11, 472-479.
R. Skeel (1979). "Scaling for Numerical Stability in Gaussian Elimination," J. ACM 26, 494-526.
R. Skeel (1981). "Effect of Equilibration on Residual Size for Partial Pivoting,'' SIAM J. Numer.
Anal. 18, 449-55.
V. Balakrishnan and S. Boyd (1995). "Existence and Uniqueness of Optimal Matrix Scalings,'' SIAM
J. Matrix Anal. Applic. 16, 29-39.
Part of the difficulty in scaling concerns the selection of a norm in which to measure errors. An
interesting discussion of this frequently overlooked point appears in:
W. Kahan (1966). "Numerical Linear Algebra,'' Canadian Math. Bull. 9, 757-801.
For a rigorous analysis of iterative improvement and related matters, see:
C.B. Moler (1967). "Iterative Refinement in Floating Point," J. ACM 14, 316-371.
M. Jankowski and M. Wozniakowski (1977). "Iterative Refinement Implies Numerical Stability," BIT
17, 303-311.
R.D. Skeel (1980). "Iterative Refinement Implies Numerical Stability for Gaussian Elimination," Math.
Comput. 35, 817-832.
N.J. Higham (1997). "Iterative Refinement for Linear Systems and LAPACK,'' IMA J. Numer. Anal.
17, 495-509.
A. Dax (2003). "A Modified Iterative Refinement Scheme," SIAM J. Sci. Comput. 25, 1199-1213.
J. Demmel, Y. Hida, W. Kahan, X.S. Li, S. Mukherjee, and E.J. Riedy (2006). "Error Bounds from
Extra-Precise Iterative Refinement," ACM Trans. Math. Softw. 32, 325-351.
The condition estimator that we described is given in:
A.K. Cline, C.B. Moler, G.W. Stewart, and J.H. Wilkinson (1979). "An Estimate for the Condition
Number of a Matrix," SIAM J. Numer. Anal. 16, 368-375.
Other references concerned with the condition estimation problem include:
C.G. Broyden (1973). "Some Condition Number Bounds for the Gaussian Elimination Process," J.
Inst. Math. Applic. 12, 273-286.
F. Lemeire (1973). "Bounds for Condition Numbers of Triangular Value of a Matrix," Lin. Alg.
Applic. 11, 1-2.
D.P. O'Leary (1980). "Estimating Matrix Condition Numbers," SIAM J. Sci. Stat. Comput. 1,
205-209.
A.K. Cline, A.R. Conn, and C. Van Loan (1982). "Generalizing the LINPACK Condition Estimator,"
in Numerical Analysis, J.P. Hennart (ed.), Lecture Notes in Mathematics No. 909, Springer-Verlag,
New York.
A.K. Cline and R.K. Rew (1983). "A Set of Counterexamples to Three Condition Number Estimators,"
SIAM J. Sci. Stat. Comput. 4, 602-611.
W. Hager (1984). "Condition Estimates," SIAM J. Sci. Stat. Comput. 5, 311-316.
N.J. Higham (1987). "A Survey of Condition Number Estimation for Triangular Matrices," SIAM
Review 29, 575-596.
N.J. Higham (1988). "FORTRAN Codes for Estimating the One-Norm of a Real or Complex Matrix
with Applications to Condition Estimation (Algorithm 674)," ACM Trans. Math. Softw. 14,
381-396.
C.H. Bischof (1990). "Incremental Condition Estimation," SIAM J. Matrix Anal. Applic. 11, 312-322.
G. Auchmuty (1991). "A Posteriori Error Estimates for Linear Equations," Numer. Math. 61, 1-6.
N.J. Higham (1993). "Optimization by Direct Search in Matrix Computations," SIAM J. Matrix
Anal. Applic. 14, 317-333.
D.J. Higham (1995). "Condition Numbers and Their Condition Numbers," Lin. Alg. Applic. 214,
193-213.
G.W. Stewart (1997). "The Triangular Matrices of Gaussian Elimination and Related Decompositions,"
IMA J. Numer. Anal. 17, 7-16.
3.6 Parallel LU
In §3.2.11 we show how to organize a block version of Gaussian elimination (without
pivoting) so that the overwhelming majority of flops occur in the context of matrix
multiplication. It is possible to incorporate partial pivoting and maintain the same
level-3 fraction. After stepping through the derivation we proceed to show how the
process can be effectively parallelized using the block-cyclic distribution ideas that
were presented in §1.6.
3.6.1 Block LU with Pivoting
Throughout this section assume A ∈ R^{n×n} and, for clarity, that n = rN:

    A = [ A11  · · ·  A1N ]
        [  ⋮            ⋮ ],        Aij ∈ R^{r×r}.        (3.6.1)
        [ AN1  · · ·  ANN ]
We revisit Algorithm 3.2.4 (nonrecursive block LU) and show how to incorporate partial
pivoting.
The first step starts by applying scalar Gaussian elimination with partial pivoting
to the first block column. Using an obvious rectangular matrix version of Algorithm
3.4.1 we obtain the following factorization:

    P1 [ A11 ]   [ L11 ]
       [  ⋮  ] = [  ⋮  ] U11.        (3.6.2)
       [ AN1 ]   [ LN1 ]

In this equation, P1 ∈ R^{n×n} is a permutation, L11 ∈ R^{r×r} is unit lower triangular, and
U11 ∈ R^{r×r} is upper triangular.
The next task is to compute the first block row of U. To do this we set

    [ Ã12  · · ·  Ã1N ]        [ A12  · · ·  A1N ]
    [  ⋮            ⋮  ] = P1 [  ⋮            ⋮  ],        Ãij ∈ R^{r×r},        (3.6.3)
    [ ÃN2  · · ·  ÃNN ]        [ AN2  · · ·  ANN ]

and solve the lower triangular multiple-right-hand-side problem

    L11 [ U12 | · · · | U1N ] = [ Ã12 | · · · | Ã1N ]        (3.6.4)

for U12, ..., U1N ∈ R^{r×r}. At this stage it is easy to show that we have the partial
factorization

    P1A = [ L11   0   · · ·   0  ] [ U11   U12  · · ·  U1N ]
          [ L21   Ir             ] [  0                     ]
          [  ⋮           ⋱       ] [  ⋮        A^(new)       ]
          [ LN1   0   · · ·   Ir ] [  0                     ]

where

    A^(new) = [ Ã22  · · ·  Ã2N ]   [ L21 ]
              [  ⋮            ⋮  ] − [  ⋮  ] [ U12 | · · · | U1N ].        (3.6.5)
              [ ÃN2  · · ·  ÃNN ]   [ LN1 ]

Note that the computation of A^(new) is a level-3 operation as it involves one matrix
multiplication per Ã-block.
The remaining task is to compute the pivoted LU factorization of A^(new). Indeed, if

    P^(new) A^(new) = L^(new) U^(new),

then

    PA = [         L11                     0      ] [ U11   U12 · · · U1N ]
         [ P^(new)[L21; ... ; LN1]      L^(new)   ] [  0        U^(new)    ]

is the pivoted block LU factorization of A with

    P = [ Ir      0      ] P1.
        [ 0    P^(new)   ]
In general, the processing of each block column in A is a four-part calculation:
Part A. Apply rectangular Gaussian Elimination with partial pivoting to a block
column of A. This produces a permutation, a block column of L, and a diagonal
block of U. See (3.6.2).
Part B. Apply the Part A permutation to the "rest of A." See (3.6.3).
Part C. Complete the computation of U's next block row by solving a lower trian­
gular multiple right-hand-side problem. See (3.6.4).
Part D. Using the freshly computed L-blocks and U-blocks, update the "rest of A."
See (3.6.5).
The precise formulation of the method with overwriting is similar to Algorithm 3.2.4
and is left as an exercise.
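As an illustration only, here is one way the Parts A-D organization might be written down in Python/NumPy for a single-processor setting (this is a sketch, not the book's reference code; it leans on scipy.linalg.lu for the panel factorization, whose convention A = P·L·U differs from PA = LU by a transpose of the permutation).

import numpy as np
from scipy.linalg import lu, solve_triangular

def block_lu_pp(A, r):
    """Return (perm, F) with A[perm, :] = L @ U, where L (unit lower) and U are packed in F."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    perm = np.arange(n)
    for k in range(0, n, r):
        kb = min(k + r, n)
        rb = kb - k
        # Part A: LU with partial pivoting of the tall panel A[k:, k:kb]
        p, l, u = lu(A[k:, k:kb])               # panel = p @ l @ u
        idx = p.argmax(axis=0)                  # panel[idx, :] = l @ u
        # Part B: apply the panel's row permutation across all columns
        A[k:, :] = A[k:, :][idx, :]
        perm[k:] = perm[k:][idx]
        A[k:kb, k:kb] = u + np.tril(l[:rb, :], -1)      # U11 and (unit) L11
        A[kb:, k:kb] = l[rb:, :]                        # L21 block column
        if kb < n:
            # Part C: block row of U via a unit lower triangular solve
            A[k:kb, kb:] = solve_triangular(l[:rb, :], A[k:kb, kb:],
                                            lower=True, unit_diagonal=True)
            # Part D: level-3 update of the trailing matrix
            A[kb:, kb:] -= A[kb:, k:kb] @ A[k:kb, kb:]
    return perm, A

rng = np.random.default_rng(0)
M = rng.standard_normal((9, 9))
perm, F = block_lu_pp(M, r=3)
L = np.tril(F, -1) + np.eye(9)
U = np.triu(F)
print(np.allclose(M[perm, :], L @ U))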
3.6.2 Parallelizing the Pivoted Block LU Algorithm
Recall the discussion of the block-cyclic distribution in §1.6.2 where the parallel com­
putation of the matrix multiplication update C = C + AB was outlined. To provide
insight into how the pivoted block LU algorithm can be parallelized, we examine a rep­
resentative step in a small example that also makes use of the block-cyclic distribution.
Assume that N = 8 in (3.6.1) and that we have a p_row-by-p_col processor network
with p_row = 2 and p_col = 2. At the start, the blocks of A = (Aij) are cyclically
distributed as shown in Figure 3.6.1. Assume that we have carried out two steps of
block LU and that the computed Lij and Uij have overwritten the corresponding A­
blocks. Figure 3.6.2 displays the situation at the start of the third step. Blocks that
are to participate in the Part A factorization

    P3 [ Ã33 ]   [ L33 ]
       [  ⋮  ] = [  ⋮  ] U33
       [ Ã83 ]   [ L83 ]

are highlighted. Typically, p_row processors are involved and since the blocks are each
r-by-r, there are r steps as shown in (3.6.6).
Figure 3.6.1.
Part A:
Figure 3.6.2.
    for j = 1:r
        Columns Akk(:, j), ..., AN,k(:, j) are assembled in
            the processor housing Akk, the "pivot processor".
        The pivot processor determines the required row interchange and
            the Gauss transform vector.
        The swapping of the two A-rows may require the involvement of
            two processors in the network.                                    (3.6.6)
        The appropriate part of the Gauss vector together with
            Akk(j, j:r) is sent by the pivot processor to the
            processors that house Ak+1,k, ..., AN,k.
        The processors that house Akk, ..., AN,k carry out their
            share of the update, a local computation.
    end
Upon completion, the parallel execution of Parts B and C follows. In the Part B
computation, those blocks that may be involved in the row swapping have been highlighted.
See Figure 3.6.3. This overhead generally engages the entire processor network, al­
though communication is local to each processor column.
Part B:
Figure 3.6.3.
Note that Part C involves just a single processor row while the "big" level-three update
that follows typically involves the entire processor network. See Figures 3.6.4 and 3.6.5.
Part C:
Figure 3.6.4.
Part D:
Figure 3.6.5.
The communication overhead associated with Part D is masked by the matrix multi­
plications that are performed on each processor.
This completes the k = 3 step of parallel block LU with partial pivoting. The
process can obviously be repeated on the trailing 5-by-5 block matrix. The virtues of
the block-cyclic distribution are revealed through the schematics. In particular, the
dominating level-3 step (Part D) is load balanced for all but the last few values of
k. Subsets of the processor grid are used for the "smaller," level-2 portions of the
computation.
We shall not attempt to predict the fraction of time that is devoted to these
computations or the propagation of the interchange permutations. Enlightenment in
this direction requires benchmarking.
3.6.3 Tournament Pivoting
The decomposition via partial pivoting in Step A requires a lot of communication. An
alternative that addresses this issue involves a strategy called tournament pivoting.
Here is the main idea. Suppose we want to compute PW = LU where the blocks of

    W = [ W1 ]
        [ W2 ]
        [ W3 ]
        [ W4 ]

are distributed around some network of processors. Assume that each Wi has many
more rows than columns. The goal is to choose r rows from W that can serve as pivot
rows. If we compute the "local" factorizations

    P1W1 = L1U1,    P2W2 = L2U2,    P3W3 = L3U3,    P4W4 = L4U4

via Gaussian elimination with partial pivoting, then the top r rows of the matrices
P1W1, P2W2, P3W3, and P4W4 are pivot row candidates. Call these square matrices
W′1, W′2, W′3, and W′4 and note that we have reduced the number of possible pivot rows
from n to 4r.
Next we compute the factorizations

    P12 W12 = P12 [ W′1 ] = L12U12,        P34 W34 = P34 [ W′3 ] = L34U34
                  [ W′2 ]                                [ W′4 ]

and recognize that the top r rows of P12W12 and the top r rows of P34W34 are even
better pivot row candidates. Assemble these 2r rows into a matrix W1234 and compute

    P1234W1234 = L1234U1234.

The top r rows of P1234W1234 are then the chosen pivot rows for the LU reduction of
W.
Of course, there are communication overheads associated with each round of the
"tournament," but the volume of interprocessor data transfers is much reduced. See
Demmel, Grigori, and Xiang (2010).
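A serial Python/NumPy sketch of the pivot-row selection "tournament" is given below. The function names and the choice of four blocks follow the description above; an actual implementation would distribute the blocks and run each round of local factorizations in parallel.

import numpy as np
from scipy.linalg import lu

def candidate_rows(W, r):
    """Indices (into W) of the top r rows of P @ W, where P W = L U with partial pivoting."""
    p, _, _ = lu(W)                       # scipy convention: W = p @ l @ u
    order = p.argmax(axis=0)              # (p^T W)[i] = W[order[i]]
    return order[:r]

def tournament_pivot_rows(W, r, num_blocks=4):
    m = W.shape[0]
    # round 0: local candidates from each block of rows
    blocks = np.array_split(np.arange(m), num_blocks)
    cand = [idx[candidate_rows(W[idx], r)] for idx in blocks]
    # later rounds: merge candidate sets pairwise until one set of r rows remains
    while len(cand) > 1:
        merged = []
        for i in range(0, len(cand), 2):
            idx = np.concatenate(cand[i:i + 2])
            merged.append(idx[candidate_rows(W[idx], r)])
        cand = merged
    return cand[0]                        # r global pivot rows for the LU reduction of W

rng = np.random.default_rng(0)
W = rng.standard_normal((40, 5))
print(tournament_pivot_rows(W, r=5))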
Problems
P3.6.1 In §3.6.1 we outlined a single step of block LU with partial pivoting. Specify a complete
version of the algorithm.
P3.6.2 Regarding parallel block LU with partial pivoting, why is it better to "collect" all the per-
mutations in Part A before applying them across the remaining block columns? In other words, why
not propagate the Part A permutations as they are produced instead of having Part B, a separate
permutation application step?
P3.6.3 Review the discussion about parallel shared memory computing in §1.6.5 and §1.6.6. Develop a
shared memory version of Algorithm 3.2.1. Designate one processor for computation of the multipliers
and a load-balanced scheme for the rank-1 update in which all the processors participate. A barrier
is necessary because the rank-1 update cannot proceed until the multipliers are available. What if
partial pivoting is incorporated?
Notes and References for §3.6
See the scaLAPACK manual for a discussion of parallel Gaussian elimination as well as:
J. Ortega (1988). Introduction to Parallel and Vector Solution of Linear Systems, Plenum Press, New
York.
K. Gallivan, W. Jalby, U. Meier, and A.H. Sameh (1988). "Impact of Hierarchical Memory Systems
on Linear Algebra Algorithm Design," Int. J. Supercomput. Applic. 2, 12-48.
J. Dongarra, I. Duff, D. Sorensen, and H. van der Vorst (1990). Solving Linear Systems on Vector
and Shared Memory Computers, SIAM Publications, Philadelphia, PA.
Y. Robert (1990). The Impact of Vector and Parallel Architectures on the Gaussian Elimination
Algorithm, Halsted Press, New York.
J. Choi, J.J. Dongarra, L.S. Osttrouchov, A.P. Petitet, D.W. Walker, and R.C. Whaley (1996). "Design
and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines," Scientific
Programming, 5, 173-184.
X.S. Li (2005). "An Overview of SuperLU: Algorithms, Implementation, and User Interface," ACM
Trans. Math. Softw. 31, 302-325.
S. Tomov, J. Dongarra, and M. Baboulin (2010). "Towards Dense Linear Algebra for Hybrid GPU
Accelerated Manycore Systems," Parallel Comput. 36, 232-240.
The tournament pivoting strategy is a central feature of the optimized LU implementation discussed
in:
J. Demmel, L. Grigori, and H. Xiang (2011). "CALU: A Communication Optimal LU Factorization
Algorithm," SIAM J. Matrix Anal. Applic. 32, 1317-1350.
E. Solomonik and J. Demmel (2011). "Communication-Optimal Parallel 2.5D Matrix Multiplication
and LU Factorization Algorithms," in Euro-Par 2011 Parallel Processing, Lecture Notes in Computer
Science 6853, 90-109.
Chapter 4
Special Linear Systems
4.1 Diagonal Dominance and Symmetry
4.2 Positive Definite Systems
4.3 Banded Systems
4.4 Symmetric Indefinite Systems
4.5 Block Tridiagonal Systems
4.6 Vandermonde Systems
4.7 Classical Methods for Toeplitz Systems
4.8 Circulant and Discrete Poisson Systems
It is a basic tenet of numerical analysis that solution procedures should exploit
structure whenever it is present. In numerical linear algebra, this translates into an
expectation that algorithms for general linear systems can be streamlined in the presence
of such properties as symmetry, definiteness, and bandedness. Two themes prevail:
• There are important classes of matrices for which it is safe not to pivot when
computing the LU or a related factorization.
• There are important classes of matrices with highly structured LU factorizations
that can be computed quickly, sometimes very quickly.
Challenges arise when a fast, but unstable, LU factorization is available.
Symmetry and diagonal dominance are prime examples of exploitable matrix
structure and we use these properties to introduce some key ideas in §4.1. In §4.2 we
examine the case when A is both symmetric and positive definite, deriving the stable
Cholesky factorization. Unsymmetric positive definite systems are also investigated.
In §4.3, banded versions of the LU and Cholesky factorizations are discussed and this
is followed in §4.4 with a treatment of the symmetric indefinite problem. Block ma­
trix ideas and sparse matrix ideas come together when the matrix of coefficients is
block tridiagonal. This important class of systems receives a special treatment in §4.5.
Classical methods for Vandermonde and Toeplitz systems are considered in §4.6 and
§4.7. In §4.8 we connect the fast transform discussion in §1.4 to the problem of solving
circulant systems and systems that arise when the Poisson problem is discretized using
finite differences.
Before we get started, we clarify some terminology associated with structured
problems that pertains to this chapter and beyond. Banded matrices and block-banded
matrices are examples of sparse matrices, meaning that the vast majority of their entries
are zero. Linear equation methods that are appropriate when the zero-nonzero pattern
is more arbitrary are discussed in Chapter 11. Toeplitz, Vandermonde, and circulant
matrices are data sparse. A matrix A E IRmxn is data sparse if it can be parameterized
with many fewer than O(mn) numbers. Cauchy-like systems and semiseparable systems
are considered in §12.1 and §12.2.
Reading Notes
Knowledge of Chapters 1, 2, and 3 is assumed. Within this chapter there are the
following dependencies:
    §4.1  →  §4.2  →  §4.3  →  §4.4
      ↓        ↓
    §4.6     §4.5  →  §4.7  →  §4.8
Global references include Stewart (MABD), Higham (ASNA), Watkins (FMC), Trefethen
and Bau (NLA), Demmel (ANLA), and Ipsen (NMA).
4.1 Diagonal Dominance and Symmetry
Pivoting is a serious concern in the context of high-performance computing because
the cost of moving data around rivals the cost of computation. Equally important,
pivoting can destroy exploitable structure. For example, if A is symmetric, then it
involves half the data of a general A. Our intuition (correctly) tells us that we should
be able to solve a symmetric Ax = b problem with half the arithmetic. However, in
the context of Gaussian elimination with pivoting, symmetry can be destroyed at the
very start of the reduction, e.g.,
    [ 0  0  1 ] [ a  b  c ]   [ c  e  f ]
    [ 0  1  0 ] [ b  d  e ] = [ b  d  e ].
    [ 1  0  0 ] [ c  e  f ]   [ a  b  c ]
Taking advantage of symmetry and other patterns and identifying situations where
pivoting is unnecessary are typical activities in the realm of structured Ax = b solving.
The goal is to expose computational shortcuts and to justify their use through analysis.
4.1.1 Diagonal Dominance and the LU Factorization
If A's diagonal entries are large compared to its off-diagonal entries, then we anticipate
that it is safe to compute A = LU without pivoting. Consider the n = 2 case:

    [ a  b ]   [  1    0 ] [ a       b       ]
    [ c  d ] = [ c/a   1 ] [ 0   d − (c/a)b  ].
If a and d "dominate" b and c in magnitude, then the elements of L and U will be
nicely bounded. To quantify this we make a definition. We say that A ∈ R^{n×n} is row
diagonally dominant if

    |a_ii|  ≥  Σ_{j=1, j≠i}^{n} |a_ij|,        i = 1:n.        (4.1.1)

Similarly, column diagonal dominance means that |a_jj| is at least as large as the sum
of the off-diagonal element magnitudes in the same column. If these inequalities are strict,
then A is strictly (row/column) diagonally dominant. A diagonally dominant matrix
can be singular, e.g., the 2-by-2 matrix of 1's. However, if a nonsingular matrix is
diagonally dominant, then it has a "safe" LU factorization.
Theorem 4.1.1. If A is nonsingular and column diagonally dominant, then it has an
LU factorization and the entries in L = (ℓ_ij) satisfy |ℓ_ij| ≤ 1.
Proof. We proceed by induction. The theorem is obviously true if n = 1. Assume
that it is true for (n - 1)-by-(n - 1) nonsingular matrices that are column diagonally
dominant. Partition A ∈ R^{n×n} as follows:

    A = [ α  w^T ]
        [ v   C  ],        α ∈ R,  v, w ∈ R^{n−1},  C ∈ R^{(n−1)×(n−1)}.

If α = 0, then v = 0 and A is singular. Thus, α ≠ 0 and we have the factorization

    A = [  1      0     ] [ α  w^T ]
        [ v/α  I_{n−1}  ] [ 0   B  ],        B = C − (1/α) v w^T.        (4.1.2)

Since det(A) = α·det(B), it follows that B is nonsingular. It is also column diagonally
dominant because

    Σ_{i≠j} |b_ij|  =  Σ_{i≠j} |c_ij − v_i w_j/α|
                    ≤  Σ_{i≠j} |c_ij|  +  (|w_j|/|α|) Σ_{i≠j} |v_i|
                    ≤  (|c_jj| − |w_j|)  +  (|w_j|/|α|)(|α| − |v_j|)
                    =  |c_jj| − |w_j v_j|/|α|  ≤  |c_jj − v_j w_j/α|  =  |b_jj|.

By induction, B has an LU factorization L1U1 and so from (4.1.2) we have

    A = [  1    0  ] [ α  w^T ]
        [ v/α   L1 ] [ 0   U1 ]  =  LU.

The entries in |v/α| are bounded by 1 because A is column diagonally dominant. By
induction, the same can be said about the entries in |L1|. Thus, the entries in |L| are
all bounded by 1, completing the proof. □
The theorem shows that Gaussian elimination without pivoting is a stable solution
procedure for a column diagonally dominant matrix. If the diagonal elements strictly
dominate the off-diagonal elements, then we can actually bound ‖A^-1‖.
Theorem 4.1.2. If A ∈ R^{n×n} and

    δ  =  min_{1≤j≤n} ( |a_jj| − Σ_{i≠j} |a_ij| )  >  0,        (4.1.3)

then ‖A^-1‖₁ ≤ 1/δ.

Proof. Define D = diag(a11, ..., ann) and E = A − D. If e is the column n-vector of
1's, then

    e^T|E|  ≤  e^T|D| − δe^T.

If x ∈ R^n, then Dx = Ax − Ex and

    |D| |x|  ≤  |Ax| + |E| |x|.

Thus,

    e^T|D| |x|  ≤  e^T|Ax| + e^T|E| |x|  ≤  ‖Ax‖₁ + (e^T|D| − δe^T)|x|

and so δ‖x‖₁ = δe^T|x| ≤ ‖Ax‖₁. The bound on ‖A^-1‖₁ follows from the fact that
for any y ∈ R^n,

    δ‖A^-1 y‖₁  ≤  ‖A A^-1 y‖₁  =  ‖y‖₁. □

The "dominance" factor δ defined in (4.1.3) is important because it has a bearing on
the condition of the linear system. Moreover, if it is too small, then diagonal dominance
may be lost during the elimination process because of roundoff. That is, the computed
version of the B matrix in (4.1.2) may not be column diagonally dominant.
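To see Theorem 4.1.1 at work numerically, the following Python/NumPy sketch runs Gaussian elimination without pivoting on a matrix that has been forced to be strictly column diagonally dominant and checks that the multipliers are bounded by 1 (the helper lu_nopivot and the test matrix are invented for the example).

import numpy as np

def lu_nopivot(A):
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for k in range(n - 1):
        A[k+1:, k] /= A[k, k]                                   # multipliers
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])       # rank-1 update
    return np.tril(A, -1) + np.eye(n), np.triu(A)

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
np.fill_diagonal(A, np.abs(A).sum(axis=0) + 1.0)   # force strict column diagonal dominance

L, U = lu_nopivot(A)
print(np.max(np.abs(L)), np.allclose(L @ U, A))    # max |l_ij| <= 1, factorization exact-ish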
4.1.2 Symmetry and the LDLT Factorization
If A is symmetric and has an LU factorization A = LU, then L and U have a connection.
For example, if n = 2 we have

    [ a  c ]   [  1    0 ] [ a       c       ]   [  1    0 ] ( [ a       0       ] [ 1  c/a ] )
    [ c  d ] = [ c/a   1 ] [ 0   d − (c/a)c  ] = [ c/a   1 ] ( [ 0   d − (c/a)c  ] [ 0   1  ] ).

It appears that U is a row scaling of L^T. Here is a result that makes this precise.
Theorem 4.1.3 (LDL^T Factorization). If A ∈ R^{n×n} is symmetric and the principal
submatrix A(1:k, 1:k) is nonsingular for k = 1:n−1, then there exists a unit lower
triangular matrix L and a diagonal matrix

    D = diag(d1, ..., dn)

such that A = LDL^T. The factorization is unique.

Proof. By Theorem 3.2.1 we know that A has an LU factorization A = LU. Since the
matrix

    UL^-T = L^-1 A L^-T

is both symmetric and upper triangular, it must be diagonal. The theorem follows by
setting D = UL^-T and from the uniqueness of the LU factorization. □

Note that once we have the LDL^T factorization, then solving Ax = b is a 3-step process:

    Lz = b,        Dy = z,        L^T x = y.

This works because Ax = L(D(L^T x)) = L(Dy) = Lz = b.
Because there is only one triangular matrix to compute, it is not surprising that
the factorization A = LDLT requires half as many flops to compute as A = LU. To
see this we derive a gaxpy-rich procedure that, for j = 1:n, computes L(j+1:n, j) and
d_j in step j. Note that

    A(j:n, j) = L(j:n, 1:j)·v(1:j)

where

    v(1:j) = D(1:j, 1:j)·L(j, 1:j)^T,        i.e.,  v(k) = d_k ℓ_jk,  k = 1:j.

From this we conclude that

    d_j = a_jj − Σ_{k=1}^{j−1} d_k ℓ_jk².

With d_j available, we can rearrange the equation

    A(j+1:n, j) = L(j+1:n, 1:j)·v(1:j) = L(j+1:n, 1:j−1)·v(1:j−1) + d_j·L(j+1:n, j)

to get a recipe for L(j+1:n, j):

    L(j+1:n, j) = ( A(j+1:n, j) − L(j+1:n, 1:j−1)·v(1:j−1) ) / d_j.
Properly sequenced, we obtain the following overall procedure:
for j = l:n
for i = l:j - 1
v(i) = L(j, i) · d(i)
end
d(j) = A(j, j) - L(j, l:j - l) ·v(l:j - 1)
L(j + l:n,j) = (A(j + l:n,j) - L(j + l:n, l:j - l)·v(l:j - 1))/d(j)
end
With overwriting we obtain the following procedure.
Algorithm 4.1.1 (LDL^T) If A ∈ R^{n×n} is symmetric and has an LU factorization, then
this algorithm computes a unit lower triangular matrix L and a diagonal matrix D =
diag(d1, ..., dn) so A = LDL^T. The entry a_ij is overwritten with ℓ_ij if i > j and with
d_i if i = j.
for j = l:n
for i = l:j - 1
v(i) = A(j, i)A(i, i)
end
A(j,j) = A(j,j) - A(j, l:j - l)·v(l:j - 1)
A(j + l:n,j) = (A(j + l:n,j) - A(j + l:n, l:j - l) ·v(l:j - 1))/A(j, j)
end
This algorithm requires n3/3 flops, about half the number of flops involved in Gaussian
elimination.
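A direct Python/NumPy transcription of the gaxpy LDL^T procedure (without the overwriting of Algorithm 4.1.1) might look as follows; the function name ldlt and the 3-by-3 test matrix are chosen for the sketch.

import numpy as np

def ldlt(A):
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        v = L[j, :j] * d[:j]                           # v(k) = l_jk * d_k
        d[j] = A[j, j] - L[j, :j] @ v
        L[j+1:, j] = (A[j+1:, j] - L[j+1:, :j] @ v) / d[j]
    return L, d

A = np.array([[4., 2., 2.],
              [2., 5., 3.],
              [2., 3., 6.]])
L, d = ldlt(A)
print(np.allclose(L @ np.diag(d) @ L.T, A))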
The computed solution x̂ to Ax = b obtained via Algorithm 4.1.1 and the usual
triangular system solvers of §3.1 can be shown to satisfy a perturbed system (A + E)x̂ =
b, where
(4.1.4)
and L̂ and D̂ are the computed versions of L and D, respectively.
As in the case of the LU factorization considered in the previous chapter, the
upper bound in (4.1.4) is without limit unless A has some special property that guar­
antees stability. In the next section, we show that if A is symmetric and positive
definite, then Algorithm 4.1.1 not only runs to completion, but is extremely stable. If
A is symmetric but not positive definite, then, as we discuss in §4.4, it is necessary to
consider alternatives to the LDLT factorization.
Problems
P4.1.1 Show that if all the inequalities in (4.1.1) are strict inequalities, then A is nonsingular.
P4.1.2 State and prove a result similar to Theorem 4.1.2 that applies to a row diagonally dominant
matrix. In particular, show that ‖A^-1‖∞ ≤ 1/δ where δ measures the strength of the row diagonal
dominance as defined in Equation (4.1.3).
P4.1.3 Suppose A is column diagonally dominant, symmetric, and nonsingular and that A = LDL^T.
What can you say about the size of the entries in L and D? Give the smallest upper bound you can for
‖L‖₁.
Notes and References for §4.1
The unsymmetric analog of Algorithm 4.1.2 is related to the methods of Crout and Doolittle. See
Stewart (IMC, pp. 131 - 149) and also:
G.E. Forsythe {1960). "Crout with Pivoting," Commun. ACM 3, 507-508.
W.M. McKeeman {1962). "Crout with Equilibration and Iteration," Commun. A CM 5, 553-555.
H.J. Bowdler, R.S. Martin, G. Peters, and J.H. Wilkinson {1966), "Solution of Real and Complex
Systems of Linear Equations," Numer. Math. 8, 217-234.
Just as algorithms can be tailored to exploit structure, so can error analysis and perturbation theory:
C. de Boor and A. Pinkus (1977). "A Backward Error Analysis for Totally Positive Linear Systems,"
Numer. Math. 27, 485-490.
J.R. Bunch, J.W. Demmel, and C.F. Van Loan {1989). "The Strong Stability ofAlgorithms for Solving
Symmetric Linear Systems," SIAM J. Matrix Anal. Applic. 10, 494-499.
A. Barrlund {1991). "Perturbation Bounds for the LDLT and LU Decompositions," BIT 31, 358-363.
D.J. Higham and N.J. Higham {1992). "Backward Error and Condition ofStructured Linear Systems,"
SIAM J. Matrix Anal. Applic. 13, 162-175.
J.M. Pena {2004). "LDU Decompositions with L and U Well Conditioned," ETNA 18, 198-208.
J-G. Sun (2004). "A Note on Backward Errors for Structured Linear Systems," Numer. Lin. Alg.
12, 585-603.
R. Canto, P. Koev, B. Ricarte, and M. Urbano {2008). "LDU Factorization of Nonsingular Totally
Positive Matrices," SIAM J. Matrix Anal. Applic. 30, 777-782.
Numerical issues associated with the factorization of a diagonally dominant matrix are discussed
in:
J.M. Pena (1998). "Pivoting Strategies Leading to Diagonal Dominance by Rows," Numer. Math.
81, 293-304.
M. Mendoza, M. Raydan, and P. Tarazaga (1999). "Computing the Nearest Diagonally Dominant
Matrix," Numer. Lin. Alg. 5, 461-474.
A. George and K.D. Ikramov {2005). "Gaussian Elimination Is Stable for the Inverse of a Diagonally
Dominant Matrix," Math. Comput. 73, 653-657.
J.M. Pena {2007). "Strict Diagonal Dominance and Optimal Bounds for the Skeel Condition Number,"
SIAM J. Numer. Anal. 45, 1107-1108.
F. Dopico and P. Koev {2011). "Perturbation Theory for the LDU Factorization and Accurate Com­
putations for Diagonally Dominant Matrices," Numer. Math. 119, 337-371.
4.2 Positive Definite Systems
A matrix A ∈ R^{n×n} is positive definite if x^T A x > 0 for all nonzero x ∈ R^n, positive
semidefinite if x^T A x ≥ 0 for all x ∈ R^n, and indefinite if we can find x, y ∈ R^n so
(x^T A x)(y^T A y) < 0. Symmetric positive definite systems constitute one of the most
important classes of special Ax = b problems. Consider the 2-by-2 symmetric case. If

    A = [ α   β ]
        [ β   γ ]

is positive definite, then

    x = [ 1,  0 ]^T    ⟹    x^T A x = α > 0,
    x = [ 0,  1 ]^T    ⟹    x^T A x = γ > 0,
    x = [ 1,  1 ]^T    ⟹    x^T A x = α + 2β + γ > 0,
    x = [ 1, −1 ]^T    ⟹    x^T A x = α − 2β + γ > 0.
The last two equations imply |β| ≤ (α + γ)/2. From these results we see that the largest
entry in A is on the diagonal and that it is positive. This turns out to be true in general.
(See Theorem 4.2.8 below.) A symmetric positive definite matrix has a diagonal that is
sufficiently "weighty" to preclude the need for pivoting. A special factorization called
the Cholesky factorization is available for such matrices. It exploits both symmetry and
definiteness and its implementation is the main focus of this section. However, before
those details are pursued we discuss unsymmetric positive definite matrices. This class
of matrices is important in its own right and presents interesting pivot-related
issues.
4.2.1 Positive Definiteness
Suppose A E JR.nxn is positive definite. It is obvious that a positive definite matrix is
nonsingular for otherwise we could find a nonzero x so xTAx = 0. However, much
more is implied by the positivity of the quadratic form xTAx as the following results
show.
Theorem 4.2.1. If A ∈ R^{n×n} is positive definite and X ∈ R^{n×k} has rank k, then
B = X^T A X ∈ R^{k×k} is also positive definite.
Proof. If z ∈ R^k satisfies 0 ≥ z^T B z = (Xz)^T A (Xz), then Xz = 0. But since X has
full column rank, this implies that z = 0. □
Corollary 4.2.2. IfA is positive definite, then all its principal submatrices are positive
definite. In particular, all the diagonal entries are positive.
Proof. If v is an integer length-k vector with 1 ≤ v1 < · · · < vk ≤ n, then X = In(:, v)
is a rank-k matrix made up of columns v1, ..., vk of the identity. It follows from
Theorem 4.2.1 that A(v, v) = X^T A X is positive definite. □
Theorem 4.2.3. The matrix A ∈ R^{n×n} is positive definite if and only if the symmetric
matrix

    T = (A + A^T)/2

has positive eigenvalues.

Proof. Note that x^T A x = x^T T x. If Tx = λx with x ≠ 0, then x^T A x = λ·x^T x. Thus, if A is
positive definite then λ is positive. Conversely, suppose T has positive eigenvalues and
Q^T T Q = diag(λ_i) is its Schur decomposition. (See §2.1.7.) It follows that if x ∈ R^n
is nonzero and y = Q^T x, then

    x^T A x = x^T T x = y^T (Q^T T Q) y = Σ_{k=1}^{n} λ_k y_k² > 0,

completing the proof of the theorem. □
Corollary 4.2.4. If A is positive definite, then it has an LU factorization and the
diagonal entries of U are positive.
Proof. From Corollary 4.2.2, it follows that the submatrices A(1:k, 1:k) are nonsingular
for k = 1:n and so from Theorem 3.2.1 the factorization A = LU exists. If we apply
Theorem 4.2.1 with X = (L^-1)^T = L^-T, then B = X^T A X = L^-1(LU)L^-T = UL^-T
is positive definite and therefore has positive diagonal entries. The corollary follows
because L^-T is unit upper triangular and this implies b_ii = u_ii, i = 1:n. □
The mere existence of an LU factorization does not mean that its computation
is advisable because the resulting factors may have unacceptably large elements. For
example, if ε > 0, then the matrix

    A = [  ε   m ]   [   1     0 ] [ ε       m      ]
        [ −m   ε ] = [ −m/ε    1 ] [ 0   ε + m²/ε   ]

is positive definite. However, if m/ε ≫ 1, then it appears that some kind of pivoting
is in order. This prompts us to pose an interesting question. Are there conditions
that guarantee when it is safe to compute the LU-without-pivoting factorization of a
positive definite matrix?
4.2.2 Unsymmetric Positive Definite Systems
The positive definiteness of a general matrix A is inherited from its symmetric part:

    T = (A + A^T)/2.

Note that for any square matrix we have A = T + S where

    S = (A − A^T)/2

is the skew-symmetric part of A. Recall that a matrix S is skew-symmetric if S^T = −S.
If S is skew-symmetric, then x^T S x = 0 for all x ∈ R^n and s_ii = 0, i = 1:n. It follows
that A is positive definite if and only if its symmetric part is positive definite.
The derivation and analysis of methods for positive definite systems require an
understanding about how the symmetric and skew-symmetric parts interact during the
LU process.
Theorem 4.2.5. Suppose

    A = [    α      (v − w)^T ]
        [  v + w      B + C   ]

is positive definite and that B ∈ R^{(n−1)×(n−1)} is symmetric and C ∈ R^{(n−1)×(n−1)} is
skew-symmetric. Then it follows that

    A = [      1         0      ] [ α   (v − w)^T ]
        [ (v + w)/α   I_{n−1}   ] [ 0    B1 + C1  ]        (4.2.1)

where

    B1 = B − (v v^T − w w^T)/α        (4.2.2)

is symmetric positive definite and

    C1 = C − (w v^T − v w^T)/α        (4.2.3)

is skew-symmetric.

Proof. Since α ≠ 0 it follows that (4.2.1) holds. It is obvious from their definitions
that B1 is symmetric and that C1 is skew-symmetric. Thus, all we have to show is that
B1 is positive definite, i.e.,

    z^T B1 z > 0        (4.2.4)

for all nonzero z ∈ R^{n−1}. For any µ ∈ R and 0 ≠ z ∈ R^{n−1} we have

    0 < [ µ  z^T ] A [ µ ] = αµ² + 2µ v^T z + z^T B z.
                     [ z ]

If µ = −(v^T z)/α, then

    0 < z^T B z − (v^T z)²/α ≤ z^T B z − ((v^T z)² − (w^T z)²)/α = z^T B1 z,

which establishes the inequality (4.2.4). □
From (4.2.1) we see that if B1 + C1 = L1U1 is the LU factorization, then A = LU
where

    L = [     1        0  ]        U = [ α   (v − w)^T ]
        [ (v + w)/α    L1 ],           [ 0       U1    ].

Thus, the theorem shows that the triangular factors in A = LU are nicely bounded if S T^-1 S is
not too big. Here is a result that makes this precise:
Theorem 4.2.6. Let A ∈ R^{n×n} be positive definite and set T = (A + A^T)/2 and
S = (A − A^T)/2. If A = LU is the LU factorization, then

    (4.2.5)

Proof. See Golub and Van Loan (1979). □
The theorem suggests when it is safe not to pivot. Assume that the computed factors
L̂ and Û satisfy

    (4.2.6)

where c is a constant of modest size. It follows from (4.2.1) and the analysis in §3.3
that if these factors are used to compute a solution to Ax = b, then the computed
solution x̂ satisfies (A + E)x̂ = b with

    ‖E‖_F ≤ u ( 2n‖A‖_F + 4cn²(‖T‖₂ + ‖S T^-1 S‖₂) ) + O(u²).        (4.2.7)

It is easy to show that ‖T‖₂ ≤ ‖A‖₂, and so it follows that if

    Ω = ‖S T^-1 S‖₂ / ‖A‖₂        (4.2.8)

is not too large, then it is safe not to pivot. In other words, the norm of the skew-symmetric
part S has to be modest relative to the condition of the symmetric part T.
Sometimes it is possible to estimate Ω in an application. This is trivially the case when
A is symmetric, for then Ω = 0.
4.2.3 Symmetric Positive Definite Systems
If we apply the above results to a symmetric positive definite matrix we know that
the factorization A = LU exists and is stable to compute. The computation of the
factorization A = LDLT via Algorithm 4.1.2 is also stable and exploits symmetry.
However, for symmetric positive definite systems it is often handier to work with a
variation of LDLT.
Theorem 4.2.7 (Cholesky Factorization). If A ∈ R^{n×n} is symmetric positive
definite, then there exists a unique lower triangular G ∈ R^{n×n} with positive diagonal
entries such that A = GG^T.
Proof. From Theorem 4.1.3, there exists a unit lower triangular L and a diagonal

    D = diag(d1, ..., dn)

such that A = LDL^T. Theorem 4.2.1 tells us that L^-1 A L^-T = D is positive definite.
Thus, the d_k are positive and the matrix G = L·diag(√d1, ..., √dn) is real and lower
triangular with positive diagonal entries. It also satisfies A = GG^T. Uniqueness follows
from the uniqueness of the LDL^T factorization. □
The factorization A = GGT is known as the Cholesky factorization and G is the
Cholesky factor. Note that if we compute the Cholesky factorization and solve the
triangular systems Gy = b and GTx = y, then b = Gy = G(GTx) = (GGT)x = Ax.
4.2.4 The Cholesky Factor is not a Square Root
A matrix X ∈ R^{n×n} that satisfies A = X² is a square root of A. Note that if A is
symmetric, positive definite, and not diagonal, then its Cholesky factor is not a square
root. However, if A = GG^T and X = UΣU^T where G = UΣV^T is the SVD, then

    X² = (UΣU^T)(UΣU^T) = UΣ²U^T = (UΣV^T)(UΣV^T)^T = GG^T = A.

Thus, a symmetric positive definite matrix A has a symmetric positive definite square
root denoted by A^{1/2}. We have more to say about matrix square roots in §9.4.2.
4.2.5 A Gaxpy-Rich Cholesky Factorization
Our proof of the Cholesky factorization in Theorem 4.2.7 is constructive. However,
we can develop a more effective procedure by comparing columns in A = GG^T. If
A ∈ R^{n×n} and 1 ≤ j ≤ n, then

    A(:, j) = Σ_{k=1}^{j} G(j, k)·G(:, k).

This says that

    G(j, j)G(:, j) = A(:, j) − Σ_{k=1}^{j−1} G(j, k)·G(:, k) = v.        (4.2.9)

If the first j − 1 columns of G are known, then v is computable. It follows by equating
components in (4.2.9) that

    G(j:n, j) = v(j:n)/√v(j)

and so we obtain

    for j = 1:n
        v(j:n) = A(j:n, j)
        for k = 1:j − 1
            v(j:n) = v(j:n) − G(j, k)·G(j:n, k)
        end
        G(j:n, j) = v(j:n)/√v(j)
    end
It is possible to arrange the computations so that G overwrites the lower triangle of A.
Algorithm 4.2.1 (Gaxpy Cholesky) Given a symmetric positive definite A E R.nxn,
the following algorithm computes a lower triangular G such that A = GGT. For all
i � j, G(i,j) overwrites A(i,j).
for j = l:n
ifj > 1
A(j:n,j) = A(j:n,j) - A(j:n, l:j - l)·A(j, l:j - l)T
end
A(j:n,j) = A(j:n,j)/√A(j,j)
end
This algorithm requires n3/3 flops.
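A Python/NumPy transcription of Algorithm 4.2.1 is sketched below (the function name gaxpy_cholesky and the test matrix are illustrative; it overwrites a copy of A and returns the lower triangle).

import numpy as np

def gaxpy_cholesky(A):
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for j in range(n):
        if j > 0:
            A[j:, j] -= A[j:, :j] @ A[j, :j]        # subtract contributions of earlier columns
        A[j:, j] /= np.sqrt(A[j, j])                # scale column j
    return np.tril(A)

A = np.array([[4., 2., 2.],
              [2., 5., 3.],
              [2., 3., 6.]])
G = gaxpy_cholesky(A)
print(np.allclose(G @ G.T, A))
print(np.allclose(G, np.linalg.cholesky(A)))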
4.2.6 Stability of the Cholesky Process
In exact arithmetic, we know that a symmetric positive definite matrix has a Cholesky
factorization. Conversely, if the Cholesky process runs to completion with strictly
positive square roots, then A is positive definite. Thus, to find out if a matrix A is
positive definite, we merely try to compute its Cholesky factorization using any of the
methods given above.
The situation in the context of roundoff error is more interesting. The numerical
stability of the Cholesky algorithm roughly follows from the inequality

    g_ij²  ≤  Σ_{k=1}^{i} g_ik²  =  a_ii.

This shows that the entries in the Cholesky triangle are nicely bounded. The same
conclusion can be reached from the equation ‖G‖₂² = ‖A‖₂.
The roundoff errors associated with the Cholesky factorization have been extensively
studied in a classical paper by Wilkinson (1968). Using the results in this paper,
it can be shown that if x̂ is the computed solution to Ax = b, obtained via the Cholesky
process, then x̂ solves the perturbed system

    (A + E)x̂ = b,        ‖E‖₂ ≤ c_n u ‖A‖₂,

where c_n is a small constant that depends upon n. Moreover, Wilkinson shows that if
q_n u κ₂(A) ≤ 1 where q_n is another small constant, then the Cholesky process runs to
completion, i.e., no square roots of negative numbers arise.
It is important to remember that symmetric positive definite linear systems can
be ill-conditioned. Indeed, the eigenvalues and singular values of a symmetric positive
definite matrix are the same. This follows from (2.4.1) and Theorem 4.2.3. Thus,

    κ₂(A) = λ_max(A)/λ_min(A).

The eigenvalue λ_min(A) is the "distance to trouble" in the Cholesky setting. This
prompts us to consider a permutation strategy that steers us away from using small
diagonal elements that jeopardize the factorization process.
4.2.7 The LDLT Factorization with Symmetric Pivoting
With an eye towards handling ill-conditioned symmetric positive definite systems, we
return to the LDLT factorization and develop an outer product implementation with
pivoting. We first observe that if A is symmetric and P1 is a permutation, then P1A is
not symmetric. On the other hand, P1AP'[ is symmetric suggesting that we consider
the following factorization:
where
[ l [ l [ l [ ]
T
0: VT 1 0 0: 0 1 0
v B v/o: ln-1 0 A v/o: In-1
- 1 T
A = B - - vv .
0:
Note that with this kind of symmetric pivoting, the new (1,1) entry o: is some diagonal
entry aii· Our plan is to choose Pi so that o: is the largest of A's diagonal entries. If
we apply the same strategy recursively to A and compute
- - - r - - - r
PAP = LDL ,
then we emerge with the factorization

    PAP^T = LDL^T        (4.2.10)

where

    P = [ 1   0 ] P1,        L = [    1      0 ],        D = [ α   0 ].
        [ 0   P̃ ]                [ P̃v/α     L̃ ]              [ 0   D̃ ]

By virtue of this pivot strategy,

    d1 ≥ d2 ≥ · · · ≥ dn > 0.
Here is a nonrecursive implementation of the overall algorithm:
Algorithm 4.2.2 (Outer Product LDL^T with Pivoting) Given a symmetric positive
semidefinite A ∈ R^{n×n}, the following algorithm computes a permutation P, a unit
lower triangular L, and a diagonal matrix D = diag(d1, ..., dn) so PAP^T = LDL^T
with d1 ≥ d2 ≥ · · · ≥ dn ≥ 0. The matrix element a_ij is overwritten by d_i if i = j
and by ℓ_ij if i > j. P = P1 · · · Pn where Pk is the identity with rows k and piv(k)
interchanged.

    for k = 1:n
        piv(k) = j where a_jj = max{a_kk, ..., a_nn}
        A(k, :) ↔ A(j, :)
        A(:, k) ↔ A(:, j)
        α = A(k, k)
        v = A(k+1:n, k)
        A(k+1:n, k) = v/α
        A(k+1:n, k+1:n) = A(k+1:n, k+1:n) − vv^T/α
    end
If symmetry is exploited in the outer product update, then n3/3 flops are required. To
solve Ax = b given PAP^T = LDL^T, we proceed as follows:

    Lw = Pb,        Dy = w,        L^T z = y,        x = P^T z.

We mention that Algorithm 4.2.2 can be implemented in a way that only references
the lower triangular part of A.
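A Python/NumPy sketch of Algorithm 4.2.2 and of the solve sequence above is given next (the names ldlt_pivoted and solve_spd and the test data are invented for the example; for brevity the sketch works on the full symmetric matrix rather than only its lower triangle).

import numpy as np
from scipy.linalg import solve_triangular

def ldlt_pivoted(A):
    A = np.array(A, dtype=float)
    n = A.shape[0]
    perm = np.arange(n)
    for k in range(n):
        j = k + np.argmax(np.diag(A)[k:])        # largest remaining diagonal entry
        A[[k, j], :] = A[[j, k], :]              # symmetric row/column interchange
        A[:, [k, j]] = A[:, [j, k]]
        perm[[k, j]] = perm[[j, k]]
        alpha = A[k, k]
        v = A[k+1:, k].copy()
        A[k+1:, k] = v / alpha
        A[k+1:, k+1:] -= np.outer(v, v) / alpha  # outer product update
    return perm, np.tril(A, -1) + np.eye(n), np.diag(A).copy()

def solve_spd(A, b):
    perm, L, d = ldlt_pivoted(A)
    w = solve_triangular(L, b[perm], lower=True, unit_diagonal=True)   # Lw = Pb
    y = w / d                                                          # Dy = w
    z = solve_triangular(L.T, y, lower=False, unit_diagonal=True)      # L^T z = y
    x = np.empty_like(z)
    x[perm] = z                                                        # x = P^T z
    return x

A = np.array([[4., 2., 2.],
              [2., 5., 3.],
              [2., 3., 6.]])
b = np.array([1., 2., 3.])
print(np.allclose(A @ solve_spd(A, b), b))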
It is reasonable to ask why we even bother with the LDLT factorization given that
it appears to offer no real advantage over the Cholesky factorization. There are two
reasons. First, it is more efficient in narrow band situations because it avoids square
roots; see §4.3.6. Second, it is a graceful way to introduce factorizations of the form
p
APT = ( lower ) ( simple ) ( lower )T
triangular x matrix x triangular '
where P is a permutation arising from a symmetry-exploiting pivot strategy. The
symmetric indefinite factorizations that we develop in §4.4 fall under this heading as
does the "rank revealing" factorization that we are about to discuss for semidefinite
problems.
4.2.8 The Symmetric Semidefinite Case
A symmetric matrix A ∈ R^{n×n} is positive semidefinite if

    x^T A x ≥ 0

for every x ∈ R^n. It is easy to show that if A ∈ R^{n×n} is symmetric and positive
semidefinite, then its eigenvalues satisfy

    0 = λ_n(A) = · · · = λ_{r+1}(A) < λ_r(A) ≤ · · · ≤ λ_1(A)        (4.2.11)
where r is the rank of A. Our goal is to show that Algorithm 4.2.2 can be used to
estimate r and produce a streamlined version of {4.2.10). But first we establish some
useful properties.
Theorem 4.2.8. If A ∈ R^{n×n} is symmetric positive semidefinite, then

    |a_ij| ≤ (a_ii + a_jj)/2,                        (4.2.12)
    |a_ij| ≤ √(a_ii a_jj)        (i ≠ j),            (4.2.13)
    max_{i,j} |a_ij| = max_i a_ii,                    (4.2.14)
    a_ii = 0   ⟹   A(i, :) = 0,  A(:, i) = 0.        (4.2.15)

Proof. Let e_i denote the ith column of I_n. Since

    x = e_i + e_j   ⟹   0 ≤ x^T A x = a_ii + 2a_ij + a_jj,
    x = e_i − e_j   ⟹   0 ≤ x^T A x = a_ii − 2a_ij + a_jj,

it follows that 2|a_ij| ≤ a_ii + a_jj. These two equations confirm (4.2.12), which in turn
implies (4.2.14).
To prove (4.2.13), set x = τe_i + e_j where τ ∈ R. It follows that

    0 ≤ x^T A x = a_ii τ² + 2a_ij τ + a_jj

must hold for all τ. This is a quadratic equation in τ and for the inequality to hold,
the discriminant 4a_ij² − 4a_ii a_jj must be nonpositive, i.e., |a_ij| ≤ √(a_ii a_jj). The implication
in (4.2.15) follows immediately from (4.2.13). □
Let us examine what happens when Algorithm 4.2.2 is applied to a rank-r positive
semidefinite matrix. If k ≤ r, then after k steps we have the factorization

    P̃ A P̃^T = [ L11      0     ] [ D_k    0  ] [ L11      0     ]^T,        P̃ = P_k · · · P1,        (4.2.16)
               [ L21   I_{n−k}  ] [  0    Ã_k ] [ L21   I_{n−k}  ]

where D_k = diag(d1, ..., d_k) ∈ R^{k×k} and d1 ≥ · · · ≥ d_k ≥ 0. By virtue of the pivot
strategy, if d_k = 0, then Ã_k has a zero diagonal. Since Ã_k is positive semidefinite, it
follows from (4.2.15) that Ã_k = 0. This contradicts the assumption that A has rank r
unless k = r. Thus, if k ≤ r, then d_k > 0. Moreover, we must have Ã_r = 0 since A has
the same rank as diag(D_r, Ã_r). It follows from (4.2.16) that

    PAP^T = [ L11 ] D_r [ L11^T   L21^T ]        (4.2.17)
            [ L21 ]

where D_r = diag(d1, ..., d_r) has positive diagonal entries, L11 ∈ R^{r×r} is unit lower
triangular, and L21 ∈ R^{(n−r)×r}. If ℓ_j is the jth column of the L-matrix, then we can
rewrite (4.2.17) as a sum of rank-1 matrices:

    PAP^T = Σ_{j=1}^{r} d_j ℓ_j ℓ_j^T.
This can be regarded as a relatively cheap alternative to the SVD rank-1 expansion.
It is important to note that our entire semidefinite discussion has been an exact
arithmetic discussion. In practice, a threshold tolerance for small diagonal entries
has to be built into Algorithm 4.2.2. If the diagonal of the computed Ak in (4.2.16)
is sufficiently small, then the loop can be terminated and r can be regarded as the
numerical rank of A. For more details, see Higham (1989).
4.2.9 Block Cholesky
Just as there are block methods for computing the LU factorization, so are there are
block methods for computing the Cholesky factorization. Paralleling the derivation of
the block LU algorithm in §3.2.11, we start by blocking A = GGT as follows
(4.2.18)
Here, A11 E m_rxr, A22 E IR(n-r) x(n- ..) , r is a blocking parameter, and G is partitioned
conformably. Comparing blocks in (4.2.18) we conclude that
Au = G11 Gf1,
A21 = G21Gf1,
A22 = G21GI1 + G22GI2.
which suggests the following 3-step procedure:
Step 1: Compute the Cholesky factorization of Au to get Gn .
Step 2: Solve a lower triangular multiple-right-hand-side system for G21·
Step 3: Compute the Cholesky factor G22 of A22 - G21Gf1 = A22 - A21A]i1Af1·
In recursive form we obtain the following algorithm.
Algorithm 4.2.3 (Recursive Block Cholesky) Suppose A ∈ R^{n×n} is symmetric positive
definite and r is a positive integer. The following algorithm computes a lower
triangular G ∈ R^{n×n} so A = GG^T.

    function G = BlockCholesky(A, n, r)
        if n ≤ r
            Compute the Cholesky factorization A = GG^T.
        else
            Compute the Cholesky factorization A(1:r, 1:r) = G11G11^T.
            Solve G21G11^T = A(r+1:n, 1:r) for G21.
            A = A(r+1:n, r+1:n) − G21G21^T
            G22 = BlockCholesky(A, n − r, r)
            G = [ G11    0  ]
                [ G21   G22 ]
        end
    end
end
If symmetry is exploited in the computation of A, then this algorithm requires n³/3
flops. A careful accounting of flops reveals that the level-3 fraction is about 1 − 1/N²
where N ≈ n/r. The "small" Cholesky computation for G11 and the "thin" solution
process for G21 are dominated by the "large" level-3 update for A.
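A Python/NumPy sketch of the recursive block Cholesky is shown below; the panel Cholesky and the triangular solve are the "small" operations, and the symmetric update of the trailing block is the dominant level-3 step (the function name block_cholesky, the block size, and the test matrix are chosen for the illustration).

import numpy as np
from scipy.linalg import solve_triangular

def block_cholesky(A, r=64):
    A = np.array(A, dtype=float)
    n = A.shape[0]
    if n <= r:
        return np.linalg.cholesky(A)
    G11 = np.linalg.cholesky(A[:r, :r])
    G21 = solve_triangular(G11, A[r:, :r].T, lower=True).T   # G21 G11^T = A21
    A22 = A[r:, r:] - G21 @ G21.T                            # level-3 symmetric update
    G22 = block_cholesky(A22, r)
    G = np.zeros_like(A)
    G[:r, :r], G[r:, :r], G[r:, r:] = G11, G21, G22
    return G

rng = np.random.default_rng(0)
B = rng.standard_normal((200, 180))
A = B @ B.T + 200 * np.eye(200)        # symmetric positive definite test matrix
G = block_cholesky(A, r=64)
print(np.allclose(G @ G.T, A))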
To develop a nonrecursive implementation, we assume for clarity that n = Nr
where N is a positive integer and consider the partitioning

    [ A11    · · ·   A_N1^T ]   [ G11              0   ] [ G11^T   · · ·   G_N1^T ]
    [  ⋮               ⋮    ] = [  ⋮       ⋱           ] [            ⋱       ⋮   ]        (4.2.19)
    [ A_N1   · · ·   A_NN   ]   [ G_N1   · · ·   G_NN  ] [  0              G_NN^T ]

where all blocks are r-by-r. By equating (i, j) blocks with i ≥ j it follows that

    A_ij = Σ_{k=1}^{j} G_ik G_jk^T.

Define

    S = A_ij − Σ_{k=1}^{j−1} G_ik G_jk^T.

If i = j, then G_jj is the Cholesky factor of S. If i > j, then G_ij G_jj^T = S and G_ij is the
solution to a triangular multiple-right-hand-side problem. Properly sequenced, these
equations can be arranged to compute all the G-blocks.
Algorithm 4.2.4 (Nonrecursive Block Cholesky) Given a symmetric positive definite
A E Rnxn with n = Nr with blocking (4.2.19), the following algorithm computes a
lower triangular G E Rnxn such that A = GGT. The lower triangular part of A is
overwritten by the lower triangular part of G.
    for j = 1:N
        for i = j:N
            Compute S = A_ij − Σ_{k=1}^{j−1} G_ik G_jk^T.
            if i = j
                Compute the Cholesky factorization S = G_jj G_jj^T.
            else
                Solve G_ij G_jj^T = S for G_ij.
            end
            A_ij = G_ij
        end
    end
The overall process involves n3/3 flops like the other Cholesky procedures that we have
developed. The algorithm is rich in matrix multiplication with a level-3 fraction given
by 1 - (1/N2). The algorithm can be easily modified to handle the case when r does
not divide n.
4.2.10 Recursive Blocking
It is instructive to look a little more deeply into the implementation of a block Cholesky
factorization as it is an occasion to stress the importance of designing data structures
that are tailored to the problem at hand. High-performance matrix computations
are filled with tensions and tradeoffs. For example, a successful pivot strategy might
balance concerns about stability and memory traffic. Another tension is between per­
formance and memory constraints. As an example of this, we consider how to achieve
level-3 performance in a Cholesky implementation given that the matrix is represented
in packed format. This data structure houses the lower (or upper) triangular portion
of a matrix A E Rnxn in a vector of length N = n(n + 1)/2. The symvec arrangement
stacks the lower triangular subcolumns, e.g.,

    symvec(A) = [ a11 a21 a31 a41 a22 a32 a42 a33 a43 a44 ]^T.            (4.2.20)
This layout is not very friendly when it comes to block Cholesky calculations because
the assembly of an A-block (say A(i1:i2,j1:h)) involves irregular memory access pat­
terns. To realize a high-performance matrix multiplication it is usually necessary to
have the matrices laid out conventionally as full rectangular arrays that are contiguous
in memory, e.g.,
    vec(A) = [ a11 a21 a31 a41 a12 a22 a32 a42 a13 a23 a33 a43 a14 a24 a34 a44 ]^T.   (4.2.21)
(Recall that we introduced the vec operation in §1.3.7.) Thus, the challenge is to de­
velop a high performance block algorithm that overwrites a symmetric positive definite
A in packed format with its Cholesky factor G in packed format. Toward that end, we
present the main ideas behind a recursive data structure that supports level-3 compu­
tation and is storage efficient. As memory hierarchies get deeper and more complex,
recursive data structures are an interesting way to address the problem of blocking for
performance.
The starting point is once again a 2-by-2 blocking of the equation A = GG^T:

    [ A11  A21^T ]   [ G11   0  ] [ G11^T  G21^T ]
    [ A21  A22   ] = [ G21  G22 ] [   0    G22^T ].

However, unlike in (4.2.18) where A11 has a chosen block size, we now assume that
A11 ∈ R^{m×m} where m = ceil(n/2). In other words, the four blocks are roughly the
same size. As before, we equate entries and identify the key subcomputations:

    G11 G11^T = A11                    half-sized Cholesky,
    G21 G11^T = A21                    multiple-right-hand-side triangular solve,
    Ã22 = A22 − G21 G21^T              symmetric matrix multiplication update,
    G22 G22^T = Ã22                    half-sized Cholesky.
Our goal is to develop a symmetry-exploiting, level-3-rich procedure that overwrites
A with its Cholesky factor G. To do this we introduce the mixed packed format. An
n = 9 example with A11 ∈ R^{5×5} serves to distinguish this layout from the conventional
packed format layout:
    1                                  1
    2 10                               2  6
    3 11 18                            3  7 10
    4 12 19 25                         4  8 11 13
    5 13 20 26 31                      5  9 12 14 15
    6 14 21 27 32 36                  16 20 24 28 32 36
    7 15 22 28 33 37 40               17 21 25 29 33 37 40
    8 16 23 29 34 38 41 43            18 22 26 30 34 38 41 43
    9 17 24 30 35 39 42 44 45         19 23 27 31 35 39 42 44 45

         Packed format                     Mixed packed format
Notice how the entries from A11 and A21 are shuffled together in the conventional packed
format layout. On the other hand, with the mixed packed format layout, the 15 entries
that define A11 are followed by the 20 numbers that define A21, which in turn are
followed by the 10 numbers that define A22. The process can be repeated on A11 and A22:
1
2 4
3 5 6
7 9 11 13
8 10 12 14 15
16 20 24 28 32 36
17 21 25 29 33 37 38
18 22 26 30 34 39 41 43
19 23 27 31 35 40 42 44 45
Thus, the key to this recursively defined data layout is the idea of representing square
diagonal blocks in a mixed packed format. To be precise, recall the definition of vec
and symvec in (4.2.20) and (4.2.21). If C ∈ R^{q×q} is such a block, then

    mixvec(C) = [ symvec(C11) ; vec(C21) ; symvec(C22) ]                   (4.2.22)

where m = ceil(q/2), C11 = C(1:m, 1:m), C22 = C(m+1:q, m+1:q), and C21 =
C(m+1:q, 1:m). Notice that since C21 is conventionally stored, it is ready to be
engaged in a high-performance matrix multiplication.
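As a small illustration (the function names are ours; the text only defines the layouts),
the following NumPy sketch builds symvec, vec, and mixvec and checks that the mixed
packed format of an n = 9 symmetric matrix still occupies n(n+1)/2 = 45 numbers:

    import numpy as np

    def symvec(C):
        """Stack the lower triangular subcolumns (packed format)."""
        q = C.shape[0]
        return np.concatenate([C[j:, j] for j in range(q)])

    def vec(C):
        """Stack full columns (conventional column-major layout)."""
        return C.flatten(order="F")

    def mixvec(C):
        """Mixed packed format: symvec(C11), then vec(C21), then symvec(C22)."""
        q = C.shape[0]
        m = (q + 1) // 2                          # m = ceil(q/2)
        C11, C21, C22 = C[:m, :m], C[m:, :m], C[m:, m:]
        return np.concatenate([symvec(C11), vec(C21), symvec(C22)])

    rng = np.random.default_rng(1)
    B = rng.standard_normal((9, 9))
    C = B + B.T                                   # a symmetric test matrix
    v = mixvec(C)
    print(v.size == 9 * 10 // 2)                  # 45 numbers, same storage as packed format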
We now outline a recursive, divide-and-conquer block Cholesky procedure that
works with A in packed format. To achieve high performance the incoming A is con­
verted to mixed format at each level of the recursion. Assuming the existence of a
triangular system solve procedure TriSol (for the system G21 G11^T = A21) and a
symmetric update procedure SymUpdate (for A22 ← A22 − G21 G21^T) we have the following
framework:
function G = PackedBlockCholesky(A)
{A and G in packed format}
n = size(A)
    if n ≤ nmin
        G is obtained via any level-2, packed-format Cholesky method.
else
Set m = ceil(n/2) and overwrite A's packed-format representation
with its mixed-format representation.
G11 = PackedBlockCholesky(A11)
G21 = TriSol(G11 , A21)
A22 = SymUpdate(A22, G21)
G22 = PackedBlockCholesky(A22)
end
Here, nmin is a threshold dimension below which it is not possible to achieve level-
3 performance. To take full advantage of the mixed format, the procedures TriSol
and SymUpdate require a recursive design based on blockings that halve the problem size.
For example, TriSol should take the incoming packed format A11, convert it to mixed
format, and solve a 2-by-2 blocked system of the form

    [ X1  X2 ] [ L11^T  L21^T ]
               [   0    L22^T ]  =  [ B1  B2 ].

This sets up a recursive solution based on the half-sized problems

    X1 L11^T = B1,
    X2 L22^T = B2 − X1 L21^T.

Likewise, SymUpdate should take the incoming packed format A22, convert it to mixed
format, and block the required update as follows:

    [ C11  C21^T ]     [ C11  C21^T ]   [ Y1 ] [ Y1 ]^T
    [ C21  C22   ]  ←  [ C21  C22   ] − [ Y2 ] [ Y2 ]   .

The evaluation is recursive and based on the half-sized updates

    C11 = C11 − Y1 Y1^T,
    C21 = C21 − Y2 Y1^T,
    C22 = C22 − Y2 Y2^T.
Of course, if the incoming matrices are small enough relative to nmin, then TriSol and
SymUpdate carry out their tasks conventionally without any further subdivisions.
Overall, it can be shown that PackedBlockCholesky has a level-3 fraction approx­
imately equal to 1 - O(nmin/n).
Problems
P4.2.1 Suppose that H = A + iB is Hermitian and positive definite with A, B ∈ R^{n×n}. This means
that x^H H x > 0 whenever x ≠ 0. (a) Show that

    C = [ A  −B
          B   A ]

is symmetric and positive definite. (b) Formulate an algorithm for solving (A + iB)(x + iy) = (b + ic),
where b, c, x, and y are in R^n. It should involve 8n³/3 flops. How much storage is required?
P4.2.2 Suppose A ∈ R^{n×n} is symmetric and positive definite. Give an algorithm for computing an
upper triangular matrix R ∈ R^{n×n} such that A = RR^T.
P4.2.3 Let A ∈ R^{n×n} be positive definite and set T = (A + A^T)/2 and S = (A − A^T)/2. (a) Show
that ‖A^{-1}‖₂ ≤ ‖T^{-1}‖₂ and x^T A^{-1} x ≤ x^T T^{-1} x for all x ∈ R^n. (b) Show that if A = LDM^T, then
d_k ≥ 1/‖T^{-1}‖₂ for k = 1:n.
P4.2.4 Find a 2-by-2 real matrix A with the property that x^T A x > 0 for all real nonzero 2-vectors x
but which is not positive definite when regarded as a member of C^{2×2}.
P4.2.5 Suppose A ∈ R^{n×n} has a positive diagonal. Show that if both A and A^T are strictly diagonally
dominant, then A is positive definite.
P4.2.6 Show that the function f(x) = √(x^T A x)/2 is a vector norm on R^n if and only if A is positive
definite.
P4.2.7 Modify Algorithm 4.2.1 so that if the square root of a negative number is encountered, then
the algorithm finds a unit vector x so that x^T A x < 0 and terminates.
P4.2.8 Develop an outer product implementation of Algorithm 4.2.1 and a gaxpy implementation of
Algorithm 4.2.2.
P4.2.9 Assume that A ∈ C^{n×n} is Hermitian and positive definite. Show that if a11 = ··· = a_nn = 1
and |a_ij| < 1 for all i ≠ j, then diag(A^{-1}) ≥ diag((Re(A))^{-1}).
P4.2.10 Suppose A = I + uu^T where A ∈ R^{n×n} and ‖u‖₂ = 1. Give explicit formulae for the diagonal
and subdiagonal of A's Cholesky factor.
P4.2.11 Suppose A ∈ R^{n×n} is symmetric positive definite and that its Cholesky factor is available.
Let e_k = I_n(:, k). For 1 ≤ i < j ≤ n, let α_ij be the smallest real that makes A + α(e_i e_j^T + e_j e_i^T)
singular. Likewise, let α_ii be the smallest real that makes A + α e_i e_i^T singular. Show how to compute
these quantities using the Sherman-Morrison-Woodbury formula. How many flops are required to find
all the α_ij?
P4.2.12 Show that if

    M = [ A    B
          B^T  C ]

is symmetric positive definite and A and C are square, then

    M^{-1} = [ A^{-1} + A^{-1} B S^{-1} B^T A^{-1}    −A^{-1} B S^{-1}
               −S^{-1} B^T A^{-1}                       S^{-1}         ]

where S = C − B^T A^{-1} B.
P4.2.13 Suppose σ ∈ R and u ∈ R^n. Under what conditions can we find a matrix X ∈ R^{n×n} so that
X(I + σuu^T)X = I_n? Give an efficient algorithm for computing X if it exists.
P4.2.14 Suppose D = diag(d_1, ..., d_n) with d_i > 0 for all i. Give an efficient algorithm for computing
the largest entry in the matrix (D + CC^T)^{-1} where C ∈ R^{n×r}. Hint: Use the Sherman-Morrison-
Woodbury formula.
P4.2.15 Suppose A(λ) has continuously differentiable entries and is always symmetric and positive
definite. If f(λ) = log(det(A(λ))), then how would you compute f′(0)?
P4.2.16 Suppose A ∈ R^{n×n} is a rank-r symmetric positive semidefinite matrix. Assume that it costs
one dollar to evaluate each a_ij. Show how to compute the factorization (4.2.17) spending only O(nr)
dollars on a_ij evaluation.
P4.2.17 The point of this problem is to show that from the complexity point of view, if you have a
fast matrix multiplication algorithm, then you have an equally fast matrix inversion algorithm, and
vice versa. (a) Suppose F_n is the number of flops required by some method to form the inverse of an
n-by-n matrix. Assume that there exists a constant c1 and a real number α such that F_n ≤ c1 n^α for
all n. Show that there is a method that can compute the n-by-n matrix product AB with fewer than
c2 n^α flops where c2 is a constant independent of n. Hint: Consider the inverse of

    [ I_n   A    0
       0   I_n   B
       0    0   I_n ].

(b) Let G_n be the number of flops required by some method to form the n-by-n matrix product AB.
Assume that there exists a constant c1 and a real number α such that G_n ≤ c1 n^α for all n. Show that
there is a method that can invert a nonsingular n-by-n matrix A with fewer than c2 n^α flops where c2
is a constant. Hint: First show that the result applies for triangular matrices by applying recursion to
a 2-by-2 blocking of the triangular matrix. Then observe that for general A, A^{-1} = A^T(AA^T)^{-1} = A^T G^{-T} G^{-1} where AA^T = GG^T is the
Cholesky factorization.
Notes and References for §4.2
For an in-depth theoretical treatment of positive definiteness, see:
R. Bhatia (2007). Positive Definite Matrices, Princeton University Press, Princeton, NJ.
The definiteness of the quadratic form xTAx can frequently be established by considering the math­
ematics of the underlying problem. For example, the discretization of certain partial differential op­
erators gives rise to provably positive definite matrices. Aspects of the unsymmetric positive definite
problem are discussed in:
A. Buckley (1974). "A Note on Matrices A = I + H, H Skew-Symmetric," Z. Angew. Math. Mech.
54, 125-126.
A. Buckley (1977). "On the Solution of Certain Skew-Symmetric Linear Systems," SIAM J. Numer.
Anal. 14, 566-570.
G.H. Golub and C. Van Loan (1979). "Unsymmetric Positive Definite Linear Systems," Lin. Alg.
Applic. 28, 85-98.
R. Mathias (1992). "Matrices with Positive Definite Hermitian Part: Inequalities and Linear Systems,"
SIAM J. Matrix Anal. Applic. 13, 640-654.
K.D. Ikramov and A.B. Kucherov (2000). "Bounding the growth factor in Gaussian elimination for
Buckley's class of complex symmetric matrices," Numer. Lin. Alg. 7, 269-274.
Complex symmetric matrices have the property that their real and imaginary parts are each symmetric.
The following paper shows that if they are also positive definite, then the LDLT factorization is safe
to compute without pivoting:
S. Serbin (1980). "On Factoring a Class of Complex Symmetric Matrices Without Pivoting," Math.
Comput. 35, 1231-1234.
Historically important Algol implementations of the Cholesky factorization include:
R.S. Martin, G. Peters, and J.H. Wilkinson (1965). "Symmetric Decomposition of a Positive Definite
Matrix," Numer. Math. 7, 362-83.
R.S. Martin, G. Peters, and J.H. Wilkinson (1966). "Iterative Refinement of the Solution of a Positive
Definite System of Equations," Numer. Math. 8, 203-16.
F.L. Bauer and C. Reinsch (1971). "Inversion of Positive Definite Matrices by the Gauss-Jordan
Method," in Handbook for Automatic Computation Vol. 2, Linear Algebra, J.H. Wilkinson and C.
Reinsch (eds.), Springer-Verlag, New York, 45-49.
For roundoff error analysis of Cholesky, see:
J.H. Wilkinson (1968). "A Priori Error Analysis of Algebraic Processes," Proceedings of the Interna­
tional Congress on Mathematics, Izdat. Mir, 1968, Moscow, 629-39.
J. Meinguet (1983). "Refined Error Analyses of Cholesky Factorization," SIAM J. Numer. Anal. 20,
1243-1250.
A. Kielbasinski (1987). "A Note on Rounding Error Analysis of Cholesky Factorization," Lin. Alg.
Applic. 88/89, 487-494.
N.J. Higham (1990). "Analysis of the Cholesky Decomposition of a Semidefinite Matrix," in Reliable
Numerical Computation, M.G. Cox and S.J. Hammarling (eds.), Oxford University Press, Oxford,
U.K., 161 -185.
J-Guang Sun (1992). "Rounding Error and Perturbation Bounds for the Cholesky and LDLT Factor-
izations," Lin. Alg. Applic. 1 73, 77-97.
The floating point determination of positive definiteness is an interesting problem, see:
S.M. Rump (2006). "Verification of Positive Definiteness," BIT 46, 433-452.
The question of how the Cholesky triangle G changes when A = GGT is perturbed is analyzed in:
G.W. Stewart (1977). "Perturbation Bounds for the QR Factorization of a Matrix," SIAM J. Num.
Anal. 14, 509-18.
Z. Drmač, M. Omladič, and K. Veselić (1994). "On the Perturbation of the Cholesky Factorization,"
SIAM J. Matrix Anal. Applic. 15, 1319-1332.
X-W. Chang, C.C. Paige, and G.W. Stewart (1996). "New Perturbation Analyses for the Cholesky
Factorization," IMA J. Numer. Anal. 16, 457-484.
G.W. Stewart (1997) "On the Perturbation of LU and Cholesky Factors," IMA J. Numer. Anal. 1 7,
1-6.
Nearness/sensitivity issues associated with positive semidefiniteness are presented in:
N.J. Higham (1988). "Computing a Nearest Symmetric Positive Semidefinite Matrix," Lin. Alg.
Applic. 103, 103-118.
The numerical issues associated with semi-definite rank determination are covered in:
P.C. Hansen and P.Y. Yalamov (2001). "Computing Symmetric Rank-Revealing Decompositions via
Triangular Factorization," SIAM J. Matrix Anal. Applic. 23, 443-458.
M. Gu and L. Miranian (2004). "Strong Rank-Revealing Cholesky Factorization," ETNA 1 7, 76-92.
The issues that surround level-3 performance of packed-format Cholesky are discussed in:
F.G. Gustavson (1997). "Recursion Leads to Automatic Variable Blocking for Dense Linear-Algebra
Algorithms," IBM J. Res. Dev. 41, 737-756.
F.G. Gustavson, A. Henriksson, I. Jonsson, B. Kågström, and P. Ling (1998). "Recursive Blocked
Data Formats and BLAS's for Dense Linear Algebra Algorithms," Applied Parallel Computing
Large Scale Scientific and Industrial Problems, Lecture Notes in Computer Science, Springer­
Verlag, 1541/1998, 195-206.
F.G. Gustavson and I. Jonsson (2000). "Minimal Storage High-Performance Cholesky Factorization
via Blocking and Recursion," IBM J. Res. Dev. 44, 823-849.
B.S. Andersen, J. Wasniewski, and F.G. Gustavson (2001). "A Recursive Formulation of Cholesky
Factorization of a Matrix in Packed Storage," ACM Trans. Math. Softw. 27, 214-244.
E. Elmroth, F. Gustavson, I. Jonsson, and B. Kågström (2004). "Recursive Blocked Algorithms and
Hybrid Data Structures for Dense Matrix Library Software," SIAM Review 46, 3-45.
F.G. Gustavson, J. Wasniewski, J.J. Dongarra, and J. Langou (2010). "Rectangular Full Packed
Format for Cholesky's Algorithm: Factorization, Solution, and Inversion," ACM Trans. Math.
Softw. 37, Article 19.
Other high-performance Cholesky implementations include:
F.G. Gustavson, L. Karlsson, and B. Kågström (2009). "Distributed SBP Cholesky Factorization
Algorithms with Near-Optimal Scheduling," ACM Trans. Math. Softw. 36, Article 11.
G. Ballard, J. Demmel, O. Holtz, and O. Schwartz (2010). "Communication-Optimal Parallel and
Sequential Cholesky," SIAM J. Sci. Comput. 32, 3495-3523.
P. Bientinesi, B. Gunter, and R.A. van de Geijn (2008). "Families of Algorithms Related to the
Inversion of a Symmetric Positive Definite Matrix," ACM Trans. Math. Softw. 35, Article 3.
M.D. Petkovic and P.S. Stanimirovic (2009). "Generalized Matrix Inversion is not Harder than Matrix
Multiplication," J. Comput. Appl. Math. 280, 270-282.
4.3 Banded Systems
In many applications that involve linear systems, the matrix of coefficients is banded.
This is the case whenever the equations can be ordered so that each unknown Xi appears
in only a few equations in a "neighborhood" of the ith equation. Recall from §1.2.1 that
A = (a_ij) has upper bandwidth q if a_ij = 0 whenever j > i + q and lower bandwidth p if
a_ij = 0 whenever i > j + p. Substantial economies can be realized when solving banded
systems because the triangular factors in LU, GG^T, and LDL^T are also banded.
4.3.1 Band LU Factorization
Our first result shows that if A is banded and A = LU, then L inherits the lower
bandwidth of A and U inherits the upper bandwidth of A.
Theorem 4.3.1. Suppose A ∈ R^{n×n} has an LU factorization A = LU. If A has upper
bandwidth q and lower bandwidth p, then U has upper bandwidth q and L has lower
bandwidth p.

Proof. The proof is by induction on n. Since

    A = [ α  w^T ]  =  [  1      0      ] [ 1        0        ] [ α   w^T    ]
        [ v   B  ]     [ v/α  I_{n-1}   ] [ 0  B − vw^T/α     ] [ 0  I_{n-1} ]

it is clear that B − vw^T/α has upper bandwidth q and lower bandwidth p because only
the first q components of w and the first p components of v are nonzero. Let L1 U1 be
the LU factorization of this matrix. Using the induction hypothesis and the sparsity
of w and v, it follows that

    L = [  1     0  ]          U = [ α  w^T ]
        [ v/α   L1  ]   and        [ 0   U1 ]

have the desired bandwidth properties and satisfy A = LU. □
The specialization of Gaussian elimination to banded matrices having an LU factoriza­
tion is straightforward.
Algorithm 4.3.1 (Band Gaussian Elimination) Given A ∈ R^{n×n} with upper bandwidth
q and lower bandwidth p, the following algorithm computes the factorization
A = LU, assuming it exists. A(i, j) is overwritten by L(i, j) if i > j and by U(i, j)
otherwise.

    for k = 1:n-1
        for i = k+1:min{k+p, n}
            A(i, k) = A(i, k)/A(k, k)
        end
        for j = k+1:min{k+q, n}
            for i = k+1:min{k+p, n}
                A(i, j) = A(i, j) - A(i, k)·A(k, j)
            end
        end
    end
If n » p and n » q, then this algorithm involves about 2npq flops. Effective imple­
mentations would involve band matrix data structures; see §1.2.5. A band version of
Algorithm 4.1.1 (LDLT) is similar and we leave the details to the reader.
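A minimal NumPy transcription of Algorithm 4.3.1 is sketched below; it operates on a
conventional two-dimensional array rather than the band data structure of §1.2.5, and
the helper name and test matrix are our own choices:

    import numpy as np

    def band_lu(A, p, q):
        """Overwrite a copy of A with the band LU factors (unit diagonal of L implicit)."""
        A = np.array(A, dtype=float)
        n = A.shape[0]
        for k in range(n - 1):
            rows = slice(k + 1, min(k + p, n - 1) + 1)
            A[rows, k] /= A[k, k]                                  # multipliers
            cols = slice(k + 1, min(k + q, n - 1) + 1)
            A[rows, cols] -= np.outer(A[rows, k], A[k, cols])      # update inside the band
        return A

    # Example: a tridiagonal matrix (p = q = 1)
    n = 6
    A = np.diag(4.0 * np.ones(n)) + np.diag(-np.ones(n - 1), 1) + np.diag(-np.ones(n - 1), -1)
    F = band_lu(A, p=1, q=1)
    L = np.tril(F, -1) + np.eye(n)
    U = np.triu(F)
    print(np.allclose(L @ U, A))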
4.3.2 Band Triangular System Solving
Banded triangular system solving is also fast. Here are the banded analogues of Algo­
rithms 3.1.3 and 3.1.4:
Algorithm 4.3.2 (Band Forward Substitution) Let L ∈ R^{n×n} be a unit lower triangular
matrix with lower bandwidth p. Given b ∈ R^n, the following algorithm overwrites
b with the solution to Lx = b.

    for j = 1:n
        for i = j+1:min{j+p, n}
            b(i) = b(i) - L(i, j)·b(j)
        end
    end
If n ≫ p, then this algorithm requires about 2np flops.
Algorithm 4.3.3 (Band Back Substitution) Let U ∈ R^{n×n} be a nonsingular upper
triangular matrix with upper bandwidth q. Given b ∈ R^n, the following algorithm
overwrites b with the solution to Ux = b.

    for j = n:-1:1
        b(j) = b(j)/U(j, j)
        for i = max{1, j-q}:j-1
            b(i) = b(i) - U(i, j)·b(j)
        end
    end
If n ≫ q, then this algorithm requires about 2nq flops.
4.3.3 Band Gaussian Elimination with Pivoting
Gaussian elimination with partial pivoting can also be specialized to exploit band
structure in A. However, if PA = LU, then the band properties of L and U are not quite
so simple. For example, if A is tridiagonal and the first two rows are interchanged at the
very first step of the algorithm, then u13 is nonzero. Consequently, row interchanges
expand bandwidth. Precisely how the band enlarges is the subject of the following
theorem.
Theorem 4.3.2. Suppose A ∈ R^{n×n} is nonsingular and has upper and lower bandwidths
q and p, respectively. If Gaussian elimination with partial pivoting is used to
compute Gauss transformations

    M_j = I − α^{(j)} e_j^T,        j = 1:n−1,

and permutations P1, ..., P_{n−1} such that M_{n−1} P_{n−1} ··· M_1 P_1 A = U is upper
triangular, then U has upper bandwidth p + q and α_i^{(j)} = 0 whenever i ≤ j or i > j + p.

Proof. Let PA = LU be the factorization computed by Gaussian elimination with
partial pivoting and recall that P = P_{n−1} ··· P_1. Write P^T = [ e_{s1} | ··· | e_{sn} ], where
{s1, ..., sn} is a permutation of {1, 2, ..., n}. If s_i > i + p, then it follows that the leading
i-by-i principal submatrix of PA is singular, since [PA]_{ij} = a_{s_i, j} = 0 for j = 1:s_i − p − 1
and s_i − p − 1 ≥ i. This implies that U and A are singular, a contradiction. Thus,
s_i ≤ i + p for i = 1:n and therefore, PA has upper bandwidth p + q. It follows from
Theorem 4.3.1 that U has upper bandwidth p + q. The assertion about the α^{(j)} can
be verified by observing that M_j need only zero elements (j+1, j), ..., (j+p, j) of the
partially reduced matrix P_j M_{j−1} P_{j−1} ··· M_1 P_1 A. □
Thus, pivoting destroys band structure in the sense that U becomes "wider" than A's
upper triangle, while nothing at all can be said about the bandwidth of L. However,
since the jth column of L is a permutation of the jth Gauss vector α^{(j)}, it follows that
L has at most p + 1 nonzero elements per column.
4.3.4 Hessenberg LU
As an example of an unsymmetric band matrix computation, we show how Gaussian
elimination with partial pivoting can be applied to factor an upper Hessenberg matrix
H. (Recall that if H is upper Hessenberg then h_ij = 0 for i > j + 1.) After k - 1 steps
of Gaussian elimination with partial pivoting we are left with an upper Hessenberg
matrix of the form
    [ x  x  x  x  x ]
    [ 0  x  x  x  x ]
    [ 0  0  x  x  x ]        k = 3, n = 5.
    [ 0  0  x  x  x ]
    [ 0  0  0  x  x ]
By virtue of the special structure of this matrix, we see that the next permutation, P3,
is either the identity or the identity with rows 3 and 4 interchanged. Moreover, the
next Gauss transformation Mk has a single nonzero multiplier in the (k+ 1, k) position.
This illustrates the kth step of the following algorithm.
Algorithm 4.3.4 (Hessenberg LU) Given an upper Hessenberg matrix H ∈ R^{n×n}, the
following algorithm computes the upper triangular matrix M_{n−1} P_{n−1} ··· M_1 P_1 H = U
where each P_k is a permutation and each M_k is a Gauss transformation whose entries
are bounded by unity. H(i, k) is overwritten with U(i, k) if i ≤ k and by −[M_k]_{k+1,k}
if i = k + 1. An integer vector piv(1:n−1) encodes the permutations. If P_k = I, then
piv(k) = 0. If P_k interchanges rows k and k + 1, then piv(k) = 1.

    for k = 1:n-1
        if |H(k, k)| < |H(k+1, k)|
            piv(k) = 1;  H(k, k:n) ↔ H(k+1, k:n)
        else
            piv(k) = 0
        end
        if H(k, k) ≠ 0
            τ = H(k+1, k)/H(k, k)
            H(k+1, k+1:n) = H(k+1, k+1:n) − τ·H(k, k+1:n)
            H(k+1, k) = τ
        end
    end
This algorithm requires n² flops.
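Here is a NumPy sketch of Algorithm 4.3.4 together with a companion solver in the
spirit of P4.3.2; the function names and the random test are ours, not the text's:

    import numpy as np

    def hessenberg_lu(H):
        """Return a copy of H overwritten with U and the multipliers, plus piv."""
        H = np.array(H, dtype=float)
        n = H.shape[0]
        piv = np.zeros(n - 1, dtype=int)
        for k in range(n - 1):
            if abs(H[k, k]) < abs(H[k + 1, k]):
                piv[k] = 1
                H[[k, k + 1], k:] = H[[k + 1, k], k:]      # swap rows k and k+1
            if H[k, k] != 0.0:
                t = H[k + 1, k] / H[k, k]
                H[k + 1, k + 1:] -= t * H[k, k + 1:]
                H[k + 1, k] = t                             # store the multiplier

        return H, piv

    def hessenberg_solve(F, piv, b):
        """Solve Hx = b using the output of hessenberg_lu."""
        n = F.shape[0]
        b = np.array(b, dtype=float)
        for k in range(n - 1):                              # apply P_k then M_k to b
            if piv[k]:
                b[k], b[k + 1] = b[k + 1], b[k]
            b[k + 1] -= F[k + 1, k] * b[k]
        for k in range(n - 1, -1, -1):                      # back substitution with U
            b[k] = (b[k] - F[k, k + 1:] @ b[k + 1:]) / F[k, k]
        return b

    rng = np.random.default_rng(2)
    A = np.triu(rng.standard_normal((5, 5)), -1)            # random upper Hessenberg
    b = rng.standard_normal(5)
    F, piv = hessenberg_lu(A)
    x = hessenberg_solve(F, piv, b)
    print(np.allclose(A @ x, b))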
4.3.5 Band Cholesky
The rest of this section is devoted to banded Ax = b problems where the matrix A is
also symmetric positive definite. The fact that pivoting is unnecessary for such matrices
leads to some very compact, elegant algorithms. In particular, it follows from Theorem
4.3.1 that if A = GGT is the Cholesky factorization of A, then G has the same lower
bandwidth as A. This leads to the following banded version of Algorithm 4.2.1.
Algorithm 4.3.5 (Band Cholesky) Given a symmetric positive definite A ∈ R^{n×n}
with bandwidth p, the following algorithm computes a lower triangular matrix G with
lower bandwidth p such that A = GG^T. For all i ≥ j, G(i, j) overwrites A(i, j).

    for j = 1:n
        for k = max(1, j-p):j-1
            λ = min(k+p, n)
            A(j:λ, j) = A(j:λ, j) - A(j, k)·A(j:λ, k)
        end
        λ = min(j+p, n)
        A(j:λ, j) = A(j:λ, j)/√A(j, j)
    end
If n ≫ p, then this algorithm requires about n(p² + 3p) flops and n square roots. Of
course, in a serious implementation an appropriate data structure for A should be used.
For example, if we just store the nonzero lower triangular part, then a (p + 1)-by-n
array would suffice.
    If our band Cholesky procedure is coupled with appropriate band triangular solve
routines, then approximately np² + 7np + 2n flops and n square roots are required to
solve Ax = b. For small p it follows that the square roots represent a significant portion
of the computation and it is preferable to use the LDL^T approach. Indeed, a careful
flop count of the steps A = LDL^T, Ly = b, Dz = y, and L^T x = z reveals that
np² + 8np + n flops and no square roots are needed.
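The following NumPy sketch transcribes Algorithm 4.3.5 against a full two-dimensional
array for clarity; as noted above, a serious implementation would use a (p+1)-by-n
band data structure. The function name and test matrix are our own choices:

    import numpy as np

    def band_cholesky(A, p):
        """Return lower triangular G with lower bandwidth p such that A = G @ G.T."""
        A = np.array(A, dtype=float)
        n = A.shape[0]
        for j in range(n):
            for k in range(max(0, j - p), j):
                lam = min(k + p, n - 1) + 1                 # column k of G ends at row k+p
                A[j:lam, j] -= A[j, k] * A[j:lam, k]
            lam = min(j + p, n - 1) + 1
            A[j:lam, j] /= np.sqrt(A[j, j])
        return np.tril(A)

    n = 6
    A = np.diag(4.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    G = band_cholesky(A, p=1)
    print(np.allclose(G @ G.T, A))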
4.3.6 Tridiagonal System Solving
As a sample narrow band LDLT solution procedure, we look at the case of symmetric
positive definite tridiagonal systems. Setting
    L = [ 1                        ]
        [ l1   1                   ]
        [      ⋱     ⋱             ]
        [          l_{n-1}   1     ]

and D = diag(d1, ..., dn), we deduce from the equation A = LDL^T that

    a11 = d1,
    a_{k,k-1} = l_{k-1} d_{k-1},                                    k = 2:n,
    a_{kk} = d_k + l_{k-1}² d_{k-1} = d_k + l_{k-1} a_{k,k-1},      k = 2:n.
Thus, the d_i and l_i can be resolved as follows:

    d1 = a11
    for k = 2:n
        l_{k-1} = a_{k,k-1}/d_{k-1}
        d_k = a_{kk} - l_{k-1}·a_{k,k-1}
    end
To obtain the solution to Ax = b we solve Ly = b, Dz = y, and LTx = z. With
overwriting we obtain
Algorithm 4.3.6 (Symmetric, Tridiagonal, Positive Definite System Solver) Given
an n-by-n symmetric, tridiagonal, positive definite matrix A and b ∈ R^n, the following
algorithm overwrites b with the solution to Ax = b. It is assumed that the diagonal of
A is stored in α(1:n) and the superdiagonal in β(1:n-1).

    for k = 2:n
        t = β(k-1);  β(k-1) = t/α(k-1);  α(k) = α(k) - t·β(k-1)
    end
    for k = 2:n
        b(k) = b(k) - β(k-1)·b(k-1)
    end
    b(n) = b(n)/α(n)
    for k = n-1:-1:1
        b(k) = b(k)/α(k) - β(k)·b(k+1)
    end
This algorithm requires 8n flops.
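Below is a direct NumPy transcription of Algorithm 4.3.6 (our function name and test
system); the diagonal is passed in alpha, the superdiagonal in beta, and copies are made
so the inputs are not overwritten:

    import numpy as np

    def spd_tridiag_solve(alpha, beta, b):
        alpha, beta, b = (np.array(v, dtype=float) for v in (alpha, beta, b))
        n = alpha.size
        for k in range(1, n):                 # LDL^T: beta holds l, alpha holds d
            t = beta[k - 1]
            beta[k - 1] = t / alpha[k - 1]
            alpha[k] -= t * beta[k - 1]
        for k in range(1, n):                 # solve Ly = b
            b[k] -= beta[k - 1] * b[k - 1]
        b[n - 1] /= alpha[n - 1]              # solve Dz = y and L^T x = z together
        for k in range(n - 2, -1, -1):
            b[k] = b[k] / alpha[k] - beta[k] * b[k + 1]
        return b

    n = 5
    alpha = 2.0 * np.ones(n)
    beta = -1.0 * np.ones(n - 1)
    A = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    x = spd_tridiag_solve(alpha, beta, np.ones(n))
    print(np.allclose(A @ x, np.ones(n)))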
4.3.7 Vectorization Issues
The tridiagonal example brings up a sore point: narrow band problems and vectoriza­
tion do not mix. However, it is sometimes the case that large, independent sets of such
problems must be solved at the same time. Let us examine how such a computation
could be arranged in light of the issues raised in §1.5. For simplicity, assume that we
must solve the n-by-n unit lower bidiagonal systems

    A^{(k)} x^{(k)} = b^{(k)},        k = 1:m,

and that m ≫ n. Suppose we have arrays E(1:n-1, 1:m) and B(1:n, 1:m) with the
property that E(1:n-1, k) houses the subdiagonal of A^{(k)} and B(1:n, k) houses the
kth right-hand side b^{(k)}. We can overwrite b^{(k)} with the solution x^{(k)} as follows:
    for k = 1:m
        for i = 2:n
            B(i, k) = B(i, k) - E(i-1, k)·B(i-1, k)
        end
    end
This algorithm sequentially solves each bidiagonal system in turn. Note that the inner
loop does not vectorize because of the dependence of B(i, k) on B(i - 1, k). However,
if we interchange the order of the two loops, then the calculation does vectorize:
for i = 2:n
B(i, :) = B(i, : ) - E(i - 1, : ) . * B(i - 1, : )
end
A column-oriented version can be obtained simply by storing the matrix subdiagonals
by row in E and the right-hand sides by row in B:
for i = 2:n
B( : , i) = B( : , i) - E( : , i - 1) . * B( : , i - 1)
end
Upon completion, the transpose of the solution x^{(k)} is housed in B(k, :).
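The following NumPy fragment (our variable names and random data) mirrors this
discussion: the loop over i runs over rows of B and E, so each pass updates all m
systems at once:

    import numpy as np

    rng = np.random.default_rng(3)
    n, m = 6, 4
    E = rng.standard_normal((n - 1, m))      # E[:, k] is the subdiagonal of A^(k)
    B = rng.standard_normal((n, m))          # B[:, k] is b^(k)

    X = B.copy()
    for i in range(1, n):                    # this loop vectorizes across the m systems
        X[i, :] -= E[i - 1, :] * X[i - 1, :]

    # Check system k = 0 against a dense solve
    A0 = np.eye(n) + np.diag(E[:, 0], -1)
    print(np.allclose(np.linalg.solve(A0, B[:, 0]), X[:, 0]))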
4.3.8 The Inverse of a Band Matrix
In general, the inverse of a nonsingular band matrix A is full. However, the off-diagonal
blocks of A-1 have low rank.
Theorem 4.3.3. Suppose

    A = [ A11  A12 ]
        [ A21  A22 ]

is nonsingular and has lower bandwidth p and upper bandwidth q. Assume that the
diagonal blocks are square. If

    X = A^{-1} = [ X11  X12 ]
                 [ X21  X22 ]

is partitioned conformably, then

    rank(X21) ≤ p,                                                         (4.3.1)
    rank(X12) ≤ q.                                                         (4.3.2)
Proof. Assume that A11 and A22 are nonsingular. From the equation AX = I we
conclude that

    A21 X11 + A22 X21 = 0,
    A11 X12 + A12 X22 = 0,

and so

    rank(X21) = rank(A22^{-1} A21 X11) ≤ rank(A21),
    rank(X12) = rank(A11^{-1} A12 X22) ≤ rank(A12).

From the bandedness assumptions it follows that A21 has at most p nonzero rows and
A12 has at most q nonzero rows. Thus, rank(A21) ≤ p and rank(A12) ≤ q, which proves
the theorem for the case when both A11 and A22 are nonsingular. A simple limit
argument can be used to handle the situation when A11 and/or A22 are singular. See
P4.3.11. □
It can actually be shown that rank(A21) = rank(X21) and rank(A12) = rank(X12). See
Strang and Nguyen (2004). As we will see in §11.5.9 and §12.2, the low-rank, off­
diagonal structure identified by the theorem has important algorithmic ramifications.
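A quick numerical illustration of Theorem 4.3.3 (our construction; the bandwidths,
dimensions, split point, and diagonal shift are arbitrary choices) is sketched below:

    import numpy as np

    rng = np.random.default_rng(4)
    n, p, q = 12, 2, 1
    M = rng.standard_normal((n, n))
    A = np.triu(np.tril(M, q), -p) + 10 * np.eye(n)   # lower bandwidth p, upper bandwidth q
    X = np.linalg.inv(A)                              # generally a full matrix
    k = 5                                             # any split point
    X21, X12 = X[k:, :k], X[:k, k:]
    # The computed ranks are bounded by p and q respectively
    print(np.linalg.matrix_rank(X21), "<=", p)
    print(np.linalg.matrix_rank(X12), "<=", q)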
4.3.9 Band Matrices with Banded Inverse
If A E 1Rnxn is a product
(4.3.3)
and each Fi E JRnxn is block diagonal with 1-by-l and 2-by-2 diagonal blocks, then it
follows that both A and
A-1 _
F-1 F-1
- N • • • 1
are banded, assuming that N is not too big. For example, if
        [ x 0 0 0 0 0 0 0 0 ] [ x x 0 0 0 0 0 0 0 ]
        [ 0 x x 0 0 0 0 0 0 ] [ x x 0 0 0 0 0 0 0 ]
        [ 0 x x 0 0 0 0 0 0 ] [ 0 0 x x 0 0 0 0 0 ]
        [ 0 0 0 x x 0 0 0 0 ] [ 0 0 x x 0 0 0 0 0 ]
    A = [ 0 0 0 x x 0 0 0 0 ] [ 0 0 0 0 x x 0 0 0 ]
        [ 0 0 0 0 0 x x 0 0 ] [ 0 0 0 0 x x 0 0 0 ]
        [ 0 0 0 0 0 x x 0 0 ] [ 0 0 0 0 0 0 x x 0 ]
        [ 0 0 0 0 0 0 0 x x ] [ 0 0 0 0 0 0 x x 0 ]
        [ 0 0 0 0 0 0 0 x x ] [ 0 0 0 0 0 0 0 0 x ]
then
        [ x x 0 0 0 0 0 0 0 ]            [ x x x 0 0 0 0 0 0 ]
        [ x x x x 0 0 0 0 0 ]            [ x x x 0 0 0 0 0 0 ]
        [ x x x x 0 0 0 0 0 ]            [ 0 x x x x 0 0 0 0 ]
        [ 0 0 x x x x 0 0 0 ]            [ 0 x x x x 0 0 0 0 ]
    A = [ 0 0 x x x x 0 0 0 ]   A^{-1} = [ 0 0 0 x x x x 0 0 ]
        [ 0 0 0 0 x x x x 0 ]            [ 0 0 0 x x x x 0 0 ]
        [ 0 0 0 0 x x x x 0 ]            [ 0 0 0 0 0 x x x x ]
        [ 0 0 0 0 0 0 x x x ]            [ 0 0 0 0 0 x x x x ]
        [ 0 0 0 0 0 0 x x x ]            [ 0 0 0 0 0 0 0 x x ]
Strang (2010a, 2010b) has pointed out a very important "reverse" fact. If A and A-1
are banded, then there is a factorization of the form (4.3.3) with relatively small N.
Indeed, he shows that N is very small for certain types of matrices that arise in signal
processing. An important consequence of this is that both the forward transform Ax
and the inverse transform A-1x can be computed very fast.
Problems
P4.3.1 Develop a version of Algorithm 4.3.1 which assumes that the matrix A is stored in band format
style. (See §1.2.5.)
P4.3.2 Show how the output of Algorithm 4.3.4 can be used to solve the upper Hessenberg system
Hx = b.
P4.3.3 Show how Algorithm 4.3.4 could be used to solve a lower Hessenberg system Hx = b.
P4.3.4 Give an algorithm for solving an unsymmetric tridiagonal system Ax = b that uses Gaussian
elimination with partial pivoting. It should require only four n-vectors of floating point storage for
the factorization.
P4.3.5 (a) For C ∈ R^{n×n} define the profile indices m(C, i) = min{j : c_ij ≠ 0}, where i = 1:n. Show
that if A = GG^T is the Cholesky factorization of A, then m(A, i) = m(G, i) for i = 1:n. (We say
that G has the same profile as A.) (b) Suppose A ∈ R^{n×n} is symmetric positive definite with profile
indices m_i = m(A, i) where i = 1:n. Assume that A is stored in a one-dimensional array v as follows:
v = (a11, a_{2,m2}, ..., a22, a_{3,m3}, ..., a33, ..., a_{n,mn}, ..., a_nn).
Give an algorithm that overwrites v with the corresponding entries of the Cholesky factor G and then
uses this factorization to solve Ax = b. How many flops are required? (c) For C ∈ R^{n×n} define p(C, i)
= max{j : c_ij ≠ 0}. Suppose that A ∈ R^{n×n} has an LU factorization A = LU and that
    m(A, 1) ≤ m(A, 2) ≤ ··· ≤ m(A, n),
    p(A, 1) ≤ p(A, 2) ≤ ··· ≤ p(A, n).

Show that m(A, i) = m(L, i) and p(A, i) = p(U, i) for i = 1:n.
P4.3.6 Develop a gaxpy version of Algorithm 4.3.1.
P4.3.7 Develop a unit stride, vectorizable algorithm for solving the symmetric positive definite tridi­
agonal systems A(k)x(k) = b(k). Assume that the diagonals, superdiagonals, and right hand sides are
stored by row in arrays D, E, and B and that b(k) is overwritten with x(k).
P4.3.8 Give an example of a 3-by-3 symmetric positive definite matrix whose tridiagonal part is not
positive definite.
P4.3.9 Suppose a symmetric positive definite matrix A ∈ R^{n×n} has the "arrow structure", e.g.,

    [ x  x  x  x  x ]
    [ x  x  0  0  0 ]
    [ x  0  x  0  0 ]
    [ x  0  0  x  0 ]
    [ x  0  0  0  x ]
(a) Show how the linear system Ax = b can be solved with O(n) flops using the Sherman-Morrison­
Woodbury formula. (b) Determine a permutation matrix P so that the Cholesky factorization
PAPT = GGT
can be computed with O(n) flops.
P4.3.10 Suppose A ∈ R^{n×n} is tridiagonal, positive definite, but not symmetric. Give an efficient
algorithm for computing the largest entry of |S T^{-1} S| where S = (A − A^T)/2 and T = (A + A^T)/2.
P4.3.11 Show that if A ∈ R^{n×n} and ε > 0, then there is a B ∈ R^{n×n} such that ‖A − B‖ ≤ ε and
B has the property that all its principal submatrices are nonsingular. Use this result to formally
complete the proof of Theorem 4.3.3.
P4.3.12 Give an upper bound on the bandwidth of the matrix A in (4.3.3).
P4.3.13 Show that AT and A-1 have the same upper and lower bandwidths in (4.3.3).
P4.3.14 For the A = F1F2 example in §4.3.9, show that A(2:3, :), A(4:5, :), A(6:7, :), ... each consist
of two singular 2-by-2 blocks.
Notes and References for §4.3
Representative papers on the topic of banded systems include:
R.S. Martin and J.H. Wilkinson (1965). "Symmetric Decomposition of Positive Definite Band Matri­
ces," Numer. Math. 7, 355-61.
R. S. Martin and J.H. Wilkinson (1967). "Solution of Symmetric and Unsymmetric Band Equations
and the Calculation of Eigenvalues of Band Matrices," Numer. Math. 9, 279-301.
E.L. Allgower (1973). "Exact Inverses of Certain Band Matrices," Numer. Math. 21, 279-284.
Z. Bohte (1975). "Bounds for Rounding Errors in the Gaussian Elimination for Band Systems," J.
Inst. Math. Applic. 16, 133-142.
L. Kaufman (2007}. "The Retraction Algorithm for Factoring Banded Symmetric Matrices," Numer.
Lin. Alg. Applic. 14, 237-254.
C. Vomel and J. Slemons (2009}. "Twisted Factorization of a Banded Matrix," BIT 49, 433-447.
Tridiagonal systems are particularly important, see:
C. Fischer and R.A. Usmani (1969). "Properties of Some Tridiagonal Matrices and Their Application
to Boundary Value Problems," SIAM J. Numer. Anal. 6, 127-142.
D.J. Rose (1969). "An Algorithm for Solving a Special Class of Tridiagonal Systems of Linear Equa­
tions," Commun. ACM 12, 234-236.
M.A. Malcolm and J. Palmer (1974). "A Fast Method for Solving a Class of Tridiagonal Systems of
Linear Equations," Commun. ACM 17, 14-17.
N.J. Higham (1986). "Efficient Algorithms for Computing the Condition Number of a Tridiagonal
Matrix," SIAM J. Sci. Stat. Comput. 7, 150-165.
N.J. Higham (1990}. "Bounding the Error in Gaussian Elimination for Tridiagonal Systems," SIAM
J. Matrix Anal. Applic. 11, 521-530.
I.S. Dhillon (1998}. "Reliable Computation of the Condition Number of a Tridiagonal Matrix in O(n)
Time," SIAM J. Matrix Anal. Applic. 19, 776-796.
I. Bar-On and M. Leoncini (2000). "Reliable Solution of Tridiagonal Systems of Linear Equations,''
SIAM J. Numer. Anal. 98, 1134 -1153.
M.I. Bueno and F.M. Dopico (2004). "Stability and Sensitivity of Tridiagonal LU Factorization without
Pivoting," BIT 44, 651-673.
J.R. Bunch and R.F. Marcia (2006). "A Simplified Pivoting Strategy for Symmetric Tridiagonal
Matrices,'' Numer. Lin. Alg. 19, 865-867.
For a discussion of parallel methods for banded problems, see:
H.S. Stone (1975). "Parallel Tridiagonal Equation Solvers," ACM Trans. Math. Softw. 1, 289-307.
I. Bar-On, B. Codenotti and M. Leoncini (1997). "A Fast Parallel Cholesky Decomposition Algorithm
for Tridiagonal Symmetric Matrices," SIAM J. Matrix Anal. Applic. 18, 403-418.
G.H. Golub, A.H. Sameh, and V. Sarin (2001}. "A Parallel Balance Scheme for Banded Linear
Systems," Num. Lin. Alg. 8, 297-316.
S. Rao and Sarita (2008). "Parallel Solution of Large Symmetric Tridiagonal Linear Systems," Parallel
Comput. 94, 177-197.
Papers that are concerned with the structure of the inverse of a band matrix include:
E. Asplund (1959). "Inverses of Matrices {a_ij} Which Satisfy a_ij = 0 for j > i + p," Math. Scand. 7,
57-60.
C.A. Micchelli (1992). "Banded Matrices with Banded Inverses,'' J. Comput. Appl. Math. 41,
281-300.
G. Strang and T. Nguyen (2004). "The Interplay of Ranks of Submatrices," SIAM Review 46, 637-648.
G. Strang (2010a). "Fast Transforms: Banded Matrices with Banded Inverses," Proc. National Acad.
Sciences 107, 12413-12416.
G. Strang (2010b). "Banded Matrices with Banded Inverses and A = LPU," Proceedings International
Congress of Chinese Mathematicians, Beijing.
A pivotal result in this arena is the nullity theorem, a more general version of Theorem 4.3.3, see:
R. Vandebril, M. Van Barel, and N. Mastronardi (2008). Matrix Computations and Semiseparable
Matrices, Volume I Linear Systems, Johns Hopkins University Press, Baltimore, MD., 37-40.
4.4 Symmetric Indefinite Systems
Recall that a matrix whose quadratic form xTAx takes on both positive and negative
values is indefinite. In this section we are concerned with symmetric indefinite lin­
ear systems. The LDL^T factorization is not always advisable as the following 2-by-2
example illustrates:

    [ ε  1 ]   [  1   0 ] [ ε    0   ] [ 1  1/ε ]
    [ 1  0 ] = [ 1/ε  1 ] [ 0  −1/ε  ] [ 0   1  ].
Of course, any of the pivot strategies in §3.4 could be invoked. However, they destroy
symmetry and, with it, the chance for a "Cholesky speed" symmetric indefinite system
solver. Symmetric pivoting, i.e., data reshufflings of the form A ← PAP^T, must be
used as we discussed in §4.2.8. Unfortunately, symmetric pivoting does not always
stabilize the LDL^T computation. If ε1 and ε2 are small, then regardless of P, the
matrix

    A = [ ε1  1  ]
        [ 1   ε2 ]
has small diagonal entries and large numbers surface in the factorization. With sym­
metric pivoting, the pivots are always selected from the diagonal and trouble results if
these numbers are small relative to what must be zeroed off the diagonal. Thus, LDLT
with symmetric pivoting cannot be recommended as a reliable approach to symmetric
indefinite system solving. It seems that the challenge is to involve the off-diagonal
entries in the pivoting process while at the same time maintaining symmetry.
In this section we discuss two ways to do this. The first method is due to Aasen
(1971) and it computes the factorization

    P A P^T = L T L^T                                                      (4.4.1)

where L = (l_ij) is unit lower triangular and T is tridiagonal. P is a permutation
chosen such that |l_ij| ≤ 1. In contrast, the diagonal pivoting method due to Bunch
and Parlett (1971) computes a permutation P such that

    P A P^T = L D L^T                                                      (4.4.2)

where D is a direct sum of 1-by-1 and 2-by-2 pivot blocks. Again, P is chosen so that
the entries in the unit lower triangular L satisfy |l_ij| ≤ 1. Both factorizations involve
n³/3 flops and once computed, can be used to solve Ax = b with O(n²) work:

    P A P^T = L T L^T,   Lz = Pb,  Tw = z,  L^T y = w,  x = P^T y   ⇒  Ax = b,
    P A P^T = L D L^T,   Lz = Pb,  Dw = z,  L^T y = w,  x = P^T y   ⇒  Ax = b.
A few comments need to be made about the Tw = z and Dw = z systems that arise
when these methods are invoked.
In Aasen's method, the symmetric indefinite tridiagonal system Tw = z is solved
in O(n) time using band Gaussian elimination with pivoting. Note that there is no
serious price to pay for the disregard of symmetry at this level since the overall process
is O(n3).
In the diagonal pivoting approach, the Dw = z system amounts to a set of 1-by-1
and 2-by-2 symmetric indefinite systems. The 2-by-2 problems can be handled via
Gaussian elimination with pivoting. Again, there is no harm in disregarding symmetry
during this O(n) phase of the calculation. Thus, the central issue in this section is the
efficient computation of the factorizations (4.4.1) and (4.4.2).
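As a hedged usage sketch, SciPy's scipy.linalg.ldl wraps a LAPACK symmetric indefinite
factorization of Bunch-Kaufman type and returns factors of the form discussed around
(4.4.2); the test matrix below is our own:

    import numpy as np
    from scipy.linalg import ldl

    rng = np.random.default_rng(5)
    B = rng.standard_normal((6, 6))
    A = B + B.T                                 # symmetric, generally indefinite
    L, D, perm = ldl(A, lower=True)             # D has 1-by-1 and 2-by-2 blocks
    print(np.allclose(L @ D @ L.T, A))          # the factors reproduce A
    # By Sylvester's law of inertia, D has the same number of negative eigenvalues as A
    print(np.count_nonzero(np.linalg.eigvalsh(D) < 0), "negative eigenvalues")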
4.4.1 The Parlett-Reid Algorithm
Parlett and Reid (1970) show how to compute (4.4.1) using Gauss transforms. Their
algorithm is sufficiently illustrated by displaying the k = 2 step for the case n = 5. At
the beginning of this step the matrix A has been transformed to

    A^{(1)} = M1 P1 A P1^T M1^T =
        [ α1  β1   0   0   0  ]
        [ β1  α2   v3  v4  v5 ]
        [ 0   v3   x   x   x  ]
        [ 0   v4   x   x   x  ]
        [ 0   v5   x   x   x  ]
where P1 is a permutation chosen so that the entries in the Gauss transformation M1
are bounded by unity in modulus. Scanning the vector [v3 v4 v5]^T for its largest entry,
we now determine a 3-by-3 permutation P̃2 that brings that entry to the top.
If this maximal element is zero, we set M2 = P2 = I and proceed to the next step.
Otherwise, we set P2 = diag(I2, P̃2) and M2 = I − α^{(2)} e_2^T with

    α^{(2)} = [ 0  0  0  v4/v3  v5/v3 ]^T.
Observe that

    A^{(2)} = M2 P2 A^{(1)} P2^T M2^T =
        [ α1  β1   0   0   0 ]
        [ β1  α2   v3  0   0 ]
        [ 0   v3   x   x   x ]
        [ 0   0    x   x   x ]
        [ 0   0    x   x   x ]
In general, the process continues for n − 2 steps leaving us with a tridiagonal matrix

    T = A^{(n-2)} = (M_{n-2} P_{n-2} ··· M1 P1) A (M_{n-2} P_{n-2} ··· M1 P1)^T.

It can be shown that (4.4.1) holds with P = P_{n-2} ··· P1 and

    L = (M_{n-2} P_{n-2} ··· M1 P1 P^T)^{-1}.
Analysis of L reveals that its first column is e1 and that its subdiagonal entries in
column k with k > 1 are "made up" of the multipliers in Mk-1·
The efficient implementation of the Parlett-Reid method requires care when computing
the update

    A^{(k)} = M_k P_k A^{(k-1)} P_k^T M_k^T.                               (4.4.3)

To see what is involved with a minimum of notation, suppose B = B^T ∈ R^{(n-k)×(n-k)}
and that we wish to form

    B+ = (I − w e1^T) B (I − w e1^T)^T,

where w ∈ R^{n-k} and e1 is the first column of I_{n-k}. Such a calculation is at the heart
of (4.4.3). If we set

    u = B e1 − (b11/2) w,

then B+ = B − w u^T − u w^T and its lower triangular portion can be formed in 2(n − k)²
flops. Summing this quantity as k ranges from 1 to n − 2 indicates that the Parlett-Reid
procedure requires 2n³/3 flops, twice the volume of work associated with Cholesky.
4.4.2 The Method of Aasen
An n³/3 approach to computing (4.4.1) due to Aasen (1971) can be derived by
reconsidering some of the computations in the Parlett-Reid approach. We examine the
no-pivoting case first where the goal is to compute a unit lower triangular matrix L
with L(:, 1) = e1 and a tridiagonal matrix

    T = [ α1  β1                          ]
        [ β1  α2   β2                     ]
        [      ⋱    ⋱     ⋱               ]
        [          β_{n-2}  α_{n-1}  β_{n-1} ]
        [                   β_{n-1}  α_n    ]

such that A = LTL^T. The Aasen method is structured as follows:

    for j = 1:n
        {α(1:j−1), β(1:j−1), and L(:, 1:j) are known}
        Compute α_j.
        if j ≤ n − 1
            Compute β_j.
        end
        if j ≤ n − 2
            Compute L(j+2:n, j+1).
        end
    end
                                                                           (4.4.4)
To develop recipes for α_j, β_j, and L(j+2:n, j+1), we compare the jth columns in the
equation A = LH where H = TL^T. Noting that H is an upper Hessenberg matrix we
obtain

    A(:, j) = L·H(:, j) = Σ_{k=1}^{j+1} L(:, k)·h(k),                      (4.4.5)

where h(1:j+1) = H(1:j+1, j) and we assume that j ≤ n − 1. It follows that

    h(j+1)·L(j+1:n, j+1) = v(j+1:n),                                       (4.4.6)

where

    v(j+1:n) = A(j+1:n, j) − L(j+1:n, 1:j)·h(1:j).                         (4.4.7)

Since L is unit lower triangular and L(:, 1:j) is known, this gives us a working recipe
for L(j+2:n, j+1) provided we know h(1:j). Indeed, from (4.4.6) and (4.4.7) it is
easy to show that

    L(j+2:n, j+1) = v(j+2:n)/v(j+1).                                       (4.4.8)
To compute h(1:j) we turn to the equation H = TL^T and examine its jth column.
The case j = 5 amply displays what is going on:

    [ h1 ]   [ α1  β1  0   0   0   0  ] [  0  ]   [ β1·l52                   ]
    [ h2 ]   [ β1  α2  β2  0   0   0  ] [ l52 ]   [ α2·l52 + β2·l53          ]
    [ h3 ] = [ 0   β2  α3  β3  0   0  ] [ l53 ] = [ β2·l52 + α3·l53 + β3·l54 ]    (4.4.9)
    [ h4 ]   [ 0   0   β3  α4  β4  0  ] [ l54 ]   [ β3·l53 + α4·l54 + β4     ]
    [ h5 ]   [ 0   0   0   β4  α5  β5 ] [  1  ]   [ β4·l54 + α5              ]
    [ h6 ]   [ 0   0   0   0   β5  α6 ] [  0  ]   [ β5                       ]

At the start of step j, we know α(1:j−1), β(1:j−1), and L(:, 1:j). Thus, we can
determine h(1:j−1) as follows:

    h1 = β1·l_{j2}
    for k = 2:j−1
        h_k = β_{k-1}·l_{j,k-1} + α_k·l_{jk} + β_k·l_{j,k+1}
    end
                                                                           (4.4.10)

Equation (4.4.5) gives us a formula for h_j:

    h_j = A(j, j) − Σ_{k=1}^{j-1} L(j, k)·h_k.                             (4.4.11)

From (4.4.9) we infer that

    α_j = h_j − β_{j-1}·l_{j,j-1},                                         (4.4.12)
    β_j = h_{j+1}.                                                         (4.4.13)
Combining these equations with (4.4.4), (4.4.7), (4.4.8), (4.4.10), and (4.4.11) we obtain
the Aasen method without pivoting:
    L = I_n
    for j = 1:n
        if j = 1
            v(2:n) = A(2:n, 1)
        else
            h1 = β1·l_{j2}
            for k = 2:j−1
                h_k = β_{k-1}·l_{j,k-1} + α_k·l_{jk} + β_k·l_{j,k+1}
            end
            h_j = a_{jj} − L(j, 1:j−1)·h(1:j−1)
            α_j = h_j − β_{j-1}·l_{j,j-1}
            v(j+1:n) = A(j+1:n, j) − L(j+1:n, 1:j)·h(1:j)
        end
        if j ≤ n − 1
            β_j = v(j+1)
        end
        if j ≤ n − 2
            L(j+2:n, j+1) = v(j+2:n)/v(j+1)
        end
    end
                                                                           (4.4.14)
The dominant operation each pass through the j-loop is an (n−j)-by-j gaxpy operation.
Accounting for the associated flops we see that the overall Aasen computation involves
n³/3 flops, the same as for the Cholesky factorization.
As it now stands, the columns of L are scalings of the v-vectors in (4.4.14). If
any of these scalings are large, i.e., if any v(j + 1) is small, then we are in trouble.
To circumvent this problem, it is only necessary to permute the largest component of
v(j + l:n) to the top position. Of course, this permutation must be suitably applied to
the unreduced portion of A and the previously computed portion of L. With pivoting,
Aasen's method is stable in the same sense that Gaussian elimination with partial
pivoting is stable.
In a practical implementation of the Aasen algorithm, the lower triangular portion
of A would be overwritten with L and T, e.g.,

    A ← [ α1                       ]
        [ β1   α2                  ]
        [ l32  β2   α3             ]
        [ l42  l43  β3   α4        ]
        [ l52  l53  l54  β4   α5   ].
Notice that the columns of L are shifted left in this arrangement.
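A NumPy transcription of the unpivoted recurrences (4.4.14) is sketched below (our
function name and test matrix); the explicit assignment alpha[0] = A[0, 0] is what
(4.4.11)-(4.4.12) give for j = 1, and a practical code would add the pivoting described
above:

    import numpy as np

    def aasen_nopivot(A):
        """Return L (unit lower triangular, L[:, 0] = e1) and the tridiagonal
        entries alpha, beta of T so that A = L T L^T."""
        A = np.array(A, dtype=float)
        n = A.shape[0]
        L = np.eye(n)
        alpha = np.zeros(n)
        beta = np.zeros(n - 1)
        for j in range(n):
            if j == 0:
                alpha[0] = A[0, 0]                        # (4.4.11)-(4.4.12) with j = 1
                v = A[1:, 0].copy()
            else:
                h = np.zeros(j + 1)
                h[0] = beta[0] * L[j, 1]
                for k in range(1, j):                     # the recurrence (4.4.10)
                    h[k] = beta[k-1]*L[j, k-1] + alpha[k]*L[j, k] + beta[k]*L[j, k+1]
                h[j] = A[j, j] - L[j, :j] @ h[:j]         # (4.4.11)
                alpha[j] = h[j] - beta[j-1] * L[j, j-1]   # (4.4.12)
                v = A[j+1:, j] - L[j+1:, :j+1] @ h        # (4.4.7), a gaxpy
            if j <= n - 2:
                beta[j] = v[0]                            # (4.4.13)
            if j <= n - 3:
                L[j+2:, j+1] = v[1:] / v[0]               # (4.4.8); pivoting guards v[0]
        return L, alpha, beta

    rng = np.random.default_rng(6)
    M = rng.standard_normal((6, 6))
    A = M + M.T
    L, alpha, beta = aasen_nopivot(A)
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    print(np.allclose(L @ T @ L.T, A))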
4.4.3 Diagonal Pivoting Methods
We next describe the computation of the block LDLT factorization (4.4.2). We follow
the discussion in Bunch and Parlett (1971). Suppose

    P1 A P1^T = [ E   C^T ]   s
                [ C   B   ]   n−s
                   s   n−s

where P1 is a permutation matrix and s = 1 or 2. If A is nonzero, then it is always
possible to choose these quantities so that E is nonsingular, thereby enabling us to
write

    P1 A P1^T = [ I_s       0       ] [ E        0           ] [ I_s  E^{-1} C^T ]
                [ C E^{-1}  I_{n-s} ] [ 0   B − C E^{-1} C^T ] [ 0    I_{n-s}    ].

For the sake of stability, the s-by-s "pivot" E should be chosen so that the entries in

    Ã = B − C E^{-1} C^T                                                   (4.4.15)

are suitably bounded. To this end, let α ∈ (0, 1) be given and define the size measures
    μ0 = max_{i,j} |a_ij|,        μ1 = max_i |a_ii|.
The Bunch-Parlett pivot strategy is as follows:

    if μ1 ≥ α·μ0
        s = 1
        Choose P1 so |e11| = μ1.
    else
        s = 2
        Choose P1 so |e21| = μ0.
    end
It is easy to verify from (4.4.15) that if s = 1, then

    |ã_ij| ≤ (1 + α^{-1})·μ0,                                              (4.4.16)

while s = 2 implies

    |ã_ij| ≤ ((3 − α)/(1 − α))·μ0.                                         (4.4.17)
By equating (1 + α^{-1})², the growth factor that is associated with two s = 1 steps,
and (3 − α)/(1 − α), the corresponding s = 2 factor, Bunch and Parlett conclude that
α = (1 + √17)/8 is optimum from the standpoint of minimizing the bound on element
growth.
The reductions outlined above can be repeated on the order-(n - s) symmetric
matrix Ã. A simple induction argument establishes that the factorization (4.4.2) exists
and that n3/3 flops are required if the work associated with pivot determination is
ignored.
4.4.4 Stability and Efficiency
Diagonal pivoting with the above strategy is shown by Bunch (1971) to be as stable
as Gaussian elimination with complete pivoting. Unfortunately, the overall process
requires between n³/12 and n³/6 comparisons, since μ0 involves a two-dimensional
search at each stage of the reduction. The actual number of comparisons depends
on the total number of 2-by-2 pivots but in general the Bunch-Parlett method for
computing (4.4.2) is considerably slower than the technique of Aasen. See Barwell and
George (1976).
This is not the case with the diagonal pivoting method of Bunch and Kaufman
(1977). In their scheme, it is only necessary to scan two columns at each stage of the
reduction. The strategy is fully illustrated by considering the very first step in the
reduction:
    α = (1 + √17)/8
    λ = |a_{r1}| = max{|a_{21}|, ..., |a_{n1}|}
    if λ > 0
        if |a_{11}| ≥ α·λ
            Set s = 1 and P1 = I.
        else
            σ = |a_{pr}| = max{|a_{1r}|, ..., |a_{r-1,r}|, |a_{r+1,r}|, ..., |a_{nr}|}
            if σ·|a_{11}| ≥ α·λ²
                Set s = 1 and P1 = I
            elseif |a_{rr}| ≥ α·σ
                Set s = 1 and choose P1 so (P1^T A P1)_{11} = a_{rr}.
            else
                Set s = 2 and choose P1 so (P1^T A P1)_{21} = a_{rp}.
            end
        end
    end
Overall, the Bunch-Kaufman algorithm requires n³/3 flops, O(n²) comparisons, and,
like all the methods of this section, n²/2 storage.
4.4.5 A Note on Equilibrium Systems
A very important class of symmetric indefinite matrices have the form
    A = [ C    B ]   n
        [ B^T  0 ]   p                                                     (4.4.18)
           n    p
where C is symmetric positive definite and B has full column rank. These conditions
ensure that A is nonsingular.
Of course, the methods of this section apply to A. However, they do not exploit
its structure because the pivot strategies "wipe out" the zero (2,2) block. On the other
hand, here is a tempting approach that does exploit A's block structure:
Step 1. Compute the Cholesky factorization C = GG^T.
Step 2. Solve GK = B for K ∈ R^{n×p}.
Step 3. Compute the Cholesky factorization HH^T = K^T K = B^T C^{-1} B.

From this it follows that

    A = [ C    B ]  =  [ G     0  ] [ G^T  K   ]
        [ B^T  0 ]     [ K^T  −H  ] [ 0    H^T ].

In principle, this triangular factorization can be used to solve the equilibrium system

    [ C    B ] [ x ]   [ f ]
    [ B^T  0 ] [ y ] = [ g ].                                              (4.4.19)

However, it is clear by considering Steps 2 and 3 above that the accuracy of the
computed solution depends upon κ(C) and this quantity may be much greater than
κ(A). The situation has been carefully analyzed and various structure-exploiting algo-
rithms have been proposed. A brief review of the literature is given at the end of the
section.
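The following NumPy/SciPy sketch (our variable names and random data) carries out
the three steps above and then solves the equilibrium system (4.4.19) with the resulting
block triangular factors; as just noted, its accuracy can be governed by κ(C):

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    rng = np.random.default_rng(7)
    n, p = 7, 3
    M = rng.standard_normal((n, n))
    C = M @ M.T + n * np.eye(n)                    # symmetric positive definite
    B = rng.standard_normal((n, p))                # full column rank (generically)
    f, g = rng.standard_normal(n), rng.standard_normal(p)

    G = cholesky(C, lower=True)                    # Step 1
    K = solve_triangular(G, B, lower=True)         # Step 2: G K = B
    H = cholesky(K.T @ K, lower=True)              # Step 3: H H^T = K^T K

    # Forward solve with [G 0; K^T -H], then back solve with [G^T K; 0 H^T]
    u = solve_triangular(G, f, lower=True)
    w = solve_triangular(H, K.T @ u - g, lower=True)
    y = solve_triangular(H.T, w, lower=False)
    x = solve_triangular(G.T, u - K @ y, lower=False)

    A = np.block([[C, B], [B.T, np.zeros((p, p))]])
    print(np.allclose(A @ np.concatenate([x, y]), np.concatenate([f, g])))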
It is interesting to consider a special case of (4.4.19) that clarifies what it means
for an algorithm to be stable and illustrates how perturbation analysis can structure
the search for better methods. In several important applications, g = 0, C is diagonal,
and the solution subvector y is of primary importance. A manipulation shows that this
vector is specified by
    y = (B^T C^{-1} B)^{-1} B^T C^{-1} f.                                  (4.4.20)
Looking at this we are again led to believe that κ(C) should have a bearing on the
accuracy of the computed y. However, it can be shown that

    ‖ (B^T C^{-1} B)^{-1} B^T C^{-1} ‖₂ ≤ ψ_B,                             (4.4.21)

where the upper bound ψ_B is independent of C, a result that (correctly) suggests that y
where the upper bound 'lj;8 is independent ofC, a result that (correctly) suggests that y
is not sensitive to perturbations in C. A stable method for computing this vector should
respect this, meaning that the accuracy of the computed y should be independent of
C. Vavasis (1994) has developed a method with this property. It involves the careful
assembly of a matrix V ∈ R^{n×(n-p)} whose columns are a basis for the nullspace of
B^T C^{-1}. The n-by-n linear system

    [ B | V ] [ y ]
              [ q ]  =  f

is then solved, implying f = By + Vq. Thus, B^T C^{-1} f = B^T C^{-1} B y and (4.4.20) holds.
Problems
P4.4.1 Show that if all the 1-by-1 and 2-by-2 principal submatrices of an n-by-n symmetric matrix
A are singular, then A is zero.
P4.4.2 Show that no 2-by-2 pivots can arise in the Bunch-Kaufman algorithm if A is positive definite.
P4.4.3 Arrange (4.4.14) so that only the lower triangular portion of A is referenced and so that
α(j) overwrites A(j, j) for j = 1:n, β(j) overwrites A(j + 1, j) for j = 1:n − 1, and L(i, j) overwrites
A(i, j - 1) for j = 2:n - 1 and i = j + 1:n.
P4.4.4 Suppose A ∈ R^{n×n} is symmetric and strictly diagonally dominant. Give an algorithm that
computes the factorization

    Π A Π^T = [ R  0  ] [ R^T  S^T ]
              [ S  −M ] [ 0    M^T ]

where Π is a permutation and the diagonal blocks R and M are lower triangular.
P4.4.5 A symmetric matrix A is quasidefinite if it has the form

    A = [ A11     A12  ]   n
        [ A12^T  −A22  ]   p

with A11 and A22 positive definite. (a) Show that such a matrix has an LDL^T factorization with the
property that

    D = [ D1   0  ]
        [ 0   −D2 ]

where D1 ∈ R^{n×n} and D2 ∈ R^{p×p} have positive diagonal entries. (b) Show that if A is quasidefinite
then all its principal submatrices are nonsingular. This means that PAP^T has an LDL^T factorization
for any permutation matrix P.
P4.4.6 Prove (4.4.16) and (4.4.17).
P4.4.7 Show that −(B^T C^{-1} B)^{-1} is the (2,2) block of A^{-1} where A is given by equation (4.4.18).
P4.4.8 The point of this problem is to consider a special case of (4.4.21). Define the matrix

    M(α) = (B^T C^{-1} B)^{-1} B^T C^{-1}

where C = I_n + α e_k e_k^T, α > −1, and e_k = I_n(:, k). (Note that C is just the identity with α added
to the (k, k) entry.) Assume that B ∈ R^{n×p} has rank p and show that

    M(α) = (B^T B)^{-1} B^T ( I_n − (α/(1 + α w^T w)) e_k w^T )

where w = (I_n − B(B^T B)^{-1} B^T) e_k.
Show that if ‖w‖₂ = 0 or ‖w‖₂ = 1, then ‖M(α)‖₂ = 1/σ_min(B). Show that if 0 < ‖w‖₂ < 1,
then

    ‖M(α)‖₂ ≤ max{ 1 − ‖w‖₂, 1 + ‖w‖₂ } / σ_min(B).

Thus, ‖M(α)‖₂ has an α-independent upper bound.
Notes and References for §4.4
The basic references for computing (4.4.1) are as follows:
J.O. Aasen (1971). "On the Reduction of a Symmetric Matrix to Tridiagonal Form," BIT 11, 233-242.
B.N. Parlett and J.K. Reid (1970). "On the Solution of a System of Linear Equations Whose Matrix
Is Symmetric but not Definite," BIT 10, 386-397.
The diagonal pivoting literature includes:
J.R. Bunch and B.N. Parlett (1971). "Direct Methods for Solving Symmetric Indefinite Systems of
Linear Equations,'' SIAM J. Numer. Anal. 8, 639-655.
J.R. Bunch (1971) . "Analysis of the Diagonal Pivoting Method," SIAM J. Numer. Anal. 8, 656-680.
J.R. Bunch (1974). "Partial Pivoting Strategies for Symmetric Matrices,'' SIAM J. Numer. Anal.
1 1, 521-528.
J.R. Bunch, L. Kaufman, and B.N. Parlett (1976). "Decomposition of a Symmetric Matrix,'' Numer.
Math. 27, 95-109.
J.R. Bunch and L. Kaufman (1977). "Some Stable Methods for Calculating Inertia and Solving
Symmetric Linear Systems," Math. Comput. 31, 162-79.
M.T. .Jones and M.L. Patrick (1993). "Bunch-Kaufman Factorization for Real Symmetric Indefinite
Banded Matrices,'' SIAM J. Matrix Anal. Applic. 14, 553-559.
Because "future" columns must be scanned in the pivoting process, it is awkward (but possible) to
obtain a gaxpy-rich diagonal pivoting algorithm. On the other hand, Aasen's method is naturally rich
in gaxpys. Block versions of both procedures are possible. Various performance issues are discussed
in:
V. Barwell and J.A. George (1976). "A Comparison of Algorithms for Solving Symmetric Indefinite
Systems of Linear Equations," ACM Trans. Math. Softw. 2, 242-251.
M.T. Jones and M.L. Patrick (1994). "Factoring Symmetric Indefinite Matrices on High-Performance
Architectures," SIAM J. Matrix Anal. Applic. 15, 273-283.
Another idea for a cheap pivoting strategy utilizes error bounds based on more liberal interchange
criteria, an idea borrowed from some work done in the area of sparse elimination methods, see:
R. Fletcher (1976). "Factorizing Symmetric Indefinite Matrices,'' Lin. Alg. Applic. 14, 257-272.
Before using any symmetric Ax = b solver, it may be advisable to equilibrate A. An O(n2) algorithm
for accomplishing this task is given in:
J.R. Bunch (1971). "Equilibration of Symmetric Matrices in the Max-Norm," J. ACM 18, 566-·572.
N.J. Higham (1997). "Stability of the Diagonal Pivoting Method with Partial Pivoting," SIAM J.
Matrix Anal. Applic. 18, 52-65.
Procedures for skew-symmetric systems similar to the methods that we have presented in this section
also exist:
J.R. Bunch (1982). "A Note on the Stable Decomposition of Skew Symmetric Matrices," Math.
Comput. 158, 475-480.
J. Bunch (1982). "Stable Decomposition of Skew-Symmetric Matrices," Math. Comput. 38, 475-479.
P. Benner, R. Byers, H. Fassbender, V. Mehrmann, and D. Watkins (2000). "Cholesky-like Factoriza­
tions of Skew-Symmetric Matrices,'' ETNA 1 1, 85-93.
For a discussion of symmetric indefinite system solvers that are also banded or sparse, see:
C. Ashcraft, R.G. Grimes, and J.G. Lewis (1998). "Accurate Symmetric Indefinite Linear Equation
Solvers,'' SIAM J. Matrix Anal. Applic. 20, 513--561.
S.H. Cheng and N.J. Higham (1998). "A Modified Cholesky Algorithm Based on a Symmetric Indef­
inite Factorization,'' SIAM J. Matrix Anal. Applic. 19, 1097--1110.
J. Zhao, W. Wang, and W. Ren (2004). "Stability of the Matrix Factorization for Solving Block
Tridiagonal Symmetric Indefinite Linear Systems,'' BIT 44, 181 -188.
H. Fang and D.P. O'Leary (2006). "Stable Factorizations of Symmetric Tridiagonal and Triadic
Matrices," SIAM J. Matrix Anal. Applic. 28, 576-595.
D. Irony and S. Toledo (2006). "The Snap-Back Pivoting Method for Symmetric Banded Indefinite
Matrices,'' SIAM J. Matrix Anal. Applic. 28, 398-424.
The equilibrium system literature is scattered among the several application areas where it has an
important role to play. Nice overviews with pointers to this literature include:
G. Strang (1988). "A Framework for Equilibrium Equations," SIAM Review 30, 283-297.
S.A. Vavasis (1994). "Stable Numerical Algorithms for Equilibrium Systems," SIAM J. Matrix Anal.
Applic. 15, 1108-1131.
P.E. Gill, M.A. Saunders, and J.R. Shinnerl (1996). "On the Stability of Cholesky Factorization for
Symmetric Quasidefinite Systems," SIAM J. Matrix Anal. Applic. 1 7, 35-46.
G.H. Golub and C. Greif (2003). "On Solving Block-Structured Indefinite Linear Systems,'' SIAM J.
Sci. Comput. 24, 2076-2092.
For a discussion of (4.4.21), see:
G.W. Stewart (1989). "On Scaled Projections and Pseudoinverses," Lin. Alg. Applic. 1 12, 189-193.
D.P. O'Leary (1990). "On Bounds for Scaled Projections and Pseudoinverses," Lin. Alg. Applic. 132,
1 15-117.
M.J. Todd (1990). "A Dantzig-Wolfe-like Variant of Karmarkar's Interior-Point Linear Programming
Algorithm," Oper. Res. 38, 1006-1018.
An equilibrium system is a special case of a saddle point system. See §11.5.10.
4.5 Block Tridiagonal Systems
Block tridiagonal linear systems of the form
    [ D1   F1                          ] [ x1 ]   [ b1 ]
    [ E1   D2    F2                    ] [ x2 ]   [ b2 ]
    [       ⋱     ⋱      ⋱             ] [  :  ] = [  :  ]                  (4.5.1)
    [            E_{N-2} D_{N-1} F_{N-1} ] [    ]   [    ]
    [                    E_{N-1}  DN    ] [ xN ]   [ bN ]
frequently arise in practice. We assume for clarity that all blocks are q-by-q. In this
section we discuss both a block LU approach to this problem as well as a pair of
divide-and-conquer schemes.
4.5. 1 Block Tridiagonal LU Factorization
If

    A = [ D1   F1                   ]
        [ E1   D2    ⋱              ]
        [       ⋱     ⋱    F_{N-1}  ]                                      (4.5.2)
        [            E_{N-1}   DN   ]

then by comparing blocks in

    A = [ I                       ] [ U1   F1                  ]
        [ L1    I                 ] [      U2    ⋱             ]
        [        ⋱     ⋱          ] [             ⋱   F_{N-1}  ]           (4.5.3)
        [           L_{N-1}    I  ] [                  UN      ]

we formally obtain the following algorithm for computing the L_i and U_i:
    U1 = D1
    for i = 2:N
        Solve L_{i-1} U_{i-1} = E_{i-1} for L_{i-1}.
        U_i = D_i − L_{i-1} F_{i-1}
    end
                                                                           (4.5.4)
The procedure is defined as long as the Ui are nonsingular.
Having computed the factorization (4.5.3), the vector x in (4.5.1) can be obtained
via block forward elimination and block back substitution:

    y1 = b1
    for i = 2:N
        y_i = b_i − L_{i-1} y_{i-1}
    end
    Solve U_N x_N = y_N for x_N.
    for i = N-1:-1:1
        Solve U_i x_i = y_i − F_i x_{i+1} for x_i.
    end
                                                                           (4.5.5)
To carry out both (4.5.4) and (4.5.5), each Ui must be factored since linear systems
involving these submatrices are solved. This could be done using Gaussian elimination
with pivoting. However, this does not guarantee the stability of the overall process.
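A dense-block NumPy sketch of (4.5.4)-(4.5.5) follows (our function name and random
test blocks); each U_i is handled with numpy.linalg.solve, which, as just noted, does not
by itself guarantee stability:

    import numpy as np

    def block_tridiag_solve(D, E, F, b):
        """D: N diagonal blocks, E: N-1 subdiagonal blocks, F: N-1 superdiagonal
        blocks, b: N right-hand-side blocks (all lists of arrays)."""
        N = len(D)
        U, L = [None] * N, [None] * (N - 1)
        U[0] = D[0]
        for i in range(1, N):
            L[i - 1] = np.linalg.solve(U[i - 1].T, E[i - 1].T).T   # L_{i-1} U_{i-1} = E_{i-1}
            U[i] = D[i] - L[i - 1] @ F[i - 1]
        y = [None] * N
        y[0] = b[0]
        for i in range(1, N):                                      # block forward elimination
            y[i] = b[i] - L[i - 1] @ y[i - 1]
        x = [None] * N
        x[N - 1] = np.linalg.solve(U[N - 1], y[N - 1])
        for i in range(N - 2, -1, -1):                             # block back substitution
            x[i] = np.linalg.solve(U[i], y[i] - F[i] @ x[i + 1])
        return x

    rng = np.random.default_rng(8)
    N, q = 4, 3
    D = [rng.standard_normal((q, q)) + 5 * np.eye(q) for _ in range(N)]
    E = [rng.standard_normal((q, q)) for _ in range(N - 1)]
    F = [rng.standard_normal((q, q)) for _ in range(N - 1)]
    b = [rng.standard_normal(q) for _ in range(N)]
    x = block_tridiag_solve(D, E, F, b)

    A = np.zeros((N * q, N * q))
    for i in range(N):
        A[i*q:(i+1)*q, i*q:(i+1)*q] = D[i]
        if i < N - 1:
            A[(i+1)*q:(i+2)*q, i*q:(i+1)*q] = E[i]
            A[i*q:(i+1)*q, (i+1)*q:(i+2)*q] = F[i]
    print(np.allclose(A @ np.concatenate(x), np.concatenate(b)))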
4.5.2 Block Diagonal Dominance
In order to obtain satisfactory bounds on the Li and Ui it is necessary to make addi­
tional assumptions about the underlying block matrix. For example, if we have
{4.5.6)
for i = l:N, then the factorization (4.5.3) exists and it is possible to show that the Li
and ui satisfy the inequalities
II Li Iii :::; 1,
II ui 11i :::; II An 11i .
The conditions (4.5.6) define a type of block diagonal dominance.
4.5.3 Block-Cyclic Reduction
{4.5.7)
{4.5.8)
We next describe the method of block-cyclic reduction that can be used to solve some
important special instances of the block tridiagonal system (4.5.1). For simplicity, we
assume that A has the form
A
D F
F D
0
0
F
F D
E nrvqxNq {4.5.9)
where F and D are q-by-q matrices that satisfy DF = FD. We also assume that
N = 2k - 1. These conditions hold in certain important applications such as the
discretization of Poisson's equation on a rectangle. (See §4.8.4.)
198 Chapter 4. Special Linear Systems
The basic ideabehind cyclic reduction is to halve repeatedly the dimension of the
problem on hand repeatedly until we are left with a single q-by-q system for the un­
known subvector x2k-t . This system is then solved by standard means. The previously
eliminated Xi are found by a back-substitution process.
The general procedure is adequately illustrated by considering the case N = 7:
bi Dx1 + Fx2,
b2 Fx1 + Dx2 + Fx3,
b3 Fx2 + Dx3 + Fx4,
b4 Fx3 + Dx4 + Fx5,
b5 Fx4 + Dxf> + Fx6,
b6 Fx5 + Dx6 + Fxr,
br Fx6 + Dxr.
For i = 2, 4, and 6 we multiply equations i - 1, i, and i + 1 by F, -D, and F,
respectively, and add the resulting equations to obtain
(2F2 - D2)x2 + F2x4 = F(b1 + b3) - Db2,
F2x2 + (2F2 - D2)x4 + F2x6 = F(b3 + b5) - Db4,
F2x4 + (2F2 - D2)x6 = F(b5 + br) - Db6.
Thus, with this tactic we have removed the odd-indexed Xi and are left with a reduced
block tridiagonal system of the form
D(llx2 + p(llx4
p(l)x2 + D(llx4 + p<l)x6
p(llx4 + D(llx6
where D(l) = 2F2 - D2 and p(l) = F2 commute. Applying the same elimination
strategy as above, we multiply these three equations respectively by p(I), -D<1l, and
p(l). When these transformed equations are added together, we obtain the single
equation
(2[F(l)J2 - D(1)2)X4 = p(l) (b�l) + b�l)) - D(l)bil)'
which we write as
DC2lx4 = b(2).
This completes the cyclic reduction. We now solve this (small) q-by-q system for x4.
The vectors x2 and x6 are then found by solving the systems
D(l)x - b(l) - p(I)x
2 - 2 ,4 ,
D(l)X6 = b�l) - p(l)X4.
Finally, we use the first, third, fifth, and seventh equations in the original system to
compute X1, x3, X5, and ;i:7 , respectively.
The amount of work required to perform these recursions for general N depends
greatly upon the sparsity of the fl(P) and p(P). In the worst case when these matrices
are full, the overall flop count has order log(N)q3. Care must be exercised in order to
ensure stability during the reduction. For further details, see Buneman (1969).
4.5. Block Tridiagonal Systems 199
4.5.4 The SPIKE Framework
A bandwidth-p matrix A E R.Nqx Nq
can also be regarded as a block tridiagonal matrix
with banded diagonal blocks and low-rank off-diagonal blocks. Herc is an example
where N = 4, q = 7, and p = 2:
x x x
x x x x
x x x x x
x x x x x
x x x x x
x x x x x
x x x x x
x x x x x
x x x x x
x x x x x
x x x x x
x x x x x
x x x x x
A = x x x x x
x x x x x
(4.5.11)
x x x x x
x x x x x
x x x x x
x x x x x
x x x x x
x x x x x
x x x x x
x x x x x
x x x x x
x x x x x
x x x x x
x x x x
x x x
Note that the diagonal blocks have bandwidth p and the blocks along the subdiagonal
and superdiagonal have rank p. The low rank of the off-diagonal blocks makes it
possible to formulate a divide-and-conquer procedure known as the "SPIKE" algorithm.
The method is of interest because it parallelizes nicely. Our brief discussion is based
on Polizzi and Sameh (2007).
Assume for clarity that the diagonal blocks D1 , • • • , D4 are sufficiently well con­
ditioned. If we premultiply the above matrix by the inverse of diag(D1 , D2, Da, D4),
then we obtain
I + +
I + +
1 + +
1 + +
l + +
I + +
I + +
+ + I + +
+ + I + +
+ + I + +
+ + 1 + +
+ + I + +
+ + I + +
+ + I + +
(4.5.12)
+ + I + +
+ + 1 + +
+ + I + +
+ + I + +
+ + 1 + +
+ + I + +
+ + I + +
+ + 1
+ + I
+ + 1
+ + 1
+ + 1
+ + 1
+ + 1
200 Chapter 4. Special Linear Systems
With this maneuver, the original linear system
(4.5.13)
which corresponds to (4.5.11), transforms to
(4.5.14)
where DJJi = bi, DJ!i = Fi, and Di+lEi = Ei. Next, we refine the blocking (4.5.14)
by turning each submatrix into a 3-by-3 block matrix and each subvector into a 3-by-1
block vector as follows:
l2 0 0 Ki 0 0 0 0 0 0 0 0 W1 Ci
0 Ia 0 H1 0 0 0 0 0 0 0 0 Yi di
0 0 I2 G1 0 0 0 0 0 0 0 0 Zi Ji
0 0 Ri l2 0 0 K2 0 0 0 0 0 W2 C2
0 0 S1 0 Ia 0 H2 0 0 0 0 0 Y2 d2
0 0 Ti 0 0 l2 G2 0 0 0 0 0
0 0 0 0 0 R2 l2 0 0 K3 0 0
z2 h (4.5.15)
W3 C3
0 0 0 0 0 S2 0 [3 0 H3 0 0 Y3 d3
0 0 0 0 0 T2 0 0 l2 G3 0 0 Z3 f3
0 0 0 0 0 0 0 0 R3 Iq 0 0 W4 C4
0 0 0 0 0 0 0 0 S3 0 Im 0 Y4 d4
0 0 0 0 0 0 0 0 T3 0 0 lq Z4 f4
The block rows and columns in this equation can be reordered to produce the following
equivalent system:
l2 0 Ki 0 0 0 0 0 0 0 0 0 Wi Ci
0 l2 Gi 0 0 0 0 0 0 0 0 0 Zi Ji
0 Ri l2 0 K2 0 0 0 0 0 0 0 W2 C2
0 Ti 0 l2 G2 0 0 0 0 0 0 0 Z2 h
0 0 0 R2 l2 0 K3 0 0 0 0 0 W3 C3
0 0 0 T2 0 l2 G3 0 0 0 0 0 Z3 f3 (4.5.16)
0 0 0 0 0 R3 l2 0 0 0 0 0 =
W4 C4
0 0 0 0 0 T3 0 l2 0 0 0 0 Z4 f4
0 0 Hi 0 0 0 0 0 [3 0 0 0 Yi T
0 Si 0 0 H2 0 0 0 0 [3 0 0 Y2 d2
0 0 0 S2 0 0 H3 0 0 0 [3 0 Y3 d3
0 0 0 0 0 S3 0 0 0 0 0 [3 Y4 d4
4.5. Block Tridiagonal Systems 201
Ifwe assume that N » 1, then the (1,1) block is a relatively small banded matrix that
define the Zi and Wi. Once these quantities are computed, then the remaining unknowns
follow from a decoupled set of large matrix-vector multiplications, e.g., Yi = di -Hiw2,
y2 = d2 - Sizi - H2w3, y3 = d3 - S2z2 - H3w4, and Y4 = d4 - S3z3. Thus, in a four­
processor execution of this method, there are (short) communications that involves the
Wi and Zi and a lot of large, local gaxpy computations.
Problems
P4.5.l (a) Show that a block diagonally dominant matrix is nonsingular. (b) Verify that (4.5.6)
implies (4.5.7) and (4.5.8).
P4.5.2 Write a recursive function x = CR(D,F, N, b) that returns the solution to Ax = b where A is
specified by (4.5.9). Assume that N = 2k - 1 for some positive integer k, D, F E R'lxq, and b E RNq.
P4.5.3 How would you solve a system of the form
where D1 and D2 are diagonal and F1 and Ei are tridiagonal? Hint: Use the perfect shuffle permu­
tation.
P4.5.4 In the simplified SPIKE framework that we presented in §4.5.4, we treat A as an N-by-N
block matrix with q-by-q blocks. It is assumed that A E R_Nqx Nq has bandwidth p and that p « q.
For this general case, describe the block sizes that result when the transition from (4.5.11) to (4.5.16)
is carried out. Assuming that A's band is dense, what fraction of flops are gaxpy flops?
Notes and References for §4.5
The following papers provide insight into the various nuances of block matrix computations:
J.M. Varah (1972). "On the Solution of Block-Tridiagonal Systems Arising from Certain Finite­
Difference Equations," Math. Comput. 26, 859-868.
R. Fourer (1984). "Staircase Matrices and Systems," SIAM Review 26, 1-71.
M.L. Merriam (1985). "On the Factorization of Block Tridiagonals With Storage Constraints," SIAM
J. Sci. Stat. Comput. 6, 182-192.
The property of block diagonal dominance and its various implications is the central theme in:
D.G. Feingold and R.S. Varga (1962). "Block Diagonally Dominant Matrices and Generalizations of
the Gershgorin Circle Theorem,"Pacific J. Math. 12, 1241-1250.
R.S. Varga (1976). "On Diagonal Dominance Arguments for Bounding II A-1 1100 ," Lin. Alg. Applic.
14, 211-217.
Early methods that involve the idea of cyclic reduction are described in:
R.W. Hockney (1965). "A Fast Direct Solution of Poisson's Equation Using Fourier Analysis, " J.
ACM 12, 95-113.
B.L. Buzbee, G.H. Golub, and C.W. Nielson (1970). "On Direct Methods for Solving Poisson's
Equations," SIAM J. Numer. Anal. 7, 627-656.
The accumulation of the right-hand side must be done with great care, for otherwise there would be
asignificant loss of accuracy. A stable way of doing this is described in:
0. Buneman (1969). "A Compact Non-Iterative Poisson Solver," Report 294, Stanford University
Institute for Plasma Research, Stanford, CA.
Other literature concerned with cyclic reduction includes:
F.W. Dorr (1970). "The Direct Solution of the Discrete Poisson Equation on a Rectangle," SIAM
Review 12, 248-263.
202 Chapter 4. Special Linear Systems
B.L. Buzbee, F.W. Dorr, J.A. George, and G.H. Golub (1971). "The Direct Solution of the Discrete
Poisson Equation on Irregular Regions," SIAM J. Nu.mer. Anal. 8, 722-736.
F.W. Dorr (1973). "The Direct Solution of the Discrete Poisson Equation in O(n2) Operations,"
SIAM Review 15, 412-415.
P. Concus and G.H. Golub (1973). "Use of Fast Direct Methods for the Efficient Numerical Solution
of Nonseparable Elliptic Equations," SIAM J. Numer. Anal. 10, 1103-1120.
B.L. Buzbee and F.W. Dorr (1974). "The Direct Solution ofthe Biharmonic E!=tuation on Rectangular
Regions and the Poisson Equation on Irregular Regions," SIAM J. Numer. Anal. 11, 753-763.
D. Heller (1976). "Some Aspects of the Cyclic Reduction Algorithm for Block Tridiagonal Linear
Systems," SIAM J. Numer. Anal. 13, 484-496.
Various generalizations and extensions to cyclic reduction have been proposed:
P.N. Swarztrauber and R.A. Sweet (1973). "The Direct Solution of the Discrete Poisson Equation on
a Disk," SIAM J. Numer. Anal. 10, 900-907.
R.A. Sweet (1974). "A Generalized Cyclic Reduction Algorithm," SIAM J. Num. Anal. 11, 506-20.
M.A. Diamond and D.L.V. Ferreira (1976). "On a Cyclic Reduction Method for the Solution of
Poisson's Equation," SIAM J. Numer. Anal. 13, 54--70.
R.A. Sweet (1977). "A Cyclic Reduction Algorithm for Solving Block Tridiagonal Systems of Arbitrary
Dimension," SIAM J. Numer. Anal. 14, 706-720.
P.N. Swarztrauber and R. Sweet (1989). "Vector and Parallel Methods for the Direct Solution of
Poisson's Equation," J. Comput. Appl. Math. 27, 241-263.
S. Bondeli and W. Gander (1994). "Cyclic Reduction for Special Tridiagonal Systems," SIAM J.
Matri.x Anal. Applic. 15, 321-330.
A 2-by-2 block system with very thin (1,2) and (2,1) blocks is referred to as a bordered linear system.
Special techniques for problems with this structure are discussed in:
W. Govaerts and J.D. Pryce (1990). "Block Elimination with One Iterative Refinement Solves Bor­
dered Linear Systems Accurately," BIT 30, 490-507.
W. Govaerts (1991). "Stable Solvers and Block Elimination for Bordered Systems," SIAM J. Matri.x
Anal. Applic. 12, 469-483.
W. Govaerts and J.D. Pryce (1993). "Mixed Block Elimination for Linear Systems with Wider Bor­
ders," IMA J. Numer. Anal. 13, 161-180.
Systems that are block bidiagonal, block Hessenberg, and block triangular also occur, see:
G. Fairweather and I. Gladwell (2004). "Algorithms for Almost Block Diagonal Linear Systems,"
SIAM Review 46, 49-58.
U. von Matt and G. W. Stewart (1996). "Rounding Errors in Solving Block Hessenberg Systems,"
Math. Comput. 65, 115 -135.
L. Gemignani and G. Lotti (2003). "Efficient and Stable Solution of M-Matrix Linear Systems of
(Block) Hessenberg Form," SIAM J. Matri.x Anal. Applic. 24, 852-876.
M. Hegland and M.R. Osborne (1998). "Wrap-Around Partitioning for Block Bidiagonal Linear Sys..
terns," IMA J. Numer. Anal. 18, 373-383.
T. Rossi and J. Toivanen (1999). "A Parallel Fast Direct Solver for Block Tridiagonal Systems with
Separable Matrices of Arbitrary Dimension," SIAM J. Sci. Comput. 20, 1778-1793.
1.M. Spitkovsky and D. Yong (2000). "Almost Periodic Factorization of Certain Block Triangular
Matrix Functions," Math. Comput. 69, 1053--1070.
The SPIKE framework supports many different options according to whether the band is sparse or
dense. Also, steps have to be taken if the diagonal blocks are ill-conditioned, see:
E. Polizzi and A. Sameh (2007). "SPIKE: A Parallel Environment for Solving Banded Linear Systems,"
Comput. Fluids 36, 113-120.
C.C.K. Mikkelsen and M. Manguoglu (2008). "Analysis of the Truncated SPIKE Algorithm," SIAM
J. Matri.x Anal. Applic. 30, 1500-1519.
4.6. Vandermonde Systems 203
4.6 Vandermonde Systems
Supposex(O:n) E :nr+1. A matrix V E IR.(n+l)x(n+l}of the form
I:,
1 1
X1 Xn
v V(xo, . . . , Xn) =
xn xn xn
0 1 n
is said to be a Vandermonde matrix. Note that the discrete Fourier transform matrix
(§1.4.1) is a very special complex Vandermonde matrix.
In this section, we show how the systems VTa = f = f(O:n) and Vz = b = b(O:n)
can be solved in O(n2) flops. For convenience, vectors and matrices are subscripted
from 0 in this section.
4.6.1 Polynomial Interpolation: vra = f
Vandermonde systems arise in many approximation and interpolation problems. In­
deed, the key to obtaining a fast Vandermonde solver is to recognize that solving
vTa = f is equivalent to polynomial interpolation. This follows because if VTa = f
and n
p(x) = :Lajxj, (4.6.1)
j=O
then p(xi) = fi for i = O:n.
Recall that if the Xi are distinct then there is a unique polynomial of degree n
that interpolates (xo, Jo), . . . , (xn, fn)· Consequently, V is nonsingular as long as the
Xi are distinct. We assume this throughout the section.
The first step in computing the aj of (4.6.1) is to calculate the Newton represen­
tation of the interpolating polynomial p:
n (k-1 )
p(x) = �Ck !!(x - Xi) .
The constants ck are divided differences and may be determined as follows:
c(O:n) = f(O:n)
for k = O : n- 1
end
for i = n : - 1 : k+1
Ci = (ci - Ci-1)/(xi - Xi-k-d
end
See Conte and deBoor (1980).
(4.6.2)
(4.6.3)
The next task is to generate the coefficients ao, . . . , an in (4.6.1) from the Newton
representation coefficients Co, . . . , Cn. Define the polynomials Pn(x), . . . ,p0(x) by the
iteration
204 Chapter 4. Special Linear Systems
Pn(x) = Cn
for k = n - 1 : - 1 : 0
Pk(x) = Ck + (x - Xk)'Pk+i(x)
end
and observe that Po(x) = p(x). Writing
Pk(x) - a(k) + a(k) x + · · ·
+ a(k)xn-k
- k k+l n
and equating like powers of x in the equation Pk = Ck + (x-xk)Pk+i gives the following
recursion for the coefficients a�k):
a�n) = Cn
for k = n- 1: - 1 : 0
end
a(k) - Ck - Xka(k+I)
k - k+l
for i = k + l : n - 1
end
(k) (k+l) (k+l)
ai = ai - Xkai+1
(k) (k+l)
an = an
Consequently, the coefficients ai = a�0> can be calculated as follows:
a(O:n) = c{O:n)
for k = n- 1: - 1 : 0
end
for i = k:n - 1
ai = ai - Xkai+1
end
Combining this iteration with (4.6.3) gives the following algorithm.
{4.6.4)
Algorithm 4.6.1 Given x(O : n) E Rn+l with distinct entries and f = f(O : n) E Rn+1,
the following algorithm overwrites f with the solution a = a(O : n) to the Vandermonde
system V(xo, . . . ,Xn)Ta = f.
for k = O : n - 1
end
for i = n: - 1 :k + 1
f(i) = (!(i) - f(i - 1))/(x(i) - x(i - k - 1))
end
for k = n - 1: - 1 : 0
for i = k : n - 1
f(i) = f(i) - f(i + l) ·x(k)
end
end
This algorithm requires 5n2/2 flops.
4.6. Vandermonde Systems 205
4.6.2 The System Vz = b
Now consider the system Vz = b. To derive an efficient algorithm for this problem,
we describe what Algorithm 4.6.1 does in matrix-vector language. Define the lower
bidiagonal matrix Lk(a) E lll(n+l)x(n+l) by
0
and the diagonal matrix Dk by
1 0
-a 1
0
0
0
0
1
-a 1
Dk = diag( 1, . . . , 1 ,Xk+i - Xo, . . . ,Xn - Xn-k-1).
'--v-"
k+l
With these definitions it is easy to verify from (4.6.3) that, if f = f(O : n) and c = c(O : n)
is the vector of divided differences, then
where U is the upper triangular matrix defined by
UT = D;�1Ln-1(1) · · · D01Lo(l).
Similarly, from (4.6.4) we have
a = LTc,
where L is the unit lower triangular matrix defined by
LT = Lo(xo? · · · Ln-1(Xn-1)T.
It follows that a = v-Tf is given by
a = LTUTf.
Thus,
y-T = LTUT
which shows that Algorithm 4.6.1 solves VTa = f by tacitly computing the "UL
factorization" of v-1. Consequently, the solution to the system Vz = b is given by
z = v-1b = U(Lb)
= (Lo(l)TDQ"1 . . . Ln-1(l)TD;�1) (Ln-1(Xn-1) . . . Lo(xo)b).
206 Chapter 4. Special Linear Systems
This observation gives rise to the following algorithm:
Algorithm 4.6.2 Given x(O : n) E Rn+l with distinct entries and b = b(O : n) E Rn+l,
the following algorithm overwrites b with the solution z = z(O : n) to the Vandermonde
system V(xo, . . . , Xn)z = b.
for k = O : n - 1
end
for i = n: - 1 : k + 1
b(i) = b(i) - x(k)b(i - 1)
end
for k = n - 1: - 1 : 0
for i = k + 1 : n
end
b(i) = b(i)/(x(i) - x(i - k - 1))
end
for i = k : n - 1
b(i) = b(i) - b(i + 1)
end
This algorithm requires 5n2/2 fl.ops.
Algorithms 4.6.1 and 4.6.2 are discussed and analyzed by Bjorck and Pereyra
{1970). Their experience is that these algorithms frequently produce surprisingly ac­
curate solutions, even if V is ill-conditioned.
We mention that related techniques have been developed and analyzed for con­
fluent Vandennonde systems, e.g., systems of the form
See Higham (1990).
Problems
P4.6.l Show that if V = V(xo, . . . , Xn), then
det(V) =
II (xi - x; ).
n2:'.i>j2:'.0
P4.6.2 (Gautschi 1975) Verify the following inequality for the n = 1 case above:
n
II v-1 lloo ::; max II 1 + lxi l .
09�n lxk - xii
i=O
i�k
Equality results if the Xi are all on the same ray in the complex plane.
4.6. Vandermonde Systems 207
Notes and References for §4.6
Our discussion of Vandermonde linear systems is drawn from the following papers:
A. Bjorck and V. Pereyra (1970). "Solution of Vandermonde Systems of Equations," Math. Comput.
24, 893-903.
A. Bjorck and T. Elfving (1973). "Algorithms for Confluent Vandermonde Systems,'' Numer. Math.
21, 130-37.
The divided difference computations we discussed are detailed in:
S.D. Conte and C. de Boor (1980). Elementary Numerical Analysis: An Algorithmic Approach, Third
Edition, McGraw-Hill, New York, Chapter 2.
Error analyses of Vandermonde system solvers include:
N.J. Higham (1987). "Error Analysis of the Bjorck-Pereyra Algorithms for Solving Vandermonde
Systems," Numer. Math. 50, 613-632.
N.J. Higham (1988). "Fast Solution of Vandermonde-like Systems Involving Orthogonal Polynomials,''
IMA J. Numer. Anal. 8, 473-486.
N.J. Higham (1990). "Stability Analysis of Algorithms for Solving Confluent Vandermonde-like Sys­
tems," SIAM J. Matrix Anal. Applic. 11, 23- 41.
S.G. Bartels and D.J. Higham (1992). "The Structured Sensitivity of Vandermonde-Like Systems,"
Numer. Math. 62, 17-34.
J.M. Varah (1993). "Errors and Perturbations in Vandermonde Systems," IMA J. Numer. Anal. 13,
1-12.
Interesting theoretical results concerning the condition of Vandermonde systems may be found in:
W. Gautschi (1975). "Norm Estimates for Inverses of Vandermonde Matrices," Numer. Math. 23,
337-347.
W. Gautschi (1975). "Optimally Conditioned Vandermonde Matrices,'' Numer. Math. 24, 1-12.
J-G. Sun (1998). "Bounds for the Structured Backward Errors of Vandermonde Systems,'' SIAM J.
Matrix Anal. Applic. 20, 45-59.
B.K. Alpert (1996). "Condition Number of a Vandermonde Matrix,'' SIAM Review 38, 314--314.
B. Beckermarm (2000). "The condition number of real Vandermonde, Krylov and positive definite
Hankel matrices," Numer. Math. 85, 553-577.
The basic algorithms presented can be extended to cover confluent Vandermonde systems, block
Vandermonde systems, and Vandermonde systems with other polynomial bases:
G. Galimberti and V. Pereyra (1970). "Numerical Differentiation and the Solution of Multidimensional
Vandermonde Systems,'' Math. Comput. 24, 357-364.
G. Galimberti and V. Pereyra (1971). "Solving Confluent Vandermonde Systems of Hermitian Type,''
Numer. Math. 18, 44-60.
H. Van de Ve! (1977). "Numerical Treatment of Generalized Vandermonde Systems of Equations,"
Lin. Alg. Applic:. 1 7, 149-174.
G.H. Golub and W.P Tang (1981). "The Block Decomposition of a Vandermonde Matrix and Its
Applications," BIT 21, 505-517.
D. Calvetti and L. Reichel (1992). "A Chebychev-Vandermonde Solver," Lin. Alg. Applic. 172,
219-229.
D. Calvetti and L. Reichel (1993). "Fast Inversion ofVandermonde-Like Matrices Involving Orthogonal
Polynomials," BIT SS, 473-484.
H. Lu (1994). "Fast Solution of Confluent Vandermonde Linear Systems," SIAM J. Matrix Anal.
Applic. 15, 1277-1289.
H. Lu (1996). "Solution of Vandermonde-like Systems and Confluent Vandermonde-Iike Systems,''
SIAM J. Matrix Anal. Applic. 1 7, 127-138.
M.-R. Skrzipek (2004). "Inversion of Vandermonde-Like Matrices,'' BIT 44, 291-306.
J.W. Demmel and P. Koev (2005). "The Accurate and Efficient Solution of a Totally Positive Gener­
alized Vandermonde Linear System,'' SIAM J. Matrix Anal. Applic. 27, 142-152.
The displacement rank idea that we discuss in §12.1 can also be used to develop fast methods for
Vandermonde systems.
208 Chapter 4. Special Linear Systems
4.7 Classical Methods for Toeplitz Systems
Matrices whose entries are constant along each diagonal arise in many applications
and are called Toeplitz matrices. Formally, T E IRnxn is Toeplitz if there exist scalars
r-n+l, . . . , ro, . . . , rn-1 such that aij = rj-i for all i and j. Thus,
[ro rl
r_ 1 ro
T -
r_2 r_1
r_3 r_2
is Toeplitz. In this section we show that Toeplitz systems can be solved in O(n2) flops
The discussion focuses on the important case when T is also symmetric and positive
definite, but we also include a few comments about general Toeplitz systems. An
alternative approach to Toeplitz system solving based on displacement rank is given in
§12.1.
4.7.1 Persymmetry
The key fact that makes it possible to solve a Toeplitz system Tx = b so fast has to do
with the structure of r-1. Toeplitz matrices belong to the larger class of persymmetric
matrices. We say that B E IRnxn is persymmetric if
£nB£n = BT
where t:n is the n-by-n exchange matrix defined in §1.2.11, e.g.,
If B is persymmetric, then t:nB is symmetric. This means that B is symmetric about
its antidiagonal. Note that the inverse of a persymmetric matrix is also pcrsymmetric:
Thus, the inverse of a nonsingular Toeplitz matrix is persymmetric.
4.7.2 Three Problems
Assume that we have scalars ri, . . . , rn such that for k = l:n the matrices
1 rl
rl 1
Tk =
rk-2
rk-1 rk-2
4.7. Classical Methods for Toeplitz Systems 209
are positive definite. (There is no loss of generality in normalizing the diagonal.) We
set out to describe three important algorithms:
• Durbin's algorithm for the Yule-Walker problem Tny = -[r1, . . . , rnf·
• Levinson's algorithm for the general right-hand-side problem Tnx = b.
• Trench's algorithm for computing B = T.;;1.
4.7.3 Solving the Yule-Walker Equations
We begin by presenting Durbin's algorithm for the Yule-Walker equations which arise
in conjunction with certain linear prediction problems. Suppose for some k that sat­
isfies 1 :S k :S n - 1 we have solved the kth order Yule-Walker system Tky = -r =
-[ri, . . . , rk]T. We now show how the (k + l)st order Yule-Walker system
can be solved in O(k) flops. First observe that
and
a = -
Tk+l -
TTEkz.
Since T/;1 is persymmetric, T/;1Ek = Ek T/;1 and thus
By substituting this into the above expression for a we find
The denominator is positive because Tk+l is positive definite and because
We have illustrated the kth step ofan algorithm proposed by Durbin (1960). It proceeds
by solving the Yule-Walker systems
for k = l:n as follows:
210 Chapter 4. Special Linear Systems
y(l} = -r1
for k = l:n - 1
end
!A = 1 + [r<klfy<kl
Gk = -(rk+l + r(k)T£ky(kl)/f3k
z(k) = y(k) + Gkt:ky(k)
y(k+l} =
[ �:) ]
(4.7.1)
As it stands, this algorithm would require 3n2 flops to generate y = y(n) . It is possible,
however, to reduce the amount of work even further by exploiting some of the above
expressions:
f3k = 1 + [r(k)jTy(k)
= 1 +
[ r(k-I) ]T [ y(k-1) + Gk-1£k- IY(k-l) ]
Tk Gk-I
= (1 + [r(k-l)j'I'y(k-1} ) + Ctk-1 (rr(k-l)jT£k-IY(k-l) + rk)
= f3k-l + Ctk-1 (-f3k-1Ctk-1 )
= (1 - aL1 )!3k-l ·
Using this recursion we obtain the following algorithm:
Algorithm 4.7.1 (Durbin) Given real numbers r0, r1 , . . . , rn with r0 = 1 such that
T = (rli-jl ) E JRn x n
is positive definite, the following algorithm computes y E JRn such
that Ty = - [r1 , . . . , rnf·
y(l) = -r(l); f3 = 1; a = -r(l)
for k = l:n - 1
end
f3 = (1 - G2)/3
a = - (r(k + 1) + r(k: - l:l)Ty(l:k)) //3
z(l:k) = y(l :k) + ay(k: - 1:1)
y( l:k + 1) =
[ z(�'.k) ]
This algorithm requires 2n2 flops. We have included an auxiliary vector z for clarity,
but it can be avoided.
4.7.4 The General Right-Hand-Side Problem
With a little extra work, it is possible to solve a symmetric positive definite Toeplitz
system that has an arbitrary right-hand side. Suppose that we have solved the system
(4.7.2)
4.7. Classical Methods for Toeplitz Systems
for some k satisfying 1 :::; k < n and that we now wish to solve
[ Tk £kr l [v l [ b ]
rT£k 1 /1 - bk+l .
211
(4.7.3)
Here, r = [r1, . . . , rk]T as above. Assume also that the solution to the order-k Yule­
Walker system Tky = -r is also available. From Tkv + µ£kr = b it follows that
and so
µ = bk+l - rT£kv
= bk+l - rT£kx - µrTy
= (bk+l - rT£kx) / (1 + rTy) .
Consequently, we can effect the transition from (4.7.2) to (4.7.3) in O(k) flops.
Overall, we can efficiently solve the system Tnx = b by solving the systems
Tkx(k) = b(k) = [b1, . . . , bk]T
and
Tky(k) =
-r(k) =
-[ri' . . . ' rk]T
"in parallel" for k = l:n. This is the gist of the Levinson algorithm.
Algorithm 4.7.2 (Levinson) Given b E IR" and real numbers 1 = r0, r1, . . . , rn such
that T = (rli-jl ) E IRnxn is positive definite, the following algorithm computes x E IRn
such that Tx = b.
y(l) = -r(l); x(l) = b(l); /3 = l; a = -r(l)
for k = 1 : n - 1
end
(3 = (1 - u2)(3
µ = (b(k + 1) - r(l:k)Tx(k: - 1:1)) /,8
v(l:k) = x(l:k) + µ-y(k: - 1:1)
x(l:k + 1) = [ v(��k) ]
if k < n - 1
end
a = - (r(k + 1) + r(l:k)Ty(k: - 1:1)) /(3
z(l:k) = y(l:k) + a·y(k: - 1:1)
y(l:k + 1) = [ z(!�k)
]
This algorithm requires 4n2 flops. The vectors z and v are for clarity and can be
avoided in a detailed implementation.
212 Chapter 4. Special Linear Systems
4.7.5 Computing the Inverse
One of the most surprising properties of a symmetric positive definite Toeplitz matrix
Tn is that its complete inverse can be calculated in O(n2) flops. To derive the algorithm
for doing this, partition T;1 as follows:
T_1 =
[ A Er i-1
=
[ B v l
n rTE 1 VT 'Y
(4.7.4)
where A = Tn-1, E = Cn-t. and r = [ri. . . . , rn-1f. From the equation
it follows that Av = -7Er = -7E(r1, . . . ,rn-l)T and 'Y = 1 - rTEv. If y solves the
order-(n-1) Yule-Walker system Ay = -r, then these expressions imply that
'Y = 1/(1 + rTy),
v = 'YEy.
Thus, the last row and column of T,.;-1 are readily obtained.
It remains for us to develop working formulae for the entries of the submatrix B
in (4.7.4). Since AB + &rvT = ln_1, it follows that
vvT
B = A-1 - (A-1Er)vT = A-1 + - .
'Y
Now since A = Tn-l is nonsingular and Toeplitz, its inverse is persymmetric. Thus,
1) ViVj
bii = (A- ii + -
'Y
= (A-1)n-j,n-i +
'Y
= bn-j,n-i
Vn-jVn-i + ViVj
'Y 'Y
1
= bn-j,n-i + - (viVj - Vn-jVn-i) ·
'Y
(4.7.5)
This indicates that although B is not persymmetric, we can readily compute an element
bii from its reflection across the northeast-southwest axis. Coupling this with the fact
that A-1 is persymmetric enables us to determine B from its "edges" to its "interior."
Because the order of operations is rather cumbersome to describe, we preview the
formal specification of the algorithm pictorially. To this end, assume that we know the
last column and row of T,.;-1:
u u u u u k
u u u u u k
T-1 u u u u u k
= k
n u u 1L u u
u u u u u k
k k k k k k
4.7. Classical Methods for Toeplitz Systems 213
Here "
u
"
and "k" denote the unknown and the known entries, respectively, and n =
6. Alternately exploiting the persymmetry of T;1 and the recursion (4.7.5), we can
compute B, the leading (n - 1)-by-(n - 1) block of T;1, as follows:
k k k k k k k k k k k k k k k k k k
k u u u u k k u u u k k k k k k k k
p�m k u u u u k
(�)
k u u u k k p�m k k u u k k
k u u u u k k u u u k k k k u u k k
k u u u u k k k k k k k k k k k k k
k k k k k k k k k k k k k k k k k k
k k k k k k k k k k k k
k k k k k k k k k k k k
(�)
k k u k k k p�m
k k k k k k
k k k k k k k k k k k k
k k k k k k k k k k k k
k k k k k k k k k k k k
Of course, when computing a matrix that is both symmetric and persymmetric, such
as T;1, it is only necessary to compute the "upper wedge" of the matrix-e.g.,
x x x x x x
x x x x (n = 6).
x x
With this last observation, we are ready to present the overall algorithm.
Algorithm 4.7.3 (Trench) Given real numbers 1 = ro,ri, . . . , rn such that T =
(rli-il) E Rnxn is positive definite, the following algorithm computes B = T;1. Only
those bij for which i $ j and i + j $ n + 1 are computed.
Use Algorithm 4.7.1 to solve Tn-IY = -(r1, . . . , Tn-i )T.
'Y = 1/(1 + r(l:n - l)Ty(l:n - 1))
v(l:n - 1) = -yy(n - 1: - 1:1)
B(l, 1) = -y
B(l, 2:n) = v(n - 1: - l:l)T
for i = 2 : floor((n - 1)/2) + 1
end
for j = i:n - i + 1
B(i,j) = B(i - 1,j - 1) + (v(n+l -j)v(n + 1 - i) - v(i - l)v(j - 1)) h
end
This algorithm requires 13n2/4 flops.
214 Chapter 4. Special Linear Systems
4.7.6 Stability Issues
Error analyses for the above algorithms have been performed by Cybenko (1978), and
we briefly report on some of his findings.
The key quantities turn out to be the O:k in (4.7.1). In exact arithmetic these
scalars satisfy
and can be used to bound 11 r-1 111:
Moreover, the solution to the Yule-Walker system TnY = -r(l:n) satisfies
provided all the O:k are nonnegative.
(4.7.6)
(4.7.7)
Now if x is the computed Durbin solution to the Yule-Walker equations, then the
vector To = Tnx + T can be bounded as follows
n
II ro II � u IT(1 + l&kl),
k=l
where &k is the computed version of o:k. By way of comparison, since each ITil is
bounded by unity, it follows that 11 Tc II � ull y 111 where Tc is the residual associated
with the computed solution obtained via the Cholesky factorization. Note that the two
residuals are of comparable magnitude provided (4.7.7) holds. Experimental evidence
suggests that this is the case even if some of the O:k are negative. Similar comments
apply to the numerical behavior of the Levinson algorithm.
For the Trench method, the computed inverse fJ of T;;1 can be shown to satisfy
In light of (4.7.7) we see that the right-hand side is an approximate upper bound for
ull T;;1 II which is approximately the size of the relative error when T;;1 is calculated
using the Cholesky factorization.
4. 7.7 A Toeplitz Eigenvalue Problem
Our discussion of the symmetric eigenvalue problem begins in Chapter 8. However, we
are able to describe a solution procedure for an important Toeplitz eigenvalue problem
that does not require the heavy machinery from that later chapter. Suppose
T = [� �]
4.7. Classical Methods for Toeplitz Systems 215
is symmetric, positive definite, and Toeplitz with r E R.n-l. Cybenko and Van Loan
(1986) show how to pair the Durbin algorithm with Newton's method to compute
Amin(T) assuming that
Amin(T) < Amin(B).
This assumption is typically the case in practice. If
[ 1 rT ] [ o: ] _
A
. [ a ]
r B y
- mm y '
then y = -a(B - Aminl)-1r, a =/:- 0, and
a + rT [-a(B - Am;nl)-1r) = Amino:.
Thus, Amin is a zero of the rational function
f(A) = 1 - A - rT(B - A/)-1 r.
Note that if A < Amin(B), then
!'(A) = -1 - 11 (B - AI)-1r 11� ::; -1,
J"(A) = -2rT(B - AI)-3r $ 0.
Using these facts it can be shown that if
Amin(T) $ >,(O)
< Amin(B),
then the Newton iteration
(4.7.8)
(4.7.9)
(4.7.10)
converges to Amin(T) monotonically from the right. The iteration has the form
>,(k+1) = >,(k) +
l + rTw - A(k>,
l + wTw
where w solves the "shifted" Yule-Walker system
(B - >,(k)I)w = -r.
Since >.(k) < >.min(B), this system is positive definite and the Durbin algorithm (Algo­
rithm 4.7.1) can be applied to the normalized Toeplitz matrix (B - A(k)J)/(1 - A(kl).
The Durbin algorithm can also be used to determine a starting value A(O) that
satisfies (4.7.9). If that algorithm is applied to
T>. = (T - .AI)/(1 - A)
then it runs to completion if T>. is positive definite. In this case, the fA defined in
(4.7.1) are all positive. On the other hand, if k $ n - 1, f3k $ 0 and f31, . . . , fJk-l are all
positive, then it follows that T>.(l:k, l:k) is positive definite but that T>.(l:k+ 1, k+ 1) is
not. Let m(>.) be the index ofthe first nonpositive (3 and observe that ifm(A<0l) = n-1,
then B - >.<0lI is positive definite and T - A(O)I is not, thereby establishing (4.7.9). A
bisection scheme can be formulated to compute A(O) with this property:
216 Chapter 4. Special Linear Systems
L = O
R = l - lr1I
µ = (L + R)/2
while m(µ) f= n - 1
if m(µ) < n - 1
R = µ
else
L = µ
end
µ = (L + R)/2
end
_x(O) = µ
(4.7.11)
At all times during the iteration we have m(L) :::; n - 1 :::; m(R). The initial value for
R follows from the inequality
Note that the iterations in (4.7.10) and (4.7.11) involve at most O(n2) flops per pass.
A heuristic argument that O(log n) iterations are required is given by Cybenko and
Van Loan (1986).
4. 7.8 Unsymmetric Toeplitz System Solving
We close with some remarks about unsymmetric Toeplitz system-solving. Suppose we
are given scalars r1, . . . , rn-1, P1, . . . ,Pn-l , and bi, . . . , bn and that we want to solve a
linear system Tx = b of the form
I
1 r1
P1 1
P2 P1
P3 P2
p4 p3 �� �: �: II�� I I��I
1 r1 r2 x3 b3
P1 1 r1 X4 b4
P2 Pl 1 X5 b5
(n = 5).
Assume that Tk = T(l:k, l:k) is nonsingular for k = l:n. It can shown that if we have
the solutions to the k-by-k systems
T'{y -r = - h T2 · · · Tk f ,
Tkw -p = - (p1 P2 · · · Pk f , (4.7.12)
Tkx = b = [b1 b2 . . . bk f ,
4.7. Classical Methods for Toeplitz Systems
then we can obtain solutions to
n:J �
-
[r:H ]•
l [: l [P:+1 ]•
l [: l [bk:l l
217
(4.7.13)
in O(k) flops. The update formula derivations are very similar to the Levinson algo­
rithm derivations in §4.7.3. Thus, if the process is repeated for k = l:n -
1, then we
emerge with the solution to Tx = Tnx = b. Care must be exercised if a Tk matrix is
singular or ill-conditioned. One strategy involves a lookahead idea. In this framework,
one might transition from the Tk problem directly to the Tk+2 problem if it is deemed
that the Tk+l problem is dangerously ill-conditioned. See Chan and Hansen (1992).
An alternative approach based on displacement rank is given in §12.1.
Problems
P4.7.1 For any v E R" define the vectors v+ = (v+env)/2 and v_ = (v -e..v)/2. Suppose A E Fxn
is symmetric and persymmetric. Show that if Ax = b then Ax+ = b+ and Ax_ = b_ .
P4.7.2 Let U E Rnxn be the unit upper triangular matrix with the property that U(l:k - 1, k) =
Ck-lY(k- l) where y(k) is defined by (4.7.1). Show that UTTnU = diag(l, ,Bi , . . . , .Bn-i).
P4.7.3 Suppose that z E Rn and that S E R"'xn is orthogonal. Show that if X = [z, Sz, . . . , sn-l z] ,
then XTX is Toeplitz.
P4.7.4 Consider the LDLT factorization of an n-by-n symmetric, tridiagonal, positive definite Toeplitz
matrix. Show that dn and ln,n-1 converge as n -+ oo.
P4.7.5 Show that the product of two lower triangular Toeplitz matrices is Toeplitz.
P4.7.6 Give an algorithm for determining µ E R such that Tn + µ (enef + e1e:Z:) is singular. Assume
Tn = (rli-j l ) is positive definite, with ro = 1.
P4.7.7 Suppose T E Rnxn is symmetric, positive definite, and Tocplitz with unit diagonal. What is
the smallest perturbation of the the ith diagonal that makes T semidefinite?
P4.7.8 Rewrite Algorithm 4.7.2 so that it does not require the vectors z and v.
P4.7.9 Give an algorithm for computing it00(Tk) for k = l:n.
P4.7.10 A p-by-p block matrix A = (Aij ) with m-by-m blocks is block Toeplitz if there exist
A-p+i. . . . , A-1,Ao,At , . . . ,Ap-1 E R"''xm so that Aij = Ai-i • e.g.,
[ Ao
A = A-1
A-2
A-a
(a) Show that there is a permutation II such that
[Tu
T T21
II AII = : :
Tm1
T1m l
Tmm
T12
218 Chapter 4. Special Linear Systems
where each Tij is p-by-p and Toeplitz. Each Tij should be "made up" of (i, j) entries selected from
the Ak matrices. (b) What can you say about the Tij if Ak = A-k • k = l:p - 1?
P4.7.ll Show how to compute the solutions to the systems in (4.7.13) given that the solutions to the
systems in (4.7.12) are available. Assume that all the matrices involved are nonsingular. Proceed to
develop a fast unsymmetric Toeplitz solver for Tx = b assuming that T's leading principal submatrices
are all nonsingular.
P4.7.12 Consider the order-k Yule-Walker system Tky(k) = -r(k) that arises in (4.7.1). Show that if
y(k) = [Ykt .. . • ,Ykk]T for k = l:n - 1 and
1 0 0 0
n
Yl l 1 0 0
L � [ Y22 Y21 1 0
Yn-�,n-l Yn-l,n-2 Yn-1,n-3 Yn-1,1
then LTTnL = diag(l,,81, . . ., .Bn-1) where f3k = 1 + rCk)Ty(k) . Thus, the Durbin algorithm can be
thought of as a fast method for computing and LDLT factorization of T,;-1.
P4.7.13 Show how the Trench algorithm can be used to obtain an initial bracketing interval for the
bisection scheme (4.7.11).
Notes and References for §4.7
The original references for the three algorithms described in this section are as follows:
J. Durbin (1960). "The Fitting of Time Series Models," Rev. Inst. Int. Stat. 28, 233-243.
N. Levinson (1947). ''The Weiner RMS Error Criterion in Filter Design and Prediction," J. Math.
Phys. 25, 261-278.
W.F. Trench (1964). "An Algorithm for the Inversion of Finite Toeplitz Matrices," J. SIAM 12,
515-522.
As is true with the "fast algorithms" area in general, unstable Toeplitz techniques abound and caution
must be exercised, see:
G. Cybenko (1978). "Error Analysis of Some Signal Processing Algorithms," PhD Thesis, Princeton
University.
G. Cybenko (1980). "The Numerical Stability of the Levinson-Durbin Algorithm for Toeplitz Systems
of Equations," SIAM J. Sci. Stat. Compu.t. 1, 303-319.
J.R. Bunch (1985). "Stability of Methods for Solving Toeplitz Systems of Equations," SIAM J. Sci.
Stat. Compu.t. 6, 349-364.
E. Linzer (1992). "On the Stability of Solution Methods for Band Toeplitz Systems," Lin. Alg. Applic.
1 70, 1-32.
J.M. Varah (1994). "Backward Error Estimates for Toeplitz Systems," SIAM J. Matrix Anal. Applic.
15, 408-417.
A.W. Bojanczyk, R.P. Brent, F.R. de Hoog, and D.R. Sweet (1995). "On the Stability of the Bareiss
and Related Toeplitz Factorization Algorithms," SIAM J. Matrix Anal. Applic. 16, 40 -57.
M.T. Chu, R.E. Funderlic, and R.J. Plemmons (2003). "Structured Low Rank Approximation," Lin.
Alg. Applic. 366, 157-172.
A. Bottcher and S. M. Grudsky (2004). "Structured Condition Numbers of Large Toeplitz Matrices
are Rarely Better than Usual Condition Numbers," Nu.m. Lin. Alg. 12, 95-102.
J.-G. Sun (2005). "A Note on Backwards Errors for Structured Linear Systems," Nu.mer. Lin. Alg.
Applic. 12, 585-603.
P. Favati, G. Lotti, and 0. Menchi (2010). "Stability of the Levinson Algorithm for Toeplitz-Like
Systems," SIAM J. Matrix Anal. Applic. 31, 2531-2552.
Papers concerned with the lookahead idea include:
T.F. Chan and P. Hansen {1992). "A Look-Ahead Levinson Algorithm for Indefinite Toeplitz Systems,"
SIAM J. Matrix Anal. Applic. 13, 490-506.
M. Gutknecht and M. Hochbruck (1995). "Lookahead Levinson and Schur Algorithms for Nonhermi­
tian Toeplitz Systems," Nu.mer. Math. 70, 181-227.
4.8. Circulant and Discrete Poisson Systems 219
M. Van Bare! and A. Bulthecl (1997). "A Lookahead Algorithm for the Solution of Block Toeplitz
Systems,'' Lin. Alg. Applic. 266, 291-335.
Various Toeplitz eigenvalue computations are presented in:
G. Cybenko and C. Van Loan (1986). "Computing the Minimum Eigenvalue of a Symmetric Positive
Definite Toeplitz Matrix,'' SIAM J. Sci. Stat. Comput. 7, 123-131.
W.F. Trench (1989). "Numerical Solution ofthe Eigenvalue Problem for Hermitian Toeplitz Matrices,''
SIAM J. Matrix Anal. Appl. 10, 135-146.
H. Voss (1999). "Symmetric Schemes for Computingthe Minimum Eigenvalue ofa Symmetric Toeplitz
Matrix,'' Lin. Alg. Applic. 287, 359-371.
A. Melman (2004). "Computation of the Smallest Even and Odd Eigenvalues of a Symmetric Positive­
Definite Toeplitz Matrix,'' SIAM J. Matrix Anal. Applic. 25, 947-963.
4.8 Circulant and Discrete Poisson Systems
If A E <Cnxn has a factorization of the form
v-1AV = A = diag(>.1, . . . , An), (4.8.1)
then the columns of V are eigenvectors and the Ai are the corresponding eigenvalues2.
In principle, such a decomposition can be used to solve a nonsingular Au = b problem:
(4.8.2)
However, if this solution framework is to rival the efficiency of Gaussian elimination or
the Cholesky factorization, then V and A need to be very special. We say that A has
a fast eigenvalue decomposition (4.8.1) if
(1) Matrix-vector products of the form y = Vx require O(n logn) flops
to evaluate.
(2) The eigenvalues A1, . . . , An require O(n log n) flops to evaluate.
(3) Matrix-vector products of the form b = v-1b require O(n log n) flops
to evaluate.
If these three properties hold, then it follows from (4.8.2) that O(n logn) flops are
required to solve Au = b.
Circulant systems and related discrete Poisson systems lend themselves to this
strategy and are the main concern of this section. In these applications, the V-matrices
are associated with the discrete Fourier transform and various sine and cosine trans­
forms. (Now is the time to review §1.4.1 and §1.4.2 and to recall that we have n logn
methods for the DFT, DST, DST2, and DCT.) It turns out that fast methods ex­
ist for the inverse of these transforms and that is important because of (3). We will
not be concerned with precise flop counts because in the fast transform "business" ,
some n arc friendlier than others from the efficiency point of view. While this issue
may be important in practice, it is not something that we have to worry about in our
brief, proof-of-concept introduction. Our discussion is modeled after §4.3-§4.5 in Van
Loan (FFT) where the reader can find complete derivations and greater algorithmic de­
tail. The interconnection between boundary conditions and fast transforms is a central
theme and in that regard we also recommend Strang (1999).
2This section does not depend on Chapters 7 and 8 which deal with computing eigenvalues and
eigenvectors. The eigensystems that arise in this section have closed-form expressions and thus the
algorithms in those later chapters are not relevant to the discussion.
220 Chapter 4. Special Linear Systems
4.8.1 The Inverse of the OFT Matrix
Recall from §1.4.1 that the DFT matrix Fn E <Cnxn is defined by
[ I:"
I
.
_
w<k-l)(j-1)
L"n kJ - n ' Wn = cos (2
:) - i sin (2
:).
It is easy to verify that
H -
Fn = Fn
and so for all p and q that satisfy 0 ::; p < n and 0 ::; q < n we have
n-1 n-1
Fn(:,p + l)HFn(:, q + 1) = :�::::c<)�Pw�q = L W�(q-p).
k=O k=O
If q = p, then this sum equals n. Otherwise,
n-1
L W�(q-p)
k=O
It follows that
1 - w:!(q-p)
1 - wi-p
1 - 1
= 0.
1 - wi-p
H
-
nln = Fn Fn = FnFn.
Thus, the DFT matrix is a scaled unitary matrix and
- 1
1 -
Fn = -Fn.
n
A fast Fourier transform procedure for Fnx can be turned into a fast inverse Fourier
transform procedure for F.;;1x. Since
1 1 -
y = F.;; X = -Fnx,
n
simply replace each reference to Wn with a reference to Wn and scale. See Algorithm
1.4.1.
4.8.2 Circulant Systems
A circulant matrix is a Toeplitz matrix with "wraparound", e.g.,
[Zc Z4
Z1 zo
C(z) = Z2 Z1
Z3 Z2
Z4 Z3
Z3 Z2
Z4 Z3
zo Z4
Z1 Zo
z2 Z1
Z1
I
Z2
Z3 .
Z4
Zo
We assume that the vector z is complex. Any circulant C(z) E <Cnxn is a linear combi­
nation of In, Vn, . . . , v;-:-1 where Vn is the downshift permutation defined in §1.2.11.
For example, if n = 5, then
4.8.
and
Circulant and Discrete Poisson Systems
0 0 0 0 1 0 0 0 1 0 0
V� = 1 0 0 0 0 , V� = 0 0 0 0 1 , V� = 0
[o 0 0 I O
J
[o 0 I 0 O
J
[o
0 1 0 0 0 1 0 0 0 0 0
0 0 1 0 0 0 1 0 0 0 1
Thus, the 5-by-5 circulant matrix displayed above is given by
C(z) = zol + z1 Vn + z2v; + Z3V� + z4V�.
Note that Vg = /s. More generally,
n-1
:::? C(z) = L ZkV�.
k=O
Note that if v-1vnV = A is diagonal, then
221
1 0 0
n
0 1 0
0 0 1
0 0 0
0 0 0
(4.8.3)
v-1c(z)V = v-1 (�zkv�)v = �zk (V-1vnv-1)k
= �zkAk (4.8.4)
k=O k=O k=O
is diagonal. It turns out that the DFT matrix diagonalizes the downshift permutation.
for j= O:n - 1.
 j (2j7f) . . (2j7f)
Aj+l = Wn = COS --;;:- + t Slll --;;:-
Proof. For j= O:n - 1 we have
1
w2i
n
(n-l)j
Wn
(n-l)j
Wn
1
(n-2)j
Wn
= w!,
1
w/,
w;/
(n-l)j
Wn
This vector is precisely FnA(: , j+ 1). Thus, VnV = VA, i.e., v-1vnV = A. D
It follows from (4.8.4) that any circulant C(z) is diagonalized by Fn and the eigenvalues
of C(z) can be computed fast.
222 Chapter 4. Special Linear Systems
Theorem 4.8.2. Suppose z E ccn and C(z) are defined by (4.8.3}. If V = Fn and
>. = Fnz, then v-1c(z)V = diag(>.i . . . . , >.n)·
Proof. Define
and note that the columns of Fn are componentwise powers ofthis vector. In particular,
Fn(:, k + 1) = rk where [rkJ; = Jj. Since A = diag(J), it follows from Lemma 4.8.1
that
n-1 n-1 n-1
v-1c(z)V = L.>kAk = :�::>k diag(J)k = :�::>k diag(J:k)
k=O k=O k=O
(n-1 )
= diag L: zk rk
k=O
completing the proof of the theorem D
Thus, the eigenvalues of the circulant matrix C(z) are the components of the vector
Fnz. Using this result we obtain the following algorithm.
Algorithm 4.8.1 If z E ccn, y E ccn, and C(z) is nonsingular, then the following
algorithm solves the linear system C(z)x = y.
Use an FFT to compute c = FnY and d = P..z.
w = c./d
Use an FFT to compute u = Fnw.
x = u/n
This algorithm requires 0(n log n) flops.
4.8.3 The Discretized Poisson Equation in One Dimension
We now turn our attention to a family of real matrices that have real, fast eigenvalue
decompositions. The starting point in the discussion is the differential equation
cFu
dx2 = -f(x) a � u(x) � {3,
together with one of four possible specifications of u(x) on the boundary.
Dirichlet-Dirichlet (DD): u(a) = Uo:,
Dirichlet-Neumann (DN): u(o:) = Uo:,
Neumann-Neumann (NN): u'(a) = u�,
Periodic (P): u(a) = u(f3).
u(f3) = Uf3,
u'(f3) = u!J,
u'({3) = u!J,
(4.8.5)
4.8. Circulant and Discrete Poisson Systems 223
By replacing the derivatives in (4.8.5) with divided differences, we obtain a system of
linear equations. Indeed, if m is a positive integer and
then for i = l:m - 1 we have
h
h = (J - Ot
Ui - 1Li-l
h
m
_ Ui-1 - 2Ui + Ui+l _
-f·
- h2 - • (4.8.6)
where fi = f(a+ih) and Ui � u(a+ih). To appreciate this discretization we display the
linear equations that result when m = 5 for the various possible boundary conditions.
The matrices tiDD>, tiDN), tiNN), and ,,.jP) are formally defined afterwards.
For the Dirichlet-Dirichlet problem, the system is 4-by-4 and tridiagonal:
...-(DD) . • =
-1 2 -1 Q U2 _ h2h
[ 2 -1 0 0 l[U1 l [h2fi + Uo: l
14' u(l.4) - - 2 •
0 -1 2 -1 U3 h f3
0 0 -1 2 U4 h2f4 + u,a
For the Dirichlet-Neumann problem the system is still tridiagonal, but us joins u1, . . . , U4
as an unknown:
2 -1 0 0 0 U1 h2fi + Uo:
-1 2 -1 0 0 U2 h2h
75(DN) .
u(1:5) = 0 -1 2 -1 0 U3 = h2h
0 0 -1 2 -1 U4 h2f4
0 0 0 -2 2 U5 2hu'.B
The new equation on the bottom is derived from the approximation u'((J) � (u5-u4)/h.
(The scaling of this equation by 2 simplifies some of the derivations below.) For the
Neumann-Neumann problem, us and uo need to be determined:
2 -2 0 0 0 0 Uo -2hu�
-1 2 -1 0 0 0 U1 h2Ji
16(NN) •
U(0:5)
0 -1 2 -1 0 0 U2 h2h
=
h2h
0 0 -1 2 -1 0 U3
0 0 0 -1 2 -1 U4 h2h
0 0 0 0 -2 2 U5 2hu�
Finally, for the periodic problem we have
2 -1 0 0 -1 U1 h2fi
-1 2 -1 0 0 U2 h2h
75<1·) •
u(1:5) 0 -1 2 -1 0 'U3 = h2h
0 0 -1 2 -1 U4 h2f4
-1 0 0 -1 2 U5 h2fs
224 Chapter 4. Special Linear Systems
The first and last equations use the conditions uo = us and u1 = u6• These constraints
follow from the assumption that u has period /3 - a.
As we show below, the n-by-n matrix
T,_(DD)
n
and its low-rank adjustments
tiDN) =
tiDD) - ene'f:_ 1 ,
tiNN) = tiDD) - ene'f:_ l - e1 er,
tiP) = ti00> - ei e'f: - enef.
(4.8.7)
(4.8.8)
(4.8.9)
(4.8.10)
have fast eigenvalue decompositions. However, the existence of O(n logn) methods for.
these systems is not very interesting because algorithms based on Gaussian elimina­
tion are faster: O(n) versus O(n logn). Things get much more interesting when we
discretize the 2-dimensional analogue of (4.8.5).
4.8.4 The Discretized Poisson Equation in Two Dimensions
To launch the 2D discussion, suppose F(x, y) is defined on the rectangle
R =
{(x, y) : ax � X � /3x, O!y � y � /3y}
and that we wish to find a function u that satisfies
[J2u 82u
8x2 +
8y2 = -F(x, y) (4.8.11)
on R and has its value prescribed on the boundary of R. This is Poisson's equation
with Dirichlet boundary conditions. Our plan is to approximate u at the grid points
(ax + ihx, ay + jhy) where i = l:m1 - 1, j = l:m2 - 1, and
h -
f3x - O!x h -
/3y - O!y
x -
m1 Y - m2
·
Refer to Figure 4.8.1, which displays the case when m1 = 6 and m2 = 5. Notice that
there are two kinds of grid points. The function ·u is known at the "•" grid points on
the boundary. The function u is to be determined at the "
o
"
grid points in the interior.
The interior grid points have been indexed in a top-to-bottom, left-to-right order. The
idea is to have Uk approximate the value of u(x, y) at grid point k.
As in the one-dimensional problem considered §4.8.3, we use divided differences
to obtain a set of linear equations that define the unknowns. An interior grid point P
has a north (N), east (E), south (S), and west (W) neighbor. Using this "compass
point" notation we obtain the following approximation to (4.8.11) at P:
u(E) - u(P) u(P) - u(W) u(N) - u(P) u(P) - u(S)
+ = -F(P)
4.8. Circulant and Discrete Poisson Systems 225
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
Figure 4.8.1. A grid with m1 = 6 and m2 = 5.
The x-partial and y-partial have been replaced by second-order divided differences.
Assume for clarity that the horizontal and vertical grid spacings are equal, i.e., hx =
hy = h. With this assumption, the linear equation at point P has the form
4u(P) - u(N) - u(E) - u(S) - u(W) = h2F(P).
In our example, there are 20 such equations. It should be noted that some of P's
neighbors may be on the boundary, in which case the corresponding linear equation
involves fewer than 5 unknowns. For example, if P is the third grid point then we see
from Figure 4.8.1 that the north neighbor N is on the boundary. It follows that the
associated linear equation has the form
4u(P) - u(E) - u(S) - u(W) = h2F(P) + u(N).
Reasoning like this, we conclude that the matrix of coefficients has the following block
tridiagonal form
/..(DD)
5 0 0 0 2fs
-[5 0 0
0 Ti;DD) 0 0 -[5 2[5 -[5 0
A +
0 0 tiDD) 0 0 -[5 2[5 -[5
0 0 0 /..(DD)
5 0 0 -[5 2Is
i.e.,
A = [4 ® tiDD) + �(DD) ® fs .
Notice that the first matrix is associated with the x-partials while the second matrix
is associated with the y-partials. The right-hand side in Au = b is made up of F­
evaluations and specified values of u(x, y) on the boundary.
226 Chapter 4. Special Linear Systems
Extrapolating from our example, we conclude that the matrix of coefficients is an
(m2 - l)-by-(m2 - 1) block tridiagonal matrix with (mi - l)-by-(m1 - 1) blocks:
A I '°' -r(DD) -r(DD) '°'
I
= m2-i 'OI 1m1-i + 'm2-1 'OI m1-l·
Alternative specifications along the boundary lead to systems with similar structure,
e.g.,
(4.8.12)
For example, if we impose Dirichet-Neumann, Neumann-Neumann, or periodic bound­
ary conditions along the left and right edges of the rectangular domain R, then A1 will
equal Tm�N), ti.7�L or r<
,;;.> accordingly. Likewise, if we impose Dirichet-Neumann,
Neumann-Neumann, or periodic boundary conditions along the bottom and top edges
of R, then A2 will equal r.Ji�N>, r�:�L or r.J:/ If the system (4.8.12) is nonsingular
and Ai and A2 have fast eigenvalue decomposition&, then it can be solved with just
O(N log N) flops where N = nin2. To see why this is possible, assume that
v-iAiV = Di = diag(Ai. . . . , An, ),
w-1A2W = D2 = diag(µ1 , . . . , µn2)
(4.8.13)
(4.8.14)
are fast eigenvalue decompositions. Using facts about the Kronecker product that are
set forth in §1.3.6-§1.3.8, we can reformulate (4.8.12) as a matrix equation
AiU + UAf = B
where U = reshape(u, ni, n2) and B = reshape(b, ni, n2). Substituting the above eigen­
value decompositions into this equation we obtain
DiU + fjD2 = B,
where U = (iii;) = v-iuw-T and B = (bi;) = v-iBw-T. Note how easy it is to
solve this transformed system because Di and D2 are diagonal:
- bij . 1 . 1
Uij = i = :ni. J = :n2.
Ai + µ;
For this to be well-defined, no eigenvalue of Ai can be the negative of an eigenvalue of
A2. In our example, all the Ai and µi are positive. Overall we obtain
Algorithm 4.8.2 (Fast Poisson Solver Framework) Assume that Ai E IRn1 xn, and
A2 E IR.n2xn2 have fast eigenvalue decompositions (4.8.13) and (4.8.14) and that the
matrix A = In2 ® Ai + A2 ® In, is nonsingular. The following algorithm solves the
linear system Au = b where b E IR.n1n2•
fJ = (w-i(V-1B)T)T where B = reshape(b, ni, n2)
for i = l:ni
end
for j = l:n2
iii; = bi;/(Ai + µ;)
end
u = reshape(U, nin2, 1) where U = (W(VU)T)T
4.8. Circulant and Discrete Poisson Systems 227
The following table accounts for the work involved:
Operation How Many? Work
v-1 times ni-vector n2 O(n2·n1 ·log n1)
w-1 times n2-vector ni O(n1 ·n2·logn2)
V times ni-vector n2 O(n2 ·n1 ·logn1)
W times n2-vector ni O(n1 ·n2 ·logn2)
Adding up the operation counts, we see that O(n1n2 log(n1n2)) = O(N log N) flops
are required where N = nin2 is the size of the matrix A.
Below we show that the matrices TnDD), TnDN), TnNN), and TnP) have fast eigen­
value decompositions and this means that Algorithm 4.8.2 can be used to solve dis­
crete Poisson systems. To appreciate the speedup over conventional methods, suppose
A1 = Tn�o) and A2 = 'T.!:w). It can be shown that A is symmetric positive definite
with bandwidth n1 + 1. Solving Au = b using Algorithm 4.3.5 (band Cholesky) would
require O(n� n2) = O(N n�) flops.
4.8.5 The Inverse of the DST and OCT Matrices
The eigenvector matrices for Tn°0), TnDN), TnNN), and TnP) are associated with the
fast trigonometric transforms presented in §1.4.2. It is incumbent upon us to show that
the inverse of these transforms can also be computed fast. We do this for the discrete
sine transform (DST) and the discrete cosine transform (DCT) and leave similar fast
inverse verifications to the exercises at the end of the section.
By considering the blocks of the DFT matrix F2m, we can determine the inverses
of the transform matrices DST(m - 1) and DCT(m + 1). Recall from §1.4.2 that if
Cr E nrxr and Sr E nrxr are defined by
then
C - iS
VT
E(C + iS)
1
v
(-l)m
Ev
where C = Cm-1' S = Sm-1, E = Cm-1' and
eT l
(C + iS)E
vTE
E(C - iS)E
eT = ( 1, 1, . . . , 1 )
-...-­
m-1
VT = ( -1, 1, . . . , (-lr-l ).
m-1
By comparing the (2,1), (2,2), (2,3), and (2,4) blocks in the equation 2mJ = F2mF2m
we conclude that
0 = 2Ce + e + v,
228 Chapter 4. Special Linear Systems
2mim-1 = 2C2 + 282 + eeT + vvT,
o = 2Cv + e + (-1rv,
0 = 2C2 - 282 + eeT + VVT.
It follows that 282 = mim-1 and 2C2 = mim-l - eeT - vvT. Using these equations it
is easy to verify that
and
[1/2 eT
e/2 Cm-1
1/2 VT
s-1 2
= -Sm-1
m-1 m
1/2
r r
1/2
2
v/2 = - e/2
(-l)m/2
m
1/2
eT
Cm-1
VT
1/2 l
v/2 .
{-l)m/2
Thus, it follows from the definitions {1.4.8) and {l.4.10) that
V = DST(m - 1) =? v-1 = _! DST(m - 1),
m
V = DCT{m + 1) =? v-1 = _! DCT{m + 1).
m
In both cases, the inverse transform is a multiple of the "forward" transform and can
be computed fast. See Algorithms 1.4.2 and 1.4.3.
4.8.6 Four Fast Eigenvalue Decompositions
The matrices TnDD), TnDN), TnNN), and 7:/iP) do special things to vectors of sines and
cosines.
Lemma 4.8.3. Define the real n-vectors s(O) and c(O) by
s(O) � [:J (4.8.15)
where Sk = sin{kO) and Ck = cos(kO). If ek = In(:, k) and .X = 4sin2{B/2), then
ti_DD) ·s(O) = A·s(B) + Bn+1en,
ti_DD) .c(O) = A·c(B) + C1e1 + Cnen,
ti_DN) ·s(B) = .X·s(B) + (sn+l - Bn-i)en,
ti_NN) ·c(B) = .X·c(B) + (Cn - Cn-2)en,
ti_P) ·s(B) = .X·s(B) - Bne1 + (sn+l - s1)en,
ti_P) ·c(B) = A·c(B) + (c1 - Cn-1)e1 + (en - l)en.
(4.8.16)
(4.8.17)
(4.8.18)
(4.8.19)
(4.8.20)
(4.8.21)
4.8. Circulant and Discrete Poisson Systems
Proof. The proof is mainly an exercise in using the trigonometric identities
Sk-1 = C1Sk - S1Ck,
Sk+l = C1Sk + S1Ck,
For example, if y = ti00>s(9), then
Ck-1 C1Ck + S1Sk,
Ck+l = C1Ck - S1Sk·
if k = 1,
{2s 1 - s2 = 2s1(l - c1),
Yk = -Sk-1 + 2sk �sk+1 = 2sk(l - c1),
-Sn-1 + 2sn - 2sn(l - c1) + Sn+l•
if 2 :::; k :::; n - 1,
if k = n.
229
Equation (4.8.16) follows since (1 -c1) = 1 -cos(9) = 2 sin2(9/2). The proof of (4.8.17)
is similar while the remaining equations follow from Equations (4.8.8)-(4.8.10). D
Notice that (4.8.16)-(4.8.21) are eigenvector equations except for the "e1" and "en"
terms. By choosing the right value for 9, we can make these residuals disappear,
thereby obtaining recipes for the eigensystems of Tn°0>, TnDN) , TnNN>, and TnP).
The Dirichlet-Dirichlet Matrix
If j is an integer and (J = j7r/(n + 1), then Sn+l = sin((n + 1)9) = 0. It follows
from (4.8.16) that
j7r
83· =
n + 1 '
for j = l:n. Thus, the columns of the matrix v�vD) E R.nxn defined by
[v.:(DD)] . _ · ( kj'Tr )
n kJ - sm
n + l
are eigenvectors for ti00> and the corresponding eigenvalues are given by
Aj = 4 sin2 ( j7r
)
2(n + l) '
for j = l:n. Note that v�DD) = DST(n). It follows that ti00> has a fast eigenvalue
decomposition.
The Dirichlet-Neumann Matrix
If j is an integer and 9 = (2j - l)7r/(2n), then Sn+l - Sn-1
follows from (4.8.18) that
Ll . - (2j - l)7r
U3 - '
2n
for j = l:n. Thus, the columns of the matrix v�DN) E nnxn defined by
[V�DN)]kj = sin (k(2j
2
: 1)7r)
230 Chapter 4. Special Linear Systems
areeigenvectorsofthematrixrj_DN) andthecorrespondingeigenvaluesaregivenby
. 2 ((2j- l)7r)
Aj = 4sm 4n
forj= l:n. Comparing with (1.4.13) wesee that that V�DN) =DST2(n). Theinverse
DST2canbeevaluatedfast. SeeVanLoan(FFT, p. 242) fordetails,butalsoP4.8.11.
It followsthat /(DN) hasafast eigenvaluedecomposition.
The Neumann-Neumann Matrix
Ifj is an integer and () = (j- 1)7r/(n- 1), then Cn - Cn-2 = -2s1sn-1 = 0. It
follows from (4.8.19) that
(j- 1)7r
()j = n - 1
Thus, thecolumnsofthematrix v�DN) E Rnxn definedby
[v:{NN)J . _
((k-l)(j-1)7r)
n kJ - COS
n-1
areeigenvectorsofthematrixrj_DN) andthe correspondingeigenvaluesaregivenby

4 . 2 ((j-1)7r)
Aj = sm 2(n- 1)
forj = l:n. Comparingwith (1.4.10) weseethat
v�NN) = DCT(n) .diag(2,In-2, 2)
andtherefore/(NN) hasafast eigenvaluedecomposition.
The Periodic Matrix
Wecan proceed to work out the eigenvalue decomposition forrj_P) as we did in
thepreviousthreecases,i.e.,byzeroingtheresidualsin(4.8.20)and(4.8.21). However,
rj_P) isacirculant matrix andsoweknowfromTheorem4.8.2 that
where
Itcanbeshownthat
F,;;1ti_P)Fn = diag(>-.1,... , An)
2
-1
Pn o
-1
2Fn(:,1) -Fn(:,2)- Fn(:, n).
 . - 4 . 2 ((j- l)7r)
A1 - sm n
4.8. Circulant and Discrete Poisson Systems 231
forj = l:n. It follows that ']jP) has a fast eigenvaluedecomposition. However, since
thismatrixisrealit is preferabletohaveareal V-matrix. Usingthefactsthat
and
Fn(:,j) = Fn(:, (n + 2 - j))
forj = 2:n,itcanbeshownthatifm = ceil((n + 1)/2) and
V�P) = (Re(Fn(:, l:m) J lm(Fn(:,m + l:n)) ]
then
(4.8.22)
(4.8.23)
(4.8.24)
(4.8.25)
for j = l:n. Manipulations with this real matrix and its inverse can be carried out
rapidly asdiscussedinVanLoan (FFT, Chap. 4).
4.8.7 A Note on Symmetry and Boundary Conditions
Inourpresentation,thematrices']jDN) and']jNN) arenotsymmetric. However,asim­
plediagonalsimilaritytransformationchangesthis. Forexample,ifD = diag(In-1, J2),
then v-1-,jDN)D is symmetric. Working with symmetric second difference matrices
hascertainattractions,i.e.,theautomaticorthogonalityoftheeigenvectormatrix. See
Strang (1999).
Problems
P4.8.1 Suppose z E Rn has the property that z(2:n) = t'n-1z(2:n). Show that C(z) is symmetric
and Fnz is real.
P4.8.2 As measured in the Frobenius norm, what is the nearest real circulant matrix to a given real
Toeplitz matrix?
P4.8.3 Given x, z E <Cn, show how to compute y = C(z)·x in O(n logn) flops. In this case, y is the
cyclic convolution of x and z.
P4.8.4 Suppose a = ( a-n+1 , . . . , a-1, ao, ai, . . . , an-1 ] and let T = (tk;) be the n-by-n Toeplitz
matrix defined by tk; = ak-j· Thus, if a = ( a-2, a-i, ao, a1 , a2 ], then
[ao
T = T(a) = al
a2
It is possible to "embed" T into a circulant, e.g.,
ao a-1 a-2 0
al ao a-1 a-2
a2 al ao a-1
C = 0 a2 al ao
0 0 a2 al
0 0 0 a2
a-2 0 0 0
a-1 a-2 0 0
a-1
ao
al
0
0
a-2
a-1
ao
al
a2
0
a-2
i
a-1 .
ao
0 a2
0 0
0 0
a-2 0
a-1 a-2
ao a-1
al ao
a2 ai
al
a2
0
0
0
a-2
a-1
ao
Given a-n+l · · . . , a-1 , lo, a1, . . . , an-1 and m ;:::: 2n - 1, show how to construct a vector v E <Cm so
that if C = C(v), then C(l:n, l:n) = T. Note that v is not unique if m > 2n - 1.
P4.8.5 Complete the proof of Lemma 4.8.3.
232 Chapter 4. Special Linear Systems
P4.8.6 Show how to compute a Toeplitz-vector product y = Tu in n logn time using the embedding
idea outlined in the previous problem and the fact that circulant matrices have a fast eigenvalue
decomposition.
P4.8.7 Give a complete specification of the vector b in (4.8.12) if A1 = T
J�D), A2 = -r,l�o), and
u(x, y) = 0 on the boundary of the rectangular domain R. In terms of the underlying grid, n1 = m1 - 1
and n2 = m2 - 1.
P4.8.8 Give a complete specification of the vector b in (4.8.12) if A1 = T,l�N), A2 = -r,i�N),
u(x, y) = 0 on the bottom and left edge of R, u.,(x, y) = 0 along the right edge of R, and uy(x, y) = 0
along the top edge of R. In terms of the underlying grid, n1 = m1 and n2 = m2.
P4.8.9 Define a Neumann-Dirichlet matrix TJND) that would arise in conjunction with (4.8.5) if u'(o)
and u(.B) were specified. Show that T
JND) has a fast eigenvalue decomposition.
P4.B.10 . The matrices -r,lNN) and T
JPl arc singular. (a) Assuming that b is in the range of A =
In2 ® TJil + r
J:> ® In1 , how would you solve the linear system Au = b subject to the constraint
that the mean of u's components is zero? Note that this constraint makes the system solvable. (b)
Repeat part (a) replacing T
J�) with ti�N) and r.�:) with r
J:N).
P4.B.11 Let V be the matrix that defines the DST2(n) transformation in (1.4.12). (a) Show that
T n 1 T
V V = -I,. + -vv
2 2
where v = [1, -1, 1, . . . , (-l)n]T. (b) Verify that
v-I = � (1- _!._vvT)V1'
.
n 2n
(c) Show how to compute v-1x rapidly.
P4.8.12 Verify (4.8.22), (4.8.23), and (4.8.25).
P4.B.13 Show that if V = v2<;> defined in (4.8.24), then
vTv = m Un + e1ef + em+1e?:.+d·
What can you say about VTV if V = V2!:�1'?
Notes and References for §4.8
As we mentioned, this section is based on Van Loan (FFT). For more details about fast Poisson solvers,
see:
R.W. Hockney (1965). "A Fast Direct Solution of Poisson's Equation Using Fourier Analysis," J.
Assoc. Comput. Mach. 12, 95-113.
B. Buzbee, G. Golub, and C. Nielson (1970). "On Direct Methods for Solving Poisson's Equation,"
SIAM J. Numer. Anal. 7, 627-656.
F. Dorr (1970). "The Direct Solution of the Discrete Poisson Equation on a Rectangle,'' SIAM Review
12, 248-263.
R. Sweet (1973). "Direct Methods for the Solution of Poisson's Equation on a Staggered Grid,'' J.
Comput. Phy.9. 12, 422-428.
P.N. Swarztrauber (1974). "A Direct Method for the Discrete Solution of Separable Elliptic Equa­
tions,'' SIAM J. Nu.mer. Anal. 11, 1136-1150.
P.N. Swarztrauber (1977). "The Methods of Cyclic Reduction, Fourier Analysis and Cyclic Reduction­
Fourier Analysis for the Discrete Solution of Poisson's Equation on a Rectangle," SIAM Review
19, 490-501.
There are actually eight variants of the discrete cosine transform each of which corresponds to the
location of the Neumann conditions and how the divided difference approximations are set up. For a
unified, matrix-based treatment, see:
G. Strang (1999). "The Discrete Cosine Transform,'' SIAM Review 41, 135-147.
Chapter 5
Orthogonalization and
Least Squares
5.1 Householder and Givens Transformations
5.2 The QR Factorization
5.3 The Full-Rank Least Squares Problem
5.4 Other Orthogonal Factorizations
5.5 The Rank-Deficient Least Squares Problem
5.6 Square and Underdetermined Systems
This chapterisprimarilyconcerned withtheleast squaressolutionofoverdeter­
mined systems of equations, i.e., the minimization of II Ax- b 112 where AE m.mxn,
b E JR.m, and m � n. The most reliable solution procedures for this problem involve
thereductionofA tovarious canonical forms viaorthogonal transformations. House­
holder reflections and Givens rotations are central to this process and we begin the
chapter withadiscussionofthese important transformations. In §5.2 weshowhow to
compute the factorization A = QR where Q is orthogonal and R is upper triangular.
ThisamountstofindinganorthonormalbasisfortherangeofA. TheQR factorization
canbeused tosolvethefull-rankleastsquaresproblemasweshowin §5.3. Thetech­
nique is compared with the method of normal equations after a perturbation theory
is developed. In §5.4and §5.5 we consider methods for handling thedifficultsituation
when A is (nearly) rankdeficient. QR withcolumn pivotingand other rank-revealing
procedures including the SYD are featured. Some remarks about underdetermined
systemsareoffered in §5.6.
Reading Notes
Knowledgeofchapters1, 2, and3and§§4.1-§4.3isassumed. Withinthischapter
therearethefollowingdependencies:
§5.l --+ §5.2 --+ §5.3 --+ §5.4 --+ §5.5 --+ §5.6
.j.
§5.4
233
234 Chapter 5. Orthogonalization and Least Squares
Formorecomprehensivetreatmentsoftheleastsquaresproblem, seeBjorck (NMLS)
andLawsonandHansen(SLS).OtherusefulglobalreferencesincludeStewart( MABD),
Higham (ASNA), Watkins (FMC), Trefethen and Bau (NLA), Demmel (ANLA), and
Ipsen (NMA).
5.1 Householder and Givens Transformations
Recallthat Q E Rmxm is orthogonal if
Orthogonal matrices have an important role to play in least squares and eigenvalue
computations. In this section we introduce Householder reflections and Givens rota­
tions, thekeyplayersin this game.
5.1.1 A 2-by-2 Preview
It is instructive to examine the geometry associated with rotations and reflections at
them=2 level. A 2-by-2orthogonalmatrixQ isa rotation ifit has the form
[ cos(O) sin(8) l
Q =
- sin(8) cos(8) ·
Ify = QTx, then y isobtained byrotatingx counterclockwisethroughanangle8.
A 2-by-2orthogonalmatrixQ isareflection ifithastheform
[cos(0) sin(8) l
Q = sin(8) -cos(8) ·
Ify = QTx = Qx, then y isobtained by reflectingthevectorx acrossthe line defined
by
S = span .
{[cos(8/2) l}
sin(0/2)
Reflections and rotations are computationally attractive because they are easily con­
structed and because they can be used to introduce zeros in a vector by properly
choosingtherotationangleorthereflectionplane.
5.1. 2 Householder Reflections
Letv E Rm benonzero. Anm-by-mmatrixP oftheform
P = I - /3vvT, (5.1.1)
isaHouseholder reflection. (SynonymsareHouseholdermatrixandHouseholdertrans­
formation.) The vector v is the Householder vector. Ifavectorx is multiplied by P,
then it is reflected inthehyperplane span{v}J.. It iseasy to verify that Householder
matricesaresymmetricand orthogonal.
5.1. Householder and Givens Transformations 235
HouseholderreflectionsaresimilartoGausstransformationsintroducedin§3.2.1
inthat theyarerank-I modifications ofthe identity and canbe used to zeroselected
componentsofavector. Inparticular, supposewearegiven0=Fx E 1Rm andwant
Px = I - -- x = x - --v
( 2vvT) 2vTx
vTv vTv
tobeamultipleofei = Im(:, 1). From this weconcludethat v E span{x,ei }. Setting
gives
and
Thus,
Inorderforthecoefficient ofx tobezero, weseta = ±II x II2 forthen
(5.1.2)
Itisthissimpledeterminationofv that makestheHouseholderreflectionssouseful.
5.1.3 Computing the Householder Vector
Thereareanumberofimportantpracticaldetailsassociatedwiththedeterminationof
aHouseholdermatrix, i.e., the determination ofaHouseholdervector. One concerns
thechoiceofsigninthedefinitionofv in (5.1.2). Setting
V1 = X1 - II x 112
leads to the nice property that Px is a positive multiple of ei. But this recipe is
dangerous ifx is close to apositive multiple ofei becausesevere cancellation would
occur. However, theformula
V1 = X1 - II x 112 = x� - II x II�
X1 + II x 112
= -(x� + · · · + x;)
X1 + II x 112
suggestedby Parlett (1971) doesnot sufferfrom this defect intheX1 > 0 case.
In practice, it is handy to normalize the Householder vector so that v(l) = 1.
This permits the storage of v(2:m) where the zeros have been introduced in x, i.e.,
x(2:m). Werefertov(2:m) as the essential part oftheHouseholdervector. Recalling
236 Chapter 5. Orthogonalization and Least Squares
that (3 = 2/vTv and letting length(:c) specify vector dimension, we may encapsulate
the overall process as follows:
Algorithm 5.1.1 (Householder Vector) Given x E Rm, this function computes v E Rm
with v(l) = 1 and (3 E JR such that P = Im - (3vvT is orthogonal and Px = II x ll2e1 .
function [v, (3] = house(x)
m = length(x), a = x(2:m)Tx(2:m), v = [ x(2
�m)
]
if a = 0 and x(l) >= 0
(3 = 0
elseif a = 0 & x(l) < 0
(3 = -2
else
end
µ = Jx(1)2 + a
if x(l) <= 0
v(l) = x(l) - µ
else
v(l) = -a/(x(l) + µ)
end
(3 = 2v(1)2/(a + v(1)2)
v = v/v(l)
Here, length(·) returns the dimension of a vector. This algorithm involves about 3m
flops. The computed Householder matrix that is orthogonal to machine precision, a
concept discussed below.
5.1.4 Applying Householder Matrices
It is critical to exploit structure when applying P = I - f:JvvT to a matrix A. Premul­
tiplication involves a matrix-vector product and a rank-1 update:
PA = (I - (3vvT) A = A - ((3v)(vTA).
The same is true for post-multiplication,
In either case, the update requires 4mn flops if A E n:rnxn. Failure to recognize this and
to treat P as a general matrix increases work by an order of magnitude. Householder
updates never entail the explicit formation of the Householder matrix.
In a typical situation, house is applied to a subcolumn or subrow of a matrix and
(I - (3vvT) is applied to a submatrix. For example, if A E JRmxn, 1 � j < n, and
A(j:m, l:j - 1) is zero, then the sequence
[v, (3] = house(A(j:m,j))
A(j:m,j:n) = A(j:m,j:n) - ((3v)(vTA(j:m,j:n))
A(j + l:m,j) = v(2:m - j + 1)
5.1. Householder and Givens Transformations 237
applies Um-i+l - f3vvT) to A(j:m, l:n) and stores the essential part ofv where the
"new" zerosareintroduced.
5.1.5 Roundoff Properties
TheroundoffpropertiesassociatedwithHouseholdermatricesareveryfavorable. Wilkin­
son (AEP, pp. 152-162) shows that house produces a Householder vector v that is
very close totheexact v. IfP = I - 2VvTjvTv then
II P - P 112 = O(u).
Moreover, the computedupdateswith P arecloseto the exact updateswith P :
fl(FA) = P(A + E),
fl(AF) = (A + E)P,
II E 112 = O(ull A 112),
II E 112 = O(ull A 112).
Foramoredetailedanalysis, seeHigham(ASNA, pp. 357-361).
5.1.6 The Factored-Form Representation
Many Householder-based factorization algorithms that are presented in the following
sectionscompute productsofHouseholdermatrices
wheren :::; m andeachv<i> hastheform
(j) -
[ 0 0 0 1 (j) (j) T
v - ' ' . . . ' Vj+l • .. . ' vm ] .
...___,_.......
j-1
(5.1.3)
It is usually not necessary to compute Q explicitly evenifit is involved insubsequent
calculations. Forexample,ifC E Rmxp andwewishtocomputeQTC ,thenwemerely
executethe loop
for j = l:n
C = QiC
end
Thestorageofthe Householder vectors v<1> · · · v<n) and thecorresponding/3i amounts
to afactored-form representation ofQ.
Toillustratetheeconomiesofthefactored-formrepresentation, supposewehave
an arrayA and thatforj = l:n, A(j + l:m,j) housesv<i>(j + l:m), the essentialpart
ofthejth Householder vector. The overwritingofC E nmxp with QTC can then be
implementedas follows:
for j =·l:n
v(j:m) = [ A(j +
l
l:m,j) ]
/3i = 2/(1 + 11 A(j + l:m,j) 11�
C(j:m, :) = C(j:m, :) - (f3rv(j:m)) · (v(j:m)TC(j:m, :))
end
(5.1.4)
238 Chapter 5. Orthogonalization and Least Squares
This involves about pn{2m - n) flops. If Q is explicitly represented as an m-by-m
matrix, then QTC would involve 2m2p flops. The advantage of the factored form
representationisapparant ifn << m.
Ofcourse, in some applications, it isnecessary to explicitlyformQ (or parts of
it). Therearetwo possible algorithmsfor computing thematrixQ in (5.1.3):
Forward accumulation Backward accumulation
Q = Im Q = Im
for j = l:n for j = n: - 1:1
Q = Q Qj Q = QjQ
end end
Recall that the leading (j - 1)-by-(j - 1) portion of Qi is the identity. Thus, at
the beginningofbackward accumulation, Q is "mostly the identity" and it gradually
becomes full as the iteration progresses. This pattern can be exploited to reduce the
numberofrequiredflops. Incontrast, Q isfullinforwardaccumulation afterthefirst
step. For this reason, backward accumulation is cheaper and the strategy ofchoice.
HerearethedetailswiththeprovisothatweonlyneedQ(:, l:k) where 1 � k � m:
Q = Im{:, l:k)
for j = n: - 1:1
v{j:m) = [ A(j +
l
l:m,j) ]
f3j = 2/(1 + 11 A(j + l:m, j) II�
Q(j:m,j:k) = Q(j:m,j:k) - ({3jv(j:m))(v(j:m)TQ(j:m, j:k))
end
Thisinvolvesabout4mnk - 2{m + k)n2 + {4/3)n3 flops.
5.1.7 The WY Representation
(5.1.5)
SupposeQ = Qi · · · Qr isaproductofm-by-m Householdermatrices. SinceeachQi is
arank-1 modificationofthe identity, it followsfromthestructureoftheHouseholder
vectorsthat Q isarank-rmodificationoftheidentityandcanbe written intheform
(5.1.6)
where W and Y are m-by-r matrices. The key to computing the WY representation
(5.1.6) isthefollowinglemma.
Lemma 5.1.1. Suppose Q = Im - WYT is an m-by-m orthogonal matrix with
W,Y E Rmxj. If P = Im - {3vvT with v E Rm and z = {3Qv, then
Q+ = QP = Im - W+YJ
where W+ = [ W I z ] and Y+ = [ Y I v ] are each m-by-(j + 1).
5.1. Householder and Givens Transformations
Proof. Since
it followsfromthedefinitionofz that
239
Q+ = Im - WYT - zvT = Im - [ W l z ] [ Y l v f = Im - W+Y.r 0
Byrepeatedlyapplyingthelemma,wecantransitionfromafactored-formrepresenta­
tion to ablock representation.
Algorithm 5.1.2 Suppose Q = Q1 · · · Qr wherethe Qj = Im - /3jv(i)v(j)T arestored
in factored form. This algorithm computes matrices W,Y E 1Rmxr such that Q =
Im - WYT.
Y = vCll; W = f31vC1)
for j = 2:r
end
z = /3j(/m - WYT)vUl
W = [W l z]
Y = [ Y I vUl ]
Thisalgorithminvolves about 2r2m-2r3/3flops ifthe zeros inthevUl areexploited.
Note that Y is merely the matrix ofHouseholder vectors and is therefore unit lower
triangular. Clearly, thecentraltaskinthegenerationoftheWYrepresentation (5.1.6)
isthecomputationofthe matrix W.
The block representation for products of Householder matrices is attractive in
situations where Q must be applied to amatrix. Suppose C E 1Rmxp. It follows that
theoperation
is rich in level-3 operations. On the other hand, if Q is in factored form, then the
formationofQTC isjust rich in thelevel-2operationsofmatrix-vectormultiplication
andouterproductupdates. Ofcourse, in thiscontext, the distinctionbetweenlevel-2
andlevel-3 diminishes as C getsnarrower.
WementionthattheWY representation (5.1.6) isnotageneralizedHouseholder
transformationfromthegeometricpoint ofview. Trueblockreflectorshavetheform
Q = I - 2VVT
where V E 1Rnxr satisfies vrv = Ir. See Schreiberand Parlett (1987).
5.1.8 Givens Rotations
Householder reflections are exceedingly useful for introducing zeros on agrand scale,
e.g., the annihilation ofall but the first component ofa vector. However, in calcula­
tionswhereit isnecessarytozeroelementsmoreselectively, Givens rotations are the
transformation ofchoice. Thesearerank-2 corrections tothe identity oftheform
240 Chapter 5. Orthogonalization and Least Squares
1
0
G(i, k, 0) =
0
0
0
c
- s
0
i
...
0 0
s ... 0 i
(5.1.7)
c 0 k
0 1
k
where c = cos(O) and s = sin(O) for some 0. Givensrotationsare clearly orthogonal.
Premultiplication by G(i, k, lJ)T amounts to a counterclockwise rotation of(} ra­
diansin the (i, k} coordinateplane. Indeed, ifx E Rm and
then
y = G(i, k, O)Tx,
{CXi - SXk,
Yj = SXi + CXk,
Xj,
j = i,
j = k,
j f d, k.
From these formulae it is clear that wecan force Yk tobezerobysetting
Xi
c = '
Jx� + x%
(5.1.8}
Thus, it is a simple matter to zero a specified entry in a vector by using a Givens
rotation. Inpractice, therearebetterwaystocomputec and s than (5.1.8}, e.g.,
Algorithm 5.1.3 Given scalars a and b, this function computes c = cos(O) and
s = sin(0) so
function [c, s] = givens(a, b)
if b = 0
c = l; s = 0
else
if lbl > lal
T = -ajb; S
else
l/Vl + r2; C = ST
r = -b/a; c = 1/./1 + r2; s = er
end
end
Thisalgorithmrequires5 flopsandasinglesquareroot. Note that inverse trigonometric
functions are not involved.
5.1. Householder and Givens Transformations
5.1.9 Applying Givens Rotations
241
ItiscriticalthatthesimplestructureofaGivensrotationmatrixbeexploitedwhenit
isinvolvedinamatrixmultiplication. SupposeA E lllmxn, c = cos(O), and s = sin(O).
IfG(i, k, 0) E lllmxm, then the updateA = G(i, k, O)TA affectsjust tworows,
[ c s ]T
A([i, k], :) = A([i, k], :),
-s c
and involves6n flops:
for j = l:n
end
T1 = A(i, j)
T2 = A(k, j)
A(i,j) = CT1 - ST2
A(k,j) = ST1 + CT2
Likewise,ifG(i, k, 0) E lllnxn, thentheupdateA = AG(i, k, 0) affectsjusttwocolumns,
A(:, [i, k]) = A(:, [i, k]) ,
[ c s ]
- s c
and involves6m flops:
for j = l:m
end
5.1.10
T1 = A(j, i)
T2 = A(j, k)
A(j, 'i) = CT1 - ST2
A(j, k) = ST1 + CT2
Roundoff Properties
ThenumericalpropertiesofGivensrotationsareas favorableas thoseforHouseholder
reflections. Inparticular, it canbeshownthat thecomputed c and s ingivens satisfy
c
s
c(l + Ee),
s(l + Es),
O(u),
O(u).
Ifc ands aresubsequentlyusedina Givens update, thenthecomputedupdateisthe
exact updateofanearbymatrix:
fl[G(i, k, OfA]
fl[AG(i, k, O)]
G(i, k, O)r(A + E),
(A + E)G(i, k, 0),
II E 112 � ull A 112,
II E 112 � ull A 112-
Detailederror analysis ofGivens rotationsmaybe found inWilkinson (AEP, pp. 131-
39), Higham(ASNA,pp. 366-368), andBindel, Demmel, Kahan, andMarques (2002).
242 Chapter 5. Orthogonalization and Least Squares
5.1.11 Representing Products of Givens Rotations
Suppose Q = G1 · · · Gt is a product of Givens rotations. As with Householder re­
flections, it is sometimes more economical to keep Q in factored form rather than to
computeexplicitly the product ofthe rotations. Stewart (1976) has shown how todo
this in avery compact way. The ideais to associate asingle floating point number p
witheachrotation. Specifically, if
z = [ _: : ] '
then we define the scalarpby
if c= 0
p = 1
elseif Isl < lei
p = sign(c)· s/2
else
p = 2· sign(s)/c
end
c2 + s2 = 1,
(5.1.9)
Essentially, thisamounts to storing s/2 ifthesineis smaller and 2/c ifthecosine is
smaller. With thisencoding, it is possible toreconstruct Z (or -Z) as follows:
if p= 1
c = O; s = 1
elseif IPI < 1
s = 2p; c = v'l - s2
else
c = 2/p; s = v'l - c2
end
(5.1.10)
Note that the reconstruction of -Z is not a problem, for ifZ introduces a strategic
zerothensodoes - Z. Thereasonforessentiallystoringthesmallerofc and s isthat
theformulav'l - x2 renderspoorresultsifx isnearunity. Moredetailsmaybefound
in Stewart (1976). Ofcourse, to "reconstruct" G(i, k, 0) we need i and k in addition
totheassociatedp. Thisposesnodifficultyifweagreetostorepinthe (i, k) entryof
somearray.
5.1.12 Error Propagation
An m-by-m floating point matrix Q is orthogonal to working precision ifthere exists
anorthogonal Q E Rmxm suchthat
A corollaryofthisisthat
11 Q - Q II = O(u).
5.1. Householder and Givens Transformations 243
Thematricesdefinedbythe floating point output of house and givens areorthogonal
toworkingprecision.
In many applications, sequences ofHouseholders and/or Given transformations
aregenerated and applied. In thesesettings, the roundingerrors are nicely bounded.
Tobeprecise, suppose A = Ao E 1Rmxn isgivenandthat matricesA1, . . . , AP = B are
generated viatheformula
k = l:p .
AssumethattheaboveHouseholderand Givens algorithmsareusedfor both thegen­
eration and application ofthe Qk and Zk. Let Qk and Zk betheorthogonal matrices
that would beproduced intheabsenceofroundoff. It canbeshownthat
(5.1.11)
where II E 112 :::; c · ull A 112 and c is a constant that depends mildly on n, m, and
p. In other words, B is an exact orthogonal update of a matrix near to A. For a
comprehensive error analysis of Householder and Givens computations, see Higham
(ASNA, §19.3, §19.6).
5.1.13 The Complex Case
Most ofthe algorithms that we present in this book have complex versions that are
fairly straightforward to derive from their real counterparts. (This is not to say that
everything is easy and obvious at the implementation level.) As an illustration we
brieflydiscusscomplexHouseholderandcomplexGivenstransformations.
RecallthatifA = (aij) E <Cmxn, thenB = AH E <Cnxm isitsconjugatetranspose.
The2-normofa vector x E <Cn isdefinedby
andQ E <Cnxn is unitary ifQHQ = In· Unitary matrices preservethe 2-norm.
AcomplexHouseholdertransformationis aunitary matrixoftheform
where/3 = 2/vHv. Given anonzerovector x E <C"', it iseasytodetermine v so that
ify = Px, theny(2:m) = 0. Indeed, if
wherer, () E 1R and
then Px = =fei011 x ll2e1. The sign can be determined to maximize 11 v 112 for the sake
ofstability.
Regardingcomplex Givens rotations, it is easy to verify that a 2-by-2 matrix of
theform
Q = .
[ cos(O)
-sin(O)e-•<f>
sin(O)ei<f> l
cos(O)
244 Chapter 5. Orthogonalization and Least Squares
where 9, ¢ER is unitary. Weshow howto compute c = cos(fJ) and s = sin(fJ)eitf> so
that
(5.1.12)
whereu = u1 +iu2 andv = v1 +iv2 aregivencomplexnumbers. First,givens isapplied
tocomputerealcosine-sinepairs {c0, s0}, {c.a, s.a}, and {co, so} sothat
and
[-� � n:: l [r
; l·
r_:; : n: i r� l·
r_:: :: n:: i r� i
Note that u = r.,e-ia: andv = rve-if3 . If weset
eitf> = ei(f3-a:) (c c + s s ) + •(c s c s )
= -a: {J a: {J
• o. {3 - f3 a: ,
which confirms (5.1.12).
Problems
PS.1.1 Let x and y be nonzero vectors in Rm. Give an algorithm for determining a Householder
matrix P such that Px is a multiple of y.
PS.1.2 Use Householder matrices to show that det(I + xyT) = 1 + xTy where x and y are given
m-vectors.
PS.1.3 (a) Assume that x, y E R2 have unit 2-norm. Give an algorithm that computes a Givens
rotation Q so that y = QTx. Make effective use of givens. (b) Suppose x and y arc unit vectors in Rm.
Give an algorithm using Givens transformations which computes an orthogonal Q such that QTx = y.
PS.1.4 By generalizing the ideas in §5.1.11, develop a compact representation scheme for complex
givens rotations.
PS.1.5 Suppose that Q = I-YTYT is orthogonal where Y E nmxj and T E Jl! xj is upper triangular.
Show that if Q+ = QP where P = I - 2vvT/vTv is a Householder matrix, then Q+ can be expressed
in the form Q+ = I - Y+T+YJ where Y+ E Rmx (j+I) and T+ E R(j+l)x (j+I) is upper triangular.
This is the main idea behind the compact WY representation. See Schreiber and Van Loan (1989).
PS.1.6 Suppose Qi = Im - Y1T1Y1 and Q2 = Im - Y2T2Yl are orthogonal where Y1 E R""xri,
Y2 E Rmxr2, T1 E Wt xr1, and T2 E w2 xr2. Assume that T1 and T2 arc upper triangular. Show how
to compute Y E R""xr and upper triangular T E wxr with r = r1 + r2 so that Q2Q1 = Im - YTYT.
PS.1.7 Give a detailed implementation of Algorithm 5.1.2 with the assumption that v<il(j + l:m),
the essential part of the jth Householder vector, is stored in A(j + l:m, j). Since Y is effectively
represented in A, your procedure need only set up the W matrix.
5.1. Householder and Givens Transformations 245
P5.l.8 Show that if S is skew-symmetric (ST = -S), then Q = (I + S)(I - S)-1 is orthogonal. (The
matrix Q is called the Cayley transform of S.) Construct a rank-2 S so that if x is a vector, then Qx
is zero except in the first component.
PS.1.9 Suppose P E F'xm satisfies II pTP - Im 112 = E < 1. Show that all the singular values of P
are in the interval [1 - E, 1 + e] and that II p - uvT 112 :5 E where p = UEVT is the SVD of P.
PS.1.10 Suppose A E R.2x2. Under what conditions is the closest rotation to A closer than the closest
reflection to A? Work with the Frobenius norm.
PS.1.11 How could Algorithm 5.1.3 be modified to ensure r � O?
PS.1.12 (Fast Givens Transformations) Suppose
[ X J
] D = [ d1
x = and
x2 0
with d1 and d2 positive. Show how to compute
M1 = [ !31
:1 ]
1
0
]
d2
so that if y = Mix and D = M'[D!vli , then Y2 = 0 and D is diagonal. Repeat with !v/1 replaced by
M2 = [ 1 a2
]
!32 1
.
(b) Show that either 11 M'[D!v/1 112 :5 211 D 112 or 11 M:{D!v/2 112 :5 211 D 112. (c) Suppose x E Rm and
that D E irixn is diagonal with positive diagonal entries. Given indices i and j with 1 :5 i < j :5 m,
show how to compute !vI E Rnxn so that if y = !vlx and D = !vJTDM, then Yi = 0 and D is diagonal
with 11 D 112 :5 211 D 112. (d) From part (c) conclude that Q = D112Mb-1/2 is orthogonal and that
the update y = Mx can be diagonally transformed to (D112y) = Q(D112x).
Notes and References for §5.1
Householder matrices are named after A.S. Householder, who popularized their use in numerical
analysis. However, the properties of these matrices have been known for quite some time, see:
H.W. Turnbull and A.C. Aitken (1961). An Introduction to the Theory of Canonical Matrices, Dover
Publications, New York, 102-105.
Other references concerned with Householder transformations include:
A.R. Gourlay (1970). "Generalization of Elementary Hermitian Matrices," Comput. J. 13, 411-412.
B.N. Parlett (1971). "Analysis of Algorithms for Reflections in Bisectors," SIAM Review 13, 197-208.
N.K. Tsao (1975). "A Note on Implementing the Householder Transformations." SIAM J. Numer.
Anal. 12, 53-58.
B. Danloy (1976). "On the Choice of Signs for Householder Matrices," J. Comput. Appl. Math. 2,
67-69.
J.J.M. Cuppen (1984). "On Updating Triangular Products of Householder Matrices," Numer. Math.
45, 403-410.
A.A. Dubrulle (2000). "Householder Transformations Revisited," SIAM J. Matrix Anal. Applic. 22,
33-40.
J.W. Demmel, M. Hoemmen, Y. Hida, and E.J. Riedy (2009). "Nonnegative Diagonals and High
Performance On Low-Profile Matrices from Householder QR," SIAM J. Sci. Comput. 31, 2832-
2841.
A detailed error analysis of Householder transformations is given in Lawson and Hanson (SLE, pp.
83-89). The basic references for block Householder representations and the associated computations
include:
C.H. Bischof and C. Van Loan (1987). "The WY Representation for Products of Householder Matri­
ces," SIAM J. Sci. Stat. Comput. 8, s2-·s13.
246 Chapter 5. Orthogonalization and Least Squares
B.N. Parlett and R. Schreiber (1988). "Block Reflectors: Theory and Computation," SIAM J. Numer.
Anal. 25, 189-205.
R.S. Schreiber and C. Van Loan (1989). "A Storage-Efficient WY Representation for Products of
Householder Transformations," SIAM J. Sci. Stat. Comput. 10, 52-57.
C. Puglisi (1992). "Modification of the Householder Method Based on the Compact WY Representa­
tion," SIAM J. Sci. Stat. Comput. 13, 723-726.
X. Sun and C.H. Bischof (1995). "A Basis-Kernel Representation of Orthogonal Matrices," SIAM J.
Matrix Anal. Applic. 1 6, 1184-1196.
T. Joffrain, T.M. Low, E.S. Quintana-Orti, R. van de Geijn, and F.G. Van Zee (2006). "Accumulating
Householder Transformations, Revisited," A CM TI-ans. Math. Softw. 32, 169-179.
M. Sadkane and A. Salam (2009). "A Note on Symplectic Block Reflectors," ETNA 33, 45-52.
Givens rotations are named after Wallace Givens. There are some subtleties associated with their
computation and representation:
G.W. Stewart (1976). "The Economical Storage of Plane Rotations," Numer. Math. 25, 137-138.
D. Bindel, J. Demmel, W. Kahan, and 0. Marques (2002). "On computing givens rotations reliably
and efficiently," ACM TI-ans. Math. Softw. 28, 206-238.
It is possible to aggregate rotation transformations to achieve high performance, see:
B. Lang (1998). "Using Level 3 BLAS in Rotation-·Based Algorithms," SIAM J. Sci. Comput. 19,
626--634.
Fast Givens transformations (see P5.l.11) are also referred to as square-root-free Givens transfor­
mations. (Recall that a square root must ordinarily be computed during the formation of Givens
transformation.) There are several ways fast Givens calculations can be arranged, see:
M. Gentleman (1973). "Least Squares Computations by Givens Transformations without Square
Roots," J. Inst. Math. Appl. 12, 329-336.
C.F. Van Loan (1973). "Generalized Singular Values With Algorithms and Applications," PhD Thesis,
University of Michigan, Ann Arbor.
S. Hammarling (1974). "A Note on Modifications to the Givens Plane Rotation," J. Inst. Math.
Applic. 13, 215-218.
J.H. Wilkinson (1977). "Some Recent Advances in Numerical Linear Algebra," in The State of the
Art in Numerical Analysis, D.A.H. Jacobs (ed.), Academic Press, New York, 1-53.
A.A. Anda and H. Park (1994). "Fast Plane Rotations with Dynamic Scaling," SIAM J. Matrix Anal.
Applic. 15, 162-174.
R.J. Hanson and T. Hopkins (2004). "Algorithm 830: Another Visit with Standard and Modified
Givens Transformations and a Remark on Algorithm 539," ACM TI-ans. Math. Softw. 20, 86-94.
5.2 The Q R Factorization
A rectangularmatrixA ERmxn canbefactoredintoaproductofanorthogonalmatrix
Q ERmxm and an upper triangular matrix RERmxn:
A = QR.
This factorization is referred to as the QR factorization and it has a central role to
playinthelinearleastsquaresproblem. Inthissectionwegivemethodsforcomputing
QR based on Householder, block Householder, and Givens transformations. The QR
factorizationisrelatedtothewell-knownGram-Schmidt process.
5.2.1 Existence and Properties
Westartwithaconstructiveproofofthe QR factorization.
Theorem 5.2.1 (QR Factorization). If A ERmxn, then there exists an orthogonal
Q ERmxm and an upper triangular RE Rmxn so that A = QR.
5.2. The QR Factorization 247
Proof. We use induction. Suppose n = 1 and that Q is a Householder matrix so that
if R = QTA, then R(2:m) = 0. It follows that A = QR is a QR factorization of A. For
general n we partition A,
A = [ Ai I v ],
where v = A(:, n). By induction, there exists an orthogonal Qi E Rmxm so that
Ri = QfAi is upper triangular. Set w = QTv and let w(n:m) = Q2R2 be the QR
factorization of w(n:m). If
then
is a QR factorization of A. D
w(l:n - 1) ]
R2
The columns of Q have an important connection to the range of A and its orthogonal
complement.
Theorem 5.2.2. If A = QR is a QR factorization of a full column rank A E Rmxn
and
A = [ ai I · · · I an ] ,
Q = [ qi I · · · I Qm l
are column partitionings, then for k = l:n
(5.2.1)
and rkk =fa 0. Moreover, if Qi = Q(l:m, l:n), Q2 = Q(l:m, n + l:m), and Ri =
R(l:n, l:n), then
and
ran(A)
ran(A).L
= ran(Qi),
= ran(Q2),
Proof. Comparing the kth columns in A = QR we conclude that
and so
k
ak = L rikQi E span{Qi, . . . , Qk},
i=i
span{a1, . . . , ak} � span{q1 , . . . , qk}·
(5.2.2)
(5.2.3)
If rkk = 0, then ai, . . . , ak are dependent. Thus, R cannot have a zero on its diagonal
and so span{a1, . . . , ak} has dimension k. Coupled with (5.2.3) this establishes (5.2.1).
To prove (5.2.2) we note that
248 Chapter 5. Orthogonalization and Least Squares
ThematricesQ1 = Q(l:m, l:n) andQ2 = Q(l:m, n+l:m) canbeeasilycomputedfrom
afactored form representationofQ. Werefer to (5.2.2) as the thin QR factorization.
Thenext result addresses its uniqueness.
Theorem 5.2.3 (Thin QR Factorization). Suppose A E IR.mxn has full column
rank. The thin QR factorization
A = QiR1
is unique where Q1 E lRmxn has orthonormal columns and Ri is upper triangular with
positive diagonal entries. Moreover, R1 = GT where G is the lower triangular Cholesky
factor ofATA.
Proof. Since ATA = (Q1R1)T(Q1R1) = RfR1 we see that G = Rf is the Cholesky
factor ofATA. This factor is unique by Theorem4.2.7. Since Qi = AR!i it follows
that Qi isalsounique. D
How are Qi and Ri affected by perturbations in A? To answer this question
we need to extend the notion of 2-norm condition to rectangular matrices. Recall
from §2.6.2 that for square matrices, 11;2(A) is the ratio ofthe largest to the smallest
singularvalue. ForrectangularmatricesA withfullcolumnrankwecontinuewiththis
definition:
(A) _ O'max(A)
K;2 - •
O'min(A)
(5.2.4)
Ifthecolumns ofA are nearly dependent, thenthisquotient is large. Stewart (1993)
hasshownthat0(€) relativeerrorinA induces 0(€·K2(A) ) errorin Qi andRi.
5.2.2 Householder QR
We begin with aQRfactorization method that utilizes Householder transformations.
The essenceofthe algorithm can be conveyed by a small example. Suppose m = 6,
n = 5, andassumethatHouseholdermatricesH1 and H2 havebeencomputedsothat
x x x x x
0 x x x x
H2H1A 0 0 x x x
0 0 x x x
0 0 x x x
0 0 x x x
Concentratingonthehighlightedentries,wedetermineaHouseholdermatrixH3 E lR4x4
such that
5.2. The QR Factorization 249
IfH3 = diag(hFh), then
x x x x x
0 x x x x
H3H2H1A 0 0 x x x
=
0 0 0 x x
0 0 0 x x
0 0 0 x x
After n such steps we obtain an upper triangular HnHn-1 · · · H1A = R and so by
settingQ = H1 · · · Hn weobtain A = QR.
Algorithm 5.2.1 (Householder QR) Given A E 1Rmxn with m ;:::: n, the following
algorithm finds Householder matrices H1 , . . . , Hn such that if Q = H1 · · · Hn , then
QTA = R is upper triangular. The upper triangular part ofA is overwritten by the
uppertriangularpartofRandcomponentsj + 1:m ofthejthHouseholdervectorare
storedinA(j + l:m,j),j < m.
for j = I:n
end
[v, ,B] = house(A(j:m,j))
A(j:m,j:n) = (I - ,BvvT)A(j:m,j:n)
if j < m
A(j + l:m,j) = v(2:m - j + I)
end
Thisalgorithm requires 2n2(m - n/3) flops.
ToclarifyhowA isoverwritten, if
(j) - [ 0 0 1 (j) (j) jT
V - , . . . , , ,Vj+l' . . . , Vm
�
j-1
isthejthHouseholdervector, then upon completion
r11 r12 r13 T14 r15
(1)
V2 r22 r23 r24 r25
(1 ) 1/2) T33 T34 T35
A
V3 3
(1) (2) v(3)
V4 V4 4 T44 T45
(I)
V5 (2)
V5 (3)
V5
(4)
V5 T55
(1)
V5 (2)
V5 (3)
V5 (4)
V5 (5)
V5
IfthematrixQ = H1 · · · Hn isrequired, thenitcanbeaccumulatedusing (5.1.5). This
accumulation requires 4(m2n - mn2 + n3/3) flops. Note that the ,B-values that arise
in Algorithm 5.2.1 can be retrievedfromthestored Householder vectors:
2
,B
j =
1 + II A(j + l:m,j) 11
2 •
250 Chapter 5. Orthogonalization and Least Squares
Wementionthat the computed upper triangularmatrixR istheexact R foranearby
A in the sense that zr (A + E) = R where Z is some exact orthogonal matrix and
II E 112 � ull A 112-
5.2.3 Block Householder QR Factorization
Algorithm 5.2.l is rich in the level-2 operations of matrix-vector multiplication and
outer product updates. By reorganizing the computation and using the WY repre­
sentation discussed in §5.1.7 we can obtain a level-3 procedure. The idea is to apply
theunderlying Householder transformationsinclustersofsize r. Supposen = 12 and
r = 3. ThefirststepistogenerateHouseholdersH1, H2, andH3 as inAlgorithm5.2.1.
However, unlikeAlgorithm 5.2.l where each Hi isapplied across the entire remaining
submatrix, we apply only H1, H2, and H3 to A(:, 1:3). After this is accomplishedwe
generate the block representation H1H2H3 = I - H'1 Yt and then perform the level-3
update
A(:, 4:12) = (J - WYT)A(:, 4:12).
Next, wegenerate H4, H5, and H6 as in Algorithm 5.2.1. However, these transforma­
tionsarenotappliedtoA(:, 7:12) untiltheirblockrepresentationH4H5H6 = I-W2Y{
isfound. Thisillustratesthegeneralpattern.
Algorithm 5.2.2 (Block Householder QR) If A E 1Rmxn and r is a positive inte­
ger, then the following algorithm computes an orthogonal QE 1Rmxm and an upper
triangularR E 1Rmxn sothat A = QR.
Q = Im; A = l; k = 0
while A � n
end
T �min(A + r - 1, n); k = k+ 1
UseAlgorithm 5.2.1, to upper triangularize A(A:m, A:T),
generatingHouseholdermatrices H>.., . . . , Hr.
UseAlgorithm 5.1.2 toget the block representation
I - WkYk = H>.. · · · Hr.
A(A:rn, T + l:n) = (I - WkY{)TA(A:rn, T + l:n)
Q(:,A:m) = Q(:,A:m)(J - WkY{)
A = T + 1
Thezero-nonzerostructureofthe Householder vectors that define H>., . . . , Hr implies
that the first A - 1 rows of Wk and Yk are zero. This fact would be exploited in a
practicalimplementation.
The properwayto regard Algorithm 5.2.2 isthroughthepartitioning
N = ceil(n/r)
where block column Ak is processed during the kth step. In the kth step of the
reduction, a block Householder is formed that zeros the subdiagonal portion ofAk·
Theremaining block columnsare then updated.
5.2. The QR Factorization 251
The roundoffpropertiesofAlgorithm 5.2.2 areessentiallythesame as thosefor
Algorithm 5.2.1. There is a slight increase in the number of flops required because
ofthe W-matrix computations. However, as a result ofthe blocking, all but a small
fraction ofthe flops occur in the context ofmatrix multiplication. In particular, the
level-3 fractionofAlgorithm5.2.2 isapproximately 1- 0(1/N). SeeBischofandVan
Loan (1987) forfurtherdetails.
5.2.4 Block Recursive QR
A more flexible approach to blocking involves recursion. Suppose A E Rmxn and as­
sumeforclaritythat A has fullcolumnrank. Partitionthe thin QR factorizationofA
as follows:
where n1 = floor(n/2), n2 = n - ni, A1, Q1 E Rmxni and A2, Q2 E Rmxn2• From
the equations Q1R11 = Ai, Ri2 = Q[A2, and Q2R22 = A2 - Q1R12 we obtain the
followingrecursiveprocedure:
Algorithm 5.2.3 (Recursive Block QR) Suppose A E Rmxn has full column rank
andnb isapositiveblockingparameter. The followingalgorithmcomputesQ E Rmxn
with orthonormalcolumnsand upper triangularR E Rnxn suchthat A = QR.
function [Q, R] = BlockQR(A, n, nb)
if n ::; nb
else
end
end
UseAlgorithm5.2.1 to computethe thin QR factorization A = QR.
n1 = floor(n/2)
[Q1 , Rn] = BlockQR(A(:, l:n1), ni,nb)
Ri2 = Q[A(:, ni + l:n)
A(:, n1 + l:n) = A(:, ni + l:n) - QiR12
[Q2 , R22] = BlockQR(A(:, n1 + l:n), n - n1, nb)
Q = [ Q1 I Q2 ], R =
[R�i
Thisdivide-and-conquerapproachisrichinmatrix-matrixmultiplicationandprovides
aframeworkfortheeffectiveparallelcomputationoftheQR factorization. SeeElmroth
andGustavson (2001). KeyimplementationideasconcerntherepresentationoftheQ­
matricesandtheincorporationofthe§5.2.3blockingstrategies.
252 Chapter 5. Orthogonalization and Least Squares
5.2.5 Givens QR Methods
GivensrotationscanalsobeusedtocomputetheQRfactorizationandthe4-by-3case
illustrates the general idea:
Wehighlightedthe2-vectorsthatdefinetheunderlyingGivensrotations. IfGj denotes
the jth Givens rotation in the reduction, then QTA = R is upper triangular, where
Q = G1 · · · Gt and tisthetotalnumberofrotations. Forgeneralm andn wehave:
Algorithm 5.2.4 (Givens QR) Given A E nrxn with m :'.:'. n, thefollowingalgorithm
overwritesA with QTA = R, whereR isuppertriangularandQ is orthogonal.
for j= l:n
end
for i = m: - l:j+ 1
end
[c, s] = givens(A(i - l,j), A(i,j))
A(i - l:i,j:n) =
[ c 8 ]T
A(i - l:i, j:n)
-s c
This algorithm requires 3n2(m - n/3) flops. Note that we could use the represen­
tation ideas from §5.1.11 to encode the Givens transformations that arise during the
calculation. Entry A(i, j) canbeoverwritten with theassociated representation.
With the Givens approach to the QR factorization, there is flexibility in terms
oftherowsthat areinvolved ineach update andalsotheorderinwhichthezerosare
introduced. Forexample, wecanreplacetheinnerloop body inAlgorithm 5.2.4 with
[c, s] = givens(A(j,j), A(i,j))
A([j i ],j:n) = [ c 8
]TA([ j i ],j:n)
-s c
and still emerge with the QR factorization. It is also possible to introduce zeros by
row. Whereas Algorithm 5.2.4 introduceszerosbycolumn,
theimplementation
[; : :l
2 5 x
'
1 4 6
5.2. The QR Factorization
for i = 2:m
end
for j = l:i - 1
end
[c, s] = givens(A(j, j), A(i, j))
A([j i], j:n) = [ -� � rA( [j i] , j:n )
introduceszerosby row, e.g.,
5.2.6 Hessenberg QR via Givens
253
Asanexampleofhow Givensrotationscanbe usedin astructuredproblem,weshow
how they can be employed to compute the QR factorization ofan upper Hessenberg
matrix. (Other structured QRfactorizations arediscussed in Chapter 6 and §11.1.8.)
Asmallexample illustrates the general idea. Supposen = 6 and that after twosteps
wehavecomputed
x x x x x x
0 x x x x x
G(2, 3,02fG(l, 2, 01)TA
0 0 x x x x
0 0 x x x x
0 0 0 x x x
0 0 0 0 x x
Next, we compute G(3, 4, 03) tozerothe current (4,3) entry, therebyobtaining
G(3, 4, 83fG(2, 3, ll2fG(l, 2, ll1fA
x
0
0
0
0
0
x
x
0
0
0
0
Continuingin thiswayweobtain the followingalgorithm.
x� x x x
x x x x
x x x x
0 x x x
0 x x x
0 0 x x
Algorithm 5.2.5 (Hessenberg QR) IfA E 1Rnxn is upper Hessenberg, then the fol­
lowing algorithm overwrites A with QTA = R where Q is orthogonal and R is upper
triangular. Q = G1 · · · Gn- J is a product ofGivens rotations where Gj has the form
Gi = G(j, j + l, Oi)·
254 Chapter 5. Orthogonalization and Least Squares
for j = l:n - 1
end
[ c, s ] = givens(A(j, j), A(j + l, j))
A(j:j + 1,j:n) = [ c s
]
T
A(j:j + l,j:n)
-s c
This algorithm requires about 3n2 fl.ops.
5.2.7 Classical Gram-Schmidt Algorithm
We now discuss two alternative methods that can be used to compute the thin QR
factorization A = Q1R1 directly. Ifrank(A) = n, thenequation (5.2.3) canbe solved
forQk:
Thus, wecanthinkofQk as aunit 2-norm vector inthedirectionof
k-1
Zk = ak - LTikQi
i=l
wheretoensureZk E span{q1, . . . ,Qk-d..L wechoose
i = l:k- 1.
Thisleadstothe classical Gram-Schmidt (CGS) algorithmforcomputingA = Q1R1•
R(l, 1) = II A(:, 1) 112
Q(:, 1) = A(:, 1)/R(l, 1)
for k = 2:n
end
R(l:k - 1, k) = Q(l:rn, l:k - l)TA(l:rn, k)
z = A(l:rn, k) - Q(l:m, l:k - l) · R(l:k - 1, k)
R(k, k) = 11 z 112
Q(l:m, k) = z/R(k, k)
InthekthstepofCGS,thekthcolumnsofbothQ andR aregenerated.
5.2.8 Modified Gram-Schmidt Algorithm
Unfortunately, the CGS method has very poor numerical properties in that there is
typically a severe loss of orthogonality among the computed Qi· Interestingly, a re­
arrangement ofthe calculation, known as modified Gram-Schmidt (MGS), leads to a
morereliableprocedure. Inthe kthstepofMGS, the kthcolumnofQ (denotedbyQk)
5.2. The QR Factorization 255
and thekth row ofR (denoted by rf) are determined. To derive the MGS method,
define the matrix A(k) E Rmx(n-k+l) by
k-i
[ O I A<kl ] = A - Lqir[ =
i=i
Itfollowsthatif A(k) = [ z I B I
n-k
then Tkk = II z 112, Qk = z/rkk, and h,k+l , . . . , Tkn] = qfB. Wethen compute the
outer product A(k+l) = B - Qk [rk,k+i · · · Tkn] and proceed to the next step. This
completelydescribesthe kthstepofMGS.
Algorithm 5.2.6 (Modified Gram-Schmidt) Given A E Rmxn with rank(A) = n, the
followingalgorithmcomputesthethin QR factorizationA = QiRi where Qi E Rmxn
hasorthonormalcolumnsand Ri E Rnxn is upper triangular.
for k = l:n
end
R(k,k) = II A(l:m, k) 112
Q(l:m, k) = A(l:m, k)/R(k,k)
for j = k+ l:n
R(k,j) = Q(l:m, k)TA(l:m, j)
A(l:m, j) = A(l:m, j) - Q(l:m, k)R(k,j)
end
Thisalgorithmrequires 2mn2 flops. ItisnotpossibletooverwriteA withbothQi 'and
Ri. Typically, the MGS computation is arranged so that A isoverwritten by Qi and
thematrixR1 isstored inaseparatearray.
5.2.9 Work and Accuracy
Ifoneisinterestedincomputinganorthonormalbasisforran(A), thentheHouseholder
approachrequires 2mn2 - 2n3/3 flops toget Q in factored form and another 2mn2 -
2n3/3flopstogetthefirstn columnsofQ. (Thisrequires "payingattention" tojustthe
first n columnsofQ in (5.1.5).) Therefore, for theproblemoffindinganorthonormal
basis for ran(A), MGS is about twice as efficient as Householder orthogonalization.
However, Bjorck (1967) hasshownthat MGS producesacomputed Qi = [ Qi I · · · I <ln ]
that satisfies
whereasthecorrespondingresult for theHouseholderapproachisoftheform
Thus, iforthonormalityiscritical, then MGS shouldbeused tocomputeorthonormal
basesonlywhenthevectorstobeorthogonalizedarefairlyindependent.
256 Chapter 5. Orthogonalization and Least Squares
WealsomentionthatthecomputedtriangularfactorR producedbyMGSsatisfies
II A - QR II :::::: u ll A II and that there exists a Q with perfectly orthonormal columns
suchthat II A - QR II :::::: u ll A II· SeeHigham (ASNA, p. 379) andadditionalreferences
givenat the end ofthis section.
5.2.10 A Note on Complex Householder QR
Complex Householder transformations (§5.1.13) canbeusedtocompute the QRfac­
torizationofacomplexmatrix A E <Cm xn.
AnalogoustoAlgorithm5.2.1 wehave
for j = l:n
ComputeaHouseholdermatrixQ; sothat Q;A isuppertriangular
throughitsfirstj columns.
A = Q;A
end
Upontermination, A has been reduced to an upper triangularmatrix R E <Cm x n
and
we have A = QR where Q = Q1 • • • Qn is unitary. The reduction requires about four
times the numberofflops as the realcase.
Problems
P5.2.1 Adapt the Householder QR algorithm so that it can efficiently handle the case when A E Rmxn
has lower bandwidth p and upper bandwidth q.
PS.2.2 Suppose A E Rnxn and let £ be the exchange permutation £n obtained by reversing the order
of the rows in In. (a) Show that if R E Rnxn is upper triangular, then L = £R£ is lower triangular.
(b) Show how to compute an orthogonal Q E Rnxn and a lower triangular L E nnxn so that A = QL
assuming the availability of a procedure for computing the QR factorization.
PS.2.3 Adapt the Givens QR factorization algorithm so that the zeros are introduced by diagonal.
That is, the entries are zeroed in the order {m, 1), (m - 1, 1), {m, 2), (m - 2, 1), (m - 1, 2}, {m, 3) ,
etc.
PS.2.4 Adapt the Givens QR factorization algorithm so that it efficiently handles the case when A is
n-by-n and tridiagonal. Assume that the subdiagonal, diagonal, and superdiagona.l of A are stored in
e{l:n - 1), a(l:n), /{l:n - 1}, respectively. Design your algorithm so that these vectors are overwritten
by the nonzero portion of T.
PS.2.5 Suppose L E Rmxn with m ;:: n is lower triangular. Show how Householder matrices
Hi . . . . , Hn can be used to determine a lower triangular L1 E Rnxn so that
Hn · · · H1L = [ �1
] .
Hint: The second step in the 6-by-3 case involves finding H2 so that
with the property that rows 1 and 3 are left alone.
PS.2.6 Suppose A E Rnxn and D = diag(di , . . . , d,.) E Rnx". Show how to construct an orthogonal
Q such that
is upper triangular. Do not worry about efficiency-this is just an exercise in QR manipulation.
5.2. The QR Factorization
P5.2.7 Show how to compute the QR factorization of the product
A = Av · · · A2Ai
257
without explicitly multiplying the matrices Ai , . . . , Ap together. Assume that each A; is square. Hint:
In the p = 3 case, write
QfA = QfAaQ2QfA2QiQfAi
and determine orthogonal Q; so that Qf(A;Q;-I } is upper triangular. (Qo = I. )
P5.2.8 MGS applied to A E Rmxn is numerically equivalent to the first step in Householder QR
applied to
A = [ � ]
where On is the n-by-n zero matrix. Verify that this statement is true after the first step of each
method is completed.
P5.2.9 Reverse the loop orders in Algorithm 5.2.6 (MGS) so that R is computed column by column.
P5.2.10 How many flops are required by the complex QR factorization procedure outlined in §5.10?
P5.2.ll Develop a complex version of the Givens QR factorization in which the diagonal of R is
nonnegative. See §5.1.13.
PS.2.12 Show that if A E Rnxn and a; = A(:, i), then
ldet(A)I � II ai 1'2 . . · II an 112·
Hint: Use the QR factorization.
P5.2.13 Suppose A E Rmxn with m 2': n. Construct an orthogonal Q E R(m+n)x(m+n) with the
property that Q(l:m, l:n) is a scalar multiple of A. Hint. If a E R is chosen properly, then I - a2ATA
has a Cholesky factorization.
P5.2.14 Suppose A E Rmxn. Analogous to Algorithm 5.2.4, show how fast Givens transformations
(P5.1.12) can be used to compute 1'.1 E Rmxm and a diagonal D E R?"xm with positive diagonal
entries so that MTA = S is upper triangular and MJv/T = D. Relate M and S to A's QR factors.
PS.2.15 (Parallel Givens QR) Suppose A E R9X3 and that we organize a Givens QR so that the
subdiagonal entries arc zeroed over the course of ten "time steps" as follows:
Step Entries Zeroed
T = l (9,1)
T = 2 (8,1)
T = 3 (7,1) (9,2)
T = 4 (6,1) (8,2)
T = 5 (5,1) (7,2) (9,3)
T = 6 (4,1) (6,2) (8,3)
T = 7 (3,1) (5,2) (7,3)
T = 8 (2,1) (4,2) (6,3)
T = 9 (3,2) (5,3)
T = 10 (4,3)
Assume that a rotation in plane (i - 1, i) is used to zero a matrix entry (i, j). It follows that the
rotations associated with any given time step involve disjoint pairs of rows and may therefore be
computed in parallel. For example, during time step T = 6, there is a (3,4), (5,6), and (7,8) rotation.
Three separate processors could oversee the three updates. Extrapolate from this example to the
m-by-n case and show how the QR factorization could be computed in O(m + n) time steps. How
many of those time steps would involve n "nonoverlapping" rotations?
Notes and References for §5.2
The idea of using Householder transformations to solve the least squares problem was proposed in:
258 Chapter 5. Orthogonalization and Least Squares
A.S. Householder (1958). "Unitary Triangularization of a Nonsymmetric Matrix," J. ACM 5, 339-342.
The practical details were worked out in:
P. Businger and G.H. Golub (1965). "Linear Least Squares Solutions by Householder Transforma­
tions," Nu.mer. Math. 7, 269-276.
G.H. Golub (1965). "Numerical Methods for Solving Linear Least Squares Problems," Nu.mer. Math.
7, 206-216.
The basic references for Givens QR include:
W. Givens (1958). "Computation of Plane Unitary Rotations Transforming a General Matrix to
Triangular Form," SIAM J. Appl. Math. 6, 26-50.
M. Gentleman (1973). "Error Analysis of QR Decompositions by Givens Transformations," Lin. Alg.
Applic. 10, 189-197.
There are modifications for the QR factorization that make it more attractive when dealing with rank
deficiency. See §5.4. Nevertheless, when combined with the condition estimation ideas in §3.5.4, the
traditional QR factorization can be used to address rank deficiency issues:
L.V. Foster (1986). "Rank and Null Space Calculations Using Matrix Decomposition without Column
Interchanges," Lin. Alg. Applic. 74, 47-71.
The behavior of the Q and R factors when A is perturbed is of interest. A main result is that the
resulting changes in Q and R are bounded by the condition of A times the relative change in A, see:
G.W. Stewart (1977). "Perturbation Bounds for the QR Factorization of a Matrix," SIAM J. Numer.
Anal. 14, 509-518.
H. Zha (1993). "A Componentwise Perturbation Analysis ofthe QR Decomposition," SIAM J. Matrix
Anal. Applic. 4, 1124-1131.
G.W. Stewart (1993). "On the Perturbation ofLU Cholesky, and QR Factorizations," SIAM J. Matrix
Anal. Applic. 14, 1141--1145.
A. Barrlund (1994). "Perturbation Bounds for the Generalized QR Factorization," Lin. Alg. Applic.
207, 251-271.
J.-G. Sun (1995). "On Perturbation Bounds for the QR Factorization," Lin. Alg. Applic. 215,
95-112.
X.-W. Chang and C.C. Paige (2001). "Componentwise Perturbation Analyses for the QR factoriza­
tion," Nu.mer. Math. 88, 319-345.
Organization of the computation so that the entries in Q depend continuously on the entries in A is
discussed in:
T.F. Coleman and D.C. Sorensen (1984). "A Note on the Computation of an Orthonormal Basis for
the Null Space of a Matrix," Mathematical Programming 2.9, 234-242.
References for the Gram-Schmidt process and various ways to overcome its shortfalls include:
J.R. Rice (1966). "Experiments on Gram-Schmidt Orthogonalization," Math. Comput. 20, 325-328.
A. Bjorck (1967). "Solving Linear Least Squares Problems by Gram-Schmidt Orthogonalization," BIT
7, 1-21.
N.N. Abdelmalek (1971). "Roundoff Error Analysis for Gram-Schmidt Method and Solution of Linear
Least Squares Problems," BIT 11, 345-368.
A. Ruhe (1983). "Numerical Aspects of Gram-Schmidt Orthogonalization of Vectors," Lin. Alg.
Applic. 52/53, 591-601.
W. Jalby and B. Philippe (1991). "Stability Analysis and Improvement of the Block Gram-Schmidt
Algorithm," SIAM J. Sci. Stat. Comput. 12, 1058--1073.
A. Bjorck and C.C. Paige (1992). "Loss and Recapture ofOrthogonality in the Modified Gram-Schmidt
Algorithm," SIAM J. Matrix Anal. Applic. 13, 176-190.
A. Bjorck (1994). "Numerics of Gram-Schmidt Orthogonalization," Lin. Alg. Applic. 197/198,
297-316.
L. Giraud and J. Langon (2003). "A Robust Criterion for the Modified Gram-Schmidt Algorithm with
Selective Reorthogonalization," SIAM J. Sci. Comput. 25, 417-441.
G.W. Stewart (2005). "Error Analysis ofthe Quasi-Gram-Schmidt Algorithm," SIAM J. Matrix Anal.
Applic. 27, 493-506.
5.2. The QR Factorization 259
L. Giraud, J. Langou, M. R.ozlonk, and J. van den Eshof (2005). "Rounding Error Analysis of the
Classical Gram-Schmidt Orthogonalization Process," Nu.mer. Math. 101, 87- 100.
A. Smoktunowicz, J.L. Barlow and J. Langou (2006). "A Note on the Error Analysis of Classical
Gram-Schmidt," Nu.mer. Math. 105, 299-313.
Various high-performance issues pertaining to the QR factorization are discussed in:
B. Mattingly, C. Meyer, and J. Ortega (1989). "Orthogonal Reduction on Vector Computers," SIAM
J. Sci. Stat. Comput. 10, 372-381.
P.A. Knight (1995). "Fast Rectangular Matrix Multiplication and the QR Decomposition,'' Lin. Alg.
Applic. 221, 69-81.
J.J. Carrig, Jr. and G.L. Meyer (1997). "Efficient Householder QR Factorization for Superscalar
Processors,'' ACM '.lhms. Math. Softw. 23, 362-378.
D. Vanderstraeten (2000). "An Accurate Parallel Block Gram-Schmidt Algorithm without Reorthog­
onalization,'' Nu.mer. Lin. Alg. 7, 219-236.
E. Elmroth and F.G. Gustavson (2000). "Applying Recursion to Serial and Parallel QR Factorization
Leads to Better Performance,'' IBM J. Res. Dev. .44, 605-624.
Many important high-performance implementation ideas apply equally to LU, Cholesky, and QR, see:
A. Buttari, J. Langou, J. Kurzak, and .J. Dongarra (2009). "A Class of Parallel Tiled Linear Algebra
Algorithms for Multicore Architectures,'' Parallel Comput. 35, 38-53.
J. Kurzak, H. Ltaief, and J. Dongarra (2010). "Scheduling Dense Linear Algebra Operations on
Multicore Processors," Concurrency Comput. Pract. Exper. 22, 15-44.
J. Demmel, L. Grigori, M, Hoemmen, and J. Langou (2012). "Methods and Algorithms for Scientific
Computing Communication-optimal Parallel and Sequential QR and LU Factorizations," SIAM J.
Sci. Comput. 34, A206-A239.
Historical references concerned with parallel Givens QR include:
W.M. Gentleman and H.T. Kung (1981). "Matrix Triangularization by Systolic Arrays,'' SPIE Proc.
298, 19-26.
D.E. Heller and I.C.F. Ipsen (1983). "Systolic Networks for Orthogonal Decompositions,'' SIAM J.
Sci. Stat. Comput. 4, 261-269.
M. Costnard, J.M. Muller, and Y. Robert (1986). "Parallel QR Decomposition of a Rectangular
Matrix,'' Nu.mer. Math. 48, 239-250.
L. Eldin and R. Schreiber (1986). "An Application of Systolic Arrays to Linear Discrete Ill-Posed
Problems,'' SIAM J. Sci. Stat. Comput. 7, 892-903.
F.T. Luk (1986). "A Rotation Method for Computing the QR Factorization," SIAM J. Sci. Stat.
Comput. 7, 452-459.
J.J. Modi and M.R.B. Clarke (1986). "An Alternative Givens Ordering," Nu.mer. Math. 43, 83-90.
The QR factorization of a structured matrix is usually structured itself, see:
A.W. Bojanczyk, R.P. Brent, and F.R. de Hoog (1986). "QR Factorization of Toeplitz Matrices,''
Nu.mer. Math. 49, 81-94.
S. Qiao (1986). "Hybrid Algorithm for Fast Toeplitz Orthogonalization,'' Nu.mer. Math. 53, 351-366.
C.J. Demeure (1989). "Fast QR Factorization ofVandermonde Matrices,'' Lin. Alg. Applic. 122/123/124
165-194.
L. Reichel (1991). "Fast QR Decomposition of Vandermonde-Like Matrices and Polynomial Least
Squares Approximation,'' SIAM J. Matrix Anal. Applic. 12, 552-564.
D.R. Sweet (1991). "Fast Block Toeplitz Orthogonalization," Nu.mer. Math. 58, 613-629.
Quantum computation has an interesting connection to complex Givens rotations and their application
to vectors, see:
G. Cybenko (2001). "Reducing Quantum Computations to Elementary Unitary Transformations,"
Comput. Sci. Eng. 3, 27-32.
D.P. O'Leary and S.S. Bullock (2005). "QR Factorizations Using a Restricted Set of Rotations,"
ETNA 21, 20-27.
N.D. Mermin (2007). Quantum Computer Science, Cambridge University Press, New York.
260 Chapter 5. Orthogonalization and Least Squares
5.3 The Full-Rank Least Squares Problem
Considertheproblemoffindingavectorx E IRn suchthatAx=b wherethedata matrix
AE IRmxn and the observation vector b E IRm aregivenand m ;:::: n. Whenthere are
more equations than unknowns, we say that the system Ax = b is overdetermined.
Usuallyanoverdeterminedsystemhasnoexactsolutionsinceb mustbeanelementof
ran(A), apropersubspaceofIRm.
This suggests that we strive to minimize IIAx- b !IP for some suitable choice of
p. Differentnormsrenderdifferentoptimumsolutions. Forexample, ifA= [ 1, 1, 1JT
and b = [ bi, b2, b3 jT with b1 ;:::: b2 ;:::: b3 ;:::: 0, then it canbe verified that
p = 1 ::::} Xopt
p = 2 ::::} Xopt
p 00 ::::} Xopt
= (b1 + b2 + b3)/3,
= (b1 + b3)/2.
Minimizationinthe1-normandinfinity-normiscomplicatedbythefactthatthefunc­
tion f(x) = II Ax- b !IP is not differentiable for these values ofp. However, there are
several good techniquesavailablefor I-normand oo-norm minimization. SeeColeman
and Li (1992), Li (1993), and Zhang (1993).
Incontrast togeneralp-norm minimization, the least squares (LS) problem
ismoretractablefortworeasons:
min II Ax- b 112
xERn
(5.3.1)
• ¢(x) = � II Ax- b II� isadifferentiable functionofx andsothe minimizers of¢
satisfythegradientequation'1¢(x)=0. Thisturnsouttobeaneasilyconstructed
symmetriclinearsystemwhichispositivedefiniteifAhasfullcolumnrank.
• The 2-norm is preserved under orthogonal transformation. This means that
we can seek an orthogonal Q such that the equivalent problem of minimizing
II (QTA)x- (QTb) 112 is "easy" tosolve.
In this section we pursue these two solutionapproachesfor the case when A has full
columnrank. MethodsbasedonnormalequationsandtheQRfactorizationaredetailed
andcompared.
5.3.1 Implications of Full Rank
Supposex E IRn, z E IRn , aE IR, and considertheequality
11 A(x+ az) - b 11� = II Ax-b II� + 2azTAT(Ax-b) + a2 1! AzII�
where AE IRmxn and b E IRm. Ifx solvesthe LS problem (5.3.1), then wemust have
AT(Ax- b) = 0. Otherwise, ifz = -AT(Ax- b) and we make a small enough, then
weobtain thecontradictoryinequality II A(x+ az) - b 112 < II Ax- b 112. Wemayalso
conclude that ifx and x+ az areLS minimizers, thenz E null(A).
Thus, if A has full column rank, then there is a unique LS solution XLS and it
solvesthesymmetricpositivedefinite linear system
ATAxLs =Arb.
5.3. The Full-Rank Least Squares Problem
Thesearecalledthenormal equations. Notethatif
1
¢(x) = 211Ax - b 11�,
then
261
sosolvingthenormalequationsistantamounttosolvingthegradientequationV'¢ = 0.
Wecall
the minimum residual andweusethe notation
todenote its size. Notethat ifPLs is small, thenwecando agoodjob "predicting" b
byusingthecolumnsofA.
ThusfarwehavebeenassumingthatA E 1Rmxn hasfullcolumnrank,anassump­
tion that isdroppedin §5.5. However, evenif rank(A) = n, troublecanbeexpected if
A isnearlyrankdeficient. TheSVD can beusedtosubstantiatethisremark. If
n
A = UI:VT = LaiUiVr
i=l
isthe SVD ofafull rankmatrixA E 1Rmxn, then
n m
II Ax - b II� = II (UTAV)(VTx) - urb II� = L(aiYi - (u[b))2 + L (ufb)2
i=l i=n+I
where y = vrx. Itfollows that thissummationisminimized by settingYi = ufb/ai,
i = l:n. Thus,
and 2
P�s = L (u[b)2.
i=n+l
(5.3.2)
(5.3.3)
Itisclearthat thepresenceofsmallsingularvaluesmeansLS solutionsensitivity. The
effectofperturbationsontheminimumsumofsquaresislessclearandrequiresfurther
analysiswhichweofferbelow.
WhenassessingthequalityofacomputedLS solution±Ls,therearetwoimportant
issuestobearinmind:
• How small isf1,s = b - A±Ls compared torLs = b - AxLs?
262 Chapter 5. Orthogonalization and Least Squares
Therelativeimportanceofthesetwocriteriavariesfromapplicationtoapplication. In
anycaseit isimportantto understandhowXLs and rLs are affected by perturbations
inA and b. Ourintuitiontellsusthat ifthecolumnsofA arenearlydependent, then
thesequantitiesmaybequitesensitive. Forexample,suppose
� l·
b = [� l·
6b = [�l·
10-s 1 0
andthat XLs andXLs minimize II Ax - b lb and II (A + 6A)x - (b + 6b) 112, respectively.
IfrLs andf1.s arethecorrespondingminimumresiduals, then itcanbeshownthat
XLs = [ � ] , XLs = [ 999;. 104 ] , rLs = [� ], fLs = [-.999g· 10-2 ].
. 1 .9999 . 10°
Recall that the 2-norm condition ofarectangular matrix is the ratio ofits largest to
smallestsingularvalues. Sinceit2(A)= 106 wehave
and
II XLs - XLs 112 � _9999 . 104 < (A)2 II 6A 112 = 1012 . 10-s
II XLs 112
K2
II A 112
II fLs - rLs 112 _ 7070 . 10-2 < K (A) II 6A 112 = 106 . 10-s
II b 112
- . - 2
II A 112
.
The example suggests that the sensitivity ofXLs can depend upon it2(A)2. Below we
offeran LS perturbation theory that confirmsthepossibility.
5.3.2 The Method of Normal Equations
A widely-used method for solving the full-rank LS problem is the method of normal
equations.
Algorithm 5.3.1 (Normal Equations) GivenA E 1Rmxn withthepropertythatrank(A) =
nandb E 1Rm, this algorithmcomputesa vector XLs that minimizes II Ax - b lb·
Compute the lowertriangularportionofC = ATA.
Form the matrix-vectorproduct d = ATb.
Computethe Cholesky factorizationC = GGT.
Solve Gy = d andGTXLs =y.
This algorithm requires (m+ n/3)n2 fl.ops. The normal equation approach is conve­
nient because it relies on standard algorithms: Cholesky factorization, matrix-matrix
multiplication, andmatrix-vectormultiplication. Thecompressionofthem-by-ndata
matrixA intothe(typically) muchsmallern-by-ncross-productmatrixC isattractive.
Letusconsidertheaccuracyofthecomputednormalequationssolutioni:Ls· For
clarity, assume that no roundofferrors occur during the formation of C = ATA and
5.3. The Full-Rank Least Squares Problem 263
d = ATb. ItfollowsfromwhatweknowabouttheroundoffpropertiesoftheCholesky
factorization (§4.2.6) that
where
Thus, wecanexpect
II XLs - XLs 112 ,..., (ATA) _
(A)2
II II
,..., UK2 - UK2 ·
X1.s 2
(5.3.4)
In other words, theaccuracy ofthe computed normal equations solution depends on
thesquareofthecondition. SeeHigham(ASNA, §20.4)foradetailedroundoffanalysis
ofthenormalequations approach.
It should be noted that the formation ofATA can result in asignificant loss of
information. If
A = [Ju � ].
0 JU
then K2(A) � JU. However,
isexactlysingular. Thus, themethodofnormalequationscanbreakdownonmatrices
that arenot particularlyclosetobeingnumericallyrankdeficient.
5.3.3 LS Solution Via QR Factorization
Let A E Rmxn with m � n and b E Rm be given and suppose that an orthogonal
matrixQ E Rmxm has been computedsuchthat
n
m-n
is upper triangular. If
then
[ � ] n
m-n
II Ax- b II� = II QTAx- QTb II� = II Rix - c II� + II d II�
(5.3.5)
for any xE Rn. Since rank(A) = rank(R1) = n, it follows that xLs is defined by the
uppertriangularsystem
Notethat
PLs = II d 112·
264 Chapter 5. Orthogonalization and Least Squares
Weconcludethatthefull-rankLSproblemcanbereadilysolvedoncewehavecomputed
theQR factorizationofA. DetailsdependontheexactQR procedure. IfHouseholder
matricesareused and QT is applied infactoredformtob, thenwe obtain
Algorithm 5.3.2 (Householder LS Solution) If A E IRmxn has full column rank
and bE IRm, then the following algorithm computes a vector XLs E IRn such that
II AxLs - b112 is minimum.
UseAlgorithm 5.2.1 tooverwriteA withits QR factorization.
for j = l:n
v = [ A(j + �: m, j) ]
f3 = 2/vTv
b(j : m) = b(j : m) - fJ(vTb(j : m))v
end
Solve R(l : n, 1 : n) ·XLs = b(l:n) .
This method for solving the full-rank LS problem requires 2n2(m - n/3} flops. The
O(mn) flopsassociatedwiththeupdatingofbandtheO(n2) flopsassociatedwiththe
backsubstitutionarenot significant compared totheworkrequiredto factor A.
It canbeshownthat thecomputedi:Ls solves
where
and
minll (A + 8A)x - (b+ 8b} lb
II 8blb :::; (6m - 3n + 40} n u II b112 + O(u2).
(5.3.6}
(5.3.7}
(5.3.8}
TheseinequalitiesareestablishedinLawsonand Hanson (SLS, p. 90ff} andshowthat
i:Ls satisfies a "nearby" LS problem. (We cannot address the relative error in i:Ls
without an LS perturbation theory, to bediscussedshortly.) We mention that similar
resultsholdifGivens QR isused.
5.3.4 Breakdown in Near-Rank-Deficient Case
As withthe method ofnormal equations, the Householdermethod for solving the LS
problem breaks down in the back-substitution phase if rank(A) < n. Numerically,
trouble can be expected if K2(A) = K2(R) � 1/u. This is in contrast to the normal
equationsapproach,wherecompletionoftheCholeskyfactorizationbecomesproblem­
aticalonceK2(A) isintheneighborhoodof1/JU asweshowedabove. Hencetheclaim
inLawsonand Hanson (SLS, pp. 126-127} that forafixedmachineprecision, awider
classofLS problemscanbesolvedusingHouseholderorthogonalization.
5.3.5 A Note on the MGS Approach
In principle, MGS computes the thin QR factorization A = Q1R1. This is enough
to solve the full-rank LS problem because it transforms the normal equation system
5.3. The Full-Rank Least Squares Problem 265
(ATA)x = ATb to the upper triangular system Rix = Qfb. But an analysis ofthis
approachwhenQ[b isexplicit!�fo�medintroducesal'i:2(A)2 term. Thisisbecausethe
computedfactor Qi satisfies 11 QfQi - In 112 � Ul'i:2(A) aswementioned in §5.2.9.
However, ifMGSisappliedtotheaugmentedmatrix
then z = Qfb. ComputingQfb inthisfashionandsolvingRixLs = z producesanLS
solution ::l5 that is "just as good" as the Householder QRmethod. That is tosay, a
result ofthe form (5.3.6)-(5.3.8) applies. See Bjorckand Paige (1992).
It should be noted that the MGSmethodisslightlymoreexpensivethanHouse­
holderQRbecauseitalwaysmanipulatesm-vectorswhereasthelatterproceduredeals
withvectorsthatbecomeshorter inlengthasthealgorithmprogresses.
5.3.6 The Sensitivity of the LS Problem
Wenowdevelopaperturbationtheoryforthefull-rankLSproblemthat assistsinthe
comparison ofthe normal equations and QR approaches. LS sensitivity analysis has
along and fascinating history. Grear (2009, 2010) compares about a dozen different
results that have appeared in the literature over the decades and the theorem below
follows his analysis. It examines how the LS solution and its residual are affected by
changes inA and b and thereby sheds light on the conditionofthe LS problem. Four
factsabout A E llmxn areused inthe proof, whereit is assumed thatm > n:
1
1
II A(ATA)-iAT 112,
II I - A(ATA)-iAT 112,
1
an(A)
1
Theseequationsare easily verified using the SVD.
Theorem 5.3.1. Suppose that XLs, rL5, ±Ls, and fLs satisfy
II AxLs - b 112 min,
(5.3.9)
II (A + 8A)i:Ls - (b + 8b) 112 min, fLs = (b + 8b) - (A + 8A)xLs,
where A has rank n and II 8A 112 < an(A). Assume that b, rL5, and XLs are not zero.
Let (}Ls E (0, 7r/2) be defined by
If
and
{II 8A 112 II 8b 112 }
max
II A 112 ' lfblr;
II AxLs 1'2
O"n(A) ll XLs 112 '
(5.3.1.0)
266 Chapter 5. Orthogonalization and Least Squares
then
(5.3.11)
and
(5.3.12)
Proof. LetE andf bedefinedbyE = 8A/e andf= 8b/e. ByTheorem2.5.2wehave
rank(A + tE) = n for allt E [O, e]. It follows that the solutionx(t) to
(5.3.13)
iscontinuouslydifferentiableforallt E [O, e]. SinceXLs = x(O) andXLs = x(e), wehave
XLs = XLs + f.X(O) + O(e2).
Bytakingnormsanddividingby II XLs 112 weobtain
(5.3.14)
In order to bound II :i:(O) 112, wedifferentiate (5.3.13) and set t = 0 in theresult. This
gives
i.e.,
(5.3.15)
Using (5.3.9) andtheinequalities II f112 :::; II b 112 and II E 112 :::; II A 112, it follows that
II :i:(O) II
<
II (ATA)-1ATf 112 + II (ATA)-1ATExLs lb + II (ATA)-1ETrLs 112
:::; Jl!Jk_ + II A 11211 XLs 112 +
II A 11211 rLs 112
<1n(A) <1n(A) <1n(A)2
By substituting this into (5.3.14) weobtain
II XLs - XLs 112 <
E ( II b 112 + II A 112 + II A 11211 rLs 112 ) + O(e2)
II XLs 112 - <1n(A)ll XLs 112 <1n(A) <1n(A)211 XLs 112
. .
Inequality (5.3.11) followsfromthedefinitionsofii:2(A) and VLs and theidentities
(() ) II Tr,s 112
tan Ls =
II A II
·
X1.s 2
(5.3.16)
Theproofoftheresidualbound(5.3.12)issimilar. Definethedifferentiablevector
function r(t) by
r(t) = (b + ti) - (A + tE)x(t)
5.3. The Full-Rank Least Squares Problem
andobservethatrLs = r(O) andfLs = r(e). Thus,
From (5.3.15) wehave
IIfLs - rLs 112
II T1,s 112
r(O) = (I - A(ATA)-1AT) (! - ExLs) - A(ATA)-1ETrLs·
267
(5.3.17)
Bytakingnorms, using (5.3.9) andtheinequalities II f 112 � II b 112 and II E 112 � II A 112,
weobtain
II r(O) 112 � II b 112 + II A 11211XLs 112 + II A!:��)s 112
andthusfrom (5.3.17) wehave
II fLs - T1,s 112 <
II rLs 112 -
Theinequality (5.3.12) followsfromthedefinitionsofK2(A) and llLs and theidentities
(5.3.16). 0
It is instructive to identify conditions that turn the upper bound in (5.3.11) into a
bound that involves K2(A)2. The example in §5.3.1 suggests that this factor might
figure in the definitionofan LS condition number. However, thetheoremshowsthat
thesituationis moresubtle. Notethat
II AxLs 112 < II A 112 = K2(A).
O'n(A)ll X1,s 112 - O'n(A)
The SVD expansion (5.3.2) suggests that ifb hasamodest component in thedirection
oftheleftsingularvectorUn, then
Ifthis is the case and (}Ls is sufficiently bounded away from 7r/2, then the inequality
(5.3.11) essentiallysaysthat
II X1,s- X1,s 112 ( (A) P1,s (A)2)
II XLs 112
� € K2 +
ifblf;K2 . (5.3.18)
AlthoughthissimpleheuristicassessmentofLSsensitivityisalmostalwaysapplicable,
it important torememberthat thetrueconditionofaparticularLS problemdepends
on llLS• (}LS• and K2(A).
Regarding the perturbationoftheresidual, observe that the upper bound in the
residual result (5.3.12) is less than the upper bound in the solution result (5.3.11) by
afactor ofllLstan(OLs)· We also observe that if(}Ls is sufficiently bounded away from
both 0 and7r/2, then (5.3.12) essentiallysaysthat
II fLs - rLs 112 (A) (53 19)
II T1,s 112
� € • K2 . . .
For more insights into the subtleties behind Theorem 5.3.1., see Wedin (1973), Van­
dersluis (1975), Bjorck (NMLS, p. 30), Higham (ASNA, p. 382), and Grcar(2010).
268 Chapter 5. Orthogonalization and Least Squares
5.3.7 Normal Equations Versus QR
It is instructive to comparethe normal equation and QR approaches to the full-rank
LS probleminlightofTheorem5.3.1.
• The method ofnormal equations produces an :i:Ls whose relative error depends
on ,.,;2(A)2, a factor that can be considerably largerthan the condition number
associated with a "smallresidual" LS problem.
• The QR approach (Householder, Givens, careful MGS) solvesanearbyLS prob­
lem. Therefore, these methods produce acomputed solution with relative error
that is "predicted" bythe condition oftheunderlyingLSproblem.
Thus, the QR approachismoreappealinginsituationswherebis close to the spanof
A's columns.
Finally,wementiontwootherfactorsthatfigureinthedebateabout QR versus
normal equations. First, the normal equations approach involves about half of the
arithmeticwhenm » n anddoesnot requireasmuchstorage, assumingthatQ(:, l:n)
is required. Second, QR approaches are applicable to a wider class ofLS problems.
This is because the Cholesky solve in the methodofnormal equations is "in trouble"
if,.,;2(A) � 1/JU whiletheR-solvestepinaQR approachisintroubleonlyif,.,;2(A) �
1/u. Choosingthe "right" algorithmrequireshavinganappreciationforthesetradeoffs.
5.3.8 Iterative Improvement
AtechniqueforrefininganapproximateLSsolutionhasbeenanalyzedbyBjorck(1967,
1968). It isbasedontheideathat if
(5.3.20)
then II b-Ax112 = min. Thisfollowsbecauser +Ax=bandATr = 0 implyATAx=
Arb. The above augmented system is nonsingular ifrank(A) = n, which we hereafter
assume. BycastingtheLS problem intheformofasquare linearsystem,theiterative
improvementscheme §3.5.3 canbeapplied:
r<0l =0, x<0l =0
for k = 0, 1, . . .
[�;:� l = [ � ] -
[
A: � l [:::�l
end
[A: � l [p(k)
l
z(k)
[r<k+t) l =
x<k+t) [r(k)
l + [p(k)
l
x(k) z(k)
5.3. The Full-Rank Least Squares Problem 269
TheresidualsJ(k) andg(k) mustbecomputedinhigherprecision, andanoriginalcopy
ofA must be around for thispurpose.
If the QR factorization of A is available, then the solution of the augmented
system is readily obtained. In particular, if A = QR and Ri = R(l:n, l:n), then a
system ofthe form
[A; � l [:l [� l
transformsto
[l !�-· 1·][�
l
� [�l
where
[ Ji ] n
h m-n [ � ] n
m-n
Thus, p and z can be determined by solving the triangular systems Rfh = g and
R1z = Ji -handsetting
Assumingthat Q isstored infactoredform,eachiterationrequires8mn-2n2 flops.
Thekeyto the iteration'ssuccessis that both the LS residual and solution are
updated-notjust the solution. Bjorck (1968) shows that if 11:2(A) � f3q and t-digit,
{3-base arithmetic is used, then x(k) has approximately k(t-q) correct base-/3 digits,
provided the residuals arecomputedindoubleprecision. Notice that it is 11:2(A), not
11:2(A)2, that appearsinthis heuristic.
5.3.9 Some Point/Line/Plane Nearness Problems in 3-Space
Thefieldsofcomputergraphicsandcomputervisionarerepletewithmanyinteresting
matrix problems. Below we pose three geometric "nearness" problems that involve
points, lines, andplanes in3-space. Eachisahighly structured least squaresproblem
withasimple,closed-formsolution. Theunderlyingtrigonometryleadsrathernaturally
tothevectorcrossproduct,sowestartwithaquickreviewofthisimportantoperation.
The cross product ofavectorp E 1R3 withavectorq E 1R3 isdefinedby
[P2Q3 - p3q2 l
p x q = p3q1 - Plq3 ·
P1Q2 - P2Q1
Thisoperation canbeframed as amatrix-vector product. Foranyv E 1R3, define the
skew-symmetric matrix vc by
270 Chapter 5. Orthogonalization and Least Squares
It follows that
Using theskew-symmetryofpc and qc, it iseasytoshowthat
p x q E span{p , q}l..
Otherpropertiesinclude
(p x q)T(r x s) = (pcq)T· (rcs) = det((p qf[r s]),
PcPc = PPT - II P ll� ·la,
II pcq II� = II P II�· IIq II�·
(1 -
(II�I���q 112 )
2
)·
(5.3.21)
(5.3.23)
(5.3.24)
(5.3.25)
Wearenowsettostatethethreeproblemsandspecifytheirtheoretical solutions.
For hints at howtoestablishthecorrectnessofthe solutions, seeP5.3.13-P5.3.15.
Problem 1. GivenalineL and apoint y, findthepoint z0Pt on Lthat isclosesttoy,
i.e., solve
min IIz - y 112•
zE L
IfLpasses throughdistinctpointsPl andP2, then it can beshownthat
v = P2 - p1. (5.3.26)
Problem 2. Given lines L1 and L2,findthepoint z?t onL1 thatisclosesttoL2 and
the point z�pt on L2 that is closest toLi, i.e., solve
min II z1 - z2 112·
z1 E L1 , z2 EL2
IfL1 passesthroughdistinctpointsPl andP2 andL2 passesthroughdistinctpointsqi
andq2, then itcanbeshownthat
opt 1
T c( )
z1 = PI + 1' · vw · r qi - P1 ,
r r
opt 1
T c( )
Z2 = qi + -· WV ·
T ql - Pl ,
rTr
where v = P2 - pi, w = q2 - qi, andr = vcw.
(5.3.27)
(5.3.28)
Problem 3. Givenaplane P andapoint y, findthepoint z0Pt on P that isclosestto
y, i.e., solve
min IIz - y 112•
zEP
5.3. The Full-Rank Least Squares Problem 271
If P passesthroughthreedistinctpointsp1 , p2, andp3, then itcanbeshownthat
opt 1 c c ( )
z = P1 - -- · v v y - P1
vTv
wherev = (p2 - P1 )c (p3 - P1).
(5.3.29)
Thenice closed-formsolutions (5.3.26)-(5.3.29) are deceptivelysimpleandgreat care
must beexercisedwhen computing with theseformulaeortheirmathematicalequiva­
lents. SeeKahan (2011).
Problems
P5.3.l Assume ATAx = Arb, (ATA + F)x = Arb, and 211 F 112 � an(A)2. Show that if r = b - Ax
and f = b - Ax, then f - r = A(ATA + F)-1Fx and
P5.3.2 Assume that ATAx = ATb and that ATAx = Arb + f where II f 1'2 � cull AT 11211 b 1'2 and A
has full column rank. Show that
P5.3.3 Let A E Rmxn (m 2 n), w E Rn, and define
Show that an(B) 2 a,.(A) and a1 (B) � Jll A II� + I w II�· Thus, the condition of a matrix may
increase or decrease if a row is added.
P5.3.4 (Cline 1973) Suppose that A E Rmxn has rank n and that Gaussian elimination with partial
pivoting is used to compute the factorization PA = LU, where L E R'nxn is unit lower triangular,
U E Rnxn is upper triangular, and P E Rmxm is a permutation. Explain how the decomposition in
P5.2.5 can be used to find a vector z E Rn such that II Lz - Pb 112 is minimized. Show that if Ux = z,
then II Ax - b '2 is minimum. Show that this method of solving the LS problem is more efficient than
Householder QR from the flop point of view whenever m � 5n/3.
P5.3.5 The matrix C = (ATA)-1 , where rank(A) = n, arises in many statistical applications. Assume
that the factorization A = QR is available. (a) Show C = (RTR)-1. (b) Give an algorithm for
computing the diagonal of C that requires n3/3 flops. (c) Show that
R = [ � v; ] � C = (RTR)-1 = [ (1 +���f�/cx2 -v7g1/cx ]
where C1 = (srs)-1 . (d) Using (c), give an algorithm that overwrites the upper triangular portion
of R with the upper triangular portion of C. Your algorithm should require 2n3/3 flops.
P5.3.6 Suppose A E Rnxn is symmetric and that r = b - Ax where r, b, x E Rn and x is nonzero.
Show how to compute a symmetric E E Rnxn with minimal Frobenius norm so that (A + E)x = b.
Hint: Use the QR factorization of [ x I r ] and note that Ex = r � (QTEQ)(QTx) = QTr.
P5.3.7 Points P1 , . . . , Pn on the x-axis have x-coordinates x1, . . . , Xn. We know that x1 = 0 and wish
to compute x2, . . . , Xn given that we have estimates dij of the separations:
1 � i < j � n.
Using the method of normal equations, show how to minimize
n-1 n
¢(x1, . . . , xn) = L L (xi - Xj - d;j)2
i=l j=i+l
272 Chapter 5. Orthogonalization and Least Squares
subject to the constraint x1 = 0.
PS.3.8 Suppose A E Rmxn has full rank and that b E Rm and c E Rn are given. Show how to compute
a = cTXLs without computing XLS explicitly. Hint: Suppose Z is a Householder matrix such that
zTc is a multiple of In(:, n). It follows that a = (ZTc)TYLs where YLs minimizes II Ay -b 1'2 with
y = zTx and A = AZ.
PS.3.9 Suppose A E R"'xn and b E Rm with m 2'. n . How would you solve the full rank least squares
problem given the availability of a matrix !vi E Rmxm such that MTA = S is upper triangular and
MTM = D is diagonal?
PS.3.10 Let A E Rmxn have rank n and for a 2'. 0 define
Show that
[ aim
M(a) =
AT
CTm+n(M(a)) = min {a , -�+ crn(A)2 + (�)2
}
and determine the value of a that minimizes �2(M(a)).
P5.3.ll Another iterative improvement method for LS problems is the following:
x(O) = 0
for k = 0, 1, ...
end
r(k) = b - Ax(k) (double precision)
II Az(k) - rCk) 112 = min
x<k+l) =
x(k) + z(k)
(a) Assuming that the QR factorization of A is available, how many flops per iteration are required?
(b) Show that the above iteration results by setting g(k) = 0 in the iterative improvement scheme
given in §5.3.8.
P5.3.12 Verify (5.3.21)-(5.3.25).
P5.3.13 Verify (5.3.26) noting that L = {Pl + T(p2 -PI) : T E R }.
P5.3.14 Verify (5.3.27) noting that the minimizer Topt E R2 of II (p1 - q1) - [ p2 -PI I Q2 -QI ]T 1'2
is relevant.
P5.3.15 Verify (5.3.29) noting that P = { x : xT((p2 - p1) x (p3 -P1)) = 0.
Notes and References for §5.3
Some classical references for the least squares problem include:
F.L. Bauer (1965). "Elimination with Weighted Row Combinations for Solving Linear Equations and
Least Squares Problems,'' Numer. Math. 7, 338-352.
G.H. Golub and J.H. Wilkinson (1966). "Note on the Iterative Refinement of Least Squares Solution,''
Numer. Math. 9, 139-148.
A. van der Sluis (1975). "Stability of the Solutions of Linear Least Squares Problem," Numer. Math.
29, 241-254.
The use of Gauss transformations to solve the LS problem has attracted some attention because they
are cheaper to use than Householder or Givens matrices, see:
G. Peters and J.H. Wilkinson (1970). "The Least Squares Problem and Pseudo-Inverses,'' Comput.
J. 19, 309-16.
A.K. Cline (1973). "An Elimination Method for the Solution of Linear Least Squares Problems,''
SIAM J. Numer. Anal. 1 0, 283-289.
R.J. Plemmons (1974). "Linear Least Squares by Elimination and MGS," J. ACM 21, 581-585.
5.3. The Full-Rank Least Squares Problem 273
The seminormal equations are given by RTRx = ATb where A = QR. It can be shown that by solving
the serninormal equations an acceptable LS solution is obtained if one step of fixed precision iterative
improvement is performed, see:
A. Bjorck (1987). "Stability Analysis of the Method of Seminormal Equations," Lin. Alg. Applic.
88/89, 31-48.
Survey treatments of LS perturbation theory include Lawson and Hanson (SLS), Stewart and Sun
(MPT), and Bjorck (NMLS). See also:
P.-A. Wedin (1973). "Perturbation Theory for Pseudoinverses," BIT 13, 217-232.
A. Bjorck (1991). "Component-wise Perturbation Analysis and Error Bounds for "Linear Least Squares
Solutions," BIT 31, 238-244.
B. Walden, R. Karlson, J. Sun (1995). "Optimal Backward Perturbation Bounds for the Linear Least
Squares Problem," Numerical Lin. Alg. Applic. 2, 271-286.
J.-G. Sun (1996). "Optimal Backward Perturbation Bounds for the Linear Least-Squares Problem
with Multiple Right-Hand Sides," IMA J. Numer. Anal. 16, 1-11.
J.-G. Sun (1997). "On Optimal Backward Perturbation Bounds for the Linear Least Squares Problem,"
BIT 37, 179-188.
R. Karlson and B. Walden (1997). "Estimation of Optimal Backward Perturbation Bounds for the
Linear Least Squares Problem," BIT 37, 862-869.
J.-G. Sun (1997). "On Optimal Backward Perturbation Bounds for the Linear Least Squares Problem,"
BIT 37, 179-188.
M. Gu (1998). "Backward Perturbation Bounds for Linear Least Squares Problems," SIAM J. Matrix
Anal. Applic. 20, 363-372.
M. Arioli, M. Baboulin and S. Gratton (2007). "A Partial Condition Number for Linear Least Squares
Problems," SIAM J. Matrix Anal. Applic. 29, 413 433.
M. Baboulin, J. Dongarra, S. Gratton, and J. Langon (2009). "Computing the Conditioning of the
Components of a Linear Least-Squares Solution," Num. Lin. Alg. Applic. 16, 517-533.
M. Baboulin and S. Gratton (2009). "Using Dual Techniques to Derive Componentwise and Mixed
Condition Numbers for a Linear Function of a Least Squares Solution," BIT 49, 3-19.
J. Grear (2009). "Nuclear Norms of Rank-2 Matrices for Spectral Condition Numbers of Rank Least
Squares Solutions," ArXiv:l003.2733v4.
J. Grear (2010). "Spectral Condition Numbers of Orthogonal Projections and Full Rank Linear Least
Squares Residuals," SIAM J. Matri.1: Anal. Applic. 31, 2934-2949.
Practical insights into the accuracy of a computed least squares solution can be obtained by applying
the condition estimation ideas of §3.5. to the R matrix in A = QR or the Cholesky factor of ATA
should a normal equation approach be used. For a discussion of LS-specific condition estimation, see:
G.W. Stewart (1980). "The Efficient Generation of Random Orthogonal Matrices with an Application
to Condition Estimators," SIAM J. Numer. Anal. 1 7, 403-9.
S. Gratton (1996). "On the Condition Number of Linear Least Squares Problems in a Weighted
Frobenius Norm," BIT 36, 523-530.
C.S. Kenney, A. .J. Laub, and M.S. Reese (1998). "Statistical Condition Estimation for Linear Least
Squares," SIAM J. Matrix Anal. Applic. 19, 906-923.
Our restriction to least squares approximation is not a vote against minimization in other norms.
There are occasions when it is advisable to minimize II Ax - b llP for p = 1 and oo. Some algorithms
for doing this are described in:
A.K. Cline (1976). "A Descent Method for the Uniform Solution to Overdetermined Systems of
Equations," SIAM J. Numer. Anal. 13, 293-309.
R.H. Bartels, A.R. Conn, and C. Charalambous (1978). "On Cline's Direct Method for Solving
Overdetermined Linear Systems in the L00 Sense," SIAM J. Numer. Anal. 15, 255-270.
T.F. Coleman and Y. Li (1992). "A Globally and Quadratically Convergent Affine Scaling Method
for Linear Li Problems," Mathematical Programming 56, Series A, 189-222.
Y. Li (1993). "A Globally Convergent Method for Lp Problems," SIAM J. Optim. 3, 609-629.
Y. Zhang (1993). ·'A Primal-Dual Interior Point Approach for Computing the Li and L00 Solutions
of Overdeterrnirwd Linear Systems," J. Optim. Theory Applic. 77, 323-341.
Iterative improvement in the least squares context is discussed in:
274 Chapter 5. Orthogonalization and Least Squares
G.H. Golub and J.H. Wilkinson (1966). "Note on Iterative Refinement of Least Squares Solutions,"
Numer. Math. 9, 139-148.
A. Bjorck and G.H. Golub (1967). "Iterative Refinement of Linear Least Squares Solutions by House-
holder Transformation," BIT 7, 322-337.
A. Bjorck (1967). "Iterative Refinement of Linear Least Squares Solutions I," BIT 7, 257--278.
A. Bjorck (1968). "Iterative Refinement of Linear Least Squares Solutions II," BIT 8, 8-30.
J. Gluchowska and A. Smoktunowicz (1999). "Solving the Linear Least Squares Problem with Very
High Relative Acuracy," Computing 45, 345-354.
J. Demmel, Y. Hida, and E.J. Riedy (2009). "Extra-Precise Iterative Refinement for Overdetermined
Least Squares Problems," ACM Trans. Math. Softw. 35, Article 28.
The following texts treat various geometric matrix problems that arise in computer graphics and
vision:
A.S. Glassner (1989). An Introduction to Ray Tracing, Morgan Kaufmann, Burlington, MA.
R. Hartley and A. Zisserman (2004). Multiple View Geometry in Computer Vision, Second Edition,
Cambridge University Press, New York.
M. Pharr and M. Humphreys (2010). Physically Based Rendering, from Theory to Implementation,
Second Edition, Morgan Kaufmann, Burlington, MA.
For a numerical perspective, see:
W. Kahan (2008). "Computing Cross-Products and Rotations in 2- and 3-dimensional Euclidean
Spaces," http://www.cs.berkeley.edu/ wkahan/MathHl 10/Cross.pdf.
5.4 Other Orthogonal Factorizations
SupposeA E 1R.mx4 hasathin QRfactorizationofthefollowingform:
A
Note that ran(A) hasdimension3 but does not equal span{q1, Q2, q3}, span{q1, Q2, q4},
span{Q1,q3, q4}, orspan{Q2, q3,q4} becausea4 docsnotbelongtoanyofthesesubspaces.
Inthiscase, theQRfactorizationrevealsneithertherangenorthenullspaceofA and
the number of nonzeros on R's diagonal does not equal its rank. Moreover, the LS
solutionprocessbasedontheQRfactorization (Algorithm5.3.2) breaksdownbecause
the upper triangular portion ofRissingular.
Westart thissectionby introducingseveraldecompositions that overcomethese
shortcomings. They all have the form QTAZ = T where T is a structured block
triangular matrix that sheds light on A's rank, range, and nullspacc. We informally
refertomatrixreductionsofthisformasrank revealing. SeeChandrasckarenandIpsen
(1994) foramorepreciseformulationoftheconcept.
Our focus is on a modification of the QR factorization that involves column
pivoting. The resulting R-matrix has a structure that supports rank estimation. To
set thestage for updatingmethods, we briefly discus the ULV and UTV frameworks
Updatingisdiscussedin§6.5 andreferstotheefficientrecomputationofafactorization
afterthematrixundergoesalow-rankchange.
AllthesemethodscanberegardedasinexpensivealternativestotheSVD,which
represents the "gold standard" in the area of rank determination. Nothing "takes
apart" a matrix so conclusively as the SVD and so we include an explanation ofits
airtightreliability. Thecomputationofthefull SVD, whichwediscussin §8.6, begins
5.4. Other Orthogonal Factorizations 275
with the reduction to bidiagonal form using Householder matrices. Because this de­
composition is important in its own right, we provide some details at the end ofthis
section.
5.4.1 Numerical Rank and the SVD
Suppose A E 1Rmxn has SVD urAV = E = diag(ai)· If rank(A) = r < n, then
according to the exact arithmetic discussion of §2.4 the singular values ar+1,.. . , an
arezero and
r
A = L, akukvf. (5.4.1)
i=l
Theexposureofrankdegeneracycouldnot bemoreclear.
InChapter8 wedescribetheGolub-Kahan-Reinschalgorithmforcomputingthe
SVD. Properlyimplemented, itproducesnearlyorthogonalmatricesfJ and V sothat
� 1' � �
U AV � E = diag(&1,..., &n),
(Other SVD procedures have this property as well.) Unfortunately, unless remark­
ablecancellationoccurs, none ofthecomputedsingular values will bezero because of
roundofferror. Thisforcesanissue. Ontheonehand,wecanadheretothestrictmath­
ematicaldefinitionofrank,countthenumberofnonzerocomputedsingularvalues,and
concludefrom
n
A � L_ &kfh1ff; (5.4.2)
i=l
that A has full rank. However, working with every matrix as ifit possessed full col­
umn rank is not particularly useful. It is more productive to liberalize the notion of
rankby setting small computed singular values to zero in (5.4.2). This results in an
approximationoftheform
r
A � L,&kukfi[, (5.4.3)
i=l
whereweregardr as the numerical rank. Forthisapproachtomakesenseweneed to
guaranteethat lai -ail issmall.
For a properly implemented Golub-Kahan-Reinsch SVD algorithm, it can be
shown that
u
V Z + b.V, zrz = In ,
E = wr(A + b.A)Z,
II b.U 112 < f,
II b.V 112 < f,
II b.A 112 < €11 A IJ2,
(5.4.4)
where «: is a small multiple of u, the machine precision. In other words, the SVD
algorithmcomputesthesingularvaluesofanearbymatrixA + b.A.
276 Chapter 5. Orthogonalization and Least Squares
Note that fJ and V are not necessarily close to their exact counterparts. However,
we can show that ch is close to CTk as follows. Using Corollary 2.4.6 we have
erk = min II A -
B 112 min II (E - B) - E 112
where
and
Since
and
it follows that
rank(B)=k- 1 rank( B)=k- 1
II E -
B II - II E II < II E - B II < II E - B II + II E II
min II Ek - B 112 ak ,
rank(B)=k - 1
Jerk - ihl ::; E a1
for k = 1: n . Thus, if A has rank r, then we can expect n - r of the computed singular
values to be small. Near rank deficiency in A cannot escape detection if the SVD of A
is computed.
Of course, all this hinges on having a definition of "small." This amounts to
choosing a tolerance o > 0 and declaring A to have numerical rank r if the computed
singular values satisfy
(5.4.5)
We refer to the integer f as the o-rank ofA. The tolerance should be consistent with the
machine precision, e.g., o = u ll A II=· However, if the general level of relative error in
the data is larger than u, then o should be correspondingly bigger, e.g., o = 10-211 A II=
if the entries in A are correct to two digits.
For a given o it is important to stress that, although the SVD provides a great deal
of rank-related insight, it does not change the fact that the determination of numerical
rank is a sensitive computation. If the gap between &;: and &;:+l is small, then A is
also close (in the o sense) to a matrix with rank r - 1. Thus, the amount of confidence
We have in the correctness of T and in how WC proceed to USC the approximation (5.4.2)
depends on the gap between &; and &;+1.
5.4.2 QR with Column Pivoting
We now examine alternative rank-revealing strategies to the SVD starting with a mod­
ification of the Householder QR factorization procedure (Algorithm 5.2.1). In exact
arithmetic, the modified algorithm computes the factorization
r n - r
(5.4.6)
5.4. Other Orthogonal Factorizations 277
where r = rank(A), Q is orthogonal, R1 1 is upper triangular and nonsingular, and
II is a permutation. If we have the column partitionings AII = [ ac1 I · · · I ac., ] and
Q = [ Q1 I · · · I qm ] , then for k = l:n we have
implying
min{r,k}
ack = L Tikqi E span{q1, . . . , qr}
·i=l
ran(A) = span{qi, . . . , qr}.
To see how to compute such a factorization, assume for some k that we have
computed Householder matrices H1, . . . , Hk-1 and permutations II1, . . . , IIk-1 such
that
[ R��-l) R��-l) ] k-1
O R��-i) m-k+i
k-1 n-k+l
where Ri�-l) is a nonsingular and upper triangular matrix. Now suppose that
Rck-1) _
[ (k-1) I I (k-1) l
22 - Zk · · · Zn
is a column partitioning and let p � k be the smallest index such that
II (k-l) II {11 (k-l) II II (k-i) II }
Zp 2 = rnax zk 2 , • • •
, Zn 2 •
(5.4.7)
(5.4.8)
Note that if rank(A) = k- 1, then this maximum is zero and we are finished. Otherwise,
let Ilk be the n-by-n identity with columns p and k interchanged and determine a
Householder matrix Hk such that if
then R(k)(k + l:m, k) = 0. In other words, Ilk moves the largest column in R��-l} to
the lead position and Hk zeroes all of its subdiagonal components.
The column norms do not have to be recomputed at each stage if we exploit the
property
T [ O: ] 1
Q z -
w s-1
II w II� = II z II� - 0:2,
which holds for any orthogonal matrix Q E 1Rsxs. This reduces the overhead associated
with column pivoting from O(mn2) flops to O(mn) flops because we can get the new
column norms by updating the old column norms, e.g.,
II (k) 112 - II (k-1) 112 2
zi 2 - zi 2
- rki j = k + l:n.
Combining all ofthe above we obtain the following algorithm first presented by Businger
and Golub (1965):
278 Chapter 5. Orthogonalization and Least Squares
Algorithm 5.4.1 (Householder QR With Column Pivoting) Given A E nrxn with
m 2: n, the following algorithm computes r = rank(A) and the factorization (5.4.6)
with Q = H1 · · · Hr and II = II1 · · · IIr. The upper triangular part of A is overwritten
by the upper triangular part of R and components j + l:m of the jth Householder
vector are stored in A(j + l:m,j). The permutation II is encoded in an integer vector
piv. In particular, IIj is the identity with rows j and piv(j) interchanged.
for j = l:n
c(j) = A(l:m,j)TA(l:m,j)
end
r = 0
T = max{c(l), . . . , c(n)}
while T > 0 and r < n
r = r + l
Find smallest k with r :S k :S n so c(k) = T.
piv(r) = k
A(l:m, r) ++ A(l:m, k)
c(r) ++ c(k)
[v, ,B] = house(A(r:m, r))
A(r:m, r:n) = (Im-r+l - ,BvvT)A(:r:m, r:n)
A(r + l:m, r) = v(2:m - r + 1)
for i = r + l:n
c(i) = c(i) - A(r, i)2
end
T = max{c(r + 1), . . . , c(n)}
end
This algorithm requires 4mnr - 2r2(m + n) + 4r3/3 flops where r = rank(A).
5.4.3 Numerical Rank and AII = QR
In principle, QR with column pivoting reveals rank. But how informative is the method
in the context of floating point arithmetic? After k steps we have
[ R
�(k) �(k)
11 R12 ] k
0 �(k) m-k
R22
k n-k
(5.4.9)
If fiW is suitably small in norm, then it is reasonable to terminate the reduction and
declare A to have rank k. A typical termination criteria might be
5.4. Other Orthogonal Factorizations 279
for some small machine-dependent parameter €1. In view of the roundoff properties
associated with Householder matrix computation (cf. §5.1.12), we know that R(k) is
the exact R-factor of a matrix A + Ek, where
€2 = O(u).
Using Corollary 2.4.4 we have
O'k+l(A + Ek) = O'k+l(R(k)) � II Ei�) 112 .
Since O'k+i(A) � O"k+l (A + Ek) + 11 Ek 112, it follows that
In other words, a relative perturbation of 0(€1 + €2) in A can yield a rank-k matrix.
With this termination criterion, we conclude that QR with column pivoting discovers
rank deficiency if R��) is small for some k < n. However, it does not follow that the
matrix R��) in (5.4.9) is small if rank(A) = k. There are examples of nearly rank
deficient matrices whose R-factor look perfectly "normal." A famous example is the
Kahan matrix
1 -c -c -c
0 1 -c -c
Kahn(s) diag(l, s, . . . ' sn-l)
1 -c
0 1
Here, c2+s2 = 1 with c, s > 0. (See Lawson and Hanson (SLS, p. 31).) These matrices
are unaltered by Algorithm 5.4.1 and thus II ��) 112 � sn-l for k = l:n - 1 . This
inequality implies (for example) that the matrix Kah300(.99) has no particularly small
trailing principal submatrix since s299 � .05. However, a calculation shows that 0'300
= 0(10-19).
Nevertheless, in practice, small trailing R-suhmatrices almost always emerge that
correlate well with the underlying rank. In other words, it is almost always the case
that R��) is small if A has rank k.
5.4.4 Finding a Good Column Ordering
It is important to appreciate that Algorithm 5.4.1 is just one way to determine the
column pemmtation II. The following result sets the stage for a better way.
Theorem 5.4.1. If A E Rmxn and v E Rn is a unit 2-norm vector, then there exists
a permutation II so that the QR factorization
AII = QR
satisfies lrnnl < ..fii,a where a = 11 Av 112.
280 Chapter 5. Orthogonalization and Least Squares
Proof. Suppose II E JR.nxn is a permutation such that if w = IITv, then
lwnl = max lvil·
Since Wn is the largest component of a unit 2-norm vector, lwnl � 1/.Jii,. If AII = QR
is a QR factorization, then
u = II Av 112 = II (QTAII)(IITv) 112 = II R(l:n, l:n)w 112 � lrnnWnl � lrnnl/.Jii,. D
Note that if v = Vn is the right singular vector corresponding to Umin (A), then lrnnl ::;
.Jii,an. This suggests a framework whereby the column permutation matrix II is based
on an estimate of Vn:
Step 1. Compute the QR factorization A = Qollo and note that Ro has the
same right singular vectors as A.
Step 2. Use condition estimation techniques to obtain a unit vector v with
II flov 112 :::::: Un.
Step 3. Determine II and the QR factorization AII = QR.
See Chan (1987) for details about this approach to rank determination. The permu­
tation II can be generated as a sequence of swap permutations. This supports a very
economical Givens rotation method for generating of Q and R from Qo and Ro.
5.4.5 More General Rank-Revealing Decompositions
Additional rank-revealing strategies emerge if we allow general orthogonal recombina­
tions ofthe A's columns instead ofjust permutations. That is, we look for an orthogonal
Z so that the QR factorization
AZ = QR
produces a rank-revealing R. To impart the spirit of this type of matrix reduction,
we show how the rank-revealing properties of a given AZ = QR factorization can be
improved by replacing Z, Q, and R with
respectively, where Qa and Za are products of Givens rotations and Rnew is upper
triangular. The rotations are generated by introducing zeros into a unit 2-norm n­
vector v which we assume approximates the n-th right singular vector of AZ. In
particular, if Z'{;v = en = In(:, n) and 11 Rv 112 :::::: Un, then
II Rnew€n 112 = II Q'[;RZaen 112 = II Q�Rv 112 = II Rv 112 :::::: Un
This says that the norm of the last column of Rnew is approximately the smallest
singular value of A, which is certainly one way to reveal the underlying matrix rank.
We use the case n = 4 to illustrate how the Givens rotations arise and why the
overall process is economical. Because we are transforming v to en and not e1, we
need to "flip" the mission of the 2-by-2 rotations in the Za computations so that top
components are zeroed, i.e.,
[ � l =
[_: : l [ : l·
5.4. Other Orthogonal Factorizations
This requires only a slight modification of Algorithm 5.1.3.
In the n = 4 case we start with
and proceed to compute
and
x
x
0
0
x
x
x
0 �l v
[n
281
as products of Givens rotations. The first step is to zero the top component of v with
a "flipped" (1,2) rotation and update R accordingly:
x
x
0
0
x
x
x
0 rn
To remove the unwanted subdiagonal in R, we apply a conventional (nonflipped) Givens
rotation from the left to R (but not v):
The next step is analogous:
[�
And finally,
[�
x
x
0
0
x
x
x
0
x
x
0
0
x
x
0
0
x
x
x
0
x
x
x
0
x
x
x
0
x
x
x
x
v
v
[�l
[�l
v
[H
[H
282 Chapter 5. Orthogonalization and Least Squares
=
[� � � �1
0 0 x x '
0 0 0 x
v
[H
The pattern is clear, for i = l:n - 1, a Gi,i+l is used to zero the current Vi and an
Hi,i+l is used to zero the current ri+l,i· The overall transition from {Q, Z, R} to
{Qnew, Znew, Rnew} involves O(mn) flops. If the Givens rotations are kept in factored
form, this flop count is reduced to O(n2). We mention that the ideas in this subsection
can be iterated to develop matrix reductions that expose the structure of matrices
whose rank is less than n - 1. "Zero-chasing" with Givens rotations is at the heart of
many important matrix algorithms; see §6.3, §7.5, and §8.3.
5.4.6 The UTV Framework
As mentioned at the start of this section, we are interested in factorizations that are
cheaper than the SVD but which provide the same high quality information about rank,
range, and nullspace. Factorizations of this type are referred to as UTV factorizations
where the "T" stands for triangular and the "U" and "V" remind us of the SVD and
orthogonal U and V matrices of singular vectors.
The matrix T can be upper triangular (these are the URV factorizations) or
lower triangular (these are the ULV factorizations). It turns out that in a particular
application one may favor a URV approach over a ULV approach, see §6.3. More­
over, the two reductions have different approximation properties. For example, sup­
pose <7k(A) > O"k+l(A) and S is the subspace spanned by A's right singular vectors
Vk+i,. . . , vn. Think of S as an approximate nullspace of A. Following Stewart (1993),
if
UTAV =R = [ �1
�:: ]m�k
k n-k
and V = [ Vi I V2 ] is partitioned conformably, then
where
. ( ( ) S) II R12 112
dist ran l'2 , $
(l _
2 ) . (R )
Pn <1mm 11
II R22 112
pn =
<1min(R11)
is assumed to be less than 1. On the other hand, in the ULV setting we have
UTAV =L = [ Lu 0 ] k
L21 L22 m-k
k n-k
(5.4.10)
5.4. Other Orthogonal Factorizations
If V = [ Vi I V2 ] is partitioned conformably, then
where
dist(ran(V2), S) � II £12 112
PL (1 - PDCTmin(Lu)
II L22 ll2
PL =
CTmin(Lu)
283
(5.4.11)
is also assumed to be less than 1. However, in practice the p-factors in both (5.4.10)
and (5.4.11) are often much less than 1. Observe that when this is the case, the upper
bound in (5.4.11) is much smaller than the upper bound in (5.4.10).
5.4.7 Complete Orthogonal Decompositions
Related to the UTV framework is the idea of a complete orthogonal factorization. Here
we compute orthogonal U and V such that
UTAV =
[ Tu 0
0
] r
0 m-r
r n-r
(5.4.12)
where r
=
rank(A). The SVD is obviously an example of a decomposition that has
this structure. However, a cheaper, two-step QR process is also possible. We first use
Algorithm 5.4.1 to compute
r n-r
and then follow up with a second QR factorization
via Algorithm 5.2.1. If we set V = IIQ, then (5.4.12) is realized with Tu =
S'{. Note
that two important subspaces are defined by selected columns of U = [ u1 I · · ·
I Um ]
and V =
[ V1 I · · · I Vn ] :
ran(A) =
span{u1, . . . , Ur },
null(A) = span{Vr+ 1 , . . . , Vn }·
Of course, the computation of a complete orthogonal decomposition in practice would
require the careful handling of numerical rank.
284 Chapter 5. Orthogonalization and Least Squares
5.4.B Bidiagonalization
There is one other two-sided orthogonal factorization that is important to discuss and
that is the bidiagonal factorization. It is not a rank-revealing factorization per se, but
it has a useful role to play because it rivals the SVD in terms of data compression.
Suppose A E 1Rmxn and m � n. The idea is to compute orthogonal Un (m-by-m)
and V8 (n-by-n) such that
d1 Ji 0 0
0 d2 h 0
U'{;AVB 0 dn-l fn-1 (5.4.13)
0 0 dn
0
Un = U1 · · · Un and V8 = Vi · · · Vn-2 can each be determined as a product of House­
holder matrices, e.g.,
x
0
0
0
0
[�
x
0
0
0
0
x 0 0
x x 0
0 x x
0 x x
0 x x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
0
x
x
x
x
x
x
x
x
x
�I
x
0
� o0
0
x x
0 x
0 0
0 0
0 0
0
x
x
0
0
x
x
x
x
x
x
x
0
0
0
x
x
x
x
x
0
x
x
x
x
�I
0
x
x
x
x
In general, Uk introduces zeros into the kth column, while Vk zeros the appropriate
entries in row k. Overall we have:
Algorithm 5.4.2 (Householder Bidiagonalization) Given A E 1Rmxn with m � n, the
following algorithm overwrites A with U'{;AV8 = B where B is upper bidiagonal and
Un = U1 · · · Un and V8 = V1 · · · Vn-2· The essential part of Uj's Householder vector is
stored in A(j + l:m,j) and the essential part of Vj's Householder vector is stored in
A(j, j + 2:n).
5.4. Other Orthogonal Factorizations
for j = l:n
end
[v, ,B] = house(A(j:m,j))
A(j:m, j:n) = Um-;+1 - ,BvvT)A(j:m, j:n)
A(j + l:m, j) = v(2:m - j + 1)
if j � n - 2
end
[v, .BJ = house(A(j,j + l:n)T)
A(j:m,j + l:n) = A(j:m, j + l:n)(In-j - ,BvvT)
A(j,j + 2:n) = v(2:n - j)T
285
This algorithm requires 4mn2 - 4n3/3 flops. Such a technique is used by Golub and
Kahan (1965), where bidiagonalization is first described. If the matrices U8 and Vu
are explicitly desired, then they can be accumulated in 4m2n - 4n3/3 and 4n3/3 flops,
respectively. The bidiagonalization of A is related to the tridiagonalization of ATA.
See §8.3.1.
5.4.9 R-Bidiagonalization
If m » n, then a faster method of bidiagonalization method results if we upper trian­
gularize A first before applying Algorithm 5.4.2. In particular, suppose we compute an
orthogonal Q E Rmx m such that
is upper triangular. We then bidiagonalize the square matrix Ri ,
where Un and Vu are orthogonal. If U8 = Q diag (Un, Im-
n), then
is a bidiagonalization of A.
The idea of computing the bidiagonalization in this manner is mentioned by
Lawson and Hanson (SLS, p. 119) and more fully analyzed by Chan (1982). We refer
to this method as R-bidiagonalization and it requires (2mn2 + 2n3) flops. This is less
than the flop count for Algorithm 5.4.2 whenever m � 5n/3.
Problems
PS.4.1 Let x, y E Rm and Q E R"" x m be given with Q orthogonal. Show that if
then uTv = xTy - afj.
QTx = [ Q ] 1
u m- 1 ' QTy = [ (3 ] 1
V m- 1
286 Chapter 5. Orthogonalization and Least Squares
P5.4.2 Let A = [ a1 I · · · I an ) E Rmxn and b E Rm be given. For any column subset {ac1 , . . . , ack }
define
min II [ Uc1 I · · · I ack ) X - b 112
x E Rk
Describe an alternative pivot selection procedure for Algorithm 5.4.1 such that if QR All
[ ac1 I · · · I acn J in the final factorization, then for k = l:n:
min res ([aq , . . . , ack-l , ac.J) .
i � k
P5.4.3 Suppose T E Rnxn is upper triangular and tkk = lTmin(T). Show that T(l:k - 1, k) = 0 and
T(k, k + l:n) = 0.
P5.4.4 Suppose A E Rmxn with m 2 n. Give an algorithm that uses Householder matrices to
compute an orthogonal Q E Rmxm so that if QTA = L, then L(n + l:m, :) = 0 and L(l:n, l:n) is
lower triangular.
P5.4.5 Suppose R E Rnxn is upper triangular and Y E Rnxj has orthonormal columns and satisfies
II RY 112 = u. Give an algorithm that computes orthogonal U and V, each products of Givens rotations,
so that UTRV = Rnew is upper triangular and vTy = Ynew has the property that
Ynew(n -j + l:n, :) = diag(±l).
What can you say about Rncw(n - j + l:n, n - j + l:n)?
P5.4.6 Give an algorithm for reducing a complex matrix A to real bidiagonal form using complex
Householder transformations.
P5.4.7 Suppose B E Rnxn is upper bidiagonal with bnn = 0. Show how to construct orthogonal U
and V (product of Givens rotations) so that UTBV is upper bidiagonal with a zero nth column.
P5.4.8 Suppose A E Rmxn with m < n. Give an algorithm for computing the factorization
uTAv = [ B I O J
where B is an m-by-m upper bidiagonal matrix. (Hint: Obtain the form
x
x
0
0
0
x
x
0
0
0
x
x
0
0
0
x
using Householder matrices and then "chase" the (m, m+ 1) entry up the (m+ l)st column by applying
Givens rotations from the right.)
P5.4.9 Show how to efficiently bidiagonalize an n-by-n upper triangular matrix using Givens rotations.
P5.4.10 Show how to upper bidiagonalize a tridiagonal matrix T E Rnxn using Givens rotations.
P5.4.ll Show that if B E R'xn is an upper bidiagonal matrix having a repeated singular value, then
B must have a zero on its diagonal or superdiagonal.
Notes and References for §5.4
QR with column pivoting was first discussed in:
P.A. Businger and G.H. Golub (1965). "Linear Least Squares Solutions by Householder Transforma-
tions," Numer. Math. 7, 269-276.
In matters that concern rank deficiency, it is helpful to obtain information about the smallest singular
value of the upper triangular matrix R. This can be done using the techniques of §3.5.4 or those that
are discussed in:
I. Karasalo (1974). "A Criterion for Truncation of the QR Decomposition Algorithm for the Singular
Linear Least Squares Problem,'' BIT 14, 156-166.
N. Anderson and I. Karasalo (1975). "On Computing Bounds for the Least Singular Value of a
Triangular Matrix,'' BIT 15, 1-4.
5.4. Other Orthogonal Factorizations 287
C.-T. Pan and P.T.P. Tang (1999). "Bounds on Singular Values Revealed by QR Factorizations," BIT
39, 740-756.
C.H. Bischof (1990). "Incremental Condition Estimation," SIAM J. Matrix Anal. Applic., 11, 312-
322.
Revealing the rank of a matrix through a carefully implementated factorization has prompted a great
deal of research, see:
T.F. Chan (1987). "Rank Revealing QR Factorizations," Lin. Alg. Applic. 88/8g, 67-82.
T.F. Chan and P. Hansen (1992). "Some Applications of the Rank Revealing QR Factorization,"
SIAM J. Sci. Stat. Comp. 13, 727-741.
S. Chandrasekaren and l.C.F. Ipsen (1994). "On Rank-Revealing Factorizations," SIAM J. Matrix
Anal. Applic. 15, 592-622.
M. Gu and S.C. Eisenstat (1996). "Efficient Algorithms for Computing a Strong Rank-Revealing QR
Factorization," SIAM J. Sci. Comput. 1 7, 848-869.
G.W. Stewart (1999). "The QLP Approximation to the Singular Value Decomposition," SIAM J. Sci.
Comput. 20, 1336-1348.
D.A. Huckaby and T.F. Chan {2005). "Stewart's Pivoted QLP Decomposition for Low-Rank Matri­
ces," Num. Lin. Alg. Applic. 12, 153- 159.
A. Dax (2008). "Orthogonalization via Deflation: A Minimum Norm Approach to Low-Rank Approx­
imation of a Matrix," SIAM J. Matrix Anal. Applic. 30, 236-260.
z. Drma.C and Z. Bujanovic (2008). "On the Failure of Rank-Revealing QR Factorization Software-A
Case Study," ACM Trans. Math. Softw. 35, Article 12.
We have more to say about the UTV framework in §6.5 where updating is discussed. Basic references
for what we cover in this section include:
G.W. Stewart (1993). "UTV Decompositions," in Numerical Analysis 1993, Proceedings of the 15th
Dundee Conference, June-July 1993, Longman Scientic & Technical, Harlow, Essex, UK, 225-236.
P.A. Yoon and J.L. Barlow (1998) "An Efficient Rank Detection Procedure for Modifying the ULV
Decomposition," BIT 38, 781-801.
J.L. Barlow, H. Erbay, and I. Slapnicar (2005). "An Alternative Algorithm for the Refinement of ULV
Decompositions," SIAM J. Matrix Anal. Applic. 27, 198-211.
Column-pivoting makes it more difficult to achieve high performance when computing the QR factor­
ization. However, it can be done:
C.H. Bischof and P.C. Hansen (1992). "A Block Algorithm for Computing Rank-Revealing QR Fac­
torizations," Numer. Algorithms 2, 371-392.
C.H. Bischof and G. Quintana-Orti (1998). "Computing Rank-revealing QR factorizations of Dense
Matrices,'' ACM Trans. Math. Softw. 24, 226-253.
C.H. Bischof and G. Quintana-Orti (1998). "Algorithm 782: Codes for Rank-Revealing QR factoriza­
tions of Dense Matrices," A CM Trans. Math. Softw. 24, 254-257.
G. Quintana-Orti, X. Sun, and C.H. Bischof (1998). "A BLAS-3 Version ofthe QR Factorization with
Column Pivoting," SIAM J. Sci. Comput. 19, 1486-1494.
A carefully designed LU factorization can also be used to shed light on matrix rank:
T-M. Hwang, W-W. Lin, and E.K. Yang (1992). "Rank-Revealing LU Factorizations," Lin. Alg.
Applic. 175, 115-141.
T.-M. Hwang, W.-W. Lin and D. Pierce (1997). "Improved Bound for Rank Revealing LU Factoriza­
tions," Lin. Alg. Applic. 261, 173-186.
L. Miranian and M. Gu (2003). "Strong Rank Revealing LU Factorizations,'' Lin. Alg. Applic. 367,
1-16.
Column pivoting can be incorporated into the modified Gram-Schmidt process, see:
A. Dax (2000). "A Modified Gram-Schmidt Algorithm with Iterative Orthogonalization and Column
Pivoting," Lin. Alg. Applic. 310, 25-42.
M. Wei and Q. Liu (2003). "Roundoff Error Estimates of the Modified GramSchmidt Algorithm with
Column Pivoting," BIT 43, 627-645.
Aspects of the complete orthogonal decomposition are discussed in:
288 Chapter 5. Orthogonalization and Least Squares
R.J. Hanson and C.L. Lawson (1969). "Extensions and Applications of the Householder Algorithm
for Solving Linear Least Square Problems," Math. Comput. 23, 787-812.
P.A. Wedin (1973). "On the Almost Rank-Deficient Case of the Least Squares Problem," BIT 13,
344-354.
G.H. Golub and V. Pereyra (1976). "Differentiation of Pseudo-Inverses, Separable Nonlinear Least
Squares Problems and Other Tales," in Generalized Inverses and Applications, M.Z. Nashed (ed.),
Academic Press, New York, 303-324.
The quality of the subspaces that are exposed through a complete orthogonal decomposition are
analyzed in:
R.D. Fierro and J.R. Bunch (1995). "Bounding the Subspaces from Rank Revealing Two-Sided Or­
thogonal Decompositions," SIAM J. Matrix Anal. Applic. 16, 743-759.
R.D. Fierro (1996). "Perturbation Analysis forTwo-Sided (or Complete) Orthogonal Decompositions,"
SIAM J. Matrix Anal. Applic. 1 7, 383-400.
The bidiagonalization is a particularly important decomposition because it typically precedes the
computation of the SVD as we discuss in §8.6. Thus, there has been a strong research interest in its
efficient and accurate computation:
B. Lang (1996). "Parallel Reduction of Banded Matrices to Bidiagonal Form," Parallel Comput. 22, 1-18.
J.L. Barlow (2002). "More Accurate Bidiagonal Reduction for Computing the Singular Value Decomposition," SIAM J. Matrix Anal. Applic. 23, 761-798.
J.L. Barlow, N. Bosner, and Z. Drmač (2005). "A New Stable Bidiagonal Reduction Algorithm," Lin. Alg. Applic. 397, 35-84.
B.N. Parlett (2005). "A Bidiagonal Matrix Determines Its Hyperbolic SVD to Varied Relative Accuracy," SIAM J. Matrix Anal. Applic. 26, 1022-1057.
N. Bosner and J.L. Barlow (2007). "Block and Parallel Versions of One-Sided Bidiagonalization," SIAM J. Matrix Anal. Applic. 29, 927-953.
G.W. Howell, J.W. Demmel, C.T. Fulton, S. Hammarling, and K. Marmol (2008). "Cache Efficient Bidiagonalization Using BLAS 2.5 Operators," ACM Trans. Math. Softw. 34, Article 14.
H. Ltaief, J. Kurzak, and J. Dongarra (2010). "Parallel Two-Sided Matrix Reduction to Band Bidiagonal Form on Multicore Architectures," IEEE Trans. Parallel Distrib. Syst. 21, 417-423.
5.5 The Rank-Deficient Least Squares Problem
If A is rank deficient, then there are an infinite number of solutions to the LS problem. We must resort to techniques that incorporate numerical rank determination and identify a particular solution as "special." In this section we focus on using the SVD to compute the minimum norm solution and QR-with-column-pivoting to compute what is called the basic solution. Both of these approaches have their merits and we conclude with a subset selection procedure that combines their positive attributes.
5.5.1 The Minimum Norm Solution
Suppose $A \in \mathbb{R}^{m \times n}$ and $\mathrm{rank}(A) = r < n$. The rank-deficient LS problem has an infinite number of solutions, for if $x$ is a minimizer and $z \in \mathrm{null}(A)$, then $x + z$ is also a minimizer. The set of all minimizers

$$X = \{\, x \in \mathbb{R}^n : \| Ax - b \|_2 = \min \,\}$$

is convex, for if $x_1, x_2 \in X$ and $\lambda \in [0,1]$, then

$$\| A(\lambda x_1 + (1-\lambda)x_2) - b \|_2 \le \lambda \| Ax_1 - b \|_2 + (1-\lambda)\| Ax_2 - b \|_2 = \min_{x \in \mathbb{R}^n} \| Ax - b \|_2 .$$

Thus, $\lambda x_1 + (1-\lambda)x_2 \in X$. It follows that $X$ has a unique element having minimum 2-norm and we denote this solution by $x_{LS}$. (Note that in the full-rank case, there is only one LS solution and so it must have minimal 2-norm. Thus, we are consistent with the notation in §5.3.)
Any complete orthogonal factorization (§5.4.7) can be used to compute $x_{LS}$. In particular, if $Q$ and $Z$ are orthogonal matrices such that

$$Q^T A Z = \begin{bmatrix} T_{11} & 0 \\ 0 & 0 \end{bmatrix}, \qquad T_{11} \in \mathbb{R}^{r \times r}, \quad r = \mathrm{rank}(A),$$

then

$$\| Ax - b \|_2^2 = \| (Q^T A Z) Z^T x - Q^T b \|_2^2 = \| T_{11} w - c \|_2^2 + \| d \|_2^2$$

where

$$Z^T x = \begin{bmatrix} w \\ y \end{bmatrix} \begin{matrix} r \\ n-r \end{matrix}, \qquad Q^T b = \begin{bmatrix} c \\ d \end{bmatrix} \begin{matrix} r \\ m-r \end{matrix}.$$

Clearly, if $x$ is to minimize the sum of squares, then we must have $w = T_{11}^{-1} c$. For $x$ to have minimal 2-norm, $y$ must be zero, and thus

$$x_{LS} = Z \begin{bmatrix} T_{11}^{-1} c \\ 0 \end{bmatrix}.$$

Of course, the SVD is a particularly revealing complete orthogonal decomposition. It provides a neat expression for $x_{LS}$ and the norm of the minimum residual $\rho_{LS} = \| A x_{LS} - b \|_2$.
Theorem 5.5.1. Suppose $U^T A V = \Sigma$ is the SVD of $A \in \mathbb{R}^{m \times n}$ with $r = \mathrm{rank}(A)$. If $U = [\, u_1 \mid \cdots \mid u_m \,]$ and $V = [\, v_1 \mid \cdots \mid v_n \,]$ are column partitionings and $b \in \mathbb{R}^m$, then

$$x_{LS} = \sum_{i=1}^{r} \frac{u_i^T b}{\sigma_i}\, v_i \tag{5.5.1}$$

minimizes $\| Ax - b \|_2$ and has the smallest 2-norm of all minimizers. Moreover,

$$\rho_{LS}^2 = \| A x_{LS} - b \|_2^2 = \sum_{i=r+1}^{m} (u_i^T b)^2 . \tag{5.5.2}$$

Proof. For any $x \in \mathbb{R}^n$ we have

$$\| Ax - b \|_2^2 = \| (U^T A V)(V^T x) - U^T b \|_2^2 = \| \Sigma \alpha - U^T b \|_2^2 = \sum_{i=1}^{r} (\sigma_i \alpha_i - u_i^T b)^2 + \sum_{i=r+1}^{m} (u_i^T b)^2 ,$$

where $\alpha = V^T x$. Clearly, if $x$ solves the LS problem, then $\alpha_i = u_i^T b / \sigma_i$ for $i = 1{:}r$. If we set $\alpha(r+1{:}n) = 0$, then the resulting $x$ has minimal 2-norm. $\square$
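To make formula (5.5.1) concrete, here is a small NumPy sketch (the function name and the rank tolerance are illustrative, not from the text) that assembles the minimum norm solution from the SVD and checks it against the built-in pseudoinverse:

    import numpy as np

    def min_norm_ls(A, b, tol=1e-12):
        # Minimum 2-norm LS solution via the SVD, following (5.5.1):
        # x_LS = sum_{i=1}^{r} (u_i^T b / sigma_i) v_i,  r = rank(A).
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        r = int(np.sum(s > tol * s[0]))      # crude numerical rank estimate
        c = U[:, :r].T @ b                   # u_i^T b for i = 1:r
        return Vt[:r, :].T @ (c / s[:r])

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0],           # rank deficient: row 2 = 2*(row 1)
                  [1.0, 0.0, 1.0]])
    b = np.array([1.0, 2.0, 0.5])
    x = min_norm_ls(A, b)
    # np.linalg.pinv(A) @ b is also the minimum norm LS solution:
    print(np.allclose(x, np.linalg.pinv(A) @ b))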
5.5.2 A Note on the Pseudoinverse
If we define the matrix $A^{+} \in \mathbb{R}^{n \times m}$ by $A^{+} = V \Sigma^{+} U^T$ where

$$\Sigma^{+} = \mathrm{diag}\!\left( \frac{1}{\sigma_1}, \ldots, \frac{1}{\sigma_r}, 0, \ldots, 0 \right) \in \mathbb{R}^{n \times m}, \qquad r = \mathrm{rank}(A),$$

then $x_{LS} = A^{+} b$ and $\rho_{LS} = \| (I - A A^{+}) b \|_2$. $A^{+}$ is referred to as the pseudoinverse of $A$. It is the unique minimal Frobenius norm solution to the problem

$$\min_{X \in \mathbb{R}^{n \times m}} \| A X - I_m \|_F . \tag{5.5.3}$$

If $\mathrm{rank}(A) = n$, then $A^{+} = (A^T A)^{-1} A^T$, while if $m = n = \mathrm{rank}(A)$, then $A^{+} = A^{-1}$.
Typically, $A^{+}$ is defined to be the unique matrix $X \in \mathbb{R}^{n \times m}$ that satisfies the four Moore-Penrose conditions:

(i) $AXA = A$,  (ii) $XAX = X$,  (iii) $(AX)^T = AX$,  (iv) $(XA)^T = XA$.

These conditions amount to the requirement that $A A^{+}$ and $A^{+} A$ be orthogonal projections onto $\mathrm{ran}(A)$ and $\mathrm{ran}(A^T)$, respectively. Indeed,

$$A A^{+} = U_1 U_1^T \quad \text{where } U_1 = U(1{:}m, 1{:}r), \qquad A^{+} A = V_1 V_1^T \quad \text{where } V_1 = V(1{:}n, 1{:}r).$$
5.5.3 Some Sensitivity Issues
In §5.3 we examined the sensitivity of the full-rank LS problem. The behavior of $x_{LS}$ in this situation is summarized in Theorem 5.3.1. If we drop the full-rank assumption, then $x_{LS}$ is not even a continuous function of the data and small changes in $A$ and $b$ can induce arbitrarily large changes in $x_{LS} = A^{+} b$. The easiest way to see this is to consider the behavior of the pseudoinverse. If $A$ and $\delta A$ are in $\mathbb{R}^{m \times n}$, then Wedin (1973) and Stewart (1975) show that

$$\| (A + \delta A)^{+} - A^{+} \|_F \le 2\, \| \delta A \|_F \, \max\{\, \| A^{+} \|_2^2 ,\ \| (A + \delta A)^{+} \|_2^2 \,\}.$$

This inequality is a generalization of Theorem 2.3.4 in which perturbations in the matrix inverse are bounded. However, unlike the square nonsingular case, the upper bound does not necessarily tend to zero as $\delta A$ tends to zero. If

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad \delta A = \begin{bmatrix} 0 & 0 \\ 0 & \epsilon \\ 0 & 0 \end{bmatrix},$$

then

$$A^{+} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad (A + \delta A)^{+} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1/\epsilon & 0 \end{bmatrix},$$

and

$$\| A^{+} - (A + \delta A)^{+} \|_2 = 1/\epsilon .$$

The numerical determination of an LS minimizer in the presence of such discontinuities is a major challenge.
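The discontinuity is easy to observe numerically. The following short NumPy sketch (the value of eps and the matrices, as reconstructed above, are illustrative) shows an O($\epsilon$) perturbation producing an O($1/\epsilon$) change in the pseudoinverse:

    import numpy as np

    # Illustration of the discontinuity of the pseudoinverse (cf. the example above).
    eps = 1e-6
    A  = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
    dA = np.array([[0.0, 0.0], [0.0, eps], [0.0, 0.0]])

    Ap  = np.linalg.pinv(A)
    Adp = np.linalg.pinv(A + dA)
    # The perturbation has 2-norm eps, yet the pseudoinverses differ by 1/eps.
    print(np.linalg.norm(dA, 2), np.linalg.norm(Ap - Adp, 2))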
5.5.4 The Truncated SVD Solution
Suppose $\hat U$, $\hat\Sigma$, and $\hat V$ are the computed SVD factors of a matrix $A$ and $\hat r$ is accepted as its $\delta$-rank, i.e.,

$$\hat\sigma_n \le \cdots \le \hat\sigma_{\hat r + 1} \le \delta < \hat\sigma_{\hat r} \le \cdots \le \hat\sigma_1 .$$

It follows that we can regard

$$\hat x_{\hat r} = \sum_{i=1}^{\hat r} \frac{\hat u_i^T b}{\hat\sigma_i}\, \hat v_i$$

as an approximation to $x_{LS}$. Since $\| \hat x_{\hat r} \|_2 \approx 1/\hat\sigma_{\hat r} \le 1/\delta$, $\delta$ may also be chosen with the intention of producing an approximate LS solution with suitably small norm. In §6.2.1, we discuss more sophisticated methods for doing this.

If $\hat\sigma_{\hat r} \gg \delta$, then we have reason to be comfortable with $\hat x_{\hat r}$ because $A$ can then be unambiguously regarded as a rank($A_{\hat r}$) matrix (modulo $\delta$).
On the other hand, $\{\hat\sigma_1, \ldots, \hat\sigma_n\}$ might not clearly split into subsets of small and large singular values, making the determination of $\hat r$ by this means somewhat arbitrary. This leads to more complicated methods for estimating rank, which we now discuss in the context of the LS problem. The issues are readily communicated by making two simplifying assumptions. Assume that $r = n$ and that $\Delta A = 0$ in (5.4.4), which implies that $W^T A Z = \hat\Sigma$ is the SVD. Denote the $i$th columns of the matrices $\hat U$, $W$, $\hat V$, and $Z$ by $\hat u_i$, $w_i$, $\hat v_i$, and $z_i$, respectively. Because

$$x_{LS} - \hat x_{\hat r} = \sum_{i=1}^{n} \frac{w_i^T b}{\hat\sigma_i}\, z_i - \sum_{i=1}^{\hat r} \frac{\hat u_i^T b}{\hat\sigma_i}\, \hat v_i = \sum_{i=1}^{\hat r} \frac{((w_i - \hat u_i)^T b)\, z_i + (\hat u_i^T b)(z_i - \hat v_i)}{\hat\sigma_i} + \sum_{i=\hat r + 1}^{n} \frac{w_i^T b}{\hat\sigma_i}\, z_i ,$$

it follows from $\| w_i - \hat u_i \|_2 \le \epsilon$, $\| \hat u_i \|_2 \le 1 + \epsilon$, and $\| z_i - \hat v_i \|_2 \le \epsilon$ that $\| x_{LS} - \hat x_{\hat r} \|_2$ is bounded by the sum of an $O(\epsilon)$ term involving only the first $\hat r$ components of this expansion and the tail term

$$\left( \sum_{i=\hat r + 1}^{n} \left( \frac{w_i^T b}{\hat\sigma_i} \right)^{\!2} \right)^{\!1/2}.$$

The parameter $\hat r$ can be determined as that integer which minimizes the upper bound. Notice that the first term in the bound increases with $\hat r$, while the second decreases.
On occasions when minimizing the residual is more important than accuracy in the solution, we can determine $\hat r$ on the basis of how close we surmise $\| b - A \hat x_{\hat r} \|_2$ is to the true minimum. Paralleling the above analysis, it can be shown that

$$\| b - A \hat x_{\hat r} \|_2 \le \| b - A x_{LS} \|_2 + \epsilon (n - \hat r)\, \| b \|_2 + \epsilon \hat r\, \| b \|_2 \left( 1 + (1 + \epsilon)\frac{\hat\sigma_1}{\hat\sigma_{\hat r}} \right).$$

Again $\hat r$ could be chosen to minimize the upper bound. See Varah (1973) for practical details and also LAPACK.
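As an illustration of the truncated SVD solution itself (not of the above rank-selection heuristics), the following NumPy sketch forms $\hat x_{\hat r}$ for a given threshold $\delta$; the function name and example data are illustrative only:

    import numpy as np

    def truncated_svd_solution(A, b, delta):
        # x_hat = sum over sigma_i > delta of (u_i^T b / sigma_i) v_i,
        # i.e., the minimum norm LS solution with the delta-rank in place of rank(A).
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        r_hat = int(np.sum(s > delta))                 # delta-rank
        x = Vt[:r_hat, :].T @ ((U[:, :r_hat].T @ b) / s[:r_hat])
        return x, r_hat

    A = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-8], [0.0, 1e-8]])
    b = np.array([1.0, 1.0, 1.0])
    for delta in (1e-12, 1e-6):
        x, r_hat = truncated_svd_solution(A, b, delta)
        print(delta, r_hat, np.linalg.norm(x))         # larger delta gives a smaller-norm x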
5.5.5 Basic Solutions via QR with Column Pivoting
Suppose $A \in \mathbb{R}^{m \times n}$ has rank $r$. QR with column pivoting (Algorithm 5.4.1) produces the factorization $A\Pi = QR$ where

$$R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & 0 \end{bmatrix}, \qquad R_{11} \in \mathbb{R}^{r \times r}, \quad R_{12} \in \mathbb{R}^{r \times (n-r)}.$$

Given this reduction, the LS problem can be readily solved. Indeed, for any $x \in \mathbb{R}^n$ we have

$$\| Ax - b \|_2^2 = \| (Q^T A \Pi)(\Pi^T x) - Q^T b \|_2^2 = \| R_{11} y - (c - R_{12} z) \|_2^2 + \| d \|_2^2 ,$$

where

$$\Pi^T x = \begin{bmatrix} y \\ z \end{bmatrix} \begin{matrix} r \\ n-r \end{matrix}, \qquad Q^T b = \begin{bmatrix} c \\ d \end{bmatrix} \begin{matrix} r \\ m-r \end{matrix}.$$
Thus, if $x$ is an LS minimizer, then we must have

$$x = \Pi \begin{bmatrix} R_{11}^{-1}(c - R_{12} z) \\ z \end{bmatrix}.$$

If $z$ is set to zero in this expression, then we obtain the basic solution

$$x_B = \Pi \begin{bmatrix} R_{11}^{-1} c \\ 0 \end{bmatrix}. \tag{5.5.4}$$

Notice that $x_B$ has at most $r$ nonzero components and so $A x_B$ involves a subset of $A$'s columns.

The basic solution is not the minimal 2-norm solution unless the submatrix $R_{12}$ is zero, since

$$\| x_{LS} \|_2 = \min_{z \in \mathbb{R}^{n-r}} \left\| \, x_B - \Pi \begin{bmatrix} R_{11}^{-1} R_{12} \\ -I_{n-r} \end{bmatrix} z \, \right\|_2 . \tag{5.5.5}$$

Indeed, this characterization of $\| x_{LS} \|_2$ can be used to show that

$$1 \le \frac{\| x_B \|_2}{\| x_{LS} \|_2} \le \sqrt{\, 1 + \| R_{11}^{-1} R_{12} \|_2^2 \,}\, .$$

See Golub and Pereyra (1976) for details.
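A minimal SciPy sketch of the basic solution follows; the function name, the rank tolerance, and the use of scipy.linalg.qr with pivoting are my own illustrative choices, not prescriptions from the text:

    import numpy as np
    from scipy.linalg import qr, solve_triangular

    def basic_solution(A, b, tol=1e-12):
        # Basic LS solution x_B = Pi * [R11^{-1} c ; 0] from A*Pi = Q*R (column pivoting).
        Q, R, piv = qr(A, mode='economic', pivoting=True)
        d = np.abs(np.diag(R))
        r = int(np.sum(d > tol * d[0]))         # numerical rank estimated from R's diagonal
        c = (Q.T @ b)[:r]
        z = solve_triangular(R[:r, :r], c)      # solve R11 z = c
        x = np.zeros(A.shape[1])
        x[piv[:r]] = z                          # undo the column permutation
        return x

The resulting x has at most r nonzero entries, so A @ basic_solution(A, b) uses only the r "pivoted" columns of A.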
5.5.6 Some Comparisons

As we mentioned, when solving the LS problem via the SVD, only $\Sigma$ and $V$ have to be computed assuming that the right-hand side $b$ is available. The table in Figure 5.5.1 compares the flop efficiency of this approach with the other algorithms that we have presented.
LS Algorithm                      Flop Count
Normal equations                  $mn^2 + n^3/3$
Householder QR                    $2mn^2 - 2n^3/3$
Modified Gram-Schmidt             $2mn^2$
Givens QR                         $3mn^2 - n^3$
Householder Bidiagonalization     $4mn^2 - 4n^3/3$
R-Bidiagonalization               $2mn^2 + 2n^3$
SVD                               $4mn^2 + 8n^3$
R-SVD                             $2mn^2 + 11n^3$

Figure 5.5.1. Flops associated with various least squares methods
5.5.7 SVD-Based Subset Selection
Replacing $A$ by $A_{\hat r}$ in the LS problem amounts to filtering the small singular values and can make a great deal of sense in those situations where $A$ is derived from noisy data. In other applications, however, rank deficiency implies redundancy among the factors that comprise the underlying model. In this case, the model-builder may not be interested in a predictor such as $A_{\hat r} \hat x_{\hat r}$ that involves all $n$ redundant factors. Instead, a predictor $Ay$ may be sought where $y$ has at most $\hat r$ nonzero components. The position of the nonzero entries determines which columns of $A$, i.e., which factors in the model, are to be used in approximating the observation vector $b$. How to pick these columns is the problem of subset selection.

QR with column pivoting is one way to proceed. However, Golub, Klema, and Stewart (1976) have suggested a technique that heuristically identifies a more independent set of columns than are involved in the predictor $A x_B$. The method involves both the SVD and QR with column pivoting:
Step 1. Compute the SVD $A = U \Sigma V^T$ and use it to determine a rank estimate $\hat r$.

Step 2. Calculate a permutation matrix $P$ such that the columns of the matrix $B_1 \in \mathbb{R}^{m \times \hat r}$ in $AP = [\, B_1 \mid B_2 \,]$ are "sufficiently independent."

Step 3. Predict $b$ with $Ay$ where $y = P \begin{bmatrix} z \\ 0 \end{bmatrix}$ and $z \in \mathbb{R}^{\hat r}$ minimizes $\| B_1 z - b \|_2$.
The second step is key. Because

$$\| Ay - b \|_2 = \min_{z \in \mathbb{R}^{\hat r}} \| B_1 z - b \|_2 \ \ge\ \min_{x \in \mathbb{R}^n} \| Ax - b \|_2 ,$$

it can be argued that the permutation $P$ should be chosen to make the residual $r = (I - B_1 B_1^{+}) b$ as small as possible. Unfortunately, such a solution procedure can be unstable. For example, there are matrices $A$ for which, with $\hat r = 2$ and $P = I$, $\min \| B_1 z - b \|_2 = 0$ but $\| B_1^{+} b \|_2 = O(1/\epsilon)$. On the other hand, any proper subset involving the third column of $A$ is strongly independent but renders a much larger residual.
This example shows that there can be a trade-off between the independence of the chosen columns and the norm of the residual that they render. How to proceed in the face of this trade-off requires useful bounds on $\sigma_{\hat r}(B_1)$, the smallest singular value of $B_1$.
Theorem 5.5.2. Let the SVD of $A \in \mathbb{R}^{m \times n}$ be given by $U^T A V = \Sigma = \mathrm{diag}(\sigma_i)$ and define the matrix $B_1 \in \mathbb{R}^{m \times \hat r}$, $\hat r \le \mathrm{rank}(A)$, by

$$AP = [\, B_1 \mid B_2 \,], \qquad B_1 \in \mathbb{R}^{m \times \hat r}, \quad B_2 \in \mathbb{R}^{m \times (n - \hat r)},$$

where $P \in \mathbb{R}^{n \times n}$ is a permutation. If

$$P^T V = \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix} \begin{matrix} \hat r \\ n - \hat r \end{matrix}, \qquad V_{11} \in \mathbb{R}^{\hat r \times \hat r},$$

and $V_{11}$ is nonsingular, then

$$\frac{\sigma_{\hat r}(A)}{\| V_{11}^{-1} \|_2} \le \sigma_{\hat r}(B_1) \le \sigma_{\hat r}(A). \tag{5.5.6}$$

Proof. The upper bound follows from Corollary 2.4.4. To establish the lower bound, partition the diagonal matrix of singular values as follows:

$$\Sigma = \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}, \qquad \Sigma_1 = \mathrm{diag}(\sigma_1, \ldots, \sigma_{\hat r}).$$

If $w \in \mathbb{R}^{\hat r}$ is a unit vector with the property that $\| B_1 w \|_2 = \sigma_{\hat r}(B_1)$, then

$$\sigma_{\hat r}(B_1)^2 = \| B_1 w \|_2^2 = \left\| \, U \Sigma V^T P \begin{bmatrix} w \\ 0 \end{bmatrix} \right\|_2^2 = \| \Sigma_1 V_{11}^T w \|_2^2 + \| \Sigma_2 V_{12}^T w \|_2^2 .$$

The theorem now follows because $\| \Sigma_1 V_{11}^T w \|_2 \ge \sigma_{\hat r}(A) / \| V_{11}^{-1} \|_2$. $\square$
This result suggests that in the interest of obtaining a sufficiently independent subset of columns, we choose the permutation $P$ such that the resulting $V_{11}$ submatrix is as well-conditioned as possible. A heuristic solution to this problem can be obtained by computing the QR-with-column-pivoting factorization of the matrix $[\, V_{11}^T \mid V_{21}^T \,]$, where

$$V = \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix} \begin{matrix} \hat r \\ n - \hat r \end{matrix}$$

is a partitioning of the matrix $V$, $A$'s matrix of right singular vectors. In particular, if we apply QR with column pivoting (Algorithm 5.4.1) to compute

$$Q^T [\, V_{11}^T \mid V_{21}^T \,]\, P = [\, R_{11} \mid R_{12} \,], \qquad R_{11} \in \mathbb{R}^{\hat r \times \hat r},$$

where $Q$ is orthogonal, $P$ is a permutation matrix, and $R_{11}$ is upper triangular, then (5.5.6) implies

$$\frac{\sigma_{\hat r}(A)}{\| R_{11}^{-1} \|_2} \le \sigma_{\hat r}(B_1) \le \sigma_{\hat r}(A).$$

Note that $R_{11}$ is nonsingular and that $\| V_{11}^{-1} \|_2 = \| R_{11}^{-1} \|_2$. Heuristically, column pivoting tends to produce a well-conditioned $R_{11}$, and so the overall process tends to produce a well-conditioned $V_{11}$.
Algorithm 5.5.1 Given $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, the following algorithm computes a permutation $P$, a rank estimate $\hat r$, and a vector $z \in \mathbb{R}^{\hat r}$ such that the first $\hat r$ columns of $B = AP$ are independent and $\| B(:, 1{:}\hat r) z - b \|_2$ is minimized.

    Compute the SVD $U^T A V = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$ and save $V$.
    Determine $\hat r \le \mathrm{rank}(A)$.
    Apply QR with column pivoting: $Q^T V(:, 1{:}\hat r)^T P = [\, R_{11} \mid R_{12} \,]$ and set $AP = [\, B_1 \mid B_2 \,]$ with $B_1 \in \mathbb{R}^{m \times \hat r}$ and $B_2 \in \mathbb{R}^{m \times (n - \hat r)}$.
    Determine $z \in \mathbb{R}^{\hat r}$ such that $\| b - B_1 z \|_2 = \min$.
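Here is a short Python sketch of this procedure (the function name is illustrative, the rank estimate $\hat r$ is taken as an input rather than determined from the singular values, and scipy.linalg.qr supplies the column pivoting):

    import numpy as np
    from scipy.linalg import qr

    def subset_select(A, b, r_hat):
        # SVD-plus-pivoted-QR subset selection in the spirit of Algorithm 5.5.1.
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        Vr = Vt[:r_hat, :]                       # this is V(:, 1:r_hat)^T, an r_hat-by-n matrix
        _, _, piv = qr(Vr, mode='economic', pivoting=True)
        cols = piv[:r_hat]                       # columns of A judged "sufficiently independent"
        B1 = A[:, cols]
        z, *_ = np.linalg.lstsq(B1, b, rcond=None)
        return cols, z                           # predictor is B1 @ z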
5.5.8 Column Independence Versus Residual Size
We return to the discussion of the trade-off between column independence and norm
of the residual. In particular, to assess the above method of subset selection we need
to examine the residual of the vector y that it produces
Here, B1 = B(:, l:r) with B = AP. To this end, it is appropriate to compare ry with
rx;- = b-Axr
since we are regarding A as a rank-r matrix and since Xr solves the nearest rank-r LS
problem min II Arx -b112-
Theorem 5.5.3. Assume that $U^T A V = \Sigma$ is the SVD of $A \in \mathbb{R}^{m \times n}$ and that $r_y$ and $r_{\hat x_{\hat r}}$ are defined as above. If $V_{11}$ is the leading $\hat r$-by-$\hat r$ principal submatrix of $P^T V$, then

$$\| r_{\hat x_{\hat r}} - r_y \|_2 \le \frac{\sigma_{\hat r + 1}(A)}{\sigma_{\hat r}(A)}\, \| V_{11}^{-1} \|_2 \, \| b \|_2 .$$

Proof. Note that $r_{\hat x_{\hat r}} = (I - U_1 U_1^T) b$ and $r_y = (I - Q_1 Q_1^T) b$ where

$$U = [\, U_1 \mid U_2 \,], \qquad U_1 \in \mathbb{R}^{m \times \hat r},$$

is a partitioning of the matrix $U$ and $Q_1 = B_1 (B_1^T B_1)^{-1/2}$. Using Theorem 2.6.1 we obtain

$$\| r_{\hat x_{\hat r}} - r_y \|_2 \le \| U_1 U_1^T - Q_1 Q_1^T \|_2 \, \| b \|_2 = \| U_2^T Q_1 \|_2 \, \| b \|_2 ,$$

while Theorem 5.5.2 permits us to conclude that

$$\| U_2^T Q_1 \|_2 \le \frac{\sigma_{\hat r + 1}(A)}{\sigma_{\hat r}(B_1)} \le \frac{\sigma_{\hat r + 1}(A)}{\sigma_{\hat r}(A)}\, \| V_{11}^{-1} \|_2 ,$$

and this establishes the theorem. $\square$

Noting that

$$\| r_{\hat x_{\hat r}} - r_y \|_2 = \left\| \, B_1 z - \sum_{i=1}^{\hat r} (u_i^T b)\, u_i \, \right\|_2 ,$$

we see that Theorem 5.5.3 sheds light on how well $B_1 z$ (that is, $Ay$) can predict the "stable" component of $b$, i.e., $U_1^T b$. Any attempt to approximate $U_2^T b$ can lead to a large norm solution. Moreover, the theorem says that if $\sigma_{\hat r + 1}(A) \ll \sigma_{\hat r}(A)$, then any reasonably independent subset of columns produces essentially the same-sized residual. On the other hand, if there is no well-defined gap in the singular values, then the determination of $\hat r$ becomes difficult and the entire subset selection problem becomes more complicated.
Problems
P5.5.1 Show that if

$$A = \begin{bmatrix} T & S \\ 0 & 0 \end{bmatrix} \begin{matrix} r \\ m-r \end{matrix}, \qquad T \in \mathbb{R}^{r \times r},$$

where $r = \mathrm{rank}(A)$ and $T$ is nonsingular, then

$$X = \begin{bmatrix} T^{-1} & 0 \\ 0 & 0 \end{bmatrix} \begin{matrix} r \\ n-r \end{matrix}$$

satisfies $AXA = A$ and $(AX)^T = AX$. In this case, we say that $X$ is a (1,3) pseudoinverse of $A$. Show that for general $A$, $x_B = Xb$ where $X$ is a (1,3) pseudoinverse of $A$.
P5.5.2 Define $B(\lambda) \in \mathbb{R}^{n \times m}$ by $B(\lambda) = (A^T A + \lambda I)^{-1} A^T$, where $\lambda > 0$. Show that

$$\| B(\lambda) - A^{+} \|_2 = \frac{\lambda}{\sigma_r(A)\,[\, \sigma_r(A)^2 + \lambda \,]}, \qquad r = \mathrm{rank}(A),$$

and therefore that $B(\lambda) \rightarrow A^{+}$ as $\lambda \rightarrow 0$.

P5.5.3 Consider the rank-deficient LS problem

$$\min_{y,\, z} \left\| \, \begin{bmatrix} R & S \end{bmatrix} \begin{bmatrix} y \\ z \end{bmatrix} - b \, \right\|_2$$

where $R \in \mathbb{R}^{r \times r}$, $S \in \mathbb{R}^{r \times (n-r)}$, $y \in \mathbb{R}^r$, $z \in \mathbb{R}^{n-r}$, and $b \in \mathbb{R}^r$. Assume that $R$ is upper triangular and nonsingular. Show how to obtain the minimum norm solution to this problem by computing an appropriate QR factorization without pivoting and then solving for the appropriate $y$ and $z$.
P5.5.4 Show that if $A_k \rightarrow A$ and $A_k^{+} \rightarrow A^{+}$, then there exists an integer $k_0$ such that $\mathrm{rank}(A_k)$ is constant for all $k \ge k_0$.

P5.5.5 Show that if $A \in \mathbb{R}^{m \times n}$ has rank $n$, then so does $A + E$ if $\| E \|_2 \| A^{+} \|_2 < 1$.

P5.5.6 Suppose $A \in \mathbb{R}^{m \times n}$ is rank deficient and $b \in \mathbb{R}^m$. Assume for $k = 0, 1, \ldots$ that $x^{(k+1)}$ minimizes

$$\phi_k(x) = \| Ax - b \|_2^2 + \lambda \| x - x^{(k)} \|_2^2$$

where $\lambda > 0$ and $x^{(0)} = 0$. Show that $x^{(k)} \rightarrow x_{LS}$.

P5.5.8 Suppose $A \in \mathbb{R}^{m \times n}$ and that $\| u^T A \|_2 = \sigma$ with $u^T u = 1$. Show that if $u^T(Ax - b) = 0$ for $x \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$, then $\| x \|_2 \ge |u^T b| / \sigma$.

P5.5.9 In Equation (5.5.6) we know that the matrix $P^T V$ is orthogonal. Thus, $\| V_{11}^{-1} \|_2 = \| V_{22}^{-1} \|_2$ from the CS decomposition (Theorem 2.5.3). Show how to compute $P$ by applying the QR-with-column-pivoting algorithm to $[\, V_{12}^T \mid V_{22}^T \,]$. (For $\hat r > n/2$, this procedure would be more economical than the technique discussed in the text.) Incorporate this observation in Algorithm 5.5.1.

P5.5.10 Suppose $F \in \mathbb{R}^{m \times r}$ and $G \in \mathbb{R}^{n \times r}$ each have rank $r$. (a) Give an efficient algorithm for computing the minimum 2-norm minimizer of $\| F G^T x - b \|_2$ where $b \in \mathbb{R}^m$. (b) Show how to compute the vector $x_B$.
Notes and References for §5.5

For a comprehensive treatment of the pseudoinverse and its manipulation, see:
M.Z. Nashed (1976). Generalized Inverses and Applications, Academic Press, New York.
S.L. Campbell and C.D. Meyer (2009). Generalized Inverses of Linear Transformations, SIAM Publications, Philadelphia, PA.
For an analysis of how the pseudoinverse is affected by perturbation, see:
P.A. Wedin (1973). "Perturbation Theory for Pseudo-Inverses," BIT 13, 217-232.
G.W. Stewart (1977). "On the Perturbation of Pseudo-Inverses, Projections, and Linear Least Squares," SIAM Review 19, 634-662.
Even for full rank problems, column pivoting seems to produce more accurate solutions. The error
analysis in the following paper attempts to explain why:
L.S. Jennings and M.R. Osborne (1974). "A Direct Error Analysis for Least Squares," Numer. Math.
22, 322-332.
Various other aspects of the rank-deficient least squares problem are discussed in:
J.M. Varah (1973). "On the Numerical Solution of Ill-Conditioned Linear Systems with Applications to Ill-Posed Problems," SIAM J. Numer. Anal. 10, 257-267.
G.W. Stewart (1984). "Rank Degeneracy," SIAM J. Sci. Stat. Comput. 5, 403-413.
P.C. Hansen (1987). "The Truncated SVD as a Method for Regularization," BIT 27, 534-553.
G.W. Stewart (1987). "Collinearity and Least Squares Regression," Stat. Sci. 2, 68-100.
R.D. Fierro and P.C. Hansen (1995). "Accuracy of TSVD Solutions Computed from Rank-Revealing Decompositions," Numer. Math. 70, 453-472.
P.C. Hansen (1997). Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, SIAM Publications, Philadelphia, PA.
A. Dax and L. Elden (1998). "Approximating Minimum Norm Solutions of Rank-Deficient Least Squares Problems," Numer. Lin. Alg. 5, 79-99.
G. Quintana-Orti, E.S. Quintana-Orti, and A. Petitet (1998). "Efficient Solution of the Rank-Deficient Linear Least Squares Problem," SIAM J. Sci. Comput. 20, 1155-1163.
L.V. Foster (2003). "Solving Rank-Deficient and Ill-Posed Problems Using UTV and QR Factorizations," SIAM J. Matrix Anal. Applic. 25, 582-600.
D.A. Huckaby and T.F. Chan (2004). "Stewart's Pivoted QLP Decomposition for Low-Rank Matrices," Numer. Lin. Alg. 12, 153-159.
L. Foster and R. Kommu (2006). "Algorithm 853: An Efficient Algorithm for Solving Rank-Deficient Least Squares Problems," ACM Trans. Math. Softw. 32, 157-165.
For a sampling of the subset selection literature, we refer the reader to:
H. Hotelling (1957). "The Relations of the Newer Multivariate Statistical Methods to Factor Analysis," Brit. J. Stat. Psych. 10, 69-79.
G.H. Golub, V. Klema, and G.W. Stewart (1976). "Rank Degeneracy and Least Squares Problems," Technical Report TR-456, Department of Computer Science, University of Maryland, College Park, MD.
S. Van Huffel and J. Vandewalle (1987). "Subset Selection Using the Total Least Squares Approach in Collinearity Problems with Errors in the Variables," Lin. Alg. Applic. 88/89, 695-714.
M.R. Osborne, B. Presnell, and B.A. Turlach (2000). "A New Approach to Variable Selection in Least Squares Problems," IMA J. Numer. Anal. 20, 389-403.
5.6 Square and Underdetermined Systems
The orthogonalization methods developed in this chapter can be applied to square
systems and also to systems in which there are fewer equations than unknowns. In this
brief section we examine the various possibilities.
5.6.1 Square Systems
The least squares solvers based on the QR factorization and the SVD can also be used
to solve square linear systems. Figure 5.6.1 compares the associated flop counts. It is
Method                           Flops
Gaussian elimination             $2n^3/3$
Householder QR                   $4n^3/3$
Modified Gram-Schmidt            $2n^3$
Singular value decomposition     $12n^3$

Figure 5.6.1. Flops associated with various methods for square linear systems
assumed that the right-hand side is available at the time of factorization. Although
Gaussian elimination involves the least amount of arithmetic, there are three reasons
why an orthogonalization method might be considered:
• The flop counts tend to exaggerate the Gaussian elimination advantage. When
memory traffic and vectorization overheads are considered, the QR approach is
comparable in efficiency.
• The orthogonalization methods have guaranteed stability; there is no "growth
factor" to worry about as in Gaussian elimination.
• In cases of ill-conditioning, the orthogonal methods give an added measure of
reliability. QR with condition estimation is very dependable and, of course, SVD
is unsurpassed when it comes to producing a meaningful solution to a nearly
singular system.
We are not expressing a strong preference for orthogonalization methods but merely
suggesting viable alternatives to Gaussian elimination.
We also mention that the SVD entry in the above table assumes the availability of $b$ at the time of decomposition. Otherwise, $20n^3$ flops are required because it then becomes necessary to accumulate the $U$ matrix.

If the QR factorization is used to solve $Ax = b$, then we ordinarily have to carry out a back substitution: $Rx = Q^T b$. However, this can be avoided by "preprocessing" $b$. Suppose $H$ is a Householder matrix such that $Hb = \beta e_n$ where $e_n$ is the last column of $I_n$. If we compute the QR factorization of $(HA)^T$, then $A = H^T R^T Q^T$ and the system transforms to

$$R^T y = \beta e_n$$

where $y = Q^T x$. Since $R^T$ is lower triangular, $y = (\beta / r_{nn}) e_n$ and so

$$x = \frac{\beta}{r_{nn}}\, Q(:, n).$$
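The following NumPy sketch walks through this trick end to end (the function name, the explicit formation of the Householder matrix H, and the test data are illustrative choices made for clarity, not efficiency):

    import numpy as np

    def solve_via_qr_no_backsub(A, b):
        # Square-system solve that avoids back substitution by first reflecting b
        # onto a multiple of e_n, as described above.
        n = b.size
        beta = -np.sign(b[-1]) * np.linalg.norm(b) if b[-1] != 0 else np.linalg.norm(b)
        v = b - beta * np.eye(n)[:, -1]
        H = np.eye(n) - 2.0 * np.outer(v, v) / (v @ v)   # Householder with H @ b = beta*e_n
        Q, R = np.linalg.qr((H @ A).T)                   # (HA)^T = QR, so A = H R^T Q^T
        return (beta / R[-1, -1]) * Q[:, -1]             # x = (beta / r_nn) Q(:, n)

    A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
    b = np.array([1.0, 2.0, 3.0])
    print(np.allclose(solve_via_qr_no_backsub(A, b), np.linalg.solve(A, b)))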
5.6.2 Underdetermined Systems
In §3.4.8 we discussed how Gaussian elimination with either complete pivoting or rook pivoting can be used to solve a full-rank, underdetermined linear system

$$Ax = b, \qquad A \in \mathbb{R}^{m \times n}, \quad m < n. \tag{5.6.1}$$

Various orthogonal factorizations can also be used to solve this problem. Notice that (5.6.1) either has no solution or has an infinity of solutions. In the second case, it is important to distinguish between algorithms that find the minimum 2-norm solution and those that do not. The first algorithm we present is in the latter category.

Assume that $A$ has full row rank and that we apply QR with column pivoting to obtain

$$Q^T A \Pi = [\, R_1 \mid R_2 \,]$$

where $R_1 \in \mathbb{R}^{m \times m}$ is upper triangular and $R_2 \in \mathbb{R}^{m \times (n-m)}$. Thus, $Ax = b$ transforms to

$$[\, R_1 \mid R_2 \,] \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = Q^T b, \qquad \Pi^T x = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix},$$

with $z_1 \in \mathbb{R}^m$ and $z_2 \in \mathbb{R}^{n-m}$. By virtue of the column pivoting, $R_1$ is nonsingular because we are assuming that $A$ has full row rank. One solution to the problem is therefore obtained by setting $z_1 = R_1^{-1} Q^T b$ and $z_2 = 0$.
Algorithm 5.6.1 Given $A \in \mathbb{R}^{m \times n}$ with $\mathrm{rank}(A) = m$ and $b \in \mathbb{R}^m$, the following algorithm finds an $x \in \mathbb{R}^n$ such that $Ax = b$.

    Compute the QR-with-column-pivoting factorization $Q^T A \Pi = R$.
    Solve $R(1{:}m, 1{:}m)\, z_1 = Q^T b$.
    Set $x = \Pi \begin{bmatrix} z_1 \\ 0 \end{bmatrix}$.

This algorithm requires $2m^2 n - m^3/3$ flops. The minimum norm solution is not guaranteed. (A different $\Pi$ could render a smaller $z_1$.) However, if we compute the QR factorization

$$A^T = QR = Q \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$$

with $R_1 \in \mathbb{R}^{m \times m}$, then $Ax = b$ becomes

$$R_1^T z_1 = b, \qquad Q^T x = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}, \quad z_1 \in \mathbb{R}^m, \ z_2 \in \mathbb{R}^{n-m}.$$

In this case the minimum norm solution does follow by setting $z_2 = 0$.
Algorithm 5.6.2 Given $A \in \mathbb{R}^{m \times n}$ with $\mathrm{rank}(A) = m$ and $b \in \mathbb{R}^m$, the following algorithm finds the minimum 2-norm solution to $Ax = b$.

    Compute the QR factorization $A^T = QR$.
    Solve $R(1{:}m, 1{:}m)^T z = b$.
    Set $x = Q(:, 1{:}m)\, z$.

This algorithm requires at most $2m^2 n - 2m^3/3$ flops.
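A compact Python sketch of both algorithms follows (function names and test data are illustrative; the full-row-rank assumption is inherited from the text):

    import numpy as np
    from scipy.linalg import qr, solve_triangular

    def underdetermined_basic(A, b):
        # Algorithm 5.6.1 sketch: a (not necessarily minimum norm) solution of Ax = b.
        m, n = A.shape
        Q, R, piv = qr(A, mode='economic', pivoting=True)
        z1 = solve_triangular(R[:, :m], Q.T @ b)          # R1 z1 = Q^T b
        x = np.zeros(n)
        x[piv[:m]] = z1                                   # x = Pi [z1; 0]
        return x

    def underdetermined_minnorm(A, b):
        # Algorithm 5.6.2 sketch: minimum 2-norm solution via the QR factorization of A^T.
        m = A.shape[0]
        Q, R = np.linalg.qr(A.T)                          # A^T = Q R, R is m-by-m
        z = solve_triangular(R.T, b, lower=True)          # R^T z = b
        return Q[:, :m] @ z

    A = np.array([[1.0, 2.0, 3.0], [0.0, 1.0, 1.0]])
    b = np.array([1.0, 2.0])
    print(np.allclose(A @ underdetermined_basic(A, b), b),
          np.allclose(underdetermined_minnorm(A, b), np.linalg.lstsq(A, b, rcond=None)[0]))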
The SVD can also be used to compute the minimum norm solution of an underdetermined $Ax = b$ problem. If

$$A = \sum_{i=1}^{r} \sigma_i u_i v_i^T, \qquad r = \mathrm{rank}(A),$$

is the SVD of $A$, then the minimum norm solution is given by

$$x = \sum_{i=1}^{r} \frac{u_i^T b}{\sigma_i}\, v_i .$$
As in the least squares problem, the SVD approach is desirable if A is nearly rank
deficient.
5.6.3 Perturbed Underdetermined Systems
We conclude this section with a perturbation result for full-rank underdetermined systems.

Theorem 5.6.1. Suppose $\mathrm{rank}(A) = m \le n$ and that $A \in \mathbb{R}^{m \times n}$, $\delta A \in \mathbb{R}^{m \times n}$, $0 \ne b \in \mathbb{R}^m$, and $\delta b \in \mathbb{R}^m$ satisfy

$$\epsilon = \max\{\epsilon_A, \epsilon_b\} < \frac{\sigma_m(A)}{\| A \|_2},$$

where $\epsilon_A = \| \delta A \|_2 / \| A \|_2$ and $\epsilon_b = \| \delta b \|_2 / \| b \|_2$. If $x$ and $\hat x$ are minimum norm solutions that satisfy

$$Ax = b, \qquad (A + \delta A)\hat x = b + \delta b,$$

then

$$\frac{\| \hat x - x \|_2}{\| x \|_2} \le \kappa_2(A) \left( \epsilon_A \min\{2,\, n - m + 1\} + \epsilon_b \right) + O(\epsilon^2).$$

Proof. Let $E$ and $f$ be defined by $\delta A / \epsilon$ and $\delta b / \epsilon$. Note that $\mathrm{rank}(A + tE) = m$ for all $0 \le t \le \epsilon$ and that

$$x(t) = (A + tE)^T \left( (A + tE)(A + tE)^T \right)^{-1} (b + tf)$$

satisfies $(A + tE)x(t) = b + tf$. By differentiating this expression with respect to $t$ and setting $t = 0$ in the result we obtain

$$\dot x(0) = \left( I - A^T (A A^T)^{-1} A \right) E^T (A A^T)^{-1} b - A^T (A A^T)^{-1} E\, x + A^T (A A^T)^{-1} f . \tag{5.6.2}$$

Because

$$\| x \|_2 = \| A^T (A A^T)^{-1} b \|_2 \ge \sigma_m(A)\, \| (A A^T)^{-1} b \|_2$$

and

$$\| I - A^T (A A^T)^{-1} A \|_2 = \min(1,\, n - m),$$

we have

$$\frac{\| \hat x - x \|_2}{\| x \|_2} = \frac{\| x(\epsilon) - x(0) \|_2}{\| x(0) \|_2} = \epsilon\, \frac{\| \dot x(0) \|_2}{\| x \|_2} + O(\epsilon^2) \le \epsilon \left\{ \min(1,\, n - m)\frac{\| E \|_2}{\| A \|_2} + \frac{\| f \|_2}{\| b \|_2} + \frac{\| E \|_2}{\| A \|_2} \right\} \kappa_2(A) + O(\epsilon^2),$$

from which the theorem follows. $\square$

Note that there is no $\kappa_2(A)^2$ factor as in the case of overdetermined systems.
Problems
P5.6.1 Derive equation (5.6.2).

P5.6.2 Find the minimal norm solution to the system $Ax = b$ where $A = [\,1\ \ 2\ \ 3\,]$ and $b = 1$.

P5.6.3 Show how triangular system solving can be avoided when using the QR factorization to solve an underdetermined system.

P5.6.4 Suppose $b, x \in \mathbb{R}^n$ are given and consider the following problems:

(a) Find an unsymmetric Toeplitz matrix $T$ so $Tx = b$.
(b) Find a symmetric Toeplitz matrix $T$ so $Tx = b$.
(c) Find a circulant matrix $C$ so $Cx = b$.

Pose each problem in the form $Ap = b$ where $A$ is a matrix made up of entries from $x$ and $p$ is the vector of sought-after parameters.
Notes and References for §5.6
For an analysis of linear equation solving via QR, see:
N.J. Higham (1991). "Iterative Refinement Enhances the Stability of QR Factorization Methods for Solving Linear Equations," BIT 31, 447-468.
Interesting aspects concerning singular systems are discussed in:
T.F. Chan (1984). "Deflated Decomposition Solutions of Nearly Singular Systems," SIAM J. Numer. Anal. 21, 738-754.
Papers concerned with underdetermined systems include:
R.E. Cline and R.J. Plemmons (1976). "L2-Solutions to Underdetermined Linear Systems," SIAM Review 18, 92-106.
M.G. Cox (1981). "The Least Squares Solution of Overdetermined Linear Equations having Band or Augmented Band Structure," IMA J. Numer. Anal. 1, 3-22.
M. Arioli and A. Laratta (1985). "Error Analysis of an Algorithm for Solving an Underdetermined System," Numer. Math. 46, 255-268.
J.W. Demmel and N.J. Higham (1993). "Improved Error Bounds for Underdetermined System Solvers," SIAM J. Matrix Anal. Applic. 14, 1-14.
S. Jokar and M.E. Pfetsch (2008). "Exact and Approximate Sparse Solutions of Underdetermined Linear Equations," SIAM J. Sci. Comput. 31, 23-44.
The central matrix problem in the emerging field of compressed sensing is to solve an underdetermined system $Ax = b$ such that the 1-norm of $x$ is minimized, see:
E. Candes, J. Romberg, and T. Tao (2006). "Robust Uncertainty Principles: Exact Signal Reconstruction from Highly Incomplete Frequency Information," IEEE Trans. Information Theory 52, 489-509.
D. Donoho (2006). "Compressed Sensing," IEEE Trans. Information Theory 52, 1289-1306.
This strategy tends to produce a highly sparse solution vector x.
Chapter 6
Modified Least Squares
Problems and Methods
6.1 Weighting and Regularization
6.2 Constrained Least Squares
6.3 Total Least Squares
6.4 Subspace Computations with the SVD
6.5 Updating Matrix Factorizations
In this chapter we discuss an assortment of least square problems that can be
solved using QR and SVD. We also introduce a generalization of the SVD that can
be used to simultaneously diagonalize a pair of matrices, a maneuver that is useful in
certain applications.
The first three sections deal with variations of the ordinary least squares problem that we treated in Chapter 5. The unconstrained minimization of $\| Ax - b \|_2$ does not always make a great deal of sense. How do we balance the importance of each equation in $Ax = b$? How might we control the size of $x$ if $A$ is ill-conditioned? How might we minimize $\| Ax - b \|_2$ over a proper subspace of $\mathbb{R}^n$? What if there are errors in the "data matrix" $A$ in addition to the usual errors in the "vector of observations" $b$?
In §6.4 we consider a number of multidimensional subspace computations includ­
ing the problem of determining the principal angles between a pair of given subspaces.
The SVD plays a prominent role.
The final section is concerned with the updating of matrix factorizations. In many
applications, one is confronted with a succession of least squares (or linear equation)
problems where the matrix associated with the current step is highly related to the
matrix associated with the previous step. This opens the door to updating strategies
that can reduce factorization overheads by an order of magnitude.
Reading Notes
Knowledge of Chapter 5 is assumed. The sections in this chapter are independent
of each other except that §6.1 should be read before §6.2. Excellent global references include Bjorck (NMLS) and Lawson and Hanson (SLS).
6.1 Weighting and Regularization
We consider two basic modifications to the linear least squares problem. The first concerns how much each equation "counts" in the $\| Ax - b \|_2$ minimization. Some equations may be more important than others and there are ways to produce approximate minimizers that reflect this. Another situation arises when $A$ is ill-conditioned. Instead of minimizing $\| Ax - b \|_2$ with a possibly wild, large-norm $x$-vector, we settle for a predictor $Ax$ in which $x$ is "nice" according to some regularizing metric.
6.1.1 Row Weighting
In ordinary least squares, the minimization of $\| Ax - b \|_2$ amounts to minimizing the sum of the squared discrepancies in each equation:

$$\| Ax - b \|_2^2 = \sum_{i=1}^{m} (a_i^T x - b_i)^2 .$$

We assume that $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and $a_i^T = A(i, :)$. In the weighted least squares problem the discrepancies are scaled and we solve

$$\min_{x \in \mathbb{R}^n} \| D(Ax - b) \|_2^2 = \min_{x \in \mathbb{R}^n} \sum_{i=1}^{m} d_i^2 (a_i^T x - b_i)^2 \tag{6.1.1}$$

where $D = \mathrm{diag}(d_1, \ldots, d_m)$ is nonsingular. Note that if $x_D$ minimizes this summation, then it minimizes $\| \tilde A x - \tilde b \|_2$ where $\tilde A = DA$ and $\tilde b = Db$. Although there can be numerical issues associated with disparate weight values, it is generally possible to solve the weighted least squares problem by applying any Chapter 5 method to the "tilde problem." For example, if $A$ has full column rank and we apply the method of normal equations, then we are led to the following positive definite system:

$$(A^T D^2 A)\, x_D = A^T D^2 b. \tag{6.1.2}$$

Subtracting the unweighted system $A^T A x_{LS} = A^T b$ we see that

$$x_D - x_{LS} = (A^T D^2 A)^{-1} A^T (D^2 - I)(b - A x_{LS}). \tag{6.1.3}$$

Note that weighting has less effect if $b$ is almost in the range of $A$.
At the component level, increasing $d_k$ relative to the other weights stresses the importance of the $k$th equation and the resulting residual $r = b - A x_D$ tends to be smaller in that component. To make this precise, define

$$D(\delta) = \mathrm{diag}(d_1, \ldots, d_{k-1},\ d_k\sqrt{1+\delta},\ d_{k+1}, \ldots, d_m)$$

where $\delta > -1$. Assume that $x(\delta)$ minimizes $\| D(\delta)(Ax - b) \|_2$ and set

$$r_k(\delta) = e_k^T (b - A x(\delta)) = b_k - a_k^T (A^T D(\delta)^2 A)^{-1} A^T D(\delta)^2 b$$

where $e_k = I_m(:, k)$. We show that the penalty for disagreement between $a_k^T x$ and $b_k$ increases with $\delta$. Since

$$\frac{d}{d\delta}\left[ D(\delta)^2 \right] = d_k^2\, e_k e_k^T$$

and

$$\frac{d}{d\delta}\left[ (A^T D(\delta)^2 A)^{-1} \right] = -(A^T D(\delta)^2 A)^{-1} \left( A^T (d_k^2 e_k e_k^T) A \right) (A^T D(\delta)^2 A)^{-1},$$

it can be shown that

$$\frac{d}{d\delta} r_k(\delta) = -d_k^2 \left( a_k^T (A^T D(\delta)^2 A)^{-1} a_k \right) r_k(\delta). \tag{6.1.4}$$

Assuming that $A$ has full rank, the matrix $(A^T D(\delta)^2 A)^{-1}$ is positive definite and so

$$\frac{d}{d\delta}\left[ r_k(\delta)^2 \right] = 2 r_k(\delta) \cdot \frac{d}{d\delta} r_k(\delta) = -2 d_k^2 \left( a_k^T (A^T D(\delta)^2 A)^{-1} a_k \right) r_k(\delta)^2 \le 0.$$

It follows that $|r_k(\delta)|$ is a monotone decreasing function of $\delta$. Of course, the change in $r_k$ when all the weights are varied at the same time is much more complicated.

Before we move on to a more general type of row weighting, we mention that (6.1.1) can be framed as a symmetric indefinite linear system. In particular, if

$$\begin{bmatrix} D^{-2} & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} r \\ x \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix}, \tag{6.1.5}$$

then $x$ minimizes (6.1.1). Compare with (5.3.20).
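A tiny NumPy sketch of row weighting by solving the "tilde problem" follows (function name and data are illustrative); it also exhibits the monotone shrinkage of the weighted residual component discussed above:

    import numpy as np

    def weighted_ls(A, b, d):
        # Minimize || D(Ax - b) ||_2 with D = diag(d) by scaling rows and
        # applying an ordinary LS solver to DA and Db.
        D = np.asarray(d, dtype=float)
        x, *_ = np.linalg.lstsq(D[:, None] * A, D * b, rcond=None)
        return x

    A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0, 2.0])
    for w in (1.0, 10.0, 100.0):
        x = weighted_ls(A, b, [1.0, 1.0, w])
        print(w, b - A @ x)     # the third residual component shrinks as its weight grows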
6.1.2 Generalized Least Squares
In statistical data-fitting applications, the weights in (6.1.1) are often chosen to increase the relative importance of accurate measurements. For example, suppose the vector of observations $b$ has the form $b_{\mathrm{true}} + \Delta$ where $\Delta_i$ is normally distributed with mean zero and standard deviation $\sigma_i$. If the errors are uncorrelated, then it makes statistical sense to minimize (6.1.1) with $d_i = 1/\sigma_i$.

In more general estimation problems, the vector $b$ is related to $x$ through the equation

$$b = Ax + w \tag{6.1.6}$$

where the noise vector $w$ has zero mean and a symmetric positive definite covariance matrix $\sigma^2 W$. Assume that $W$ is known and that $W = B B^T$ for some $B \in \mathbb{R}^{m \times m}$. The matrix $B$ might be given or it might be $W$'s Cholesky triangle. In order that all the equations in (6.1.6) contribute equally to the determination of $x$, statisticians frequently solve the LS problem

$$\min_{x \in \mathbb{R}^n} \| B^{-1}(Ax - b) \|_2 . \tag{6.1.7}$$

An obvious computational approach to this problem is to form $\tilde A = B^{-1}A$ and $\tilde b = B^{-1}b$ and then apply any of our previous techniques to minimize $\| \tilde A x - \tilde b \|_2$. Unfortunately, if $B$ is ill-conditioned, then $x$ will be poorly determined by such a procedure.

A more stable way of solving (6.1.7) using orthogonal transformations has been suggested by Paige (1979a, 1979b). It is based on the idea that (6.1.7) is equivalent to the generalized least squares problem,

$$\min_{b = Ax + Bv} v^T v . \tag{6.1.8}$$
Notice that this problem is defined even if $A$ and $B$ are rank deficient. Although the Paige technique can be applied when this is the case, we shall describe it under the assumption that both these matrices have full rank.

The first step is to compute the QR factorization of $A$:

$$Q^T A = \begin{bmatrix} R_1 \\ 0 \end{bmatrix} \begin{matrix} n \\ m-n \end{matrix}, \qquad Q = [\, Q_1 \mid Q_2 \,], \quad Q_1 \in \mathbb{R}^{m \times n}, \ Q_2 \in \mathbb{R}^{m \times (m-n)}.$$

Next, an orthogonal matrix $Z \in \mathbb{R}^{m \times m}$ is determined such that

$$(Q_2^T B)\, Z = [\, 0 \mid S \,], \qquad Z = [\, Z_1 \mid Z_2 \,], \quad Z_1 \in \mathbb{R}^{m \times n}, \ Z_2 \in \mathbb{R}^{m \times (m-n)},$$

where $S$ is upper triangular. With the use of these orthogonal matrices, the constraint in (6.1.8) transforms to

$$\begin{bmatrix} Q_1^T b \\ Q_2^T b \end{bmatrix} = \begin{bmatrix} R_1 \\ 0 \end{bmatrix} x + \begin{bmatrix} Q_1^T B Z_1 & Q_1^T B Z_2 \\ 0 & S \end{bmatrix} \begin{bmatrix} Z_1^T v \\ Z_2^T v \end{bmatrix}.$$

The bottom half of this equation determines $v$ while the top half prescribes $x$:

$$S u = Q_2^T b, \qquad v = Z_2 u, \tag{6.1.9}$$

$$R_1 x = Q_1^T b - (Q_1^T B Z_1 Z_1^T + Q_1^T B Z_2 Z_2^T)\, v = Q_1^T b - Q_1^T B Z_2 u. \tag{6.1.10}$$

The attractiveness of this method is that all potential ill-conditioning is concentrated in the triangular systems (6.1.9) and (6.1.10). Moreover, Paige (1979b) shows that the above procedure is numerically stable, something that is not true of any method that explicitly forms $B^{-1}A$.
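For reference, here is a sketch of the straightforward "whitening" route mentioned above (not Paige's stable algorithm): with $W = BB^T$ and $B$ the Cholesky factor, the scaled problem is handed to an ordinary LS solver. The function name is illustrative, and the caveat in the text applies: accuracy can suffer when $B$ is ill-conditioned.

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    def gls_whitened(A, b, W):
        # Minimize || B^{-1}(Ax - b) ||_2 where W = B B^T, by explicit whitening.
        # This is the "obvious" approach; Paige's orthogonalization method is the
        # numerically stable alternative described in the text.
        B = cholesky(W, lower=True)
        At = solve_triangular(B, A, lower=True)    # B^{-1} A
        bt = solve_triangular(B, b, lower=True)    # B^{-1} b
        x, *_ = np.linalg.lstsq(At, bt, rcond=None)
        return x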
6.1.3 A Note on Column Weighting
Suppose $G \in \mathbb{R}^{n \times n}$ is nonsingular and define the $G$-norm $\| \cdot \|_G$ on $\mathbb{R}^n$ by $\| x \|_G = \| Gx \|_2$. If $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and we compute the minimum 2-norm solution $y_{LS}$ to

$$\min_{y \in \mathbb{R}^n} \| (A G^{-1})\, y - b \|_2 ,$$

then $x_G = G^{-1} y_{LS}$ is a minimizer of $\| Ax - b \|_2$. If $\mathrm{rank}(A) < n$, then within the set of minimizers, $x_G$ has the smallest $G$-norm.

The choice of $G$ is important. Sometimes its selection can be based upon a priori knowledge of the uncertainties in $A$. On other occasions, it may be desirable to normalize the columns of $A$ by setting

$$G = G_0 = \mathrm{diag}(\| A(:,1) \|_2, \ldots, \| A(:,n) \|_2).$$

Van der Sluis (1969) has shown that with this choice, $\kappa_2(A G^{-1})$ is approximately minimized. Since the computed accuracy of $y_{LS}$ depends on $\kappa_2(A G^{-1})$, a case can be made for setting $G = G_0$.

We remark that column weighting affects singular values. Consequently, a scheme for determining numerical rank may not return the same estimate when applied to $A$ and $A G_0^{-1}$. See Stewart (1984).
6.1.4 Ridge Regression
In the ridge regression problem we are given $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$ and proceed to solve

$$\min_{x} \; \| Ax - b \|_2^2 + \lambda \| x \|_2^2 , \tag{6.1.11}$$

where the value of the ridge parameter $\lambda$ is chosen to "shape" the solution $x = x(\lambda)$ in some meaningful way. Notice that the normal equation system for this problem is given by

$$(A^T A + \lambda I)\, x = A^T b. \tag{6.1.12}$$

It follows that if

$$A = U \Sigma V^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T \tag{6.1.13}$$

is the SVD of $A$, then (6.1.12) converts to

$$(\sigma_i^2 + \lambda)(v_i^T x) = \sigma_i (u_i^T b), \qquad i = 1{:}n,$$

and so

$$x(\lambda) = \sum_{i=1}^{r} \frac{\sigma_i\, u_i^T b}{\sigma_i^2 + \lambda}\, v_i . \tag{6.1.14}$$

By inspection, it is clear that

$$\lim_{\lambda \rightarrow 0} x(\lambda) = x_{LS}$$

and $\| x(\lambda) \|_2$ is a monotone decreasing function of $\lambda$. These two facts show how an ill-conditioned least squares solution can be regularized by judiciously choosing $\lambda$. The idea is to get sufficiently close to $x_{LS}$ subject to the constraint that the norm of the ridge regression minimizer $x(\lambda)$ is sufficiently modest. Regularization in this context is all about the intelligent balancing of these two tensions.
The ridge parameter can also be chosen with an eye toward balancing the "impact" of each equation in the overdetermined system $Ax = b$. We describe a $\lambda$-selection procedure due to Golub, Heath, and Wahba (1979). Set

$$D_k = I - e_k e_k^T = \mathrm{diag}(1, \ldots, 1, 0, 1, \ldots, 1) \in \mathbb{R}^{m \times m}$$

and let $x_k(\lambda)$ solve

$$\min_{x \in \mathbb{R}^n} \; \| D_k(Ax - b) \|_2^2 + \lambda \| x \|_2^2 . \tag{6.1.15}$$

Thus, $x_k(\lambda)$ is the solution to the ridge regression problem with the $k$th row of $A$ and $k$th component of $b$ deleted, i.e., the $k$th equation in the overdetermined system $Ax = b$ is deleted. Now consider choosing $\lambda$ so as to minimize the cross-validation weighted square error $C(\lambda)$ defined by

$$C(\lambda) = \frac{1}{m} \sum_{k=1}^{m} w_k \left( a_k^T x_k(\lambda) - b_k \right)^2 .$$

Here, $w_1, \ldots, w_m$ are nonnegative weights and $a_k^T$ is the $k$th row of $A$. Noting that

$$\| A x_k(\lambda) - b \|_2^2 = \| D_k (A x_k(\lambda) - b) \|_2^2 + \left( a_k^T x_k(\lambda) - b_k \right)^2 ,$$

we see that $(a_k^T x_k(\lambda) - b_k)^2$ is the increase in the sum of squares that results when the $k$th row is "reinstated." Minimizing $C(\lambda)$ is tantamount to choosing $\lambda$ such that the final model is not overly dependent on any one experiment.
A more rigorous analysis can make this statement precise and also suggest a method for minimizing $C(\lambda)$. Assuming that $\lambda > 0$, an algebraic manipulation shows that

$$x_k(\lambda) = x(\lambda) + \frac{a_k^T x(\lambda) - b_k}{1 - z_k^T a_k}\, z_k \tag{6.1.16}$$

where $z_k = (A^T A + \lambda I)^{-1} a_k$ and $x(\lambda) = (A^T A + \lambda I)^{-1} A^T b$. Applying $-a_k^T$ to (6.1.16) and then adding $b_k$ to each side of the resulting equation gives

$$b_k - a_k^T x_k(\lambda) = \frac{e_k^T \left( I - A(A^T A + \lambda I)^{-1} A^T \right) b}{e_k^T \left( I - A(A^T A + \lambda I)^{-1} A^T \right) e_k} . \tag{6.1.17}$$

Noting that the residual $r = [\, r_1, \ldots, r_m \,]^T = b - A x(\lambda)$ is given by the formula

$$r = \left( I - A(A^T A + \lambda I)^{-1} A^T \right) b,$$

we see that

$$C(\lambda) = \frac{1}{m} \sum_{k=1}^{m} w_k \left( \frac{r_k}{\partial r_k / \partial b_k} \right)^2 . \tag{6.1.18}$$

The quotient $r_k / (\partial r_k / \partial b_k)$ may be regarded as an inverse measure of the "impact" of the $k$th observation $b_k$ on the model. If $\partial r_k / \partial b_k$ is small, then this says that the error in the model's prediction of $b_k$ is somewhat independent of $b_k$. The tendency for this to be true is lessened by basing the model on the $\lambda^*$ that minimizes $C(\lambda)$.
The actual determination of $\lambda^*$ is simplified by computing the SVD of $A$. Using the SVD (6.1.13) and Equations (6.1.17) and (6.1.18), it can be shown that

$$C(\lambda) = \frac{1}{m} \sum_{k=1}^{m} w_k \left( \frac{\displaystyle b_k - \sum_{i=1}^{n} u_{ki}\, \frac{\sigma_i^2}{\sigma_i^2 + \lambda}\, \tilde b_i}{\displaystyle 1 - \sum_{i=1}^{n} u_{ki}^2\, \frac{\sigma_i^2}{\sigma_i^2 + \lambda}} \right)^{\!2} \tag{6.1.19}$$

where $\tilde b = U^T b$. The minimization of this expression is discussed in Golub, Heath, and Wahba (1979).
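A short NumPy sketch of formula (6.1.14) follows: one SVD yields the whole family of ridge solutions $x(\lambda)$, whose norms shrink as $\lambda$ grows. The function name and random test data are illustrative only.

    import numpy as np

    def ridge_path(A, b, lambdas):
        # Ridge solutions x(lambda) from a single SVD, following (6.1.14):
        # x(lambda) = sum_i sigma_i (u_i^T b) / (sigma_i^2 + lambda) * v_i
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        c = U.T @ b
        return [Vt.T @ (s * c / (s**2 + lam)) for lam in lambdas]

    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 5))
    b = rng.standard_normal(20)
    lams = (0.0, 0.1, 1.0, 10.0)
    for lam, x in zip(lams, ridge_path(A, b, lams)):
        print(lam, np.linalg.norm(x))       # ||x(lambda)||_2 decreases monotonically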
6.1.5 Tikhonov Regularization
In the Tikhonov regularization problem, we are given $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times n}$, and $b \in \mathbb{R}^m$ and solve

$$\min_{x} \; \| Ax - b \|_2^2 + \lambda \| Bx \|_2^2 . \tag{6.1.20}$$

The normal equations for this problem have the form

$$(A^T A + \lambda B^T B)\, x = A^T b. \tag{6.1.21}$$

This system is nonsingular if $\mathrm{null}(A) \cap \mathrm{null}(B) = \{0\}$. The matrix $B$ can be chosen in several ways. For example, in certain data-fitting applications second-derivative smoothness can be promoted by setting $B$ equal to the second difference matrix defined in Equation (4.8.7).

To analyze how $A$ and $B$ interact in the Tikhonov problem, it would be handy to transform (6.1.21) into an equivalent diagonal problem. For the ridge regression problem ($B = I_n$) the SVD accomplishes this task. For the Tikhonov problem, we need a generalization of the SVD that simultaneously diagonalizes both $A$ and $B$.
6.1.6 The Generalized Singular Value Decomposition
The generalized singular value decomposition (GSVD) set forth in Van Loan (1974) provides a useful way to simplify certain two-matrix problems such as the Tikhonov regularization problem.
Theorem 6.1.1 (Generalized Singular Value Decomposition). Assume that $A \in \mathbb{R}^{m_1 \times n_1}$ and $B \in \mathbb{R}^{m_2 \times n_1}$ with $m_1 \ge n_1$ and

$$r = \mathrm{rank}\!\left( \begin{bmatrix} A \\ B \end{bmatrix} \right).$$

There exist orthogonal $U_1 \in \mathbb{R}^{m_1 \times m_1}$ and $U_2 \in \mathbb{R}^{m_2 \times m_2}$ and invertible $X \in \mathbb{R}^{n_1 \times n_1}$ such that

$$U_1^T A X = D_A = \begin{bmatrix} I_p & 0 & 0 \\ 0 & \mathrm{diag}(\alpha_{p+1}, \ldots, \alpha_r) & 0 \\ 0 & 0 & 0 \end{bmatrix} \in \mathbb{R}^{m_1 \times n_1}, \tag{6.1.22}$$

$$U_2^T B X = D_B = \begin{bmatrix} 0 & 0 & 0 \\ 0 & \mathrm{diag}(\beta_{p+1}, \ldots, \beta_r) & 0 \\ 0 & 0 & 0 \end{bmatrix} \in \mathbb{R}^{m_2 \times n_1}, \tag{6.1.23}$$

where the column blocks have widths $p$, $r-p$, and $n_1 - r$, and $p = \max\{r - m_2,\, 0\}$.
Proof. The proof makes use of the SVD and the CS decomposition (Theorem 2.5.3). Let

$$\begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix} \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} Z^T \tag{6.1.24}$$

be the SVD where $\Sigma_r \in \mathbb{R}^{r \times r}$ is nonsingular, $Q_{11} \in \mathbb{R}^{m_1 \times r}$, and $Q_{21} \in \mathbb{R}^{m_2 \times r}$. Using the CS decomposition, there exist orthogonal matrices $U_1$ ($m_1$-by-$m_1$), $U_2$ ($m_2$-by-$m_2$), and $V_1$ ($r$-by-$r$) such that

$$U_1^T Q_{11} V_1 = D_A(:, 1{:}r), \qquad U_2^T Q_{21} V_1 = D_B(:, 1{:}r), \tag{6.1.25}$$

where $D_A$ and $D_B$ have the forms specified by (6.1.22) and (6.1.23). It follows from (6.1.24) and (6.1.25) that

$$\begin{bmatrix} U_1^T A \\ U_2^T B \end{bmatrix} Z = \begin{bmatrix} D_A(:, 1{:}r)\, V_1^T \Sigma_r & 0 \\ D_B(:, 1{:}r)\, V_1^T \Sigma_r & 0 \end{bmatrix}.$$

By setting

$$X = Z \begin{bmatrix} \Sigma_r^{-1} V_1 & 0 \\ 0 & I_{n_1 - r} \end{bmatrix}$$

the proof is complete. $\square$
Note that if $B = I_{n_1}$ and we set $X = U_2$, then we obtain the SVD of $A$. The GSVD is related to the generalized eigenvalue problem

$$A^T A x = \mu^2 B^T B x$$

which is considered in §8.7.4. As with the SVD, algorithmic issues cannot be addressed until we develop procedures for the symmetric eigenvalue problem in Chapter 8.

To illustrate the insight that can be provided by the GSVD, we return to the Tikhonov regularization problem (6.1.20). If $B$ is square and nonsingular, then the GSVD defined by (6.1.22) and (6.1.23) transforms the system (6.1.21) to

$$(D_A^T D_A + \lambda D_B^T D_B)\, y = D_A^T \tilde b$$

where $x = X y$, $\tilde b = U_1^T b$, and

$$D_A^T D_A + \lambda D_B^T D_B = \mathrm{diag}(\alpha_1^2 + \lambda\beta_1^2, \ldots, \alpha_n^2 + \lambda\beta_n^2).$$

Thus, if $X = [\, x_1 \mid \cdots \mid x_n \,]$ is a column partitioning, then

$$x(\lambda) = \sum_{i=1}^{n} \frac{\alpha_i\, \tilde b_i}{\alpha_i^2 + \lambda\beta_i^2}\, x_i \tag{6.1.26}$$

solves (6.1.20). The "calming influence" of the regularization is revealed through this representation. Use of $\lambda$ to manage "trouble" in the direction of $x_k$ depends on the values of $\alpha_k$ and $\beta_k$.
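For completeness, one practical way to compute $x(\lambda)$ without forming the GSVD is to solve the equivalent stacked LS problem; the GSVD above remains the analytical tool. The following sketch (function name illustrative) exploits the identity $\| [A; \sqrt{\lambda}B]x - [b; 0] \|_2^2 = \| Ax - b \|_2^2 + \lambda\| Bx \|_2^2$, whose normal equations are exactly (6.1.21):

    import numpy as np

    def tikhonov(A, b, B, lam):
        # Solve min ||Ax - b||_2^2 + lam*||Bx||_2^2 via the stacked LS problem
        # min || [A; sqrt(lam) B] x - [b; 0] ||_2.
        Aa = np.vstack([A, np.sqrt(lam) * B])
        ba = np.concatenate([b, np.zeros(B.shape[0])])
        x, *_ = np.linalg.lstsq(Aa, ba, rcond=None)
        return x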
Problems
P6.1.1 Verify (6.1.4).

P6.1.2 What is the inverse of the matrix in (6.1.5)?

P6.1.3 Show how the SVD can be used to solve the generalized LS problem (6.1.8) if the matrices $A$ and $B$ are rank deficient.

P6.1.4 Suppose $A$ is the $m$-by-1 matrix of 1's and let $b \in \mathbb{R}^m$. Determine the optimal $\lambda$ prescribed by the cross-validation technique with unit weights, expressing it in terms of $\bar b = (b_1 + \cdots + b_m)/m$ and

$$s = \sum_{i=1}^{m} (b_i - \bar b)^2 / (m - 1).$$

P6.1.5 Using the GSVD, give bounds for $\| x(\lambda) - x(0) \|_2$ and $\| A x(\lambda) - b \|_2^2 - \| A x(0) - b \|_2^2$ where $x(\lambda)$ is defined by (6.1.26).
Notes and References for §6.1
Row and column weighting in the LS problem is discussed in Lawson and Hanson (SLS, pp. 180-88).
Other analyses include:
A. van der Sluis (1969). "Condition Numbers and Equilibration of Matrices," Numer. Math. 14,
14-23.
G.W. Stewart (1984). "On the Asymptotic Behavior of Scaled Singular Value and QR Decomposi­
tions," Math. Comput. 43, 483-490.
A. Forsgren (1996). "On Linear Least-Squares Problems with Diagonally Dominant Weight Matrices,"
SIAM J. Matrix Anal. Applic. 1 7, 763-788.
P.D. Hough and S.A. Vavasis (1997). "Complete Orthogonal Decomposition for Weighted Least
Squares," SIAM J. Matrix Anal. Applic. 18, 551-555.
J.K. Reid (2000). "Implicit Scaling of Linear Least Squares Problems,'' BIT 40, 146-157.
For a discussion of cross-validation issues, see:
G.H. Golub, M. Heath, and G. Wahba (1979). "Generalized Cross-Validation as a Method for Choosing a Good Ridge Parameter," Technometrics 21, 215-223.
L. Elden (1985). "A Note on the Computation of the Generalized Cross-Validation Function for
Ill-Conditioned Least Squares Problems,'' BIT 24, 467-472.
Early references concerned with the generalized singular value decomposition include:
C.F. Van Loan (1976). "Generalizing the Singular Value Decomposition,'' SIAM J. Numer. Anal.
13, 76-83.
C.C. Paige and M.A. Saunders (1981). "Towards A Generalized Singular Value Decomposition," SIAM
J. Numer. Anal. 18, 398-405.
The theoretical and computational aspects of the generalized least squares problem appear in:
C.C. Paige (1979). "Fast Numerically Stable Computations for Generalized Linear Least Squares
Problems," SIAM J. Numer. Anal. 16, 165-171.
C.C. Paige (1979b). "Computer Solution and Perturbation Analysis of Generalized Least Squares Problems," Math. Comput. 33, 171-184.
S. Kourouklis and C.C. Paige (1981). "A Constrained Least Squares Approach to the General Gauss-Markov Linear Model," J. Amer. Stat. Assoc. 76, 620-625.
C.C. Paige (1985). "The General Limit Model and the Generalized Singular Value Decomposition,"
Lin. Alg. Applic. 70, 269-284.
Generalized factorizations have an important bearing on generalized least squares problems, see:
C.C. Paige (1990). "Some Aspects of Generalized QR Factorization," in Reliable Numerical Compu­
tations, M. Cox and S. Hammarling (eds.), Clarendon Press, Oxford.
E. Anderson, z. Bai, and J. Dongarra (1992). "Generalized QR Factorization and Its Applications,"
Lin. Alg. Applic. 162/163/164, 243-271.
The development of regularization techniques has a long history, see:
L. Elden (1977). "Algorithms for the Regularization of Ill-Conditioned Least Squares Problems," BIT 17, 134-145.
D.P. O'Leary and J.A. Simmons (1981). "A Bidiagonalization-Regularization Procedure for Large
Scale Discretizations of Ill-Posed Problems," SIAM J. Sci. Stat. Comput. 2, 474 -489.
L. Elden (1984). "An Algorithm for the Regularization of Ill-Conditioned, Banded Least Squares
Problems," SIAM J. Sci. Stat. Comput. 5, 237-254.
P.C. Hansen (1990). "Relations Between SVD and GSVD of Discrete Regularization Problems in Standard and General Form," Lin. Alg. Applic. 141, 165-176.
P.C. Hansen (1995). "Test Matrices for Regularization Methods," SIAM J. Sci. Comput. 16, 506-512.
A. Neumaier (1998). "Solving Ill-Conditioned and Singular Linear Systems: A Tutorial on Regularization," SIAM Review 40, 636-666.
P.C. Hansen (1998). Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear
Inversion, SIAM Publications, Philadelphia, PA.
M.E. Gulliksson and P.-A. Wedin (2000). "The Use and Properties of Tikhonov Filter Matrices,"
SIAM J. Matrix Anal. Applic. 22, 276-281.
M.E. Gulliksson, P.-A. Wedin, and Y. Wei (2000). "Perturbation Identities for Regularized Tikhonov
Inverses and Weighted Pseudoinverses," BIT 40, 513-523.
T. Kitagawa, S. Nakata, and Y. Hosoda (2001). "Regularization Using QR Factorization and the
Estimation of the Optimal Parameter," BIT 41, 1049-1058.
M.E. Kilmer and D.P. O'Leary. (2001). "Choosing Regularization Parameters in Iterative Methods
for Ill-Posed Problems," SIAM J. Matrix Anal. Applic. 22, 1204-1221.
A. N. Malyshev (2003). "A Unified Theory of Conditioning for Linear Least Squares and Tikhonov
Regularization Solutions," SIAM J. Matrix Anal. Applic. 24, 1186-1196.
M. Hanke (2006). "A Note on Tikhonov Regularization of Large Linear Problems," BIT 43, 449-451.
P.C. Hansen, J.C. Nagy, and D.P. O'Leary (2006). Deblurring Images: Matrices, Spectra, and Filtering, SIAM Publications, Philadelphia, PA.
M.E. Kilmer, P.C. Hansen, and M.I. Espanol (2007). "A Projection-Based Approach to General-Form
Tikhonov Regularization," SIAM J. Sci. Comput. 29, 315-330.
T. Elfving and I. Skoglund (2009). "A Direct Method for a Regularized Least-Squares Problem,"
Num. Lin. Alg. Applic. 16, 649-675.
I. Hnetynkova and M. Plesinger (2009). "The Regularizing Effect of the Golub-Kahan Iterative Bidiagonalization and Revealing the Noise Level in Data," BIT 49, 669-696.
P.C. Hansen (2010). Discrete Inverse Problems: Insight and Algorithms, SIAM Publications, Philadel­
phia, PA.
6.2 Constrained Least Squares
In the least squares setting it is sometimes natural to minimize II Ax - b 112 over a
proper subset of IRn. For example, we may wish to predict b as best we can with Ax
subject to the constraint that x is a unit vector. Or perhaps the solution defines a
fitting function f(t) which is to have prescribed values at certain points. This can lead
to an equality-constrained least squares problem. In this section we show how these
problems can be solved using the QR factorization, the SVD, and the GSVD.
6.2.1 Least Squares Minimization Over a Sphere
Given $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and a positive $\alpha \in \mathbb{R}$, we consider the problem

$$\min_{\| x \|_2 \le \alpha} \| Ax - b \|_2 . \tag{6.2.1}$$

This is an example of the LSQI (least squares with quadratic inequality constraint) problem. This problem arises in nonlinear optimization and other application areas. As we are soon to observe, the LSQI problem is related to the ridge regression problem discussed in §6.1.4.
Suppose

$$A = U \Sigma V^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T \tag{6.2.2}$$

is the SVD of $A$, which we assume to have rank $r$. If the unconstrained minimum norm solution

$$x_{LS} = \sum_{i=1}^{r} \frac{u_i^T b}{\sigma_i}\, v_i$$

satisfies $\| x_{LS} \|_2 \le \alpha$, then it obviously solves (6.2.1). Otherwise,

$$\| x_{LS} \|_2^2 = \sum_{i=1}^{r} \left( \frac{u_i^T b}{\sigma_i} \right)^2 > \alpha^2 , \tag{6.2.3}$$

and it follows that the solution to (6.2.1) is on the boundary of the constraint sphere. Thus, we can approach this constrained optimization problem using the method of Lagrange multipliers. Define the parameterized objective function $\phi$ by

$$\phi(x, \lambda) = \| Ax - b \|_2^2 + \lambda\left( \| x \|_2^2 - \alpha^2 \right)$$

and equate its gradient with respect to $x$ to zero. This gives a shifted normal equation system:

$$(A^T A + \lambda I)\, x = A^T b, \qquad x(\lambda) = (A^T A + \lambda I)^{-1} A^T b.$$

The goal is to choose $\lambda$ so that $\| x(\lambda) \|_2 = \alpha$. Using the SVD (6.2.2), this leads to the problem of finding a zero of the function

$$f(\lambda) = \| x(\lambda) \|_2^2 - \alpha^2 = \sum_{k=1}^{n} \left( \frac{\sigma_k\, u_k^T b}{\sigma_k^2 + \lambda} \right)^2 - \alpha^2 .$$

This is an example of a secular equation problem. From (6.2.3), $f(0) > 0$. Since $f'(\lambda) < 0$ for $\lambda \ge 0$, it follows that $f$ has a unique positive root $\lambda_{+}$. It can be shown that

$$\rho(\lambda) = \| A x(\lambda) - b \|_2^2 = \| A x_{LS} - b \|_2^2 + \sum_{i=1}^{r} \left( \frac{\lambda\, u_i^T b}{\sigma_i^2 + \lambda} \right)^2 . \tag{6.2.4}$$

It follows that $x(\lambda_{+})$ solves (6.2.1).
Algorithm 6.2.1 Given $A \in \mathbb{R}^{m \times n}$ with $m \ge n$, $b \in \mathbb{R}^m$, and $\alpha > 0$, the following algorithm computes a vector $x \in \mathbb{R}^n$ such that $\| Ax - b \|_2$ is minimum subject to the constraint that $\| x \|_2 \le \alpha$.

    Compute the SVD $A = U \Sigma V^T$, save $V = [\, v_1 \mid \cdots \mid v_n \,]$, form $\tilde b = U^T b$, and determine $r = \mathrm{rank}(A)$.
    if $\sum_{i=1}^{r} (\tilde b_i / \sigma_i)^2 \le \alpha^2$
        $x = \sum_{i=1}^{r} (\tilde b_i / \sigma_i)\, v_i$
    else
        find $\lambda_{+} > 0$ so that $\sum_{i=1}^{r} \left( \sigma_i \tilde b_i / (\sigma_i^2 + \lambda_{+}) \right)^2 = \alpha^2$ and set $x = \sum_{i=1}^{r} \dfrac{\sigma_i \tilde b_i}{\sigma_i^2 + \lambda_{+}}\, v_i$
    end

The SVD is the dominant computation in this algorithm.
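A Python sketch of this strategy follows; the function name, the rank tolerance, and the bracketing-plus-brentq root finder for the secular equation are illustrative choices, not part of the algorithm as stated in the text:

    import numpy as np
    from scipy.optimize import brentq

    def lsqi_sphere(A, b, alpha):
        # Minimize ||Ax - b||_2 subject to ||x||_2 <= alpha (sphere-constrained LS).
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        r = int(np.sum(s > 1e-12 * s[0]))
        c = U[:, :r].T @ b
        x_ls = Vt[:r, :].T @ (c / s[:r])
        if np.linalg.norm(x_ls) <= alpha:
            return x_ls                          # unconstrained minimizer is already feasible
        f = lambda lam: np.sum((s[:r] * c / (s[:r]**2 + lam))**2) - alpha**2
        hi = 1.0
        while f(hi) > 0:                         # bracket the unique positive root of f
            hi *= 10.0
        lam = brentq(f, 0.0, hi)                 # solve the secular equation f(lambda) = 0
        return Vt[:r, :].T @ (s[:r] * c / (s[:r]**2 + lam))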
6.2.2 More General Quadratic Constraints
A more general version of (6.2.1) results if we minimize $\| Ax - b \|_2$ over an arbitrary hyperellipsoid:

$$\text{minimize } \| Ax - b \|_2 \quad \text{subject to} \quad \| Bx - d \|_2 \le \alpha. \tag{6.2.5}$$

Here we are assuming that $A \in \mathbb{R}^{m_1 \times n_1}$, $b \in \mathbb{R}^{m_1}$, $B \in \mathbb{R}^{m_2 \times n_1}$, $d \in \mathbb{R}^{m_2}$, and $\alpha \ge 0$. Just as the SVD turns (6.2.1) into an equivalent diagonal problem, we can use the GSVD to transform (6.2.5) into a diagonal problem. In particular, if the GSVD of $A$ and $B$ is given by (6.1.22) and (6.1.23), then (6.2.5) is equivalent to

$$\text{minimize } \| D_A y - \tilde b \|_2 \quad \text{subject to} \quad \| D_B y - \tilde d \|_2 \le \alpha \tag{6.2.6}$$

where

$$\tilde b = U_1^T b, \qquad \tilde d = U_2^T d, \qquad y = X^{-1} x.$$
The simple form of the objective function and the constraint equation facilitate the analysis. For example, if $\mathrm{rank}(B) = m_2 < n_1$, then

$$\| D_A y - \tilde b \|_2^2 = \sum_{i=1}^{n_1} (\alpha_i y_i - \tilde b_i)^2 + \sum_{i=n_1+1}^{m_1} \tilde b_i^2 \tag{6.2.7}$$

and

$$\| D_B y - \tilde d \|_2^2 = \sum_{i=1}^{m_2} (\beta_i y_i - \tilde d_i)^2 . \tag{6.2.8}$$

A Lagrange multiplier argument can be used to determine the solution to this transformed problem (if it exists).
6.2.3 Least Squares With Equality Constraints
We consider next the constrained least squares problem

$$\min_{Bx = d} \| Ax - b \|_2 \tag{6.2.9}$$

where $A \in \mathbb{R}^{m_1 \times n_1}$ with $m_1 \ge n_1$, $B \in \mathbb{R}^{m_2 \times n_1}$ with $m_2 < n_1$, $b \in \mathbb{R}^{m_1}$, and $d \in \mathbb{R}^{m_2}$. We refer to this as the LSE problem (least squares with equality constraints). By setting $\alpha = 0$ in (6.2.5) we see that the LSE problem is a special case of the LSQI problem. However, it is simpler to approach the LSE problem directly rather than through Lagrange multipliers.
For clarity, we assume that both $A$ and $B$ have full rank. Let

$$B^T = QR, \qquad Q \in \mathbb{R}^{n_1 \times n_1} \text{ orthogonal}, \quad R \in \mathbb{R}^{n_1 \times m_2},$$

be the QR factorization of $B^T$ and set

$$AQ = [\, A_1 \mid A_2 \,], \qquad A_1 \in \mathbb{R}^{m_1 \times m_2}, \quad A_2 \in \mathbb{R}^{m_1 \times (n_1 - m_2)}.$$

It is clear that with these transformations (6.2.9) becomes

$$\min_{R(1:m_2,1:m_2)^T y = d} \| A_1 y + A_2 z - b \|_2, \qquad Q^T x = \begin{bmatrix} y \\ z \end{bmatrix}.$$

Thus, $y$ is determined from the constraint equation $R(1{:}m_2, 1{:}m_2)^T y = d$ and the vector $z$ is obtained by solving the unconstrained LS problem

$$\min_{z \in \mathbb{R}^{n_1 - m_2}} \| A_2 z - (b - A_1 y) \|_2 .$$

Combining the above, we see that the following vector solves the LSE problem:

$$x = Q \begin{bmatrix} y \\ z \end{bmatrix} = Q(:, 1{:}m_2)\, y + Q(:, m_2+1{:}n_1)\, z .$$
Algorithm 6.2.2 Suppose $A \in \mathbb{R}^{m_1 \times n_1}$, $B \in \mathbb{R}^{m_2 \times n_1}$, $b \in \mathbb{R}^{m_1}$, and $d \in \mathbb{R}^{m_2}$. If $\mathrm{rank}(A) = n_1$ and $\mathrm{rank}(B) = m_2 < n_1$, then the following algorithm minimizes $\| Ax - b \|_2$ subject to the constraint $Bx = d$.

    Compute the QR factorization $B^T = QR$.
    Solve $R(1{:}m_2, 1{:}m_2)^T y = d$ for $y$.
    $A = AQ$
    Find $z$ so that $\| A(:, m_2+1{:}n_1)\, z - (b - A(:, 1{:}m_2)\, y) \|_2$ is minimized.
    $x = Q(:, 1{:}m_2)\, y + Q(:, m_2+1{:}n_1)\, z$

Note that this approach to the LSE problem involves two QR factorizations and a matrix multiplication. If $A$ and/or $B$ are rank deficient, then it is possible to devise a similar solution procedure using the SVD instead of QR. Note that there may not be a solution if $\mathrm{rank}(B) < m_2$. Also, if $\mathrm{null}(A) \cap \mathrm{null}(B) \ne \{0\}$ and $d \in \mathrm{ran}(B)$, then the LSE solution is not unique.
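Here is a compact NumPy sketch of Algorithm 6.2.2 (the function name and the small polynomial-fitting example are illustrative; the constraint in the example forces the coefficients to sum to 1):

    import numpy as np
    from scipy.linalg import solve_triangular

    def lse_nullspace(A, b, B, d):
        # Minimize ||Ax - b||_2 subject to Bx = d, assuming rank(A) = n1 and
        # rank(B) = m2 < n1, via the QR factorization of B^T.
        m2, n1 = B.shape
        Q, R = np.linalg.qr(B.T, mode='complete')              # B^T = Q R
        y = solve_triangular(R[:m2, :m2].T, d, lower=True)     # R(1:m2,1:m2)^T y = d
        AQ = A @ Q
        z, *_ = np.linalg.lstsq(AQ[:, m2:], b - AQ[:, :m2] @ y, rcond=None)
        return Q[:, :m2] @ y + Q[:, m2:] @ z

    A = np.array([[1.0, 1.0, 1.0], [1.0, 2.0, 4.0], [1.0, 3.0, 9.0], [1.0, 4.0, 16.0]])
    b = np.array([1.0, 2.0, 1.0, 2.0])
    B = np.array([[1.0, 1.0, 1.0]])
    d = np.array([1.0])
    x = lse_nullspace(A, b, B, d)
    print(B @ x)                                               # equals d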
6.2.4 LSE Solution Using the Augmented System
The LSE problem can also be approached through the method of Lagrange multipliers. Define the augmented objective function

$$f(x, \lambda) = \frac{1}{2}\| Ax - b \|_2^2 + \lambda^T (d - Bx),$$

and set to zero its gradient with respect to $x$:

$$A^T A x - A^T b - B^T \lambda = 0.$$

Combining this with the equations $r = b - Ax$ and $Bx = d$ we obtain the symmetric indefinite linear system

$$\begin{bmatrix} I & A & 0 \\ A^T & 0 & B^T \\ 0 & B & 0 \end{bmatrix} \begin{bmatrix} r \\ x \\ \lambda \end{bmatrix} = \begin{bmatrix} b \\ 0 \\ d \end{bmatrix}. \tag{6.2.10}$$

This system is nonsingular if both $A$ and $B$ have full rank. The augmented system presents a solution framework for the sparse LSE problem.
6.2.5 LSE Solution Using the GSVD
Using the GSVD given by (6.1.22) and (6.1.23), we see that the LSE problem transforms to

$$\min_{D_B y = \tilde d} \| D_A y - \tilde b \|_2 \tag{6.2.11}$$

where $\tilde b = U_1^T b$, $\tilde d = U_2^T d$, and $y = X^{-1} x$. It follows that if $\mathrm{null}(A) \cap \mathrm{null}(B) = \{0\}$ and $X = [\, x_1 \mid \cdots \mid x_n \,]$, then

$$x = \sum_{i=1}^{m_2} \frac{\tilde d_i}{\beta_i}\, x_i + \sum_{i=m_2+1}^{n_1} \frac{\tilde b_i}{\alpha_i}\, x_i \tag{6.2.12}$$

solves the LSE problem.
6.2.6 LSE Solution Using Weights
An interesting way to obtain an approximate LSE solution is to solve the unconstrained LS problem

$$\min_{x} \left\| \begin{bmatrix} A \\ \sqrt{\lambda}\, B \end{bmatrix} x - \begin{bmatrix} b \\ \sqrt{\lambda}\, d \end{bmatrix} \right\|_2 \tag{6.2.13}$$

for large $\lambda$. (Compare with the Tikhonov regularization problem (6.1.21).) Since

$$\left\| \begin{bmatrix} A \\ \sqrt{\lambda}\, B \end{bmatrix} x - \begin{bmatrix} b \\ \sqrt{\lambda}\, d \end{bmatrix} \right\|_2^2 = \| Ax - b \|_2^2 + \lambda \| Bx - d \|_2^2 ,$$

we see that there is a penalty for discrepancies among the constraint equations. To quantify this, assume that both $A$ and $B$ have full rank and substitute the GSVD defined by (6.1.22) and (6.1.23) into the normal equation system

$$(A^T A + \lambda B^T B)\, x = A^T b + \lambda B^T d.$$

This shows that the solution $x(\lambda)$ is given by $x(\lambda) = X y(\lambda)$ where $y(\lambda)$ solves

$$(D_A^T D_A + \lambda D_B^T D_B)\, y = D_A^T \tilde b + \lambda D_B^T \tilde d$$

with $\tilde b = U_1^T b$ and $\tilde d = U_2^T d$. It follows that

$$y_i(\lambda) = \frac{\alpha_i \tilde b_i + \lambda \beta_i \tilde d_i}{\alpha_i^2 + \lambda \beta_i^2}, \qquad i = 1{:}n_1,$$

and so, in view of (6.2.12), we have

$$x(\lambda) - x = \sum_{i=1}^{m_2} \frac{\alpha_i (\beta_i \tilde b_i - \alpha_i \tilde d_i)}{\beta_i (\alpha_i^2 + \lambda \beta_i^2)}\, x_i . \tag{6.2.14}$$

This shows that $x(\lambda) \rightarrow x$ as $\lambda \rightarrow \infty$. The appeal of this approach to the LSE problem is that it can be implemented with unconstrained LS problem software. However, for large values of $\lambda$ numerical problems can arise and it is necessary to take precautions. See Powell and Reid (1968) and Van Loan (1982).
Problems
P6.2.1 Is the solution to (6.2.1) always unique?
P6.2.2 Let $p_0(x), \ldots, p_n(x)$ be given polynomials and $(x_0, y_0), \ldots, (x_m, y_m)$ be a given set of coordinate pairs with $x_i \in [a, b]$. It is desired to find a polynomial $p(x) = \sum_{k=0}^{n} a_k p_k(x)$ such that

$$\phi(a) = \sum_{i=0}^{m} (p(x_i) - y_i)^2$$

is minimized subject to a quadratic inequality constraint involving $p$ at the points $z_i = a + ih$, where $b = a + Nh$. Show that this leads to an LSQI problem of the form (6.2.5) with $d = 0$.

P6.2.3 Suppose $Y = [\, y_1 \mid \cdots \mid y_k \,] \in \mathbb{R}^{m \times k}$ has the property that

$$Y^T Y = \mathrm{diag}(d_1^2, \ldots, d_k^2).$$

Show that if $Y = QR$ is the QR factorization of $Y$, then $R$ is diagonal with $|r_{ii}| = d_i$.

P6.2.4 (a) Show that if $(A^T A + \lambda I)x = A^T b$, $\lambda > 0$, and $\| x \|_2 = \alpha$, then $z = (Ax - b)/\lambda$ solves the dual equations $(A A^T + \lambda I)z = -b$ with $\| A^T z \|_2 = \alpha$. (b) Show that if $(A A^T + \lambda I)z = -b$ and $\| A^T z \|_2 = \alpha$, then $x = -A^T z$ satisfies $(A^T A + \lambda I)x = A^T b$, $\| x \|_2 = \alpha$.
P6.2.5 Show how to compute $y$ (if it exists) so that both (6.2.7) and (6.2.8) are satisfied.

P6.2.6 Develop an SVD version of Algorithm 6.2.2 that can handle the situation when $A$ and/or $B$ are rank deficient.

P6.2.7 Suppose

$$A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}$$

where $A_1 \in \mathbb{R}^{n \times n}$ is nonsingular and $A_2 \in \mathbb{R}^{(m-n) \times n}$. Show that

$$\sigma_{\min}(A) \ge \sqrt{1 + \sigma_{\min}(A_2 A_1^{-1})^2}\; \sigma_{\min}(A_1).$$

P6.2.8 Suppose $p \ge m \ge n$ and that $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{m \times p}$. Show how to compute orthogonal $Q \in \mathbb{R}^{m \times m}$ and orthogonal $V \in \mathbb{R}^{n \times n}$ so that $Q^T A V$ and $Q^T B$ are reduced to triangular form, where $R \in \mathbb{R}^{n \times n}$ and $S \in \mathbb{R}^{m \times m}$ are upper triangular.

P6.2.9 Suppose $r \in \mathbb{R}^m$, $y \in \mathbb{R}^n$, and $\delta > 0$. Show how to solve the problem

$$\min_{E \in \mathbb{R}^{m \times n},\; \| E \|_F \le \delta} \| E y - r \|_2 .$$

Repeat with "min" replaced by "max."

P6.2.10 Show how the constrained least squares problem

$$\min_{Bx = d} \| Ax - b \|_2, \qquad A \in \mathbb{R}^{m \times n}, \quad B \in \mathbb{R}^{p \times n}, \quad \mathrm{rank}(B) = p,$$

can be reduced to an unconstrained least squares problem by performing $p$ steps of Gaussian elimination on the matrix

$$\begin{bmatrix} B \\ A \end{bmatrix} = \begin{bmatrix} B_1 & B_2 \\ A_1 & A_2 \end{bmatrix}.$$

Explain. Hint: The Schur complement is of interest.
Notes and References for §6.2
The LSQI problem is discussed in:
G.E. Forsythe and G.H. Golub (1965). "On the Stationary Values of a Second-Degree Polynomial on
the Unit Sphere," SIAM J. App. Math. 14, 1050-1068.
L. Elden (1980). "Perturbation Theory for the Least Squares Problem with Linear Equality Con­
straints," SIAM J. Numer. Anal. 1 7, 338-350.
W. Gander (1981). "Least Squares with a Quadratic Constraint," Numer. Math. 36, 291-307.
L. Elden (1983). "A Weighted Pseudoinverse, Generalized Singular Values, and Constrained Least Squares Problems," BIT 22, 487-502.
G.W. Stewart (1984). "On the Asymptotic Behavior of Scaled Singular Value and QR Decomposi­
tions," Math. Comput. 43, 483-490.
G.H. Golub and U. von Matt (1991). "Quadratically Constrained Least Squares and Quadratic Problems," Numer. Math. 59, 561-580.
T.F. Chan, J.A. Olkin, and D. Cooley (1992). "Solving Quadratically Constrained Least Squares
Using Black Box Solvers," BIT 32, 481-495.
Secular equation root-finding comes up in many numerical linear algebra settings. For an algorithmic
overview, see:
O.E. Livne and A. Brandt (2002). "N Roots of the Secular Equation in O(N) Operations," SIAM J.
Matrix Anal. Applic. 24, 439-453.
For a discussion of the augmented systems approach to least squares problems, see:
A. Bjorck (1992). "Pivoting and Stability in the Augmented System Method," Proceedings of the 14th
Dundee Conference, D.F. Griffiths and G.A. Watson (eds.), Longman Scientific and Technical,
Essex, U.K.
A. Bjorck and C.C. Paige (1994). "Solution of Augmented Linear Systems Using Orthogonal Factor­
izations," BIT 34, 1-24.
References that are concerned with the method of weighting for the LSE problem include:
M.J.D. Powell and J.K. Reid (1968). "On Applying Householder's Method to Linear Least Squares
Problems," Proc. IFIP Congress, pp. 122-26.
C. Van Loan (1985). "On the Method of Weighting for Equality Constrained Least Squares Problems,"
SIAM J. Nu.mer. Anal. 22, 851-864.
J.L. Barlow and S.L. Handy (1988). "The Direct Solution of Weighted and Equality Constrained
Least-Squares Problems," SIAM J. Sci. Stat. Comput. 9, 704-716.
J.L. Barlow, N.K. Nichols, and R.J. Plemmons (1988). "Iterative Methods for Equality Constrained
Least Squares Problems," SIAM J. Sci. Stat. Comput. 9, 892-906.
J.L. Barlow (1988). "Error Analysis and Implementation Aspects of Deferred Correction for Equality
Constrained Least-Squares Problems," SIAM J. Nu.mer. Anal. 25, 1340-1358.
J.L. Barlow and U.B. Vemulapati (1992). "A Note on Deferred Correction for Equality Constrained
Least Squares Problems," SIAM J. Nu.mer. Anal. 29, 249-256.
M. Gulliksson and P.-A. Wedin (1992). "Modifying the QR-Decomposition to Constrained and
Weighted Linear Least Squares," SIAM J. Matrix Anal. Applic. 13, 1298-1313.
M. Gulliksson (1994). "Iterative Refinement for Constrained and Weighted Linear Least Squares,"
BIT 34, 239-253.
G. W. Stewart (1997). "On the Weighting Method for Least Squares Problems with Linear Equality
Constraints," BIT 37, 961-967.
For the analysis of the LSE problem and related methods, see:
M. Wei (1992). "Perturbation Theory for the Rank-Deficient Equality Constrained Least Squares
Problem," SIAM J. Nu.mer. Anal. 29, 1462-1481.
M. Wei (1992). "Algebraic Properties ofthe Rank-Deficient Equality-Constrained and Weighted Least
Squares Problems," Lin. Alg. Applic. 161, 27-44.
M. Gulliksson (1995). "Backward Error Analysis for the Constrained and Weighted Linear Least
Squares Problem When Using the Weighted QR Factorization," SIAM J. Matrix. Anal. Applic.
13, 675-687.
M. Gulliksson (1995). "Backward Error Analysis for the Constrained and Weighted Linear Least
Squares Problem When Using the Weighted QR Factorization," SIAM J. Matrix Anal. Applic.
16, 675-687.
J. Ding and W. Hang (1998). "New Perturbation Results for Equality-Constrained Least Squares
Problems," Lin. Alg. Applic. 272, 181-192.
A.J. Cox and N.J. Higham (1999). "Accuracy and Stability of the Null Space Method for Solving the
Equality Constrained Least Squares Problem," BIT 39, 34-50.
A.J. Cox and N.J. Higham (1999). "Row-Wise Backward Stable Elimination Methods for the Equality
Constrained Least Squares Problem," SIAM J. Matrix Anal. Applic. 21, 313-326.
A.J. Cox and Nicholas J. Higham (1999). "Backward Error Bounds for Constrained Least Squares
Problems," BIT 39, 210-227.
320 Chapter 6. Modified Least Squares Problems and Methods
M. Gulliksson and P-A. Wedin (2000). "Perturbation Theory for Generalized and Constrained Linear
Least Squares," Num. Lin. Alg. 7, 181·-195.
M. Wei and A.R. De Pierro (2000). "Upper Perturbation Bounds of Weighted Projections, Weighted
and Constrained Least Squares Problems," SIAM J. Matrix Anal. Applic. 21, 931-951.
E.Y. Bobrovnikova and S.A. Vavasis (2001). "Accurate Solution ofWeighted Least Squares by Iterative
Methods SIAM. J. Matrix Anal. Applic. 22, 1153-1174.
M. Gulliksson, X-Q.Jin, and Y-M. Wei (2002). "Perturbation Bounds for Constrained and Weighted
Least Squares Problems," Lin. Alg. Applic. 349, 221-232.
6.3 Total Least Squares
The problem of minimizing II Ax - b IJ2 where A E R.mxn and b E R.m can be recast as
follows:
min II r 112 •
b+r E ran(A)
(6.3.1)
In this problem, there is a tacit assumption that the errors are confined to the vector
of observations b. If error is also present in the data matrix A, then it may be more
natural to consider the problem
min ll [ E l r ] IJF ·
b+r E ran(A+E)
(6.3.2)
This problem, discussed by Golub and Van Loan (1980), is referred to as the total least
squares (TLS) problem. If a minimizing [ Eo I ro ] can be found for (6.3.2), then any x
satisfying (A + Eo)x = b + ro is called a TLS solution. However, it should be realized
that (6.3.2) may fail to have a solution altogether. For example, if
then for all € > 0, b E ran(A + E15). However, there is no smallest value of II [ E , r J llF
for which b + r E ran (A + E).
A generalization of (6.3.2) results if we allow multiple right-hand sides and use a
weighted Frobenius norm. In particular, if B E R.mxk and the matrices
D = diag(d1, . . . , dm),
T = diag(t1, . . . , tn+k)
are nonsingular, then we are led to an optimization problem of the form
min IJ D [ E I R] T IJF
B+R E ran(A+E)
(6.3.3)
where E E R.mxn and R E R.mxk. If [ Eo I Ro ] solves (6.3.3), then any X E :nrxk that
satisfies
(A + E0)X = (B + Ro)
is said to be a TLS solution to (6.3.3).
In this section we discuss some of the mathematical properties of the total least
squares problem and show how it can be solved using the SVD. For a more detailed
introduction, see Van Ruffel and Vanderwalle (1991).
6.3. Total Least Squares 321
6.3.1 Mathematical Background
The following theorem gives conditions for the uniqueness and existence of a TLS
solution to the multiple-right-hand-side problem.
Theorem 6.3.1. Suppose A E IRmxn and B E IRmxk and that D = diag(d1, . . . , dm)
and T = diag(t1, . . . ,tn+k) are nonsingular. Assume m ;::: n + k and let the SVD of
c = D[ A I B ]T = [ C1 I C2 l
n k
be specified by UTCV = diag(0'1, . . . , O'n+k) = E where U, V, and E are partitioned as
follows:
u = [ U1 I U2 ]
n k
V = E =
n k
IfO'n(C1) > O'n+l(C), then the matrix [ Eo I Ro ] defined by
D[ Eo I Ro ]T = -U2E2[Yi� I V2� ] (6.3.4)
solves {6.3.3). If T1 = diag(ti, . . . , tn) and T2 = diag(tn+l• . . . , tn+k), then the matrix
exists and is the unique TLS solution to (A + Eo)X = B + Ro.
Proof. We first establish two results that follow from the assumption O'n(C1) > O'n+l(C).
From the equation CV = UE we have
We wish to show that V22 is nonsingular. Suppose V22x = 0 for some unit 2-norm x.
It follows from
that 11 Vi2x lb = 1. But then
O'n+i(C) ;::: ll U2E2x ll2 = ll C1Vi2x lb ;::: O'n(C1) ,
a contradiction. Thus, the submatrix V22 is nonsingular. The second fact concerns the
strict separation ofO'n(C) and O'n+l(C). From Corollary 2.4.5, we have O'n(C) ;::: O'n(C1)
and so
O'n(C) ;::: O'n(C1) > O'n+i(C).
We are now set to prove the theorem. If ran(B + R) C ran(A + E), then there is
an X (n-by-k) so (A + E)X = B + R, i.e.,
{ D[ A I B ]T + D[ E I R ]T } r-1
[ -
�k ] = o . (6.3.5)
322 Chapter 6. Modified Least Squares Problems and Methods
Thus, the rank of the matrix in curly brackets is at most equal to n. By following the
argument in the proof of Theorem 2.4.8, it can be shown that
n+k
ll D [ E I R] T ll! 2:: L ai(C)2•
i=n+I
Moreover, the lower bound is realized by setting [ E I R ] = [ Eo I Ro ]. Using the
inequality an(C) > an+l(C), we may infer that [ Eo I Ro ] is the unique minimizer.
To identify the TLS solution XTL5, we observe that the nullspace of
is the range of [ ��� ] . Thus, from (6.3.5)
r-1 [-�k ] = [��: ]s
for some k- by- k matrix S. From the equations r1-1X = Vi2S and -T2-1 = Vi2S we
see that S = -V221r2-1 and so
X = T1Vi2S = -TiVi2V2;1r2-1 = XTLs· D
Note from the thin CS decomposition (Theorem 2.5. 2) that
II x 112 = II v; v;-1 112 =
i -ak(Vi2)2
T 12 22 2 ak(Vi2)2
where we define the "r-norm" on JRnxk by II z llr = II rl-lZT2 112·
If an(C1) = an+i(C), then the solution procedure implicit in the above proof is
problematic. The TLS problem may have no solution or an infinite number ofsolutions.
See §6.3.4 for suggestions as to how one might proceed.
6.3.2 Solving the Single Right Hand Side Case
We show how to maximize ak(Vi2) in the important k = 1 case. Suppose the singular
values of C satisfy an-p > an-p+I = · · · = an+l and let V = [ V1 I · · · I Vn+i ] be a
column partitioning of V. If Q is a Householder matrix such that
[ W
O : ]n
1 '
V(:, n + l - p:n + l)Q =
'"'
p
then the last column of this matrix has the largest (n + l)st component of all the
vectors in span{v
n+l-p, . . . ,Vn+i}· If a = 0, then the TLS problem has no solution.
Otherwise
6.3. Total Least Squares
Moreover,
and so
[I�i � lUT(D[ A l b ]T)V
[
I�p � l = E
D [ Eo l ro ] T = -D [ A l b ] T [: l[ zT i a ].
Overall, we have the following algorithm:
323
Algorithm 6.3.1 Given A E Rmxn (m > n), b E Rm, nonsingular D = diag(di. . . . , dm),
and nonsingular T = diag(t1, . . . , tn+i ), the following algorithm computes (if possible)
a vector xTLs E Rn such that (A+Eo)xTLs = (b+ro) and II D[ Eo I ro ]T IIF
is minimal.
Compute the SVD uT(D[ A I b]T)V = diag(a1 , . . . ' Un+i ) and save v.
Determine p such that a1 ;::: · · · ;::: Un-p > Un-p+l = · · · = Un+l ·
Compute a Householder P such that if V = VP, then V(n + 1, n - p + l:n) = 0.
if Vn+l,n+l =f 0
for i = l:n
Xi = -tiVi,n+i /(tn+iVn+i,n+i )
end
XTLS = X
end
This algorithm requires about 2mn2 + 12n3 fl.ops and most of these are associated with
the SVD computation.
6.3.3 A Geometric Interpretation
It can be shown that the TLS solution xTLs minimizes
(6.3.6)
where af is the ith row of A and bi is the ith component of b. A geometrical interpre­
tation of the TLS problem is made possible by this observation. Indeed,
_ lafx - bil2
6i -
TT-2 + c2
is the square of the distance from
to the nearest point in the subspace
X 1 X n+l
324 Chapter 6. Modified Least Squares Problems and Methods
where the distance in Rn+l is measured by the norm II z II = II Tz 112• The TLS problem
is essentially the problem of orthogonal regression, a topic with a long history. See
Pearson (1901) and Madansky (1959).
6.3.4 Variations of the Basic TLS Problem
We briefly mention some modified TLS problems that address situations when addi­
tional constraints are imposed on the optimizing E and R and the associated TLS
solution.
In the restricted TLS problem, we are given A E Rmxn, B E Rmx k, P1 E Rmxq,
and P2 E Rn+kxr, and solve
min II P[[ E I R]P2 llF ·
B+R C ran(A+E)
(6.3.7)
We assume that q � m and r � n + k. An important application arises if some of the
columns of A are error-free. For example, if the first s columns of A are error-free, then
it makes sense to force the optimizing E to satisfy E(:, l:s) = 0. This goal is achieved
by setting P1 = Im and P2 = Im+k (:, s + l:n + k) in the restricted TLS problem.
If a particular TLS problem has no solution, then it is referred to as a nongeneric
TLS problem. By adding a constraint it is possible to produce a meaningful solution.
For example, let UT[ A I b JV = E be the SVD and let p be the largest index so
V(n + 1,p) ¥- 0. It can be shown that the problem
min ll ( E l r J llF
(A+E)x=b+r
[ E I r ]V(:,p+l:n+l)=O
(6.3.8)
has a solution [ Eo I ro ] and the nongeneric TLS solution satisfies (A + Eo)x + b + ro.
See Van Huffel (1992).
In the regularized TLS problem additional constraints are imposed to ensure that
the solution x is properly constrained/smoothed:
min II [ E I r J llF
(A+E)x=b+r {6.3.9)
llLxll2 $<5
The matrix L E Rnxn could be the identity or a discretized second-derivative operator.
The regularized TLS problem leads to a Lagrange multiplier system of the form
See Golub, Hansen, and O'Leary (1999) for more details. Another regularization ap­
proach involves setting the small singular values of [A I b] to zero. This is the truncated
TLS problem discussed in Fierro, Golub, Hansen, and O'Leary (1997).
Problems
P6.3.1 Consider the TLS problem (6.3.2) with nonsingular D and T. (a) Show that if rank(A) < n,
then (6.3.2) has a solution if and only if b E ran(A). (b) Show that if rank(A) = n, then (6.3.2) has no
6.3. Total Least Squares
solution if ATD2b = 0 and ltn+1 l ll Db 1'2 ?: un (DAT1 ) where T1 = diag(ti . . . . , tn)·
P6.3.2 Show that if C = D[ A I b ]T = [ A1 I d ] and un (C) > un+1 (C), then XTL s satisfies
(AfA1 - O"n+i (C)2/)XTLs = Afd.
Appreciate this as a "negatively shifted" system of normal equations.
325
P6.3.3 Show how to solve (6.3.2) with the added constraint that the first p columns of the minimizing
E are zero. Hint: Compute the QR factorization of A(:, l:p).
P6.3.4 Show how to solve (6.3.3) given that D and T are general nonsingular matrices.
P6.3.5 Verify Equation (6.3.6).
P6.3.6 If A E Rmxn has full column rank and B E wxn has full row rank, show how to minimize
subject to the constraint that Bx = 0.
11 Ax - b 11�
f(x) =
1 + xTx
P6.3.7 In the data least squares problem, we are given A E Rmxn and b E Rm and minimize II E !IF
subject to the constraint that b E ran(A + E). Show how to solve this problem. See Paige and Strakos
(2002b).
Notes and References for §6.3
Much of this section is based on:
G.H. Golub and C.F. Van Loan (1980). "An Analysis of the Total Least Squares Problem," SIAM J.
Numer. Anal. 1 7, 883-93.
The idea of using the SYD to solve the TLS problem is set forth in:
G.H. Golub and C. Reinsch (1970). "Singular Value Decomposition and Least Squares Solutions,"
Numer. Math. 14, 403-420.
G.H. Golub (1973). "Some Modified Matrix Eigenvalue Problems," SIAM Review 15, 318--334.
The most comprehensive treatment of the TLS problem is:
S. Van Huffel and J. Vandewalle (1991). The Total Least Squares Problem: Computational Aspects
and Analysis, SIAM Publications, Philadelphia, PA.
There are two excellent conference proceedings that cover just about everything you would like to
know about TLS algorithms, generalizations, applications, and the associated statistical foundations:
S. Van Huffel (ed.) (1996). Recent Advances in Total Least Squares Techniques and Errors in Variables
Modeling, SIAM Publications, Philadelphia, PA.
S. Van Huffel and P. Lemmerling (eds.) (2002) Total Least Squares and Errors-in-Variables Modeling:
Analysis, Algorithms, and Applications, Kluwer Academic, Dordrecht, The Netherlands.
TLS is but one approach to the errors-in-variables problem, a subject that has a long and important
history in statistics:
K. Pearson (1901). "On Lines and Planes of Closest Fit to Points in Space," Phil. Mag. 2, 559-72.
A. Wald (1940). "The Fitting of Straight Lines if Both Variables are Subject to Error,'' Annals of
Mathematical Statistics 11, 284-300.
G.W. Stewart (2002). "Errors in Variables for Numerical Analysts,'' in Recent Advances in Total Least
Squares Techniques and Errors-in- Variables Modelling, S. Van Huffel (ed.), SIAM Publications,
Philadelphia PA, pp. 3-10,
In certain settings there are more economical ways to solve the TLS problem than the Golub-Kahan­
Reinsch SYD algorithm:
S. Van Huffel and H. Zha (1993). "An Efficient Total Least Squares Algorithm Based On a Rank­
Revealing Two-Sided Orthogonal Decomposition," Numer. Al,q. 4, 101-133.
A. Bjorck, P. Heggerncs, and P. Matstoms (2000). "Methods for Large Scale Total Least Squares
Problems,'' SIAM J. Matrix Anal. Applic. 22, 413-429.
326 Chapter 6. Modified Least Squares Problems and Methods
R. Guo and R.A. Renaut (2005). "Parallel Variable Distribution for Total Least Squares," Num. Lin.
Alg. 12, 859-876.
The condition of the TLS problem is analyzed in:
M. Baboulin and S. Gratton (2011). "A Contribution to the Conditioning of the Total Least-Squares
Problem," SIAM J. Matrix Anal. Applic. 32, 685-699.
Efforts to connect the LS and TLS paradigms have lead to nice treatments that unify the presentation
of both approaches:
B.D. Rao (1997). "Unified Treatment of LS, TLS, and Truncated SVD Methods Using a Weighted
TLS Framework," in Recent Advances in Total Least Squares Techniques and Errors-in- Variables
Modelling, S. Van Ruffel (ed.), SIAM Publications, Philadelphia, PA., pp. 11-20.
C.C. Paige and Z. Strakos (2002a). "Bounds for the Least Squares Distance Using Scaled Total Least
Squares," Numer. Math. 91, 93-115.
C.C. Paige and Z. Strakos (2002b). "Scaled Total Least Squares Fundamentals," Numer. Math. 91,
117-146.
X.-W. Chang, G.R. Golub, and C.C. Paige (2008). "Towards a Backward Perturbation Analysis for
Data Least Squares Problems," SIAM J. Matrix Anal. Applic. 30, 1281-1301.
X.-W. Chang and D. Titley-Peloquin (2009). "Backward Perturbation Analysis for Scaled Total
Least-Squares," Num. Lin. Alg. Applic. 16, 627-648.
For a discussion of the situation when there is no TLS solution or when there are multiple solutions,
see:
S. Van Ruffel and J. Vandewalle (1988). "Analysis and Solution of the Nongeneric Total Least Squares
Problem," SIAM J. Matrix Anal. Appl. 9, 360--372.
S. Van Ruffel (1992). "On the Significance of Nongeneric Total Least Squares Problems," SIAM J.
Matrix Anal. Appl. 13, 20-35.
M. Wei (1992). "The Analysis for the Total Least Squares Problem with More than One Solution,"
SIAM J. Matrix Anal. Appl. 13, 746-763.
For a treatment of the multiple right hand side TLS problem, see:
I. Rnetynkovii., M. Plesinger, D.M. Sima, Z. Strakos, and S. Van Ruffel (2011). "The Total Least
Squares Problem in AX � B: A New Classification with the Relationship to the Classical Works,"
SIAM J. Matrix Anal. Applic. 32, 748-770.
If some of the columns of A are known exactly then it is sensible to force the TLS perturbation matrix
E to be zero in the same columns. Aspects of this constrained TLS problem are discussed in:
J.W. Demmel (1987). "The Smallest Perturbation of a Submatrix which Lowers the Rank and Con­
strained Total Least Squares Problems," SIAM J. Numer. Anal. 24, 199-206.
S. Van Ruffel and J. Vandewalle (1988). "The Partial Total Least Squares Algorithm," J. Comput.
App. Math. 21, 333-342.
S. Van Ruffel and J. Vandewalle (1989). "Analysis and Properties of the Generalized Total Least
Squares Problem AX � B When Some or All Columns in A are Subject to Error," SIAM J.
Matrix Anal. Applic. 10, 294--315.
S. Van Ruffel and R. Zha (1991). "The Restricted Total Least Squares Problem: Formulation, Algo­
rithm, and Properties," SIAM J. Matrix Anal. Applic. 12, 292--309.
C.C. Paige and M. Wei (1993). "Analysis of the Generalized Total Least Squares Problem AX = B
when Some of the Columns are Free of Error," Numer. Math. 65, 177-202.
Another type of constraint that can be imposed in the TLS setting is to insist that the optimum
perturbation of A have the same structure as A. For examples and related strategies, see:
J. Kamm and J.G. Nagy (1998). "A Total Least Squares Method for Toeplitz Systems of Equations,"
BIT 38, 560-582.
P. Lemmerling, S. Van Ruffel, and B. De Moor (2002). "The Structured Total Least Squares Approach
for Nonlinearly Structured Matrices," Num. Lin. Alg. 9, 321-332.
P. Lemmerling, N. Mastronardi, and S. Van Ruffel (2003). "Efficient Implementation of a Structured
Total Least Squares Based Speech Compression Method," Lin. Alg. Applic. 366, 295-315.
N. Mastronardi, P. Lemmerling, and S. Van Ruffel (2004). "Fast Regularized Structured Total Least
Squares Algorithm for Solving the Basic Deconvolution Problem," Num. Lin. Alg. 12, 201-209.
6.4. Subspace Computations with the SVD 327
I. Markovsky, S. Van Ruffel, and R. Pintelon (2005). "Block-Toeplitz/Hankel Structured Total Least
Squares," SIAM J. Matrix Anal. Applic. 26, 1083-1099.
A. Beck and A. Ben-Tai (2005). "A Global Solution for the Structured Total Least Squares Problem
with Block Circulant Matrices," SIAM J. Matrix Anal. Applic. 27, 238-255.
H. Fu, M.K. Ng, and J.L. Barlow (2006). "Structured Total Least Squares for Color Image Restora­
tion," SIAM J. Sci. Comput. 28, 1100-1119.
As in the least squares problem, there are techniques that can be used to regularlize an otherwise
"wild" TLS solution:
R.D. Fierro and J.R. Bunch (1994). "Collinearity and Total Least Squares," SIAM J. Matrix Anal.
Applic. 15, 1167-1181.
R.D. Fierro, G.H. Golub, P.C. Hansen and D.P. O'Leary (1997). "Regularization by Truncated Total
Least Squares," SIAM J. Sci. Comput. 18, 1223-1241.
G.H. Golub, P.C. Hansen, and D.P. O'Leary (1999). "Tikhonov Regularization and Total Least
Squares," SIAM J. Matrix Anal. Applic. 21, 185-194.
R.A. Renaut and H. Guo (2004). "Efficient Algorithms for Solution of Regularized Total Least
Squares," SIAM J. Matrix Anal. Applic. 26, 457-476.
D.M. Sima, S. Van Ruffel, and G.H. Golub (2004). "Regularized Total Least Squares Based on
Quadratic Eigenvalue Problem Solvers," BIT 44, 793-812.
N. Mastronardi, P. Lemmerling, and S. Van Ruffel (2005). "Fast Regularized Structured Total Least
Squares Algorithm for Solving the Basic Deconvolution Problem," Num. Lin. Alg. Applic. 12,
201-209.
S. Lu, S.V. Pereverzev, and U. Tautenhahn (2009). "Regularized Total Least Squares: Computational
Aspects and Error Bounds," SIAM J. Matrix Anal. Applic. 31, 918-941.
Finally, we mention an interesting TLS problem where the solution is subject to a unitary constraint:
K.S. Arun (1992). "A Unitarily Constrained Total Least Squares Problem in Signal Processing,"
SIAM J. Matrix Anal. Applic. 13, 729-745.
6.4 Subspace Computations with the SVD
It is sometimes necessary to investigate the relationship between two given subspaces.
How close are they? Do they intersect? Can one be "rotated" into the other? And so
on. In this section we show how questions like these can be answered using the singular
value decomposition.
6.4. 1 Rotation of Subspaces
Suppose A E IRmxp is a data matrix obtained by performing a certain set of experi­
ments. If the same set of experiments is performed again, then a different data matrix,
B E IRmxp, is obtained. In the orthogonal Procrustes problem the possibility that B
can be rotated into A is explored by solving the following problem:
minimize IIA - BQ llF , subject to QTQ = Ip . (6.4.1)
We show that optimizing Q can be specified in terms of the SVD of BTA. The matrix
trace is critical to the derivation. The trace of a matrix is the sum of its diagonal
entries: n
tr(C) = L Cii ,
i= l
It is easy to show that if C1 and C2 have the same row and column dimension, then
(6.4.2)
328 Chapter 6. Modified Least Squares Problems and Methods
Returning to the Procrustes problem (6.4.1), if Q E wxp is orthogonal, then
p
II A - BQ II! = L II A(:, k) - B·Q(:, k) II�
k=l
p
= L II A(:, k) II� + II BQ(:, k) II� - 2Q(:, k)TBTA(:, k)
k=l
p
= II A II! + II BQ II! - 2 L [QT(BTA)] kk
k=l
= II A II! + II B II! - 2tr(QT(BTA)).
Thus, (6.4.1) is equivalent to the problem
max tr(QTBTA) .
QTQ=fp
If UT(BTA)V = E = diag(a1, . . . , ap) is the SVD of BTA and we define the
orthogonal matrix Z by Z = VTQTU, then by using (6.4.2) we have
p p
tr(QTBTA) = tr(QTUEVT) = tr(ZE) = L ZiiO'i :::; LO'i .
i=l i=l
The upper bound is clearly attained by setting Z = Ip, i.e., Q = UVT.
Algorithm 6.4.1 Given A and B in nrxp, the following algorithm finds an orthogonal
Q E wxp such that II A - BQ IIF
is minimum.
C = BTA
Compute the SVD UTCV = E and save U and V.
Q = UVT
We mention that if B = Ip, then the problem (6.4.1) is related to the polar decom­
position. This decomposition states that any square matrix A has a factorization of
the form A = QP where Q is orthogonal and P is symmetric and positive semidefi­
nite. Note that if A = UEVT is the SVD of A, then A = (UVT)(VEVT) is its polar
decomposition. For further discussion, see §9.4.3.
6.4.2 Intersection of Nullspaces
Let A E nrxn and B E wxn be given, and consider the problem of finding an or­
thonormal basis for null(A) n null(B). One approach is to compute the nullspace of the
matrix
c = [ � ]
since this is just what we want: Cx = 0 {:::} x E null(A) n null(B). However, a more
economical procedure results if we exploit the following theorem.
6.4. Subspace Computations with the SVD 329
Theorem 6.4.1. Suppose A E IRmxn and let {z1 , . . . , Zt } be an orthonormal basis for
null(A). Define Z = [ z1 I · · · I Zt ] and let {w1, . . . , wq } be an orthonormal basis for
null(BZ) where B E wxn. If w = [ W1 I · . · I Wq ] , then the columns of zw form an
orthonormal basis for null(A) n null(B).
Proof. Since AZ = 0 and (BZ)W = 0, we clearly have ran(ZW) C null(A) n null(B).
Now suppose x is in both null(A) and null(B). It follows that x = Za for some
O =f a E IRt. But since 0 = Bx = BZa, we must have a = Wb for some b E IRq. Thus,
x = ZWb E ran(ZW). D
If the SVD is used to compute the orthonormal bases in this theorem, then we obtain
the following procedure:
Algorithm 6.4.2 Given A E IRmxn and B E wxn, the following algorithm computes
and integer s and a matrix Y = [ Y1 I · · · I Ys ] having orthonormal columns which span
null(A) n null(B). If the intersection is trivial, then s = 0.
Compute the SVD U'[AVA = diag(ai), save VA, and set r = rank(A).
if r < n
else
end
C = BVA(:, r + l:n)
Compute the SVD U'{CVc = diag('Yi), save Ve, and set q = rank(C).
if q < n - r
s = n - r - q
Y = VA(:, r + l:n)Vc(:, q + l:n - r)
else
s = O
end
s = O
The practical implementation of this algorithm requires an ability to reason about
numerical rank. See §5.4.1.
6.4.3 Angles Between Subspaces
Let F and G be subspaces in IRm whose dimensions satisfy
p = dim(F) 2: dim(G) = q 2: 1.
The principal angles {Oi}{=1 between these two subspaces and the associated principal
vectors {fi,gi}i=1 are defined recursively by
(6.4.3)
fT[f1,...,fk-1)=0 gT(g1,...,gk-iJ=O
330 Chapter 6. Modified Least Squares Problems and Methods
Note that the principal angles satisfy 0 $ f}i $ · · · $ Bq $ 7f/2.. The problem of com­
puting principal angles and vectors is oftentimes referred to as the canonical correlation
problem.
Typically, the subspaces F and G are matrix ranges, e.g.,
F = ran(A),
G = ran(B),
The principal vectors and angles can be computed using the QR factorization and the
SVD. Let A = QARA and B = Q8R8 be thin QR factorizations and assume that
q
QIQs = YEZT = L,aiyizT
i=l
is the SVD of QIQB E wxq. Since II QIQB 112 $ 1, all the singular values are between
0 and 1 and we may write O'i = cos(Bi), i = l:q. Let
QAY = [ fi l · · · l fp ] ,
QBz = [ gl I . . . I gq l
(6.4.4)
(6.4.5)
be column partitionings ofthe matrices QAY E IRnxp and Q8Z E IRnxq. These matrices
have orthonormal columns. If f E F and g E G are unit vectors, then there exist unit
vectors u E JR.P and v E IRq so that f = QAu and g = Q8v. Thus,
fTg = (QAuf(Qnv) = uT(QIQs)V = uT(YEZT)v
q
= (YTufE(ZTv) = L,ai(yfu)(zfv) .
i=l
(6.4.6)
This expression attains its maximal value of a1 = cos(81) by setting u = y1 and v = z1.
It follows that f = QAYl = Ji and v = Qnz1 = gl.
Now assume that k > 1 and that the first k - 1 columns of the matrices in (6.4.4)
and (6.4.5) are known, i.e., fi, . . . , fk-l and gi, . . . ,gk-l· Consider the problem of
maximizing JTg given that f = QAu and g = Qnv are unit vectors that satisfy
It follows from (6.4.6) that
!T [ Ji I · . · I fk-i l = 0,
gT [ gl I · . · I 9k-i l = o.
q q
JTg = L,ai(Yfu)(zfv) < ak L IYTul · lzTvi.
i=k i=k
This expression attains its maximal value of O'k = cos(Bk) by setting u = Yk and v = Zk·
It follows from (6.4.4) and (6.4.5) that f = QAyk = fk and g = Q8zk = gk. Combining
these observations we obtain
6.4. Subspace Computations with the SVD 331
Algorithm 6.4.3 (Principal Angles and Vectors) Given A E 1Rmxp and B E 1Rmxq
(p :;:: q) each with linearly independent columns, the following algorithm computes the
cosines of the principal angles (}i ;:::: · · · ;:::: Oq between ran(A) and ran(B). The vectors
Ji, . . . , fq and 91, . . . , gq are the associated principal vectors.
Compute the thin QR factorizations A = QARA and B = QaRa.
C = QIQa
Compute the SVD yrcz = diag(cos(Bk)) .
QAY( : , l:q) = [ fi l · · · l fq ]
QaZ( : , l:q) = [ g1 l · · · l gq ]
The idea of using the SVD to compute the principal angles and vectors is due to Bjorck
and Golub (1973). The problem of rank deficiency in A and B is also treated in this
paper. Principal angles and vectors arise in many important statistical applications.
The largest principal angle is related to the notion of distance between equidimensional
subspaces that we discussed in §2.5.3. If p = q, then
dist(F, G) = J1 - cos(Bp)2
6.4.4 Intersection of Subspaces
In light of the following theorem, Algorithm 6.4.3 can also be used to compute an
orthonormal basis for ran(A) n ran(B) where A E 1Rmxp and B E 1Rmxq
Theorem 6.4.2. Let {cos(Bi)}{=1 and {fi, gi}{=1 be defined by Algorithm 6.4.3. If the
index s is defined by 1 = cos(B1) = · · · = cos(lls) > cos(lls+1), then
ran(A) n ran(B) = span{fi, . . . , fs} = span{g1, . . . , 9s}·
Proof. The proof follows from the observation that if cos(lli) = 1, then fi = 9i· D
The practical determination of the intersection dimension s requires a definition of
what it means for a computed singular value to equal 1. For example, a computed
singular value ai = cos(Bi) could be regarded as a unit singular value if ai ;:::: 1 - 8 for
some intelligently chosen small parameter 8.
Problems
P6.4.1 Show that if A and B are m-by-p matrices, with p � m, then
p
min II A - BQ II� = L(ui(A)2 - 2ui(BTA) + u; (B)2).
QTQ=lp
, i=l
P6.4.2 Extend Algorithm 6.4.2 so that it computes an orthonormal basis for null(A1) n · · · n null(As)
where each matrix Ai has n columns.
P6.4.3 Extend Algorithm 6.4.3 so that it can handle the case when A and B are rank deficient.
P6.4.4 Verify Equation (6.4.2).
332 Chapter 6. Modified Least Squares Problems and Methods
P6.4.5 Suppose A, B E Rmxn and that A has full column rank. Show how to compute a symmetric
matrix X E Rnxn that minimizes II AX - B llF" Hint: Compute the SVD of A.
P6.4.6 This problem is an exercise in F-norm optimization. (a) Show that if C E Rmxn and e E Rm
is a vector of ones, then v = CTe/m minimizes II C - evT llF· (b) Suppose A E Rmxn and B E Rmxn
and that we wish to solve
min II A - (B + evT)Q llF
QTQ=ln , vERn
Show that Vopt = (A- B)Te/m and Qopt = UEVT solve this problem where BT(I-eeT/m)A = uvT
is the SVD.
P6.4.7 A 3-by-3 matrix H is ROPR matrix if H = Q + xyT where Q E wx 3 rotation and x, y E R3.
(A rotation matrix is an orthogonal matrix with unit determinant. "ROPR" stands for "rank-1
perturbation of a rotation.") ROPR matrices arise in computational photography and this problem
highlights some oftheir properties. (a) If H is a ROPR matrix, then there exist rotations U, V E wx3,
such that uTHV = diag(u1 , u2, CT3) satisfies u1 2'. u2 2'. lu3I· (b) Show that if Q E R'x3 is a rotation,
then there exist cosine-sine pairs (ci, Si) = (cos(9i), sin(9;)), i = 1:3 such that Q = Q(81 , 82, 83) where
[:
0 0
][-:
s2
:][
1 0
,:l
Q(8i , 82, 83) = c1 s1 c2 0 c3
-81 CJ 0 0 -s3 C3
[ �
s2c3 s2s3
-c1s2 c1c2c3 - s1s3 '"'"' + ' ' "' l
s1 s2 -s1c2c3 - cis3 -s1c2s3 + cic3
Hint: The Givens QR factorization involves three rotations. (c) Show that if
x,y E R3
then xyT must have the form
for some µ :;::: 0 and
[ c2 - µ 1 ] [
1 c2 - µ
CJ83 ] [ � ].
(d) Show that the second singular value of a ROPR matrix is 1.
P6.4.8 Let u. E R"xd be a matrix with orthonormal columns whose span is a subspace S that we
wish to estimate. Assume that Uc E Rn xd is a given matrix with orthonormal columns and regard
ran(Uc) as the "current" estimate of S. This problem examines what is required to get an improved
estimate of S given the availability of a vector v E S. (a) Define the vectors
w = u'[v,
and assume that each is nonzero. (a) Show that if
Z9 =
and
(cos(9) - 1 ) ( sin(9) )
v1 + v2
II v1 1111 w II II v2 1111 w II
Uo = (In + z9vT)Uc,
then UlUo = Id. Thus, UoUl is an orthogonal projection. (b) Define the distance function
distp(ran(V), ran(W)) = II vvT - wwT llF
6.4. Subspace Computations with the SVD
where V, W E E' xd have orthonormal columns and show
d
distp(ran(V), ran(W))2 = 2(d - II wTv II�) = 2 L(l - ui(WTV)2).
i=l
Note that dist(ran(V), ran(W))2 = 1 - u1(WTV)2. (c) Show that
d� = d� - 2 · tr(u.u'[(U9Uf - UcUJ"))
where <k = distp(ran(U.), ran(U9)) and de = distp(ran(U.), ran(Uc)). (d) Show that if
then
and
v1 . v2
y9 = cos(9)�
+ sm(9)
II v2 II '
d2 = d� + 2 (11 U'!viJI�
- II U'fy1:1 II�).
8
II v1 lb
(e) Show that if 9 minimizes this quantity, then
Sl.n(2B) (II Psv2f _ 11 Psv1 f) + (2B)
v'[Psva
II V2 lb II v1 lb
cos
II v1 11211 v2 112
Notes and References for §6.4
References for the Procrustes problem include:
0, Ps = u.u'[.
333
B. Green (1952). "The Orthogonal Approximation of an Oblique Structure in Factor Analysis,"
Psychometrika 1 7, 429-40.
P. Schonemann (1966). "A Generalized Solution of the Orthogonal Procrustes Problem," Psychome­
trika 31, 1-10.
R.J. Hanson and M.J. Norris (1981). "Analysis of Measurements Based on the Singular Value Decom­
position," SIAM J. Sci. Stat. Comput. 2, 363-374.
N.J. Higham (1988). "The Symmetric Procrustes Problem," BIT 28, 133-43.
H. Park (1991). "A Parallel Algorithm for the Unbalanced Orthogonal Procrustes Problem," Parallel
Comput. 1 7, 913-923.
L.E. Andersson and T. Elfving (1997). "A Constrained Procrustes Problem," SIAM J. Matrix Anal.
Applic. 18, 124-139.
L. Elden and H. Park (1999). "A Procrustes Problem on the Stiefel Manifold," Numer. Math. 82,
599-619.
A.W. Bojanczyk and A. Lutoborski (1999). "The Procrustes Problem for Orthogonal Stiefel Matrices,"
SIAM J. Sci. Comput. 21, 1291-1304.
If B = I, then the Procrustes problem amounts to finding the closest orthogonal matrix. This
computation is related to the polar decomposition problem that we consider in §9.4.3. Here are some
basic references:
A. Bjorck and C. Bowie (1971). "An Iterative Algorithm for Computing the Best Estimate of an
Orthogonal Matrix," SIAM J. Numer. Anal. 8, 358-64.
N.J. Higham (1986). "Computing the Polar Decomposition with Applications,'' SIAM J. Sci. Stat.
Comput. 7, 1160-1174.
Using the SYD to solve the angles-between-subspaces problem is discussed in:
A. Bjorck and G.H. Golub (1973). "Numerical Methods for Computing Angles Between Linear Sub­
spaces,'' Math. Comput. 27, 579-94.
L.M. Ewerbring and F.T. Luk (1989). "Canonical Correlations and Generalized SYD: Applications
and New Algorithms," J. Comput. Appl. Math. 27, 37-52.
G.H. Golub and H. Zha (1994). "Perturbation Analysis ofthe Canonical Correlations of Matrix Pairs,"
Lin. Alg. Applic. 210, 3-28.
334 Chapter 6. Modified Least Squares Problems and Methods
Z. Drmac (2000). "On Principal Angles between Subspaces of Euclidean Space," SIAM J. Matrix
Anal. Applic. 22, 173-194.
A.V. Knyazev and M.E. Argentati (2002). "Principal Angles between Subspaces in an A-Based Scalar
Product: Algorithms and Perturbation Estimates," SIAM J. Sci. Comput. 23, 2008-2040.
P. Strobach (2008). "Updating the Principal Angle Decomposition," Numer. Math. 110, 83-112.
In reduced-rank regression the object is to connect a matrix ofsignals to a matrix ofnoisey observations
through a matrix that has specified low rank. An svd-based computational procedure that involves
principal angles is discussed in:
L. Elden and B. Savas (2005). "The Maximum Likelihood Estimate in Reduced-Rank Regression,''
Num. Lin. Alg. Applic. 12, 731-741,
The SYD has many roles to play in statistical computation, see:
S.J. Hammarling (1985). "The Singular Value Decomposition in Multivariate Statistics,'' ACM
SIGNUM Newsletter 20, 2-25.
An algorithm for computing the rotation and rank-one matrix in P6.4.7 that define a given ROPR
matrix is discussed in:
R. Schreiber, z. Li, and H. Baker (2009). "Robust Software for Computing Camera Motion Parame­
ters,'' J. Math. Imaging Vision 33, 1-9.
For a more details about the estimation problem associated with P6.4.8, see:
L. Balzano, R. Nowak, and B. Recht (2010). "Online Identification and Tracking of Subspaces from
Highly Incomplete Information," Proceedings of the Allerton Conference on Communication, Con­
trol, and Computing 2010.
6.5 Updating Matrix Factorizations
In many applications it is necessary to refactor a given matrix A E lRmxn after it has
undergone a small modification. For example, given that we have the QR factorization
of a matrix A, we may require the QR factorization of the matrix A obtained from A
by appending a row or column or deleting a row or column. In this section we show
that in situations like these, it is much more efficient to "update" A's QR factorization
than to generate the required QR factorization of A from scratch. Givens rotations
have a prominent role to play. In addition to discussing various update-QR strategies,
we show how to downdate a Cholesky factorization using hyperbolic rotations and how
to update a rank-revealing ULV decomposition.
6.5.1 Rank-1 Changes
Suppose we have the QR factorization QR = A E lRnxn and that we need to compute
the QR factorization A = A + uvT = Q1R1 where u, v E lRn are given. Observe that
(6.5.1)
where w = QTu. Suppose rotations Jn-1, . . . , h , Ji are computed such that
where each Jk is a Givens rotation in planes k and k + 1. If these same rotations are
applied to R, then
(6.5.2)
6.5. Updating Matrix Factorizations
is upper Hessenberg. For example, in the n = 4 case we start with
and then update as follows:
w +- J[w
w +- J[w
Consequently,
R +- J[R
H +- J[R
x
x
0
0
x
x
x
0
[�
[�
[�
x
x
0
0
x
x
x
0
x
x
x
0
x
x
x
x
x
x
x
x
x
x
x
x
335
(6.5.3)
is also upper Hessenberg. Following Algorithm 5.2.4, we compute Givens rotations Gk,
k = l:n - 1 such that G';;__1 · · · GfH1 = R1 is upper triangular. Combining everything
we obtain the QR factorization A = A + uvT = Q1R1 where
Qi = QJn-1 · · · J1G1 · · · Gn-1·
A careful assessment of the work reveals that about 26n2 flops are required.
The technique readily extends to the case when A is rectangular. It can also
be generalized to compute the QR factorization of A + uvr where u E R.mxp and
V E R.nxp.
6.5.2 Appending or Deleting a Column
Assume that we have the QR factorization
(6.5.4)
and for some k, 1 :::;; k :::;; n, partition the upper triangular matrix R E R.mxn as follows:
k-1
R =
m-k
k-1 n-k
336 Chapter 6. Modified Least Squares Problems and Methods
Now suppose that we want to compute the QR factorization of
A = [ a1 I . . · I ak-1 I ak+l I .
. · I an ] E Rmx(n-l) .
Note that A is just A with its kth column deleted and that
is upper Hessenberg, e.g.,
x x x x x
0 x x x x
0 0 x x x
H = 0 0 x x x m
= 7, n = 6, k = 3.
0 0 0 x x
0 0 0 0 x
0 0 0 0 0
Clearly, the unwanted subdiagonal elements hk+l,k• . . . , hn,n-1 can be zeroed by a
sequence of Givens rotations: G�_1 · · · GlH = Ri. Here, Gi is a rotation in planes
i and i + 1 for i = k:n - 1. Thus, if Qi = QGk · · · Gn-1 then A = QiR1 is the QR
factorization of A.
The above update procedure can be executed in O(n2) flops and is very useful
in certain least squares problems. For example, one may wish to examine the signif­
icance of the kth factor in the underlying model by deleting the kth column of the
corresponding data matrix and solving the resulting LS problem.
Analogously, it is possible to update efficiently the QR factorization of a matrix
after a column has been added. Assume that we have (6.5.4) but now want the QR
factorization of
A = [ ai I . . . I ak I z I ak+l I . . . I an l
where z E Rm is given. Note that if w = QTz then
is upper triangular except for the presence of a "spike" in its (k + l)st column, e.g.,
x x x x x x
0 x x x x x
0 0 x x x x
.A +--- QTA = 0 0 0 x x x m
= 7, n = 5, k = 3.
0 0 0 x 0 x
0 0 0 x 0 0
0 0 0 x 0 0
It is possible to determine a sequence of Givens rotations that restores the triangular
form:
6.5. Updating Matrix Factorizations
- T-
A t- 16 A =
x
0
0
0
0
0
0
x x
x x
0 x
0 0
0 0
0 0
0 0
A +--
x x
x x
x x
x x
x 0
x 0
0 0
T-
J4 A =
This update requires O(mn) flops.
x
x
x
x
x
0
0
x x
0 x
0 0
0 0
0 0
0 0
0 0
- T-
A +-- J5 A =
x x x
x x x
x x x
0 x x
0 0 x
0 0 0
0 0 0
6.5.3 Appending or Deleting a Row
337
x x x x x x
0 x x x x x
0 0 x x x x
0 0 0 x x x
0 0 0 x 0 x
0 0 0 0 0 x
0 0 0 0 0 0
x
x
x
x
x
x
0
Suppose we have the QR factorization QR = A E R.m xn
and now wish to obtain the
QR factorization of
A = [ U:: ]
where w E R.n. Note that
- [ WT ]
diag(l, QT)A =
R = H
is upper Hessenberg. Thus, rotations J1, . . . , Jn can be determined so J'f: · · · J[H =
R1 is upper triangular. It follows that A = Q1R1 is the desired QR factorization,
where Q1 = diag(l, Q)J1 · · · J
n. See Algorithm 5.2.5.
No essential complications result if the new row is added between rows k and
k + 1 of A. Indeed, if
[ �� ] = QR,
and
[ 0 I
P = Ik 0
0 0
then
dffig(J, QT)P
[::l
0
l
0 '
Im-k
=
[u;l = H
338 Chapter 6. Modified Least Squares Problems and Methods
is upper Hessenberg and we proceed as before.
Lastly, we consider how to update the QR factorization QR = A E IRmxn when
the first row of A is deleted. In particular, we wish to compute the QR factorization
of the submatrix Ai in
A = [ �� ] m�i ·
(The procedure is similar when an arbitrary row is deleted.) Let qT be the first row of
Q and compute Givens rotations G1, . . . , Gm-1 such that Gf · · · G?;,_1q = ae1 where
a = ±1. Note that
is upper Hessenberg and that
where Q1 E IR(m-l)x(m-l) is orthogonal. Thus,
from which we conclude that Ai = Q1Ri is the desired QR factorization.
6.5.4 Cholesky Updating and Oowndating
Suppose we are given a symmtetric positive definite matrix A E IRnxn and its Cholesky
factor G. In the Cholesky updating problem, the challenge is to compute the Cholesky
factorization .A = car where
(6.5.5)
Noting that
(6.5.6)
we can solve this problem by computing a product of Givens rotations Q = Q1 · · · Qn
so that
(6.5.7)
is upper triangular. It follows that A = RRT and so the updated Cholesky factor is
given by G = RT. The zeroing sequence that produces R is straight forward, e.g.,
6.5. Updating Matrix Factorizations 339
The Qk update involves only rows k and n + 1. The overall process is essentially
the same as the strategy we outlined in the previous subsection for updating the QR
factorization of a matrix when a row is appended.
The Cholesky downdating problem involves a different set of tools and a new set
of numerical concerns. We are again given a Cholesky factorization A = ccr and a
vector z E Rn. However, now the challenge is to compute the Cholesky factorization
A = (j(jT where
A = A - zzT (6.5.8)
is presumed to be positive definite. By introducing the notion of a hyperbolic rotation
we can develop a downdating framework that corresponds to the Givens-based updating
framework. Define the matrix S as follows
s
=
[; �1 l (6.5.9)
and note that
(6.5.10)
This corresponds to (6.5.6), but instead of computing the QR factorization (6.5.7), we
seek a matrix H E R(n+l)x(n+l) that satisfies two properties:
HSHT = S, (6.5.11)
R E Rnxn (upper triangular). (6.5.12)
If this can be accomplished, then it follows from
that the Cholesky factor of A = A-zzT is given by G = RT. A matrix H that satisfies
(6.5.11) is said to be S-orthogonal. Note that the product of S-orthogonal matrices is
also S-orthogonal.
An important subset of the S-orthogonal matrices are the hyperbolic rotations
and here is a 4-by-4 example:
0 0 l
0 -s
1 0 ,
0 c
c = cosh(B) , s = sinh(B).
The S-orthogonality of this matrix follows from cosh(B)2 - sinh(B)2 = 1. In general,
Hk E R(n+l)x(n+l) is a hyperbolic rotation if it agrees with In+l except in four loca­
tions:
[Hk]k,n+l l =
[ cosh(B) - sinh(B) l·
[Hk]n+i,n+l - sinh(B) cosh(B)
340 Chapter 6. Modified Least Squares Problems and Methods
Hyperbolic rotations look like Givens rotations and, not surprisingly, can be used to
introduce zeros into a vector or matrix. However, upon consideration of the equation
[_:
-
: l [:: l =
[� l·
c2 - s2 = 1
we see that the required cosh-sinh pair may not exist. Since we always have I cosh(O)I >
I sinh(O)I, there is no real solution to -sxi + cx2 = 0 if lx2I > lxil· On the other hand,
if lxil > lx2I, then {c, s} = {cosh(0), sinh(O)} can be computed as follows:
X2 1
T = Xi '
C =
Vl - T2 ' S = C·T. (6.5.13)
There are clearly numerical issues if lxi I is just slightly greater than lx2I· However,
it is possible to organize hyperbolic rotation computations successfully, see Alexander,
Pan, and Plemmons (1988).
Putting these concerns aside, we show how the matrix H in (6.5.12) can be
computed as a product of hyperbolic rotations H = Hi · · · Hn just as the transforming
Q in the updating problem is a product of Givens rotations. Consider the role of Hi
in the n = 3 case:
[_� � �
-
; l
T
[
9
�
i
::: :::l=
0 1 0 0 0 933
0 0 C Zi Z2 Z3
Since A = GGT - zzT is positive definite, [A]11 = 9�1 - z� > 0. It follows that
1911I > lzil which guarantees that the cosh-sinh computations (6.5.13) go through.
For the overall process to be defined, we have to guarantee that hyperbolic rotations
H2 , . . . , Hn can be found to zero out the bottom row in the matrix [ GT z jT. The
following theorem ensures that this is the case.
Theorem 6.5.1. If
and
A =
[: �l =
[�i
1
�1 l [.9�1 �� l
A � A
-
zzT � A - [: l [: r
are positive definite, then it is possible to determine c = cosh(O) and s = sinh(O) so
Moreover, the matrix A1 = G1Gf - w1wf is positive definite.
-Tl
91
Gf ·
wf
6.5. Updating Matrix Factorizations 341
Proof. The blocks in A's Cholesky factor are given by
911 = ya, 91 = v/911, T 1
T
G1G1 = B - -vv .
a
(6.5.14)
Since A - zzT is positive definite, al l - z? = 9Ii - µ2 > 0 and so from (6.5.13) with
r = µ/911 we see that
c = s = µ
Ja - µ2 ·
Since w1 = -sg1 + cw it follows from (6.5.14) and (6.5.15) that
A1 = G1Gf - w1w[ = B - .!..vvT - (-sg1 + cw)(-sg1 + cw)T
Q:
c2 SC
= B - -vvT - c2wwT + -(vwT + wvT)
Q: ya
(6.5.15)
1 T Q:
T µ T T
= B - --vv - -- ww + -- (vw + wv ).
a _ µ2 a _ µ2 o: _ µ2
It is easy to verify that this matrix is precisely the Schur complement of o: in
- [Q: - µ2 VT - µwT l
A = A - zzT =
v - µw B - wwT
and is therefore positive definite. D
The theorem provides the key step in an induction proof that the factorization (6.5.12)
exists.
6.5.5 Updating a Rank-Revealing U LV Decomposition
We close with a discussion about updating a nullspace basis after one or more rows
have been appended to the underlying matrix. We work with the ULV decomposition
which is much more tractable than the SVD from the updating point of view. We
pattern our remarks after Stewart(1993).
A rank -revealing ULV decomposition of a matrix A E 1Rmxn has the form
(6.5.16)
where L11 E JI(xr and L22 E JR,(n-r)x(n-r) are lower triangular and JI L21 Jl2 and II L22 112
are small compared to O"min(L11 ). Such a decomposition can be obtained by applying
QR with column pivoting
342 Chapter 6. Modified Least Squares Problems and Methods
followed by a QR factorization V{RT = LT. In this case the matrix V in (6.5.16) is
given by V = II Vi . The parameter r is the estimated rank. Note that if
r n-r r m-r
then the columns of V
2 define an approximate nullspace:
Our goal is to produce cheaply a rank-revealing ULV decomposition for the row­
appended matrix
In particular, we show how to revise L, V, and possibly r in O(n2) flops. Note that
We illustrate the key ideas through an example. Suppose n = 7 and r = 4. By
permuting the rows so that the bottom row is just underneath L, we obtain
i 0 0 0 0 0 0
i i 0 0 0 0 0
i i i 0 0 0 0
i i i i 0 0 0
f f f f f 0 0
f f f f f f 0
f f f f f f f
w w w w y y y
The f entries are small while the i, w, and y entries are not. Next, a sequence of Givens
rotations G1, . . . , G1 are applied from the left to zero out the bottom row:
x 0 0 0 0 0 0
x x 0 0 0 0 0
x x x 0 0 0 0 [Lu
L:, ]
[�] x x x x 0 0 0
G11 · · · G51G61 L21
=
0 0
x x x x x
x x x x x x 0 WT YT
x x x x x x x
0 0 0 0 0 0 0
Because this zeroing process intermingles the (presumably large) entries of the bottom
row with the entries from each of the other rows, the lower triangular form is typi­
cally not rank revealing. However, and this is l.<ey, we can restore the rank-revealing
structure with a combination of condition estimation and Givens zero chasing.
6.5. Updating Matrix Factorizations 343
Let us assume that with the added row, the new nullspace has dimension 2. With
a. reliable condition estimator we produce a unit 2-norm vector p such that
(See §3.5.4). Rotations {Ui,iH}�=l can be found such that
U[., Uft, Ulr, U� U� U�p = e1 = lr(: , 7).
Applying these rotations to L produces a lower Hessenberg matrix
Applying more rotations from the right restores H to a lower triangular form:
It follows that
�4 = W� %����� = V� %�����
has approximate norm CTmin(L). Thus, we obtain a lower triangular matrix of the form
x 0 0 0 0 0 0
x x 0 0 0 0 0
x x x 0 0 0 0
L+ = x x x x 0 0 0
x x x x x 0 0
x x x x x x 0
f f f f f f f
We can repeat the condition estimation and zero chasing on the leading 6-by-6 portion.
Assuming that the nullspace of the augmented matrix has dimension two, this produces
another row of small numbers:
x 0 0 0 0 0 0
x x 0 0 0 0 0
x x x 0 0 0 0
x x x x 0 0 0
x x x x x 0 0
f f f f f f 0
f f f f f f f
This illustrates how we can restore any lower triangular matrix to rank-revealing form.
Problems
P6.5.1 Suppose we have the QR factorization for A E wxn and now wish to solve
min II (A + uvT)x - b ll2
zeRn
344 Chapter 6. Modified Least Squares Problems and Methods
where u, b E Rm and v E Rn are given. Give an algorithm for solving this problem that requires
O(mn) flops. Assume that Q must be updated.
P6.5.2 Suppose
A = [ c; ] , c E Rn, B E R(m-l)xn
has full column rank and m > n. Using the Sherman-Morrison-Woodbury formula show that
1 1 II (ATA)- 1 c 11�
-
-(- :5 ---
+
T( T -1 .
Umin B) Um in (A) 1 - C A A) C
P6.5.3 As a function of x1 and x2, what is the 2-norm of the hyperbolic rotation produced by (6.5.13)?
P6.5.4 Assume that
A = [ : : J. p = � < l
Umin(R) '
where R and E are square. Show that if
Q = [ Qu
Q21
is orthogonal and
then II H1 112 :5 Pll H 1'2·
P6.5.5 Suppose A E wxn and b E Rm with m ;::: n. In the indefinite least squares (ILS) problem,
the goal is to minimize
<P(x) = (b - Ax)T J(b - Ax),
where
p + q = m.
It is assumed that p ;::: 1 and q ;::: 1. (a) By taking the gradient of q,, show that the ILS problem has
a unique solution if and only if ATSA is positive definite. (b) Assume that the ILS problem has a
unique solution. Show how it can be found by computing the Cholesky factorization of QfQ1 - QfQ2
where
A = [ �� ] .
is the thin QR factorization. (c) A matrix Q E Rmxm is S-orthogonal if QSQT = S If
Q =
p q
is S-orthogonal, then by comparing blocks in the equation QTSQ = S we have
Qf1Qu = Ip + Qf1Q21 , QftQ12 = Qf1Q22, Qf2Q22 = lq + Qf2Q12.
Thus, the singular values of Qu and Q22 are never smaller than 1. Assume that p ;::: q. By analogy
with how the CS decomposition is established in §2.5.4, show that there exist orthogonal matrices U1,
U2, Vi and V2 such that
0
lp- q
0
where D = diag(d1 , . . . , dp) with d; ;::: 1, i = l:p. This is the hyperbolic CS decomposition and details
can be found in Stewart and Van Dooren (2006).
6.5. Updating Matrix Factorizations 345
Notes and References for §6.5
The seminal matrix factorization update paper is:
P.E. Gill, G.H. Golub, W. Murray, and M.A. Saunders (1974). "Methods for Modifying Matrix
Factorizations," Math. Comput. 28, 505-535.
Initial research into the factorization update problem was prompted by the development of quasi­
Newton methods and the simplex method for linear programming. In these venues, a linear system
must be solved in step k that is a low-rank perturbation of the linear system solved in step k - 1, see:
R.H. Bartels (1971). "A Stabilization of the Simplex Method,'' Numer. Math. 16, 414 ·434.
P.E. Gill, W. Murray, and M.A. Saunders (1975). "Methods for Computing and Modifying the LDV
Factors of a Matrix,'' Math. Comput. 29, 1051-1077.
D. Goldfarb (1976). "Factored Variable Metric Methods for Unconstrained Optimization," Math.
Comput. 30, 796-811.
J.E. Dennis and R.B. Schnabel (1983). Numerical Methods for Unconstrained Optimization and
Nonlinear Equations, Prentice-Hall, Englewood Cliffs, NJ.
W.W. Hager (1989). "Updating the Inverse of a Matrix," SIAM Review 31, 221-239.
S.K. Eldersveld and M.A. Saunders (1992). "A Block-LU Update for Large-Scale Linear Program­
ming," SIAM J. Matrix Anal. Applic. 13, 191-201.
Updating issues in the least squares setting are discussed in:
J. Daniel, W.B. Gragg, L. Kaufman, and G.W. Stewart (1976). "Reorthogonaization and Stable
Algorithms for Updating the Gram-Schmidt QR Factorization," Math. Comput. 30, 772-795.
S. Qiao (1988). "Recursive Least Squares Algorithm for Linear Prediction Problems," SIAM J. Matrix
Anal. Applic. 9, 323-328.
A. Bjorck, H. Park, and L. Elden (1994). "Accurate Downdating of Least Squares Solutions," SIAM
J. Matrix Anal. Applic. 15, 549-568.
S.J. Olszanskyj, J.M. Lebak, and A.W. Bojanczyk (1994). "Rank-k Modification lilethods for Recur­
sive Least Squares Problems," Numer. Al9. 7, 325-354.
L. Elden and H. Park (1994). "Block Downdating of Least Squares Solutions," SIAM J. Matrix Anal.
Applic. 15, 1018-1034.
Kalman filtering is a very important tool for estimating the state of a linear dynamic system in the
presence of noise. An illuminating, stable implementation that involves updating the QR factorization
of an evolving block banded matrix is given in:
C.C. Paige and M.A. Saunders (1977). "Least Squares Estimation of Discrete Linear Dynamic Systems
Using Orthogonal Transformations,'' SIAM J. Numer. Anal. 14, 180 193.
The Cholesky downdating literature includes:
G.W. Stewart (1979). "The Effects of Rounding Error on an Algorithm for Downdating a Cholesky
Factorization," J. Inst. Math. Applic. 23, 203-213.
A.W. Bojanczyk, R.P. Brent, P. Van Dooren, and F.R. de Hoog (1987). "A Note on Downdating the
Cholesky Factorization," SIAM J. Sci. Stat. Comput. 8, 210-221.
C.-T. Pan (1993). "A Perturbation Analysis ofthe Problem ofDowndating a Cholesky Factorization,"
Lin. Alg. Applic. 183, 103-115.
L. Elden and H. Park (1994). "Perturbation Analysis for Block Downdating of a Cholesky Decompo­
sition,'' Numer. Math. 68, 457-468.
M.R. Osborne and L. Sun (1999). "A New Approach to Symmetric Rank-One Updating,'' IMA J.
Numer. Anal. 19, 497-507.
E.S. Quintana-Orti and R.A. Van Geijn (2008). "Updating an LU Factorization with Pivoting," ACM
Trans. Math. Softw. 35(2), Article 11.
Hyperbolic tranformations have been successfully used in a number of settings:
G.H. Golub (1969). "Matrix Decompositions and Statistical Computation," in Statistical Computa­
tion, ed., R.C. Milton and J.A. Nelder, Academic Press, New York, pp. 365-397.
C.M. Rader and A.O. Steinhardt (1988). "Hyperbolic Householder Transforms,'' SIAM J. Matrix
Anal. Applic. 9, 269-290.
346 Chapter 6. Modified Least Squares Problems and Methods
S.T. Alexander, C.T. Pan, and R.J. Plemmons (1988). "Analysis of a Recursive Least Squares Hy­
perbolic Rotation Algorithm for Signal Processing," Lin. Alg. and Its Applic. 98, 3-40.
G. Cybenko and M. Berry (1990). "Hyperbolic Householder Algorithms for Factoring Structured
Matrices," SIAM J. Matrix Anal. Applic. 11, 499-520.
A.W. Bojanczyk, R. Onn, and A.O. Steinhardt (1993). "Existence of the Hyperbolic Singular Value
Decomposition," Lin. Alg. Applic. 185, 21-30.
S. Chandrasekaran, M. Gu, and A.H. Sayad (1998). "A Stable and Efficient Algorithm for the Indefinite
Linear Least Squares Problem," SIAM J. Matrix Anal. Applic. 20, 354-362.
A.J. Bojanczyk, N.J. Higham, and H. Patel (2003a). "Solving the Indefinite Least Squares Problem
by Hyperbolic QR Factorization," SIAM J. Matrix Anal. Applic. 24, 914-931.
A. Bojanczyk, N.J. Higham, and H. Patel (2003b). "The Equality Constrained Indefinite Least Squares
Problem: Theory and Algorithms," BIT 43, 505-517.
M. Stewart and P. Van Dooren (2006). "On the Factorization of Hyperbolic and Unitary Transforma-
tions into Rotations," SIAM J. Matrix Anal. Applic. 27, 876-890.
N.J. Higham (2003). "J-Orthogonal Matrices: Properties and Generation," SIAM Review 45, 504-519.
High-performance issues associated with QR updating are discussed in:
B.C. Gunter and R.A. Van De Geijn (2005). "Parallel Out-of-Core Computation and Updating of the
QR Factorization," ACM Trans. Math. Softw. 31, 60-78.
Updating and downdating the ULV and URV decompositions and related topics are covered in:
C.H. Bischof and G.M. Shroff (1992). "On Updating Signal Subspaces," IEEE Trans. Signal Proc.
40, 96-105.
G.W. Stewart (1992). "An Updating Algorithm for Subspace Tracking," IEEE Trans. Signal Proc.
40, 1535-1541.
G.W. Stewart (1993). "Updating a Rank-Revealing ULV Decomposition," SIAM J. Matrix Anal.
Applic. 14, 494-499.
G.W. Stewart (1994). "Updating URV Decompositions in Parallel," Parallel Comp. 20, 151-172.
H. Park and L. Elden (1995). "Downdating the Rank-Revealing URV Decomposition," SIAM J.
Matrix Anal. Applic. 16, 138-155.
J.L. Barlow and H. Erbay (2009). "Modifiable Low-Rank Approximation of a Matrix," Num. Lin.
Alg. Applic. 16, 833--860.
Other interesting update-related topics include the updating of condition estimates, see:
W.R. Ferng, G.H. Golub, and R.J. Plemmons (1991). "Adaptive Lanczos Methods for Recursive
Condition Estimation," Numerical Algorithms 1, 1-20.
G. Shroff and C.H. Bischof (1992). "Adaptive Condition Estimation for Rank-One Updates of QR
Factorizations," SIAM J. Matrix Anal. Applic. 13, 1264-1278.
D.J. Pierce and R.J. Plemmons (1992). "Fast Adaptive Condition Estimation," SIAM J. Matrix Anal.
Applic. 13, 274-291.
and the updating of solutions to constrained least squares problems:
K. Schittkowski and J. Stoer (1979). "A Factorization Method for the Solution of Constrained Linear
Least Squares Problems Allowing for Subsequent Data Changes," Numer. Math. 31, 431-463.
A. Bjorck (1984). "A General Updating Algorithm for Constrained Linear Least Squares Problems,"
SIAM J. Sci. Stat. Comput. 5, 394-402.
Finally, we mention the following paper concerned with SVD updating:
M. Moonen, P. Van Dooren, and J. Vandewalle (1992). "A Singular Value Decomposition Updating
Algorithm," SIAM J. Matrix Anal. Applic. 13, 1015-1038.
Chapter 7
Unsymmetric Eigenvalue Problems

7.1 Properties and Decompositions
7.2 Perturbation Theory
7.3 Power Iterations
7.4 The Hessenberg and Real Schur Forms
7.5 The Practical QR Algorithm
7.6 Invariant Subspace Computations
7.7 The Generalized Eigenvalue Problem
7.8 Hamiltonian and Product Eigenvalue Problems
7.9 Pseudospectra
Having discussed linear equations and least squares, we now direct our attention
to the third major problem area in matrix computations, the algebraic eigenvalue prob­
lem. The unsymmetric problem is considered in this chapter and the more agreeable
symmetric case in the next.
Our first task is to present the decompositions of Schur and Jordan along with
the basic properties of eigenvalues and invariant subspaces. The contrasting behavior
of these two decompositions sets the stage for §7.2 in which we investigate how the
eigenvalues and invariant subspaces of a matrix are affected by perturbation. Condition
numbers are developed that permit estimation of the errors induced by roundoff.
The key algorithm of the chapter is the justly famous QR algorithm. This proce­
dure is one of the most complex algorithms presented in the book and its development
is spread over three sections. We derive the basic QR iteration in §7.3 as a natural
generalization of the simple power method. The next two sections are devoted to mak­
ing this basic iteration computationally feasible. This involves the introduction of the
Hessenberg decomposition in §7.4 and the notion of origin shifts in §7.5.
The QR algorithm computes the real Schur form of a matrix, a canonical form
that displays eigenvalues but not eigenvectors. Consequently, additional computations
usually must be performed if information regarding invariant subspaces is desired. In
§7.6, which could be subtitled, "What to Do after the Real Schur Form is Calculated,''
we discuss various invariant subspace calculations that can be performed after the QR
algorithm has done its job.
The next two sections are about Schur decomposition challenges. The generalized
eigenvalue problem Ax = λBx is the subject of §7.7. The challenge is to compute the
Schur decomposition of B^{-1}A without actually forming the indicated inverse or the
product. The product eigenvalue problem is similar, only arbitrarily long sequences of
products are considered. This is treated in §7.8 along with the Hamiltonian eigenprob­
lem where the challenge is to compute a Schur form that has a special 2-by-2 block
structure.
In the last section the important notion of pseudospectra is introduced. It is
sometimes the case in unsymmetric matrix problems that traditional eigenvalue analysis
fails to tell the "whole story" because the eigenvector basis is ill-conditioned. The
pseudospectra framework effectively deals with this issue.
We mention that it is handy to work with complex matrices and vectors in the
more theoretical passages that follow. Complex versions of the QR factorization, the
singular value decomposition, and the CS decomposition surface in the discussion.
Reading Notes
Knowledge of Chapters 1-3 and §§5.1-5.2 is assumed. Within this chapter
there are the following dependencies:

    §7.1 → §7.2 → §7.3 → §7.4 → §7.5 → §7.6 → §7.9, with §7.6 leading in turn to §7.7 and §7.8.
Excellent texts for the dense eigenproblem include Chatelin (EOM), Kressner (NMSE),
Stewart (MAE), Stewart and Sun (MPA), Watkins (MEP), and Wilkinson (AEP).
7.1 Properties and Decompositions
In this section the background necessary to develop and analyze the eigenvalue
algorithms that follow is surveyed. For further details, see Horn and Johnson (MA).
7.1.1 Eigenvalues and Invariant Subspaces
The eigenvalues of a matrix A ∈ ℂ^{n×n} are the n roots of its characteristic polynomial
p(z) = det(zI − A). The set of these roots is called the spectrum of A and is denoted by

    λ(A) = { z : det(zI − A) = 0 }.

If λ(A) = {λ1, ..., λn}, then

    det(A) = λ1 ⋯ λn

and

    tr(A) = λ1 + ⋯ + λn
where the trace function, introduced in §6.4.1, is the sum of the diagonal entries, i.e.,
    tr(A) = a11 + a22 + ⋯ + ann.
These characterizations of the determinant and the trace follow by looking at the
constant term and the coefficient of z^{n-1} in the characteristic polynomial.
Four other attributes associated with the spectrum of A ∈ ℂ^{n×n} include the

    Spectral Radius:    ρ(A) = max { |λ| : λ ∈ λ(A) },                     (7.1.1)
    Spectral Abscissa:  α(A) = max { Re(λ) : λ ∈ λ(A) },                   (7.1.2)
    Numerical Radius:   r(A) = max { |x^H A x| : ||x||_2 = 1 },            (7.1.3)
    Numerical Range:    W(A) = { x^H A x : ||x||_2 = 1 }.                  (7.1.4)

The numerical range, which is sometimes referred to as the field of values, obviously
includes λ(A). It can be shown that W(A) is convex.
If λ ∈ λ(A), then the nonzero vectors x ∈ ℂⁿ that satisfy Ax = λx are eigenvectors.
More precisely, x is a right eigenvector for λ if Ax = λx and a left eigenvector if
x^H A = λ x^H. Unless otherwise stated, "eigenvector" means "right eigenvector."
An eigenvector defines a 1-dimensional subspace that is invariant with respect to
premultiplication by A. A subspace S ⊆ ℂⁿ with the property that

    x ∈ S  ⇒  Ax ∈ S

is said to be invariant (for A). Note that if

    AX = XB,

then ran(X) is invariant and By = λy ⇒ A(Xy) = λ(Xy). Thus, if X has full column
rank, then AX = XB implies that λ(B) ⊆ λ(A). If X is square and nonsingular, then
A and B = X^{-1}AX are similar, X is a similarity transformation, and λ(A) = λ(B).
7.1.2 Decoupling
Many eigenvalue computations involve breaking the given problem down into a collec­
tion of smaller eigenproblems. The following result is the basis for these reductions.
Lemma 7.1.1. If T ∈ ℂ^{n×n} is partitioned as follows,

    T = [ T11  T12 ]
        [  0   T22 ],    T11 ∈ ℂ^{p×p},  T22 ∈ ℂ^{q×q},

then λ(T) = λ(T11) ∪ λ(T22).
Proof. Suppose

    Tx = [ T11  T12 ] [ x1 ]  =  λ [ x1 ]
         [  0   T22 ] [ x2 ]       [ x2 ]

where x1 ∈ ℂᵖ and x2 ∈ ℂ^q. If x2 ≠ 0, then T22 x2 = λ x2 and so λ ∈ λ(T22). If x2 = 0,
then T11 x1 = λ x1 and so λ ∈ λ(T11). It follows that λ(T) ⊆ λ(T11) ∪ λ(T22). But since
both λ(T) and λ(T11) ∪ λ(T22) have the same cardinality, the two sets are equal. □
7.1.3 Basic Unitary Decompositions
By using similarity transformations, it is possible to reduce a given matrix to any one of
several canonical forms. The canonical forms differ in how they display the eigenvalues
and in the kind of invariant subspace information that they provide. Because of their
numerical stability we begin by discussing the reductions that can be achieved with
unitary similarity.
Lemma 7.1.2. If A ∈ ℂ^{n×n}, B ∈ ℂ^{p×p}, and X ∈ ℂ^{n×p} satisfy

    AX = XB,    rank(X) = p,                                               (7.1.5)

then there exists a unitary Q ∈ ℂ^{n×n} such that

    Q^H A Q = T = [ T11  T12 ]
                  [  0   T22 ],    T11 ∈ ℂ^{p×p},                          (7.1.6)

and λ(T11) = λ(A) ∩ λ(B).

Proof. Let

    X = Q [ R1 ]
          [ 0  ],    R1 ∈ ℂ^{p×p},

be a QR factorization of X. By substituting this into (7.1.5) and rearranging we have

    [ T11  T12 ] [ R1 ]   [ R1 ]
    [ T21  T22 ] [ 0  ] = [ 0  ] B

where

    Q^H A Q = [ T11  T12 ]
              [ T21  T22 ],    T11 ∈ ℂ^{p×p},  T22 ∈ ℂ^{(n-p)×(n-p)}.

By using the nonsingularity of R1 and the equations T21 R1 = 0 and T11 R1 = R1 B,
we can conclude that T21 = 0 and λ(T11) = λ(B). The lemma follows because from
Lemma 7.1.1 we have λ(A) = λ(T) = λ(T11) ∪ λ(T22). □
Lemma 7.1.2 says that a matrix can be reduced to block triangular form us­
ing unitary similarity transformations if we know one of its invariant subspaces. By
induction we can readily establish the decomposition of Schur (1909).
Theorem 7.1.3 (Schur Decomposition). If A ∈ ℂ^{n×n}, then there exists a unitary
Q ∈ ℂ^{n×n} such that

    Q^H A Q = T = D + N,                                                   (7.1.7)

where D = diag(λ1, ..., λn) and N ∈ ℂ^{n×n} is strictly upper triangular. Furthermore,
Q can be chosen so that the eigenvalues λi appear in any order along the diagonal.

Proof. The theorem obviously holds if n = 1. Suppose it holds for all matrices of
order n − 1 or less. If Ax = λx and x ≠ 0, then by Lemma 7.1.2 (with B = (λ)) there
exists a unitary U such that

    U^H A U = [ λ  w^H ]
              [ 0   C  ],    C ∈ ℂ^{(n-1)×(n-1)}.

By induction there is a unitary Ũ such that Ũ^H C Ũ is upper triangular. Thus, if
Q = U·diag(1, Ũ), then Q^H A Q is upper triangular. □
If Q = [ q1 | ⋯ | qn ] is a column partitioning of the unitary matrix Q in (7.1.7),
then the qi are referred to as Schur vectors. By equating columns in the equation
AQ = QT, we see that the Schur vectors satisfy

    A qk = λk qk + Σ_{i=1}^{k-1} nik qi,    k = 1:n.                        (7.1.8)

From this we conclude that the subspaces

    Sk = span{q1, ..., qk},    k = 1:n,

are invariant. Moreover, it is not hard to show that if Qk = [ q1 | ⋯ | qk ], then
λ(Qk^H A Qk) = {λ1, ..., λk}. Since the eigenvalues in (7.1.7) can be arbitrarily ordered,
it follows that there is at least one k-dimensional invariant subspace associated with
each subset of k eigenvalues. Another conclusion to be drawn from (7.1.8) is that the
Schur vector qk is an eigenvector if and only if the kth column of N is zero. This
turns out to be the case for k = 1:n whenever A^H A = A A^H. Matrices that satisfy this
property are called normal.
Corollary 7.1.4. A ∈ ℂ^{n×n} is normal if and only if there exists a unitary Q ∈ ℂ^{n×n}
such that Q^H A Q = diag(λ1, ..., λn).

Proof. See P7.1.1. □
Note that if Q^H A Q = T = diag(λi) + N is a Schur decomposition of a general n-by-n
matrix A, then ||N||_F is independent of the choice of Q:

    ||N||_F² = ||A||_F² − Σ_{i=1}^{n} |λi|²  =  Δ²(A).

This quantity is referred to as A's departure from normality. Thus, to make T "more
diagonal," it is necessary to rely on nonunitary similarity transformations.
7.1.4 Nonunitary Reductions
To see what is involved in nonunitary similarity reduction, we consider the block diag­
onalization of a 2-by-2 block triangular matrix.
Lemma 7.1.5. Let T ∈ ℂ^{n×n} be partitioned as follows:

    T = [ T11  T12 ]
        [  0   T22 ],    T11 ∈ ℂ^{p×p},  T22 ∈ ℂ^{q×q}.

Define the linear transformation φ : ℂ^{p×q} → ℂ^{p×q} by

    φ(X) = T11 X − X T22,

where X ∈ ℂ^{p×q}. Then φ is nonsingular if and only if λ(T11) ∩ λ(T22) = ∅. If φ is
nonsingular and Y is defined by

    Y = [ Ip  Z ]
        [ 0   Iq ],

where φ(Z) = −T12, then Y^{-1} T Y = diag(T11, T22).
Proof. Suppose φ(X) = 0 for X ≠ 0 and that

    U^H X V = [ Σr  0 ]   r
              [  0  0 ]   p−r ,    Σr = diag(σi),  r = rank(X),
                 r  q−r

is the SVD of X. Substituting this into the equation T11 X = X T22 gives

    [ A11  A12 ] [ Σr  0 ]     [ Σr  0 ] [ B11  B12 ]
    [ A21  A22 ] [  0  0 ]  =  [  0  0 ] [ B21  B22 ],

where U^H T11 U = (Aij) and V^H T22 V = (Bij). By comparing blocks in this equation
it is clear that A21 = 0, B12 = 0, and λ(A11) = λ(B11). Consequently, A11 and B11
have an eigenvalue in common and that eigenvalue is in λ(T11) ∩ λ(T22). Thus, if φ
is singular, then T11 and T22 have an eigenvalue in common. On the other hand, if
λ ∈ λ(T11) ∩ λ(T22), then we have eigenvector equations T11 x = λx and y^H T22 = λ y^H.
A calculation shows that φ(x y^H) = 0, confirming that φ is singular.

Finally, if φ is nonsingular, then φ(Z) = −T12 has a solution and

    Y^{-1} T Y = [ Ip  −Z ] [ T11  T12 ] [ Ip  Z ]   =   [ T11   T11 Z − Z T22 + T12 ]
                 [ 0   Iq ] [  0   T22 ] [ 0   Iq ]       [  0            T22        ]

has the required block diagonal form. □
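As a sketch of how Lemma 7.1.5 can be exercised numerically (this is not from the text), the fragment below solves T11 Z − Z T22 = −T12 with SciPy's Sylvester solver and verifies that Y^{-1} T Y is block diagonal; the test blocks are chosen so that λ(T11) and λ(T22) are disjoint.

    # Block diagonalization via the Sylvester equation (illustrative sketch,
    # not from the text).
    import numpy as np
    from scipy.linalg import solve_sylvester

    rng = np.random.default_rng(2)
    p, q = 3, 4
    T11 = np.triu(rng.standard_normal((p, p)))
    T22 = np.triu(rng.standard_normal((q, q))) + 5.0 * np.eye(q)   # disjoint spectra
    T12 = rng.standard_normal((p, q))
    T = np.block([[T11, T12], [np.zeros((q, p)), T22]])

    # solve_sylvester solves A X + X B = Q; take A = T11, B = -T22, Q = -T12.
    Z = solve_sylvester(T11, -T22, -T12)

    Y = np.block([[np.eye(p), Z], [np.zeros((q, p)), np.eye(q)]])
    D = np.linalg.solve(Y, T @ Y)            # Y^{-1} T Y
    print(np.allclose(D[:p, p:], 0))         # off-diagonal block is (numerically) zero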
By repeatedly applying this lemma, we can establish the following more general result.
Theorem 7.1.6 (Block Diagonal Decomposition). Suppose

    Q^H A Q = T = [ T11  T12  ⋯  T1q ]
                  [  0   T22  ⋯  T2q ]
                  [  ⋮        ⋱   ⋮  ]                                     (7.1.9)
                  [  0    0   ⋯  Tqq ]

is a Schur decomposition of A ∈ ℂ^{n×n} and that the Tii are square. If λ(Tii) ∩ λ(Tjj) = ∅
whenever i ≠ j, then there exists a nonsingular matrix Y ∈ ℂ^{n×n} such that

    (QY)^{-1} A (QY) = diag(T11, ..., Tqq).                                (7.1.10)

Proof. See P7.1.2. □
If each diagonal block Tii is associated with a distinct eigenvalue, then we obtain
Corollary 7.1.7. If A ∈ ℂ^{n×n}, then there exists a nonsingular X such that

    X^{-1} A X = diag( λ1 I_{n1} + N1, ..., λq I_{nq} + Nq ),              (7.1.11)

where λ1, ..., λq are distinct, the integers n1, ..., nq satisfy n1 + ⋯ + nq = n, and each
Ni is strictly upper triangular.
A number of important terms are connected with decomposition (7.1.11). The
integer ni is referred to as the algebraic multiplicity of λi. If ni = 1, then λi is said
to be simple. The geometric multiplicity of λi equals the dimension of null(Ni), i.e.,
the number of linearly independent eigenvectors associated with λi. If the algebraic
multiplicity of λi exceeds its geometric multiplicity, then λi is said to be a defective
eigenvalue. A matrix with a defective eigenvalue is referred to as a defective matrix.
Nondefective matrices are also said to be diagonalizable.
Corollary 7.1.8 (Diagonal Form). A ∈ ℂ^{n×n} is nondefective if and only if there
exists a nonsingular X ∈ ℂ^{n×n} such that

    X^{-1} A X = diag(λ1, ..., λn).                                        (7.1.12)

Proof. A is nondefective if and only if there exist independent vectors x1, ..., xn ∈ ℂⁿ
and scalars λ1, ..., λn such that A xi = λi xi for i = 1:n. This is equivalent to the
existence of a nonsingular X = [ x1 | ⋯ | xn ] ∈ ℂ^{n×n} such that AX = XD where
D = diag(λ1, ..., λn). □

Note that if yi^H is the ith row of X^{-1}, then yi^H A = λi yi^H. Thus, the columns of X^{-H}
are left eigenvectors and the columns of X are right eigenvectors.
If we partition the matrix X in (7.1.11),

    X = [ X1 | ⋯ | Xq ],    Xi ∈ ℂ^{n×ni},
then ℂⁿ = ran(X1) ⊕ ⋯ ⊕ ran(Xq), a direct sum of invariant subspaces. If the bases
for these subspaces are chosen in a special way, then it is possible to introduce even
more zeroes into the upper triangular portion of X^{-1} A X.
Theorem 7.1.9 (Jordan Decomposition). If A ∈ ℂ^{n×n}, then there exists a non-
singular X ∈ ℂ^{n×n} such that X^{-1} A X = diag(J1, ..., Jq) where

    Ji = [ λi   1              ]
         [      λi   ⋱         ]
         [           ⋱    1    ]   ∈ ℂ^{ni×ni}
         [                λi   ]

and n1 + ⋯ + nq = n.

Proof. See Horn and Johnson (MA, p. 330). □
The Ji are referred to as Jordan blocks. The number and dimensions of the Jordan
blocks associated with each distinct eigenvalue are unique, although their ordering
along the diagonal is not.
7.1.5 Some Comments on Nonunitary Similarity
The Jordan block structure of a defective matrix is difficult to determine numerically.
The set of n-by-n diagonalizable matrices is dense in ℂ^{n×n}, and thus, small changes in
a defective matrix can radically alter its Jordan form. We have more to say about this
in §7.6.5.
A related difficulty that arises in the eigenvalue problem is that a nearly defective
matrix can have a poorly conditioned matrix of eigenvectors. For example, any matrix
X that diagonalizes

    A = [ 1+ε    1  ]
        [  0    1−ε ],    0 < ε ≪ 1,                                       (7.1.13)

has a 2-norm condition of order 1/ε.
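A quick numerical illustration of (7.1.13) (not from the text): the condition number of the eigenvector matrix returned by numpy.linalg.eig grows roughly like 1/ε as the two eigenvalues coalesce.

    # Condition of the eigenvector matrix for (7.1.13) (illustrative sketch,
    # not from the text).
    import numpy as np

    for eps in [1e-2, 1e-4, 1e-6]:
        A = np.array([[1.0 + eps, 1.0],
                      [0.0, 1.0 - eps]])
        _, X = np.linalg.eig(A)              # columns of X are right eigenvectors
        print(eps, np.linalg.cond(X, 2))     # grows roughly like 1/eps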
These observations serve to highlight the difficulties associated with ill-conditioned
similarity transformations. Since
    fl( X^{-1} A X ) = X^{-1} A X + E,                                     (7.1.14)

where

    ||E||_2 ≈ u κ2(X) ||A||_2,                                             (7.1.15)
it is clear that large errors can be introduced into an eigenvalue calculation when we
depart from unitary similarity.
7.1.6 Singular Values and Eigenvalues
Since the singular values of A and of its Schur decomposition Q^H A Q = diag(λi) + N are
the same, it follows that

    σi(A) = σi( diag(λ1, ..., λn) + N ),    i = 1:n.

From what we know about the condition of triangular matrices, it may be the case that

    max_{1≤i,j≤n} |λi| / |λj|  ≪  σ1(A) / σn(A).
See §5.4.3. This is a reminder that for nonnormal matrices, eigenvalues do not have
the "predictive power" of singular values when it comes to Ax = b sensitivity matters.
Eigenvalues of nonnormal matrices have other shortcomings, a topic that is the focus
of §7.9.
Problems
P7.1.1 (a) Show that if T ∈ ℂ^{n×n} is upper triangular and normal, then T is diagonal. (b) Show that
if A is normal and Q^H A Q = T is a Schur decomposition, then T is diagonal. (c) Use (a) and (b) to
complete the proof of Corollary 7.1.4.
P7.1.2 Prove Theorem 7.1.6 by using induction and Lemma 7.1.5.
P7.1.3 Suppose A ∈ ℂ^{n×n} has distinct eigenvalues. Show that if Q^H A Q = T is its Schur decompo-
sition and AB = BA, then Q^H B Q is upper triangular.
P7.1.4 Show that if A and B^H are in ℂ^{m×n} with m ≥ n, then

    λ(AB) = λ(BA) ∪ { 0, ..., 0 }    (m − n zeros).
P7.1.5 Given A ∈ ℂ^{n×n}, use the Schur decomposition to show that for every ε > 0, there exists a
diagonalizable matrix B such that ||A − B||_2 ≤ ε. This shows that the set of diagonalizable matrices
is dense in ℂ^{n×n} and that the Jordan decomposition is not a continuous matrix decomposition.
P7.1.6 Suppose Ak → A and that Qk^H Ak Qk = Tk is a Schur decomposition of Ak. Show that {Qk}
has a converging subsequence {Qk_i} with the property that

    lim_{i→∞} Qk_i = Q

where Q^H A Q = T is upper triangular. This shows that the eigenvalues of a matrix are continuous
functions of its entries.
P7.1.7 Justify (7.1.14) and (7.1.15).
P7.1.8 Show how to compute the eigenvalues of

    M = [ A  B ]
        [ C  D ]

where A, B, C, and D are given real diagonal matrices.
P7.1.9 Use the Jordan decomposition to show that if all the eigenvalues of a matrix A are strictly
less than unity, then lim_{k→∞} A^k = 0.
P7.1.10 The initial value problem

    ẋ(t) = y(t),     x(0) = 1,
    ẏ(t) = −x(t),    y(0) = 0,

has solution x(t) = cos(t) and y(t) = sin(t). Let h > 0. Here are three reasonable iterations that can
be used to compute approximations xk ≈ x(kh) and yk ≈ y(kh) assuming that x0 = 1 and y0 = 0:
    Method 1:    x_{k+1} = x_k + h y_k,        y_{k+1} = y_k − h x_k.
    Method 2:    x_{k+1} = x_k + h y_k,        y_{k+1} = y_k − h x_{k+1}.
    Method 3:    x_{k+1} = x_k + h y_{k+1},    y_{k+1} = y_k − h x_{k+1}.

Express each method in the form

    [ x_{k+1} ]         [ x_k ]
    [ y_{k+1} ]  = A_h  [ y_k ]

where A_h is a 2-by-2 matrix. For each case, compute λ(A_h) and use the previous problem to discuss
lim x_k and lim y_k as k → ∞.
P7.1.11 If J ∈ ℝ^{d×d} is a Jordan block, what is κ_∞(J)?
P7.1.12 Suppose A, B ∈ ℂ^{n×n}. Show that the 2n-by-2n matrices

    M1 = [ AB  0 ]          M2 = [ 0   0  ]
         [  B  0 ]               [ B  BA ]

are similar, thereby showing that λ(AB) = λ(BA).
P7.1.13 Suppose A ∈ ℝ^{n×n}. We say that B ∈ ℝ^{n×n} is the Drazin inverse of A if (i) AB = BA, (ii)
BAB = B, and (iii) the spectral radius of A − ABA is zero. Give a formula for B in terms of the Jordan
decomposition of A, paying particular attention to the blocks associated with A's zero eigenvalues.
P7.1.14 Show that if A ∈ ℝ^{n×n}, then ρ(A) ≥ (σ1 ⋯ σn)^{1/n} where σ1, ..., σn are the singular values
of A.
P7.1.15 Consider the polynomial q(x) = det(I_n + xA) where A ∈ ℝ^{n×n}. We wish to compute the
coefficient of x². (a) Specify the coefficient in terms of the eigenvalues λ1, ..., λn of A. (b) Give a
simple formula for the coefficient in terms of tr(A) and tr(A²).
P7.1.16 Given A ∈ ℝ^{2×2}, show that there exists a nonsingular X ∈ ℝ^{2×2} so X^{-1} A X = A^T. See
Dubrulle and Parlett (2007).
Notes and References for §7.1
For additional discussion about the linear algebra behind the eigenvalue problem, see Horn and Johnson
(MA) and:
L. Mirsky (1963). An Introduction to Linear Algebra, Oxford University Press, Oxford, U.K.
M. Marcus and H. Minc (1964). A Survey of Matrix Theory and Matrix Inequalities, Allyn and Bacon,
Boston.
R. Bellman (1970). Introduction to Matrix Analysis, second edition, McGraw-Hill, New York.
I. Gohberg, P. Lancaster, and L. Rodman (2006). Invariant Subspaces of Matrices with Applications,
SIAM Publications, Philadelphia, PA.
For a general discussion about the similarity connection between a matrix and its transpose, see:
A.A. Dubrulle and B.N. Parlett (2010). "Revelations of a Transposition Matrix," J. Comp. and Appl.
Math. 233, 1217-1219.
The Schur decomposition originally appeared in:
I. Schur (1909). "On the Characteristic Roots of a Linear Substitution with an Application to the
Theory of Integral Equations." Math. Ann. 66, 488-510 (German).
A proof very similar to ours is given in:
H.W. Turnbull and A.C. Aitken (1961). An Introduction to the Theory of Canonical Forms, Dover,
New York, 105.
7.2 Perturbation Theory
The act of computing eigenvalues is the act of computing zeros of the characteristic
polynomial. Galois theory tells us that such a process has to be iterative if n > 4 and
so errors arise because of finite termination. In order to develop intelligent stopping
criteria we need an informative perturbation theory that tells us how to think about
approximate eigenvalues and invariant subspaces.
7.2.1 Eigenvalue Sensitivity
An important framework for eigenvalue computation is to produce a sequence of sim­
ilarity transformations {Xk} with the property that the matrices x;;1AXk are pro­
gressively "more diagonal." The question naturally arises, how well do the diagonal
elements of a matrix approximate its eigenvalues?
Theorem 7.2.1 (Gershgorin Circle Theorem). If X^{-1} A X = D + F where D =
diag(d1, ..., dn) and F has zero diagonal entries, then

    λ(A) ⊆ ⋃_{i=1}^{n} Di

where

    Di = { z ∈ ℂ : |z − di| ≤ Σ_{j=1}^{n} |fij| }.

Proof. Suppose λ ∈ λ(A) and assume without loss of generality that λ ≠ di for
i = 1:n. Since (D − λI) + F is singular, it follows from Lemma 2.3.3 that

    1  ≤  || (D − λI)^{-1} F ||_∞  =  Σ_{j=1}^{n} |fkj| / |dk − λ|

for some k, 1 ≤ k ≤ n. But this implies that λ ∈ Dk. □
It can also be shown that if the Gershgorin disk Di is isolated from the other disks,
then it contains precisely one eigenvalue of A. See Wilkinson (AEP, pp. 71ff.).
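The fragment below is an illustrative computation of the Gershgorin disks (not from the text), taking X = I so that D + F is A itself; it checks that every eigenvalue lies in at least one disk. The test matrix is an arbitrary choice.

    # Gershgorin disks for A itself (illustrative sketch, not from the text).
    import numpy as np

    rng = np.random.default_rng(3)
    A = np.diag([1.0, 5.0, 10.0]) + 0.3 * rng.standard_normal((3, 3))

    d = np.diag(A)
    radii = np.sum(np.abs(A - np.diag(d)), axis=1)   # row sums of |F|
    lam = np.linalg.eigvals(A)

    # every eigenvalue lies in at least one disk of radius radii[i] about d[i]
    inside = [np.any(np.abs(l - d) <= radii) for l in lam]
    print(all(inside))    # True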
For some methods it is possible to show that the computed eigenvalues are the
exact eigenvalues of a matrix A+E where E is small in norm. Consequently, we should
understand how the eigenvalues of a matrix can be affected by small perturbations.
Theorem 7.2.2 (Bauer-Fike). If µ is an eigenvalue of A + E ∈ ℂ^{n×n} and X^{-1} A X =
D = diag(λ1, ..., λn), then

    min_{λ ∈ λ(A)} |λ − µ|  ≤  κ_p(X) ||E||_p

where || · ||_p denotes any of the p-norms.
Proof. If µ ∈ λ(A), then the theorem is obviously true. Otherwise, if the matrix
X^{-1}(A + E − µI)X is singular, then so is I + (D − µI)^{-1}(X^{-1} E X). Thus, from
Lemma 2.3.3 we obtain

    1  ≤  || (D − µI)^{-1} (X^{-1} E X) ||_p  ≤  || (D − µI)^{-1} ||_p κ_p(X) ||E||_p.

Since (D − µI)^{-1} is diagonal and the p-norm of a diagonal matrix is the absolute value
of the largest diagonal entry, it follows that

    || (D − µI)^{-1} ||_p = max_{λ ∈ λ(A)} 1 / |λ − µ|,

completing the proof. □
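As a sketch (not from the text), the following NumPy code checks the Bauer-Fike bound in the 2-norm for a random diagonalizable matrix and a small random perturbation; the matrix sizes and perturbation scale are arbitrary.

    # Numerical check of the Bauer-Fike bound (illustrative sketch, not from
    # the text).
    import numpy as np

    rng = np.random.default_rng(4)
    n = 6
    A = rng.standard_normal((n, n))
    lam, X = np.linalg.eig(A)                # A = X diag(lam) X^{-1}
    E = 1e-6 * rng.standard_normal((n, n))
    mu = np.linalg.eigvals(A + E)

    bound = np.linalg.cond(X, 2) * np.linalg.norm(E, 2)
    worst = max(np.min(np.abs(lam - m)) for m in mu)
    print(worst <= bound)    # True: the bound holds for every perturbed eigenvalue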
An analogous result can be obtained via the Schur decomposition:
Theorem 7.2.3. Let Q^H A Q = D + N be a Schur decomposition of A ∈ ℂ^{n×n} as in
(7.1.7). If µ ∈ λ(A + E) and p is the smallest positive integer such that |N|^p = 0, then

    min_{λ ∈ λ(A)} |λ − µ|  ≤  max{ θ, θ^{1/p} }

where

    θ = ||E||_2 Σ_{k=0}^{p-1} ||N||_2^k.

Proof. Define

    δ = min_{λ ∈ λ(A)} |λ − µ|.

The theorem is clearly true if δ = 0. If δ > 0, then I − (µI − A)^{-1} E is singular and by
Lemma 2.3.3 we have

    1 ≤ || (µI − A)^{-1} E ||_2 ≤ || (µI − A)^{-1} ||_2 ||E||_2
      = || ((µI − D) − N)^{-1} ||_2 ||E||_2.                               (7.2.1)

Since (µI − D)^{-1} is diagonal and |N|^p = 0, it follows that ((µI − D)^{-1} N)^p = 0. Thus,

    ((µI − D) − N)^{-1} = Σ_{k=0}^{p-1} ((µI − D)^{-1} N)^k (µI − D)^{-1}

and so

    || ((µI − D) − N)^{-1} ||_2  ≤  (1/δ) Σ_{k=0}^{p-1} ( ||N||_2 / δ )^k.

If δ > 1, then

    || ((µI − D) − N)^{-1} ||_2  ≤  (1/δ) Σ_{k=0}^{p-1} ||N||_2^k
and so from (7.2.1), δ ≤ θ. If δ ≤ 1, then

    || ((µI − D) − N)^{-1} ||_2  ≤  (1/δ^p) Σ_{k=0}^{p-1} ||N||_2^k.

By using (7.2.1) again we have δ^p ≤ θ and so δ ≤ max{θ, θ^{1/p}}. □
Theorems 7.2.2 and 7.2.3 suggest that the eigenvalues of a nonnormal matrix may be
sensitive to perturbations. In particular, if κ2(X) or ||N||_2^{p-1} is large, then small
changes in A can induce large changes in the eigenvalues.
7.2.2 The Condition of a Simple Eigenvalue
Extreme eigenvalue sensitivity for a matrix A cannot occur if A is normal. On the
other hand, nonnormality does not necessarily imply eigenvalue sensitivity. Indeed, a
nonnormal matrix can have a mixture of well-conditioned and ill-conditioned eigen­
values. For this reason, it is beneficial to refine our perturbation theory so that it is
applicable to individual eigenvalues and not the spectrum as a whole.
To this end, suppose that λ is a simple eigenvalue of A ∈ ℂ^{n×n} and that x and
y satisfy Ax = λx and y^H A = λ y^H with ||x||_2 = ||y||_2 = 1. If Y^H A X = J is the
Jordan decomposition with Y^H = X^{-1}, then x and y are nonzero multiples of X(:, i)
and Y(:, i) for some i. It follows from 1 = Y(:, i)^H X(:, i) that y^H x ≠ 0, a fact that we
shall use shortly.
Using classical results from function theory, it can be shown that in a neighbor-
hood of the origin there exist differentiable x(ε) and λ(ε) such that

    (A + εF) x(ε) = λ(ε) x(ε),

where λ(0) = λ and x(0) = x. By differentiating this equation with respect to ε and
setting ε = 0 in the result, we obtain

    A ẋ(0) + F x = λ̇(0) x + λ ẋ(0).

Applying y^H to both sides of this equation, dividing by y^H x, and taking absolute values
gives

    |λ̇(0)| = |y^H F x| / |y^H x|  ≤  ||F||_2 / |y^H x|.

The upper bound is attained if F = y x^H. For this reason we refer to the reciprocal of

    s(λ) = |y^H x|                                                         (7.2.2)

as the condition of the eigenvalue λ.
Roughly speaking, the above analysis shows that O(ε) perturbations in A can
induce ε/s(λ) changes in an eigenvalue. Thus, if s(λ) is small, then λ is appropriately
regarded as ill-conditioned. Note that s(λ) is the cosine of the angle between the left
and right eigenvectors associated with λ and is unique only if λ is simple.
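The following sketch (not from the text) estimates the reciprocal condition numbers 1/s(λ) of (7.2.2) from unit left and right eigenvectors returned by scipy.linalg.eig; the upper triangular test matrix is an arbitrary choice whose eigenvalues have very different sensitivities.

    # Eigenvalue condition numbers from left/right eigenvectors (illustrative
    # sketch, not from the text).
    import numpy as np
    from scipy.linalg import eig

    A = np.array([[1.0, 1e4, 0.0],
                  [0.0, 2.0, 1e4],
                  [0.0, 0.0, 3.0]])

    lam, VL, VR = eig(A, left=True, right=True)   # columns: left / right eigenvectors
    for k in range(3):
        x = VR[:, k] / np.linalg.norm(VR[:, k])
        y = VL[:, k] / np.linalg.norm(VL[:, k])
        s = abs(np.vdot(y, x))                    # s(lambda) = |y^H x|
        print(lam[k].real, 1.0 / s)               # condition of the eigenvalue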
A small s(λ) implies that λ is near a matrix having a multiple eigenvalue. In
particular, if λ is distinct and s(λ) < 1, then there exists an E such that λ is a repeated
eigenvalue of A + E and

    ||E||_2 / ||A||_2  ≤  s(λ) / √(1 − s(λ)²).

This result is proved by Wilkinson (1972).
7.2.3 Sensitivity of Repeated Eigenvalues
If , is a repeated eigenvalue, then the eigenvalue sensitivity question is more compli­
cated. For example, if
and
then -(A + EF) = {l ± y'ffl}. Note that if a =F 0, then it follows that the eigenvalues
of A + €F are not differentiable at zero; their rate of change at the origin is infinite. In
general, if , is a defective eigenvalue of A, then O(t-:) perturbations in A can result in
0(€1/P) perturbations in , if , is associated with a p-dimensional Jordan block. See
Wilkinson (AEP, pp. 77ff.) for a more detailed discussion.
7.2.4 Invariant Subspace Sensitivity
A collection of sensitive eigenvectors can define an insensitive invariant subspace pro­
vided the corresponding cluster of eigenvalues is isolated. To be precise, suppose
is a Schur decomposition of A with
Q
r 11 - T
r n - r
(7.2.3)
(7.2.4)
It is clear from our discussion of eigenvector perturbation that the sensitivity of the
invariant subspace ran(Q1) depends on the distance between λ(T11) and λ(T22). The
proper measure of this distance turns out to be the smallest singular value of the linear
transformation X → T11 X − X T22. (Recall that this transformation figures in Lemma
7.1.5.) In particular, if we define the separation between the matrices T11 and T22 by

    sep(T11, T22) = min_{X ≠ 0}  || T11 X − X T22 ||_F / || X ||_F,        (7.2.5)

then we have the following general result:
Theorem 7.2.4. Suppose that (7.2.3) and (7.2.4) hold and that for any matrix
E ∈ ℂ^{n×n} we partition Q^H E Q as follows:

    Q^H E Q = [ E11  E12 ]   r
              [ E21  E22 ]   n−r
                 r    n−r

If sep(T11, T22) > 0 and

    || E ||_F ( 1 + 5 || T12 ||_F / sep(T11, T22) )  ≤  sep(T11, T22) / 5,

then there exists a P ∈ ℂ^{(n−r)×r} with

    || P ||_F  ≤  4 || E21 ||_F / sep(T11, T22)

such that the columns of Q̃1 = (Q1 + Q2 P)(I + P^H P)^{−1/2} are an orthonormal basis
for a subspace invariant for A + E.

Proof. This result is a slight recasting of Theorem 4.11 in Stewart (1973), which should
be consulted for proof details. See also Stewart and Sun (MPA, p. 230). The matrix
(I + P^H P)^{−1/2} is the inverse of the square root of the symmetric positive definite
matrix I + P^H P. See §4.2.4. □
Corollary 7.2.5. If the assumptions in Theorem 7.2.4 hold, then

    dist( ran(Q1), ran(Q̃1) )  ≤  4 || E21 ||_F / sep(T11, T22).

Proof. Using the SVD of P, it can be shown that

    || P (I + P^H P)^{−1/2} ||_2  ≤  || P ||_2  ≤  || P ||_F.              (7.2.6)

Since the required distance is the 2-norm of Q2^H Q̃1 = P (I + P^H P)^{−1/2}, the proof is
complete. □
Thus, the reciprocal of sep(T11, T22) can be thought of as a condition number that
measures the sensitivity of ran(Q1) as an invariant subspace.
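One illustrative way to compute sep(T11, T22) in (7.2.5) (not from the text) is to note that, with respect to vec(X), the map X → T11 X − X T22 has matrix kron(I, T11) − kron(T22^T, I), so sep is the smallest singular value of that Kronecker matrix; the helper name sep below is ad hoc.

    # Computing sep(T11, T22) via a Kronecker product (illustrative sketch,
    # not from the text).
    import numpy as np

    def sep(T11, T22):
        p, q = T11.shape[0], T22.shape[0]
        K = np.kron(np.eye(q), T11) - np.kron(T22.T, np.eye(p))
        return np.linalg.svd(K, compute_uv=False)[-1]

    T11 = np.array([[1.0, 10.0], [0.0, 1.1]])
    T22 = np.array([[1.2]])
    print(sep(T11, T22))    # a small value: the invariant subspace is sensitive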
7.2.5 Eigenvector Sensitivity
If we set r = 1 in the preceding subsection, then the analysis addresses the issue of
eigenvector sensitivity.
Corollary 7.2.6. Suppose A, E ∈ ℂ^{n×n} and that Q = [ q1 | Q2 ] ∈ ℂ^{n×n} is unitary
with q1 ∈ ℂⁿ. Assume

    Q^H A Q = [ λ  v^H ]   1          Q^H E Q = [ ε    e^H ]   1
              [ 0  T22 ]   n−1                  [ δ    E22 ]   n−1
                1   n−1                            1    n−1
(Thus, q1 is an eigenvector.) If σ = σmin(T22 − λI) > 0 and

    || E ||_F ( 1 + 5 || v ||_2 / σ )  ≤  σ / 5,

then there exists p ∈ ℂ^{n−1} with

    || p ||_2  ≤  4 || δ ||_2 / σ

such that q̃1 = (q1 + Q2 p)/√(1 + p^H p) is a unit 2-norm eigenvector for A + E. Moreover,

    dist( span{q1}, span{q̃1} )  ≤  4 || δ ||_2 / σ.

Proof. The result follows from Theorem 7.2.4, Corollary 7.2.5, and the observation
that if T11 = λ, then sep(T11, T22) = σmin(T22 − λI). □
Note that σmin(T22 − λI) roughly measures the separation of λ from the eigenvalues of
T22. We have to say "roughly" because

    σmin(T22 − λI)  ≤  min_{µ ∈ λ(T22)} |λ − µ|

and the upper bound can be a gross overestimate.
That the separation of the eigenvalues should have a bearing upon eigenvector
sensitivity should come as no surprise. Indeed, if λ is a nondefective, repeated eigen-
value, then there are an infinite number of possible eigenvector bases for the associated
invariant subspace. The preceding analysis merely indicates that this indeterminacy
begins to be felt as the eigenvalues coalesce. In other words, the eigenvectors associated
with nearby eigenvalues are "wobbly."
Problems
P7.2.1 Suppose Q^H A Q = diag(λi) + N is a Schur decomposition of A ∈ ℂ^{n×n} and define ν(A) =
||A^H A − A A^H||_F. The upper and lower bounds in

    ν(A)² / (6 ||A||_F²)  ≤  ||N||_F²  ≤  √((n³ − n)/12) · ν(A)

are established by Henrici (1962) and Eberlein (1965), respectively. Verify these results for the case
n = 2.
P7.2.2 Suppose A ∈ ℂ^{n×n} and X^{-1} A X = diag(λ1, ..., λn) with distinct λi. Show that if the columns
of X have unit 2-norm, then κ_F(X)² = n(1/s(λ1)² + ⋯ + 1/s(λn)²).
P7.2.3 Suppose Q^H A Q = diag(λi) + N is a Schur decomposition of A and that X^{-1} A X = diag(λi).
Show κ2(X)² ≥ 1 + (||N||_F / ||A||_F)². See Loizou (1969).
P7.2.4 If X^{-1} A X = diag(λi) and |λ1| ≥ ⋯ ≥ |λn|, then

    σi(A) / κ2(X)  ≤  |λi|  ≤  κ2(X) σi(A).

Prove this result for the n = 2 case. See Ruhe (1975).
P7.2.5 Show that if A = [ a  c ; 0  b ] and a ≠ b, then s(a) = s(b) = (1 + |c/(a − b)|²)^{-1/2}.
P7.2.6 Suppose

    A = [ λ  v^H ]
        [ 0  T22 ]

and that λ ∉ λ(T22). Show that if a = sep(λ, T22), then

    s(λ) = 1 / √( 1 + || (T22 − λI)^{-1} v ||_2² )  ≥  a / √( a² + || v ||_2² ),

where s(λ) is defined in (7.2.2).
P7.2.7 Show that the condition of a simple eigenvalue is preserved under unitary similarity transfor­
mations.
P7.2.8 With the same hypothesis as in the Bauer-Fike theorem (Theorem 7.2.2), show that

    min_{λ ∈ λ(A)} |λ − µ|  ≤  || |X^{-1}| |E| |X| ||_p.

P7.2.9 Verify (7.2.6).
P7.2.10 Show that if B ∈ ℂ^{m×m} and C ∈ ℂ^{n×n}, then sep(B, C) is less than or equal to |λ − µ| for
all λ ∈ λ(B) and µ ∈ λ(C).
Notes and References for §7.2
Many of the results presented in this section may be found in Wilkinson (AEP), Stewart and Sun
(MPA) as well as:
F.L. Bauer and C.T. Fike (1960). "Norms and Exclusion Theorems," Numer. Math. 2, 123-44.
A.S. Householder (1964). The Theory of Matrices in Numerical Analysis. Blaisdell, New York.
R. Bhatia (2007). Perturbation Bounds for Matrix Eigenvalues, SIAM Publications, Philadelphia,
PA.
Early papers concerned with the effect of perturbations on the eigenvalues of a general matrix include:
A. Ruhe (1970). "Perturbation Bounds for Means of Eigenvalues and Invariant Subspaces," BIT 10,
343-54.
A. Ruhe (1970). "Properties of a Matrix with a Very Ill-Conditioned Eigenproblem," Numer. Math.
15, 57-60.
J.H. Wilkinson (1972). "Note on Matrices with a Very Ill-Conditioned Eigenproblem," Numer. Math.
19, 176-78.
W. Kahan, B.N. Parlett, and E. Jiang (1982). "Residual Bounds on Approximate Eigensystems of
Nonnormal Matrices," SIAM J. Numer. Anal. 19, 470-484.
J.H. Wilkinson (1984). "On Neighboring Matrices with Quadratic Elementary Divisors," Numer.
Math. 44, 1-21.
Wilkinson's work on nearest defective matrices is typical of a growing body of literature that is
concerned with "nearness" problems, see:
A. Ruhe (1987). "Closest Normal Matrix Found!," BIT 27, 585-598.
J.W. Demmel (1987). "On the Distance to the Nearest Ill-Posed Problem," Numer. Math. 51,
251-289.
J.W. Demmel (1988). "The Probability that a Numerical Analysis Problem is Difficult," Math. Com­
put. 50, 449-480.
N.J. Higham (1989). "Matrix Nearness Problems and Applications," in Applications ofMatrix Theory,
M.J.C. Gover and S. Barnett (eds.), Oxford University Press, Oxford, 1-27.
A.N. Malyshev (1999). "A Formula for the 2-norm Distance from a Matrix to the Set of Matrices with
Multiple Eigenvalues," Numer. Math. 83, 443-454.
J.-M. Gracia (2005). "Nearest Matrix with Two Prescribed Eigenvalues," Lin. Alg. Applic. 401,
277-294.
An important subset of this literature is concerned with nearness to the set of unstable matrices. A
matrix is unstable if it has an eigenvalue with nonnegative real part. Controllability is a related notion,
see:
C. Van Loan (1985). "How Near is a Stable Matrix to an Unstable Matrix?," Contemp. Math. 47,
465-477.
J.W. Demmel (1987). "A Counterexample for two Conjectures About Stability," IEEE Trans. Autom.
Contr. AC-32, 340-342.
R. Byers (1988). "A Bisection Method for Measuring the distance of a Stable Matrix to the Unstable
Matrices," J. Sci. Stat. Comput. 9, 875-881.
J.V. Burke and M.L. Overton (1992). "Stable Perturbations of Nonsymmetric Matrices," Lin. Alg.
Applic. 1 71, 249-273.
C. He and G.A. Watson (1998). "An Algorithm for Computing the Distance to Instability," SIAM J.
Matrix Anal. Applic. 20, 101-116.
M. Gu, E. Mengi, M.L. Overton, J. Xia, and J. Zhu (2006). "Fast Methods for Estimating the Distance
to Uncontrollability," SIAM J. Matrix Anal. Applic. 28, 477-502.
Aspects of eigenvalue condition are discussed in:
C. Van Loan (1987). "On Estimating the Condition of Eigenvalues and Eigenvectors," Lin. Alg.
Applic. 88/89, 715-732.
C.D. Meyer and G.W. Stewart (1988). "Derivatives and Perturbations of Eigenvectors,'' SIAM J.
Numer. Anal. 25, 679-691.
G.W. Stewart and G. Zhang (1991). "Eigenvalues of Graded Matrices and the Condition Numbers of
Multiple Eigenvalues," Numer. Math. 58, 703-712.
J.-G. Sun (1992). "On Condition Numbers of a Nondefective Multiple Eigenvalue," Numer. Math.
61, 265-276.
S.M. Rump (2001). "Computational Error Bounds for Multiple or Nearly Multiple Eigenvalues,'' Lin.
Alg. Applic. 324, 209-226.
The relationship between the eigenvalue condition number, the departure from normality, and the
condition of the eigenvector matrix is discussed in:
P. Henrici (1962). "Bounds for Iterates, Inverses, Spectral Variation and Fields of Values of Non­
normal Matrices," Numer. Math. 4, 24-40.
P. Eberlein (1965). "On Measures of Non-Normality for Matrices," AMS Monthly 72, 995-996.
R.A. Smith (1967). "The Condition Numbers of the Matrix Eigenvalue Problem," Numer. Math. 10
232-240.
G. Loizou (1969). "Nonnormality and Jordan Condition Numbers of Matrices,'' J. ACM 16, 580-640.
A. van der Sluis (1975). "Perturbations of Eigenvalues of Non-normal Matrices," Commun. ACM 18,
30-36.
S.L. Lee (1995). "A Practical Upper Bound for Departure from Normality," SIAM J. Matrix Anal.
Applic. 16, 462-468.
Gershgorin's theorem can be used to derive a comprehensive perturbation theory. The theorem itself
can be generalized and extended in various ways, see:
R.S. Varga (1970). "Minimal Gershgorin Sets for Partitioned Matrices," SIAM J. Numer. Anal. 7,
493-507.
R.J. Johnston (1971). "Gershgorin Theorems for Partitioned Matrices," Lin. Alg. Applic. 4, 205-20.
R.S. Varga and A. Krautstengl (1999). "On Geršgorin-type Problems and Ovals of Cassini," ETNA 8,
15-20.
R.S. Varga (2001). "Geršgorin-type Eigenvalue Inclusion Theorems and Their Sharpness," ETNA 12,
113-133.
C. Beattie and I.C.F. Ipsen (2003). "Inclusion Regions for Matrix Eigenvalues," Lin. Alg. Applic.
358, 281-291.
In our discussion, the perturbations to the A-matrix are general. More can be said when the pertur­
bations are structured, see:
G.W. Stewart (2001). "On the Eigensystems of Graded Matrices," Numer. Math. 90, 349-370.
J. Moro and F.M. Dopico (2003). "Low Rank Perturbation of Jordan Structure," SIAM J. Matrix
Anal. Applic. 25, 495-506.
R. Byers and D. Kressner (2004). "On the Condition of a Complex Eigenvalue under Real Perturba­
tions," BIT 44, 209-214.
R. Byers and D. Kressner (2006). "Structured Condition Numbers for Invariant Subspaces," SIAM J.
Matrix Anal. Applic. 28, 326-347.
An absolute perturbation bound comments on the difference between an eigenvalue λ and its pertur-
bation λ̃. A relative perturbation bound examines the quotient |λ − λ̃|/|λ|, something that can be
very important when there is a concern about a small eigenvalue. For results in this direction consult:
R.-C. Li (1997). "Relative Perturbation Theory. III. More Bounds on Eigenvalue Variation," Lin.
Alg. Applic. 266, 337-345.
S.C. Eisenstat and I.C.F. Ipsen (1998). "Three Absolute Perturbation Bounds for Matrix Eigenvalues
Imply Relative Bounds," SIAM J. Matrix Anal. Applic. 20, 149-158.
S.C. Eisenstat and I.C.F. Ipsen (1998). "Relative Perturbation Results for Eigenvalues and Eigenvec-
tors of Diagonalisable Matrices," BIT 38, 502-509.
I.C.F. Ipsen (1998). "Relative Perturbation Results for Matrix Eigenvalues and Singular Values," Acta
Numerica, 7, 151-201.
I.C.F. Ipsen (2000). "Absolute and Relative Perturbation Bounds for Invariant Subspaces of Matrices,"
Lin. Alg. Applic. 309, 45-56.
I.C.F. Ipsen (2003). "A Note on Unifying Absolute and Relative Perturbation Bounds," Lin. Alg.
Applic. 358, 239-253.
Y. Wei, X. Li, F. Bu, and F. Zhang (2006). "Relative Perturbation Bounds for the Eigenvalues of
Diagonalizable and Singular Matrices-Application to Perturbation Theory for Simple Invariant
Subspaces," Lin. Alg. Applic. 419, 765-771.
The eigenvectors and invariant subspaces of a matrix also "move" when there are perturbations.
Tracking these changes is typically more challenging than tracking changes in the eigenvalues, see:
T. Kato (1966). Perturbation Theory for Linear Operators, Springer-Verlag, New York.
C. Davis and W.M. Kahan (1970). "The Rotation of Eigenvectors by a Perturbation, III," SIAM J.
Numer. Anal. 7, 1-46.
G.W. Stewart (1971). "Error Bounds for Approximate Invariant Subspaces of Closed Linear Opera­
tors," SIAM. J. Numer. Anal. 8, 796-808.
G.W. Stewart (1973). "Error and Perturbation Bounds for Subspaces Associated with Certain Eigen-
value Problems," SIAM Review 15, 727-764.
J. Xie (1997). "A Note on the Davis-Kahan sin(2θ) Theorem," Lin. Alg. Applic. 258, 129-135.
S.M. Rump and J.-P.M. Zemke (2003). "On Eigenvector Bounds," BIT 43, 823-837.
Detailed analyses of the function sep(·,·) and the map X → AX + XA^T are given in:
J. Varah (1979). "On the Separation of Two Matrices," SIAM J. Numer. Anal. 16, 216-222.
R. Byers and S.G. Nash (1987). "On the Singular Vectors of the Lyapunov Operator," SIAM J. Alg.
Disc. Methods 8, 59-66.
7.3 Power Iterations
Suppose that we are given A ∈ ℂ^{n×n} and a unitary U0 ∈ ℂ^{n×n}. Recall from §5.2.10 that
the Householder QR factorization can be extended to complex matrices and consider
the following iteration:

    T0 = U0^H A U0
    for k = 1, 2, ...
        Tk−1 = Uk Rk    (QR factorization)                                 (7.3.1)
        Tk = Rk Uk
    end

Since Tk = Rk Uk = Uk^H (Uk Rk) Uk = Uk^H Tk−1 Uk, it follows by induction that

    Tk = (U0 U1 ⋯ Uk)^H A (U0 U1 ⋯ Uk).                                    (7.3.2)
Thus, each Tk is unitarily similar to A. Not so obvious, and what is a central theme
of this section, is that the Tk almost always converge to upper triangular form, i.e.,
(7.3.2) almost always "converges" to a Schur decomposition of A.
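A bare-bones transcription of (7.3.1) with U0 = I is sketched below (this is not the practical algorithm of §7.4-§7.5, and the test matrix with well-separated eigenvalue moduli is an arbitrary choice); for such a matrix the iterates visibly approach upper triangular form with the eigenvalues on the diagonal.

    # Unshifted QR iteration (7.3.1) with U0 = I (illustrative sketch, not from
    # the text).
    import numpy as np

    rng = np.random.default_rng(5)
    D = np.diag([16.0, 8.0, 4.0, 2.0, 1.0])
    S = rng.standard_normal((5, 5))
    A = S @ D @ np.linalg.inv(S)              # eigenvalues 16, 8, 4, 2, 1

    T = A.copy()
    for k in range(100):
        U, R = np.linalg.qr(T)                # T_{k-1} = U_k R_k  (QR factorization)
        T = R @ U                             # T_k = R_k U_k

    print(np.linalg.norm(np.tril(T, -1)))     # strictly lower part is negligible
    print(np.round(np.diag(T), 6))            # approximately 16, 8, 4, 2, 1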
Iteration (7.3.1) is called the QR iteration, and it forms the backbone of the most
effective algorithm for computing a complete Schur decomposition of a dense general
matrix. In order to motivate the method and to derive its convergence properties, two
other eigenvalue iterations that are important in their own right are presented first:
the power method and the method of orthogonal iteration.
7.3.1 The Power Method
Suppose A ∈ ℂ^{n×n} and X^{-1} A X = diag(λ1, ..., λn) with X = [ x1 | ⋯ | xn ]. Assume
that

    |λ1| > |λ2| ≥ ⋯ ≥ |λn|.
Given a unit 2-norm q^(0) ∈ ℂⁿ, the power method produces a sequence of vectors q^(k)
as follows:

    for k = 1, 2, ...
        z^(k) = A q^(k−1)
        q^(k) = z^(k) / || z^(k) ||_2                                      (7.3.3)
        λ^(k) = [q^(k)]^H A q^(k)
    end
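The power method (7.3.3) translates directly into NumPy; the sketch below (not from the text) adds a simple residual-based stopping rule, which is an arbitrary choice, and the helper name power_method is ad hoc.

    # Power method (7.3.3) with an ad hoc residual stopping test (illustrative
    # sketch, not from the text).
    import numpy as np

    def power_method(A, q0, tol=1e-10, maxit=1000):
        q = q0 / np.linalg.norm(q0)
        for _ in range(maxit):
            z = A @ q                       # z^(k) = A q^(k-1)
            q = z / np.linalg.norm(z)       # q^(k) = z^(k) / ||z^(k)||_2
            lam = q.conj() @ A @ q          # lambda^(k) = q^H A q
            if np.linalg.norm(A @ q - lam * q) <= tol * np.linalg.norm(A, 1):
                break
        return lam, q

    rng = np.random.default_rng(6)
    A = rng.standard_normal((6, 6))
    A = A + A.T                             # symmetric, so eigenvalues are real
    lam, q = power_method(A, rng.standard_normal(6))
    w = np.linalg.eigvalsh(A)
    print(lam, w[np.argmax(np.abs(w))])     # dominant eigenvalue (in modulus)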
There is nothing special about using the 2-norm for normalization except that it imparts
a greater unity on the overall discussion in this section.

Let us examine the convergence properties of the power iteration. If

    q^(0) = a1 x1 + a2 x2 + ⋯ + an xn                                      (7.3.4)

and a1 ≠ 0, then

    A^k q^(0) = a1 λ1^k ( x1 + Σ_{j=2}^{n} (aj/a1) (λj/λ1)^k xj ).

Since q^(k) ∈ span{A^k q^(0)}, we conclude that

    dist( span{q^(k)}, span{x1} ) = O( |λ2/λ1|^k ).

It is also easy to verify that

    | λ^(k) − λ1 | = O( |λ2/λ1|^k ).                                       (7.3.5)
Since λ1 is larger than all the other eigenvalues in modulus, it is referred to as a
dominant eigenvalue. Thus, the power method converges if λ1 is dominant and if q^(0)
has a component in the direction of the corresponding dominant eigenvector x1. The
behavior of the iteration without these assumptions is discussed in Wilkinson (AEP, p.
570) and Parlett and Poole (1973).
In practice, the usefulness of the power method depends upon the ratio |λ2|/|λ1|,
since it dictates the rate of convergence. The danger that q^(0) is deficient in x1 is less
worrisome because rounding errors sustained during the iteration typically ensure that
subsequent iterates have a component in this direction. Moreover, it is typically the
case in applications that one has a reasonably good guess as to the direction of x1.
This guards against having a pathologically small coefficient a1 in (7.3.4).
Note that the only thing required to implement the power method is a procedure
for matrix-vector products. It is not necessary to store A in an n-by-n array. For
this reason, the algorithm is of interest when the dominant eigenpair for a large sparse
matrix is required. We have much more to say about large sparse eigenvalue problems
in Chapter 10.
Estimates for the error |λ^(k) − λ1| can be obtained by applying the perturbation
theory developed in §7.2.2. Define the vector

    r^(k) = A q^(k) − λ^(k) q^(k)

and observe that (A + E^(k)) q^(k) = λ^(k) q^(k) where E^(k) = −r^(k) [q^(k)]^H. Thus λ^(k) is
an eigenvalue of A + E^(k) and

    | λ^(k) − λ1 |  ≈  || r^(k) ||_2 / s(λ1).

If we use the power method to generate approximate right and left dominant eigen-
vectors, then it is possible to obtain an estimate of s(λ1). In particular, if w^(k) is a
unit 2-norm vector in the direction of (A^H)^k w^(0), then we can use the approximation
s(λ1) ≈ | [w^(k)]^H q^(k) |.
7.3.2 Orthogonal Iteration
A straightforward generalization of the power method can be used to compute higher-
dimensional invariant subspaces. Let r be a chosen integer satisfying 1 ≤ r ≤ n.
Given A ∈ ℂ^{n×n} and an n-by-r matrix Q0 with orthonormal columns, the method of
orthogonal iteration generates a sequence of matrices {Qk} ⊆ ℂ^{n×r} and a sequence of
eigenvalue estimates {λ1^(k), ..., λr^(k)} as follows:

    for k = 1, 2, ...
        Zk = A Qk−1
        Qk Rk = Zk    (QR factorization)                                   (7.3.6)
        λ(Qk^H A Qk) = { λ1^(k), ..., λr^(k) }
    end
Note that if r = 1, then this is just the power method (7.3.3). Moreover, the se-
quence {Qk e1} is precisely the sequence of vectors produced by the power iteration
with starting vector q^(0) = Q0 e1.
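A minimal version of orthogonal iteration (7.3.6) is sketched below (not from the text); the test matrix is built with three well-separated dominant eigenvalues, an arbitrary choice, so that ran(Qk) settles quickly and the eigenvalue estimates are easy to read off.

    # Orthogonal iteration (7.3.6) (illustrative sketch, not from the text).
    import numpy as np

    rng = np.random.default_rng(7)
    n, r = 8, 3
    D = np.diag([10.0, 9.0, 8.0, 1.0, 0.9, 0.8, 0.7, 0.6])
    S = rng.standard_normal((n, n))
    A = S @ D @ np.linalg.inv(S)          # dominant eigenvalues 10, 9, 8

    Q, _ = np.linalg.qr(rng.standard_normal((n, r)))
    for k in range(100):
        Z = A @ Q                         # Z_k = A Q_{k-1}
        Q, R = np.linalg.qr(Z)            # Q_k R_k = Z_k

    est = np.linalg.eigvals(Q.T @ A @ Q)  # eigenvalue estimates
    print(np.sort(est.real))              # approximately 8, 9, 10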
In order to analyze the behavior of this iteration, suppose that

    Q^H A Q = T = D + N,    D = diag(λ1, ..., λn),  N strictly upper triangular,    (7.3.7)

is a Schur decomposition of A ∈ ℂ^{n×n}. Assume that 1 ≤ r < n and partition Q and T
as follows:

    Q = [ Qα | Qβ ],        T = [ T11  T12 ]   r
          r    n−r              [  0   T22 ]   n−r                         (7.3.8)
                                   r    n−r
If |λr| > |λr+1|, then the subspace Dr(A) = ran(Qα) is referred to as a dominant
invariant subspace. It is the unique invariant subspace associated with the eigenval-
ues λ1, ..., λr. The following theorem shows that with reasonable assumptions, the
subspaces ran(Qk) generated by (7.3.6) converge to Dr(A) at a rate proportional to
|λr+1/λr|^k.
Theorem 7.3.1. Let the Schur decomposition of A ∈ ℂ^{n×n} be given by (7.3.7) and
(7.3.8) with n ≥ 2. Assume that |λr| > |λr+1| and that µ ≥ 0 satisfies

    (1 + µ) |λr| > ||N||_F.

Suppose Q0 ∈ ℂ^{n×r} has orthonormal columns and that dk is defined by

    dk = dist( Dr(A), ran(Qk) ),    k ≥ 0.

If

    d0 < 1,                                                                (7.3.9)

then the matrices Qk generated by (7.3.6) satisfy

    dk  ≤  (1 + µ)^{n−2} · ( 1 + ||T12||_F / sep(T11, T22) )
           · [ ( |λr+1| + ||N||_F/(1+µ) ) / ( |λr| − ||N||_F/(1+µ) ) ]^k · d0 / √(1 − d0²).    (7.3.10)

Proof. The proof is given in an appendix at the end of this section. □
The condition (7.3.9) ensures that the initial matrix Q0 is not deficient in certain
eigendirections. In particular, no vector in the span of Q0's columns is orthogonal to
Dr(A^H). The theorem essentially says that if this condition holds and if µ is chosen
large enough, then

    dist( Dr(A), ran(Qk) )  ≈  c |λr+1/λr|^k

where c depends on sep(T11, T22) and A's departure from normality.
It is possible to accelerate the convergence in orthogonal iteration using a tech-
nique described in Stewart (1976). In the accelerated scheme, the approximate eigen-
value λi^(k) satisfies

    | λi^(k) − λi | = O( |λr+1/λi|^k ),    i = 1:r.

(Without the acceleration, the right-hand side is |λi+1/λi|^k.) Stewart's algorithm in-
volves computing the Schur decomposition of the matrices Qk^H A Qk every so often. The
method can be very useful in situations where A is large and sparse and a few of its
largest eigenvalues are required.
7.3.3 The QR Iteration
We now derive the QR iteration (7.3.1) and examine its convergence. Suppose r = n
in (7.3.6) and the eigenvalues of A satisfy

    |λ1| > |λ2| > ⋯ > |λn|.

Partition the matrix Q in (7.3.7) and Qk in (7.3.6) as follows:

    Q = [ q1 | ⋯ | qn ],        Qk = [ q1^(k) | ⋯ | qn^(k) ].

If

    dist( Di(A^H), span{ q1^(0), ..., qi^(0) } ) < 1,    i = 1:n,          (7.3.11)

then it follows from Theorem 7.3.1 that

    dist( span{ q1^(k), ..., qi^(k) }, span{ q1, ..., qi } ) → 0

for i = 1:n. This implies that the matrices Tk defined by

    Tk = Qk^H A Qk

are converging to upper triangular form. Thus, it can be said that the method of orthog-
onal iteration computes a Schur decomposition provided the original iterate Q0 ∈ ℂ^{n×n}
is not deficient in the sense of (7.3.11).
The QR iteration arises naturally by considering how to compute the matrix Tk
directly from its predecessor Tk−1. On the one hand, we have from (7.3.6) and the
definition of Tk−1 that

    Tk−1 = Qk−1^H A Qk−1 = Qk−1^H (A Qk−1) = (Qk−1^H Qk) Rk.

On the other hand,

    Tk = Qk^H A Qk = (Qk^H A Qk−1)(Qk−1^H Qk) = Rk (Qk−1^H Qk).

Thus, Tk is determined by computing the QR factorization of Tk−1 and then multiplying
the factors together in reverse order, precisely what is done in (7.3.1).
Note that a single QR iteration is an O(n³) calculation. Moreover, since con-
vergence is only linear (when it exists), it is clear that the method is a prohibitively
expensive way to compute Schur decompositions. Fortunately these practical difficul-
ties can be overcome as we show in §7.4 and §7.5.
7.3.4 LR Iterations
We conclude with some remarks about power iterations that rely on the LU factoriza-
tion rather than the QR factorization. Let G0 ∈ ℂ^{n×r} have rank r. Corresponding to
(7.3.6) we have the following iteration:

    for k = 1, 2, ...
        Zk = A Gk−1                                                        (7.3.12)
        Gk Rk = Zk    (LU factorization)
    end
Suppose r = n and that we define the matrices Tk by

    Tk = Gk^{-1} A Gk.                                                     (7.3.13)

It can be shown that if we set L0 = G0, then the Tk can be generated as follows:

    T0 = L0^{-1} A L0
    for k = 1, 2, ...
        Tk−1 = Lk Rk    (LU factorization)                                 (7.3.14)
        Tk = Rk Lk
    end

Iterations (7.3.12) and (7.3.14) are known as treppeniteration and the LR iteration,
respectively. Under reasonable assumptions, the Tk converge to upper triangular form.
To successfully implement either method, it is necessary to pivot. See Wilkinson (AEP,
p. 602).
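The sketch below (not from the text) transcribes the LR iteration (7.3.14) with a deliberately unpivoted LU factorization so that the recurrence matches the displayed formulas; as noted above, a practical implementation must pivot, and the helper lu_nopivot and the test matrix are ad hoc choices.

    # Unshifted LR iteration (7.3.14) with unpivoted LU (illustrative sketch,
    # not from the text; pivoting is needed in practice).
    import numpy as np

    def lu_nopivot(T):
        # Doolittle LU without pivoting: T = L U with L unit lower triangular.
        n = T.shape[0]
        L = np.eye(n)
        U = T.astype(float).copy()
        for j in range(n - 1):
            for i in range(j + 1, n):
                L[i, j] = U[i, j] / U[j, j]
                U[i, :] -= L[i, j] * U[j, :]
        return L, U

    rng = np.random.default_rng(8)
    D = np.diag([9.0, 4.0, 2.0, 1.0])
    S = rng.standard_normal((4, 4))
    A = S @ D @ np.linalg.inv(S)                # eigenvalues 9, 4, 2, 1

    T = A.copy()
    for k in range(60):
        L, U = lu_nopivot(T)                    # T_{k-1} = L_k R_k
        T = U @ L                               # T_k = R_k L_k

    print(np.round(np.diag(T), 4))              # for this generic example: 9, 4, 2, 1
    print(np.linalg.norm(np.tril(T, -1)))       # strictly lower part is small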
Appendix
In order to establish Theorem 7.3.1 we need the following lemma that bounds powers
of a matrix and powers of its inverse.
Lemma 7.3.2. Let Q^H A Q = T = D + N be a Schur decomposition of A ∈ ℂ^{n×n}
where D is diagonal and N strictly upper triangular. Let λmax and λmin denote the
largest and smallest eigenvalues of A in absolute value. If µ ≥ 0, then for all k ≥ 0 we
have

    || A^k ||_2  ≤  (1 + µ)^{n−1} ( |λmax| + ||N||_F/(1+µ) )^k.            (7.3.15)

If A is nonsingular and µ ≥ 0 satisfies (1 + µ)|λmin| > ||N||_F, then for all k ≥ 0 we
also have

    || A^{−k} ||_2  ≤  (1 + µ)^{n−1} / ( |λmin| − ||N||_F/(1+µ) )^k.       (7.3.16)
Proof. For µ ≥ 0, define the diagonal matrix Δ by

    Δ = diag( 1, (1 + µ), (1 + µ)², ..., (1 + µ)^{n−1} )

and note that κ2(Δ) = (1 + µ)^{n−1}. Since N is strictly upper triangular, it is easy to
verify that

    || Δ N Δ^{−1} ||_2  ≤  || Δ N Δ^{−1} ||_F  ≤  ||N||_F / (1 + µ),

and thus

    || A^k ||_2 = || T^k ||_2 = || Δ^{−1} (D + Δ N Δ^{−1})^k Δ ||_2
                ≤ κ2(Δ) ( ||D||_2 + ||Δ N Δ^{−1}||_2 )^k
                ≤ (1 + µ)^{n−1} ( |λmax| + ||N||_F/(1 + µ) )^k.
On the other hand, if A is nonsingular and (1 + µ)|λmin| > ||N||_F, then

    || D^{−1} Δ N Δ^{−1} ||_2  ≤  ||N||_F / ( (1 + µ)|λmin| )  <  1.

Using Lemma 2.3.3 we obtain

    || A^{−k} ||_2 = || Δ^{−1} ( (D + Δ N Δ^{−1})^{−1} )^k Δ ||_2
                   ≤ κ2(Δ) || (I + D^{−1} Δ N Δ^{−1})^{−1} D^{−1} ||_2^k
                   ≤ (1 + µ)^{n−1} / ( |λmin| − ||N||_F/(1 + µ) )^k,

completing the proof of the lemma. □
Proof of Theorem 7.3.1. By induction it is easy to show that the matrix Qk in
(7.3.6) satisfies

    A^k Q0 = Qk (Rk ⋯ R1),

a QR factorization of A^k Q0. By substituting the Schur decomposition (7.3.7)-(7.3.8)
into this equation we obtain

    [ T11  T12 ]^k [ V0 ]     [ Vk ]
    [  0   T22 ]   [ W0 ]  =  [ Wk ] (Rk ⋯ R1),                            (7.3.17)

where

    Vk = Qα^H Qk ∈ ℂ^{r×r},    Wk = Qβ^H Qk ∈ ℂ^{(n−r)×r}.

Our goal is to bound ||Wk||_2 since by the definition of subspace distance given in §2.5.3
we have

    || Wk ||_2 = dist( Dr(A), ran(Qk) ).                                   (7.3.18)

Note from the thin CS decomposition (Theorem 2.5.2) that

    1 = dk² + σmin(Vk)².                                                   (7.3.19)

Since T11 and T22 have no eigenvalues in common, Lemma 7.1.5 tells us that the
Sylvester equation T11 X − X T22 = −T12 has a solution X ∈ ℂ^{r×(n−r)} and that

    || X ||_F  ≤  || T12 ||_F / sep(T11, T22).

It follows that

    [ Ir  X    ]^{-1} [ T11  T12 ] [ Ir  X    ]     [ T11   0  ]
    [ 0   In−r ]      [  0   T22 ] [ 0   In−r ]  =  [  0   T22 ].

By substituting this into (7.3.17) we obtain

    [ T11^k    0   ] [ V0 − X W0 ]     [ Vk − X Wk ]
    [   0    T22^k ] [    W0     ]  =  [    Wk     ] (Rk ⋯ R1),            (7.3.20)
i.e.,

    T11^k (V0 − X W0) = (Vk − X Wk)(Rk ⋯ R1),                              (7.3.21)
    T22^k W0 = Wk (Rk ⋯ R1).                                               (7.3.22)

The matrix I + X X^H is Hermitian positive definite and so it has a Cholesky factoriza-
tion

    I + X X^H = G G^H.                                                     (7.3.23)

It is clear that

    σmin(G) ≥ 1.                                                           (7.3.24)

If the matrix Z ∈ ℂ^{n×r} is defined by

    Z = (Qα − Qβ X^H) G^{−H},

then it follows from the equation A^H Q = Q T^H that

    A^H (Qα − Qβ X^H) = (Qα − Qβ X^H) T11^H.                               (7.3.25)

Since Z^H Z = Ir and ran(Z) = ran(Qα − Qβ X^H), it follows that the columns of Z are
an orthonormal basis for Dr(A^H). Using the CS decomposition, (7.3.19), and the fact
that ran(Qβ) = Dr(A^H)^⊥, we have

    σmin(Z^H Q0)² = 1 − dist(Dr(A^H), ran(Q0))² = 1 − || Qβ^H Q0 ||_2²
                  = σmin(Qα^H Q0)² = σmin(V0)² = 1 − d0² > 0.

This shows that

    V0 − X W0 = G (Z^H Q0)

is nonsingular and together with (7.3.24) we obtain

    || (V0 − X W0)^{−1} ||_2  ≤  1 / √(1 − d0²).                           (7.3.26)

Manipulation of (7.3.19) and (7.3.20) yields

    Wk = T22^k W0 (Rk ⋯ R1)^{−1} = T22^k W0 (V0 − X W0)^{−1} T11^{−k} (Vk − X Wk).

The verification of (7.3.10) is completed by taking norms in this equation and using
(7.3.18), (7.3.19), (7.3.20), (7.3.26), and the following facts:

    || T22^k ||_2  ≤  (1 + µ)^{n−r−1} ( |λr+1| + ||N||_F/(1+µ) )^k,
    || T11^{−k} ||_2  ≤  (1 + µ)^{r−1} / ( |λr| − ||N||_F/(1+µ) )^k,
    || Vk − X Wk ||_2  ≤  ||Vk||_2 + ||X||_2 ||Wk||_2  ≤  1 + ||T12||_F / sep(T11, T22).
The bounds for ||T22^k||_2 and ||T11^{−k}||_2 follow from Lemma 7.3.2. □
Problems
P7.3.1 Verify Equation (7.3.5).
P7.3.2 Suppose the eigenvalues of A ∈ ℝ^{n×n} satisfy |λ1| = |λ2| > |λ3| ≥ ⋯ ≥ |λn| and that λ1 and
λ2 are complex conjugates of one another. Let S = span{y, z} where y, z ∈ ℝⁿ satisfy A(y + iz) =
λ1(y + iz). Show how the power method with a real starting vector can be used to compute an
approximate basis for S.
P7.3.3 Assume A ∈ ℝ^{n×n} has eigenvalues λ1, ..., λn that satisfy

    λ = λ1 = λ2 = λ3 = λ4 > |λ5| ≥ ⋯ ≥ |λn|

where λ is positive. Assume that A has two Jordan blocks of the form

    [ λ  1 ]
    [ 0  λ ].

Discuss the convergence properties of the power method when applied to this matrix and how the
convergence might be accelerated.
P7.3.4 A matrix A is a positive matrix if aij > 0 for all i and j. A vector v ∈ ℝⁿ is a positive
vector if vi > 0 for all i. Perron's theorem states that if A is a positive square matrix, then it has
a unique dominant eigenvalue equal to its spectral radius ρ(A) and there is a positive vector x so
that Ax = ρ(A)·x. In this context, x is called the Perron vector and ρ(A) is called the Perron root.
Assume that A ∈ ℝ^{n×n} is positive and q ∈ ℝⁿ is positive with unit 2-norm. Consider the following
implementation of the power method (7.3.3):

    z = Aq, λ = q^T z
    while || z − λq ||_2 > δ
        q = z, q = q/||q||_2, z = Aq, λ = q^T z
    end

(a) Adjust the termination criteria to guarantee (in principle) that the final λ and q satisfy Ãq = λq,
where ||A − Ã||_2 ≤ δ and Ã is positive. (b) Applied to a positive matrix A ∈ ℝ^{n×n}, the Collatz-
Wielandt formula states that ρ(A) is the maximum value of the function f defined by

    f(x) = min_{1≤i≤n} yi / xi

where x ∈ ℝⁿ is positive and y = Ax. Does it follow that f(Aq) ≥ f(q)? In other words, do the
iterates {q^(k)} in the power method have the property that f(q^(k)) increases monotonically to the
Perron root, assuming that q^(0) is positive?
P7.3.5 (Read the previous problem for background.) A matrix A is a nonnegative matrix if aij ≥ 0
for all i and j. A matrix A ∈ ℝ^{n×n} is reducible if there is a permutation P so that P^T A P is block
triangular with two or more square diagonal blocks. A matrix that is not reducible is irreducible.
The Perron-Frobenius theorem states that if A is square, nonnegative, and irreducible, then ρ(A),
the Perron root, is an eigenvalue for A and there is a positive vector x, the Perron vector, so that
Ax = ρ(A)·x. Assume that A1, A2, A3 ∈ ℝ^{n×n} are each positive and let the nonnegative matrix A be
defined by

    A = [  0   A1   0  ]
        [  0    0   A2 ]
        [ A3    0   0  ].

(a) Show that A is irreducible. (b) Let B = A1 A2 A3. Show how to compute the Perron root and
vector for A from the Perron root and vector for B. (c) Show that A has other eigenvalues with
absolute value equal to the Perron root. How could those eigenvalues and the associated eigenvectors
be computed?
P7.3.6 (Read the previous two problems for background.) A nonnegative matrix P ∈ ℝ^{n×n} is stochas-
tic if the entries in each column sum to 1. A vector v ∈ ℝⁿ is a probability vector if its entries are
nonnegative and sum to 1. (a) Show that if P ∈ ℝ^{n×n} is stochastic and v ∈ ℝⁿ is a probability vec-
tor, then w = Pv is also a probability vector. (b) The entries in a stochastic matrix P ∈ ℝ^{n×n} can
be regarded as the transition probabilities associated with an n-state Markov Chain. Let vj be the
probability of being in state j at time t = t_current. In the Markov model, the probability of being in
state i at time t = t_next is given by

    wi = Σ_{j=1}^{n} pij vj,    i = 1:n,

i.e., w = Pv. With the help of a biased coin, a surfer on the World Wide Web randomly jumps from
page to page. Assume that the surfer is currently viewing web page j and that the coin comes up
heads with probability a. Here is how the surfer determines the next page to visit:
Step 1. A coin is tossed.
Step 2. If it comes up heads and web page j has at least one outlink, then the next page to visit is
randomly selected from the list of outlink pages.
Step 3. Otherwise, the next page to visit is randomly selected from the list of all possible pages.
Let P E Rnxn be the matrix of transition probabilities that define this random process. Specify P in
terms of a, the vector of ones e, and the link matrix H ∈ ℝ^{n×n} defined by

    hij = 1 if there is a link on web page j to web page i, and hij = 0 otherwise.
Hints: The number of nonzero components in H(:, j) is the number of outlinks on web page j. P is a
convex combination of a very sparse matrix and a very dense rank-1 matrix. (c) Detail how the
power method can be used to determine a probability vector x so that Px = x. Strive to get as much
computation "outside the loop" as possible. Note that in the limit we can expect to find the random
surfer viewing web page i with probability x; . Thus, a case can be made that more important pages
are associated with the larger components of x. This is the basis of Google PageRank. If

    x_{i1} ≥ x_{i2} ≥ ⋯ ≥ x_{in},

then web page ik has page rank k.
P7.3.7 (a) Show that if X ∈ ℂ^{n×n} is nonsingular, then

    || A ||_X = || X^{−1} A X ||_2

defines a matrix norm with the property that

    || AB ||_X ≤ || A ||_X || B ||_X.

(b) Show that for any ε > 0 there exists a nonsingular X ∈ ℂ^{n×n} such that

    || A ||_X = || X^{−1} A X ||_2 ≤ ρ(A) + ε

where ρ(A) is A's spectral radius. Conclude that there is a constant M such that

    || A^k ||_2 ≤ M ( ρ(A) + ε )^k

for all non-negative integers k. (Hint: Set X = Q·diag(1, α, ..., α^{n−1}) where Q^H A Q = D + N is A's
Schur decomposition.)
P7.3.8 Verify that (7.3.14) calculates the matrices Tk defined by (7.3.13).
P7.3.9 Suppose A ∈ ℂ^{n×n} is nonsingular and that Q0 ∈ ℂ^{n×p} has orthonormal columns. The fol-
lowing iteration is referred to as inverse orthogonal iteration:

    for k = 1, 2, ...
        Solve A Zk = Qk−1 for Zk ∈ ℂ^{n×p}
        Zk = Qk Rk    (QR factorization)
    end

Explain why this iteration can usually be used to compute the p smallest eigenvalues of A in absolute
value. Note that to implement this iteration it is necessary to be able to solve linear systems that
involve A. If p = 1, the method is referred to as the inverse power method.
Notes and References for §7.3
For an excellent overview of the QR iteration and related procedures, see Watkins (MEP), Stewart
(MAE), and Kressner (NMSE). A detailed, practical discussion of the power method is given in
Wilkinson (AEP, Chap. 10). Methods are discussed for accelerating the basic iteration, for calculating
nondominant eigenvalues, and for handling complex conjugate eigenvalue pairs. The connections
among the various power iterations are discussed in:
B.N. Parlett and W.G. Poole (1973). "A Geometric Theory for the QR, LU, and Power Iterations,"
SIAM J. Numer. Anal. 10, 389-412.
The QR iteration was concurrently developed in:
J.G.F. Francis (1961). "The QR Transformation: A Unitary Analogue to the LR Transformation,"
Comput. J. 4, 265-71, 332-334.
V.N. Kublanovskaya (1961). "On Some Algorithms for the Solution of the Complete Eigenvalue
Problem," USSR Comput. Math. Phys. 3, 637-657.
As can be deduced from the title of the first paper by Francis, the LR iteration predates the QR
iteration. The former very fundamental algorithm was proposed by:
H. Rutishauser (1958). "Solution of Eigenvalue Problems with the LR Transformation," Nat. Bur.
Stand. Appl. Math. Ser. 49, 47-81.
More recent, related work includes:
B.N. Parlett (1995). "The New qd Algorithms,'' Acta Numerica 5, 459-491.
C . Ferreira and B.N. Parlett (2009). "Convergence of the LR Algorithm for a One-Point Spectrum
Tridiagonal Matrix,'' Numer. Math. 113, 417-431.
Numerous papers on the convergence and behavior of the QR iteration have appeared, see:
J.H. Wilkinson (1965). "Convergence of the LR, QR, and Related Algorithms,'' Comput. J. 8, 77-84.
B.N. Parlett (1965). "Convergence of the Q-R Algorithm," Numer. Math. 7, 187-93. (Correction in
Numer. Math. 10, 163-164.)
B.N. Parlett (1966). "Singular and Invariant Matrices Under the QR Algorithm,'' Math. Comput.
20, 611-615.
B.N. Parlett (1968). "Global Convergence ofthe Basic QR Algorithm on Hessenberg Matrices,'' Math.
Comput. 22, 803-817.
D.S. Watkins (1982). "Understanding the QR Algorithm,'' SIAM Review 24, 427-440.
T. Nanda (1985). "Differential Equations and the QR Algorithm," SIAM J. Numer. Anal. 22,
310-321.
D.S. Watkins (1993). "Some Perspectives on the Eigenvalue Problem," SIAM Review 35, 430-471.
D.S. Watkins (2008). "The QR Algorithm Revisited," SIAM Review 50, 133-145.
D.S. Watkins (2011). "Francis's Algorithm,'' AMS Monthly 118, 387-403.
A block analog of the QR iteration is discussed in:
M. Robbe and M. Sadkane (2005). "Convergence Analysis of the Block Householder Block Diagonal­
ization Algorithm,'' BIT 45, 181-195.
The following references are concerned with various practical and theoretical aspects of simultaneous
iteration:
H. Rutishauser (1970). "Simultaneous Iteration Method for Symmetric Matrices,'' Numer. Math. 1 6,
205-223.
M. Clint and A. Jennings (1971). "A Simultaneous Iteration Method for the Unsymmetric Eigenvalue
Problem,'' J. Inst. Math. Applic. 8, 111-121.
G.W. Stewart (1976). "Simultaneous Iteration for Computing Invariant Subspaces of Non-Hermitian
Matrices,'' Numer. Math. 25, 123-136.
A. Jennings (1977). Matrix Computation for Engineers and Scientists, John Wiley and Sons, New
York.
Z. Bai and G.W. Stewart (1997). "Algorithm 776: SRRIT: a Fortran Subroutine to Calculate the
Dominant Invariant Subspace of a Nonsymmetric Matrix," ACM TI-ans. Math. Softw. 23, 494-
513.
376 Chapter 7. Unsymmetric Eigenvalue Problems
Problems P7.3.4-P7.3.6 explore the relevance of the power method to the problem of computing the
Perron root and vector of a nonnegative matrix. For further background and insight, see:
A. Berman and R.J. Plemmons (1994). Nonnegative Matrices in the Mathematical Sciences, SIAM
Publications,Philadelphia, PA.
A.N. Langville and C.D. Meyer (2006). Google 's PageRank and Beyond, Princeton University Press,
Princeton and Oxford. .
The latter volume is outstanding in how it connects the tools of numerical linear algebra to the design
and analysis of Web browsers. See also:
W.J. Stewart (1994). Introduction to the Numerical Solution of Markov Chains, Princeton University
Press, Princeton, NJ.
M.W. Berry, Z. Drma.C, and E.R. Jessup (1999). "Matrices, Vector Spaces, and Information Retrieval,"
SIAM Review 41, 335-362.
A.N. Langville and C.D. Meyer (2005). "A Survey of Eigenvector Methods for Web Information
Retrieval," SIAM Review 47, 135-161.
A.N. Langville and C.D. Meyer (2006). "A Reordering for the PageRank Problem" , SIAM J. Sci.
Comput. 27, 2112-2120.
A.N. Langville and C.D. Meyer (2006). "Updating Markov Chains with an Eye on Google's PageR­
ank," SIAM J. Matrix Anal. Applic. 27, 968-987.
7.4 The Hessenberg and Real Schur Forms
In this and the next section we show how to make the QR iteration (7.3.1) a fast,
effective method for computing Schur decompositions. Because the majority of eigen­
value/invariant subspace problems involve real data, we concentrate on developing the
real analogue of (7.3.1) which we write as follows:
Ho = UJ'AUo
for k = 1, 2, . . .
end
Hk-1 = UkRk
Hk = RkUk
(QR factorization) (7.4.1)
Here, A E IRnxn, each Uk E IRnxn is orthogonal, and each Rk E IRnxn is upper trian­
gular. A difficulty associated with this real iteration is that the Hk can never converge
to triangular form in the event that A has complex eigenvalues. For this reason, we
must lower our expectations and be content with the calculation of an alternative
decomposition known as the real Schur decomposition.
In order to compute the real Schur decomposition efficiently we must carefully
choose the initial orthogonal similarity transformation Uo in (7.4.1). In particular, if
we choose U0 so that Ho is upper Hessenberg, then the amount of work per iteration
is reduced from O(n3) to O(n2). The initial reduction to Hessenberg form (the Uo
computation) is a very important computation in its own right and can be realized by
a sequence of Householder matrix operations.
7.4.1 The Real Schur Decomposition
A block upper triangular matrix with either 1-by-1 or 2-by-2 diagonal blocks is upper
quasi-triangular. The real Schur decomposition amounts to a real reduction to upper
quasi-triangular form.
7.4. The Hessenberg and Real Schur Forms 377
Theorem 7.4.1 (Real Schur Decomposition). If A E JR"'x", then there exists an
orthogonal Q E JR"'xn such that
(7.4.2)
0
where each Rii is either a 1-by-1 matrix or a 2-by-2 matrix having complex conjugate
eigenvalues.
Proof. The complex eigenvalues of A occur in conjugate pairs since the characteristic
polynomial det(zl - A) has real coefficients. Let k be the number of complex conjugate
pairs in ,(A). We prove the theorem by induction on k. Observe first that Lemma
7.1.2 and Theorem 7.1.3 have obvious real analogs. Thus, the theorem holds if k = 0.
Now suppose that k 2'.: 1. If , = r + iµ E ,(A) and µ -/:- 0, then there exist vectors y
and z in IR"(z -/:- 0) such that A(y + iz) = ('y + ip.)(y + iz), i.e.,
A [ y  z ] = [ y  z ] [ "/ µ
]·
-µ "/
The assumption that µ -/:- 0 implies that y and z span a 2-dimensional, real invariant
subspace for A. It then follows from Lemma 7.1.2 that an orthogonal U E IR.nxn exists
such that
urAU =
2 n-2
where ,(T11) = {,, 5.}. By induction, there exists an orthogonal U so [JTT22U has the
required structure. The theorem follows by setting Q = U. diag(h U). D
The theorem shows that any real matrix is orthogonally similar to an upper quasi­
triangular matrix. It is clear that the real and imaginary parts of the complex eigen­
values can be easily obtained from the 2-by-2 diagonal blocks. Thus, it can be said
that the real Schur decomposition is an eigenvalue-revealing decomposition.
7.4.2 A Hessenberg QR Step
We now turn our attention to the efficient execution of a single QR step in (7.4.1).
In this regard, the most glaring shortcoming associated with (7.4.1) is that each step
requires a full QR factorh::ation costing O(n3) flops. Fortunately, the amount of work
per iteration can be reduced by an order of magnitude if the orthogonal matrix U0 is
judiciously chosen. In particular, if U[AUo = Ho = (hij) is upper Hessenberg (hij = 0,
i > j + 1), then each subsequent Hk requires only O(n2) flops to calculate. To sec this
we look at the computations H = QR and H+ = RQ when H is upper Hessenberg.
As described in §5.2.5, we can upper triangularize H with a sequence of n - 1 Givens
rotations: QTH ::: G'f,.'_1 . . · GfH = R. Here, Ci = G(i, i + 1, Bi)· For the n = 4 case
there are three Givens premultiplications:
378 Chapter 7. Unsymmetric Eigenvalue Problems
[�
x x
�l [�
x x
�l [�
x x x x
-t -t
x x x x
0 x 0 x
x x
�l [�
x x
0
-t
x
0 x
x x
x x
0 x
0 0 :l
x
.
x
See Algorithm 5.2.5. The computation RQ = R(G1 · · · Gn-l) is equally easy to imple-
ment. In the n = 4 case there are three Givens post-multiplications:
[�
x
x
0
0
x
x
x
0
x
x
0
0
x
x
x
0
Overall we obtain the following algorithm:
x
x
x
0
x
x
x
0
x
x
x
0
x
x
x
x
Algorithm 7.4.1 If H is an n-by-n upper Hessenberg matrix, then this algorithm
overwrites H with H+ = RQ where H = QR is the QR factorization of H.
for k = l:n - 1
end
[ Ck , Bk J = givens(H(k, k), H(k + 1, k))
H(k:k + 1, k:n) = [ Ck Bk ]TH(k:k + 1, k:n)
-Sk Ck
for k = l:n - 1
end
H(l:k + 1, k:k + 1) = H(l:k + 1, k:k + 1) [ Ck Sk l
-Sk Ck
Let Gk = G(k, k+1, fh) be the kth Givens rotation. It is easy to confirm that the matrix
Q = G1 · · · Gn-1 is upper Hessenberg. Thus, RQ = H+ is also upper Hessenberg. The
algorithm requires about 6n2 flops, an order of magnitude more efficient than a full
matrix QR step (7.3.1).
7.4.3 The Hessenberg Reduction
It remains for us to show how the Hessenberg decomposition
u;rAUo = H, UJ'Uo = I (7.4.3)
can be computed. The transformation Uo can be computed as a product ofHouseholder
matrices P1, . . . , P.. -2 . The role of Pk is to zero the kth column below the subdiagonal.
In the n = 6 case, we have
x x x x x x x x x x x x
x x x x x x x x x x x x
x x x x x x
� 0 x x x x x
:';
x x x x x x 0 x x x x x
x x x x x x 0 x x x x x
x x x x x x 0 x x x x x
7.4. The Hessenberg and Real Schur Forms 379
x x x x x x x x x x x x x x x x x x
x x x x x x x x x x x x x x x x x x
0 x x x x x
� 0 x x x x x
� 0 x x x x x
0 0 x x x x 0 0 x x x x 0 0 x x x x
0 0 x x x x 0 0 0 x x x 0 0 0 x x x
0 0 x x x x 0 0 0 x x x 0 0 0 0 x x
In general, after k - 1 steps we have computed k - 1 Householder matrices P1 , • . . , Pk-l
such that
[ Bu
B21
0
k-1
k-1
n-k
n-k
is upper Hessenberg through its first k - 1 columns. Suppose Pk is an order-(n - k)
Householder matrix such that PkB32 is a multiple of e�n-k). If Pk = diag(Jk, Pk), then
is upper Hessenberg through its first k columns. Repeating this for k = l:n - 2 we
obtain
Algorithm 7.4.2 (Householder Reduction to Hessenberg Form) Given A E Rnxn,
the following algorithm overwrites A with H = UJ'AU0 where H is upper Hessenberg
and Uo is a product of Householder matrices.
for k = l:n - 2
end
[v, .BJ = house(A(k + l:n, k))
A(k + l:n, k:n) = (I - ,BvvT)A(k + l:n, k:n)
A(l:n, k + l:n) = A(l:n, k + l:n)(I - ,BvvT)
This algorithm requires 10n3/3 flops. If U0 is explicitly formed, an additional 4n3/3
flops are required. The kth Householder matrix can be represented in A(k + 2:n, k).
See Martin and Wilkinson (1968) for a detailed description.
The roundoff properties of this method for reducing A to Hessenberg form are
v.!'lry desirable. Wilkinson (AEP, p. 351) states that the computed Hessenberg matrix
H satisfies
fl = QT(A + E)Q,
where Q is orthogonal and II E llF :5 cn2ull A llF with c a small constant.
380 Chapter 7. Unsymmetric Eigenvalue Problems
7 .4.4 Level-3 Aspects
The Hessenberg reduction (Algorithm 7.4.2) is rich in level-2 operations: half gaxpys
and half outer product updates. We briefly mention two ideas for introducing level-3
computations into the process.
The first involves a block reduction to block Hessenberg form and is quite straight­
forward. Suppose (for clarity) that n = rN and write
r n-r
Suppose that we have computed the QR factorization A21 = Q1R1 and that Q1 is in
WY form. That is, we have Wi, Y1 E JR(n-r) xr such that Q1 = I + W1Y{. (See §5.2.2
for details.) If Q1 = diag(Ir, Q1) then
Notice that the updates of the (1,2) and (2,2) blocks arc rich in levcl-3 operations given
that Q1 is in WY form. This fully illustrates the overall process as QfAQ1 is block
upper Hessenberg through its first block column. We next repeat the computations on
the first r columns of Q[A22Q1. After N - 1 such steps we obtain
where each Hij is r-by-r and U0 = Q1 · · • QN-2 with each Qi in WY form. The overall
algorithm has a level-3 fraction of the form 1 - 0(1/N). Note that the subdiagonal
blocks in H are upper triangular and so the matrix has lower bandwidth r. It is possible
to reduce H to actual Hessenberg form by using Givens rotations to zero all but the
first subdiagonal.
Dongarra, Hammarling, and Sorensen (1987) have shown how to proceed directly
to Hessenberg form using a mixture of gaxpys and level-3 updates. Their idea involves
minimal updating after each Householder transformation is generated. For example,
suppose the first Householder P1 has been computed. To generate P2 we need just the
second column of P1AP1, not the full outer product update. To generate P3 we need
just the thirrd column of P2P1AP1P2, etc. In this way, the Householder matrices can
be determined using only gaxpy operations. No outer product updates are involved.
Once a suitable number of Householder matrices are known they can be aggregated
and applied in level-3 fashion.
For more about the challenges of organizing a high-performance Hessenberg re­
duction, see Karlsson (2011).
7.4. The Hessenberg and Real Schur Forms 381
7.4.5 Important Hessenberg Matrix Properties
The Hessenberg decomposition is not unique. If Z is any n-by-n orthogonal matrix
and we apply Algorithm 7.4.2 to zTAZ, then QTAQ = H is upper Hessenberg where
Q = ZUo. However, Qe1 = Z(Uoe1) = Ze1 suggesting that H is unique once the
first column of Q is specified. This is essentially the case provided H has no zero
subdiagonal entries. Hessenberg matrices with this property arc said to be unreduced.
Here is important theorem that clarifies these issues.
Theorem 7.4.2 { Implicit Q Theorem ). Suppose Q = [ Q1 I · · · I Qn ] and V =
[ v1 I · · · I Vn ] are orthogonal matrices with the property that the matrices QTAQ = H
and VTAV = G are each upper Hessenberg where A E 1Rnxn. Let k denote the smallest
positive integerfor which hk+l,k = 0, with the convention that k = n ifH is unreduced.
If Q1 = v1, then Qi = ±vi and lhi,i-1I = lgi,i-1 I for i = 2:k. Moreover, if k < n, then
gk+l,k = 0.
Proof. Define the orthogonal matrix W = [ w1 I · · · I Wn ] = VTQ and observe that
GW = WH. By comparing column i - 1 in this equation for i = 2:k we see that
i-1
hi,i-lwi = Gwi-1 - L hj,i-1Wj·
j=l
Since w1 = e1, it follows that [ w1 I · · · I wk ] is upper triangular and so for i = 2:k we
have Wi = ±In(:, i) = ±e;. Since Wi = vrQi and hi,i-1 = wTGwi-l it follows that
Vi = ±qi and
lhi,i-11 = lq[AQi-11 = Iv[Avi-1 I = lgi,i-1 1
for i = 2:k. If k < n, then
9k+1,k = ef+lGek = ±ef+lGWek = ±ef+1WHek
k
= ±ef+l L hikWei
i=l
completing the proof of the theorem. D
k
= ± Lhikek+1ei = 0,
i=l
The gist of the implicit Q theorem is that if QTAQ = H and zTAZ = G are each unre­
duced upper Hessenberg matrices and Q and Z have the same first column, then G and
H are "essentially equal" in the sense that G = v-1HD where D = diag(±l, . . . , ±1).
Our next theorem involves a new type of matrix called a Krylov matrix. If
A E 1Rnxn and v E 1Rn, then the Krylov matrix K(A,v,j) E 1Rnxj is defined by
K(A, v,j) = [ v I Av I · . · I Aj-lv ] .
It turns out that there is a connection between the Hessenberg reduction QTAQ = H
and the QR factorization of the Krylov matrix K(A, Q(:, 1), n).
Theorem 7.4.3. Suppose Q E 1Rnxn is an orthogonal matrix and A E 1Rnxn. Then
QTAQ = H is an unreduced upper Hessenberg matrix ifand only ifQTK(A, Q(:, 1), n) =
R is nonsingular and upper triangular.
382 Chapter 7. Unsymmetric Eigenvalue Problems
Proof. Suppose Q E JRnxn is orthogonal and set H = QTAQ. Consider the identity
IfH is an unreduced upper Hessenberg matrix, then it is clear that R is upper triangular
with rii = h21h32 · · · hi,i-l for i = 2:n. Since rn = 1 it follows that R is nonsingular.
To prove the converse, suppose R is upper triangular and nonsingular. Since
R(:, k + 1) = HR(:, k) it follows that H(:, k) E span{ ei, . . . , ek+l }. This implies that
H is upper Hessenberg. Since rnn = h21h32 · · · hn,n-1 -::/:- 0 it follows that H is also
unreduced. D
Thus, there is more or less a correspondence between nonsingular Krylov matrices and
orthogonal similarity reductions to unreduced Hessenberg form.
Our last result is about the geometric multiplicity ofan eigenvalue ofan unreduced
upper Hessenberg matrix.
Theorem 7.4.4. If >. is an eigenvalue of an unreduced upper Hessenberg matrix
H E JRnxn, then its geometric multiplicity is 1.
Proof. For any >. E <C we have rank(A - >.I) � n - 1 because the first n - 1 columns
of H - >.I are independent. D
7.4.6 Companion Matrix Form
Just as the Schur decomposition has a nonunitary analogue in the Jordan decomposi­
tion, so does the Hessenberg decomposition have a nonunitary analog in the companion
matrix decomposition. Let x E JRn and suppose that the Krylov matrix K = K(A, x, n)
is nonsingular. If c = c(O:n - 1) solves the linear system Kc = -Anx, then it follows
that AK = KC where C has the form
0 0 0 -eo
1 0 0 -C1
c = 0 1 0 -C2 (7.4.4)
0 0 1 -Cn-1
The matrix C is said to be a companion matrix. Since
it follows that if K is nonsingular, then the decomposition K-1AK = C displays A's
characteristic polynomial. This, coupled with the sparseness of C, leads to "companion
matrix methods" in various application areas. These techniques typically involve:
Step 1. Compute the Hessenberg decomposition UJ'AUo = H.
Step 2. Hope H is unreduced and set Y = [e1 I He1 1 - . - I Hn-1e1 ] .
Step 3. Solve YC = HY for C.
7.4. The Hessenberg and Real Schur Forms 383
Unfortunately, this calculation can be highly unstable. A is similar to an unreduced
Hessenberg matrix only if each eigenvalue has unit geometric multiplicity. Matrices
that have this property are called nonderogatory. It follows that the matrix Y above
can be very poorly conditioned if A is close to a derogatory matrix.
A full discussion of the dangers associated with companion matrix computation
can be found in Wilkinson (AEP, pp. 405ff.).
Problems
P7.4.1 Suppose A E Fxn and z E Rn. Give a detailed algorithm for computing an orthogonal Q
such that QTAQ is upper Hessenberg and QTz is a multiple of e1. Hint: Reduce z first and then apply
Algorithm 7.4.2.
P7.4.2 Develop a similarity reduction to Hessenberg form using Gauss transforms with pivoting. How
many flops are required. See Businger (1969).
P7.4.3 In some situations, it is necessary to solve the linear system (A+ zl)x = b for many different
values of z E R and b E Rn. Show how this problem can be efficiently and stably solved using the
Hessenberg decomposition.
P7.4.4 Suppose H E Rnxn is an unreduced upper Hessenberg matrix. Show that there exists a
diagonal matrix D such that each subdiagonal element of v-1HD is equal to 1. What is 11:2(D)?
P7.4.5 Suppose W, Y E Rnxn and define the matrices C and B by
C = W + iY, B =
[ � -
; ].
Show that if .>. E .>.(C) is real, then .>. E .>.(B). Relate the corresponding eigenvectors.
P7.4.6 Suppose
A = [ � � ]
is a real matrix having eigenvalues .>.±iµ, where µ is nonzero. Give an algorithm that stably determines
c = cos(8) and s = sin(8) such that
where 0t/3 = -µ2•
P7.4.7 Suppose (.>., x) is a known eigenvalue-eigenvector pair for the upper Hessenberg matrix H E Fx
n.
Give an algorithm for computing an orthogonal matrix P such that
pTHP - [ .>. wT ]
- 0 Hi
where H1 E R(n-l)x(n-l) is upper Hessenberg. Compute P as a product of Givens rotations.
P7.4.8 Suppose H E Rnxn has lower bandwidth p. Show how to compute Q E R"xn, a product of
Givens rotations, such that QTHQ is upper Hessenberg. How many flops are required?
P7.4.9 Show that if C is a companion matrix with distinct eigenvalues .>.i . . . . ,.>.n, then vcv-1 =
diag(.>.1, . . . , .>.n) where
.>.�-1l
.>.n-1
2
.>.::-1
Notes and References for §7.4
The real Schur decomposition was originally presented in:
F.D. Murnaghan and A. Wintner (1931). "A Canonical Form for Real Matrices Under Orthogonal
Transformations," Proc. Nat. Acad. Sci. 1 7, 417-420.
384 Chapter 7. Unsymmetric Eigenvalue Problems
A thorough treatment of the reduction to Hessenberg form is given in Wilkinson (AEP, Chap. 6), and
Algol procedures appear in:
R.S. Martin and J.H. Wilkinson (1968). "Similarity Reduction of a General Matrix to Hcssenberg
Form," Nu.mer. Math. 12, 349- 368.
Givens rotations can also be used to compute the Hessenberg decomposition, see:
W. Rath (1982). "Fast Givens Rotations for Orthogonal Similarity," Nu.mer. Math. 40, 47-56.
The high-performance computation of the Hessenberg reduction is a major challenge because it is a
two-sided factorization, see:
J.J. Dongarra, L. Kaufman, and S. Hammarling (1986). "Squeezing the Most Out of Eigenvalue
Solvers on High Performance Computers,'' Lin. Alg. Applic. 77, 113-136.
J.J. Dongarra, S. Hammarling, and D.C. Sorensen (1989). "Block Reduction of Matrices to Condensed
Forms for Eigenvalue Computations,'' .!. ACM 27, 215--227.
M.W. Berry, J.J. Dongarra, and Y. Kim (1995). "A Parallel Algorithm for the Reduction of a Non­
symmetric Matrix to Block Upper Hessenberg Form," Parallel Comput. 21, 1189- 1211.
G. Quintana-Orti and R. Van De Geijn (2006). "Improving the Performance of Reduction to Hessen­
berg Form,'' ACM nuns. Math. Sojtw. 32, 180-194.
S. Tomov, R. Nath, and J. Dongarra (2010). "Accelerating the Reduction to Upper Hessenberg,
Tridiagonal, and Bidiagonal Forms Through Hybrid GPU-Based Computing," Parallel Compv.t.
36, 645-654.
L. Karlsson (2011). "Scheduling of Parallel Matrix Computations and Data Layout Conversion for
HPC and Multicore Architectures," PhD Thesis, University of Umea.
Reaching the Hessenberg form via Gauss transforms is discussed in:
P. Businger (1969). "Reducing a Matrix to Hessenberg Form,'' Math- Comput. 23, 819-821.
G.W. Howell and N. Diaa (2005). "Algorithm 841: BHESS: Gaussian Reduction to a Similar Banded
Hessenberg Form,'' ACM nuns. Math. Softw. 31, 166-185.
Some interesting mathematical properties of the Hessenberg form may be found in:
B.N. Parlett (1967). "Canonical Decomposition of Hessenberg Matrices,'' Math. Comput. 21, 223-
227.
Although the Hessenberg decomposition is largely appreciated as a "front end" decomposition for the
QR iteration, it is increasingly popular as a cheap alternative to the more expensive Schur decom­
position in certain problems. For a sampling of applications where it has proven to be very useful,
consult:
W. Enright (1979). "On the Efficient and Reliable Numerical Solution of Large Linear Systems of
O.D.E.'s,'' IEEE nuns. Av.tom. Contr. AC-24, 905-908.
G.H. Golub, S. Nash and C. Van Loan (1979). "A Hcssenberg-Schur Method for the Problem AX +
XB = C,'' IEEE nuns. Av.tom. Contr. AC-24, 909-913.
A. Laub (1981). "Efficient Multivariable Frequency Response Computations,'' IEEE nuns. Av.tom.
Contr. AC-26, 407-408.
C.C. Paige (1981). "Properties of Numerical Algorithms Related to Computing Controllability,'' IEEE
nuns. Auto. Contr. A C-26, 130-138.
G. Miminis and C.C. Paige (1982). "An Algorithm for Pole Assignment of Time Invariant Linear
Systems,'' Int. J. Contr. 35, 341-354.
C. Van Loan (1982). "Using the Hessenberg Decomposition in Control Theory," in Algorithms and
Theory in Filtering and Control , D.C. Sorensen and R.J. Wets (eds.), Mathematical Programming
Study No. 18, North Holland, Amsterdam, 102-111.
C.D. Martin and C.F. Van Loan (2006). "Solving Real Linear Systems with the Complex Schur
Decomposition,'' SIAM J. Matrix Anal. Applic. 29, 177-183.
The advisability of posing polynomial root problems as companion matrix eigenvalue problem is dis­
cussed in:
A. Edelman and H. Murakami (1995). "Polynomial Roots from Companion Matrix Eigenvalues,"
Math. Comput. 64, 763--776.
7.5. The Practical QR Algorithm
7.5 The Practical QR Algorithm
We return to the Hessenberg QR iteration, which we write as follows:
H = UJ'AUo
for k = 1, 2, . . .
end
H = UR
H = RU
(Hessenberg reduction)
(QR factorization)
385
(7.5.1)
Our aim in this section is to describe how the H's converge to upper quasi-triangular
form and to show how the convergence rate can be accelerated by incorporating shifts.
7.5.1 Deflation
Without loss of generality we may assume that each Hessenberg matrix H in (7.5.1) is
unreduced. If not, then at some stage we have
H =
[ H
o
n H12 ] p
H22 n-p
P n-p
where 1 :::; p < n and the problem decouples into two smaller problems involving H11
and H22. The term deflation is also used in this context, usually when p = n - 1 or
n - 2.
In practice, decoupling occurs whenever a subdiagonal entry in H is suitably
small. For example, if
(7.5.2)
for a small constant c, then hp+ I ,p can justifiably be set to zero because rounding errors
of order ull H II arc typically present throughout the matrix anyway.
7.5.2 The Shifted QR Iteration
Let µ E lR and consider the iteration:
H = UJ'AUo
for k = 1, 2, . . .
(Hessenberg reduction)
Determine a scalar µ.
H - µl = UR
H = RU + µl
end
(QR factorization) (7.5.3)
The scalar µ is referred to as a shift . Each matrix H generated in (7.5.3) is similar to
A, since
RU + µ/ UTHU.
386 Chapter 7. Unsymmetric Eigenvalue Problems
If we order the eigenvalues Ai of A so that
IA1 - µI � · · · � IAn - µI,
and µ is fixed from iteration to iteration, then the theory of §7.3 says that the pth
subdiagonal entry in H converges to zero with rate
I
Ap+i - µ lk
Ap - µ
Of course, if Ap = Ap+l • then there is no convergence at all. But if, for example, µ
is much closer to An than to the other eigenvalues, then the zeroing of the (n, n - 1)
entry is rapid. In the extreme case we have the following:
Theorem 7.5.1. Let µ be an eigenvalue of an n- by-n unreduced Hessenberg matrix
H. If
H = RU + µI,
where H - µI = UR is the QR factorization of H - µI, then hn,n-1 = 0 and hnn = µ.
Proof. Since H is an unreduced Hessenberg matrix the first n - 1 columns of H - µI
are independent, regardless of µ. Thus, if UR = (H - µI) is the QR factorization then
rii =i 0 for i = l:n - l. But if H - µI is singular, then ru · · · Tnn = 0 . Thus, Tnn = 0
and H(n, :) = [ O, . . . , 0, µ ]. D
The theorem says that if we shift by an exact eigenvalue, then in exact arithmetic
deflation occurs in one step.
7.5.3 The Single-Shift Strategy
Now let us consider varying µ from iteration to iteration incorporating new information
about A(A) as the subdiagonal entries converge to zero. A good heuristic is to regard
hnn as the best approximate eigenvalue along the diagonal. If we shift by this quantity
during each iteration, we obtain the single-shift QR iteration:
for k = 1, 2, . . .
µ = H(n, n)
H - µl = UR
H = RU + µl
end
(QR factorization) (7.5.4)
If the (n, n - 1) entry converges to zero, it is likely to do so at a quadratic rate. To see
this, we borrow an example from Stewart (IMC, p. 366). Suppose H is an unreduced
upper Hessenberg matrix of the form
H =
x
x
0
0
0
x
x
x
0
0
x
x
x
x
0
x
x
x
x
f
7.5. The Practical QR Algorithm
and that we perform one step of the single-shift QR algorithm, i.e.,
UR = H - hnn
fl = RU + hnnl.
387
After n - 2 steps in the orthogonal reduction of H - hnnl to upper triangular form we
obtain a matrix with the following structure:
It is not hard to show that
hn,n-l
x
x
0
0
0
x x
x x
x x
0 a
0 €
-a2 + €2 ·
If we assume that € « a, then it is clear that the new (n, n - 1) entry has order €2,
precisely what we would expect of a quadratically converging algorithm.
7.5.4 The Double-Shift Strategy
Unfortunately, difficulties with (7.5.4) can be expected if at some stage the eigenvalues
a1 and a2 of
m = n-l, (7.5.5)
are complex for then hnn would tend to be a poor approximate eigenvalue.
A way around this difficulty is to perform two single-shift QR steps in succession
using a1 and a2 as shifts:
H - a1I = U1R1
H1 = RiU1 + a1I
H1 - a2I = U2R2
H2 = R2U2 + a2I
These equations can be manipulated to show that
where M is defined by
M = (H - a1I)(H :- a2I).
Note that M is a real matrix even if G's eigenvalues are complex since
M = H2 - sH + tI
where
s hmm +
hnn = tr(G) E JR
(7.5.6)
(7.5.7)
(7.5.8)
388 Chapter 7. Unsymmetric Eigenvalue Problems
and
t = aia2 = hmmhnn - hmnhnm = det(G) E R..
Thus, (7.5.7) is the QR factorization of a real matrix and we may choose U1 and U2 so
that Z = U1U2 is real orthogonal. It then follows that
is real.
Unfortunately, roundoff error almost always prevents an exact return to the real
field. A real H2 could be guaranteed if we
• explicitly form the real matrix M = H2 - sH + tI,
• compute the real QR factorization M = ZR, and
• set H2 = zrHz.
But since the first of these steps requires O(n3) flops, this is not a practical course of
action.
7.5.5 The Double-Implicit-Shift Strategy
Fortunately, it turns out that we can implement the double-shift step with O(n2) flops
by appealing to the implicit Q theorem of §7.4.5. In particular we can effect the
transition from H to H2 in O(n2) flops if we
• compute Me1, the first column of M;
• determine a Householder matrix Po such that P0(Mei) is a multiple of e1 ;
• compute Householder matrices P1, . . • , Pn-2 such that if
then z'[HZ1 is upper Hessenberg and the first columns of Z and Z1 arc the same.
Under these circumstances, the implicit Q theorem permits us to conclude that, if
zrHz and Z'[HZ1 are both unreduced upper Hessenberg matrices, then they are
essentially equal. Note that if these Hessenberg matrices are not unreduced, then we
can effect a decoupling and proceed with smaller unreduced subproblems.
Let us work out the details. Observe first that Po can be determined in 0(1)
flops since Me1 = [x, y, z, 0, . . . ' o]T where
x = h�1 + h12h21 - sh11 + t,
y = h21 (h11 + h22 - s),
z = h21h32.
Since a similarity transformation with Po only changes rows and columns 1, 2, and 3,
we see that
7.5. The Practical QR Algorithm
PoHPo
x x x x x x
x x x x x x
x x
x x
0 0
0 0
x x
x x
0 x
0 0
x
x
x
x
x
x
x
x
389
Now the mission of the Householder matrices P1, . . . , Pn-2 is to restore this matrix to
upper Hessenberg form. The calculation proceeds as follows:
x x x x x x
x x x x x x
x x x x x x
x x x x x x
0 0 0 x x x
0 0 0 0 x x
x x x x x x
x x x x x x
0 x x x x x
0 x x x x x
0 x x x x x
0 0 0 0 x x
x x x x x x
x x x x x x
0 x x x x x
0 0 x x x x
0 0 x x x x
0 0 x x x x
x
x
0
0
0
0
x x x x x
x x x x x
x x x x x
0 x x x x
0 0 x x x
0 0 x x x
x x
x x
0 x
0 0
0 0
0 0
x x
x x
x x
x x
0 x
0 0
x
x
x
x
x
x
x
x
x
x
x
x
Each Pk is the identity with a 3-by-3 or 2-by-2 Householder somewhere along its diag­
onal, e.g.,
1 0
0 x
0 x
0 x
0 0
0 0
0 0 0 0
x x 0 0
x x 0 0
x x 0 0
0 0 1 0
0 0 0 1
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 x
0 0 0 x
0 0 0 x
0 0
0 0
0 0
x x
x x
x x
1 0 0
0 1 0
0 0 x
0 0 x
0 0 x
0 0 0
0 0 0
0 0 0
x x 0
x x 0
x x 0
0 0 1
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 x
0 0 0 x
0 0 0 0
0
0
0
0
x
x
0
0
0
0
x
x
The applicability of Theorem 7.4.3 (the implicit Q theorem) follows from the
observation that Pke1 = e1 for k = l:n - 2 and that Po and Z have the same first
column. Hence, Z1e1 = Zei, and we can assert that Z1 essentially equals Z provided
that the upper Hessenberg matrices zrHZ and Z[HZ1 are each unreduced.
390 Chapter 7. Unsymmetric Eigenvalue Problems
The implicit determination of H2 from H outlined above was first described by
Francis (1961) and we refer to it as a Francis QR step. The complete Francis step is
summarized as follows:
Algorithm 7.5.1 (Francis QR step) Given the unreduced upper Hessenberg matrix
H E Rnxn whose trailing 2-by-2 principal submatrix has eigenvalues ai and a2, this
algorithm overwrites H with zTHZ, where Z is a product of Householder matrices
and zT(H - a1I)(H - a2I) is upper triangular.
m = n - 1
{Compute first column of (H - ail)(H - a2I)}
s = H(m, m) + H(n, n)
t = H(m, m) ·H(n, n) - H(m, n) ·H(n, m)
x = H(l, l) ·H(l, 1) + H(l, 2) ·H(2, 1) - s·H(l, 1) + t
y = H(2, l) · (H(l, 1) + H(2, 2) - s)
z = H(2, l)·H(3, 2)
for k = O:n - 3
end
[v, .BJ = house([x y zjT)
q = max{l, k}.
H(k + l:k + 3, q:n) = (I - ,BvvT) ·H(k + l:k + 3, q:n)
r = min{k + 4, n}
H(l:r, k + l:k + 3) = H(l:r, k + l:k + 3) · (! - ,BvvT)
x = H(k + 2, k + 1)
y = H(k + 3, k + 1)
if k < n - 3
z = H(k + 4, k + l)
end
[v, .BJ = house([ x y JT)
H(n - l:n, n - 2:n) = (I - .BvvT) ·H(n - l:n, n - 2:n)
H(l:n, n - l:n) = H(l:n, n - l:n) · (I - .BvvT)
This algorithm requires 10n2 flops. If Z is accumulated into a given orthogonal matrix,
an additional 10n2 flops are necessary.
7.5.6 The Overall Process
Reduction of A to Hessenberg form using Algorithm 7.4.2 and then iteration with
Algorithm 7.5.1 to produce the real Schur form is the standard means by which the
dense unsymmetric eigenproblem is solved. During the iteration it is necessary to
monitor the subdiagonal elements in H in order to spot any possible decoupling. How
this is done is illustrated in the following algorithm:
7.5. The Practical QR Algorithm 391
Algorithm 7.5.2 (QR Algorithm) Given A E IRnxn and a tolerance tol greater than
the unit roundoff, this algorithm computes the real Schur canonical form QTAQ = T.
If Q and T are desired, then T is stored in H. If only the eigenvalues are desired, then
diagonal blocks in T are stored in the corresponding positions in H.
Use Algorithm 7.4.2 to compute the Hessenberg reduction
H = UJ'AUo where Uo=P1 · · · Pn-2·
If Q is desired form Q = P1 · · · Pn-2· (See §5.1.6.)
until q = n
Set to zero all subdiagonal elements that satisfy:
Jhi,i-1! � tol·(Jhiil + Jhi-l,i-11).
Find the largest nonnegative q and the smallest non-negative p such that
end
H12 H13 p
H
[ T H22 H23
l n-p-q
0 H33 q
p n- p-q q
where H33 is upper quasi-triangular and H22 is unreduced.
if q < n
Perform a Francis QR step on H22: H22 = zTH22Z.
if Q is required
Q = Q · diag(Jp, Z, Iq)
H12 = H12Z
end .
end
H23 = zTH23
Upper triangularize all 2-by-2 diagonal blocks in H that have real
eigenvalues and accumulate the transformations (if necessary).
This algorithm requires 25n3 flops if Q and T are computed. If only the eigenvalues
are desired, then 10n3 flops are necessary. These flops counts are very approximate
and are based on the empirical observation that on average only two Francis iterations
are required before the lower 1-by-1 or 2-by-2 decouples.
The roundoff properties of the QR algorithm are what one would expect of any
orthogonal matrix technique. The computed real Schur form T is orthogonally similar
to a matrix near to A, i.e.,
QT(A + E)Q = T
where QTQ = I and II E 112 � ull A 112. The computed Q is almost orthogonal in the
sense that QTQ = I + F where II F 112 � u.
The order of the eigenvalues along T is somewhat arbitrary. But as we discuss
in §7.6, any ordering can be achieved by using a simple procedure for swapping two
adjacent diagonal entries.
392
7.5.7 Balancing
Chapter 7. Unsymmetric Eigenvalue Problems
Finally, we mention that if the elements of A have widely v-ctrying magnitudes, then A
should be balanced before applying the QR algorithm. This is an O(n2) calculation in
which a diagonal matrix D is computed so that if
v-1AD � ( c, l · · · ( c,. ] �
[:rl
then II ri lloo � 11 Ci lloo for i = l:n. The diagonal matrix D is chosen to have the form
D = diag(,Bi1 ' . . . ' ,Bin)
where .B is the floating point base. Note that D-1AD can be calculated without
roundoff. When A is balanced, the computed eigenvalues are usually more accurate
although there are exceptions. See Parlett and Reinsch (1969) and Watkins(2006).
Problems
P7.5.1 Show that if fl = QTHQ is obtained by performing a single-shift QR step with
then lh2d :5 jy
2xl/((w - z)
2 + y2].
H = [ w x ]
y z '
P7.5.2 Given A E R2X2, show how to compute a diagonal D E R2X2 so that II v-1AD ll F is minimized.
P7.5.3 Explain how the single-shift QR step H -µI = UR, fl = RU+µI can be carried out implicitly.
That is, show how the transition from H to fl can be carried out without subtracting the shift µ from
the diagonal of H.
P7.5.4 Suppose H is upper Hessenberg and that we compute the factorization PH = LU via Gaussian
elimination with partial pivoting. (See Algorithm 4.3.4.) Show that Hi = U(PTL) is upper Hessenberg
and similar to H. (This is the basis of the modified LR algorithm.)
P7.5.5 Show that if H = Ho is given and we generate the matrices Hk via Hk - µkl = UkRk , Hk+l
= RkUk + µkl, then (U1 · · · Uj)(Ri · · · R1 ) = (H - µi i) · · · (H - µjl).
Notes and References for §7.5
Historically important papers associated with the QR iteration include:
H. Rutishauser (1958). "Solution of Eigenvalue Problems with the LR Transformation," Nat. Bur.
Stand. App. Math. Ser. 49, 47·-81.
J.G.F. Francis (1961). "The QR Transformation: A Unitary Analogue to the LR Transformation,
Parts I and II" Comput. J. 4, 265-72, 332-345.
V.N. Kublanovskaya (1961). "On Some Algorithms for the Solution of the Complete Eigenvalue
Problem," Vychisl. Mat. Mat. Fiz 1(4), 555-570.
R.S. Martin and J.H. Wilkinson (1968). "The Modified LR Algorithm for Complex Hessenberg Ma­
trices," Numer. Math. 12, 369-376.
R.S. Martin, G. Peters, and J.H. Wilkinson (1970). "The QRAlgorithm for Real Hessenberg Matrices,"
Numer. Math. 14, 219-231.
For a general insight, we r_ecommend:
D.S. Watkins (1982). "Understanding the QR Algorithm," SIAM Review 24, 427-440.
D.S. Watkins (1993). "Some Perspectives on the Eigenvalue Problem," SIAM Review 35, 430-471.
7.5. The Practical QR Algorithm
D.S. Watkins (2008}. ''The QR Algorithm Revisited," SIAM Review 50, 133-145.
D.S. Watkins (2011}. "Francis's Algorithm," Amer. Math. Monthly 118, 387-403.
393
Papers concerned with the convergence of the method, shifting, deflation, and related matters include:
P.A. Businger (1971}. "Numerically Stable Deflation of Hessenberg and Symmetric Tridiagonal Ma­
trices, BIT 11, 262-270.
D.S. Watkins and L. Elsner (1991). "Chasing Algorithms for the Eigenvalue Problem," SIAM J.
Matrix Anal. Applic. 12, 374-384.
D.S. Watkins and L. Elsner (1991}. "Convergence of Algorithms of Decomposition Type for the
Eigenvalue Problem," Lin. Alg. Applic. 149, 19-47.
J. Erxiong (1992). "A Note on the Double-Shift QL Algorithm," Lin. Alg. Applic. 1 71, 121-132.
A.A. Dubrullc and G.H. Golub (1994). "A Multishift QR Iteration Without Computation of the
Shifts," Nu.mer. Algorithms 1, 173--181.
D.S. Watkins (1996}. "Forward Stability and Transmission of Shifts in the QR Algorithm," SIAM J.
Matrix Anal. Applic. 16, 469-487.
D.S. Watkins (1996}. "The Transmission of Shifts and Shift Blurring in the QR algorithm," Lin. Alg.
Applic. 241-9, 877-896.
D.S. Watkins (1998}. "Bulge Exchanges in Algorithms of QR Type," SIAM J. Matrix Anal. Applic.
19, 1074-1096.
R. Vandebril (2011}. "Chasing Bulges or Rotations? A Metamorphosis of the QR-Algorithm" SIAM.
J. Matrix Anal. Applic. 92, 217-247.
Aspects of the balancing problem are discussed in:
E.E. Osborne (1960}. "On Preconditioning of Matrices," J. ACM 7, 338-345.
B.N. Parlett and C. Reinsch (1969). "Balancing a Matrix for Calculation of Eigenvalues and Eigen­
vectors," Nu.mer. Math. 1.<J, 292-304.
D.S. Watkins (2006}. "A Case Where Balancing is Harmful," ETNA 29, 1-4.
Versions of the algorithm that are suitable for companion matrices are discussed in:
D.A. Bini, F. Daddi, and L. Gemignani (2004). "On the Shifted QR iteration Applied to Companion
Matrices," ETNA 18, 137-152.
M. Van Barcl, R. Vandebril, P. Van Dooren, and K. Frederix (2010). "Implicit Double Shift QR­
Algorithm for Companion Matrices," Nu.mer. Math. 116, 177-212.
Papers that arc concerned with the high-performance implementation of the QR iteration include:
Z. Bai and J.W. Demmel (1989). "On a Block Implementation of Hessenberg Multishift QR Iteration,"
Int. J. High Speed Comput. 1, 97-112.
R.A. Van De Geijn (1993). "Deferred Shifting Schemes for Parallel QR Methods," SIAM J. Matrix
Anal. Applic. 14, 180-194.
D.S. Watkins (1994). "Shifting Strategies for the Parallel QR Algorithm," SIAM J. Sci. Comput. 15,
953-958.
G. Henry and R. van de Geijn (1996). "Parallelizing the QR Algorithm for the Unsymmetric Algebraic
Eigenvalue Problem: Myths and Reality," SIAM J. Sci. Comput. 11, 870-883.
Z. Bai, J. Demmel, .J. Dongarra, A. Petitet, H. Robinson, and K. Stanley (1997). "The Spectral
Decomposition of Nonsymmetric Matrices on Distributed Memory Parallel Computers," SIAM J.
Sci. Comput. 18, 1446--1461.
G. Henry, D.S. Watkins, and J. Dongarra (2002). "A Parallel Implementation of the Nonsymmetric
QR Algorithm for Distributed Memory Architectures," SIAM J. Sci. Comput. 24, 284-311.
K. Braman, R. Byers, and R. Mathias (2002). "The Multishift QR Algorithm. Part I: Maintaining
Well-Focused Shifts and Level 3 Performance," SIAM J. Matrix Anal. Applic. 29, 929-947.
K. Braman, R. Byers, and R. Mathias (2002). "The Multishift QR Algorithm. Part II: Aggressive
Early Deflation," SIAM J. Matrix Anal. Applic. 29, 948-973.
M.R. Fahey (2003). "Algorithm 826: A Parallel Eigenvalue Routine for Complex Hessenberg Matri­
ces," ACM TI-ans. Math. Softw. 29, 326- 336.
D. Kressner (2005}. "On the Use of Larger Bulges in the QR Algorithm," ETNA 20, 50-63.
D. Kressner (2008}. "The Effect of Aggressive Early Deflation on the Convergence of the QR Algo­
rithm," SIAM J. Matri'C Anal. Applic. 90, 805-821.
394 Chapter 7. Unsymmetric Eigenvalue Problems
7.6 Invariant Subspace Computations
Several important invariant subspace problems can be solved once the real Schur de­
composition QTAQ = T has been computed. In this section we discuss how to
• compute the eigenvectors associated with some subset of A(A),
• compute an orthonormal basis for a given invariant subspace,
• block-diagonalize A using well-conditioned similarity transformations,
• compute a basis of eigenvectors regardless of their condition, and
• compute an approximate Jordan canonical form of A.
Eigenvector/invariant subspace computation for sparse matrices is discussed in §7.3.1
and §7.3.2 as well as portions of Chapters 8 and 10.
7.6.1 Selected Eigenvectors via Inverse Iteration
Let q(O) E JRn be a given unit 2-norm vector and assume that A - µI E JRnxn is non­
singular. The following is referred to as inverse iteration:
for k = 1, 2, . . .
end
Solve (A - µl)z(k) = q(k-l).
q(k) = z(k)/II z(k) 112
A(k) = q(k)TAq(k)
Inverse iteration is just the power method applied to (A - µJ)-1 .
(7.6.1)
To analyze the behavior of (7.6.1), assume that A has a basis of eigenvectors
{x1 , . . . , Xn} and that Axi = AiXi for i = l:n. If
n
q(O) = L /3iXi
i=l
then q(k) is a unit vector in the direction of
(A - 1)-k (0) - � f3i .
µ q
- 6 (Ai - µ)kXi·
Clearly, if µ is much closer to an eigenvalue Aj than to the other eigenvalues, then q(k)
is rich in the direction of Xj provided /3j =f 0.
A sample stopping criterion for (7.6.1) might be to quit as soon as the residual
satisfies
II r(k) lloo < cull A lloo (7.6.2)
7.6. Invariant Subspace Computations 395
where c is a constant of order unity. Since
with Ek = -r<klq(k)T, it follows that (7.6.2) forces µ and q(k) to be an exact eigenpair
for a nearby matrix.
Inverse iteration can be used in conjunction with Hessenberg reduction and the
QR algorithm as follows:
Step 1. Compute the Hessenberg decomposition UJAUo = H.
Step 2. Apply the double-implicit-shift Francis iteration to H without accumulating
transformations.
Step 3. Foreach computed eigenvalue , whose corresponding eigenvector x is sought,
apply (7.6.1) with A = H and µ = , to produce a vector z such that Hz ::::::: µz.
Step 4. Set x = Uoz.
Inverse iteration with H is very economical because we do not have to accumulate
transformations during the double Francis iteration. Moreover, we can factor matrices
of the form H - ,/ in O(n2) flops, and (3) only one iteration is typically required to
produce an adequate approximate eigenvector.
This last point is perhaps the most interesting aspect of inverse iteration and re­
quires some justification since , can be comparatively inaccurate if it is ill-conditioned.
Assume for simplicity that , is real and let
n
H - M = I:(1'iUivT = uEvr
i=l
be the SVD of H - ,/. From what we said about the roundoff properties of the QR
algorithm in §7.5.6, there exists a matrix E E Rnxn such that H + E - ,/ is singular
and II E 112 :=::::: ull H 112- It follows that (1'n :=::::: U(1'1 and
II {H - :XJ)vn 112 :=::::: U(1'1 ,
i.e., Vn is a good approximate eigenvector. Clearly if the starting vector q<0> has the
expansion
then
n
q(O) = Lf'iUi
i=l
is "rich" in the direction Vn· Note that if s(,) ::::::: lu;'vn l is small, then z<1> is rather
deficient in the direction Un · This explains (heuristically) why another step of inverse
iteration is not likely to produce an improved eigenvector approximate, especially if ,
is ill-conditioned. For more details, see Peters and Wilkinson (1979).
396
7.6.2
Chapter 7. Unsymmetric Eigenvalue Problems
Ordering Eigenvalues in the Real Schur Form
Recall that the real Schur decomposition provides information about invariant sub­
spaces. If
and
[T
0
11 �2
12
2
]P
q
QTAQ = T =
.L '
p
q
then the first p columns of Q span the unique invariant subspace associated with
A(Tn). (See §7.1.4.) Unfortunately, the Francis iteration supplies us with a real Schur
decomposition Q'{:AQF = TF in which the eigenvalues appear somewhat randomly
along the diagonal of TF. This poses a problem if we want an orthonormal basis for
an invariant subspace whose associated eigenvalues are not at the top of TF's diagonal.
Clearly, we need a method for computing an orthogonal matrix Qv such that Q�TFQv
is upper quasi-triangular with appropriate eigenvalue ordering.
A look at the 2-by-2 case suggests how this can be accomplished. Suppose
and that we wish to reverse the order of the eigenvalues. Note that
TFx = A2X
where
x =
[A2t�1 l·
Let Qv be a Givens rotation such that the second component of Q�xis zero. If
then
(QTAQ)e1 = Q�TF(Qve1) = A2Q�(Qve1) = A2e1.
The matrices A and QTAQ have the same Frobenius norm and so it follows that the
latter must have the following form:
Q AQ = .
T [A2 ±t12
l
0 A1
The swapping gets a little more complicated if T has 2-by-2 blocks along its diagonal.
See Ruhe (1970) and Stewart (1976) for details.
By systematically interchanging adjacent pairs of eigenvalues (or 2-by-2 blocks),
we can move any subset of A(A)to the top ofT's diagonal. Here is the overall procedure
for the case when there are no 2-by-2 bumps:
7.6. Invariant Subspace Computations 397
Algorithm 7.6.1 Given an orthogonal matrix Q E 1Rnxn, an upper triangular matrix
T = QTAQ, and a subset tl = {A1, . . . , Ap} of A(A), the following algorithm computes
an orthogonal matrix QD such that Q�TQv = S is upper triangular and {s11 , . . . , Spp}
= /:l. The matrices Q and T are overwritten by QQv and S, respectively.
while {tn , . . . , tpp} -f. tl
for k = l:n - 1
end
end
if tkk ¢ tl and tk+l,k+l E tl
end
[ c, s ] = givens(T(k, k + 1), T(k + 1, k + 1) - T(k, k))
T(k:k + 1, k:n) = [ c 8
]T
T(k:k + 1, k:n)
-s c
T(l:k + 1, k:k + 1) = T(l:k + 1, k:k + 1) [ - � � ]
Q(l:n, k:k + 1) = Q(l:n, k:k + 1) [ -
� � ]
This algorithm requires k(12n) flops, where k is the total number of required swaps.
The integer k is never greater than (n - p)p.
Computation of invariant subspaces by manipulating the real Schur decomposi­
tion is extremely stable. If Q = [ q1 I · · · I Qn ] denotes the computed orthogonal matrix
Q, then II QTQ - I 112 � u and there exists a matrix E satisfying II E 112 � ull A 112
such that (A + E)qi E span{q1, • . .
, tJv} for i = l:p.
7.6.3 Block Diagonalization
Let
[Tf'
T12 . . . T,,
l
n1
T22 . . . T2q n2
T (7.6.3)
0 Tqq nq
n1 n2 nq
be a partitioning of some real Schur canonical form QTAQ = T E 1Rn xn such that
A(Tn), . . . , A(Tqq) are disjoint. By Theorem 7.1.6 there exists a matrix Y such that
y-iyy = diag(Tn , . . . , Tqq)·
A practical procedure for determining Y is now given together with an analysis of Y's
sensitivity as a function of the above partitioning.
Partition In = [E1 I · · · IEq] conformably with T and define the matrix Yij E 1Rnxn
as follows:
398 Chapter 7. Unsymmetric Eigenvalue Problems
In other words, Yi; looks just like the identity except that Zii occupies the (i,j) block
position. It follows that if �j
1TYi; = t = (Ti;), then T and t are identical except
that
fi.; = TiiZi; - Zi;T;; + Ti;,
tik = Tik - Zi;T;k, (k = j + l:q),
Tk; = Tkizii + Tk;, (k = l:i - 1) .
Thus, Ti; can be zeroed provided we have an algorithm for solving the Sylvester equa­
tion
FZ - ZG = C (7.6.4)
where F E R1'xp and G E R'"xr are given upper quasi-triangular matrices and C E wxr.
Bartels and Stewart (1972) have devised a method for doing this. Let C =
[ c1 I · · · I Cr ] and Z = [ z1 I · · · I Zr ] be column partitionings. If 9k+i,k = 0, then by
comparing columns in (7.6.4) we find
k
Fzk - L9ikZi = ck.
i=l
Thus, once we know zi, . . . , Zk-l, then we can solve the quasi-triangular system
k-1
(F - 9kkl) Zk = Ck + L9ikZi
i=l
for Zk· If 9k+l,k =F 0, then Zk and Zk+1 can be simultaneously found by solving the
2p-by-2p system
(7.6.5)
where m = k + 1. By reordering the equations according to the perfect shuffie per­
mutation (l,p + 1, 2,p + 2, . . . ,p, 2p), a banded system is obtained that can be solved
in O(p2) flops. The details may be found in Bartels and Stewart (1972). Here is the
overall process for the case when F and G are each triangular.
Algorithm 7.6.2 (Bartels-Stewart Algorithm) Given C E wxr and upper triangular
matrices F E wxp and G E wxr that satisfy A(F)nA(G) = 0, the following algorithm
overwrites C with the solution to the equation FZ - ZG = C.
for k = l:r
end
C(l:p, k) = C(l:p, k) + C(l:p, l:k - l) ·G(l:k - 1, k)
Solve (F - G(k, k)I)z = C(l:p, k) for z.
C(l:p, k) = z
This algorithm requires pr(p + r) flops. By zeroing the superdiagonal blocks in T in
the appropriate order, the entire matrix can be reduced to block diagonal form.
7.6. Invariant Subspace Computations 399
Algorithm 7.6.3 Given an orthogonal matrix Q E Rnxn, an upper quasi-triangular
matrix T = QTAQ, and the partitioning (7.6.3), the following algorithm overwrites Q
with QY where y-1rY = diag(T11, . . . , Tqq)·
for j = 2:q
for i = l:j - 1
Solve Tiiz - ZT11 = -Tij for Z using the Bartels-Stewart algorithm.
for k = j + l:q
end
end
Tik = Tik - ZT1k
end
for k = l:q
Qk1 = Qkiz + Qk1
end
The number of flops required by this algorithm is a complicated function of the block
sizes in (7.6.3).
The choice of the real Schur form T and its partitioning in (7.6.3) determines
the sensitivity of the Sylvester equations that must be solved in Algorithm 7.6.3. This
in turn affects the condition of the matrix Y and the overall usefulness of the block
diagonalization. The reason for these dependencies is that the relative error of the
computed solution Z to
satisfies
II z - z IIF � II T II F
II z llf.
�
u sep(Tii, T11r
For details, see Golub, Nash, and Van Loan (1979). Since
sep(Tii, T11) min
X#O
II TiiX - XTjj llF <
11 x 11F
min
>.E>.(T;;)
µE>.(T;; )
1.x - µI
(7.6.6)
there can be a substantial loss of accuracy whenever the subsets .X(Tii) are insufficiently
separated. Moreover, if Z satisfies (7.6.6) then
II z llF :::;
II Tij llF
sep(Tii, T11)
Thus, large norm solutions can be expected if sep(Tii, T11) is small. This tends to make
the matrix Y in Algorithm 7.6.3 ill-conditioned since it is the product of the matrices
[In, Z l
Yi1 = .
0 In1
400 Chapter 7. Unsymmetric Eigenvalue Problems
Confronted with these difficulties, Bavely and Stewart (1979) develop an algo­
rithm for block diagonalizing that dynamically determines the eigenvalue ordering and
partitioning in (7.6.3) so that all the Z matrices in Algorithm 7.6.3 are bounded in
norm by some user-supplied tolerance. Their research suggests that the condition of Y
can be controlled by controlling the condition of the Yij.
7.6.4 Eigenvector Bases
If the blocks in the partitioning (7.6.3) are all l-by-1, then Algorithm 7.6.3 produces a
basis ofeigenvectors. As with the method of inverse iteration, the computed eigenvalue­
eigenvector pairs are exact for some "nearby" matrix. A widely followed rule of thumb
for deciding upon a suitable eigenvector method is to use inverse iteration whenever
fewer than 25% of the eigenvectors are desired.
We point out, however, that the real Schur form can be used to determine selected
eigenvectors. Suppose
k- 1
u
>.
0
k- 1
n-k
n- k
is upper quasi-triangular and that >. (j. >.(T11) U >.(T33). It follows that if we solve the
linear systems (T11 - >.I)w = -u and (T33 - >.J)Tz = -v then
are the associated right and left eigenvectors, respectively. Note that the condition of
>. is prescribed by
l/s(>.) = .j(l + wTw)(l + zTz).
7.6.5 Ascertaining Jordan Block Structures
Suppose that we have computed the real Schur decomposition A = QTQT, identified
clusters of "equal" eigenvalues, and calculated the corresponding block diagonalization
T = Y·diag(T11, . . . , Tqq)Y-1 . As we have seen, this can be a formidable task. However,
even greater numerical problems confront us ifwe attempt to ascertain the Jordan block
structure of each Tii· A brief examination of these difficulties will serve to highlight the
limitations of the Jordan decomposition.
Assume for clarity that >.(Tii) is real. The reduction of Tii to Jordan form begins
by replacing it with a matrix of the form C = >.I + N, where N is the strictly upper
triangular portion of Tii and where >., say, is the mean of its eigenvalues.
Recall that the dimension of a Jordan block J(>.) is the smallest nonnegative
integer k for which [J(>.) - >.J]k = 0. Thus, if Pi = dim[null(Ni)J, for i = O:n, then
Pi - Pi-l equals the number of blocks in C's Jordan form that have dimension i or
7.6. Invariant Subspace Computations 401
greater. A concrete example helps to make this assertion clear and to illustrate the
role of the SVD in Jordan form computations.
Assume that c is 7-by-7. Suppose WC compute the SVD urNVi = E1 and
"discover" that N has rank 3. If we order the singular values from small to large then
it follows that the matrix Ni = VtNVi has the form
At this point, we know that the geometric multiplicity of .A is 4-i.e, C's Jordan form
has four blocks (P1 - Po = 4 - 0 = 4).
Now suppose UiLV2 = E2 is the SVD of L and that we find that L has unit rank.
Ifwe again order the singular values from small to large, then L2 = V{LV2 clearly has
the following structure:
L, �
[H �]
However, .X(L2) = .X(L) = {O, 0, O} and so c = 0. Thus, if
V2 = diag(h V2)
then N2 = V{N1Vi has the following form:
0 0 0 0 x x x
0 0 0 0 x x x
0 0 0 0 x x x
N2 0 0 0 0 x x x
0 0 0 0 0 0 a
0 0 0 0 0 0 b
0 0 0 0 0 0 0
Besides allowing us to introduce more zeros into the upper triangle, the SVD of L also
enables us to deduce the dimension of the nullspace of N2• Since
N2 =
[0 KL l [0 K l [0 K l
1 O L2 0 L 0 L
and [ � ] has full column rank,
p2 = dim(null(N2)) = dim(null(Nf)) = 4 + dim(null(L)) = P1 + 2.
Hence, we can conclude at this stage that the Jordan form of C has at least two blocks
of dimension 2 or greater.
Finally, it is easy to see that Nf = 0, from which we conclude that there is p3 -p2
== 7 - 6 = 1 block of dimension 3 or larger. If we define V = ViV2 then it follows that
402 Chapter 7. Unsymmetric Eigenvalue Problems
the decomposition
.X 0 0 0 x x x
}four blocks of o.-d& 1 o• ia.g.,
0 .X 0 0 x x x
0 0 .X 0 x x x
vrcv = 0 0 0 .X x x x
0 0 0 0 .X x a
} two blocks of order 2 or larger
0 0 0 0 0 .X 0
0 0 0 0 0 0 .X } one block of order 3 or larger
displays C's Jordan block structure: two blocks of order 1, one block of order 2, and
one block of order 3.
To compute the Jordan decomposition it is necessary to resort to nonorthogonal
transformations. We refer the reader to Golub and Wilkinson (1976), Kagstrom and
Ruhe (1980a, 1980b), and Demmel (1983) for more details. The above calculations
with the SYD amply illustrate that difficult rank decisions must be made at each stage
and that the final computed block structure depends critically on those decisions.
Problems
P7.6.1 Give a complete algorithm for solving a real, n-by-n, upper quasi-triangular system Tx = b.
P7.6.2 Suppose u-1AU = diag(a1, . . . , am ) and v-1BV = diag(t3i , . . . , ,Bn)· Show that if
l/>(X) = AX - XB,
then
>..(</>) = { ai - .B; : i = l:m, j = l:n }.
What are the corresponding eigenvectors? How can these facts be used to solve AX - XB = C?
P7.6.3 Show that if Z E �pxq and
y = [ I� � ] '
then 1t2(Y) = (2 + u2 + v'4u2 + u4 ]/2 where u = II Z 112.
P7.6.4 Derive the system (7.6.5).
P7.6.5 Assume that T E Rnxn is block upper triangular and partitioned as follows:
T E Rnxn .
Suppose that the diagonal block T22 is 2-by-2 with complex eigenvalues that are disjoint from >..(Tu)
and >..(Taa). Give an algorithm for computing the 2-dimensional real invariant subspace associated
with T22's eigenvalues.
P7.6.6 Suppose H E Rnxn is upper Hessenberg with a complex eigenvalue >..+i · µ. How could inverse
iteration be used to compute x, y E Rn so that H(x + iy) = (>.. + iµ)(x + iy)? Hint: Compare real and
imaginary parts in this equation and obtain a 2n-by-2n real system.
Notes and References for §7.6
Much of the material discussed in this section may be found in the following survey paper:
G.H. Golub and J.H. Wilkinson (1976). "Ill-Conditioned Eigensystems and the Computation of the
Jordan Canonical Form," SIAM Review 18, 578-619.
The problem of ordering the eigenvalues in the real Schur form is the subject of:
7.6. Invariant Subspace Computations 403
A. Rube (1970). "An Algorithm for Numerical Determination of the Structure of a General Matrix,"
BIT 10, 196-216.
G.W. Stewart (1976). "Algorithm 406: HQR3 and EXCHNG: Fortran Subroutines for Calculating
and Ordering the Eigenvalues of a Real Upper Hessenberg Matrix,'' ACM Trans. Math. Softw. 2,
275-280.
J.J. Dongarra, S. Hammarling, and J.H. Wilkinson (1992). "Numerical Considerations in Computing
Invariant Subspaces," SIAM J. Matrix Anal. Applic. 13, 145-161.
z. Bai and J.W. Demmel (1993). "On Swapping Diagonal Blocks in Real Schur Form," Lin. Alg.
Applic. 186, 73-95
Procedures for block diagonalization including the Jordan form are described in:
C. Bavely and G.W. Stewart (1979). "An Algorithm for Computing Reducing Subspaces by Block
Diagonalization,'' SIAM J. Numer. Anal. 1 6, 359-367.
B. Kagstrom and A. Rube (1980a). "An Algorithm for Numerical Computation of the Jordan Normal
Form of a Complex Matrix,'' ACM Trans. Math. Softw. 6, 398-419.
B. Kagstrom and A. Rube (1980b). "Algorithm 560 JNF: An Algorithm for Numerical Computation
of the Jordan Normal Form of a Complex Matrix,'' ACM Trans. Math. Softw. 6, 437-443.
J.W. Demmel (1983). "A Numerical Analyst's Jordan Canonical Form," PhD Thesis, Berkeley.
N. Ghosh, W.W. Hager, and P. Sarmah (1997). "The Application of Eigenpair Stability to Block
Diagonalization," SIAM J. Numer. Anal. 34, 1255-1268.
S. Serra-Capizzano, D. Bertaccini, and G.H. Golub (2005). "How to Deduce a Proper Eigenvalue
Cluster from a Proper Singular Value Cluster in the Nonnormal Case," SIAM J. Matrix Anal.
Applic. 27, 82-86.
Before we offer pointers to the literature associated with invariant subspace computation, we remind
the reader that in §7.3 we discussed the power method for computing the dominant eigenpair and the
method of orthogonal iteration that can be used to compute dominant invariant subspaces. Inverse
iteration is a related idea and is the concern of the following papers:
J. Varah (1968). "The Calculation of the Eigenvectors of a General Complex Matrix by Inverse
Iteration," Math. Comput. 22, 785-791.
J. Varah (1970). "Computing Invariant Subspaces of a General Matrix When the Eigensystem is
Poorly Determined," Math. Comput. 24, 137-149.
G. Peters and J.H. Wilkinson (1979). "Inverse Iteration, Ill-Conditioned Equations, and Newton's
Method," SIAM Review 21, 339-360.
I.C.F. Ipsen (1997). "Computing an Eigenvector with Inverse Iteration," SIAM Review 39, 254-291.
In certain applications it is necessary to track an invariant subspace as the matrix changes, see:
L. Dieci and M.J. Friedman (2001). "Continuation of Invariant Subspaces," Num. Lin. Alg. 8,
317-327.
D. Bindel, J.W. Demmel, and M. Friedman (2008). "Continuation of Invariant Subsapces in Large
Bifurcation Problems," SIAM J. Sci. Comput. 30, 637-656.
Papers concerned with estimating the error in a computed eigenvalue and/or eigenvector include:
S.P. Chan and B.N. Parlett (1977). "Algorithm 517: A Program for Computing the Condition Num­
bers of Matrix Eigenvalues Without Computing Eigenvectors," A CM Trans. Math. Softw. 3,
186-203.
H.J. Symm and J.H. Wilkinson (1980). "Realistic Error Bounds for a Simple Eigenvalue and Its
Associated Eigenvector," Numer. Math. 35, 113-126.
C. Van Loan (1987). "On Estimating the Condition of Eigenvalues and Eigenvectors," Lin. Alg.
Applic. 88/89, 715-732.
Z. Bai, J. Demmel, and A. McKenney (1993). "On Computing Condition Numbers for the Nonsym-
metric Eigenproblem," ACM Trans. Math. Softw. 19, 202-223.
Some ideas about improving computed eigenvalues, eigenvectors, and invariant subspaces may be
found in:
J. Varah (1968). "Rigorous Machine Bounds for the Eigensystem of a General Complex Matrix,"
Math. Comp. 22, 793-801.
J.J. Dongarra, C.B. Moler, and J.H. Wilkinson (1983). "Improving the Accuracy of Computed Eigen­
values and Eigenvectors,'' SIAM J. Numer. Anal. 20, 23-46.
404 Chapter 7. Unsymmetric Eigenvalue Problems
J.W. Demmel (1987). "Three Methods for Refining Estimates of Invariant Subspaces,'' Comput. 38,
43-57.
As we have seen, the sep(.,.) function is of great importance in the assessment of a computed invariant
subspace. Aspects of this quantity and the associated Sylvester equation are discussed in:
J. Varah (1979). "On the Separation of Two Matrices," SIAM J. Numer. Anal. 16, 212-222.
R. Byers (1984). "A Linpack-Style Condition Estimator for the Equation AX - XBT = C," IEEE
Trans. Autom. Contr. A C-29, 926-928.
M. Gu and M.L. Overton (2006). "An Algorithm to Compute Sep.>.," SIAM J. Matrix Anal. Applic.
28, 348--359.
N.J. Higham (1993). "Perturbation Theory and Backward Error for AX - XB = C," BIT 33, 124-136.
Sylvester equations arise in many settings, and there are many solution frameworks, see:
R.H. Bartels and G.W. Stewart (1972). "Solution of the Equation AX + XB = C,'' Commun. ACM
15, 820-826.
G.H. Golub, S. Nash, and C. Van Loan (1979). "A Hessenberg-Schur Method for the Matrix Problem
AX + XB = C,'' IEEE Trans. Autom. Contr. AC-24, 909-913.
K. Datta (1988). "The Matrix Equation XA - BX = R and Its Applications," Lin. Alg. Applic. 109,
91-105.
B. Kagstrom and P. Poromaa (1992). "Distributed and Shared Memory Block Algorithms for the
Triangular Sylvester Equation with sep-1 Estimators,'' SIAM J. Matrix Anal. Applic. 13, 90-
101.
J. Gardiner, M.R. Wette, A.J. Laub, J.J. Amato, and C.B. Moler (1992). "Algorithm 705: A
FORTRAN-77 Software Package for Solving the Sylvester Matrix Equation AXBT +CXDT = E,"
ACM Trans. Math. Softw. 18, 232-238.
V. Simoncini {1996). "On the Numerical Solution of AX -XB =C," BIT 36, 814-830.
C.H. Bischof, B.N Datta, and A. Purkayastha (1996). "A Parallel Algorithm for the Sylvester Observer
Equation," SIAM J. Sci. Comput. 1 7, 686-698.
D. Calvetti, B. Lewis, L. Reichel (2001). "On the Solution of Large Sylvester-Observer Equations,"
Num. Lin. Alg. 8, 435-451.
The constrained Sylvester equation problem is considered in:
J.B. Barlow, M.M. Monahemi, and D.P. O'Leary (1992). "Constrained Matrix Sylvester Equations,"
SIAM J. Matrix Anal. Applic. 13, 1-9.
A.R. Ghavimi and A.J. Laub (1996). "Numerical Methods for Nearly Singular Constrained Matrix
Sylvester Equations." SIAM J. Matrix Anal. Applic. 1 7, 212-221.
The Lyapunov problem FX + XFT = -C where C is non-negative definite has a very important role
to play in control theory, see:
G. Hewer and C. Kenney (1988). "The Sensitivity ofthe Stable Lyapunov Equation," SIAM J. Control
Optim 26, 321-344.
A.R. Ghavimi and A.J. Laub (1995). "Residual Bounds for Discrete-Time Lyapunov Equations,"
IEEE Trans. Autom. Contr. 40, 1244--1249.
.J.-R. Li and J. White (2004). "Low-Rank Solution of Lyapunov Equations,'' SIAM Review 46, 693-
713.
Several authors have considered generalizations of the Sylvester equation, i.e., EFiXGi = C. These
include:
P. Lancaster (1970). "Explicit Solution of Linear Matrix Equations,'' SIAM Review 12, 544-566.
H. Wimmer and A.D. Ziebur (1972). "Solving the Matrix Equations Efp(A)gp(A) = C," SIAM
Review 14, 318-323.
W.J. Vetter (1975). "Vector Structures and Solutions of Linear Matrix Equations," Lin. Alg. Applic.
10, 181-188.
7.7. The Generalized Eigenvalue Problem 405
7.7 The Generalized Eigenvalue Problem
If A, B E <Cnxn, then the set of all matrices of the form A - >..B with >.. E <C is a pencil.
The generalized eigenvalues of A - >..B are elements of the set >..(A, B) defined by
>..(A, B) = {z E <C : det(A - zB) = O }.
If >.. E >..(A, B) and 0 =/:- x E <Cn satisfies
Ax = >..Bx, (7.7.1)
then x is an eigenvector of A - >..B. The problem of finding nontrivial solutions to
(7.7.1) is the generalized eigenvalue problem and in this section we survey some of its
mathematical properties and derive a stable method for its solution. We briefly discuss
how a polynomial eigenvalue problem can be converted into an equivalent generalized
eigenvalue problem through a linearization process.
7.7.1 Background
The first thing to observe about the generalized eigenvalue problem is that there are
n eigenvalues if and only if rank(B) = n. If B is rank deficient then >..(A, B) may be
finite, empty, or infinite:
A =
[� � ], B
A [� �], B
[0
1 o
0
]
[o
o o
1
]
:::? >..(A, B) = {1},
:::? >..(A, B) = 0,
Note that if 0 =/:- >.. E >..(A, B), then (1/>..) E >..(B, A). Moreover, if B is nonsingular,
then >..(A, B) = >..(B-1 A, I) = >..(B-1A). This last observation suggests one method
for solving the A - >..B problem if B is nonsingular:
Step 1. Solve BC = A for C using (say) Gaussian elimination with pivoting.
Step 2. Use the QR algorithm to compute the eigenvalues of C.
In this framework, C is affected by roundoff errors of order ull A 11211 B-1 112- If B is ill­
conditioned, then this precludes the possibility ofcomputing anygeneralized eigenvalue
accurately-even those eigenvalues that may be regarded as well-conditioned. For
example, if
A
[1.746 .940 l
1.246 1.898
and B
[.780 .563 ],
.913 .659
406 Chapter 7. Unsymmetric Eigenvalue Problems
then A(A, B) = {2, 1.07 x 106}. With 7-digit floating point arithmetic, we find
A(fl(AB-1)) = {1.562539, 1.01 x 106}. The poor quality of the small eigenvalue is
because K2(B) � 2 x 106. On the other hand, we find that
A(l, fl(A-1B)) � {2.000001, 1.06 x 106}.
The accuracy of the small eigenvalue is improved because K2(A) � 4.
The example suggests that we seek an alternative approach to the generalized
eigenvalue problem. One idea is to compute well-conditioned Q and Z such that the
matrices
(7.7.2)
are each in canonical form. Note that A(A, B)= A(A1, Bi) since
We say that the pencils A - AB and A1 - AB1 are equivalent if (7.7.2) holds with
nonsingular Q and Z.
As in the standard eigenproblem A - Al there is a choice between canonical
forms. Corresponding to the Jordan form is a decomposition of Kronecker in which
both A1 and B1 are block diagonal with blocks that are similar in structure to Jordan
blocks. The Kronecker canonical form poses the same numerical challenges as the
Jordan form, but it provides insight into the mathematical properties of the pencil
A - AB. See Wilkinson (1978) and Demmel and Kagstrom (1987) for details.
7.7.2 The Generalized Schur Decomposition
From the numerical point of view, it makes to insist that the transformation matrices
Q and Z be unitary. This leads to the following decomposition described in Moler and
Stewart (1973).
Theorem 7.7.1 {Generalized Schur Decomposition). If A and B are in <Cnxn,
then there exist unitary Q and Z such that QHAZ = T and QHBZ = S are upper
triangular. Iffor some k, tkk and Skk are both zero, then A(A, B) = <C. Otherwise
A(A, B) = {tidsii : Sii #- O}.
Proof. Let {Bk} be a sequence of nonsingular matrices that converge to B. For each
k, let
Q{!(AB"k1)Qk = Rk
be a Schur decomposition of AB;1. Let Zk be unitary such that
z{!(B;1Qk) = s;1
is upper triangular. It follows that Q{!AZk = RkSk and Q{!BkZk = Sk are also
upper triangular. Using the Bolzano-Weierstrass theorem, we know that the bounded
sequence {(Qk, Zk)} has a converging subsequence,
7.7. The Generalized Eigenvalue Problem 407
It is easy to show that Q and Z are unitary and that QHAZ and QHBZ are upper
triangular. The assertions about ,(A, B) follow from the identity
n
det(A - -B) = det(QZH) IT(tii - ASii)
i=l
and that completes the proof of the theorem. 0
If A and B are real then the following decomposition, which corresponds to the
real Schur decomposition (Theorem 7.4.1), is of interest.
Theorem 7.7.2 (Generalized Real Schur Decomposition). If A and B are in
llnxn then there exist orthogonal matrices Q and Z such that QTAZ is upper quasi­
triangular and QTBZ is upper triangular.
Proof. See Stewart (1972). 0
In the remainder of this section we are concerned with the computation of this decom­
position and the mathematical insight that it provides.
7.7.3 Sensitivity Issues
The generalized Schur decomposition sheds light on the issue of eigenvalue sensitivity
for the A - ,B problem. Clearly, small changes in A and B can induce large changes
in the eigenvalue Ai = tii/sii if Sii is small. However, as Stewart (1978) argues, it
may not be appropriate to regard such an eigenvalue as "ill-conditioned." The reason
is that the reciprocal µi = sii/tii might be a very well-behaved eigenvalue for the
pencil µA - B. In the Stewart analysis, A and B are treated symmetrically and the
eigenvalues are regarded more as ordered pairs (tii, Sii) than as quotients. With this
point ofview it becomes appropriate to measure eigenvalue perturbations in the chordal
metric chord(a, b) defined by
chord(a, b) la - bl
Stewart shows that if , is a distinct eigenvalue of A - ,B and A1: is the corresponding
eigenvalue of the perturbed pencil A - -B with II A - A 112 � II B - B 112 � E, then
chord(-, -1:) :::;
E
+ 0(1:2)
J(yHAx)2 + (yHBx)2
where x and y have unit 2-norm and satisfy Ax = ,Bx and yHA= ,yHB. Note that the
denominator in the upper bound is symmetric in A and B. The "truly" ill-conditioned
eigenvalues are those for which this denominator is small.
The extreme case when both tkk and Skk are zero for some k has been studied by
Wilkinson (1979). In this case, the remaining quotients tii/Sii can take on arbitrary
values.
408 Chapter 7. Unsymmetric Eigenvalue Problems
7.7.4 Hessenberg-Triangular Form
The first step in computing the generalized real Schur decomposition of the pair (A, B)
is to reduce A to upper Hessenberg form and B to upper triangular form via orthog­
onal transformations. We first determine an orthogonal U such that UTB is upper
triangular. Of course, to preserve eigenvalues, we must also update A in exactly the
same way. Let us trace what happens in the n = 5 case.
x x
x x
x x
x x
x x
x
x
x
x
x
x
x
0
0
0
x
x
x
0
0
x
x
x
x
0
Next, we reduce A to upper Hessenberg form while preserving B's upper triangular
form. First, a Givens rotation Q45 is determined to zero as1:
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
0
0
0
x
x
x
0
0
x
x
x
x
x
The nonzero entry arising in the {5,4) position in B can be zeroed by postmultiplying
with an appropriate Givens rotation Z45:
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
0
0
0
x
x
x
0
0
x
x
x
x
0
Zeros are similarly introduced into the (4, 1) and (3, 1) positions in A:
A +-- AZa4
[x
x
x
�
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
:
:
x
l·
B +-- QfaB
x x x
x x x
0 x x
0 x x
0 0 0
x x x
x x x
0 x x
0 0 x
0 0 0
x
x
x
0
0
x
x
x
0
0
x
x
x
x
0
7.7. The Generalized Eigenvalue Problem
x x x
x x x
x x x
x x x
x x x
x
x
0
0
0
x
x
x
0
0
x
x
x
x
0
409
A is now upper Hessenberg through its first column. The reduction is completed by
zeroingas2, a42, and as3. Note that two orthogonal transformations arc required for
eachaii thatiszeroed-onetodothezeroingand theothertorestoreB'striangularity.
Either Givens rotations or 2-by-2 modified Householder transformations can be used.
Overallwehave:
Algorithm 7.7.1 (Hessenberg-Triangular Reduction) Given A and B in IRnxn, the
followingalgorithmoverwritesA with anupperHessenberg matrix QTAZ andBwith
an uppertriangularmatrixQTBZwherebothQand Z areorthogonal.
Computethefactorization B= QRusingAlgorithm5.2.1 andoverwrite
A with QTA and B with QTB.
for j = l:n - 2
end
for i = n: - l:j + 2
end
[c, s] = givens(A(i - 1,j), A(i,j))
A(i - l:i,j:n) = [ c
s ]TA(i - l:i,j:n)
- s c
B(i - l:i, i - l:n) = [ c s ]TB(i - l:i, i - l:n)
-s c
[c, s] = givens(- B(i, i), B(i, i - 1))
B(l:i, i - l:i) = B(l:i, i - l:i) [ -� � ]
A(l:n, i - l:i) = A(l:n, i - l:i) [ -S
c s
c ]
Thisalgorithm requires about 8n3 flops. TheaccumulationofQand Z requiresabout
4n3 and 3n3 flops, respectively.
The reductionofA - >..B to Hessenberg-triangular form serves as a "front end"
decomposition for a generalized QR iteration known as the QZ iteration which we
describenext.
7.7.5 Deflation
In describing the QZ iteration we may assume without loss of generality that A is
an unreduced upper Hessenberg matrixandthat B is anonsingular upper triangular
410 Chapter 7. Unsymmetric Eigenvalue Problems
matrix. The first of these assertions is obvious, for if ak+l,k = 0 then
[ Au - >.Bu Ai2 - >.B12 ] k
A - >.B =
0 A22 - >.B22 n-k
k n-k
and we may proceed to solve the two smaller problems Au - >.Bu and A22 - >.B22.
On the other hand, if bkk = 0 for some k, then it is possible to introduce a zero in A's
(n, n - 1) position and thereby deflate. Illustrating by example, suppose n = 5 and
k = 3:
x
x
x
0
0
x
x
x
x
0
x
x
x
x
x
x
x
0
0
0
x
x
0
0
0
x
x
x
x
0
The zero on B's diagonal can be "pushed down" to the (5,5) position as follows using
Givens rotations:
x
x
x
x
0
x
x
x
0
0
x
x
x
0
0
x
x
x
0
0
x
x
x
0
0
x
x
x
x
0
x
x
x
x
0
x
x
x
x
x
x
x
x
x
0
x
x
x
x
0
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
0
[�
[�
[�
x x x
x x x
0 0 x
0 0 0
0 0 0
x x x
x x x
0 0 x
0 0 0
0 0 0
x x x
x x x
0 0 x
0 0 0
0 0 0
x x x
x x x
0 x x
0 0 0
0 0 0
x x x
x x x
0 x x
0 0 x
0 0 0
This zero-chasing technique is perfectly general and can be used to zero an,n-l regard­
less of where the zero appears along B's diagonal.
7.1. The Generalized Eigenvalue Problem 411
7.7.6 The QZ Step
We are now in a position to describe a QZ step. The basic idea is to update A and B
as follows
(A - >.B) = QT(A - >.B)Z,
where A is upper Hessenberg, fJ is upper triangular, Q and Z are each orthogonal, and
AfJ-1 is essentially the same matrix that would result if a Francis QR step (Algorithm
7.5.1) were explicitly applied to AB-1 • This can be done with some clever zero-chasing
and an appeal to the implicit Q theorem.
Let M = AB-1 (upper Hessenberg) and let v be the first column of the matrix
(M - al)(M - bl), where a and b are the eigenvalues of M's lower 2-by-2 submatrix.
Note that v can be calculated in 0(1) flops. If Po is a Householder matrix such that
Pov is a multiple of e1, then
x x x x x x x x x x x x
x x x x x x x x x x x x
A +-- PoA =
x x x x x x
B +-- PoB =
x x x x x x
0 0 x x x x 0 0 0 x x x
0 0 0 x x x 0 0 0 0 x x
0 0 0 0 x x 0 0 0 0 0 x
The idea now is to restore these matrices to Hessenberg-triangular form by chasing the
unwanted nonzero elements down the diagonal.
To this end, we first determine a pair of Householder matrices Z1 and Z2 to zero
"31, ba2, and �1:
x x x x x x x x x x x x
x x x x x x 0 x x x x x
A+- AZ1Z2 =
x x x x x x
B +-- BZ1Z2 =
0 0 x x x x
x x x x x x 0 0 0 x x x
0 0 0 x x x 0 0 0 0 x x
0 0 0 0 x x 0 0 0 0 0 x
Then a Householder matrix P1 is used to zero aa1 and a41 :
x x x x x x x x x x x x
x x x x x x 0 x x x x x
A +-- P1A =
0 x x x x x
B +-- PiB =
0 x x x x x
0 x x x x x 0 x x x x x
0 0 0 x x x 0 0 0 0 x x
0 0 0 0 x x 0 0 0 0 0 x
Notice that with this step the unwanted nonzero elements have been shifted down
and to the right from their original position. This illustrates a typical step in the QZ
iteration. Notice that Q = QoQ1 · · · Qn-2 has the same first column as Qo. By the way
the initial Householder matrix was determined, we can apply the implicit Q theorem
and assert that AB-1 = QT(AB-1)Q is indeed essentially the same matrix that we
would obtain by applying the Francis iteration to M = AB-1 directly. Overall we have
the following algorithm.
412 Chapter 7. Unsymmetric Eigenvalue Problems
Algorithm 7.7.2 (The QZ Step) Given an unreduced upper Hessenberg matrix
A E JRnxn and a nonsingular upper triangular matrix B E JRnxn, the following algo­
rithm overwrites A with the upper Hessenberg matrix QTAZ and B with the upper
triangular matrix QTBZ where Q and Z are orthogonal and Q has the same first col­
umn as the orthogonal similarity transformation in Algorithm 7.5.l when it is applied
to AB-1•
Let M = AB-1 and compute (M - al)(lvl - bl)e1 = [x, y, z, 0, . . . , O]T
where a and b are the eigenvalues of !v/'s lower 2-by-2.
for k = l:n - 2
end
Find Ho=ho!de< Q, so Q, [� ] [� ]·
A = diag(h-1 , QkJn-k-2) · A
B = diag(h-1, Qk, ln-k-2) · B
Find Householder Zk1 so [ bk+2,k I bk+2,k+l I bk+2,k+2 ] Zk1 = [ 0 I 0 I * ] .
A = A-diag(h-1 , Zk1, ln-k-2)
B = B·diag(h-1, Zk1 , ln-k-2)
Find Householder Zk2 so [ bk+l,k I bk+l,k+l ] Zk2 = [ 0 I * ] .
A = A-diag(Jk-1, Zk2, ln-k-i)
B = B·diag(Jk-1, Zk2, ln-k-1)
x = ak+1,k; Y = ak+2,k
if k < n - 2
z = ak+3,k
end
Find Householder Qn-1 so Qn-1 [ � ] = [ � ] .
A = diag(In-2, Qn-1 ) · A
B = diag(In-2, Qn-1) · B.
Find Householder Zn-l so [ bn,n-l j bnn ] Zn-l = [ 0 j * ] .
A = A-diag(ln-21 Zn-1)
B = B·diag(Jn-21 Zn-1)
This algorithm requires 22n2 flops. Q and Z can be accumulated for an additional 8n2
flops and 13n2 flops, respectively.
7.7.7 The Overall QZ Process
By applying a sequence of QZ steps to the Hessenberg-triangular pencil A - >..B, it is
possible to reduce A to quasi-triangular form. In doing this it is necessary to monitor
A's subdiagonal and B's diagonal in order to bring about decoupling whenever possible.
The complete process, due to Moler and Stewart (1973), is as follows:
7.7. The Generalized Eigenvalue Problem 413
Algorithm 7.7.3 Given A E R.nxn and B E R.nxn, the following algorithm computes
orthogonal Q and Z such that QTAZ = T is upper quasi-triangular and QTBZ = S
is upper triangular. A is overwritten by T and B by S.
Using Algorithm 7.7.1, overwrite A with QTAZ (upper Hessenberg) and
B with QTBZ (upper triangular).
until q = n
end
Set to zero subdiagonal entries that satisfy lai,i-1I ::::; E(lai-l,i-1I + laiiI).
Find the largest nonnegative q and the smallest nonnegative p such that if
A12 Ai3
l
p
A22 A23 n-p-q
0 A33 q
p n-p-q q
then A33 is upper quasi-triangular and A22 is upper Hessenberg
and unreduced.
Partition B conformably:
B
if q < n
if B22 is singular
Zero an-q,n-q-1
else
[
Bu
0
0
p
B1 2 B13
l
p
B22 B23 n-p-q
0 833 q
n-p-q q
Apply Algorithm 7.7.2 to A22 and B22 and update:
end
end
A = diag(/p, Q, Iq)TA·diag(/p, Z, lq)
B = diag(lv, Q, lq)TB·diag(lv, Z, Iq)
This algorithm requires 30n3 flops. If Q is desired, an additional 16n3 are necessary.
ff Z is required, an additional 20n3 are needed. These estimates of work are based on
the experience that about two QZ iterations per eigenvalue arc necessary. Thus, the
convergence properties of QZ are the same as for QR. The speed of the QZ algorithm
is not affected by rank deficiency in B.
The computed S and T can be shown to satisfy
Q5'(A + E)Zo = T, Q5(B + F)Zo = S,
where Qo and Zo are exactly orthogonal and II E 112 � ull A 112 and 11 F 112 � ull B 112-
414 Chapter 7. Unsymmetric Eigenvalue Problems
7.7.8 Generalized Invariant Subspace Computations
Many of the invariant subspace computations discussed in §7.6 carry over to the gen­
eralized eigenvalue problem. For example, approximate eigenvectors can be found via
inverse iteration:
q(O) E <Vnxn given.
for k = 1, 2, . . .
end
Solve (A - µB)z(k) = Bq(k-l).
Normalize: q(k) = z(k)/II z(k) 112·
A(k) = [q(k)]HAq(k) I [q(k)]HAq(k)
If B is nonsingular, then this is equivalent to applying (7.6.1) with the matrix B-
1A.
Typically, only a single iteration is required ifµ is an approximate eigenvalue computed
by the QZ algorithm. By inverse iterating with the Hessenberg-triangular pencil, costly
accumulation of the Z-transformations during the QZ iteration can be avoided.
Corresponding to the notion of an invariant subspace for a single matrix, we have
the notion of a deflating subspace for the pencil A - AB. In particular, we say that
a k-dimensional subspace S � <Vn is deflating for the pencil A - AB if the subspace
{ Ax + By : x, y E S } has dimension k or less. Note that if
is a generalized Schur decomposition ofA- AB, then the columns of Z in the generalized
Schur decomposition define a family of deflating subspaces. Indeed, if
are column partitionings, then
span{Az1, . . . , Azk} � span{q1, . . . , qk},
span{Bz1, . . . , Bzk} � span{q1, . . . , qk},
for k = l:n. Properties of deflating subspaces and their behavior under perturbation
are described in Stewart (1972).
7.7.9 A Note on the Polynomial Eigenvalue Problem
More general than the generalized eigenvalue problem is the polynomial eigenvalue
problem. Here we are given matrices Ao, . . . , Ad E <Vnxn and determine A E <V and
0 � x E <Vn so that
P(A)x = 0
where the A-matrix P(A) is defined by
P(A) = Ao + AA1 + · · · + AdAd.
(7.7.3)
(7.7.4)
We assume Ad � 0 and regard d as the degree of P(A). The theory behind the polyno­
mial eigenvalue problem is nicely developed in Lancaster (1966).
7.7. The Generalized Eigenvalue Problem 415
It is possible to convert (7.7.3) into an equivalent linear eigenvalue problem with
larger dimension. For example, suppose d = 3 and
L(A) =
If
then
[ 0 0 � l
-I 0 Ai
0 -I A2
L(�) [�
l =
H[� 0
�l
I (7.7.5)
0 A3
Ul·
In general, we say that L(A) is a linearization of P(A) if there are dn-by-dn A-matrices
S(A) and T(A), each with constant nonzero determinants, so that
(7.7.6)
has unit degree. With this conversion, the A - AB methods just discussed can be
applied to find the required eigenvalues and eigenvectors.
Recent work has focused on how to choose the A-transformations S(A) and T(A)
so that special structure in P(A) is reflected in L(A). See Mackey, Mackey, Mehl, and
Mehrmann (2006). The idea is to think of (7.7.6) as a factorization and to identify the
transformations that produce a properly structured L(A). To appreciate this solution
framework it is necessary to have a facility with A-matrix manipulation and to that
end we briefly examine the A-matrix transformations behind the above linearization.
If
then
and it is easy to verify that
Notice that the transformation matrices have unit determinant and that the A-matrix
on the right-hand side has degree d - 1. The process can be repeated. If
then
416
and
n
0
In
0
0
][M.
->..In -In
In 0
Ao
Pi(>.)
0
Chapter 7. Unsymmetric Eigenvalue Problems
�][;
In 0
0
0
-In
[M.
-In
0
0
]
In =
P2(>.)
0
�]
>..In Ai .
-In P2(>.)
Note that the matrix on the right has degree d - 2. A straightforward induction
argument can be assembled to establish that if the dn-by-dn matrices S(>.) and T(>.)
are defined by
In ->.In 0 0 0 0 0 I
0 I,.
->.In
-In 0 Pi (>.)
S(>.) = 0 ' T(>.) = 0 -In
In
-
>.In Pd-2(>.)
0 0 0 In 0 0 -In Pd-1 (>.)
where
then
>..In 0 0 Ao
S(>.) [P�>.)
-In >..In Ai
0
lT(>.) = 0 -In
I(d-l}n
>..In Ad-2
0 0 -In Ad-1 + >.Ad
Note that, if we solve the linearized problem using the QZ algorithm, then O((dn)3)
flops are required.
Problems
P7.7.1 Suppose A and B are in Rnxn and that
UTBV = [ D 0 ] r
0 0 n-r '
u = [ U1 I U2 ] ,
r n-r
r n-r
V = [ Vi l V2 ] ,
r n-r
is the SVD of B, where D is r-by-r and r = rank(B). Show that if >.(A, B) = <C then U[AV2 is
singular.
P7.7.2 Suppose A and B are in Rnxn_ Give an algorithm for computing orthogonal Q and Z such
that QTAZ is upper Hessenberg and zTBQ is upper triangular.
7.7. The Generalized Eigenvalue Problem 417
P7.7.3 Suppose
[ Bu
and B =
0
with A11, B11 E Rkxk and A22, B22 E R!Xi. Under what circumstances do there exist
X = [ ;
80 that y-iAx and y-1 BX are both block diagonal? This is the generalized Sylvester equation
problem. Specify an algorithm for the case when Au, A22, Bu, and B22 are upper triangular. See
Kiigstrom (1994).
P7.7.4 Suppose µ r/. >.(A, B). Relate the eigenvalues and eigenvectors of Ai = (A - µB)-iA and
Bi = (A - µB)-iB to the generalized eigenvalues and eigenvectors of A - >.B.
P7.7.5 What does the generalized Schur decomposition say about the pencil A - >.AT? Hint: If
T E Rnxn is upper triangular, then EnTEn is lower triangular where En is the exchange permutation
defined in § 1.2.11.
P7.7.6 Prove that
[A, + AA,
Li (>.) =
-
f
are linearizations of
A2 Ai
0 0
- In 0
0 - In �·i
0 '
0
[A, + AA,
A2
L2(>.) =
A1
Ao
P(>.) = Ao + >.A1 + >.2A2 + >.3A3 + >.4A4.
-In 0
0 - In
0 0
0 0
Specify the >.-matrix transformations that relate diag(P(>.), hn) to both Li (>.) and L2(>.).
Notes and References for §7.7
0
l
0
-In
0
For background to the generalized eigenvalue problem we recommend Stewart(IMC), Stewart and Sun
(MPT), and Watkins (MEP) and:
B. Kagstrom and A. Ruhe (1983). Matrix Pencils, Proceedings Pite Havsbad, 1982, Lecture Notes
in Mathematics Vol. 973, Springer-Verlag, New York.
QZ-related papers include:
C.B. Moler and G.W. Stewart (1973). "An Algorithm for Generalized Matrix Eigenvalue Problems,"
SIAM J. Numer. Anal. 10, 241-256.
L. Kaufman (1974). "The LZ Algorithm to Solve the Generalized Eigenvalue Problem," SIAM J.
Numer. Anal. 11, 997-1024.
R.C. Ward (1975). "The Combination Shift QZ Algorithm," SIAM J. Numer. Anal. 12, 835-853.
C.F. Van Loan (1975). "A General Matrix Eigenvalue Algorithm," SIAM J. Numer. Anal. 12,
819-834.
L. Kaufman (1977). "Some Thoughts on the QZ Algorithm for Solving the Generalized Eigenvalue
Problem," A CM Trans. Math. Softw. 3, 65 -75.
R.C. Ward (1981). "Balancing the Generalized Eigenvalue Problem," SIAM J. Sci. Stat. Comput.
2, 141-152.
P. Van Dooren (1982). "Algorithm 590: DSUBSP and EXCHQZ: Fortran Routines for Computing
Deflating Subspaces with Specified Spectrum," ACM Trans. Math. Softw. 8, 376-382.
K. Dackland and B. Kagstrom (1999). "Blocked Algorithms and Software for Reduction of a Regular
Matrix Pair to Generalized Schur Form," A CM Trans. Math. Softw. 25, 425-454.
D.S. Watkins (2000). "Performance of the QZ Algorithm in the Presence of Infinite Eigenvalues,"
SIAM J. Matrix Anal. Applic. 22, 364-375.
B. Kiigstrom, D. Kressner, E.S. Quintana-Orti, and G. Quintana-Orti (2008). "Blocked Algorithms
for the Reduction to Hessenberg-Triangular Form Revisited," BIT 48, 563-584.
Many algorithmic ideas associated with the A - >.I problem extend to the A - >.B problem:
A. Jennings and M.R. Osborne (1977). "Generalized Eigenvalue Problems for Certain Unsymmetric
Band Matrices," Lin. Alg. Applic. 29, 139-150.
418 Chapter 7. Unsymmetric Eigenvalue Problems
V.N. Kublanovskaya (1984). "AB Algorithm and Its Modifications for the Spectral Problem of Linear
Pencils of Matrices," Nu.mer. Math. 43, 329-342.
Z. Bai, J. Demmel, and M. Gu (1997). "An Inverse Free Parallel Spectral Divide and Conquer
Algorithm for Nonsymmetric Eigenproblems," Numer. Math. 76, 279-308.
G.H. Golub and Q. Ye (2000). "Inexact Inverse Iteration for Generalized Eigenvalue Problems," BIT
40, 671-684.
F. Tisseur (2001). "Newton's Method in Floating Point Arithmetic and Iterative Refinement of Gen­
eralized Eigenvalue Problems," SIAM J. Matrix Anal. Applic. 22, 1038--1057.
D. Lemonnier and P. Van Dooren (2006). "Balancing Regular Matrix Pencils," SIAM J. Matrix Anal.
Applic. 28, 253-263.
R. Granat, B. Kagstrom, and D. Kressner (2007). "Computing Periodic Deflating Subspaces Associ­
ated with a Specified Set of Eigenvalues," BIT 4 7, 763-791.
The perturbation theory for the generalized eigenvalue problem is treated in:
G.W. Stewart (1972). "On the Sensitivity of the Eigenvalue Problem Ax = >.Bx," SIAM J. Numer.
Anal. 9, 669-686.
G.W. Stewart (1973). "Error and Perturbation Bounds for Subspaces Associated with Certain Eigen­
value Problems," SIAM Review 15, 727-764.
G.W. Stewart (1975). "Gershgorin Theory for the Generalized Eigenvalue Problem Ax = >.Bx," Math.
Comput. 29, 600-606.
A. Pokrzywa (1986). "On Perturbations and the Equivalence Orbit of a Matrix Pencil," Lin. Alg.
Applic. 82, 99-121.
J. Sun (1995). "Perturbation Bounds for the Generalized Schur Decomposition," SIAM J. Matrix
Anal. Applic. 16, 1328-1340.
R. Bhatia and R.-C. Li (1996). "On Perturbations of Matrix Pencils with Real Spectra. II," Math.
Comput. 65, 637-645.
J.-P. Dedieu (1997). "Condition Operators, Condition Numbers, and Condition Number Theorem for
the Generalized Eigenvalue Problem," Lin. Alg. Applic. 263, 1-24.
D.J. Higham and N.J. Higham (1998). "Structured Backward Error and Condition of Generalized
Eigenvalue Problems," SIAM J. Matrix Anal. Applic. 20, 493-512.
R. Byers, C. He, and V. Mehrmann (1998). "Where is the Nearest Non-Regular Pencil?," Lin. Alg.
Applic. 285, 81-105.
V. Frayss and V. Toumazou (1998). "A Note on the Normwise Perturbation Theory for the Regular
Generalized Eigenproblem," Numer. Lin. Alg. 5, 1-10.
R.-C. Li (2003). "On Perturbations of Matrix Pencils with Real Spectra, A Revisit," Math. Comput.
72, 715-728.
S. Bora and V. Mehrmann (2006). "Linear Perturbation Theory for Structured Matrix Pencils Arising
in Control Theory," SIAM J. Matrix Anal. Applic. 28, 148-169.
X.S. Chen (2007). "On Perturbation Bounds of Generalized Eigenvalues for Diagonalizable Pairs,"
Numer. Math. 107, 79-86.
The Kronecker structure of the pencil A - >.B is analogous to Jordan structure of A - >.I and it can
provide useful information about the underlying application. Papers concerned with this important
decomposition include:
J.H. Wilkinson (1978). "Linear Differential Equations and Kronecker's Canonical Form," in Recent
Advances in Numerical Analysis, C. de Boor and G.H. Golub (eds.), Academic Press, New York,
231--265.
J.H. Wilkinson (1979). "Kronecker's Canonical Form and the QZ Algorithm," Lin. Alg. Applic. 28,
285-303.
P. Van Dooren (1979). "The Computation of Kronecker's Canonical Form of a Singular Pencil," Lin.
Alg. Applic. 27, 103-140.
J.W. Demmel (1983). ''The Condition Number of Equivalence Transformations that Block Diagonalize
Matrix Pencils," SIAM J. Numer. Anal. 20, 599-610.
J.W. Demmel and B. Kagstrom (1987). "Computing Stable Eigendecompositions of Matrix Pencils,"
Linear Alg. Applic. 88/89, 139-186.
B. Kagstrom (1985). "The Generalized Singular Value Decomposition and the General A - >.B Pro'l>­
lem," BIT 24, 568-583.
B. Kagstrom (1986). "RGSVD: An Algorithm for Computing the Kronecker Structure and Reducing
Subspaces of Singular A - >.B Pencils," SIAM J. Sci. Stat. Comput. 7, 185-211.
7.1. The Generalized Eigenvalue Problem 419
J. Demmel and B. Kiigstrom (1986). "Stably Computing the Kronecker Structure and Reducing
Subspaces of Singular Pencils A - >.B for Uncertain Data," in Large Scale Eigenvalue Problems, J.
Cullum and R.A. Willoughby (eds.), North-Holland, Amsterdam.
T. Beelen and P. Van Dooren (1988). "An Improved Algorithm for the Computation of Kronecker's
Canonical Form of a Singular Pencil," Lin. Alg. Applic. 105, 9-65.
E. Elmroth and B. Kiigstrom(1996). "The Set of 2-by-3 Matrix Pencils - Kronecker Structures and
Their Transitions under Perturbations," SIAM J. Matri:i; Anal. Applic. 1 7, 1-34.
A. Edelman, E. Elmroth, and B. Kiigstrom (1997). "A Geometric Approach to Perturbation Theory
of Matrices and Matrix Pencils Part I: Versa! Defformations," SIAM J. Matri:i; Anal. Applic. 18,
653-692.
E. Elmroth, P. Johansson, and B. Kiigstrom (2001). "Computation and Presentation of Graphs
Displaying Closure Hierarchies of Jordan and Kronecker Structures," Nv.m. Lin. Alg. 8, 381-399.
Just as the Schur decomposition can be used to solve the Sylvester equation problem A 1 X - XA2 = B,
the generalized Schur decomposition can be used to solve the generalized Sylvester equation problem
where matrices X and Y are sought so that A 1 X - YA2 = Bi and AaX - YA4 = B2, see:
W. Enright and S. Serbin (1978). "A Note on the Efficient Solution of Matrix Pencil Systems," BIT
18, 276-81.
B. Kagstrom and L. Westin (1989). "Generalized Schur Methods with Condition Estimators for
Solving the Generalized Sylvester Equation," IEEE '.Ihlns. Autom. Contr. AC-34, 745-751.
B . Kagstrom (1994). "A Perturbation Analysis ofthe Generalized Sylvester Equation (AR-LB, DR­
LE} = (C, F}," SIAM J. Matri:i; Anal. Applic. 15, 1045-1060.
J.-G. Sun (1996}. "Perturbation Analysis of System Hessenberg and Hessenberg-Triangular Forms,"
Lin. Alg. Applic. 241-3, 811-849.
B. Kagstrom and P. Poromaa (1996}. "LAPACK-style Algorithms and Software for Solving the Gen­
eralized Sylvester Equation and Estimating the Separation Between Regular Matrix Pairs," ACM
'.lhlns. Math. Softw. 22, 78-103.
I. Jonsson and B. Kagstrom (2002). "Recursive Blocked Algorithms for Solving Triangular Systems­
Part II: Two-sided and Generalized Sylvester and Lyapunov Matrix Equations," A CM '.Ihlns.
Math. Softw. 28, 416-435.
R. Granat and B. Kagstrom (2010). "Parallel Solvers for Sylvester-Type Matrix Equations with
Applications in Condition Estimation, Part I: Theory and Algorithms," ACM '.Ihlns. Math. Softw.
37, Article 32.
Rectangular generalized eigenvalue problems also arise. In this setting the goal is to reduce the rank
of A - >.B, see:
G.W. Stewart (1994). "Perturbation Theory for Rectangular Matrix Pencils," Lin. Alg. Applic.
208/209, 297-301.
G. Boutry, M. Elad, G.H. Golub, and P. Milanfar (2005). "The Generalized Eigenvalue Problem for
Nonsquare Pencil'! Using a Minimal Perturbation Approach," SIAM J. Matri:i; Anal. Applic. 27,
582-601.
D . Chu and G.H. Golub (2006). "On a Generalized Eigenvalue Problem for Nonsquare Pencils," SIAM
J. Matri:i; Anal. Applic. 28, 770-787.
References for the polynomial eigenvalue problem include:
P. Lancaster (1966). Lambda-Matrices and Vibrating Systems, Pergamon Press, Oxford, U.K.
I. Gohberg, P. Lancaster, and L. Rodman (1982). Matri:I; Polynomials, Academic Press, New York.
F. Tisseur (2000}. "Backward Error and Condition of Polynomial Eigenvalue Problems," Lin. Alg.
Applic. 309, 339-361.
J.-P. Dedieu and F. Tisseur (2003). "Perturbation Theory for Homogeneous Polynomial Eigenvalue
Problems," Lin. Alg. Applic. 358, 71-94.
N.J. Higham, D.S. Mackey, and F. Tisseur (2006). "The Conditioning of Linearizations of Matrix
Polynomials," SIAM J. Matri:i; Anal. Applic. 28, 1005-1028.
D.S. Mackey, N. Mackey, C. Mehl, V. Mehrmann (2006). "Vector Spaces of Linearizations for Matrix
Polynomials,'' SIAM J. Matri:i; Anal. Applic. 28, 971-1004.
The structured quadratic eigenvalue problem is discussed briefly in §8.7.9.
420 Chapter 7. Unsymmetric Eigenvalue Problems
7.8 Hamiltonian and Product Eigenvalue Problems
Two structured unsymmetric eigenvalue problems are considered. The Hamiltonian
matrix eigenvalue problem comes with its own special Schur decomposition. Orthogonal
symplectic similarity transformations are used to bring about the required reduction.
The product eigenvalue problem involves computing the eigenvalues of a product like
A1A21A3 without actually forming the product or the designated inverses. For detailed
background to these problems, sec Kressner (NMGS) and Watkins (MEP).
7.8.1 Hamiltonian Matrix Eigenproblems
Hamiltonian and symplectic matrices are introduced in §1.3.10. Their 2-by-2 block
structure provide a nice framework for practicing block matrix manipulation, see Pl.3.2
and P2.5.4. We now describe some interesting eigenvalue problems that involve these
matrices. For a given n, we define the matrix J E R2nx2n by
and proceed to work with the families of 2-by-2 block structured matrices that are
displayed in Figure 7.8.1. We mention four important facts concerning these matrices.
Family Definition What They Look Like
JM = (JM)T
[; -�Tl G symmetric
Hamiltonian M =
F symmetric
Skew
JN = -(JN)T N = [; �] G skew-symmetric
Hamiltonian F skew-symmetric
[Sn S12 l S'fiS21 symmetric
Symplectic JS = s-TJ S = S�S12 symmetric
S21 S22 S'fi.S22 = I + SfiS12
Orthogonal
JQ = QJ Q = [ Qi Q2
l QfQ2 symmetric
Symplectic -Q2 Q1 I = QfQ1 + QrQ2
Figure 7.8.1. Hamiltonian and symplectic structures
(1) Symplectic similarity transformations preserve Hamiltonian structure:
7.8. Hamiltonian and Product Eigenvalue Problems
(2) The square of a Hamiltonian matrix is skew-Hamiltonian:
JM2 = (JMJT)(JM) = -JvJY(JMf = -M2TJT = -(.JM2f.
(3) If M is a Hamiltonian matrix and >. E >.(M), then ->. E >.(M):
(4) If S is symplectic and >. E >.(S), then 1/>. E >.(S):
8r[ v ] .!. [ v ] .
-u >. -u
421
Symplectic versions of Householder and Givens transformations have a promi­
nanent role to play in Hamiltonian matrix computations. If P = In - 2vvT is a
Householder matrix, then diag(P, P) is a symplectic orthogonal matrix. Likewise, if
G E JR2nx2n is a Givens rotation that involves planes i and i+n, then G is a symplectic
orthogonal matrix. Combinations of these transformations can be used to introduce
zeros. For example, a Householder-Givens-Householder sequence can do this:
x x x x
x x x 0
x x x 0
x diag(P1 ,P1 ) x Gl .5 x diag(P2,P2) 0
x --+ x --+ 0 --+ 0
x 0 0 0
x 0 0 0
x 0 0 0
This kind of vector reduction can be sequenced to produce a constructive proof
of a structured Schur decomposition for Hamiltonian matrices. Suppose >. is a real
eigenvalue of a Hamiltonian matrix lv
l and that x E JR2n is a unit 2-norm vector with
Mx = >.x. If Q1 E JR2nx2n is an orthogonal symplectic matrix and Qfx = e1 , then it
follows from (QfMQ1)(Qfx) = >.(Qfx) that
>. x x x x x x x
0 x x x x x x x
0 x x x x x x x
QfMQ1
0 x x x x x x x
=
0 0 0 0 ->. 0 0 0
0 x x x x x x x
0 x x x x x x x
0 x x x x x x x
The "extra" zeros follow from the Hamiltonian structure of QfMQ1. The process can
be repeated on the 6-by-6 Hamiltonian submatrix defined by rows and columns 2-3-4-
6-7-8. Together with the assumption that M has no purely imaginary eigenvalues, it
is possible to show that an orthogonal symplectic matrix Q exists so that
422 Chapter 7. Unsymmetric Eigenvalue Problems
(7.8.1)
where T E 1Rnxn is upper quasi-triangular. This is the real Hamiltonian-Schur de­
composition. See Paige and Van Loan (1981) and, for a more general version, Lin,
Mehrmann, and Xu (1999).
One reason that the Hamiltonian eigenvalue problem is so important is its con­
nection to the algebraic Ricatti equation
(7.8.2)
This quadratic matrix problem arises in optimal control and a symmetric solution is
sought so that the eigenvalues of A - FX are in the open left half plane. Modest
assumptions typically ensure that M has no eigenvalues on the imaginary axis and
that the matrix Q1 in (7.8.1) is nonsingular. If we compare (2,1) blocks in (7.8.1), then
QfAQ1 - QfFQ2 + QfGQ1 + QfATQ2 = 0.
It follows from In = Q[Q1 + QfQ2 that X = Q2Q11 is symmetric and that it satisfies
(7.8.2). From (7.8.1) it is easy to show that A - FX = Q1TQ11 and so the eigen­
values of A - FX are the eigenvalues of T. It follows that the desired solution to the
algebraic Ricatti equation can be obtained by computing the real Hamiltonian-Schur
decomposition and ordering the eigenvalues so that A(T) is in the left half plane.
How might the real Hamiltonian-Schur form be computed? One idea is to reduce
M to some condensed Hamiltonian form and then devise a structure-preserving QR­
iteration. Regarding the former task, it is easy to compute an orthogonal symplectic
Uo so that
LloTMUo = [H
D
R
l
-HT
(7.8.3)
where H E 1Rnxn is upper Hessenberg and D is diagonal. Unfortunately, a structure­
preserving QR iteration that maintains this condensed form has yet to be devised. This
impasse prompts consideration of methods that involve the skew-Hamiltonian matrix
N = M2. Because the (2,1) block of a skew-Hamiltonian matrix is skew-symmetric,
it has a zero diagonal. Symplectic similarity transforms preserve skew-Hamiltonian
structure, and it is straightforward to compute an orthogonal symplectic matrix Vo
such that
T 2 [H R
l
Vo M Vo =
0 HT , (7.8.4)
where H is upper Hessenberg. If UTHU = T is the real Schur form of H and and
Q = Vo · diag(U, U), then
= [T UTRU
]
0 TT
7.8. Hamiltonian and Product Eigenvalue Problems 423
is the real skew-Hamiltonian Schurform. See Van Loan (1984). It does not follow that
QTMQ is in Schur-Hamiltonian form. Moreover, the quality of the computed small
eigenvalues is not good because of the explicit squaring of M. However, these shortfalls
can be overcome in an efficient numerically sound way, see Chu, Lie, and Mehrmann
(2007) and the references therein. Kressner (NMSE, p. 175-208) and Watkins (MEP,
p. 319-341) have in-depth treatments of the Hamiltonian eigenvalue problem.
7. 8. 2 Product Eigenvalue Problems
Using SVD and QZ, wecan compute the eigenvalues ofATA and B-1A without forming
products or inverses. The intelligent computation of the Hamiltonian-Schur decompo­
sition involves a correspondingly careful handling of the product M-times-M. In this
subsection we further develop this theme by discussing various product decompositions.
Here is an example that suggests how we might compute the Hessenberg decomposition
of
where A1, A2, A3 E JR"'xn. Instead of forming this product explicitly, we compute or­
thogonal U1, U2, U3 E nexn such that
It follows that
UJA2U2 T2
UfA1U1 = T1
(upper Hessenberg),
(upper triangular),
(upper triangular).
(7.8.5)
is upper Hessenberg. A procedure for doing this would start by computing the QR
factorizations
If A3 = A3Q3, then A = A3R2R1 • The next phase involves reducing A3 to Hessenberg
form with Givens transformations coupled with "bulge chasing" to preserve the trian­
gular structures already obtained. The process is similar to the reduction of A - >..B
to Hessenbcrg-triangular form; sec §7.7.4.
Now suppose we want to compute the real Schur form of A
QfA3Q3 = T3
QIA2Q2 = T2
QfA1Q1 = T1
(upper quasi-triangular),
(upper triangular),
(upper triangular),
(7.8.6)
where Q1, Q2, Q3 E R,nxn are orthogonal. Without loss of generality we may assume
that {A3, A2, Ai} is in Hessenberg-triangular-triangular form. Analogous to the QZ
iteration, the next phase is to produce a sequence of converging triplets
(7.8.7)
with the property that all the iterates are in Hessenberg-triangular-triangular form.
424 Chapter 7. Unsymmetric Eigenvalue Problems
Product decompositions (7.8.5) and (7.8.6) can be framed as structured decom­
positions of block-cyclic 3-by-3 matrices. For example, if
then we have the following restatement of (7.8.5):
Consider the zero-nonzero structure of this matrix for the case n = 4:
0 0 0 0 0 () () () x x x x
0 0 0 0 0 0 0 0 x x x x
0 0 () 0 0 () () () 0 x x x
0 () 0 0 0 0 0 0 0 0 x x
x x x x 0 0 0 0 0 () () ()
iI
0 x x x () () () () 0 0 0 0
0 0 x x 0 0 0 () () 0 () ()
0 0 0 x () () 0 0 0 0 0 0
0 0 0 0 x x x x 0 0 0 0
0 0 0 0 0 x x x 0 0 0 0
0 0 0 0 0 0 x x 0 0 0 0
0 0 0 0 0 0 0 x 0 0 0 ()
Using the perfect shuffle P34 (see §1.2.11) we also have
0 0 x 0 0 x () 0 x () 0 x
x 0 () x () 0 x 0 0 x 0 0
0 x 0 0 x 0 0 x () () x ()
0 0 x 0 0 x () () x 0 () x
() 0 () x () 0 x 0 0 x 0 0
0 0 0 0 x () 0 x () () x ()
0 0 0 0 0 x () 0 x () 0 x
0 0 0 0 0 0 x 0 0 x 0 ()
0 0 0 0 0 0 0 x 0 0 x 0
0 () 0 () 0 () () () x () 0 x
() 0 0 0 0 0 0 0 0 x () ()
0 () () () 0 () () () () 0 x 0
Note that this is a highly structured 12-by-12 upper Hessenberg matrix. This con­
nection makes it possible to regard the product-QR iteration as a structure-preserving
7.8. Hamiltonian and Product Eigenvalue Problems 425
QR iteration. For a detailed discussion about this connection and its implications for
both analysis and computation, sec Kressner (NMSE, pp. 146-174) and Watkins(MEP,
pp. 293-303). We mention that with the "technology" that has been developed, it is
possible to solve product eigenvalue problems where the factor matrices that define A
are rectangular. Square nonsingular factors can also participate through their inverses,
e.g., A = A3A21Ai.
Problems
P7.8.l What can you say about the eigenvalues and eigenvectors of a symplectic matrix?
P7.8.2 Suppose S1 , S2 E Rnxn arc both skew-symmetric and let A = S1 S2. Show that the nonzero
eigenvalues of A are not simple. How would you compute these eigenvalues?
P7.8.3 Relate the eigenvalues and eigenvectors of
A - [ �
- 0
A4
Ai
0
0
0
to the eigenvalues and eigenvectors of A = A1 A2A3A4 . Assume that the diagonal blocks are square.
Notes and References for §7.8
The books by Kressner(NMSE) and Watkins (MEP) have chapters on product eigenvalue problems
and Hamiltonian eigenvalue problems. The sometimes bewildering network of interconnections that
exist among various structured classes of matrices is clarified in:
A. Bunse-Gerstner, R. Byers, and V. Mehrmann (1992). "A Chart of Numerical Methods for Struc­
tured Eigenvalue Problems,'' SIAM J. Matrix Anal. Applic. 13, 419-453.
Papers concerned with the Hamiltonian Schur decomposition include:
A.J. Laub and K. Meyer (1974). "Canonical Forms for Symplectic and Hamiltonian Matrices,'' J.
Celestial Mechanics 9, 213-238.
C.C. Paige and C. Van Loan (1981). "A Schur Decomposition for Hamiltonian Matrices,'' Lin. Alg.
Applic. 41, 11-32. •
V. Mehrmann (1991). Autonomous Linear Quadratic Contml Pmblems, Theory and Numerical So­
lution, Lecture Notes in Control and Information Sciences No. 163, Springer-Verlag, Heidelberg.
W.-W. Lin, V. Mehrmann, and H. Xu (1999). "Canonical Forms for Hamiltonian and Symplectic
Matrices and Pencils,'' Lin. Alg. Applic. 302/303, 469-533.
Various methods for Hamiltonian eigenvalue problems have been devised that exploit the rich under­
lying structure, see:
C. Van Loan (1984). "A Symplectic Method for Approximating All the Eigenvalues of a Hamiltonian
Matrix," Lin. Alg. Applic. 61, 233-252.
R. Byers (1986) "A Hamiltonian QR Algorithm," SIAM J. Sci. Stat. Comput. 7, 212-229.
P. Benner, R. Byers, and E. Barth (2000). "Algorithm 800: Fortran 77 Subroutines for Computing
the Eigenvalues of Hamiltonian Matrices. I: the Square-Reduced Method," ACM Trans. Math.
Softw. 26, 49-77.
H. Fassbender, D.S. Mackey and N. Mackey (2001). "Hamilton and Jacobi Come Full Circle: Jacobi
Algorithms for Structured Hamiltonian Eigenproblems," Lin. Alg. Applic. 332-4, 37-80.
D.S. Watkins (2006). "On the Reduction of a Hamiltonian Matrix to Hamiltonian Schur Form,''
ETNA 23, 141-157.
D.S. Watkins (2004). "On Hamiltonian and Symplectic Lanczos Processes," Lin. Alg. Applic. 385,
23-45.
D. Chu, X. Liu, and V. Mehrmann (2007). "A Numerical Method for Computing the Hamiltonian
Schur Form," Numer. Math. 105, 375-412.
Generalized eigenvalue problems that involve Hamiltonian matrices also arise:
P. Benner, V. Mehrmann, and H. Xu (1998). "A Numerically Stable, Structure Preserving Method
for Computing the Eigenvalues of Real Hamiltonian or Symplectic Pencils," Numer. Math. 78,
329-358.
426 Chapter 7. Unsymmetric Eigenvalue Problems
C. Mehl (2000). "Condensed Forms for Skew-Hamiltonian/Hamiltonian Pencils," SIAM J. Matrix
Anal. Applic. 21, 454-476.
V. Mehrmann and D.S. Watkins (2001). "Structure-Preserving Methods for Computing Eigenpairs
of Large Sparse Skew-Hamiltonian/Hamiltonian Pencils,'' SIAM J. Sci. Comput. 22, 1905-1925.
P. Benner and R. Byers, V. Mehrmann, and H. Xu (2002). "Numerical Computation of Deflating
Subspaces of Skew-Hamiltonian/Hamiltonian Pencils," SIAM J. Matrix Anal. Applic. 24, 165-
190.
Methods for symplectic eigenvalue problems are discussed in:
P. Benner, H. Fassbender and D.S. Watkins (1999). "SR and SZ Algorithms for the Symplectic
(Butterfly) Eigenproblem," Lin. Alg. Applic. 287, 41-76.
The Golub-Kahan SYD algorithm that we discuss in the next chapter does not form ATA or AAT
despite the rich connection to the Schurdecompositionsofthose matrices. From that point on there has
been an appreciation for the numerical dangers associated with explicit products. Here is a sampling
of the literature:
C. Van Loan (1975). "A General Matrix Eigenvalue Algorithm,'' SIAM J. Numer. Anal. 12, 819-834.
M.T. Heath, A.J. Laub, C.C. Paige, and R.C. Ward (1986). "Computing the SYD of a Product of
Two Matrices," SIAM J. Sci. Stat. Comput. 7, 1147-1159.
R. Mathias (1998). "Analysis of Algorithms for Orthogonalizing Products of Unitary Matrices,'' Num.
Lin. Alg. 3, 125--145.
G. Golub, K. Solna, and P. Van Dooren (2000). "Computing the SYD of a General Matrix Prod­
uct/Quotient," SIAM J. Matrix Anal. Applic. 22, 1-19.
D.S. Watkins (2005). "Product Eigenvalue Problems," SIAM Review 4 7, 3-40.
R. Granat and B. Kgstrom (2006). "Direct Eigenvalue Reordering in a Product of Matrices in Periodic
Schur Form,'' SIAM J. Matrix Anal. Applic. 28, 285-300.
Finally we mention that there is a substantial body of work concerned with structured error analysis
and structured perturbation theory for structured matrix problems, see:
F. Tisseur (2003). "A Chart of Backward Errors for Singly and Doubly Structured Eigenvalue Prob­
lems,'' SIAM J. Matrix Anal. Applic. 24, 877-897.
R. Byers and D. Kressner (2006). "Structured Condition Numbers for Invariant Subspaces," SIAM J.
Matrix Anal. Applic. 28, 326-347.
M. Karow, D. Kressner, and F. Tisseur (2006). "Structured Eigenvalue Condition Numbers," SIAM
J. Matrix Anal. Applic. 28, 1052-1068.
7.9 Pseudospectra
If the purpose of computing is insight, then it is easy to see why the well-conditioned
eigenvector basis is such a valued commodity, for in many matrix problems, replace­
ment of A with its diagonalization x-1AX leads to powerful, analytic simplifications.
However, the insight-through-eigensystem paradigm has diminished impact in problems
where the matrix of eigenvectors is ill-conditioned or nonexistent. Intelligent invariant
subspace computation as discussed in §7.6 is one way to address the shortfall; pseu­
dospectra are another. In this brief section we discuss the essential ideas behind the
theory and computation of pseudospectra. The central message is simple: if you are
working with a nonnormal matrix, then a graphical pseudospectral analysis effectively
tells you just how much to trust the eigenvalue/eigenvector "story."
A slightly awkward feature of our presentation has to do with the positioning
of this section in the text. As we will see, SVD calculations are an essential part of
the pseudospectra scene and we do not detail dense matrix algorithms for that im­
portant decomposition until the next chapter. However, it makes sense to introduce
the pseudospectra concept here at the end of Chapter 7 while the challenges of the
7.9. Pseudospectra 427
unsymmetric eigenvalue problem are fresh in mind. Moreover, with this "early" foun­
dation we can subsequently present various pseudospectra insights that concern the
behavior of the matrix exponential (§9.3), the Arnoldi method for sparse unsymmetric
eigenvalue problems (§10.5), and the GMRES method for sparse unsymmetric linear
systems (§11.4).
For maximum generality, we investigate the pseudospectra of complex, non­
normal matrices. The definitive pseudospectra reference is 'frefethen and Embree
(SAP). Virtually everything we discuss is presented in greater detail in that excellent
volume.
7.9.1 Motivation
In many settings, the eigenvalues of a matrix "say something" about an underlying
phenomenon. For example, if
A = M > 0,
then
lim II Ak 112 = 0
k-+oo
if and only if !Ai l < 1 and IA2I < 1. This follows from Lemma 7.3.1, a result that
we needed to establish the convergence of the QR iteration. Applied to our 2-by-2
example, the lemma can be used to show that
II Ak 112 :s
M (p(A) + t:)k
E
for any E > 0 where p(A) = max{IA1 I, IA2I} is the spectral radius. By making E small
enough in this inequality, we can draw a conclusion about the asymptotic behavior of
Ak:
If p(A) < 1, then asymptotically Ak converges to zero as p(A)k. (7.9.1)
However, while the eigenvalues adequately predict the limiting behavior of II Ak 112 ,
they do not (by themselves) tell us much about what is happening if k is small. Indeed,
if A1 -:/; A2, then using the diagonalization
A = [� M/(��-�i) ] [:' �' ] [� M/(��-�i)r
we can show that
[Ak M�Ak- l-i
Ai ]
Ak _
1 L.., 1 2
- i=O ·
0 A�
(7.9.2)
(7.9.3)
Consideration of the (1,2) entry suggests that Ak may grow before decay sets in. This
is affirmed in Figure 7.9.1 where the size of II Ak 112 is tracked for the example
A =
[0.999 1000 l·
0.0 0.998
428 Chapter 7. Unsymmetric Eigenvalue Problems
2.5
2
< 1 .5
0.5
1 000 2000 3000 4000 5000
k
Figure 7.9.1. II Ak 112 can grow even ifp(A) < 1
Thus, it is perhaps better to augment (7.9.1) as follows:
If p(A) < 1, then aymptotically Ak converges to zero like p(A)k.
However, Ak may grow substantially before exponential decay sets in.
(7.9.4)
This example with its ill-conditioned eigenvector matrix displayed in (7.9.2), points
to just why classical eigenvalue analysis is not so informative for nonnormal matrices.
Ill-conditioned eigenvector bases create a discrepancy between how A behaves and how
its diagonalization XAx-1 behaves. Pseudospcctra analysis and computation narrow
this gap.
7.9.2 Definitions
The pseudospectra idea is a generalization of the eigenvalue idea. Whereas the spec­
trum A(A) is the set of all z E <C that make O'min(A - >.I) zero, the E-pseudospectrum
of a matrix A E <Cnxn is the subset of the complex plane defined by
Ae(A) = {z E <C : O'min(A - >.I) � f.} · (7.9. 5)
If A E Ae(A), then A is an E-pseudoeigenvalue of A. A unit 2-norm vector v that satisfies
II (A - >.I)v 112 = f. is a corresponding f.-pseudoeigenvector. Note that if f. is zero, then
Ae(A) is just the set of A's eigenvalues, i.e., Ao(A) = A(A).
We mention that because oftheir interest in what pseudospectra say about general
linear operators, Trefethen and Embree (2005) use a strict inequality in the definition
(7.9.5). The distinction has no impact in the matrix case.
7.9. Pseudospectra 429
Equivalent definitions of AE(· ) include
(7.9.6)
which highlights the resolvent (zl - A)-1 and
(7.9.7)
which characterize pseudspectra as (traditional) eigenvalues of nearby matrices. The
equivalence of these three definitions is a straightforward verification that makes use
of Chapter 2 facts about singular values, 2-norms, and matrix inverses. We mention
that greater generality can be achieved in (7.9.6) and (7.9.7) by replacing the 2-norm
with an arbitrary matrix norm.
7.9.3 Display
The pseudospectrum of a matrix is a visible subset of the complex plane so graphical
display has a critical role to play in pseudospectra analysis. The MATLAB-based Eigtool
system developed by Wright(2002) can be used to produce pseudospectra plots that
are as pleasing to the eye as they are informative. Eigtool's pseudospectra plots are
contour plots where each contour displays the z-values associated with a specified value
of f. Since
fl � f2 :::? A.1 � AE2
the typical pseudospectral plot is basically a topographical map that depicts the func­
tion f(z) = CTmin(zl - A) in the vicinity of the eigenvalues.
We present three Eigtool-produced plots that serve as illuminating examples. The
first involves the n-by-n Kahan matrix Kahn(s), e.g.,
1 -c -c -c -c
0 s -SC -SC -SC
Kahs(s) = 0 0 s2 -s2c -s2c c2 + s2 = 1.
0 0 0 s3 -s3c
0 0 0 0 s4
Recall that we used these matrices in §5.4.3 to show that QR with column pivoting
can fail to detect rank deficiency. The eigenvalues {1, s, s2, . . . , sn-l} of Kahn(s) are
extremely sensitive to perturbation. This is revealed by considering the f = 10-5
contour that is displayed in Figure 7.9.2 together with A(Kahn(s)).
The second example is the Demmel matrix Demn(/3), e.g.,
1 /3 132 /33 /34
0 1 /3 132 /33
Dems(/3) = 0 0 1 /3 132
0 0 0 1 /3
0 0 0 0 1
430
0.8
0.6
0.4
0.2
0
-0.2
-0.4
-0.6
-0.8
-1
-0.5 0
Chapter 7. Unsymmetric Eigenvalue Problems
0.5
Figure 7.9.2. A€(Kah30(s)) with s29 = 0.1 and contours for € = 10-2, . . • , 10-6
The matrix Demn({3) is defective and has the property that very small perturbations
can move an original eigenvalue to a position that are relatively far out on the imaginary
axis. See Figure 7.9.3. The example is used to illuminate the nearness-to-instability
problem presented in P7.9.13.
1 0
8
6
4
2
0
-2
-4
-6
-8
-5 0 5 1 0
Figure 7.9.3. A€(Demso(f3)) with {349 = 108 and contours for € = 10-2, • • • , 10-6
7.9. Pseudospectra 431
The last example concerns the pseudospectra ofthe MATLAB "Gallery(5)" matrix:
-9 11 -21 63 -252
70 -69 141 -421 1684
Gs = -575 575 -1149 3451 -13801
3891 -3891 7782 -23345 93365
1024 -1024 2048 -6144 24572
Notice in Figure 7.9.4 that A10-ia.5 (Gs) has five components. In general, it can be
0.06
0.04
0.02
0
-0.02
-0.04
-0.06
-o. oe ��-��-����-�
-0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06
Figure 7.9.4. Af(Gs) with contours for e = 10-11.s, 10-12, . . . , 10-l3.s, 10-14
shown that each connected component of A.(A) contains at least one eigenvalue of A.
7.9.4 Some Elementary Properties
Pseudospectra are subsets of the complex plane so we start with a quick summary of
notation. If S1 and S2 are subsets of the complex plane, then their sum S1 + S2 is
defined by
S1 + S2 = {s : s = s1 + s2, s1 E Si, s2 E S2 }.
If 81 consists of a single complex number a, then we write a + 82. If 8 is a subset of
the complex plane and /3 is a complex number, then /3·S is defined by
/3·8 = { /3z : z E S }.
The disk of radius e centered at the origin is denoted by
A. = {z : lzl � f }.
Finally, the distance from a complex number z0 to a set of complex numbers S is
defined by
dist(zo, 8) = min{ lzo - z I : z E S }.
432 Chapter 7. Unsymmetric Eigenvalue Problems
Our first result is about the effect of translation and scaling. For eigenvalues we
have
A(ad + ,BA) = a + ,B·A(A).
The following theorem establishes an analogous result for pseudospectra.
Theorem 7.9.1. If a, ,B E <C and A E <Cnxn, then A,1131 (od + ,BA) = a + ,B·A,(A).
Proof. Note that
and
A,(aI + A) { z : 11 (zI - (al + A))-1 11 ? l/E }
{ z : 11 ((z - a)I - A)-1 11 ? l/f. }
a + { z - a : II ((z - a)I - A)-1 II ? l/E }
a + { z : II (zl - A)-1 II ? l/f. } = A,(A)
A,1131 (,B · A) = { z : II (zl - ,BA)-1 II ? l/l,Blf. }
{ z : II (z/,B)I - A)-1 11 ? l/f. }
,B· { z/,B : II (z/,B)I - A)-1 II ? l/f. }
,B· { z : II zl - A)-1 11 ? l/E } = ,B·A,(A).
The theorem readily follows by composing these two results. D
General similarity transforms preserve eigenvalues but not E-pseudoeigenvalues. How­
ever, a simple inclusion property holds in the pseudospectra case.
Theorem 7.9.2. If B = x-1AX, then A,(B) � A,"2(x)(A).
Proof. If z E A,(B), then
� ::; II (zl - B)-1 II = II x-1(zl - A)-1x-1 II < !i2(X)ll (zI - A)-1 11,
f.
from which the theorem follows. D
Corollary 7.9.3. If X E <Cnxn is unitary and A E <Cnxn
, then A,(x-1AX) = A,(A).
Proof. The proof is left as an exercise. D
The E-pseudospectrum of a diagonal matrix is the union of €-disks.
Theorem 7.9.4. If D = diag(>.1 , . . . , >.n), then A,(D) = {>.1 , . . . , >.n} + �• .
Proof. The proof is left as an exercise. D
7.9. Pseudospectra
Corollary 7.9.5. If A E <Cnxn is normal, then A,(A) = A(A) + D., .
433
Proof. Since A is normal, it has a diagonal Schur form QHAQ = diag(A1 , . . . , An) = D
with unitary Q. The proof follows from Theorem 7.9.4. D
If T = (Tij) is a 2-by-2 block triangular matrix, then A(T) = A(T11) U A(T22). Here is
the pseudospectral analog:
Theorem 7.9.6. If
T =
[T11 T12 l
0 T22
with square diagonal blocks, then A,(T11 ) U A,(T22) � A,(T).
Proof. The proof is left as an exercise. D
Corollary 7.9.7. If
T =
[T11 0 l
0 T22
with square diagonal blocks, then A,(T) = A,(T11) U A,(T22).
Proof. The proof is left as an exercise. D
The last property in our gallery of facts connects the resolvant (z0I - A)-1 to the
distance that separates z0 from A,(A).
Theorem 7.9.8. If Zo E <C and A E ccnxn, then
1
dist(zo, A,(A)) �
II (zoI - A)-1 112
- f.
Proof. For any z E A,(A) we have from Corollary 2.4.4 and (7.9.6) that
E � O'rnin (zI - A) = O'min ((zoI - A) - (z - zo)I) � O'min(zoI - A) - lz - zol
and thus
1
lz - zol �
ll (zoI - A)-1 11
- f.
The proof is completed by minimizing over all z E A,(A). D
7. 9. 5 Computing Pseudospectra
The production of a pseudospectral contour plot such as those displayed above requires
sufficiently accurate approximations of O'min (zI-A) on a grid that consists of (perhaps)
434 Chapter 7. Unsymmetric Eigenvalue Problems
lOOO's of z-values. As we will see in §8.6, the computation of the complete SVD of an
n-by-n dense matrix is an O(n3) endeavor. Fortunately, steps can be taken to reduce
each grid point calculation to O(n2) or less by exploiting the following ideas:
1. Avoid SVD-type computations in regions where am;n(zl - A) is slowly varying.
See Gallestey (1998).
2. Exploit Theorem 7.9.6 by ordering the eigenvalues so that the invariant subspace
associated with A(T11) captures the essential behavior of (zl - A)-1 . See Reddy,
Schmid, and Henningson (1993).
3. Precompute the Schur decomposition QHAQ = T and apply a am;,, algorithm
that is efficient for triangular matrices. See Lui (1997).
We offer a few comments on the last strategy since it has much in common with the
condition estimation problem that we discussed in §3.5.4. The starting point is to
recognize that since Q is unitary,
The triangular structure of the transformed problem makes it possible to obtain a
satisfactory estimate of amin(zl - A) in O(n2) flops. If d is a unit 2-norm vector and
(zl - T)y = d, then it follows from the SVD of zl - T that
1
am;n(zl - T) � hl.
Let Um;n be a left singular vector associated with andn(zl - T). If d is has a significant
component in the direction of Um;n, then
Recall that Algorithm 3.5.1 is a cheap heuristic procedure that dynamically determines
the right hand side vector d so that the solution to a given triangular system is large
in norm. This is tantamount to choosing d so that it is rich in the direction of Um;n. A
complex arithmetic, 2-norm variant of Algorithm 3.5.1 is outlined in P7.9.13. It can be
applied to zl - T. The resulting d-vector can be refined using inverse iteration ideas,
see Toh and Trefethen (1996) and §8.2.2. Other approaches are discussed by Wright
and Trefethen (2001).
7.9.6 Computing the E-Pseudospectral Abscissa and Radius
The €-pseudospectral abscissa of a matrix A E <Cnxn is the rightmost point on the
boundary of AE:
aE(A) = max Re(z). (7.9.8)
zEA. (A)
Likewise, the €-pseudospectral radius is the point of largest magnitude on the boundary
of AE:
7.9. Pseudospectra
p,(A) = max lzl.
zEA, (A)
435
(7.9.9)
These quantities arise in the analysis of dynamical systems and effective iterative algo­
rithms for their estimation have been proposed by Burke, Lewis, and Overton (2003)
and Mengi and Overton (2005). A complete presentation and analysis of their very
clever optimization procedures, which build on the work of Byers (1988), is beyond the
scope of the text. However, at their core they involve interesting intersection problems
that can be reformulated as structured eigenvalue problems. For example, if i · r is an
eigenvalue of the matrix
-El l
ie-i6A ' (7.9.10)
then E is a singular value of A - re
i6I. To see this, observe that if
then
(A - rei6I)H (A - re
i6I)g = E2g.
The complex version of the SVD (§2.4.4) says that E is a singular value of A - re16I.
It can be shown that if irmax is the largest pure imaginary eigenvalue of M, then
This result can be used to compute the intersection of the ray { re
i6 : R ;::: 0 } and the
boundary of A,(A). This computation is at the heart ofcomputing the E-pseudospectral
radius. See Mengi and Overton (2005).
7.9.7 Matrix Powers and the E-Pseudospectral Radius
At the start of this section we used the example
[0.999 1000 l
A =
0.000 0.998
to show that II Ak 112 can grow even though p(A) < 1. This kind of transient behavior
can be anticipated by the pseudospectral radius. Indeed, it can be shown that for any
f > 0,
sup II Ak 112 2::
p,(A) - 1
·
k�O f
(7.9.11)
See Trefethen and Embree (SAP, pp. 160-161). This says that transient growth will
occur if there is a contour {z:ll ( llzl - A)-1 = l/E } that extends beyond the unit disk.
For the above 2-by-2 example, if E = 10-8, then p,(A) � 1.0017 and the inequality
(7.9.11) says that for some k, II Ak 112 2:: 1.7 x 105. This is consistent with what is
displayed in Figure 7.9.1.
436
Problems
Chapter 7. Unsymmetric Eigenvalue Problems
P7.9.1 Show that the definitions (7.9.5), (7.9.6), and (7.9.7) are equivalent.
P7.9.2 Prove Corollary 7.9.3.
P7.9.3 Prove Theorem 7.9.4.
P7.9.4 Prove Theorem 7.9.6.
P7.9.5 Prove Corollary 7.9.7.
P7.9.6 Show that if A, E E a:;nxn, then A,(A + E) � A•+llEll2 (A).
P7.9.7 Suppose O'm;n(z1 I - A) = fl and O'm;0(z2/ - A) = €2· Prove that there exists a real number µ,
so that if Z3 = (1 - µ,)z1 + µ,z2, then O'm;u (z3/ - A) = (€! + €2)/2?
P7.9.B Suppose A E a:;nxn is normal and E E a:;nxn is nonnormal. State and prove a theorem about
A,(A + E).
P7.9.9 Explain the connection between Theorem 7.9.2 and the Bauer-Fike Theorem (Theorem 7.2.2).
P7.9.10 Define the matrix J E R2nx2n by
J =
[ -�n
In ]
0 .
(a) The matrix H E R2nx2n is a Hamiltonian matrix if JTHJ = -HT. It is easy to show that if H
is Hamiltonian and >. E A(H), then ->. E A(H). Does it follow that if >. E A,(H), then ->. E A,(H)?
(b) The matrix S E R2nx2n is a symplectic matrix if JTSJ = s-T. It is easy to show that if S is
symplectic and >. E A(S), then 1/>. E A(S). Does it follow that if >. E A,(S), then 1/>. E A,(S)?
P7.9.ll Unsymmetric Toeplitz matrices tend to have very ill-conditioned eigensystems and thus have
interesting pseudospectral properties. Suppose
1
A
[:
0
0
O< 0
(a) Construct a diagonal matrix S so that s-1AS = B is symmetric and tridiagonal with l's on its
subdiagonal and superdiagonal. (b) What can you say about the condition of A's eigenvector matrix?
P7.9.12 A matrix A E a:;nxn is stable if all of its eigenvalues have negative real parts. Consider the
problem of minimizing II E 1'2 subject to the constraint that A + E has an eigenvalue on the imaginary
axis. Explain why this optimization problem is equivalent to minimizing Um;,, (irl - A) over all r E R.
If E. is a minimizing E, then II E 1'2 can be regarded as measure of A's nearness to instability. What
is the connection between A's nearness to instability and o,(A)?
P7.9.13 This problem is about the cheap estimation of the minimum singular value of a matrix, a
critical computation that is performed over an over again during the course of displaying the pseu­
dospectrum of a matrix. In light of the discussion in §7.9.5, the challenge is to estimate the smallest
singular value of an upper triangular matrix U = T - zl where T is the Schur form of A E Rnxn. The
condition estimation ideas of §3.5.4 are relevant. We want to determine a unit 2-norm vector d E q:n
such that the solution to Uy = d has a large 2-norm for then O'n,;n (U) � 1/11 y 1'2· (a) Suppose
U = [ U�l �: ] y = [ : ] d =
where u 1 1 , T E <C, u, z,d1 E a:;n-l , U1 E <C(n-l) x (n-l)' II di 112 = 1, U1y1 = d1, and c2 + s2 = 1.
Give an algorithm that determines c and s so that if Uy = d, then II y 112 is as large as possible. Hint:
This is a 2-by-2 SVD problem. (b) Using part (a), develop a nonrecursive method for estimating
O'rn;n(U(k:n, k:n)) for k = n: - 1:1.
Notes and References for §7. 7
Besides Trefethen and Embree (SAP), the following papers provide a nice introduction to the pseu­
dospectra idea:
7.9. Pseudospectra 437
M. Embree and L.N. Trefethen (2001). "Generalizing Eigenvalue Theorems to Pseudospectra Theo-
rems," SIAM J. Sci. Comput. 23, 583-590.
L.N. Trefethen (1997). "Pseudospectra of Linear Operators," SIAM Review 39, 383-406.
For more details concerning the computation and display of pseudoeigenvalues, see:
s.C. Reddy, P.J. Schmid, and D.S. Henningson (1993). "Pseudospectra of the Orr-Sommerfeld Oper­
ator," SIAM J. Applic. Math. 53, 15-47.
s.-H. Lui (1997). "Computation of Pseudospectra by Continuation,'' SIAM J. Sci. Comput. 18,
565-573.
E. Gallestey (1998). "Computing Spectral Value Sets Using the Subharmonicity of the Norm of
Rational Matrices,'' BIT, 38, 22-33.
L.N. Trefethen (1999). "Computation of Pseudospectra," Acta Numerica 8, 247-295.
T.G. Wright (2002). Eigtool, http://www.comlab.ox.ac.uk/pseudospectra/eigtool/.
Interesting extensions/generalizations/applications of the pseudospectra idea include:
L. Reichel and L.N. Trefethen (1992). "Eigenvalues and Pseudo-Eigenvalues of Toeplitz Matrices,''
Lin. Alg. Applic. 164-164, 153-185.
K-C. Toh and L.N. Trefethen (1994). "Pseudozeros of Polynomials and Pseudospectra of Companion
Matrices," Numer. Math. 68, 403-425.
F. Kittaneh (1995). "Singular Values of Companion Matrices and Bounds on Zeros of Polynomials,''
SIAM J. Matrix Anal. Applic. 16, 333-340.
N.J. Higham and F. Tisseur (2000). "A Block Algorithm for Matrix 1-Norm Estimation, with an
Application to 1-Norm Pseudospectra,'' SIAM J. Matrix Anal. Applic. 21, 1185-1201.
T.G. Wright and L.N. Trefethen (2002). "Pseudospectra of Rectangular matrices," IMA J. Numer.
Anal. 22, 501-·519.
R. Alam and S. Bora (2005). "On Stable Eigendecompositions of Matrices,'' SIAM J. Matrix Anal.
Applic. 26, 830-848.
Pseudospectra papers that relate to the notions ofcontrollability and stability oflinear systems include:
J.V. Burke and A.S. Lewis. and M.L. Overton (2003). "Optimization and Pseudospectra, with
Applications to Robust Stability," SIAM J. Matrix Anal. Applic. 25, 80-104.
J.V. Burke, A.S. Lewis, and M.L. Overton (2003). "Robust Stability and a Criss-Cross Algorithm for
Pseudospectra," IMA J. Numer. Anal. 23, 359-375.
J.V. Burke, A.S. Lewis and M.L. Overton (2004). "Pseudospectral Components and the Distance to
Uncontrollability," SIAM J. Matrix Anal. Applic. 26, 350-361.
The following papers are concerned with the computation of the numerical radius, spectral radius,
and field of values:
C. He and G.A. Watson (1997). "An Algorithm for Computing the Numerical Radius," IMA J.
Numer. Anal. 17, 329-342.
G.A. Watson (1996). "Computing the Numerical Radius" Lin. Alg. Applic. 234, 163-172.
T. Braconnier and N.J. Higham (1996). "Computing the Field of Values and Pseudospectra Using the
Lanczos Method with Continuation," BIT 36, 422-440.
E. Mengi and M.L. Overton (2005). "Algorithms for the Computation of the Pseudospectral Radius
and the Numerical Radius of a Matrix," IMA J. Numer. Anal. 25, 648-669.
N. Guglielmi and M. Overton (2011). "Fast Algorithms for the Approximation of the Pseudospectral
Abscissa and Pseudospectral Radius of a Matrix," SIAM J. Matrix Anal. Applic. 32, 1166-1192.
For more insight into the behavior of matrix powers, see:
P. Henrici (1962). "Bounds for Iterates, Inverses, Spectral Variation, and Fields of Values of Non­
normal Matrices," Numer. Math.4, 24-40.
J. Descloux (1963). "Bounds for the Spectral Norm of Functions of Matrices," Numer. Math. 5,
185-90.
T. Ransford (2007). "On Pseudospectra and Power Growth,'' SIAM J. Matrix Anal. Applic. 29,
699-711.
As an example of what pseudospectra can tell us about highly structured matrices, see:
L. Reichel and L.N. Trefethen (1992). "Eigenvalues and Pseudo-eigenvalues of Toeplitz Matrices,''
Lin. Alg. Applic. 162/163/164, 153-186.
438 Chapter 7. Unsymmetric Eigenvalue Problems
Chapter 8
Symmetric Eigenvalue
Problems
8 . 1 Properties and Decompositions
8.2 Power Iterations
8.3 The Symmetric QR Algorithm
8.4 More Methods for Tridiagonal Problems
8.5 Jacobi Methods
8.6 Computing the SVD
8.7 Generalized Eigenvalue Problems with Symmetry
The symmetric eigenvalue problem with its rich mathematical structure is one of
the most aesthetically pleasing problems in numerical linear algebra. We begin with a
brief discussion of the mathematical properties that underlie the algorithms that follow.
In §8.2 and §8.3 we develop various power iterations and eventually focus on the sym­
metric QR algorithm. Methods for the important case when the matrix is tridiagonal
are covered in §8.4. These include the method of bisection and a divide and conquer
technique. In §8.5 we discuss Jacobi's method, one of the earliest matrix algorithms to
appear in the literature. This technique is of interest because it is amenable to parallel
computation and because of its interesting high-accuracy properties. The computa­
tion of the singular value decomposition is detailed in §8.6. The central algorithm is a
variant of the symmetric QR iteration that works on bidiagonal matrices.
In §8.7 we discuss the generalized eigenvalue problem Ax = >..Bx for the impor­
tant case when A is symmetric and B is symmetric positive definite. The generalized
singular value decomposition ATAx = µ2BTBx is also covered. The section concludes
with a brief examination of the quadratic eigenvalue problem (>..2M + >..C + K)x = 0
in the presence of symmetry, skew-symmetry, and definiteness.
Reading Notes
Knowledge of Chapters 1-3 and §5.1-§5.2 are assumed. Within this chapter there
are the following dependencies:
439
440
§8.4
t
Chapter 8. Symmetric Eigenvalue Problems
§8.1 "'""* §8.2 "'""* §8.3 "'""* §8.6 "'""* §8.7
.!.
§8.5
Many of the algorithms and theorems in this chapter have unsymmetric counterparts
in Chapter 7. However, except for a few concepts and definitions, our treatment of the
symmetric eigenproblem can be studied before reading Chapter 7.
Complementary references include Wilkinson (AEP), Stewart (MAE), Parlett
(SEP), and Stewart and Sun (MPA).
8.1 Properties and Decompositions
In this section we summarize the mathematics required to develop and analyze algo­
rithms for the symmetric eigenvalue problem.
8.1.1 Eigenvalues and Eigenvectors
Symmetry guarantees that all of A's eigenvalues are real and that there is an orthonor­
mal basis of eigenvectors.
Theorem 8.1.1 (Symmetric Schur Decomposition). If A E JR.nxn is symmetric,
then there exists a real orthogonal Q such that
QTAQ = A = diag(>.i , . . . , >.n)·
Moreover, for k = l:n, AQ(:, k) = >.kQ(:, k). Compare with Theorem 7. 1.3.
Proof. Suppose >.i E >.(A) and that x E ccn is a unit 2-norm eigenvector with Ax =
>.ix. Since >.1 = xHAx = xHAHx = xHAx = >.1 it follows that >.i E JR.. Thus,
we may assume that x E JR.n. Let Pi E JR.nxn be a Householder matrix such that
P'{x = ei = In(:, 1). It follows from Ax = >.1x that (P'{APi)ei = >.ei. This says that
the first column of P'{APi is a multiple of e1. But since P'{AP1 is symmetric, it must
have the form
T [A1 Q
l
Pi APi =
0 Ai
where Ai E JR,(n-l)x{n-i) is symmetric. By induction we may assume that there is
an orthogonal Qi E JR,(n-i)x(n-l) such that QfA1Qi = Ai is diagonal. The theorem
follows by setting
Q = P1 [1 0
l and A -
[>.i 0 l
0 Q1
-
0 Ai
and comparing columns in the matrix equation AQ = QA. 0
For a symmetric matrix A we shall use the notation >.k(A) to designate the kth largest
eigenvalue, i.e.,
8.1. Properties and Decompositions 441
It follows from the orthogonal invariance of the 2-norm that A has singular values
{J,1(A)J, . . . , J,n(A)J} and
II A 112 = max{ IA1(A)I , l,n(A)I }.
The eigenvalues of a symmetric matrix have a minimax characterization that
revolves around the quadratic form xTAx/xTx.
Theorem 8.1.2 (Courant-Fischer Minimax Theorem). If A E JR.nxn is symmet­
ric, then
yTAy
max min
dim(S)=k O#yES yTY
for k = l:n.
Proof. Let QTAQ = diag(,i) be the Schur decomposition with ,k
Q = [ qi J . . · J qn ] . Define
sk = span{q1, . . . , qk},
the invariant subspace associated with Ai, . . . , ,k· It is easy to show that
yTAy
max min >
dim(S)=k O#yES yTy
To establish the reverse inequality, let S be any k-dimensional subspace and note
that it must intersect span{qk, . . . , qn}, a subspace that has dimension n - k + 1. If
y* = akqk + · · · + O:nqn is in this intersection, then
Since this inequality holds for all k-dimensional subspaces,
max
dim(S)=k
thereby completing the proof of the theorem. D
Note that if A E JR.nxn is symmetric positive definite, then An(A) > 0.
8.1.2 Eigenvalue Sensitivity
An important solution framework for the symmetric eigenproblem involves the pro­
duction of a sequence of orthogonal transformations {Qk} with the property that the
matrices QfAQk are progressively "more diagonal." The question naturally arises,
how well do the diagonal elements of a matrix approximate its eigenvalues?
442 Chapter 8. Symmetric Eigenvalue Problems
Theorem 8.1.3 (Gershgorin). Suppose A E 1Rnxn is symmetric and that Q E 1Rnxn
is orthogonal. If QTAQ = D + F where D = diag(d1, . • • , dn) and F has zero diagonal
entries, then
where ri
n
>.(A) � u [di - Ti, di + Ti]
i=l
n
L l/ijl for i = l:n. Compare with Theorem 7.2. 1.
j=l
Proof. Suppose >. E >.(A) and assume without loss of generality that >. =I- di for
i = l:n. Since (D - >.I) + F is singular, it follows from Lemma 2.3.3 that
for some k, 1 :::; k :::; n. But this implies that >. E [dk - rk, dk + rk]· D
The next results show that if A is perturbed by a symmetric matrix E, then its
eigenvalues do not move by more than 11 E llF·
Theorem 8.1.4 (Wielandt-Hoffman). If A and A + E are n-by-n symmetric ma­
trices, then
n
L (>.i(A + E) - Ai(A))2 :::; II E II! .
i=l
Proof. See Wilkinson (AEP, pp. 104-108), Stewart and Sun (MPT, pp. 189-191), or
Lax (1997, pp. 134-136). D
Theorem 8.1.5. If A and A + E are n-by-n symmetric matrices, then
k = l:n.
Proof. This follows from the minimax characterization. For details see Wilkinson
(AEP, pp. 101-102) or Stewart and Sun (MPT, p. 203). D
Corollary 8.1.6. If A and A + E are n-by-n symmetric matrices, then
for k = l:n.
Proof. Observe that
for k = l:n. D
8.1. Properties and Decompositions 443
A pair of additional perturbation results that are important follow from the minimax
property.
Theorem 8.1.7 {Interlacing Property). If A E Rnxn is symmetric and Ar =
A(l:r, l:r), then
Ar+i(Ar+l) $ Ar(Ar) $ Ar(Ar+l) $ · · · $ A2(Ar+1) $ A1(Ar) $ Ai(Ar+l)
for r = l:n - 1.
Proof. Wilkinson (AEP, pp. 103-104). D
Theorem 8.1.8. Suppose B = A + TCCT where A E Rnxn is symmetric, c E Rn has
unit 2-norm, and T E R.. IfT � 0, then
while if T $ 0 then
i = 2:n,
i = l:n- 1 .
In either case, there exist nonnegative m1 , . . . ,mn such that
i = l:n
with mi + · · · + mn = 1.
Proof. Wilkinson (AEP, pp. 94-97). See also P8.1.8. D
8.1.3 Invariant Subspaces
If S � Rn and x E S ::::} Ax E S, then S is an invariant subspace for A E Rnxn.
Note that if x E Ris an eigenvector for A, then S = span{x} is !-dimensional invariant
subspace. Invariant subspaces serve to "take apart" the eigenvalue problem and figure
heavily in many solution frameworks. The following theorem explains why.
Theorem 8.1.9. Suppose A E Rnxn is symmetric and that
r n-r
is orthogonal. If ran(Qi) is an invariant subspace, then
QTAQ = D = [ Di 0 ] r
0 D2 n-r
r n-r
and >.(A) = >.(Di) U .>.(D2). Compare with Lemma 7.1.2.
(8.1.1)
444 Chapter 8. Symmetric Eigenvalue Problems
Proof. If
QTAQ = [ D1 Efi ],
E21 D2
then from AQ = QD we have AQ1 - QiD1 = Q2E21. Since ran(Q1) is invariant, the
columns of Q2E21 are also in ran(Qi) and therefore perpendicular to the columns of
Q2. Thus,
0 = Qf(AQ1 - QiD1) = QfQ2E21 = E21·
and so (8.1.1) holds. It is easy to show
det(A - ).Jn) = det(QTAQ - ).Jn) det(D1 - )..Jr)·det(D2 - >.In-r)
confirming that >.(A) = >.(D1) U >.(D2). D
The sensitivity to perturbation of an invariant subspace depends upon the sep­
aration of the associated eigenvalues from the rest of the spectrum. The appropriate
measure of separation between the eigenvalues of two symmetric matrices B and C is
given by
sep(B, C) min I>. - µI.
.>.E.>.(B)
µE.>.(C)
With this definition we have the following result.
(8.1.2)
Theorem 8.1.10. Suppose A and A + E are n-by-n symmetric matrices and that
r n-r
is an orthogonal matrix such that ran(Q1) is an invariant subspace for A. Partition
the matrices QTAQ and QTEQ as follows:
r n-r
II E llF
� sep(D;, D2)
'
then there exists a matrix P E JR(n-r)xr with
4
(D D ) II E21 llF
sep 1, 2
r n- r
such that the columns ofQ1 = (Q1 + Q2P)(I + pTP)-1/2 define an orthonormal basis
for a subspace that is invariant for A + E. Compare with Theorem 7.2.4.
Proof. This result is a slight adaptation of Theorem 4.11 in Stewart (1973). The
matrix (I + pTP)-112 is the inverse of the square root of I + pTP. See §4.2.4. D
8.1. Properties and Decompositions
Corollary 8.1.11. If the conditions of the theorem hold, then
' 4
dist( ran(Qi), ran(Qi)) ::;; (D D ) II E2i llF·
sep i, 2
Compare with Corollary 7.2.5.
Proof. It can be shown using the SVD that
II P(I + pTP)-if2 112 ::;; II p 112 < II p llF·
Since QfQi = P(I + pTP)-1!2 it follows that
dist(ran(Qi), ran(Qi)) = 11 QfQi 112 = 11 P(I + pHP)-if2 112
::;; II P 112 ::;; 411 E21 llF/sep(Di , D2)
completing the proof. 0
445
(8.1.3)
Thus, the reciprocal of sep(Di, D2) can be thought of as a condition number that
measures the sensitivity of ran(Q1 ) as an invariant subspace.
The effect of perturbations on a single eigenvector is sufficiently important that
we specialize the above results to this case.
Theorem 8.1.12. Suppose A and A + E are n-by-n symmetric matrices and that
Q = [ q1 I Q2 l
1 n-i
is an orthogonal matrix such that Qi is an eigenvector for A. Partition the matrices
QTAQ and QTEQ as follows:
If
1 n-i
d = min IA - /ti > 0
µE >.(D2 )
and
then there exists p E JRn-l satisfying
ll E llF
d
<
5'
i n-1
such that q1 = (qi +Q2p)/Jl + pTp is a unit 2-norm eigenvectorfor A+E. Moreover,
446 Chapter 8. Symmetric Eigenvalue Problems
Compare with Corollary 7.2.6.
Proof. Apply Theorem 8.1.10 and Corollary 8.1.11 with r = 1 and observe that if
D1 = (A), then d = sep(Di, D2). D
8.1.4 Approximate Invariant Subspaces
If the columns of Q1 E Rnxr are independent and the residual matrix R = AQ1 - Q1S
is small for some S E R'"xr, then the columns of Q1 define an approximate invariant
subspace. Let us discover what we can say about the eigensystem of A when in the
possession of such a matrix.
Theorem 8.1.13. Suppose A E Rnxn and S E R'"xr are symmetric and that
where Qi E Rnxr satisfies QfQi = Ir· Then there exist µi, . . . , µr E A(A) such that
for k = l:r.
Proof. Let Q2 E Rnx(n-r) be any matrix such that Q = [ Q1 I Q2 J is orthogonal. It
follows that
B + E
and so by using Corollary 8.1.6 we have IAk(A) - Ak(B)I :::; II E 112 for k = l:n. Since
A(S) � A(B), there exist µi, . . . , µr E A(A) such that lµk - Ak(S)I :::; 11 E 112 for
k = l:r. The theorem follows by noting that for any x E Rr and y E Rn-r we have
from which we readily conclude that 11 E 112 :::; v'211 E1 112· D
The eigenvalue bounds in Theorem 8.1.13 depend on II AQ1 - Q1S 112. Given
A and Qi, the following theorem indicates how to choose S so that this quantity is
minimized in the Frobenius norm.
Theorem 8.1.14. IfA E Rnxn is symmetric andQi E Rnxr has orthonormal columns,
then
min
and S = QfAQ1 is the minimizer.
8.1. Properties and Decompositions 447
proof. Let Q2 E Rnx(n-r) be such that Q = [ Qi. Q2 ] is orthogonal. For any
S E m;xr we have
11 AQi - Qis 11! = 11 QTAQi - QTQis 11! = 11 QfAQi - s 11! + 11 QrAQi 11!.
Clearly, the minimizing S is given by S = QfAQi. D
This result enables us to associate any r-dimensional subspace ran(Qi), with a set of r
"optimal" eigenvalue-eigenvector approximates.
Theorem 8.1.15.
QfQi = Ir. If
Suppose A E Rnxn is symmetric and that Qi E Rnxr satisfies
zT(QfAQi)Z = diag(t]i, . . . , Or) = D
is the Schur decomposition of QfAQi and QiZ = [ Yi I · · · I Yr ] , then
for k = l:r.
Proof. It is easy to show that
Ayk - OkYk = AQiZek - QiZDek = (AQi - Qi(QfAQi))Zek.
The theorem follows by taking norms. D
In Theorem 8.1.15, the Ok are called Ritz values, the Yk are called Ritz vectors, and the
(Ok,Yk) are called Ritz pairs.
The usefulness of Theorem 8.1.13 is enhanced if we weaken the assumption that
the columns of Qi are orthonormal. As can be expected, the bounds deteriorate with
the loss of orthogonality.
Theorem 8.1.16. Suppose A E Rnxn is symmetric and that
AXi - XiS = Fi,
where Xi E Rnxr and S = X'{AXi. If
II X[Xi - Ir 112 = r < 1,
then there exist µi, . . . , µr E A(A) such that
for k = l:r.
Proof. For any Q E Rnxr with orthonormal columns, define Ei E Rnxr by
Ei = AQ - QS.
It follows that
(8.1.4)
448 Chapter 8. Symmetric Eigenvalue Problems
and so
{8.1.5)
Note that
{8.1.6)
Let UTX1V = E = diag(ui , . . . , ur) be the thin SVD of X1. It follows from {8.1.4)
that
and thus 1 - u� = T. This implies
II Q- X1 lb = II U(Ir - E)VT 112 = II Ir - E 112 = 1 - Ur � 1 - U� = T. {8.1.7)
The theorem is established by substituting {8.1.6) and {8.1.7) into (8.1.5) and using
Theorem 8.1.13. 0
8.1.5 The Law of Inertia
The inertia of a symmetric matrix A is a triplet of nonnegative integers (m, z, p) where
m, z, and p are respectively the numbers of negative, zero, and positive eigenvalues.
Theorem 8.1.17 (Sylvester Law of Inertia). If A E Rnxn is symmetric and
X E Rnxn is nonsingular, then A and XTAX have the same inertia.
Proof. Suppose for some r that Ar(A) > 0 and define the subspace So � Rn by
qi =/. 0,
where Aqi = .Xi(A)qi and i = l:r. From the minimax characterization of .Xr(XTAX)
we have
Since
it follows that
max
dim(S)=r
min
yE S
min
yESo
An analogous argument with the roles of A and xrAX reversed shows that
8.1. Properties and Decompositions 449
Thus, Ar(A) and Ar(XTAX) have the same sign and so we have shown that A and
XTAX have the same number of positive eigenvalues. If we apply this result to -A, we
conclude that A and xrAX have the same number of negative eigenvalues. Obviously,
the number of zero eigenvalues possessed by each matrix is also the same. D
A transformation of the form A � xrAX where X is nonsingular is called a conguence
transformation. Thus, a congruence transformation of a symmetric matrix preserves
inertia.
Problems
PB.1.1 Without using any of the results in this section, show that the eigenvalues ofa 2-by-2 symmetric
matrix must be real.
PB.1.2 Compute the Schur decomposition of A = [ ; � ].
PB.1.3 Show that the eigenvalues of a Hermitian matrix (AH = A) are real. For each theorem and
corollary in this section, state and prove the corresponding result for Hermitian matrices. Which
results have analogs when A is skew-symmetric? Hint: If AT = -A, then iA is Hermitian.
PB.1.4 Show that if x E R" x r
, r :5 n, and II xTx - 1 112 = T < 1, then O"min(X) � 1 - T.
PB.1.5 Suppose A, E E R" x n are symmetric and consider the Schur decomposition A + tE = QDQT
where we assume that Q = Q(t) and D = D(t) are continuously differentiable functions of t E R. Show
that D(t) = diag(Q(t)TEQ(t)) where the matrix on the right is the diagonal part of Q(t)TEQ(t).
Establish the Wielandt-Hoffman theorem by integrating both sides of this equation from 0 to 1 and
taking Frobenius norms to show that
11 D(l) - D(O) llF :5 1111 diag(Q(t)TEQ(t) llF dt :5 II E llF"
PB.1.6 Prove Theorem 8.1.5.
PB.1.7 Prove Theorem 8.1.7.
PB.1.8 Prove Theorem 8.1.8 using the fact that the trace of a square matrix is the sum of its eigen­
values.
PB.1.9 Show that if B E R'n x m and C E Rn x n are symmetric, then sep(B, C) = min II BX - XC llF
where the min is taken over all matrices X E � x n.
PB.1.10 Prove the inequality (8.1.3).
PB.1.11 Suppose A E nn x n is symmetric and C E Rn x r has full column rank and assume that r « n.
By using Theorem 8.1.8 relate the eigenvalues of A + CCT to the eigenvalues of A.
PB.1.12 Give an algorithm for computing the solution to
min II A - S ll1-· .
rank(S) = 1
S = sr
Note that if S E Rn x n is a symmetric rank-1 matrix then either S = vvT or S = -vvT for some
v e Rn .
PB.1.13 Give an algorithm for computing the solution to
min II A - S llF .
rank(S) = 2
8 = -ST
PB.1.14 Give an example of a real 3-by-3 normal matrix with integer entries that is neither orthogonal,
symmetric, nor skew-symmetric.
450 Chapter 8. Symmetric Eigenvalue Problems
Notes and References for §8.1
The perturbation theory for the symmetric eigenproblem is surveyed in Wilkinson (AEP, Chap. 2),
Parlett (SEP, Chaps. 10 and 11), and Stewart and Sun (MPT, Chaps. 4 and 5). Some representative
papers in this well-researched area include:
G.W. Stewart (1973). "Error and Perturbation Bounds for Subspaces Associated with Certain Eigen-
value Problems," SIAM Review 15, 727-764.
C.C. Paige (1974). "Eigenvalues of Perturbed Hermitian Matrices," Lin. Alg. Applic. 8, 1-10.
W. Kahan (1975). "Spectra of Nearly Hermitian Matrices," Proc. AMS 48, 11-17.
A. Schonhage (1979). "Arbitrary Perturbations of Hermitian Matrices," Lin. Alg. Applic. 24, 143-49.
D.S. Scott (1985). "On the Accuracy of the Gershgorin Circle Theorem for Bounding the Spread of a
Real Symmetric Matrix," Lin. Alg. Applic. 65, 147-155
J.-G. Sun (1995). "A Note on Backward Error Perturbations for the Hermitian Eigenvalue Problem,"
BIT 35, 385-393.
Z. Drma.C (1996). On Relative Residual Bounds for the Eigenvalues of a Hermitian Matrix," Lin. Alg.
Applic. 244, 155-163.
Z. Drma.C and V. Hari (1997). "Relative Residual Bounds For The Eigenvalues of a Hermitian Semidef­
inite Matrix," SIAM J. Matrix Anal. Applic. 18, 21-29.
R.-C. Li (1998). "Relative Perturbation Theory: I. Eigenvalue and Singular Value Variations," SIAM
J. Matrix Anal. Applic. 19, 956-982.
R.-C. Li (1998). "Relative Perturbation Theory: II. Eigenspace and Singular Subspace Variations,"
SIAM J. Matrix Anal. Applic. 20, 471-492.
F.M. Dopico, J. Moro and J.M. Molera (2000). "Weyl-Type Relative Perturbation Bounds for Eigen­
systems of Hermitian Matrices," Lin. Alg. Applic. 309, 3-18.
J.L. Barlow and I. Slapnicar (2000). "Optimal Perturbation Bounds for the Hermitian Eigenvalue
Problem," Lin. Alg. Applic. 309, 19-43.
N. Truhar and R.-C. Li (2003). "A sin(29) Theorem for Graded Indefinite Hermitian Matrices," Lin.
Alg. Applic. 359, 263-276.
W. Li and W. Sun (2004). "The Perturbation Bounds for Eigenvalues of Normal Matrices," Num.
Lin. Alg. 12, 89-94.
C.-K. Li and R.-C. Li (2005). "A Note on Eigenvalues of Perturbed Hermitian Matrices," Lin. Alg.
Applic. 395, 183-190.
N. Truhar (2006). "Relative Residual Bounds for Eigenvalues of Hermitian Matrices," SIAM J. Matrix
Anal. Applic. 28, 949-960.
An elementary proof of the Wielandt-Hoffman theorem is given in:
P. Lax (1997). Linear Algebra, Wiley-lnterscience, New York.
For connections to optimization and differential equations, see:
P. Deift, T. Nanda, and C. Tomei (1983). "Ordinary Differential Equations and the Symmetric
Eigenvalue Problem," SIAM J. Nu.mer. Anal. 20, 1-22.
M.L. Overton (1988). "Minimizing the Maximum Eigenvalue of a Symmetric Matrix," SIAM J. Matrix
Anal. Applic. 9, 256-268.
T. Kollo and H. Neudecker (1997). "The Derivative of an Orthogonal Matrix of Eigenvectors of a
Symmetric Matrix," Lin. Alg. Applic. 264, 489-493.
8.2 Power Iterations
Assume that A E 1Rnxn is symmetric and that U0 E 1Rnxn is orthogonal. Consider the
following QR iteration:
To = UJ'AUo
for k = 1, 2, . . .
end
Tk-1 = UkRk
Tk = RkUk
(QR factorization) (8.2.1)
8.2. Power Iterations
Since Tk = RkUk = U'{(UkRk)Uk = U'{Tk-1Uk it follows by induction that
Tk = (UoU1 · · · Uk)TA(UoU1 · · · Uk)·
451
(8.2.2)
Thus, each Tk is orthogonally similar to A. Moreover, the Tk almost always converge
to diagonal form and so it can be said that (8.2.1) almost always converges to a Schur
decomposition of A. In order to establish this remarkable result we first consider the
power method and the method of orthogonal iteration.
8.2.1 The Power Method
Given a unit 2-norm q<0> E Rn, the power method produces a sequence of vectors q(k)
as follows:
for k = 1, 2, . . .
z(k) = Aq(k-1)
end
q(k) = z(k)/II z(k) 112
A(k) = [q(k)]TAq(k)
(8.2.3)
If q<0> is not "deficient" and A's eigenvalue of maximum modulus is unique, then the
q(k) converge to an eigenvector.
Theorem 8.2.1. Suppose A E Rnxn is symmetric and that
QTAQ = diag(Ai . . . . , An)
where Q = [qi I · · · I qn ] is orthogonal and IA1I > IA2I � · · · � !An1- Let the vectors q(k)
be specified by {8.2.3) and define fh E [O,.rr /2] by
cos(Ok) = lqfq(k) I·
Ifcos(Oo) =F 0, then for k = 0, 1, ... we have
lsin(Ok)I < tan(Oo) I�:lk
,
I
A 12k
IA{k) _ A1 I � max IA1 - Ail tan(Oo)2
A
2
2:5i:5n 1
(8.2.4)
(8.2.5)
Proof. From the definition of the iteration, it follows that q(k) is a multiple of Akq(O)
and so
452
and
Thus,
lsin(lh )l2 1 -
i=l
Chapter 8. Symmetric Eigenvalue Problems
2 2 1
a1 + · · · + ar. = ,
=
n
"""' a�_x2k
L i i
i=2
n
"""' a2_x2k
L , i
i=l
<
n
"""' a2_x2k
L i i
i=2
a2_x2k
1 1
� n 2 (.Xi )2k
a2 Lai A1
1 i=2
< � (n 2)(_x2)2k
a2 L:a,. A
1 i=2 1
=
(.X )2k
tan(Oo)2
.X�
This proves (8.2.4). Likewise,
and so
_x(k)
n
La;_x;k (.Xi - .X1)
i=2
n
"""' a2_x2k
L i i
i=l
[q(o)fA2k+Iq(o)
[q<olfA2kq(D)
<
(.X )2k
< max I.Xi - Anl · tan(Oo)2 · / ,
2�i�n 1
completing the proof of the theorem. D
i=l
n
"""' a�_x�k
L i ,
i=l
Computable error bounds for the power method can be obtained by using Theorem
8.1.13. If
II Aq(k) - _x(k)q(k) 112 = 8,
then there exists .X E .X(A) such that l.X(k) - .XI � v'2 8.
8.2. Power Iterations 453
8.2.2 Inverse Iteration
If the power method (8.2.3) is applied with A replaced by (A - >.I)-1, then we obtain
the method of inverse iteration. If ).. is very close to a distinct eigenvalue of A, then
q(k) will be much richer in the corresponding eigenvector direction than its predecessor
q(Hl,
X � ta;q; }
i=l =>
Aqi = )..iqi, i = l:n
Thus, if ).. is reasonably close to a well-separated eigenvalue )..i , then inverse iteration
will produce iterates that are increasingly in the direction of qi . Note that inverse
iteration requires at each step the solution of a linear system with matrix of coefficients
A - >.I.
8.2.3 Rayleigh Quotient Iteration
Suppose A E JR.nxn is symmetric and that x is a given nonzero n-vector. A simple
differentiation reveals that
xTAx
).. = r(x) = -
T-
x x
minimizes II (A - M)x 112· (See also Theorem 8.1.14.) The scalar r(x) is called the
Rayleigh quotient of x. Clearly, if x is an approximate eigenvector, then r(x) is a
reasonable choice for the corresponding eigenvalue. Combining this idea with inverse
iteration gives rise to the Rayleigh quotient iteration where x0 -:/:- 0 is given.
for k = 0, 1, . . .
end
µk = r(xk)
Solve (A - µkl)zk+l = Xk for Zk+l
Xk+l = Zk+i/11 Zk+l 112
(8.2.6)
The Rayleigh quotient iteration almost always converges and when it does, the
rate of convergence is cubic. We demonstrate this for the case n = 2. Without loss of
generality, we may assume that A = diag(>.1, >.2), with )..1 > >.2. Denoting Xk by
it follows that µk >.1c1 + >.2s� in (8.2.6) and
A calculation shows that
(8.2.7)
454 Chapter 8. Symmetric Eigenvalue Problems
From these equations it is clear that the Xk converge cubically to either span{ei} or
span{e2} provided lckl =/: !ski· Details associated with the practical implementation of
the Rayleigh quotient iteration may be found in Parlett (1974).
8.2.4 Orthogonal Iteration
A straightforward generalization of the power method can be used to compute higher­
dimensional invariant subspaces. Let r be a chosen integer that satisfies 1 S r :::;
n. Given an n-by-r matrix Qo with orthonormal columns, the method of orthogonal
iteration generates a sequence of matrices {Qk} � JR.nxr as follows:
for k = 1, 2, . . .
Zk = AQk-1 (8.2.8)
(QR factorization)
Note that, if r = 1, then this is just the power method. Moreover, the sequence {Qkei}
is precisely the sequence of vectors produced by the power iteration with starting vector
qC0l = Qoe1 .
In order to analyze the behavior of (8.2.8), assume that
QTAQ = D = diag(Ai),
is a Schur decomposition of A E JR.nxn. Partition Q and D as follows:
Q = [ QQ I Q13 J
r n-r
If !Ari > IAr+il, then
r n-r
(8.2.9)
(8.2.10)
is the dominant invariant subspace of dimension r. It is the unique invariant subspace
associated with the eigenvalues Ai, . . . , Ar·
The following theorem shows that with reasonable assumptions, the subspaces
ran(Qk) generated by (8.2.8) converge to Dr(A) at a rate proportional to IAr+if>.rlk·
Theorem 8.2.2. Let the Schur decomposition of A E JR.nxn be given by (8.2.9} and
(8.2. 10} with n 2:: 2. Assume l.Xrl > l>.r+il and that dk is defined by
dk = dist(Dr(A), ran(Qk)),
If
do < 1,
then the matrices Qk generated by (8.2.8} satisfy
dk s
IAr+l lk do
Ar Ji - 4
k ;::: o.
(8.2.11)
(8.2.12)
8.2. Power Iterations 455
Compare with Theorem 7.3.1.
proof. We mention at the start that the condition (8.2.11) means that no vector in
the span of Qo's columns is perpendicular to Dr(A).
Using induction it can be shown that the matrix Qk in (8.2.8) satisfies
This is a QR factorization of AkQ0 and upon substitution of the Schur decomposition
(8.2.9)-(8.2.10) we obtain
[Df 0 l [Q�Qo l
0 D� Q�Qo
If the matrices Vi and wk are defined by
then
Since
Vk = Q�Qo,
Wk = Q�Qo,
D�Vo = Vk (Rk · · · R1) ,
D�Wo = Wk (Rk · · · R1) .
[ Vk l [Q�Qk l T T
wk
=
Q�Qk
= [Qa I Q13J Qk = Q Qk,
it follows from the thin CS decomposition (Theorem 2.5.2) that
A consequence of this is that
O"min(Vo)2
= 1 - O"max(Wo)2 = 1 - d5 > 0.
(8.2.13)
(8.2.14)
It follows from (8.2.13) that the matrices Vk and (Rk · · · R1) are nonsingular. Using
both that equation and (8.2.14) we obtain
and so
Wk = D�Wo(Rk · · · R1)-1 = D�Wo(D�Vo)-1Vk = D�(WoV0-1)D!kVi
dk II wk 112 < II D� 112 · II Wo 112 · II vo-1 112 · II D!
k
112 · II vk 112
k 1 1
< IAr+i I . do .
1 - dfi
.
IArIk '
from which the theorem follows. D
456 Chapter 8. Symmetric Eigenvalue Problems
8.2.5 The QR Iteration
Consider what happens if we apply the method of orthogonal iteration (8.2.8) with
r = n. Let QTAQ = diag(A1, . . . , An) be the Schur decomposition and assume
IAi l > IA2I > · · · > IAnl·
If Q = [ qi I . . · I qn ] , Qk = [ q�k) I . . · I q�k) ] , and
d. (D (A) { (o) (O)}) 1
1st i , span q1 , . . . , qi <
for i = l:n -1, then it follows from Theorem 8.2.2 that
. (k) (k) _
AH1
(I '
k
)
d1st(span{q1 , . . . , qi }, span{q1, . . . , qi}) - 0 T;
for i = l:n -1. This implies that the matrices Tk defined by
(8.2.15)
are converging to diagonal form. Thus, it can be said that the method of orthogonal
iteration computes a Schur decomposition if r = n and the original iterate Q0 E 1Rnxn
is not deficient in the sense of (8.2.11).
The QR iteration arises by considering how to compute the matrix Tk directly
from its predecessor Tk-l · On the one hand, we have from (8.2.8) and the definition
of Tk-l that
On the other hand,
Thus, Tk is determined by computing the QR factorization ofTk-1 and then multiplying
the factors together in reverse order. This is precisely what is done in (8.2.1).
Note that a single QR iteration involves O(n3) flops. Moreover, since convergence
is only linear (when it exists), it is clear that the method is a prohibitively expensive
way to compute Schur decompositions. Fortunately, these practical difficulties can be
overcome, as we show in the next section.
Problems
PB.2.1 Suppose AoE Rnxn is symmetric and positive definite and consider the following iteration:
for k =1, 2, . . .
end
Ak-1 = GkGf
Ak=GfGk
(Cholesky factorization)
(a) Show that this iteration is defined. (b) Show that if
Ao= [ � � ]
8.2. Power Iterations
with a � c has eigenvalues Al � A2 > 0, then the Ak converge to diag(A1 , A2)·
PB.2.2 Prove (8.2.7).
PB.2.3 Suppose A E Rnxn is symmetric and define the function /:Rn+l --+ Rn+l by
I ([ � ]) = [ (x�:= �);2 ]
457
where x E Rn and A E R. Suppose x+ and A+ are produced by applying Newton's method to f at
the "current point" defined by Xe and Ac. Give expressions for x+ and A+ assuming that II Xe 112 = 1
and Ac = x'[Ax
e.
Notes and References for §8.2
The following references are concerned with the method of orthogonal iteration, which is also known
as the method of simultaneous iteration:
G.W. Stewart (1969). "Accelerating The Orthogonal Iteration for the Eigenvalues of a Hermitian
Matrix," Numer. Math. 13, 362-376.
M. Clint and A. Jennings (1970). "The Evaluation of Eigenvalues and Eigenvectors ofReal Symmetric
Matrices by Simultaneous Iteration," Comput. J. 13, 76-80.
H. Rutishauser (1970). "Simultaneous Iteration Method for Symmetric Matrices," Numer. Math. 16,
205-223.
References for the Rayleigh quotient method include:
J. Vandergraft (1971). "Generalized Rayleigh Methods with Applications to Finding Eigenvalues of
Large Matrices," Lin. Alg. Applic. 4, 353-368.
B.N. Parlett (1974). "The Rayleigh Quotient Iteration and Some Generalizations for Nonnormal
Matrices," Math. Comput. 28, 679-693.
S. Batterson and J. Smillie (1989). "The Dynamics of Rayleigh Quotient Iteration," SIAM J. Numer.
Anal. 26, 624-636.
C. Beattie and D.W. Fox (1989). "Localization Criteria and Containment for Rayleigh Quotient
Iteration," SIAM J. Matrix Anal. Applic. 10, 80-93.
P.T.P. Tang (1994). "Dynamic Condition Estimation and Rayleigh-Ritz Approximation," SIAM J.
Matrix Anal. Applic. 15, 331-346.
D. P. O'Leary and G. W. Stewart (1998). "On the Convergence of a New Rayleigh Quotient Method
with Applications to Large Eigenproblems," ETNA 7, 182-189.
J.-L. Fattebert (1998). "A Block Rayleigh Quotient Iteration with Local Quadratic Convergence,"
ETNA 7, 56-74.
Z. Jia and G.W. Stewart (2001). "An Analysis of the Rayleigh-Ritz Method for Approximating
Eigenspaces," Math. Comput. 70, 637--647.
V. Simoncini and L. Elden (2002). "Inexact Rayleigh Quotient-Type Methods for Eigenvalue Compu­
tations," BIT 42, 159-182.
P.A. Absil, R. Mahony, R. Sepulchre, and P. Van Dooren (2002). "A Grassmann-Rayleigh Quotient
Iteration for Computing Invariant Subspaces," SIAM Review 44, 57-73.
Y. Notay (2003). "Convergence Analysis of Inexact Rayleigh Quotient Iteration," SIAM J. Matrix
Anal. Applic. 24, 627-644.
A. Dax (2003). "The Orthogonal Rayleigh Quotient Iteration {ORQI) method," Lin. Alg. Applic.
358, 23-43.
R.-C. Li (2004). "Accuracy of Computed Eigenvectors Via Optimizing a Rayleigh Quotient," BIT 44,
585-593.
Various Newton-type methods have also been derived for the symmetric eigenvalue problem, see:
R.A. Tapia and D.L. Whitley (1988). "The Projected Newton Method Has Order 1 + v'2 for the
Symmetric Eigenvalue Problem," SIAM J. Numer. Anal. 25, 1376-1382.
P.A. Absil, R. Sepulchre, P. Van Dooren, and R. Mahony {2004). "Cubically Convergent Iterations
for Invariant Subspace Computation," SIAM J. Matrix Anal. Applic. 26, 70-96.
458 Chapter 8. Symmetric Eigenvalue Problems
8.3 The Symmetric QR Algorithm
The symmetric QR iteration (8.2.1) can be made more efficient in two ways. First, we
show how to compute an orthogonal Uo such that UJ'AUo = T is tridiagonal. With
this reduction, the iterates produced by (8.2.1) are all tridiagonal and this reduces the
work per step to O(n2). Second, the idea of shifts are introduced and with this change
the convergence to diagonal form proceeds at a cubic rate. This is far better than
having the off-diagonal entries going to to zero as IAi+i/Ailk as discussed in §8.2.5.
8.3.1 Reduction to Tridiagonal Form
If A is symmetric, then it is possible to find an orthogonal Q such that
(8.3.1)
is tridiagonal. We call this the tridiagonal decomposition and as a compression ofdata,
it represents a very big step toward diagonalization.
We show how to compute (8.3.1) with Householder matrices. Suppose that House­
holder matrices Pi, . . . , Pk-I have been determined such that if
then
k-1
B
:
3
] k�1
833 n-k
n-k
is tridiagonal through its first k - 1 columns. If A is an order-(n - k) Householder
matrix such that AB32 is a multiple of In-k(:, 1) and if Pk = diag(Jk, Pk), then the
leading k-by-k principal submatrix of
[ Bu B12 0
l
k-1
Ak = PkAk-1Pk = B21 B22 B23A 1
0 AB32 AB33A n-k
k-1 n-k
is tridiagonal. Clearly, if Uo = P1 · · · Pn-2, then UJ'AUo = T is tridiagonal.
In the calculation of Ak it is important to exploit symmetry during the formation
of the matrix PkB33Fk. To be specific, suppose that A has the form
- T
Pk = I - /3vv , /3 21 T 0 -'- v E Rn-k.
= v v, -;-
Note that if p = f3B33V and w = p - (f3pTv/2)v, then
Since only the upper triangular portion of this matrix needs to be calculated, we see
that the transition from Ak-l to Ak can be accomplished in only 4(n - k)2 flops.
8.3. The Symmetric QR Algorithm 459
Algorithm 8.3.1 (Householder Tridiagonalization) Given a symmetric A E R.nxn, the
following algorithm overwrites A with T = QTAQ, where T is tridiagonal and Q =
H1 · · · Hn-2 is the product of Householder transformations.
for k = l:n - 2
[v, .BJ = house(A(k + l:n, k))
p = ,BA(k + l:n, k + l:n)v
w = p - (,BpTv/2)v
A(k + 1, k) = II A(k + l:n, k) 112; A(k, k + 1) = A(k + 1, k)
A(k + l:n, k + l:n) = A(k + l:n, k + l:n) - vwT - wvT
end
This algorithm requires 4n3/3 flops when symmetry is exploited in calculating the rank-
2 update. The matrix Q can be stored in factored form in the subdiagonal portion of
A. If Q is explicitly required, then it can be formed with an additional 4n3/3 flops.
Note that if T has a zero subdiagonal, then the eigenproblem splits into a pair of
smaller eigenproblems. In particular, if tk+l,k = 0, then
>.(T) = >.(T(l:k, l:k)) u >.(T(k + l:n, k + l:n)).
If T has no zero subdiagonal entries, then it is said to be unreduced.
Let T denote the computed version of T obtained by Algorithm 8.3.1. It can
be shown that T= QT(A + E)Q where Q is exactly orthogonal and E is a symmetric
matrix satisfying II E llF � cull A llF where c is a small constant. See Wilkinson (AEP,
p. 297).
8.3.2 Properties of the Tridiagonal Decomposition
We prove two theorems about the tridiagonal decomposition both of which have key
roles to play in the following. The first connects (8.3.1) to the QR factorization of a
certain Krylov matrix. These matrices have the form
K(A, v, k) = [ v I Av I · · · I Ak-lv ] ,
Theorem 8.3.1. IfQTAQ = T is the tridiagonal decomposition of the symmetric ma­
trix A E R.nxn, then QTK(A, Q(:, 1), n) = R is upper triangular. If R is nonsingular,
then T is unreduced. If R is singular and k is the smallest index so rkk = 0, then k is
also the smallest index so tk,k-1 is zero. Compare with Theorem 1.4.3.
Proof. It is clear that if q1 = Q(:, 1), then
QTK(A, Q(:, 1), n) = [ QTq1 I (QTAQ)(QTq1) I · . · I (QTAQ)n-l (QTq1) ]
= [ el I Te1 I · . · I rn-1ei ] = R
is upper triangular with the property that ru = 1 and rii = t21ta2 · · · ti,i-l for i = 2:n.
Clearly, if R is nonsingular, then T is unreduced. If R is singular and rkk is its first
zero diagonal entry, then k � 2 and tk,k-1 is the first zero subdiagonal entry. 0
460 Chapter 8. Symmetric Eigenvalue Problems
The next result shows that Q is essentially unique once Q(:, 1) is specified.
Theorem 8.3.2 (Implicit Q Theorem). Suppose Q = [ Q1 I · · · I Qn ] and V =
[ v1 I · · · I Vn ] are orthogonal matrices with the property that both QTAQ = T and
vrAV = S are tridiagonal where A E nrxn is symmetric. Let k denote the smallest
positive integer for which tk+I,k = 0, with the convention that k = n ifT is unreduced.
If v1 = Q1 , then Vi = ±qi and lti,i-1 I =
lsi,i-1 I for i = 2:k. Moreover, if k < n, then
Sk+I,k = 0. Compare with Theorem 7.4.2.
Proof. Define the orthogonal matrix W = QTV and observe that W(:, 1) = In(:, 1) =
e1 and wrrw = S. By Theorem 8.3.1, wr·K(T, e1, k) is upper triangular with full
column rank. But K(T, ei , k) is upper triangular and so by the essential uniqueness
of the thin QR factorization, W(:, l:k) = In(:, l:k) ·diag(±l, . . . , ±1). This says that
Q(:, i) = ±V(:, i) for i = l:k. The comments about the subdiagonal entries follow since
ti+I,i = Q(:, i + l)rAQ(:, i) and Si+i,i = V(:,i + l)TAV(:, i) for i = l:n - 1. 0
8.3.3 The QR Iteration and Tridiagonal Matrices
We quickly state four facts that pertain to the QR iteration and tridiagonal matrices.
Complete verifications are straightforward.
• Preservation of Form. If T = QR is the QR factorization of a symmetric tridi­
agonal matrix T E 1Rnxn, then Q has lower bandwidth 1 and R has upper band­
width 2 and it follows that T+ = RQ = QT(QR)Q = qrTQ is also symmetric
and tridiagonal.
• Shifts. If s E JR and T - sl = QR is the QR factorization, then T+ = RQ + sl =
QTTQ is also tridiagonal. This is called a shifted QR step.
• Perfect Shifts. If T is unreduced, then the first n - 1 columns of T - sl are
independent regardless of s. Thus, if s E .A(T) and QR = T - sl is a QR
factorization, then rnn = 0 and the last column ofT+ = RQ+sl equals sin(:, n) =
sen.
• Cost. If T E 1Rnxn is tridiagonal, then its QR factorization can be computed by
applying a sequence of n - 1 Givens rotations:
for k = l:n - 1
[c, s] = givens(tkk, tk+I,k)
m = min{k + 2, n}
T(k:k + 1, k:m) = [ -� � ]T
T(k:k + 1, k:m)
end
This requires O(n) flops. If the rotations are accumulated, then O(n2) flops are
needed.
8.3. The Symmetric QR Algorithm 461
8.3.4 Explicit Single-Shift QR Iteration
If s is a good approximate eigenvalue, then we suspect that the (n, n - 1) will be small
after a QR step with shift s. This is the philosophy behind the following iteration:
If
T = UJ'AUo (tridiagonal)
for k = 0, 1, . . .
end
Determine real shift µ.
T - µI = UR (QR factorization)
T = RU + µI
T
0
(8.3.2)
0
then one reasonable choice for the shift is µ = an- However, a more effective choice is
to shift by the eigenvalue of
T(n - l:n, n - l:n) = [an-l
bn-1
that is closer to a11• This is known as the Wilkinson shift and it is given by
µ = an + d - sign(d)Vd2 + b;_1 (8.3.3)
where d = (an-l - an)/2. Wilkinson (1968) has shown that (8.3.2) is cubically
convergent with either shift strategy, but gives heuristic reasons why (8.3.3) is preferred.
8.3.5 Implicit Shift Version
It is possible to execute the transition from T to T+ = RU + µI = urru without
explicitly forming the matrix T-µI. This has advantages when the shift is much larger
than some of the ai. Let c = cos(B) and s = sin(B) be computed such that
[ c
:n"
�
µ
i [ � l·
-s
If we set G1 = G(l, 2, B), then G1e1 = Ue1 and
x x + 0 0 0
x x x 0 0 0
T +- GfTG1
+ x x x 0 0
0 0 x x x 0
0 0 0 x x x
0 0 0 0 x x
462 Chapter 8. Symmetric Eigenvalue Problems
We are thus in a position to apply the implicit Q theorem provided we can compute
rotations G2, . . . , Gn-1 with the property that if Z = G1G2 · · · Gn-1, then Ze1 =
G1e1 = Ue1 and zTTz is tridiagonal. Note that the first column of Z and U are
identical provided we take each Gi to be of the form Gi = G{i, i + 1, 0i), i = 2:n- l.
But Gi of this form can be used to chase the unwanted nonzero element "+" out of
the matrix GfTG1 as follows:
x x 0
x x x
0 x x
0 + x
0 0 0
0 0 0
x x 0
x x x
0 x x
0 0 x
0 0 0
0 0 0
0 0 0
+ 0 0
x 0 0
x x 0
x x x
0 x x
0 0 0
0 0 0
x 0 0
x x +
x x x
+ x x
x
x
0
0
0
0
x
x
0
0
0
0
x 0 0
x x 0
x x x
0 x x
0 + x
0 0 0
x 0 0
x x 0
x x x
0 x x
0 0 x
0 0 0
0 0
0 0
+ 0
x 0
x x
x x
0 0
0 0
0 0
x 0
x x
x x
Thus, it follows from the implicit Q theorem that the tridiagonal matrix zTTz pro­
duced by this zero-chasing technique is essentially the same as the tridiagonal matrix
T obtained by the explicit method. (We may assume that all tridiagonal matrices in
question are unreduced for otherwise the problem decouples.)
Note that at any stage ofthe zero-chasing, there is only one nonzero entry outside
the tridiagonal band. How this nonzero entry moves down the matrix during the update
T +- GfTGk is illustrated in the following:
[1 0 0 0 lT[ak bk Zk
0 c s 0 bk a,, b,,
0 -S C 0 Zk bp aq
0 0 0 1 0 0 bq
0 0 0 l [ak bk
c s 0 _ bk a,,
-s c 0 - 0 b,,
0 0 1 0 Zp
Here (p, q, r) = (k + 1, k + 2, k + 3). This update can be performed in about 26 flops
once c and s have been determined from the equation bks+ ZkC = 0. Overall, we obtain
Algorithm 8.3.2 (Implicit Symmetric QR Step with Wilkinson Shift) Given
an unreduced symmetric tridiagonal matrix T ∈ ℝⁿˣⁿ, the following algorithm overwrites
T with ZᵀTZ, where Z = G₁ · · · G_{n−1} is a product of Givens rotations with the
property that Zᵀ(T − µI) is upper triangular and µ is that eigenvalue of T's trailing
2-by-2 principal submatrix closer to t_nn.

    d = (t_{n−1,n−1} − t_nn)/2
    µ = t_nn − t_{n,n−1}² / (d + sign(d)·sqrt(d² + t_{n,n−1}²))
    x = t₁₁ − µ
    z = t₂₁
    for k = 1:n−1
        [c, s] = givens(x, z)
        T = GₖᵀTGₖ, where Gₖ = G(k, k+1, θ)
        if k < n−1
            x = t_{k+1,k}
            z = t_{k+2,k}
        end
    end
This algorithm requires about 30n flops and n square roots. If a given orthogonal
matrix Q is overwritten with QG₁ · · · G_{n−1}, then an additional 6n² flops are needed.
Of course, in any practical implementation the tridiagonal matrix T would be stored
in a pair of n-vectors and not in an n-by-n array.
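Here is a NumPy sketch of Algorithm 8.3.2. For clarity it operates on a full n-by-n array and applies each rotation to entire rows and columns; as just noted, a practical code would store T in two vectors and touch only O(1) entries per rotation. The helper names are ours.

```python
# Sketch of the implicit symmetric QR step with Wilkinson shift (Algorithm 8.3.2).
import numpy as np

def _givens(a, b):
    # (c, s) with [[c, s], [-s, c]].T @ [a, b] = [r, 0]
    if b == 0.0:
        return 1.0, 0.0
    r = np.hypot(a, b)
    return a / r, -b / r

def implicit_symmetric_qr_step(T):
    n = T.shape[0]
    T = T.copy()
    d = (T[n - 2, n - 2] - T[n - 1, n - 1]) / 2.0
    b2 = T[n - 1, n - 2] ** 2
    sgn = 1.0 if d >= 0 else -1.0
    mu = T[n - 1, n - 1] - b2 / (d + sgn * np.sqrt(d * d + b2))   # Wilkinson shift
    x, z = T[0, 0] - mu, T[1, 0]
    for k in range(n - 1):
        c, s = _givens(x, z)
        G = np.array([[c, s], [-s, c]])           # G_k = G(k, k+1, theta)
        T[k:k + 2, :] = G.T @ T[k:k + 2, :]
        T[:, k:k + 2] = T[:, k:k + 2] @ G
        if k < n - 2:
            x, z = T[k + 1, k], T[k + 2, k]       # the bulge to be chased next
    return T
```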
Algorithm 8.3.2 is the basis of the symmetric QR algorithm-the standard means
for computing the Schur decomposition of a dense symmetric matrix.
Algorithm 8.3.3 (Symmetric QR Algorithm) Given a symmetric A ∈ ℝⁿˣⁿ and
a tolerance tol greater than the unit roundoff, this algorithm computes an approximate
symmetric Schur decomposition QᵀAQ = D. A is overwritten with the tridiagonal
decomposition.

    Use Algorithm 8.3.1 to compute the tridiagonalization
        T = (P₁ · · · P_{n−2})ᵀA(P₁ · · · P_{n−2})
    Set D = T and if Q is desired, form Q = P₁ · · · P_{n−2}. (See §5.1.6.)
    until q = n
        For i = 1:n−1, set d_{i+1,i} and d_{i,i+1} to zero if
            |d_{i+1,i}| = |d_{i,i+1}| ≤ tol·(|d_ii| + |d_{i+1,i+1}|)
        Find the largest q and the smallest p such that if

            D = [ D₁₁   0    0  ]  p
                [  0   D₂₂   0  ]  n−p−q
                [  0    0   D₃₃ ]  q
                   p  n−p−q   q

        then D₃₃ is diagonal and D₂₂ is unreduced.
        if q < n
            Apply Algorithm 8.3.2 to D₂₂:
                D = diag(I_p, Z, I_q)ᵀ · D · diag(I_p, Z, I_q)
            If Q is desired, then Q = Q·diag(I_p, Z, I_q).
        end
    end
This algorithm requires about 4n³/3 flops if Q is not accumulated and about 9n³ flops
if Q is accumulated.
The computed eigenvalues λ̂_i obtained via Algorithm 8.3.3 are the exact eigenvalues
of a matrix that is near to A:

    Q̂₀ᵀ(A + E)Q̂₀ = diag(λ̂₁, . . . , λ̂_n),    ‖E‖₂ ≈ u‖A‖₂.

Using Corollary 8.1.6 we know that the absolute error in each λ̂_i is small in the sense
that

    |λ̂_i − λ_i| ≈ u‖A‖₂.

If Q̂ = [ q̂₁ | · · · | q̂_n ] is the computed matrix of orthonormal eigenvectors, then the
accuracy of q̂_i depends on the separation of λ_i from the remainder of the spectrum.
See Theorem 8.1.12.
If all of the eigenvalues and a few of the eigenvectors are desired, then it is cheaper
not to accumulate Q in Algorithm 8.3.3. Instead, the desired eigenvectors can be found
via inverse iteration with T. See §8.2.2. Usually just one step is sufficient to get a good
eigenvector, even with a random initial vector.
If just a few eigenvalues and eigenvectors are required, then the special techniques
in §8.4 are appropriate.
8.3.6 The Rayleigh Quotient Connection
It is interesting to identify a relationship between the Rayleigh quotient iteration and
the symmetric QR algorithm. Suppose we apply the latter to the tridiagonal matrix
T ∈ ℝⁿˣⁿ with shift σ = e_nᵀTe_n = t_nn. If T − σI = QR, then we obtain T+ = RQ + σI.
From the equation (T − σI)Q = Rᵀ it follows that

    (T − σI)q_n = r_nn e_n

where q_n is the last column of the orthogonal matrix Q. Thus, if we apply (8.2.6) with
x₀ = e_n, then x₁ = q_n.
8.3.7 Orthogonal Iteration with Ritz Acceleration
Recall from §8.2.4 that an orthogonal iteration step involves a matrix-matrix product
and a QR factorization:

    Z_k = AQ̄_{k−1},
    Q̄_k R_k = Z_k    (QR factorization).

Theorem 8.1.14 says that we can minimize ‖AQ̄_k − Q̄_kS‖_F by setting S equal to

    S_k = Q̄_kᵀAQ̄_k.

If U_kᵀS_kU_k = D_k is the Schur decomposition of S_k ∈ ℝʳˣʳ and Q_k = Q̄_kU_k, then
‖AQ_k − Q_kD_k‖_F minimizes ‖AQ̄_k − Q̄_kS‖_F over all S ∈ ℝʳˣʳ,
showing that the columns of Q_k are the best possible basis to take after k steps from
the standpoint of minimizing the residual. This defines the Ritz acceleration idea:
    Q₀ ∈ ℝⁿˣʳ given with Q₀ᵀQ₀ = I_r
    for k = 1, 2, . . .
        Z_k = AQ_{k−1}
        Q̄_k R_k = Z_k        (QR factorization)
        S_k = Q̄_kᵀAQ̄_k
        U_kᵀS_kU_k = D_k     (Schur decomposition)
        Q_k = Q̄_kU_k
    end
It can be shown that if D_k = diag(θ₁^(k), . . . , θ_r^(k)) and |λ_r(A)| > |λ_{r+1}(A)|, then

    |θ_i^(k) − λ_i(A)| = O( |λ_{r+1}/λ_i|^k ),    i = 1:r.                  (8.3.6)
Recall that Theorem 8.2.2 says the eigenvalues of Q̄_kᵀAQ̄_k converge with rate |λ_{r+1}/λ_r|^k.
Thus, the Ritz values converge at a more favorable rate. For details, see Stewart (1969).
Problems
P8.3.1 Suppose λ is an eigenvalue of a symmetric tridiagonal matrix T. Show that if λ has algebraic
multiplicity k, then at least k − 1 of T's subdiagonal elements are zero.
P8.3.2 Suppose A is symmetric and has bandwidth p. Show that if we perform the shifted QR step
A − µI = QR, A = RQ + µI, then A has bandwidth p.
P8.3.3 Let

    A = [ w  x ]
        [ x  z ]

be real and suppose we perform the following shifted QR step: A − zI = UR, Ā = RU + zI. Show
that

    Ā = [ w̄  x̄ ]
        [ x̄  z̄ ]

where

    w̄ = w + x²(w − z)/[(w − z)² + x²],
    z̄ = z − x²(w − z)/[(w − z)² + x²],
    x̄ = −x³/[(w − z)² + x²].
P8.3.4 Suppose A ∈ ℂⁿˣⁿ is Hermitian. Show how to construct unitary Q such that QᴴAQ = T is
real, symmetric, and tridiagonal.
P8.3.5 Show that if A = B + iC is Hermitian, then

    M = [ B  −C ]
        [ C   B ]

is symmetric. Relate the eigenvalues and eigenvectors of A and M.
PB.3.6 Rewrite Algorithm 8.3.2 for the case when A is stored in two n-vectors. Justify the given flop
count.
P8.3.7 Suppose A = S + σuuᵀ where S ∈ ℝⁿˣⁿ is skew-symmetric (Sᵀ = −S), u ∈ ℝⁿ has unit
2-norm, and σ ∈ ℝ. Show how to compute an orthogonal Q such that QᵀAQ is tridiagonal and
Qᵀu = e₁.
P8.3.8 Suppose

    C = [ 0   Bᵀ ]
        [ B   0  ]

where B ∈ ℝⁿˣⁿ is upper bidiagonal. Determine a perfect shuffle permutation P ∈ ℝ²ⁿˣ²ⁿ so that
T = PCPᵀ is tridiagonal with a zero diagonal.
Notes and References for §8.3
Historically important Algol specifications related to the algorithms in this section include:
R.S. Martin and J.H. Wilkinson (1967). "Solution of Symmetric and Unsymmetric Band Equations
and the Calculation of Eigenvectors of Band Matrices," Numer. Math. 9, 279-301.
H. Bowdler, R.S. Martin, C. Reinsch, and J.H. Wilkinson (1968). "The QR and QL Algorithms for
Symmetric Matrices," Numer. Math. 11, 293-306.
A. Dubrulle, R.S. Martin, and J.H. Wilkinson (1968). "The Implicit QL Algorithm," Numer. Math.
12, 377-383.
R.S. Martin and J.H. Wilkinson (1968). "Householder's Tridiagonalization of a Symmetric Matrix,"
Numer. Math. 11, 181-195.
C. Reinsch and F.L. Bauer (1968). "Rational QR Transformation with Newton's Shift for Symmetric
Tridiagonal Matrices," Numer. Math. 11, 264-272.
R.S. Martin, C. Reinsch, and J.H. Wilkinson (1970). "The QR Algorithm for Band Symmetric Ma­
trices," Numer. Math. 16, 85-92.
The convergence properties of Algorithm 8.3.3 are detailed in Lawson and Hanson (SLE), see:
J.H. Wilkinson (1968). "Global Convergence of Tridiagonal QR Algorithm With Origin Shifts," Lin.
Alg. Applic. 1, 409-420.
T.J. Dekker and J.F. Traub (1971). "The Shifted QR Algorithm for Hermitian Matrices," Lin. Alg.
Applic. 4, 137-154.
W. Hoffman and B.N. Parlett (1978). "A New Proof of Global Convergence for the Tridiagonal QL
Algorithm," SIAM J. Numer. Anal. 15, 929-937.
S. Batterson (1994). "Convergence of the Francis Shifted QR Algorithm on Normal Matrices," Lin.
Alg. Applic. 207, 181-195.
T.-L. Wang (2001). "Convergence of the Tridiagonal QR Algorithm," Lin. Alg. Applic. 322, 1-17.
Shifting and deflation are critical to the effective implementation of the symmetric QR iteration, see:
F.L. Bauer and C. Reinsch (1968). "Rational QR Transformations with Newton Shift for Symmetric
Tridiagonal Matrices," Numer. Math. 11, 264-272.
G.W. Stewart (1970). "Incorporating Origin Shifts into the QR Algorithm for Symmetric Tridiagonal
Matrices," Commun. ACM 13, 365-367.
I.S. Dhillon and A.N. Malyshev (2003). "Inner Deflation for Symmetric Tridiagonal Matrices," Lin.
Alg. Applic. 358, 139-144.
The efficient reduction of a general band symmetric matrix to tridiagonal form is a challenging com­
putation from several standpoints:
H.R. Schwartz (1968). "Tridiagonalization of a Symmetric Band Matrix," Nv.mer. Math. 12, 231-241.
C.H. Bischof and X. Sun (1996). "On Tridiagonalizing and Diagonalizing Symmetric Matrices with
Repeated Eigenvalues," SIAM J. Matrix Anal. Applic. 17, 869-885.
L. Kaufman (2000). "Band Reduction Algorithms Revisited," ACM Trans. Math. Softw. 26, 551-567.
C.H. Bischof, B. Lang, and X. Sun (2000). "A Framework for Symmetric Band Reduction," ACM
Trans. Math. Softw. 26, 581-601.
Finally we mention that comparable techniques exist for skew-symmetric and general normal matrices,
see:
R.C. Ward and L.J. Gray (1978). "Eigensystem Computation for Skew-Symmetric and A Class of
Symmetric Matrices," ACM TI-ans. Math. Softw. 4, 278-285.
C.P. Huang (1981). "On the Convergence of the QR Algorithm with Origin Shifts for Normal Matri­
ces," IMA J. Numer. Anal. 1, 127-133.
S. Iwata (1998). "Block Triangularization of Skew-Symmetric Matrices," Lin. Alg. Applic. 273,
215-226.
8.4 More Methods for Tridiagonal Problems
In this section we develop special methods for the symmetric tridiagonal eigenproblem.
The tridiagonal form
        [ α₁   β₁                  ]
        [ β₁   α₂   ·              ]
    T = [       ·    ·    ·        ]                             (8.4.1)
        [            ·    ·   β_{n−1} ]
        [               β_{n−1}  α_n  ]
can be obtained by Householder reduction (cf. §8.3.1). However, symmetric tridiagonal
eigenproblems arise naturally in many settings.
We first discuss bisection methods that are of interest when selected portions of
the eigensystem are required. This is followed by the presentation of a divide-and­
conquer algorithm that can be used to acquire the full symmetric Schur decomposition
in a way that is amenable to parallel processing.
8.4.1 Eigenvalues by Bisection
Let T_r denote the leading r-by-r principal submatrix of the matrix T in (8.4.1). Define
the polynomial p_r(x) by

    p_r(x) = det(T_r − xI)

for r = 1:n. A simple determinantal expansion shows that

    p_r(x) = (α_r − x)·p_{r−1}(x) − β_{r−1}²·p_{r−2}(x)          (8.4.2)

for r = 2:n if we set p₀(x) = 1. Because p_n(x) can be evaluated in O(n) flops, it is
feasible to find its roots using the method of bisection. For example, if tol is a small
positive constant, p_n(y)·p_n(z) < 0, and y < z, then the iteration
    while |y − z| > tol·(|y| + |z|)
        x = (y + z)/2
        if p_n(x)·p_n(y) < 0
            z = x
        else
            y = x
        end
    end
is guaranteed to terminate with (y+z)/2 an approximate zero of p_n(x), i.e., an approximate
eigenvalue of T. The iteration converges linearly in that the error is approximately
halved at each step.
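The following sketch carries out this process, evaluating p_n(x) through the three-term recurrence (8.4.2). The arrays a and b hold the diagonal α₁,...,αₙ and subdiagonal β₁,...,βₙ₋₁ of T; the function names are ours. (For very large n the raw recurrence can over- or underflow; the count-based variant sketched in the next subsection is preferred in practice.)

```python
# Sketch of bisection on p_n(x) = det(T - xI) via recurrence (8.4.2).
def p_n(x, a, b):
    p_prev, p = 1.0, a[0] - x                  # p_0 and p_1
    for r in range(1, len(a)):
        p_prev, p = p, (a[r] - x) * p - b[r - 1] ** 2 * p_prev
    return p

def bisect_root(a, b, y, z, tol=1e-14):
    # Assumes y < z and p_n(y) * p_n(z) < 0, so a root lies in (y, z).
    while abs(y - z) > tol * (abs(y) + abs(z)):
        x = 0.5 * (y + z)
        if p_n(x, a, b) * p_n(y, a, b) < 0:
            z = x
        else:
            y = x
    return 0.5 * (y + z)
```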
8.4.2 Sturm Sequence Methods
Sometimes it is necessary to compute the kth largest eigenvalue of T for some prescribed
value of k. This can be done efficiently by using the bisection idea and the following
classical result:
Theorem 8.4.1 (Sturm Sequence Property). If the tridiagonal matrix in (8.4.1)
has no zero subdiagonal entries, then the eigenvalues of T_{r−1} strictly separate the eigenvalues
of T_r:

    λ_r(T_r) < λ_{r−1}(T_{r−1}) < λ_{r−1}(T_r) < · · · < λ_2(T_r) < λ_1(T_{r−1}) < λ_1(T_r).

Moreover, if a(λ) denotes the number of sign changes in the sequence

    { p₀(λ), p₁(λ), . . . , p_n(λ) },

then a(λ) equals the number of T's eigenvalues that are less than λ. Here, the polynomials
p_r(x) are defined by (8.4.2) and we have the convention that p_r(λ) has the
opposite sign from p_{r−1}(λ) if p_r(λ) = 0.
Proof. It follows from Theorem 8.1.7 that the eigenvalues of T_{r−1} weakly separate
those of T_r. To prove strict separation, suppose that p_r(µ) = p_{r−1}(µ) = 0 for some r
and µ. It follows from (8.4.2) and the assumption that the matrix T is unreduced that

    p₀(µ) = p₁(µ) = · · · = p_r(µ) = 0,

a contradiction. Thus, we must have strict separation. The assertion about a(λ) is
established in Wilkinson (AEP, pp. 300-301). □
Suppose we wish to compute λ_k(T). From the Gershgorin theorem (Theorem
8.1.3) it follows that λ_k(T) ∈ [y, z] where

    y = min_{1≤i≤n} α_i − |β_i| − |β_{i−1}|,    z = max_{1≤i≤n} α_i + |β_i| + |β_{i−1}|

and we have set β₀ = β_n = 0. Using [y, z] as an initial bracketing interval, it is clear
from the Sturm sequence property that the iteration
from the Sturm sequence property that the iteration
while lz - YI > u(IYI + lzl)
x = (y + z)/2
end
if a(x) ?:. n - k
Z = X
else
y = x
end
(8.4.3)
produces a sequence of subintervals that are repeatedly halved in length but which
always contain Ak(T).
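A sketch of (8.4.3) follows. sturm_count implements a(x), the number of sign changes in the Sturm sequence, using the zero-sign convention of Theorem 8.4.1; kth_largest_eigenvalue brackets λ_k(T) with the Gershgorin bounds above. As before, a and b hold T's diagonal and subdiagonal, and the helper names are ours.

```python
# Sketch of iteration (8.4.3): bisection for the kth largest eigenvalue of T.
import numpy as np

def sturm_count(x, a, b):
    """Sign changes in {p_0(x), ..., p_n(x)} = number of eigenvalues of T below x."""
    count, prev_sign = 0, 1            # p_0 = 1
    p_prev, p = 1.0, 1.0
    for r in range(len(a)):
        p_new = (a[r] - x) * p - (b[r - 1] ** 2 * p_prev if r > 0 else 0.0)
        sign = -prev_sign if p_new == 0.0 else (1 if p_new > 0 else -1)
        if sign != prev_sign:
            count += 1
        prev_sign, p_prev, p = sign, p, p_new
    return count

def kth_largest_eigenvalue(a, b, k, u=1e-15):
    bb = np.concatenate(([0.0], b, [0.0]))           # beta_0 = beta_n = 0
    y = min(a[i] - abs(bb[i + 1]) - abs(bb[i]) for i in range(len(a)))
    z = max(a[i] + abs(bb[i + 1]) + abs(bb[i]) for i in range(len(a)))
    while abs(z - y) > u * (abs(y) + abs(z)):
        x = 0.5 * (y + z)
        if sturm_count(x, a, b) >= len(a) - k:
            z = x
        else:
            y = x
    return 0.5 * (y + z)
```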
During the execution of (8.4.3), information about the location of other eigenvalues
is obtained. By systematically keeping track of this information it is possible
to devise an efficient scheme for computing contiguous subsets of λ(T), e.g.,
{λ_k(T), λ_{k+1}(T), . . . , λ_{k+j}(T)}. See Barth, Martin, and Wilkinson (1967).
If selected eigenvalues of a general symmetric matrix A are desired, then it is
necessary first to compute the tridiagonalization T = U₀ᵀAU₀ before the above bisection
schemes can be applied. This can be done using Algorithm 8.3.1 or by the Lanczos
algorithm discussed in §10.2. In either case, the corresponding eigenvectors can be
readily found via inverse iteration since tridiagonal systems can be solved in O(n)
flops. See §4.3.6 and §8.2.2.
In those applications where the original matrix A already has tridiagonal form,
bisection computes eigenvalues with small relative error, regardless of their magnitude.
This is in contrast to the tridiagonal QR iteration, where the computed eigenvalues λ̂_i
can be guaranteed only to have small absolute error: |λ̂_i − λ_i(T)| ≈ u‖T‖₂.
Finally, it is possible to compute specific eigenvalues of a symmetric matrix by
using the LDLᵀ factorization (§4.3.6) and exploiting the Sylvester inertia theorem
(Theorem 8.1.17). If

    A − µI = LDLᵀ

is the LDLᵀ factorization of A − µI with D = diag(d₁, . . . , d_n), then the number of
negative d_i equals the number of λ_i(A) that are less than µ. See Parlett (SEP, p. 46)
for details.
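Specialized to a tridiagonal matrix, the LDLᵀ pivots satisfy a simple scalar recurrence, which gives the numerically preferred way of computing the count a(µ) used in bisection. The sketch below (the helper name negcount is ours) counts the negative pivots of T − µI.

```python
# Sketch: number of eigenvalues of tridiagonal T (diagonal a, subdiagonal b)
# that are less than mu, via the pivots of the LDL^T factorization of T - mu*I.
def negcount(mu, a, b):
    count = 0
    d = a[0] - mu
    if d < 0:
        count += 1
    for i in range(1, len(a)):
        d = (a[i] - mu) - b[i - 1] ** 2 / d   # next pivot
        if d < 0:
            count += 1
    # (an exactly zero pivot is handled in practice by perturbing mu slightly)
    return count
```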
8.4.3 Eigensystems of Diagonal Plus Rank-1 Matrices
Our next method for the symmetric tridiagonal eigenproblem requires that we be able
to compute efficiently the eigenvalues and eigenvectors of a matrix of the form D+pzzT
where D E Rnxn is diagonal, z E Rn, and p E R. This problem is important in its own
right and the key computations rest upon the following pair of results.
Lemma 8.4.2. Suppose D = diag(d₁, . . . , d_n) ∈ ℝⁿˣⁿ with

    d₁ > · · · > d_n.

Assume that ρ ≠ 0 and that z ∈ ℝⁿ has no zero components. If

    (D + ρzzᵀ)v = λv,    v ≠ 0,

then zᵀv ≠ 0 and D − λI is nonsingular.
Proof. If λ ∈ λ(D), then λ = d_i for some i and thus

    0 = e_iᵀ[(D − λI)v + ρ(zᵀv)z] = ρ(zᵀv)z_i.

Since ρ and z_i are nonzero, it follows that 0 = zᵀv and so Dv = λv. However, D
has distinct eigenvalues and therefore v ∈ span{e_i}. This implies 0 = zᵀv = z_i, a
contradiction. Thus, D and D + ρzzᵀ have no common eigenvalues and zᵀv ≠ 0. □
Theorem 8.4.3. Suppose D = diag(d₁, . . . , d_n) ∈ ℝⁿˣⁿ and that the diagonal entries
satisfy d₁ > · · · > d_n. Assume that ρ ≠ 0 and that z ∈ ℝⁿ has no zero components. If
V ∈ ℝⁿˣⁿ is orthogonal such that

    Vᵀ(D + ρzzᵀ)V = diag(λ₁, . . . , λ_n)

with λ₁ ≥ · · · ≥ λ_n and V = [ v₁ | · · · | v_n ], then

(a) The λ_i are the n zeros of f(λ) = 1 + ρzᵀ(D − λI)⁻¹z.

(b) If ρ > 0, then λ₁ > d₁ > λ₂ > · · · > λ_n > d_n.
    If ρ < 0, then d₁ > λ₁ > d₂ > · · · > d_n > λ_n.

(c) The eigenvector v_i is a multiple of (D − λ_iI)⁻¹z.
Proof. If (D + ρzzᵀ)v = λv, then

    (D − λI)v + ρ(zᵀv)z = 0.                                     (8.4.4)

We know from Lemma 8.4.2 that D − λI is nonsingular. Thus,

    v ∈ span{(D − λI)⁻¹z},

thereby establishing (c). Moreover, if we apply zᵀ(D − λI)⁻¹ to both sides of equation
(8.4.4) we obtain

    (zᵀv)·(1 + ρzᵀ(D − λI)⁻¹z) = 0.

By Lemma 8.4.2, zᵀv ≠ 0 and so this shows that if λ ∈ λ(D + ρzzᵀ), then f(λ) = 0. We
must show that all the zeros of f are eigenvalues of D + ρzzᵀ and that the interlacing
relations (b) hold.
To do this we look more carefully at the equations

    f(λ)  = 1 + ρ( z₁²/(d₁ − λ) + · · · + z_n²/(d_n − λ) ),
    f′(λ) = ρ( z₁²/(d₁ − λ)² + · · · + z_n²/(d_n − λ)² ).

Note that f is monotone in between its poles. This allows us to conclude that, if ρ > 0,
then f has precisely n roots, one in each of the intervals

    (d₁, ∞), (d₂, d₁), . . . , (d_n, d_{n−1}).

If ρ < 0, then f has exactly n roots, one in each of the intervals

    (d₂, d₁), . . . , (d_n, d_{n−1}), (−∞, d_n).

Thus, in either case the zeros of f are exactly the eigenvalues of D + ρzzᵀ. □
The theorem suggests that in order to compute V we must find the roots λ₁, . . . , λ_n
of f using a Newton-like procedure and then compute the columns of V by normalizing
the vectors (D − λ_iI)⁻¹z for i = 1:n. The same plan of attack can be followed even if
there are repeated d_i and zero z_i.
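The sketch below carries out the root-finding task of Theorem 8.4.3 by plain bisection inside the interlacing intervals of part (b); production divide-and-conquer codes use a fast, carefully safeguarded rational iteration instead (see the references at the end of this section), and accurate eigenvectors require more care than the naive normalization shown here. The helper name secular_eig is ours.

```python
# Sketch: zeros of f(lambda) = 1 + rho * z^T (D - lambda I)^{-1} z and the
# corresponding (unrefined) eigenvectors of D + rho z z^T.
import numpy as np

def secular_eig(d, z, rho, iters=100):
    # d strictly decreasing, z with no zero components, rho != 0 (Theorem 8.4.3).
    f = lambda lam: 1.0 + rho * np.sum(z ** 2 / (d - lam))
    n = len(d)
    outer = abs(rho) * np.dot(z, z) + 1.0        # f has no zero farther out than this
    lam = np.empty(n)
    for i in range(n):
        if rho > 0:                              # lambda_1 > d_1, lambda_i in (d_i, d_{i-1})
            lo, hi = d[i], (d[0] + outer if i == 0 else d[i - 1])
        else:                                    # d_i > lambda_i, lambda_n < d_n
            lo, hi = (d[i + 1] if i < n - 1 else d[n - 1] - outer), d[i]
        for _ in range(iters):                   # f is monotone between its poles
            mid = 0.5 * (lo + hi)
            if np.sign(rho) * f(mid) < 0.0:
                lo = mid
            else:
                hi = mid
        lam[i] = 0.5 * (lo + hi)
    V = z[:, None] / (d[:, None] - lam[None, :]) # columns span (D - lambda_i I)^{-1} z
    V /= np.linalg.norm(V, axis=0)
    return lam, V
```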
Theorem 8.4.4. If D = diag(d₁, . . . , d_n) and z ∈ ℝⁿ, then there exists an orthogonal
matrix V₁ such that if V₁ᵀDV₁ = diag(µ₁, . . . , µ_n) and w = V₁ᵀz then

    µ₁ > µ₂ > · · · > µ_r ≥ µ_{r+1} ≥ · · · ≥ µ_n,

w_i ≠ 0 for i = 1:r, and w_i = 0 for i = r+1:n.
Proof. We give a constructive proof based upon two elementary operations. The first
deals with repeated diagonal entries while the second handles the situation when the
z-vector has a zero component.
Suppose d_i = d_j for some i < j. Let G(i, j, θ) be a Givens rotation in the (i, j)
plane with the property that the jth component of G(i, j, θ)ᵀz is zero. It is not hard
to show that G(i, j, θ)ᵀD G(i, j, θ) = D. Thus, we can zero a component of z if there
is a repeated d_i.
If z_i = 0, z_j ≠ 0, and i < j, then let P be the identity with columns i and j
interchanged. It follows that PᵀDP is diagonal, (Pᵀz)_i ≠ 0, and (Pᵀz)_j = 0. Thus,
we can permute all the zero z_i to the "bottom."
It is clear that the repetition of these two maneuvers will render the desired
canonical structure. The orthogonal matrix V₁ is the product of the rotations that are
required by the process. □
See Barlow {1993) and the references therein for a discussion of the solution procedures
that we have outlined above.
8.4.4 A Divide-and-Conquer Framework
We now present a divide-and-conquer method for computing the Schur decomposition

    QᵀTQ = diag(λ₁, . . . , λ_n)                                 (8.4.5)

for tridiagonal T that involves (a) "tearing" T in half, (b) computing the Schur decompositions
of the two parts, and (c) combining the two half-sized Schur decompositions
into the required full-size Schur decomposition. The overall procedure, developed by
Dongarra and Sorensen (1987), is suitable for parallel computation.
We first show how T can be "torn" in half with a rank-1 modification. For
simplicity, assume n = 2m and that T ∈ ℝⁿˣⁿ is given by (8.4.1). Define v ∈ ℝⁿ as
follows:

    v = [ e_m ]
        [ θe₁ ] ,    θ ∈ {−1, +1}.                               (8.4.6)

Note that for all ρ ∈ ℝ the matrix T̃ = T − ρvvᵀ is identical to T except in its "middle
four" entries:

    T̃(m:m+1, m:m+1) = [ α_m − ρ      β_m − ρθ     ]
                       [ β_m − ρθ     α_{m+1} − ρθ² ]
If we set ρθ = β_m, then

    T̃ = [ T₁  0  ]
        [ 0   T₂ ]

where

         [ α₁   β₁              ]            [ α̃_{m+1}  β_{m+1}            ]
         [ β₁   α₂   ·          ]            [ β_{m+1}  α_{m+2}   ·        ]
    T₁ = [       ·    ·  β_{m−1} ]  and  T₂ = [            ·     ·  β_{n−1} ]
         [          β_{m−1}  α̃_m ]            [               β_{n−1}  α_n  ]

and α̃_m = α_m − ρ and α̃_{m+1} = α_{m+1} − ρθ².
Now suppose that we have m-by-m orthogonal matrices Q₁ and Q₂ such that
Q₁ᵀT₁Q₁ = D₁ and Q₂ᵀT₂Q₂ = D₂ are each diagonal. If we set

    U = [ Q₁  0  ]
        [ 0   Q₂ ] ,

then

    UᵀTU = Uᵀ( [ T₁  0  ]  + ρvvᵀ ) U = D + ρzzᵀ
               [ 0   T₂ ]

where

    D = [ D₁  0  ]
        [ 0   D₂ ]

is diagonal and

    z = Uᵀv = [ Q₁ᵀe_m  ]
              [ θQ₂ᵀe₁  ] .
Comparing these equations we see that the effective synthesis of the two half-sized
Schur decompositions requires the quick and stable computation of an orthogonal V
such that

    Vᵀ(D + ρzzᵀ)V = Λ = diag(λ₁, . . . , λ_n)

which we discussed in §8.4.3.
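The following is a schematic one-level tear-and-combine step in NumPy. For brevity, numpy.linalg.eigh stands in both for the half-sized eigensolves and for the rank-one updated diagonal problem; a real implementation would solve D + ρzzᵀ with the secular-equation machinery of §8.4.3. The function names are ours.

```python
# Schematic divide-and-conquer step for tridiagonal T (diagonal a, subdiagonal b, n = 2m).
import numpy as np

def tridiag(a, b):
    return np.diag(a) + np.diag(b, 1) + np.diag(b, -1)

def dac_step(a, b, theta=1.0):
    n = len(a); m = n // 2
    rho = b[m - 1] / theta                     # rho * theta = beta_m
    a_mod = a.astype(float).copy()
    a_mod[m - 1] -= rho                        # alpha_m - rho
    a_mod[m]     -= rho * theta ** 2           # alpha_{m+1} - rho*theta^2
    D1, Q1 = np.linalg.eigh(tridiag(a_mod[:m], b[:m - 1]))
    D2, Q2 = np.linalg.eigh(tridiag(a_mod[m:], b[m:]))
    D = np.concatenate([D1, D2])
    z = np.concatenate([Q1[m - 1, :], theta * Q2[0, :]])        # z = U^T v
    lam, W = np.linalg.eigh(np.diag(D) + rho * np.outer(z, z))  # secular problem
    U = np.zeros((n, n)); U[:m, :m] = Q1; U[m:, m:] = Q2
    return lam, U @ W                          # Q^T T Q = diag(lam)
```

As a sanity check, the returned lam should agree with numpy.linalg.eigvalsh(tridiag(a, b)) up to roundoff.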
8.4.5 A Parallel Implementation
Having stepped through the tearing and synthesis operations, we can now illustrate how
the overall process can be implemented in parallel. For clarity, assume that n = 8N
for some positive integer N and that three levels of tearing are performed. See Figure
8.4.1. The indices are specified in binary and at each node the Schur decomposition of
a tridiagonal matrix T(b) is obtained from the eigensystems of the tridiagonals T(b0)
and T(b1). For example, the eigensystems for the N-by-N matrices T(110) and T(111)
are combined to produce the eigensystem for the 2N-by-2N tridiagonal matrix T(11).
What makes this framework amenable to parallel computation is the independence of
the tearing/synthesis problems that are associated with each level in the tree.
                                    T
                  T(0)                              T(1)
          T(00)         T(01)               T(10)         T(11)
      T(000) T(001)  T(010) T(011)      T(100) T(101)  T(110) T(111)

            Figure 8.4.1. The divide-and-conquer framework

8.4.6 An Inverse Tridiagonal Eigenvalue Problem
For additional perspective on symmetric tridiagonal matrices and their rich eigenstructure
we consider an inverse eigenvalue problem. Assume that λ₁, . . . , λ_n and
λ̃₁, . . . , λ̃_{n−1} are given real numbers that satisfy

    λ₁ > λ̃₁ > λ₂ > · · · > λ_{n−1} > λ̃_{n−1} > λ_n.             (8.4.7)

The goal is to compute a symmetric tridiagonal matrix T ∈ ℝⁿˣⁿ such that

    λ(T) = {λ₁, . . . , λ_n},                                    (8.4.8)
    λ(T(2:n, 2:n)) = {λ̃₁, . . . , λ̃_{n−1}}.                      (8.4.9)
Inverse eigenvalue problems arise in many applications and generally involve computing
a matrix that has specified spectral properties. For an overview, see Chu and Golub
(2005). Our example is taken from Golub (1973).
The problem we are considering can be framed as a Householder tridiagonalization
problem with a constraint on the orthogonal transformation. Define

    Λ = diag(λ₁, . . . , λ_n)

and let Q be orthogonal so that QᵀΛQ = T is tridiagonal. There are an infinite number
of possible Q-matrices that do this and in each case the matrix T satisfies (8.4.8). The
challenge is to choose Q so that (8.4.9) holds as well. Recall that a tridiagonalizing Q is
essentially determined by its first column because of the implicit Q theorem (Theorem
8.3.2). Thus, the problem is solved if we can figure out a way to compute Q(:, 1) so
that (8.4.9) holds.
The starting point in the derivation of the method is to realize that the eigenvalues
of T(2:n, 2:n) are the stationary values of xᵀTx subject to the constraints xᵀx = 1
and e₁ᵀx = 0. To characterize these stationary values we use the method of Lagrange
multipliers and set to zero the gradient of

    φ(x, λ, µ) = xᵀTx − λ(xᵀx − 1) + 2µxᵀe₁
which gives (T − λI)x = −µe₁. Because λ is an eigenvalue of T(2:n, 2:n) it is not an
eigenvalue of T and so x = −µ(T − λI)⁻¹e₁. Since e₁ᵀx = 0, it follows that

    d₁²/(λ₁ − λ) + · · · + d_n²/(λ_n − λ) = 0                    (8.4.10)

where

    d = Q(:, 1) = [ d₁, . . . , d_n ]ᵀ.                          (8.4.11)
By multiplying both sides of equation (8.4.10) by (λ₁ − λ) · · · (λ_n − λ), we can conclude
that λ̃₁, . . . , λ̃_{n−1} are the zeros of the polynomial

    p(λ) = Σ_{i=1:n} d_i² Π_{j≠i} (λ_j − λ).

It follows that

    p(λ) = a · Π_{j=1:n−1} (λ̃_j − λ)
for some scalar a. By comparing the coefficient of λⁿ⁻¹ in each of these expressions
for p(λ) and noting from (8.4.11) that d₁² + · · · + d_n² = 1, we see that a = 1. From the
equation

    Σ_{i=1:n} d_i² Π_{j≠i} (λ_j − λ) = Π_{j=1:n−1} (λ̃_j − λ)

we immediately see that

    d_k² = Π_{j=1:n−1} (λ̃_j − λ_k) / Π_{j≠k} (λ_j − λ_k),    k = 1:n.     (8.4.12)
It is easy to show using (8.4.7) that the quantity on the right is positive and thus
(8.4.12) can be used to determine the components of d = Q(:, 1) up to a factor
of ±1. Once this vector is available, then we can determine the required tridiagonal
matrix T as follows:

Step 1. Let P be a Householder matrix so that Pd = ±e₁ and set Λ̃ = PᵀΛP.

Step 2. Compute the tridiagonalization Q₁ᵀΛ̃Q₁ = T via Algorithm 8.3.1 and observe
        from the implementation that Q₁(:, 1) = e₁.

Step 3. Set Q = PQ₁.
It follows that Q(:, 1) = P(Q₁e₁) = Pe₁ = ±d. The sign does not matter.
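A compact sketch of this three-step construction follows. The weights d are computed from (8.4.12), and scipy.linalg.hessenberg stands in for Algorithm 8.3.1: for a symmetric input the Hessenberg form is tridiagonal (up to roundoff) and the accumulated Q₁ satisfies Q₁(:, 1) = e₁. The function name inverse_tridiag is ours.

```python
# Sketch: build T with spectrum lam and with T(2:n,2:n) having spectrum lam_tilde.
import numpy as np
from scipy.linalg import hessenberg

def inverse_tridiag(lam, lam_tilde):
    # lam: lambda_1 > ... > lambda_n; lam_tilde: interlacing lambda~_1 > ... > lambda~_{n-1}
    n = len(lam)
    d = np.empty(n)
    for k in range(n):
        num = np.prod(lam_tilde - lam[k])
        den = np.prod(np.delete(lam, k) - lam[k])
        d[k] = np.sqrt(num / den)                   # (8.4.12); the ratio is positive
    # Step 1: Householder P with P d = +-e1, then Lambda~ = P^T Lambda P.
    v = d.copy()
    v[0] += (1.0 if d[0] >= 0 else -1.0) * np.linalg.norm(d)
    P = np.eye(n) - 2.0 * np.outer(v, v) / np.dot(v, v)
    A = P.T @ np.diag(lam) @ P
    # Step 2: tridiagonalize with Q1 e1 = e1;  Step 3: Q = P Q1.
    T, Q1 = hessenberg(A, calc_q=True)
    return T, P @ Q1
```

As a check, np.sort(np.linalg.eigvalsh(T)) should reproduce the λ's and np.linalg.eigvalsh(T[1:, 1:]) the λ̃'s, up to roundoff.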
Problems
P8.4.1 Suppose λ is an eigenvalue of a symmetric tridiagonal matrix T. Show that if λ has algebraic
multiplicity k, then T has at least k − 1 subdiagonal entries that are zero.
P8.4.2 Give an algorithm for determining ρ and θ in (8.4.6) with the property that θ ∈ {−1, 1} and
min{ |α_m − ρ|, |α_{m+1} − ρ| } is maximized.
P8.4.3 Let p_r(λ) = det(T(1:r, 1:r) − λI_r) where T is given by (8.4.1). Derive a recursion for evaluating
p_r′(λ) and use it to develop a Newton iteration that can compute eigenvalues of T.
PS.4.4 If T is positive definite, does it follow that the matrices T1 and T2 in §8.4.4 are positive
definite?
P8.4.5 Suppose A = S + σuuᵀ where S ∈ ℝⁿˣⁿ is skew-symmetric, u ∈ ℝⁿ, and σ ∈ ℝ. Show how to
compute an orthogonal Q such that QᵀAQ = T + σe₁e₁ᵀ where T is tridiagonal and skew-symmetric.
P8.4.6 Suppose λ is a known eigenvalue of an unreduced symmetric tridiagonal matrix T ∈ ℝⁿˣⁿ.
Show how to compute x(1:n−1) from the equation Tx = λx given that x_n = 1.
PS.4.7 Verify that the quantity on the right-hand side of (8.4.12) is positive.
P8.4.8 Suppose that

    A = [ D    v   ]
        [ vᵀ   d_n ]

where D = diag(d₁, . . . , d_{n−1}) has distinct diagonal entries and v ∈ ℝⁿ⁻¹ has no zero entries. (a)
Show that if λ ∈ λ(A), then D − λI_{n−1} is nonsingular. (b) Show that if λ ∈ λ(A), then λ is a zero of
f(λ) = (d_n − λ) − vᵀ(D − λI_{n−1})⁻¹v.
Notes and References for §8.4
Bisection/Sturm sequence methods are discussed in:
W. Barth, R.S. Martin, and J.H. Wilkinson (1967). "Calculation of the Eigenvalues of a Symmetric
Tridiagonal Matrix by the Method of Bisection," Numer. Math. 9, 386-393.
K.K. Gupta (1972). "Solution of Eigenvalue Problems by Sturm Sequence Method," Int. J. Numer.
Meth. Eng. 4, 379--404.
J.W. Demmel, I.S. Dhillon, and H. Ren (1994) "On the Correctness of Parallel Bisection in Floating
Point," ETNA 3, 116-149.
Early references concerned with the divide-and-conquer framework that we outlined include:
J.R. Bunch, C.P. Nielsen, and D.C. Sorensen (1978). "Rank-One Modification of the Symmetric
Eigenproblem," Numer. Math. 31, 31-48.
J.J.M. Cuppen (1981). "A Divide and Conquer Method for the Symmetric Eigenproblem," Numer.
Math. 36, 177-195.
J.J. Dongarra and D.C. Sorensen (1987). "A Fully Parallel Algorithm for the Symmetric Eigenvalue
Problem," SIAM J. Sci. Stat. Comput. 8, S139-S154.
Great care must be taken to ensure orthogonality in the computed matrix of eigenvectors, something
that is a major challenge when the eigenvalues are close and clustered. The development of reliable
implementations is a classic tale that involves a mix of sophisticated theory and clever algorithmic
insights, see:
M. Gu and S.C. Eisenstat (1995). "A Divide-and-Conquer Algorithm for the Symmetric Tridiagonal
Eigenproblem," SIAM J. Matrix Anal. Applic. 16, 172-191.
B.N. Parlett (1996). "Invariant Subspaces for Tightly Clustered Eigenvalues of Tridiagonals," BIT
36, 542-562.
B.N. Parlett and I.S. Dhillon (2000). "Relatively Robust Representations of Symmetric Tridiagonals,''
Lin. Alg. Applic. 309, 121-151.
I.S. Dhillon and B.N. Parlett (2003). "Orthogonal Eigenvectors and Relative Gaps," SIAM J. Matrix
Anal. Applic. 25, 858-899.
l.S. Dhillon and B.N. Parlett (2004). "Multiple Representations to Compute Orthogonal Eigenvectors
of Symmetric Tridiagonal Matrices," Lin. Alg. Applic. 387, 1-28.
O.A. Marques, B.N. Parlett, and C. Vomel (2005). "Computations of Eigenpair Subsets with the
MRRR Algorithm," Numer. Lin. Alg. Applic. 13, 643-653.
P. Bientinesi, LS. Dhillon, and R.A. van de Geijn (2005). "A Parallel Eigensolver for Dense Symmetric
Matrices Based on Multiple Relatively Robust Representations," SIAM J. Sci. Comput. 27, 43-66.
Various extensions and generalizations of the basic idea have also been proposed:
S. Huss-Lederman, A. Tsao, and T. Turnbull (1997). "A Parallelizable Eigensolver for Real Diago­
nalizable Matrices with Real Eigenvalues," SIAM .J. Sci. Comput. 18, 869-885.
B. Hendrickson, E. Jessup, and C. Smith (1998). "Toward an Efficient Parallel Eigensolver for Dense
Symmetric Matrices," SIAM J. Sci. Comput. 20, 1132-1154.
W.N. Gansterer, J. Schneid, and C.W. Ueberhuber (2001). "A Low-Complexity Divide-and-Conquer
Method for Computing Eigenvalues and Eigenvectors of Symmetric Band Matrices," BIT 41, 967-
976.
W.N. Gansterer, R.C. Ward, and R.P. Muller (2002). "An Extension of the Divide-and-Conquer
Method for a Class of Symmetric Block-Tridiagonal Eigenproblems,'' ACM Trans. Math. Softw.
28, 45-58.
W.N. Gansterer, R.C. Ward, R.P. Muller, and W.A. Goddard and III (2003). "Computing Approxi­
mate Eigenpairs of Symmetric Block Tridiagonal Matrices," SIAM J. Sci. Comput. 24, 65-85.
Y. Bai and R.C. Ward (2007). "A Parallel Symmetric Block-Tridiagonal Divide-and-Conquer Algo­
rithm," A CM Trans. Math. Softw. 33, Article 35.
For a detailed treatment of various inverse eigenvalue problems, see:
M.T. Chu and G.H. Golub (2005). Inverse Eigenvalue Problems, Oxford University Press, Oxford,
U.K.
Selected papers that discuss a range of inverse eigenvalue problems include:
D. Boley and G.H. Golub (1987). "A Survey of Matrix Inverse Eigenvalue Problems,'' Inverse Problems
3, 595- 622.
M.T. Chu (1998). "Inverse Eigenvalue Problems," SIAM Review 40, 1-39.
C.-K. Li and R. Mathias (2001). "Construction of Matrices with Prescribed Singular Values and
Eigenvalues," BIT 41, 115-126.
The derivation in §8.4.6 involved the constrained optimization of a quadratic form, an important
problem in its own right, see:
G.H. Golub and R. Underwood (1970). "Stationary Values of the Ratio of Quadratic Forms Subject
to Linear Constraints," Z. Angew. Math. Phys. 21, 318-326.
G.H. Golub (1973). "Some Modified Eigenvalue Problems," SIAM Review 15, 318--334.
S. Leon (1994). "Maximizing Bilinear Forms Subject to Linear Constraints," Lin. Alg. Applic. 210,
49-58.
8.5 Jacobi Methods
Jacobi methods for the symmetric eigenvalue problem attract current attention be­
cause they are inherently parallel. They work by performing a sequence of orthogonal
similarity updates A ← QᵀAQ with the property that each new A, although full, is
"more diagonal" than its predecessor. Eventually, the off-diagonal entries are small
enough to be declared zero.
After surveying the basic ideas behind the Jacobi approach we develop a parallel
Jacobi procedure.
8.5.1 The Jacobi Idea
The idea behind Jacobi's method is to systematically reduce the quantity

    off(A) = ( Σ_{i=1:n} Σ_{j≠i} a_{ij}² )^{1/2},
i.e., the Frobenius norm of the off-diagonal elements. The tools for doing this are
rotations of the form
                  [ 1  · · ·  0  · · ·  0  · · ·  0 ]
                  [ :         :         :         : ]
                  [ 0  · · ·  c  · · ·  s  · · ·  0 ]   p
    J(p, q, θ) =  [ :         :         :         : ]
                  [ 0  · · · −s  · · ·  c  · · ·  0 ]   q
                  [ :         :         :         : ]
                  [ 0  · · ·  0  · · ·  0  · · ·  1 ]
                             p         q
which we call Jacobi rotations. Jacobi rotations are no different from Givens rotations;
see §5.1.8. We submit to the name change in this section to honor the inventor.
The basic step in a Jacobi eigenvalue procedure involves (i) choosing an index
pair (p, q) that satisfies 1 ≤ p < q ≤ n, (ii) computing a cosine-sine pair (c, s) such that

    [ b_pp  b_pq ]   [ c   s ]ᵀ [ a_pp  a_pq ] [ c   s ]
    [ b_qp  b_qq ] = [ −s  c ]  [ a_qp  a_qq ] [ −s  c ]         (8.5.1)

is diagonal, and (iii) overwriting A with B = JᵀAJ where J = J(p, q, θ). Observe that
the matrix B agrees with A except in rows and columns p and q. Moreover, since the
Frobenius norm is preserved by orthogonal transformations, we find that

    a_pp² + a_qq² + 2a_pq² = b_pp² + b_qq² + 2b_pq² = b_pp² + b_qq².
It follows that

    off(B)² = ‖B‖_F² − Σ_{i=1:n} b_ii²
            = ‖A‖_F² − Σ_{i=1:n} a_ii² + (a_pp² + a_qq² − b_pp² − b_qq²)      (8.5.2)
            = off(A)² − 2a_pq².
It is in this sense that A moves closer to diagonal form with each Jacobi step.
Before we discuss how the index pair (p, q) can be chosen, let us look at the actual
computations associated with the (p, q) subproblem.
8.5.2 The 2-by-2 Symmetric Schur Decomposition
To say that we diagonalize in (8.5.1) is to say that

    0 = b_pq = a_pq(c² − s²) + (a_pp − a_qq)cs.                  (8.5.3)

If a_pq = 0, then we just set c = 1 and s = 0. Otherwise, define

    τ = (a_qq − a_pp)/(2a_pq)    and    t = s/c

and conclude from (8.5.3) that t = tan(θ) solves the quadratic

    t² + 2τt − 1 = 0.

It turns out to be important to select the smaller of the two roots:

    t_min = 1/(τ + sqrt(1 + τ²))  if τ ≥ 0,
            1/(τ − sqrt(1 + τ²))  if τ < 0.
This implies that the rotation angle satisfies |θ| ≤ π/4 and has the effect of maximizing c:

    c = 1/sqrt(1 + t_min²),    s = t_min·c.

This in turn minimizes the difference between A and the update B:

    ‖B − A‖_F² = 4(1 − c) Σ_{i≠p,q} (a_ip² + a_iq²) + 2a_pq²/c².
We summarize the 2-by-2 computations as follows:
Algorithm 8.5.1 Given an n-by-n symmetric A and integers p and q that satisfy
1 ≤ p < q ≤ n, this algorithm computes a cosine-sine pair (c, s) such that if B =
J(p, q, θ)ᵀAJ(p, q, θ), then b_pq = b_qp = 0.

    function [c, s] = symSchur2(A, p, q)
    if A(p, q) ≠ 0
        τ = (A(q, q) − A(p, p))/(2A(p, q))
        if τ ≥ 0
            t = 1/(τ + sqrt(1 + τ²))
        else
            t = 1/(τ − sqrt(1 + τ²))
        end
        c = 1/sqrt(1 + t²),  s = tc
    else
        c = 1, s = 0
    end
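For reference, here is a direct NumPy transcription of Algorithm 8.5.1 (the Python name sym_schur2 is ours):

```python
import numpy as np

def sym_schur2(A, p, q):
    # Cosine-sine pair (c, s) that zeros the (p, q) entry of J^T A J.
    if A[p, q] != 0.0:
        tau = (A[q, q] - A[p, p]) / (2.0 * A[p, q])
        if tau >= 0.0:
            t = 1.0 / (tau + np.sqrt(1.0 + tau * tau))
        else:
            t = 1.0 / (tau - np.sqrt(1.0 + tau * tau))
        c = 1.0 / np.sqrt(1.0 + t * t)
        s = t * c
    else:
        c, s = 1.0, 0.0
    return c, s
```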
8.5.3 The Classical Jacobi Algorithm
As we mentioned above, only rows and columns p and q are altered when the (p, q)
subproblem is solved. Once symSchur2 determines the 2-by-2 rotation, then the update
A ← J(p, q, θ)ᵀAJ(p, q, θ) can be implemented in 6n flops if symmetry is exploited.
How do we choose the indices p and q? From the standpoint of maximizing the
reduction of off(A) in (8.5.2), it makes sense to choose (p, q) so that a_pq² is maximal.
This is the basis of the classical Jacobi algorithm.
Algorithm 8.5.2 (Classical Jacobi) Given a symmetric A ∈ ℝⁿˣⁿ and a positive
tolerance tol, this algorithm overwrites A with VᵀAV where V is orthogonal and
off(VᵀAV) ≤ tol·‖A‖_F.

    V = I_n, δ = tol·‖A‖_F
    while off(A) > δ
        Choose (p, q) so |a_pq| = max_{i≠j} |a_ij|
        [c, s] = symSchur2(A, p, q)
        A = J(p, q, θ)ᵀ A J(p, q, θ)
        V = V·J(p, q, θ)
    end
Since |a_pq| is the largest off-diagonal entry,

    off(A)² ≤ N(a_pq² + a_qp²)

where

    N = n(n − 1)/2.

From (8.5.2) it follows that

    off(B)² ≤ (1 − 1/N)·off(A)².

By induction, if A^(k) denotes the matrix A after k Jacobi updates, then

    off(A^(k))² ≤ (1 − 1/N)^k · off(A^(0))².

This implies that the classical Jacobi procedure converges at a linear rate.
However, the asymptotic convergence rate of the method is considerably better
than linear. Schonhage (1964) and van Kempen (1966) show that for k large enough,
there is a constant c such that

    off(A^(k+N)) ≤ c·off(A^(k))²,

i.e., quadratic convergence. An earlier paper by Henrici (1958) established the same
result for the special case when A has distinct eigenvalues. In the convergence theory
for the Jacobi iteration, it is critical that |θ| ≤ π/4. Among other things this precludes
the possibility of interchanging nearly converged diagonal entries. This follows from
the formulae b_pp = a_pp − t·a_pq and b_qq = a_qq + t·a_pq, which can be derived from Equation
(8.5.1) and the definition t = sin(θ)/cos(θ).
It is customary to refer to N Jacobi updates as a sweep. Thus, after a sufficient
number of iterations, quadratic convergence is observed when examining off(A) after
every sweep.
There is no rigorous theory that enables one to predict the number of sweeps that
are required to achieve a specified reduction in off(A). However, Brent and Luk (1985)
have argued heuristically that the number of sweeps is proportional to log(n) and this
seems to be the case in practice.
8.5.4 The Cyclic-by-Row Algorithm
The trouble with the classical Jacobi method is that the updates involve O(n) flops
while the search for the optimal (p, q) is O(n²). One way to address this imbalance is
to fix the sequence of subproblems to be solved in advance. A reasonable possibility is
to step through all the subproblems in row-by-row fashion. For example, if n = 4 we
cycle as follows:

    (p, q) = (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (1, 2), . . .
This ordering scheme is referred to as cyclic by row and it results in the following
procedure:
Algorithm 8.5.3 (Cyclic Jacobi) Given a symmetric matrix A ∈ ℝⁿˣⁿ and a positive
tolerance tol, this algorithm overwrites A with VᵀAV where V is orthogonal and
off(VᵀAV) ≤ tol·‖A‖_F.

    V = I_n, δ = tol·‖A‖_F
    while off(A) > δ
        for p = 1:n−1
            for q = p+1:n
                [c, s] = symSchur2(A, p, q)
                A = J(p, q, θ)ᵀ A J(p, q, θ)
                V = V·J(p, q, θ)
            end
        end
    end
The cyclic Jacobi algorithm also converges quadratically. (See Wilkinson (1962) and
van Kempen (1966).) However, since it does not require off-diagonal search, it is
considerably faster than Jacobi's original algorithm.
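A minimal NumPy version of Algorithm 8.5.3 is sketched below. It reuses the sym_schur2 helper given earlier and builds each rotation as a full matrix for clarity; an efficient code would update only rows and columns p and q.

```python
import numpy as np

def off(A):
    return np.sqrt(np.sum(A ** 2) - np.sum(np.diag(A) ** 2))

def cyclic_jacobi(A, tol=1e-12):
    A = A.astype(float).copy()
    n = A.shape[0]
    V = np.eye(n)
    delta = tol * np.linalg.norm(A, 'fro')
    while off(A) > delta:
        for p in range(n - 1):
            for q in range(p + 1, n):
                c, s = sym_schur2(A, p, q)
                J = np.eye(n)
                J[p, p] = J[q, q] = c
                J[p, q], J[q, p] = s, -s            # J(p, q, theta)
                A = J.T @ A @ J
                V = V @ J
    return np.diag(A), V                            # eigenvalues, eigenvectors
```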
8.5.5 Error Analysis
Using Wilkinson's error analysis it is possible to show that if r sweeps are required by
Algorithm 8.5.3 and d1, . . . , dn specify the diagonal entries of the final, computed A
8.5. Jacobi Methods 481
matrix, then
n
i=l
for some ordering of A's eigenvalues Ai· The parameter kr depends mildly on r.
Although the cyclic Jacobi method converges quadratically, it is not generally
competitive with the symmetric QR algorithm. For example, if we just count flops, then
two sweeps of Jacobi are roughly equivalent to a complete QR reduction to diagonal
form with accumulation of transformations. However, for small n this liability is not
very dramatic. Moreover, if an approximate eigenvector matrix V is known, then
VᵀAV is almost diagonal, a situation that Jacobi can exploit but not QR.
Another interesting feature of the Jacobi method is that it can compute the
eigenvalues with small relative error if A is positive definite. To appreciate this point,
note that the Wilkinson analysis cited above coupled with the §8.1 perturbation theory
ensures that the computed eigenvalues λ̂₁ ≥ · · · ≥ λ̂_n satisfy

    |λ̂_i − λ_i| / |λ_i| ≈ u·κ₂(A).

However, a refined, componentwise error analysis by Demmel and Veselic (1992) shows
that in the positive definite case

    |λ̂_i − λ_i| / |λ_i| ≈ u·κ₂(D⁻¹AD⁻¹)                          (8.5.4)

where D = diag(sqrt(a₁₁), . . . , sqrt(a_nn)) and this is generally a much smaller approximating
bound. The key to establishing this result is some new perturbation theory and a
demonstration that if A₊ is a computed Jacobi update obtained from the current
matrix A_c, then the eigenvalues of A₊ are relatively close to the eigenvalues of A_c
in the sense of (8.5.4). To make the whole thing work in practice, the termination
criterion is not based upon the comparison of off(A) with u‖A‖_F but rather on the
size of each |a_ij| compared to u·sqrt(a_ii·a_jj).
8.5.6 Block Jacobi Procedures
It is usually the case when solving the symmetric eigenvalue problem on a p-processor
machine that n » p. In this case a block version of the Jacobi algorithm may be
appropriate. Block versions of the above procedures are straightforward. Suppose that
n = rN and that we partition the n-by-n matrix A as follows:

    A = [ A₁₁  · · ·  A₁N ]
        [  :    ·      :  ]
        [ A_N1 · · ·  A_NN ]

Here, each A_ij is r-by-r. In a block Jacobi procedure the (p, q) subproblem involves
computing the 2r-by-2r Schur decomposition

    [ V_pp  V_pq ]ᵀ [ A_pp  A_pq ] [ V_pp  V_pq ]   [ D_pp   0   ]
    [ V_qp  V_qq ]  [ A_qp  A_qq ] [ V_qp  V_qq ] = [  0    D_qq ]

and then applying to A the block Jacobi rotation made up of the V_ij. If we call this
block rotation V, then it is easy to show that

    off(VᵀAV)² = off(A)² − ( 2‖A_pq‖_F² + off(A_pp)² + off(A_qq)² ).
Block Jacobi procedures have many interesting computational aspects. For example,
there are several ways to solve the subproblems, and the choice appears to be critical.
See Bischof (1987).
8.5.7 A Note on the Parallel Ordering
The block Jacobi approach to the symmetric eigenvalue problem has an inherent parallelism
that has attracted significant attention. The key observation is that the (i₁, j₁)
subproblem is independent of the (i₂, j₂) subproblem if the four indices i₁, j₁, i₂, and
j₂ are distinct. Moreover, if we regard A as a 2m-by-2m block matrix, then it
is possible to partition the set of off-diagonal index pairs into a collection of 2m − 1
rotation sets, each of which identifies m nonconflicting subproblems.
A good way to visualize this is to imagine a chess tournament with 2m players in
which everybody must play everybody else exactly once. Suppose m = 4. In "round
1" we have Player 1 versus Player 2, Player 3 versus Player 4, Player 5 versus Player
6, and Player 7 versus Player 8. Thus, there are four tables of action:

    [ 1 ]  [ 3 ]  [ 5 ]  [ 7 ]
    [ 2 ]  [ 4 ]  [ 6 ]  [ 8 ]

This corresponds to the first rotation set:

    rot.set(1) = { (1, 2), (3, 4), (5, 6), (7, 8) }.

To set up rounds 2 through 7, Player 1 stays put and Players 2 through 8 move from
table to table in merry-go-round fashion, giving

    rot.set(2) = { (1, 4), (2, 6), (3, 8), (5, 7) },
    rot.set(3) = { (1, 6), (4, 8), (2, 7), (3, 5) },
    rot.set(4) = { (1, 8), (6, 7), (4, 5), (2, 3) },
    rot.set(5) = { (1, 7), (5, 8), (3, 6), (2, 4) },
    rot.set(6) = { (1, 5), (3, 7), (2, 8), (4, 6) },
    rot.set(7) = { (1, 3), (2, 5), (4, 7), (6, 8) }.
Taken in order, the seven rotation sets define the parallel ordering of the 28 possible
off-diagonal index pairs.
For general m, a multiprocessor implementation would involve solving the sub­
problems within each rotation set in parallel. Although the generation of the subprob­
lem rotations is independent, some synchronization is required to carry out the block
similarity transform updates.
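The merry-go-round schedule is easy to generate for general m. The sketch below reproduces the rotation sets described above (for m = 4 it returns exactly rot.set(1) through rot.set(7)); the helper name rotation_sets is ours.

```python
# Round-robin ("chess tournament") generation of the 2m-1 rotation sets.
def rotation_sets(m):
    top = list(range(1, 2 * m, 2))       # players seated on top:    1, 3, ..., 2m-1
    bot = list(range(2, 2 * m + 1, 2))   # players seated on bottom: 2, 4, ..., 2m
    sets = []
    for _ in range(2 * m - 1):
        sets.append([tuple(sorted((top[i], bot[i]))) for i in range(m)])
        # Player top[0] stays put; everyone else moves one seat around the tables.
        ring = top[1:] + bot[::-1]
        ring = [ring[-1]] + ring[:-1]
        top = [top[0]] + ring[:m - 1]
        bot = ring[m - 1:][::-1]
    return sets
```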
Problems
P8.5.1 Let the scalar γ be given along with the matrix

    A = [ w  x ]
        [ x  z ] .

It is desired to compute an orthogonal matrix

    J = [ c   s ]
        [ −s  c ]

such that the (1, 1) entry of JᵀAJ equals γ. Show that this requirement leads to the equation

    (w − γ)τ² − 2xτ + (z − γ) = 0,

where τ = c/s. Verify that this quadratic has real roots if γ satisfies λ₂ ≤ γ ≤ λ₁, where λ₁ and λ₂
are the eigenvalues of A.
P8.5.2 Let A ∈ ℝⁿˣⁿ be symmetric. Give an algorithm that computes the factorization

    QᵀAQ = γI + F

where Q is a product of Jacobi rotations, γ = tr(A)/n, and F has zero diagonal entries. Discuss the
uniqueness of Q.
P8.5.3 Formulate Jacobi procedures for (a) skew-symmetric matrices and (b) complex Hermitian
matrices.
P8.5.4 Partition the n-by-n real symmetric matrix A as follows:

    A = [ α   vᵀ  ]  1
        [ v   A₁  ]  n−1
          1   n−1

Let Q be a Householder matrix such that if B = QᵀAQ, then B(3:n, 1) = 0. Let J = J(1, 2, θ) be
determined such that if C = JᵀBJ, then c₁₂ = 0 and c₁₁ ≥ c₂₂. Show c₁₁ ≥ α + ‖v‖₂. La Budde
(1964) formulated an algorithm for the symmetric eigenvalue problem based upon repetition of this
Householder-Jacobi computation.
P8.5.5 When implementing the cyclic Jacobi algorithm, it is sensible to skip the annihilation of apq
if its modulus is less than some small, sweep-dependent parameter because the net reduction in off(A)
is not worth the cost. This leads to what is called the threshold Jacobi method. Details concerning
this variant of Jacobi's algorithm may be found in Wilkinson (AEP, p. 277). Show that appropriate
thresholding can guarantee convergence.
P8.5.6 Given a positive integer m, let M = (2m − 1)m. Develop an algorithm for computing integer
vectors i, j ∈ ℝᴹ so that (i₁, j₁), . . . , (i_M, j_M) defines the parallel ordering.
Notes and References for §8.5
Jacobi's original paper is one of the earliest references found in the numerical analysis literature:
C.G.J. Jacobi (1846). "Über ein leichtes Verfahren, die in der Theorie der Säcularstörungen vorkommenden
Gleichungen numerisch aufzulösen," Crelle's J. 90, 51-94.
Prior to the QR algorithm, the Jacobi technique was the standard method for solving dense symmetric
eigenvalue problems. Early references include:
M. Lotkin (1956). "Characteristic Values of Arbitrary Matrices," Quart. Appl. Math. 14, 267-275.
D.A. Pope and C. Tompkins (1957). "Maximizing Functions of Rotations: Experiments Concerning
Speed of Diagonalization of Symmetric Matrices Using Jacobi's Method,'' J. ACM 4, 459-466.
C.D. La Budde (1964). "Two Classes of Algorithms for Finding the Eigenvalues and Eigenvectors of
Real Symmetric Matrices," J. A CM 11, 53-58.
H. Rutishauser (1966). "The Jacobi Method for Real Symmetric Matrices,'' Numer. Math. 9, 1-10.
See also Wilkinson (AEP, p. 265) and:
J.H. Wilkinson (1968). "Almost Diagonal Matrices with Multiple or Close Eigenvalues," Lin. Alg.
Applic. 1, 1-12.
Papers that are concerned with quadratic convergence include:
P. Henrici (1958). "On the Speed of Convergence of Cyclic and Quasicyclic .Jacobi Methods for
Computing the Eigenvalues of Hermitian Matrices," SIAM J. Appl. Math. 6, 144-162.
E.R. Hansen (1962). "On Quasicyclic Jacobi Methods," J. ACM 9, 118 135.
J.H. Wilkinson (1962). "Note on the Quadratic Convergence of the Cyclic Jacobi Process," Numer.
Math. 6, 296-300.
E.R. Hansen (1963). "On Cyclic Jacobi Methods," SIAM J. Appl. Math. 11, 448-459.
A. Schonhage (1964). "On the Quadratic Convergence of the Jacobi Process,'' Numer. Math. 6,
410-412.
H.P.M. van Kempen (1966). "On Quadratic Convergence of the Special Cyclic .Jacobi Method,"
Numer. Math. 9, 19-22.
P. Henrici and K. Zimmermann (1968). "An Estimate for the Norms of Certain Cyclic Jacobi Opera­
tors," Lin. Alg. Applic. 1, 489- 501.
K.W. Brodlie and M.J.D. Powell (1975). "On the Convergence of Cyclic Jacobi Methods,'' J. Inst.
Math. Applic. 15, 279-287.
The ordering of the subproblems within a sweep is important:
W.F. Mascarenhas (1995). "On the Convergence of the .Jacobi Method for Arbitrary Orderings,"
SIAM J. Matrix Anal. Applic. 16, 1197-1209.
Z. Dramac (1996). "On the Condition Behaviour in the Jacobi Method,'' SIAM J. Matrix Anal.
Applic. 1 7, 509-514.
V. Hari (2007). "Convergence of a Block-Oriented Quasi-Cyclic Jacobi Method,'' SIAM J. Matrix
Anal. Applic. 29, 349-369.
z. Drmac (2010). "A Global Convergence Proof for Cyclic Jacobi Methods with Block Rotations,''
SIAM J. Matrix Anal. Applic. 31, 1329-1350.
Detailed error analyses that establish the high accuracy of Jacobi's method include:
J. Barlow and J. Demmel (1990). "Computing Accurate Eigensystems of Scaled Diagonally Dominant
Matrices,'' SIAM J. Numer. Anal. 27, 762-791.
J.W. Demmel and K. Veselic (1992). "Jacobi's Method is More Accurate than QR,'' SIAM J. Matrix
Anal. Applic. 13, 1204-1245.
W.F. Mascarenhas (1994). "A Note on Jacobi Being More Accurate than QR,'' SIAM J. Matrix Anal.
Applic. 15, 215-218.
R. Mathias (1995). "Accurate Eigensystem Computations by Jacobi Methods," SIAM J. Matrix Anal.
Applic. 16, 977-1003.
K. Veselic (1996). "A Note on the Accuracy of Symmetric Eigenreduction Algorithms,'' ETNA 4,
37-45.
F.M. Dopico, J.M. Molera, and J. Moro (2003). "An Orthogonal High Relative Accuracy Algorithm
for the Symmetric Eigenproblem,'' SIAM J. Matrix Anal. Applic. 25, 301-351.
F.M. Dopico, P. Koev, and J.M. Molera (2008). "Implicit Standard Jacobi Gives High Relative
Accuracy,'' Numer. Math. 113, 519-553.
Attempts have been made to extend the Jacobi iteration to other classes of matrices and to push
through corresponding convergence results. The case of normal matrices is discussed in:
H.H. Goldstine and L.P. Horowitz (1959). "A Procedure for the Diagonalization of Normal Matrices,"
J. ACM 6, 176-195.
G. Loizou (1972). "On the Quadratic Convergence of the Jacobi Method for Normal Matrices,"
Comput. J. 15, 274-276.
M.H.C. Paardekooper {1971). "An Eigenvalue Algorithm for Skew Symmetric Matrices," Numer.
Math. 17, 189-202.
A. Ruhe {1972). "On the Quadratic Convergence of the .Jacobi Method for Normal Matrices," BIT 7,
305-313.
D. Hacon (1993). "Jacobi's Method for Skew-Symmetric Matrices," SIAM J. Matrix Anal. Applic.
14, 619-628.
Essentially, the analysis and algorithmic developments presented in the text carry over to the normal
case with minor modification. For non-normal matrices, the situation is considerably more difficult:
J. Greenstadt {1955). "A Method for Finding Roots of Arbitrary Matrices," Math. Tables and Other
Aids to Comp. 9, 47-52.
C.E. Froberg {1965). "On Triangularization of Complex Matrices by Two Dimensional Unitary Tran­
formations," BIT 5, 230-234.
J. Boothroyd and P.J. Eberlein {1968). "Solution to the Eigenproblem by a Norm-Reducing Jacobi­
Type Method (Handbook)," Numer. Math. 11, 1-12.
A. Ruhe {1968). "Onthe Quadratic Convergence of a Generalization ofthe Jacobi Method to Arbitrary
Matrices," BIT 8, 210-231.
A. Ruhe {1969). "The Norm of a Matrix After a Similarity Transformation," BIT 9, 53-58.
P.J. Eberlein (1970). "Solution to the Complex Eigenproblem by a Norm-Reducing Jacobi-type
Method," Numer. Math. 14, 232-245.
C.P. Huang {1975). "A Jacobi-Type Method for Triangularizing an Arbitrary Matrix," SIAM J.
Numer. Anal. 12, 566-570.
V. Hari {1982). "On the Global Convergence of the Eberlein Method for Real Matrices,'' Numer.
Math. 39, 361-370.
G.W. Stewart {1985). "A Jacobi-Like Algorithm for Computing the Schur Decomposition of a Non­
hermitian Matrix," SIAM J. Sci. Stat. Comput. 6, 853-862.
C. Mehl (2008). "On Asymptotic Convergence of Nonsymmetric Jacobi Algorithms," SIAM J. Matrix
Anal. Applic. 30, 291-311.
Jacobi methods for complex symmetric matrices have also been developed, see:
J.J. Seaton (1969). "Diagonalization of Complex Symmetric Matrices Using a Modified .Jacobi Method,''
Comput. J. 12, 156-157.
P.J. Eberlein (1971). "On the Diagonalization of Complex Symmetric Matrices," J. Inst. Math.
Applic. 7, 377-383.
P. Anderson and G. Loizou (1973). "On the Quadratic Convergence of an Algorithm Which Diago­
nalizes a Complex Symmetric Matrix,'' J. Inst. Math. Applic. 12, 261-271.
P. Anderson and G. Loizou (1976). "A Jacobi-Type Method for Complex Symmetric Matrices {Hand­
book)," Numer. Math. 25, 347-363.
Other extensions include:
N. Mackey (1995). "Hamilton and Jacobi Meet Again: Quaternions and the Eigenvalue Problem,''
SIAM J. Matrix Anal. Applic. 16, 421-435.
A.W. Bojanczyk {2003). "An Implicit Jacobi-like Method for Computing Generalized Hyperbolic
SVD," Lin. Alg. Applic. 358, 293-307.
For a sampling of papers concerned with various aspects of parallel Jacobi, see:
A. Sameh (1971). "On Jacobi and Jacobi-like Algorithms for a Parallel Computer," Math. Comput.
25, 579 590.
D.S. Scott, M.T. Heath, and R.C. Ward (1986). "Parallel Block Jacobi Eigenvalue Algorithms Using
Systolic Arrays," Lin. Alg. Applic. 77, 345-356.
P.J. Eberlein {1987). "On Using the Jacobi Method on a Hypercube,'' in Hypercube Multiprocessors,
M.T. Heath (ed.), SIAM Publications, Philadelphia.
G. Shroff and R. Schreiber (1989). "On the Convergence of the Cyclic Jacobi Method for Parallel
Block Orderings," SIAM J. Matrix Anal. Applic. 10, 326-346.
M.H.C. Paardekooper {1991). "A Quadratically Convergent Parallel Jacobi Process for Diagonally
Dominant Matrices with Nondistinct Eigenvalues," Lin. Alg. Applic. 145, 71-88.
T. Londre and N.H. Rhee {2005). "Numerical Stability of the Parallel Jacobi Method," SIAM J.
Matrix Anal. Applic. 26, 985 1000.
8.6 Computing the SVD
If UᵀAV = B is the bidiagonal decomposition of A ∈ ℝᵐˣⁿ, then Vᵀ(AᵀA)V = BᵀB
is the tridiagonal decomposition of the symmetric matrix AᵀA ∈ ℝⁿˣⁿ. Thus, there is
an intimate connection between Algorithm 5.4.2 (Householder bidiagonalization) and
Algorithm 8.3.1 (Householder tridiagonalization). In this section we carry this a step
further and show that there is a bidiagonal SVD procedure that corresponds to the
symmetric tridiagonal QR iteration. Before we get into the details, we catalog some
important SVD properties that have algorithmic ramifications.
8.6.1 Connections to the Symmetric Eigenvalue Problem
There are important relationships between the singular value decomposition of a matrix
A and the Schur decompositions of the symmetric matrices

    S₁ = AᵀA    and    S₂ = AAᵀ.

Indeed, if

    UᵀAV = diag(σ₁, . . . , σ_n)

is the SVD of A ∈ ℝᵐˣⁿ (m ≥ n), then

    Vᵀ(AᵀA)V = diag(σ₁², . . . , σ_n²) ∈ ℝⁿˣⁿ                     (8.6.1)

and

    Uᵀ(AAᵀ)U = diag(σ₁², . . . , σ_n², 0, . . . , 0) ∈ ℝᵐˣᵐ        (8.6.2)

with m − n trailing zeros. Moreover, if U = [ U₁ | U₂ ] with U₁ ∈ ℝᵐˣⁿ and U₂ ∈ ℝᵐˣ⁽ᵐ⁻ⁿ⁾,
and we define the orthogonal matrix Q ∈ ℝ⁽ᵐ⁺ⁿ⁾ˣ⁽ᵐ⁺ⁿ⁾ by

    Q = (1/√2) [ V    V     0     ]
               [ U₁  −U₁   √2·U₂ ] ,

then

    Qᵀ [ 0   Aᵀ ] Q = diag(σ₁, . . . , σ_n, −σ₁, . . . , −σ_n, 0, . . . , 0),   (8.6.3)
       [ A   0  ]

again with m − n trailing zeros.
These connections to the symmetric eigenproblem allow us to adapt the mathematical
and algorithmic developments of the previous sections to the singular value problem.
Good references for this section include Lawson and Hanson (SLS) and Stewart and
Sun (MPT).
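The following short NumPy check illustrates these connections numerically: the singular values of A are the square roots of the eigenvalues of AᵀA, and the symmetric matrix [0 Aᵀ; A 0] has eigenvalues ±σ_i together with m − n zeros. The dimensions and seed are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 7, 4
A = rng.standard_normal((m, n))
sigma = np.linalg.svd(A, compute_uv=False)                    # descending order

# (8.6.1): eigenvalues of A^T A are the squared singular values.
assert np.allclose(sigma ** 2, np.sort(np.linalg.eigvalsh(A.T @ A))[::-1])

# (8.6.3): the Jordan-Wielandt matrix has eigenvalues +-sigma_i and m-n zeros.
C = np.block([[np.zeros((n, n)), A.T], [A, np.zeros((m, m))]])
expected = np.sort(np.concatenate([sigma, -sigma, np.zeros(m - n)]))
assert np.allclose(np.sort(np.linalg.eigvalsh(C)), expected)
```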
8.6.2 Perturbation Theory and Properties
We first establish perturbation results for the SVD based on the theorems of §8.1.
Recall that ai(A) denotes the ith largest singular value of A.
Theorem 8.6.1. If A ∈ ℝᵐˣⁿ, then for k = 1:min{m, n}

    σ_k(A) =    min        max      yᵀAx / (‖x‖₂‖y‖₂)  =    max       min     ‖Ax‖₂ / ‖x‖₂.
            dim(S)=n−k+1  0≠x∈S, 0≠y              dim(S)=k   0≠x∈S

In this expression, S is a subspace of ℝⁿ.
Proof. The rightmost characterization follows by applying Theorem 8.1.2 to AᵀA.
For the remainder of the proof see Xiang (2006). □
Corollary 8.6.2. If A and A + E are in ℝᵐˣⁿ with m ≥ n, then for k = 1:n

    |σ_k(A + E) − σ_k(A)| ≤ σ₁(E) = ‖E‖₂.

Proof. Define à and à + Ẽ by

    Ã = [ 0   Aᵀ ]            Ã + Ẽ = [ 0        (A + E)ᵀ ]
        [ A   0  ] ,                   [ A + E       0    ] .              (8.6.4)

The corollary follows by applying Corollary 8.1.6 with A replaced by à and A + E
replaced by à + Ẽ. □
Corollary 8.6.3. Let A = [ a₁ | · · · | a_n ] ∈ ℝᵐˣⁿ be a column partitioning with m ≥
n. If A_r = [ a₁ | · · · | a_r ], then for r = 1:n−1

    σ₁(A_{r+1}) ≥ σ₁(A_r) ≥ σ₂(A_{r+1}) ≥ · · · ≥ σ_r(A_{r+1}) ≥ σ_r(A_r) ≥ σ_{r+1}(A_{r+1}).

Proof. Apply Corollary 8.1.7 to AᵀA. □
The next result is a Wielandt-Hoffman theorem for singular values:
Theorem 8.6.4. If A and A + E are in ℝᵐˣⁿ with m ≥ n, then

    Σ_{k=1:n} ( σ_k(A + E) − σ_k(A) )² ≤ ‖E‖_F².

Proof. Apply Theorem 8.1.4 with A and E replaced by the matrices à and Ẽ defined
by (8.6.4). □
For A ∈ ℝᵐˣⁿ we say that the k-dimensional subspaces S ⊆ ℝⁿ and T ⊆ ℝᵐ
form a singular subspace pair if x ∈ S and y ∈ T imply Ax ∈ T and Aᵀy ∈ S. The
following result is concerned with the perturbation of singular subspace pairs.
Theorem 8.6.5. Let A, E ∈ ℝᵐˣⁿ with m ≥ n be given and suppose that V ∈ ℝⁿˣⁿ
and U ∈ ℝᵐˣᵐ are orthogonal. Assume that

    V = [ V₁ | V₂ ]        U = [ U₁ | U₂ ]
         r    n−r               r    m−r

and that ran(V₁) and ran(U₁) form a singular subspace pair for A. Let

    UᵀAV = [ A₁₁   0  ]  r          UᵀEV = [ E₁₁  E₁₂ ]  r
           [  0   A₂₂ ]  m−r               [ E₂₁  E₂₂ ]  m−r
              r   n−r                         r    n−r

and assume that

    δ = min { |σ − γ| : σ ∈ σ(A₁₁), γ ∈ σ(A₂₂) } > 0.

If

    ‖E‖_F ≤ δ/5,

then there exist matrices P ∈ ℝ⁽ⁿ⁻ʳ⁾ˣʳ and Q ∈ ℝ⁽ᵐ⁻ʳ⁾ˣʳ satisfying

    ‖ [ P ; Q ] ‖_F ≤ 4‖E‖_F/δ

such that ran(V₁ + V₂P) and ran(U₁ + U₂Q) form a singular subspace pair for A + E.
Proof. See Stewart (1973, Theorem 6.4). □
Roughly speaking, the theorem says that O(ε) changes in A can alter a singular subspace
by an amount ε/δ, where δ measures the separation of the associated singular
values.
8.6.3 The SVD Algorithm
We now show how a variant of the QR algorithm can be used to compute the SVD
of an A ∈ ℝᵐˣⁿ with m ≥ n. At first glance, this appears straightforward. Equation
(8.6.1) suggests that we proceed as follows:

    Step 1. Form C = AᵀA.
    Step 2. Use the symmetric QR algorithm to compute V₁ᵀCV₁ = diag(σ_i²).
    Step 3. Apply QR with column pivoting to AV₁ obtaining Uᵀ(AV₁)Π = R.

Since R has orthogonal columns, it follows that UᵀA(V₁Π) is diagonal. However, as
we saw in §5.3.2, the formation of AᵀA can lead to a loss of information. The situation
is not quite so bad here, since the original A is used to compute U.
A preferable method for computing the SVD is described by Golub and Kahan
(1965). Their technique finds U and V simultaneously by implicitly applying the
symmetric QR algorithm to AᵀA. The first step is to reduce A to upper bidiagonal
form using Algorithm 5.4.2:

                          [ d₁  f₁              ]
    U_Bᵀ A V_B = [ B ]     [     d₂   ·          ]
                 [ 0 ] ,   [          ·   f_{n−1} ]  = B ∈ ℝⁿˣⁿ.
                           [              d_n    ]
The remaining problem is thus to compute the SVD of B. To this end, consider applying
an implicit-shift QR step (Algorithm 8.3.2) to the tridiagonal matrix T = BᵀB:

    Step 1. Compute the eigenvalue λ of

        T(m:n, m:n) = [ d_m² + f_{m−1}²    d_m f_m          ]       m = n−1,
                      [ d_m f_m            d_n² + f_{n−1}²  ]

        that is closer to d_n² + f_{n−1}².

    Step 2. Compute c₁ = cos(θ₁) and s₁ = sin(θ₁) such that

        [ t₁₁ − λ   t₁₂ ] [ c₁   s₁ ] = [ *   0 ]
                          [ −s₁  c₁ ]

        and set G₁ = G(1, 2, θ₁).

    Step 3. Compute Givens rotations G₂, . . . , G_{n−1} so that if Q = G₁ · · · G_{n−1} then
        QᵀTQ is tridiagonal and Qe₁ = G₁e₁.
Note that these calculations require the explicit formation of BᵀB, which, as we have
seen, is unwise from the numerical standpoint.
Suppose instead that we apply the Givens rotation G₁ above to B directly. Illustrating
with the n = 6 case we have

    B ← BG₁ =  [ x  x  0  0  0  0 ]
               [ +  x  x  0  0  0 ]
               [ 0  0  x  x  0  0 ]
               [ 0  0  0  x  x  0 ]
               [ 0  0  0  0  x  x ]
               [ 0  0  0  0  0  x ]

We can then determine Givens rotations U₁, V₂, U₂, . . . , V_{n−1}, and U_{n−1} to chase the
unwanted nonzero element down the bidiagonal:
    B ← U₁ᵀB =  [ x  x  +  0  0  0 ]      B ← BV₂ =  [ x  x  0  0  0  0 ]
                [ 0  x  x  0  0  0 ]                 [ 0  x  x  0  0  0 ]
                [ 0  0  x  x  0  0 ]                 [ 0  +  x  x  0  0 ]
                [ 0  0  0  x  x  0 ]                 [ 0  0  0  x  x  0 ]
                [ 0  0  0  0  x  x ]                 [ 0  0  0  0  x  x ]
                [ 0  0  0  0  0  x ]                 [ 0  0  0  0  0  x ]

    B ← U₂ᵀB =  [ x  x  0  0  0  0 ]      B ← BV₃ =  [ x  x  0  0  0  0 ]
                [ 0  x  x  +  0  0 ]                 [ 0  x  x  0  0  0 ]
                [ 0  0  x  x  0  0 ]                 [ 0  0  x  x  0  0 ]
                [ 0  0  0  x  x  0 ]                 [ 0  0  +  x  x  0 ]
                [ 0  0  0  0  x  x ]                 [ 0  0  0  0  x  x ]
                [ 0  0  0  0  0  x ]                 [ 0  0  0  0  0  x ]
and so on. The process terminates with a new bidiagonal B̄ that is related to B as
follows:

    B̄ = (U_{n−1}ᵀ · · · U₁ᵀ) B (G₁V₂ · · · V_{n−1}) = ŪᵀBV̄.

Since each Vᵢ has the form Vᵢ = G(i, i+1, θᵢ) where i = 2:n−1, it follows that
V̄e₁ = Qe₁. By the Implicit Q theorem we can assert that V̄ and Q are essentially the
same. Thus, we can implicitly effect the transition from T to T̄ = B̄ᵀB̄ by working
directly on the bidiagonal matrix B.
Of course, for these claims to hold it is necessary that the underlying tridiagonal
matrices be unreduced. Since the subdiagonal entries of BᵀB are of the form d_i f_i, it
is clear that we must search the bidiagonal band for zeros. If f_k = 0 for some k, then

    B = [ B₁   0 ]  k
        [ 0   B₂ ]  n−k
          k   n−k

and the original SVD problem decouples into two smaller problems involving the matrices
B₁ and B₂. If d_k = 0 for some k < n, then premultiplication by a sequence of
Givens transformations can zero f_k. For example, if n = 6 and k = 3, then by rotating
in row planes (3,4), (3,5), and (3,6) we can zero the entire third row:
        [ x  x  0  0  0  0 ]           [ x  x  0  0  0  0 ]
        [ 0  x  x  0  0  0 ]           [ 0  x  x  0  0  0 ]
    B = [ 0  0  0  x  0  0 ]   (3,4)   [ 0  0  0  0  +  0 ]
        [ 0  0  0  x  x  0 ]   --->    [ 0  0  0  x  x  0 ]
        [ 0  0  0  0  x  x ]           [ 0  0  0  0  x  x ]
        [ 0  0  0  0  0  x ]           [ 0  0  0  0  0  x ]

        [ x  x  0  0  0  0 ]           [ x  x  0  0  0  0 ]
        [ 0  x  x  0  0  0 ]           [ 0  x  x  0  0  0 ]
 (3,5)  [ 0  0  0  0  0  + ]   (3,6)   [ 0  0  0  0  0  0 ]
 --->   [ 0  0  0  x  x  0 ]   --->    [ 0  0  0  x  x  0 ]
        [ 0  0  0  0  x  x ]           [ 0  0  0  0  x  x ]
        [ 0  0  0  0  0  x ]           [ 0  0  0  0  0  x ]
8.6. Computing the SVD 491
If dn = 0, then the last column can be zeroed with a series ofcolumn rotations in planes
(n - 1, n), (n - 2, n), . . . , (1, n). Thus, we can decouple if Ji · · · fn-1 = 0 or di · · · dn =
o. Putting it all together we obtain the following SVD analogue of Algorithm 8.3.2.
Algorithm 8.6.1 (Golub-Kahan SVD Step) Given a bidiagonal matrix B E 1Rmxn
having no zeros on its diagonal or superdiagonal, the following algorithm overwrites
B with the bidiagonal matrix fJ = (jTBV where (j and V are orthogonal and V is
essentially the orthogonal matrix that would be obtained by applying Algorithm 8.3.2
to T = BTB.
Let µ be the eigenvalue of the trailing 2-by-2 submatrix of T = BTB
that is closer to tnn·
y = tu - µ
z = ti2
for k = l:n - 1
end
Determine c = cos(O) and s = sin(O) such that
[ y z ] [ -� � ] = [ * 0 ] .
B = B·G(k, k + l, O)
y = bkk
z = bk+l,k
Determine c = cos(O) and s = sin(O) such that
B = G(k, k + 1, 0fB
if k < n - 1
y = bk,k+l
z = bk,k+2
end
An efficient implementation of this algorithm would store B's diagonal and superdiag­
onal in vectors d(l:n) and /(l:n - 1), respectively, and would require 30n flops and 2n
square roots. Accumulating U requires 6mn flops. Accumulating V requires 6n2 flops.
Typically, after a few of the above SVD iterations, the superdiagonal entry fn-l
becomes negligible. Criteria for smallness within B's band are usually of the form
I/ii :::; tol·( ldil + ldi+1I ),
ldil :::; tol· II B II,
where tol is a small multiple of the unit roundoff and II · II is some computationally
convenient norm. Combining Algorithm 5.4.2 (bidiagonalization), Algorithm 8.6.1, and
the decoupling calculations mentioned earlier gives the following procedure.
492 Chapter 8. Symmetric Eigenvalue Problems
Algorithm 8.6.2 {The SVD Algorithm) Given A E JR'nxn (m ;::: n) and t:, a small
multiple of the unit roundoff, the following algorithm overwrites A with urAV = D+E,
where U E IR'"xm is orthogonal, V E Rnxn is orthogonal, D E Rmxn is diagonal, and
E satisfies 11 E 112 � ull A 112·
Use Algorithm 5.4.2 to compute the bidiagonalization.
until q = n
end
For i = l:n - 1, set bi,i+l to zero if lbi,i+ll � t:(lbiil + lbi+l,i+ll).
Find the largest q and the smallest p such that if
[
Bu 0 0
l
p
B 0 B22 0 n-p-q
0 0 B33 'l
p n-p-q q
then B33 is diagonal and B22 has a nonzero superdiagonal.
if q < n
end
if any diagonal entry in B22 is zero, then zero the
superdiagonal entry in the same row.
else
Apply Algorithm 8.6.1 to B22.
B = diag(Ip, U, Iq+m-n)TB diag(Iv, V, lq)
end
The amount of work required by this algorithm depends on how much of the SVD
is required. For example, when solving the LS problem, ur need never be explicitly
formed but merely applied to b as it is developed. In other applications, only the
matrix U1 = U(:, l:n) is required. Another variable that affects the volume of work
in Algorithm 8.6.2 concerns the R-bidiagonalization idea that we discussed in §5.4.9.
Recall that unless A is "almost square,'' it pays to reduce A to triangular form via QR
and before bidiagonalizing. If R-bidiagonalization is used in the SVD context, then we
refer to the overall process as the R-SVD. Figure 8.6.1 summarizes the work associated
with the various possibilities By comparing the entries in this table (which are meant
only as approximate estimates of work), we conclude that the R-SVD approach is more
efficient unless m � n.
8.6.4 Jacobi SVD Procedures
It is straightforward to adapt the Jacobi procedures of §8.5 to the SVD problem.
Instead of solving a sequence of 2-by-2 symmetric eigenproblems, we solve a sequence
8.6. Computing the SVD
Required Golub-Reinsch SVD R-SVD
E 4mn2 - 4n3/3 2mn2 + 2n3
E, V 4mn2 + 8n3 2mn2 + lln3
E, U 4m2n - 8mn2 4m2n + 13n3
E, U1 14mn2 - 2n3 6mn2 + lln3
E, U, V 4m2n + 8mn2 + 9n3 4m2n + 22n3
E, Ui, V 14mn2 + 8n3 6mn2 + 20n3
Figure 8.6.1. Work associated with various SVD-related calculations
493
of 2-by-2 SVD problems. Thus, for a given index .pair (p, q) we compute a pair of
rotations such that
[_:: :: r[::: :::i [
See P8.6.5. The resulting algorithm is referred to as two-sided because each update
involves a pre- and a post-multiplication.
A one-sided Jacobi algorithm involves a sequence of pairwise column orthogo­
nalizations. For a given index pair (p, q) a Jacobi rotation J(p, q, 0) is determined so
that columns p and q of AJ(p, q, 0) are orthogonal to each other. See P8.6.8. Note
that this corresponds to zeroing the (p, q) and (q,p) entries in ATA. Once AV has
sufficiently orthogonal columns, the rest of the SVD (U and E) follows from column
scaling: AV = UE.
Problems
PB.6.1 Give formulae for the eigenvectors of
S =
[ : A
OT ]
in terms of the singular vectors of AE nm x n where m 2'. n.
PB.6.2 Relate the singular values and vectors of A=B + iC (B, C E irn x n) to those of
A =
[ � -
� ].
PB.6.3 Suppose B E nn x n is upper bidiagonal with diagonal entries d(l:n) and superdiagonal entries
/(l:n - 1). State and prove a singular value version of Theorem 8.3.1.
PB.6.4 Assume that n =2m and that S E Rn x n is skew-symmetric and tridiagonal. Show that there
exists a permutation P E Rn x n such that
[ O -BT ]
pTsp =
B 0
where B E Rm x m. Describe the structure of B and show how to compute the eigenvalues and eigen­
vectors of S via the SVD of B. Repeat for the case n = 2m + 1.
494 Chapter 8. Symmetric Eigenvalue Problems
PB.6.5 (a) Let
be real. Give a stable algorithm for computing c and s with c2 + s2 = 1 such that
B = [ c s ]c
-s c
is symmetric. (b) Combine (a) with Algorithm 8.5.1 to obtain a stable algorithm for computing
the SYD of C. (c) Part (b) can be used to develop a Jacobi-like algorithm for computing the SYD
of A E R"x n. For a given (p, q) with p < q, Jacobi transformations J(p, q, lh ) and J(p, q, 92) are
determined such that if
B = J(p, q, 91 )TAJ(p, q, 92),
then bpq = bqp = 0. Show
off(B)2 = off(A)
2
- a�9 - a�p·
(d) Consider one sweep of a cyclic-by-row Jacobi SYD procedure applied to A E Rnxn:
for p = l:n - 1
for q = p + l:n
A = J(p, q, 91 )TAJ(p, q, 62)
end
end
Assume that the Jacobi rotation matrices are chosen so that apq = aqp = 0 after the (p, q) update.
Show that if A is upper (lower) triangular at the beginning of the sweep, then it is lower (upper)
triangular after the sweep is completed. See Kogbetliantz (1955). (e) How could these Jacobi ideas
be used to compute the SYD of a rectangular matrix?
PB.6.6 Let :x and y be in R,.,. and define the orthogonal matrix Q by
Q =
[ c s ]·
-s c
Give a stable algorithm for computing c and s such that the columns of [:x I y] Q are orthogonal to
each other.
Notes and References for §8.6
For a general perspective and overview of the SYD we recommend:
G.W. Stewart (1993). "On the Early History of the Singular Value Decomposition," SIAM Review
35, 551-566.
A.K. Cline and I.S. Dhillon (2006). "Computation of the Singular Value Decomposition," in Handbook
of Linear Algebra, L. Hogben (ed.), Chapman and Hall, London, §45-1.
A perturbation theory for the SYD is developed in Stewart and Sun (MPT). See also:
P.A. Wedin (1972). "Perturbation Bounds in Connection with the Singular Value Decomposition,"
BIT 12, 99-111.
G.W. Stewart (1973). "Error and Perturbation Bounds for Subspaces Associated with Certain Eigen­
value Problems," SIAM Review 15, 727-764.
A. Ruhe (1975). "On the Closeness of Eigenvalues and Singular Values for Almost Normal Matrices,"
Lin. Alg. Applic. 11, 87-94.
G.W. Stewart (1979). "A Note on the Perturbation of Singular Values," Lin. Alg. Applic. 28,
213-216.
G.W. Stewart (1984). "A Second Order Perturbation Expansion for Small Singular Values," Lin. Alg.
Applic. 56, 231-236.
S. Chandrasekaren and I.C.F. Ipsen (1994). "Backward Errors for Eigenvalue and Singular Value
Decompositions," Numer. Math. 68, 215-223.
R.J. Vaccaro (1994). "A Second-Order Perturbation Expansion for the SVD," SIAM J. Matrix Anal.
Applic. 15, 661-671.
8.6. Computing the SVD 495
J. Sun (1996). "Perturbation Analysis of Singular Subspaces and Deflating Subspaces," Numer. Math.
79, 235-263.
F.M. Dopico (2000). "A Note on Sin T Theorems for Singular Subspace Variations BIT 40, 395-403.
R.-C. Li and G. W. Stewart (2000). "A New Relative Perturbation Theorem for Singular Subspaces,''
Lin. Alg. Applic. 919, 41-51.
.
c.-K. Li and R. Mathias (2002). "Inequalities on Singular Values of Block Triangular Matrices," SIAM
J. Matrix Anal. Applic. 24, 126-131.
F.M. Dopico and J. Moro (2002). "Perturbation Theory for Simultaneous Bases of Singular Sub­
spaces,'' BIT 42, 84-109.
K.A. O'Neil (2005). "Critical Points of the Singular Value Decomposition,'' SIAM J. Matrix Anal.
Applic. 27, 459-473.
M. Stewart (2006). "Perturbation of the SVD in the Presence of Small Singular Values," Lin. Alg.
Applic. 419, 53-77.
H. Xiang (2006). "A Note on the Minimax Representation for the Subspace Distance and Singular
Values," Lin. Alg. Applic. 414, 470-473.
W. Li and W. Sun (2007). "Combined Perturbation Bounds: I. Eigensystems and Singular Value
Decompositions,'' SIAM J. Matrix Anal. Applic. 29, 643-655.
J. Mateja.S and V. Hari (2008). "Relative Eigenvalues and Singular Value Perturbations of Scaled
Diagonally Dominant Matrices,'' BIT 48, 769-781.
Classical papers that lay out the ideas behind the SVD algorithm include:
G.H. Golub and W. Kahan (1965). "Calculating the Singular Values and Pseudo-Inverse of a Matrix,"
SIAM J. Numer. Anal. 2, 205-224.
P.A. Businger and G.H. Golub (1969). "Algorithm 358: Singular Value Decomposition of the Complex
Matrix," Commun. A CM 12, 564-565.
G.H. Golub and C. Reinsch (1970). "Singular Value Decomposition and Least Squares Solutions,''
Numer. Math. 14, 403-420.
For related algorithmic developments and analysis, see:
T.F. Chan (1982). "An Improved Algorithm for Computing the Singular Value Decomposition," ACM
'.lhins. Math. Softw. 8, 72-83.
J.J.M. Cuppen (1983). "The Singular Value Decomposition in Product Form," SIAM J. Sci. Stat.
Comput. 4, 216-222.
J.J. Dongarra (1983). "Improving the Accuracy of Computed Singular Values," SIAM J. Sci. Stat.
Comput. 4, 712-719.
S. Van Ruffel, J. Vandewalle, and A. Haegemans (1987). "An Efficient and Reliable Algorithm for
Computing the Singular Subspace of a Matrix Associated with its Smallest Singular Values,'' J.
Comp. Appl. Math. 19, 313-330.
P. Deift, J. Demmel, L.-C. Li, and C. Tomei (1991). "The Bidiagonal Singular Value Decomposition
and Hamiltonian Mechanics," SIAM J. Numer. Anal. 28, 1463-1516.
R. Mathias and G.W. Stewart (1993). "A Block QR Algorithm and the Singular Value Decomposi­
tion," Lin. Alg. Applic. 182, 91-100.
V. Mehrmann and W. Rath (1993). "Numerical Methods for the Computation of Analytic Singular
Value Decompositions,'' ETNA 1, 72-88.
A. Bjorck, E. Grimme, and P. Van Dooren (1994). "An Implicit Shift Bidiagonalization Algorithm for
Ill-Posed Problems,'' BIT 94, 510-534.
K.V. Fernando and B.N. Parlett (1994). "Accurate Singular Values and Differential qd Algorithms,"
Numer. Math. 67, 191-230.
S. Chandrasekaran and l.C.F. Ipsen (1995). "Analysis of a QR Algorithm for Computing Singular
Values,'' SIAM J. Matrix Anal. Applic. 16, 520-535.
U. von Matt (1997). "The Orthogonal qd-Algorithm,'' SIAM J. Sci. Comput. 18, 1163-1186.
K.V. Fernando (1998). "Accurately Counting Singular Values of Bidiagonal Matrices and Eigenvalues
of Skew-Symmetric Tridiagonal Matrices," SIAM J. Matrix Anal. Applic. 20, 373-399.
N.J. Higham (2000). "QR factorization with Complete Pivoting and Accurate Computation of the
SVD,'' Lin. Alg. Applic. 909, 153-174.
Divide-and-conquer methods for the bidiagonal SVD problem have been developed that are analogous
to the tridiagonal eigenvalue strategies outlined in §8.4.4:
J.W. Demmel and W. Kahan (1990). "Accurate Singular Values of Bidiagonal Matrices," SIAM J.
Sci. Stat. Comput. 11, 873-912.
496 Chapter 8. Symmetric Eigenvalue Problems
E.R. Jessup and D.C. Sorensen (1994). "A Parallel Algorithm for Computing the Singular Value
Decomposition of a Matrix,'' SIAM J. Matrix Anal. Applic. 15, 530-548.
M. Gu and S.C. Eisenstat (1995). "A Divide-and-Conquer Algorithm for the Bidiagonal SVD,'' SIAM
J. Matrix Anal. Applic. 16, 79-92.
P.R. Willems, B. Lang, and C. Vomel (2006). "Computing the Bidiagonal SVD Using Multiple
Relatively Robust Representations,'' SIAM J. Matrix Anal. Applic. 28, 907-926.
T. Konda and Y. Nakamura (2009). "A New Algorithm for Singular Value Decomposition and Its
Parallelization,'' Parallel Comput. 35, 331-344.
For structured SVD problems, there are interesting, specialized results, see:
S. Van Ruffel and H. Park (1994). "Parallel Tri- and Bidiagonalization of Bordered Bidiagonal Ma,.
trices,'' Parallel Comput. 20, 1107-1128.
J. Demmel and P. Koev (2004). "Accurate SVDs of Weakly Diagonally Dominant M-matrices,'' Num.
Math. 98, 99-104.
N. Mastronardi, M. Van Barel, and R. Vandebril (2008). "A Fast Algorithm for the Recursive Calcu-
lation of Dominant Singular Subspaces," J. Comp. Appl. Math. 218, 238-246.
Jacobi methods for the SVD fall into two categories. The two-sided Jacobi algorithms repeatedly
perform the update A +-- urAV producing a sequence of iterates that are increasingly diagonal.
E.G. Kogbetliantz (1955). "Solution of Linear Equations by Diagonalization of Coefficient Matrix,''
Quart. Appl. Math. 13, 123-132.
G.E. Forsythe and P. Henrici (1960). "The Cyclic Jacobi Method for Computing the Principal Values
of a Complex Matrix,'' '.lhlns. AMS 94, 1-23.
C.C. Paige and P. Van Dooren (1986). "On the Quadratic Convergence of Kogbetliantz's Algorithm
for Computing the Singular Value Decomposition,'' Lin. Alg. Applic. 77, 301-313.
J.P. Charlier and P. Van Dooren (1987). "On Kogbetliantz's SVD Algorithm in the Presence of
Clusters," Lin. Alg. Applic. 95, 135-160.
Z. Bai (1988). "Note on the Quadratic Convergence of Kogbetliantz's Algorithm for Computing the
Singular Value Decomposition," Lin. Alg. Applic. 104, 131-140.
J.P. Charlier, M. Vanbegin, and P. Van Dooren (1988). "On Efficient Implementation of Kogbetliantz's
Algorithm for Computing the Singular Value Decomposition," Numer. Math. 52, 279-300.
K.V. Fernando (1989). "Linear Convergence of the Row-Cyclic .Jacobi and Kogbetliantz Methods,"
Numer. Math. 56, 73-92.
Z. Drmae and K. Veselic (2008). "New Fast and Accurate Jacobi SVD Algorithm I," SIAM J. Matri:i:
Anal. Applic. 29, 1322-1342.
The one-sided Jacobi SVD procedures repeatedly perform the update A +-- AV producing a sequence
of iterates with columns that are increasingly orthogonal, see:
J.C. Nash (1975). "A One-Sided Tranformation Method for the Singular Value Decomposition and
Algebraic Eigenproblem," Comput. J. 18, 74-76.
P.C. Hansen (1988). "Reducing the Number of Sweeps in Hcstenes Method," in Singular Value
Decomposition and Signal Processing, E.F. Deprettere (ed.) North Holland, Amsterdam.
K. Veselic and V. Hari (1989). "A Note on a One-Sided Jacobi Algorithm," Numer. Math. 56,
627-633.
Careful implementation and analysis has shown that Jacobi SVD has remarkably accuracy:
J. Demmel, M. Gu, S. Eiscnstat, I. Slapnicar, K. Veselic, and Z. Drmae (1999). "Computing the
Singular Value Decomposition with High Relative Accuracy," Lin. Alg. Applic. 299, 21-80.
Z Drmae (1999). "A Posteriori Computation of the Singular Vectors in a Preconditioned Jacobi SYD
Algorithm," IMA J. Numer. Anal. 19, 191-213.
z. Drmac (1997). "Implementation of Jacobi Rotations for Accurate Singular Value Computation in
Floating Point Arithmetic," SIAM .J. Sci. Comput. 18, 1200-1222.
F.M. Dopico and J. Moro (2004). "A Note on Multiplicative Backward Errors of Accurate SYD
Algorithms," SIAM .J. Matrix Anal. Applic. 25, 1021-1031.
The parallel implementation of the Jacobi SVD has a long and interesting history:
F.T. Luk (1980). "Computing the Singular Value Decomposition on the ILLIAC IV,'' ACM '.lhins.
Math. Softw. 6, 524-539.
8.7. Generalized Eigenvalue Problems with Symmetry 497
R.P. Brent and F.T. Luk (1985). "The Solution of Singular Value and Symmetric Eigenvalue Problems
on Multiprocessor Arrays," SIAM J. Sci. Stat. Comput. 6, 69-84.
R.P. Brent, F.T. Luk, and C. Van Loan (1985). "Computation of the Singular Value Decomposition
Using Mesh Connected Processors," J. VLSI Computer Systems 1, 242-270.
F.T. Luk (1986). "A Triangular Processor Array for Computing Singular Values," Lin. Alg. Applic.
77, 259-274.
M. Berry and A. Sameh (1986). "Multiprocessor Jacobi Algorithms for Dense Symmetric Eigen­
value and Singular Value Decompositions," in Proceedings International Conference on Parallel
Processing, 433-440.
R. Schreiber (1986). "Solving Eigenvalue and Singular Value Problems on an Undersized Systolic
Array," SIAM J. Sci. Stat. Comput. 7, 441-451.
C.H. Bischof and C. Van Loan (1986). "Computing the SVD on a Ring of Array Processors," in Large
Scale Eigenvalue Problems, J. Cullum and R. Willoughby (eds.), North Holland, Amsterdam, 51-
66.
C.H. Bischof (1987). "The Two-Sided Block Jacobi Method on Hypercube Architectures,'' in Hyper­
cube Multiproce.,sors, M.T. Heath (ed.), SIAM Publications, Philadelphia, PA.
C.H. Bischof (1989). "Computing the Singular Value Decomposition on a Distributed System of Vector
Processors," Parallel Comput. 11, 171-186.
M. Beca, G. Oksa, M. Vajtersic, and L. Grigori (2010). "On Iterative QR Pre-Processing in the
Parallel Block-Jacobi SVD Algorithm," Parallel Comput. 36, 297-307.
8. 7 Generalized Eigenvalue Problems with Symmetry
This section is mostly about a pair of symmetrically structured versions of the general­
ized eigenvalue problem that we considered in §7.7. In the symmetric-definite problem
we seek nontrivial solutions to the problem
Ax = >.Bx (8.7.1)
where A E R''xn is symmetric and B E Rnxn is symmetric positive definite. The gen­
eralized singular value problem has the form
ATAx = µ2BTBx (8.7.2)
where A E Rmixn and B E Rm2xn. By setting B = In we see that these problems are
(respectively) generalizations of the symmetric eigenvalue problem and the singular
value problem.
8.7.1 The Symmetric-Definite Generalized Eigenproblem
The generalized eigenvalues of the symmetric-definite pair {A, B} are denoted by
A(A, B) where
>.(A, B) = { >. I det(A - >.B) = O }.
If A E >.(A, B) and x is a nonzero vector that satisfies Ax = >.Bx, then x is a generalized
eigenvector.
A symmetric-definite problem can be transformed to an equivalent symmetric­
definite problem with a congruence transformation:
A - >.B is singular ¢:? (XTAX) - >.(XTBX) is singular.
Thus, if X is nonsingular, then >.(A, B) = >.(XTAX, XTBX).
498 Chapter 8. Symmetric Eigenvalue Problems
For a symmetric-definite pair {A, B}, it is possible to choose a real nonsingular
X so that XTAX and XTBX are diagonal. This follows from the next result.
Theorem 8.7.1. Suppose A and B are n-by-n symmetric matrices, and define C(µ)
by
C(µ) = µA + (1 - µ)B µ E JR. (8.7.3)
If there exists a µ E [O, 1] such that C(µ) is nonnegative definite and
null(C(µ)) = null(A) n null(B)
then there exists a nonsingular X such that both XTAX and XTBX are diagonal.
Proof. Let µ E [O, 1] be chosen so that C(µ) is nonnegative definite with the property
that null(C(µ)) = null(A) n null(B). Let
T
[D 0 l
Qi C(µ)Qi =
0 0 '
be the Schur decomposition of C(µ) and define Xi= Qi ·diag(n-i/2, In-k)· If
C1 = X[C(µ)Xi ,
then
Since
span{ek+l• . . . , en} = null(C1) = null(A1) n null(Bi)
it follows that A1 and B1 have the following block structure:
[ Au 0 ] k
'
0 0 n-k
[ B
0
u o ] k
O n-k
k n-k
Moreover Ik = µAu + (1 - µ)Bu .
Suppose µ =f 0. It then follows that if zTBuZ
decomposition of Bii and we set
X Xi ·diag(Z, In-k)
then
and
k n-k
diag(bi , . . . , bk) is the Schur
xTAx = .!xT (c(µ) - (1 - µ)B) x = .! ([h
0
]- (1 - µ)DB)= DA.
µ µ () 0
8.7. Generalized Eigenvalue Problems with Symmetry 499
On the other hand, if µ = 0, then let zrA11Z = diag(a1, . . . , ak) be the Schur decom­
position of An and set X = X1diag(Z, In-k)· It is easy to verify that in this case as
well, both xrAX and xrBX are diagonal. 0
Frequently, the conditions in Theorem 8.7.1 are satisfied because either A or B is
positive definite.
Corollary 8.7.2. If A - AB E lRnxn is symmetric-definite, then there exists a non­
singular
such that
and
Moreover, Axi = AiBxi for i = l:n where Ai = aifbi.
Proof. By setting µ = 0 in Theorem 8.7.1 we see that symmetric-definite pencils can
be simultaneously diagonalized. The rest of the corollary is easily verified. D
Stewart (1979) has worked out a perturbation theory for symmetric pencils A-AB
that satisfy
c(A, B) = min (xTAx)2 + (xTBx)2 > O.
llxll2=l
The scalar c(A, B) is called the Crawford number of the pencil A - AB.
(8.7.4)
Theorem 8.7.3. Suppose A - AB is an n-by-n symmetric-definite pencil with eigen­
values
Ai 2: A2 2: · · · 2: An·
Suppose EA and Ea are symmetric n-by-n matrices that satisfy
E2 = II EA II� + II Ea II� < c(A, B).
Then (A + EA) - >..(B + Eu) is symmetric-definite with eigenvalues
that satisfy
larctan(>..i) - arctan(µi)I < arctan(E/c(A, B))
for i = l:n.
Proof. See Stewart (1979). 0
500 Chapter 8. Symmetric Eigenvalue Problems
8.7.2 Simultaneous Reduction of A and B
Turning to algorithmic matters, we first present a method for solving the symmetric­
definite problem that utilizes both the Cholesky factorization and the symmetric QR
algorithm.
Algorithm 8.7.1 Given A = AT E IRnxn and B = BT E IRnxn with B positive definite,
the following algorithm computes a nonsingular X such that XTAX = diag(a1 , • • • , an)
and XTBX = In.
Compute the Cholesky factorization B = GGT using Algorithm 4.2.2.
Compute C = c-1AQ-T.
Use the symmetric QR algorithm to compute the Schur decomposition
QTCQ = diag(a1 , . . . , an)·
Set x = c-TQ.
This algorithm requires about 14n3 flops. In a practical implementation, A can be
overwritten by the matrix C. See Martin and Wilkinson (1968) for details. Note that
If ai is a computed eigenvalue obtained by Algorithm 8.7.1, then it can be shown that
where
Thus, if B is ill-conditioned, then ai may be severely contaminated with roundoff error
even if ai is a well-conditioned generalized eigenvalue. The problem, of course, is that
in this case, the matrix C = c-1Ac-T can have some very large entries if B, and hence
G, is ill-conditioned. This difficulty can sometimes be overcome by replacing the matrix
G in Algorithm 8.7.1 with Vn-112 where VTBV = D is the Schur decomposition of B.
If the diagonal entries of D are ordered from smallest to largest, then the large entries
in C are concentrated in the upper left-hand corner. The small eigenvalues of C can
then be computed without excessive roundoff error contamination (or so the heuristic
goes). For further discussion, consult Wilkinson (AEP, pp. 337-38).
The condition of the matrix X in Algorithm 8.7.1 can sometimes be improved by
replacing B with a suitable convex combination of A and B. The connection between
the eigenvalues of the modified pencil and those of the original are detailed in the proof
of Theorem 8.7.1.
Other difficulties concerning Algorithm 8.7.1 relate to the fact that c-1Ac-T is
generally full even when A and B are sparse. This is a serious problem, since many
of the symmetric-definite problems arising in practice are large and sparse. Crawford
(1973) has shown how to implement Algorithm 8.7.1 effectively when A and B are
banded. Aside from this case, however, the simultaneous diagonalization approach is
impractical for the large, sparse symmetric-definite problem. Alternate strategies are
discussed in Chapter 10.
8.7. Generalized Eigenvalue Problems with Symmetry 501
8.7.3 Other Methods
Many ofthe symmetric eigenvalue methods presented in earlier sections have symmetric­
definite generalizations. For example, the Rayleigh quotient iteration (8.2.6) can be
extended as follows:
xo given with II xo 112 = 1
for k = 0, 1, . . .
µk = xrAxk/xrBxk
Solve (A - µkB)zk+1 = Bxk for Zk+i·
Xk+l = Zk+i/11 Zk+l 112
end
The main idea behind this iteration is that
minimizes
A =
xTAx
xTBx
f(.A) = II Ax - .ABx Ila
(8.7.5)
(8.7.6)
(8.7.7)
where 11 · lln is defined by llzll� = zTB-1z. The mathematical properties of (8.7.5) are
similar to those of (8.2.6). Its applicability depends on whether or not systems of the
form (A - µB)z = x can be readily solved. Likewise, the same comment pertains to
the generalized orthogonal iteration:
Qo E R.nxp given with Q'{;Qo = Ip
for k = 1, 2, . . . (8.7.8)
Solve BZk = AQk-1 for Zk
Zk = QkRk (QR factorization, Qk E R.nxv, Rk E wxv)
end
This is mathematically equivalent to (7.3.6) with A replaced by B-1A. Its practicality
strongly depends on how easy it is to solve linear systems of the form Bz = y.
8.7.4 The Generalized Singular Value Problem
We now turn our attention to the generalized singular value decomposition introduced
in §6.1.6. This decomposition is concerned with the simultaneous diagonalization oftwo
rectangular matrices A and B that are assumed to have the same number of columns.
We restate the decomposition here with a simplification that both A and B have at
least as many rows as columns. This assumption is not necessary, but it serves to
unclutter our presentation of the GSVD algorithm.
Theorem 8.7.4 (Tall Rectangular Version). If A E R.mixn and B E R.m2xn have
at least as many rows as columns, then there exists an orthogonal matrix U1 E R.m1xmi,
an orthogonal matrix U2 E IR"i2xm2, and a nonsingular matrix X E R.nxn such that
U'{AX
U'{BX diag(,81 , . . . ,.Bn)·
502 Chapter 8. Symmetric Eigenvalue Problems
Proof. See Theorem 6.1.1. 0
The generalized singular values of the matrix pair {A, B} are defined by
We give names to the columns of X, U1, and U2• The columns of X are the right gen­
eralized singular vectors, the columns of U1 are the left-A generalized singular vectors,
and the columns of U2 are the left-B generalized singular vectors. Note that
for k = l:n.
AX(:, k) = akU1(:, k),
BX(:, k) = fAU2(:, k),
There is a connection between the GSVD of the matrix pair {A, B} and the
"symmetric-definite-definite" pencil ATA - >..BTB. Since
it follows that the right generalized singular vectors of {A, B} are the generalized
eigenvectors for ATA - >..BTB and the eigenvalues of the pencil ATA - >..BTB are
squares of the generalized singular values of {A, B}.
All these GSVD facts revert to familiar SVD facts by setting B = In . For example,
if B = In , then we can set X = U2 and U'[AX = DA is the SVD.
We mention that the generalized singular values of {A, B} are the stationary
values of
II Ax 112
<PA,B(x) =
II Bx 112
and the right generalized singular vectors are the associated stationary vectors. The
left-A and left-B generalized singular vectors are stationary vectors associated with the
quotient II y 112/ll x 112 subject to the constraints
See Chu, Funderlic, and Golub (1997).
A GSVD perturbation theory has been developed by Sun (1983, 1998, 2000),
Paige (1984), and Li (1990).
8.7.5 Computing the GSVD Using the CS Decomposition
Our proof of the GSVD in Theorem 6.1.1 is constructive and makes use of the CS
decomposition. In practice, computing the GSVD via the CS decomposition is a viable
strategy.
Algorithm 8.7.2 (GSVD (Tall, Full-Rank Version)) Assume that A E JR.mi xn and
B E 1R.m2 xn, with m1 ;::: n, m2 ;::: n, and null(A) n null(B) = 0. The following algorithm
computes an orthogonal matrix U1 E JR.mi xmi , an orthogonal matrix U2 E JR.m2 xm2• a
nonsingular matrix x E JR.nxn, and diagonal matrices DA E JR.mi xn and Ds E nm1 x "
such that urAX = DA and u[BX = Ds.
8.7. Generalized Eigenvalue Problems with Symmetry 503
Compute the the QR factorization
Compute the CS decomposition
U[Q1V = DA = diag(ai , . . . , a:n),
U[Q2V = DB = diag(,Bi , . . . , .Bn )·
Solve RX = V for X.
The assumption that null(A) n null(B) = 0 is not essential. See Van Loan (1985).
Regardless, the condition of the matrix X is an issue that affects accuracy. However,
we point out that it is possible to compute designated right generalized singular vector
subspaces without having to compute explicitly selected columns of the matrix X =
VR-1. For example, suppose that we wish to compute an orthonormal basis for the
subspace S = span{xi , . . . xk } where Xi = X(:, i). If we compute an orthogonal Z and
upper triangular T so TzT = yTR, then
zT-1 = R-1v = x
and S = span {zi , . . . zk } where Zi = Z(:, i). See P5.2.2 concerning the computation of
Z and T.
8.7.6 Computing the CS Decomposition
At first glance, the computation of the CS decomposition looks easy. After all, it is
just a collection of SVDs. However, there are some complicating numerical issues that
need to be addressed. To build an appreciation for this, we step through the ''thin"
version of the algorithm developed by Van Loan (1985) for the case
x x x x x
x x x x x
x x x x x
x x x x x
[�:]
x x x x x
Q
x x x x x
x x x x x
x x x x x
x x x x x
x x x x x
In exact arithmetic, the goal is to compute 5-by-5 orthogonal matrices U1, U2, and V
so that
U[QiV = C = diag(c1, c2, c3, c4, c5),
Ui'Q2V = S = diag(si , s2, s3, s4, s5).
504 Chapter 8. Symmetric Eigenvalue Problems
In floating point, we strive to compute matrices fi2, fi2 and V that arc orthogonal to
working precision and which transform Qi and Q2 into nearly diagonal form:
ll E1 II � u,
ll E2 II � u.
(8.7.9)
(8.7.10)
In what follows, it will be obvious that the computed versions of U1, U2 and V are
orthogonal to working precision, as they will be "put together" from numerically sound
QR factorizations and SVDs. The challenge is to affirm (8.7.9) and (8.7.10).
We start by computing the SYD
Eii = O(u),
8ii = O(u),
Since the columns of this matrix are orthonormal to machine precision, it follows that
j = 2:5.
Note that if lrul = 0(1), then we may conclude that lr1il � u for j = 2:5. This will
be the case if (for example) s1 � 1/./2 for then
With this in mind, let us assume that the singular values s1 , . . . , s5 are ordered from
little to big and that
(8.7.11)
8.7. Generalized Eigenvalue Problems with Symmetry 505
Working with the near-orthonormality of the columns of Q, we conclude that
C1 E12 €13 €14 E15
E21 C2 E23 E24 E25
€31 €32 T33 T34 T35 Eij = O(u),
f.41 E42 €43 T44 T45
f.51 f.52 €53 E54 T55
Q
S1 812 813 814 825
821 s2 823 824 825
831 832 83 834 835
8i1 = O(u).
841 842 843 84 845
851 852 853 854 85
Note that
Since s3 can be close to 1, we cannot guarantee that r34 is sufficiently small. Similar
comments apply to r35 and r45.
To rectify this we compute the SVD of Q(3:5, 3:5), taking care to apply the U­
matrix across rows 3 to 5 and the V matrix across columns 3:5. This gives
C1 E12 E13 E14 E15
E21 C2 E23 E24 E25
f.31 f.32 C3 E34 €35 Eij = O(u),
E41 E42 E43 C4 €45
E51 E52 E53 E54 C5
Q
81 812 813 814 825
821 82 823 824 825
831 832 t33 t34 t35
8i1 = O(u).
841 842 t43 t44 t45
851 852 t53 t54 t55
Thus, by diagonalizing the (2,2) block of Q1 we fill the (2,2) block of Q2. However, if
we compute the QR factorization of Q(8:10, 3:5) and apply the orthogonal factor across
506 Chapter 8. Symmetric Eigenvalue Problems
rows 8:10, then we obtain
C1 E12 E13 E14 E15
E21 C2 E23 E24 E25
f31 f32 C3 €34 €35 Eij = O(u),
f41 f42 €43 C4 €45
f51 f52 €53 €54 C5
Q =
81 012 013 014 025
021 82 d23 024 025
031 032 t33 t34 t35
Oij = O(u).
041 042 043 t44 t45
051 052 053 054 t55
Using the near-orthonormality of the columns of Q and the fact that c3, c4, and c5 are
all less than 1/./2, we can conclude (for example) that
Using similar arguments we may conclude that both t35 and t45 are O(u). It follows
that the updated Q1 and Q2 are diagonal to within the required tolerance and that
(8.7.9) and (8.7.10) are achieved as a result.
8.7.7 The Kogbetliantz Approach
Paige (1986) developed a method for computing the GSVD based on the Kogbetliantz
Jacobi SVD procedure. At each step a 2-by-2 GSVD problem is solved, a calculation
that we briefly examine. Suppose F and G are 2-by-2 and that G is nonsingular. If
is the SVD of FG-1 , then u(F, G) = {u1 , u2 } and
U'[F = (UiG)E.
This says that the rows of U'[F are parallel to the corresponding rows ofU!G. Thus, if
Z is orthogonal so that U[GZ = G1 is upper triangular, then U[FZ = F1 is also upper
triangular. In the Paige algorithm, these 2-by-2 calculations resonate with the preser­
vation of the triangular form that is key to the Kogbetliantz procedure. Moreover, the
A and B input matrices are separately updated and the updates only involve orthog­
onal transformations. Although some of the calculations are very delicate, the overall
procedure is tantamount to applying Kogbetliantz implicitly to the matrix AB-1•
8.7. Generalized Eigenvalue Problems with Symmetry 507
8.7.8 Other Generalizations of the SVD
What we have been calling the "generalized singular value decomposition" is sometimes
referred to as the quotient singular value decomposition or QSVD. A key feature of the
decomposition is that it separately transforms the input matrices A and B in such a
way that the generalized singular values and vectors are exposed, sometimes implicitly.
It turns out that there are other ways to generalize the SVD. In the product
singular value decomposition problem we are given A E 1Rmxni and B E 1Rmxn2 and
require the SVD of ATB. The challenge is to compute UT(ATB)V = E without
actually forming ATB as that operation can result in a significant loss of information.
See Drmac (1998, 2000).
The restricted singular value decomposition involves three matrices and is best
motivated from a a variational point of view. If A E 1Rmxn, B E 1Rmxq, and C E 1Rnxp,
then the restricted singular values of the triplet {A, B, C} are the stationary values of
yTAx
1/JA,B,c(X, y) =
II By 11211 Cx 112
See Zha (1991), De Moor and Golub (1991), and Chu, De Lathauwer, and De Moor
(2000). As with the product SVD, the challenge is to compute the required quantities
without forming inverses and products.
All these ideas can be extended to chains of matrices, e.g., the computation of
the SVD of a matrix product A = A1A2 · • · Ak without explicitly forming A. See De
Moor and Zha (1991) and De Moor and Van Dooren (1992).
8.7.9 A Note on the Quadratic Eigenvalue Problem
We build on our §7.7.9 discussion of the polynomial eigenvalue problem and briefly
consider some structured. versions of the quadratic case,
M, C, K E 1Rnxn. (8.7.12)
We recommend the excellent survey by Tisseur and Meerbergen (2001) for more detail.
Note that the eigenvalue in (8.7.12) solves the quadratic equation
and thus
-(x11Cx) ± V(x11Cx)2 - 4(x11Mx)(x11Kx)
.A = ,
2(x11Mx)
assuming that x11Mx # 0. Linearized versions of (8.7.12) include
and
[-K
0
(8.7.13)
(8.7.14)
(8.7.15)
(8.7.16)
508 Chapter 8. Symmetric Eigenvalue Problems
where N E Rnxn is nonsingular.
In many applications, the matrices M and C arc symmetric and positive definite
and K is symmetric and positive semidefinite. It follows from (8.7.14) that in this case
the eigenvalues have nonpositive real part. If we set N = K in (8.7.15), then we obtain
the following generalized eigenvalue problem:
[; � l [: l A
[� -� l [: l·
This is not a symmetric-definite problem. However, if the overdamping condition
min (xTCx)2 - 4(xTMx)(xTKx) = 12 > 0
xTx=l
holds, then it can be shown that there is a scalar µ > 0 so that
[µK K
l
A(µ) = K C - µM
is positive definite. It follows from Theorem 8.7.1 that (8.7.16) can be diagonalized by
congruence. See Vescelic (1993).
A quadratic eigenvalue problem that arises in the analysis of gyroscopic systems
has the property that M = MT (positive definite), K = KT, and C = -CT. It is easy
to see from (8.7.14) that the eigenvalues are all purely imaginary. For this problem we
have the structured linearization
[A: -
: l [: l = A
[� � l [: l·
Notice that this is a Hamiltonian/skew-Hamiltonian generalized eigenvalue problem.
In the quadratic palindomic problem, K = MT and c = er and the eigenvalues
come in reciprocal pairs, i.e., if Q(A) is singular then so is Q(l/A). In addition, we
have the linearization
[
MT MT l [y l [-M MT - C
l [y
l
c - M MT z
A
-M -M z
.
Note that if this equation holds, then
(8.7.17)
(8.7.18)
For a systematic treatment of linearizations for structured polynomial eigenvalue prob­
lems, see Mackey, Mackey, Mehl, and Mehrmann (2006).
Problems
P8.7.1 Suppose A E Rnxn is symmetric and G E Rnxn is lower triangular and nonsingular. Give an
efficient algorithm for computing C = c-1Ac-T .
P8.7.2 Suppose A E Rnxn is symmetric and B E Rnxn is symmetric positive definite. Give an algo­
rithm for computing the eigenvalues of AB that uses the Cholesky factorization and the symmetric
8.7. Generalized Eigenvalue Problems with Symmetry 509
QR algorithm.
p&.7.3 Relate the principal angles and vectors between ran(A) and ran(B) to the eigenvalues and
eigenvectors of the generalized eigenvalue problem
] [ � ].
p&.7.4 Show that if C is real and diagonalizable, then there exist symmetric matrices A and B, B
nonsingular, such that C = AB- 1 • This shows that symmetric pencils A - >.B are essentially general.
p&.7.5 Show how to convert an Ax = >.Bx problem into a generalized singular value problem if A and
B are both symmetric and nonnegative definite.
PB.7.6 Given Y E Rnx n show how to compute Householder matrices H2,...,Hn so that YHn · · · H2
== T is upper triangular. Hint: Hk zeros out the kth row.
PB.7.7 Suppose
where A E R"' xn, B1 E R"' x m, and B2 E Rnx n. Assume that Bt and B2 are positive definite with
Cholesky triangles G1 and G2 respectively. Relate the generalized eigenvalues of this problem to the
singular values of G} 1 AG2T
PB.7.8 Suppose A and B are both symmetric positive definite. Show how to compute >.(A, B) and the
corresponding eigenvectors using the Cholesky factorization and CS decomposition.
PB.7.9 Consider the problem
min
:17 Bx=/32
x
T
Cx=")'2
Assume that B and C are positive definite and that Z E Rnx n is a nonsingular matrix with the property
that zT BZ = diag(>.i , . . . , :An) and zTcz = In. Assume that >.1 � · · · � An. (a) Show that the the
set of feasible x is empty unless >.n � fJ2/"y2 � >.1 . (b) Using Z, show how the two-constraint problem
can be converted to a single-constraint problem of the form
where W = diag(>.1, . . . , >.,.) ->.nI.
min II Ax-bll2
y
T
Wy=/32 - A n "Y2
PB.7.10 Show that (8.7.17) implies (8.7.18).
Notes and References for §8.7
Just how far one can simplify a symmetric pencil A - >.B via congruence is thoroughly discussed in:
P. Lancaster and L. Rodman (2005). "Canonical Forms for Hermitian Matrix Pairs under Strict
Equivalence and Congruence," SIAM Review 41, 407-443.
The sensitivity of the symmetric-definite eigenvalue problem is covered in Stewart and Sun (MPT,
Chap. 6). See also:
C.R. Crawford (1976). "A Stable Generalized Eigenvalue Problem," SIAM J. Nu.mer. Anal. 13,
854-860.
C.-K. Li and R. Mathias (1998). "Generalized Eigenvalues of a Definite Hermitian Matrix Pair," Lin.
Alg. Applic. 211, 309-321.
S.H. Cheng and N.J. Higham (1999). "The Nearest Definite Pair for the Hermitian Generalized
Eigenvalue Problem," Lin. Alg. Applic. 302 3, 63-76.
C.-K. Li and R. Mathias (2006). "Distances from a Hermitian Pair to Diagonalizable and Nondiago­
nalizable Hermitian Pairs," SIAM J. Matrix Anal. Applic. 28, 301-305.
Y. Nakatsukasa (2010). "Perturbed Behavior of a Multiple Eigenvalue in Generalized Hermitian
Eigenvalue Problems," BIT 50, 109-121.
510 Chapter 8. Symmetric Eigenvalue Problems
R.-C. Li, Y. Nakatsukasa, N. Truhar, and S. Xu (2011). "Perturbation of Partitioned Hermitian
Definite Generalized Eigenvalue Problems," SIAM J. Matrix Anal. Applic. 32, 642-663.
Although it is possible to diagonalize a symmetric-definite pencil, serious numerical issues arise if the
congruence transformation is ill-conditioned. Various methods for "controlling the damage" have been
proposed including:
R.S. Martin and J.H. Wilkinson (1968). "Reduction of a Symmetric Eigenproblem Ax = >.Bx and
Related Problems to Standard Form," Numer. Math. 11, 99-110.
G. Fix and R. Heiberger (1972). "An Algorithm for the Ill-Conditioned Generalized Eigenvalue Prob­
lem," SIAM J. Numer. Anal. 9, 78-88.
A. Bunse-Gerstner (1984). "An Algorithm for the Symmetric Generalized Eigenvalue Problem," Lin.
Alg. Applic. 58, 43-68.
S. Chandrasekaran (2000). "An Efficient and Stable Algorithm for the Symmetric-Definite Generalized
Eigenvalue Problem," SIAM J. Matrix Anal. Applic. 21, 1202-1228.
P.I. Davies, N.J. Higham, and F. Tisseur (2001). "Analysis of the Cholesky Method with Iterative
Refinement for Solving the Symmetric Definite Generalized Eigenproblem," SIAM J. Matrix Anal.
Applic. 23, 472-493.
F. Tisseur (2004). "Tridiagonal-Diagonal Reduction of Symmetric Indefinite Pairs," SIAM J. Matrix
Anal. Applic. 26, 215-232.
Exploiting handedness in A and B can be important, see:
G. Peters and J.H. Wilkinson (1969). "Eigenvalues of Ax = >.Bx with Band Symmetric A and B,"
Comput. J. 12, 398-404.
C.R. Crawford (1973). "Reduction of a Band Symmetric Generalized Eigenvalue Problem," Commun.
ACM 1 6, 41-44.
L. Kaufman (1993). "An Algorithm for the Banded Symmetric Generalized Matrix Eigenvalue Prob­
lem," SIAM J. Matrix Anal. Applic. 14, 372-389.
K. Li, T-Y. Li, and Z. Zeng (1994). "An Algorithm for the Generalized Symmetric Tridiagonal
Eigenvalue Problem," Numer. Algorithms 8, 269-291.
The existence of a positive semidefinite linear combination of A and B was central to Theorem 8.7.1.
Interestingly, the practical computation of such a combination has been addressed, see:
C.R. Crawford (1986). "Algorithm 646 PDFIND: A Routine to Find a Positive Definite Linear Com­
bination of Two Real Symmetric Matrices," A CM Trans. Math. Softw. 12, 278--282.
C.-H. Guo, N.J. Higham, and F. Tisseur (2009). "An Improved Arc Algorithm for Detecting Definite
Hermitian Pairs," SIAM J. Matrix Anal. Applic. 31, 1131-1151.
As we mentioned, many techniques for the symmetric eigenvalue problem have natural extensions to
the symmetric-definite problem. These include methods based on the Rayleigh quotient idea:
E. Jiang(1990). "An Algorithm for Finding Generalized Eigenpairs of a Symmetric Definite Matrix
Pencil," Lin. Alg. Applic. 132, 65-91.
R-C. Li (1994). "On Eigenvalue Variations of Rayleigh Quotient Matrix Pencils of a Definite Pencil,"
Lin. Alg. Applic. 208/209, 471-483.
There are also generalizations of the Jacobi method:
K. Veselil: (1993). "A Jacobi Eigenreduction Algorithm for Definite Matrix Pairs," Numer. Math. 64.
241-268.
C. Mehl (2004). "Jacobi-like Algorithms for the Indefinite Generalized Hermitian Eigenvalue Prob­
lem," SIAM J. Matrix Anal. Applic. 25, 964-985.
Homotopy methods have also found application:
K. Li and T-Y. Li (1993). "A Homotopy Algorithm for a Symmetric Generalized Eigenproblem:·
Numer. Algorithms 4, 167--195.
T. Zhang and K.H. Law, and G.H. Golub (1998). "On the Homotopy Method for Perturbed Symmetric
Generalized Eigenvalue Problems," SIAM J. Sci. Comput. 19, 1625-1645.
We shall have more to say about symmetric-definite problems with general sparsity in Chapter 10. If
the matrices are banded, then it is possible to implement an effective a generalization of simultaneous
iteration, see:
8.7. Generalized Eigenvalue Problems with Symmetry 511
H . Zhang and W.F. Moss (1994). "Using Parallel Banded Linear System Solvers in Generalized
Eigenvalue Problems," Parallel Comput. 20, 1089-1106.
Turning our attention to the GSVD literature, the original references include:
c.F. Van Loan (1976). "Generalizing the Singular Value Decomposition," SIAM J. Numer. Anal.
19, 76-83.
c.C. Paige and M. Saunders (1981). "Towards A Generalized Singular Value Decomposition," SIAM
J. Numer. Anal. 18, 398-405.
The sensitivity of the GSVD is detailed in Stewart and Sun (MPT) as as well in the following papers:
J.-G. Sun (1983). "Perturbation Analysis for the Generalized Singular Value Problem," SIAM J.
Numer. Anal. 20, 611-625.
C.C. Paige (1984). "A Note on a Result of Sun J.-Guang: Sensitivity of the CS and GSV Decompo­
sitions," SIAM J. Numer. Anal. 21, 186-191.
R-C. Li (1993). "Bounds on Perturbations of Generalized Singular Values and of Associated Sub­
spaces," SIAM J. Matrix Anal. Applic. 14, 195-234.
J.-G. Sun (1998). "Perturbation Analysis of Generalized Singular Subspaces," Numer. Math. 79,
615-641.
J.-G. Sun (2000). "Condition Number and Backward Error for the Generalized Singular Value De­
composition," SIAM J. Matrix Anal. Applic. 22, 323-341.
X.S. Chen and W. Li (2008). "A Note on Backward Error Analysis of the Generalized Singular Value
Decomposition," SIAM J. Matrix Anal. Applic. 90, 1358-1370.
The variational characterization of the GSVD is analyzed in:
M.T. Chu, R.F Funderlic, and G.H. Golub (1997). "On a Variational Formulation of the Generalized
Singular Value Decomposition," SIAM J. Matrix Anal. Applic. 18, 1082-1092.
Connections between GSVD and the pencil A - >.B arc discussed in:
B. KAgstrom (1985). "The Generalized Singular Value Decomposition and the General A - >.B Prob­
lem," BIT 24, 568-583.
Stable methods for computing the CS and generalized singular value decompositions are described in:
G.W. Stewart (1982). "Computing the C-S Decomposition of a Partitioned Orthonormal Matrix,"
Numer. Math. 40, 297-306.
G.W. Stewart (1983). "A Method for Computing the Generalized Singular Value Decomposition," in
Matrix Pencils , B. Kagstrom and A. Ruhe (eds.), Springer-Verlag, New York, 207-220.
C.F. Van Loan (1985). "Computing the CS and Generalized Singular Value Decomposition," Numer.
Math. 46, 479-492.
B.D. Sutton (2012). "Stable Computation ofthe CS Decomposition: Simultaneous Bidiagonalization,"
SIAM. J. Matrix Anal. Applic. 33, 1-21.
The idea of using the Kogbetliantz procedure for the GSVD problem is developed in:
C.C. Paige (1986). "Computing the Generalized Singular Value Decomposition," SIAM J. Sci. Stat.
Comput. 7, 1126--1146.
Z. Bai and H. Zha (1993). "A New Preprocessing Algorithm for the Computation of the Generalized
Singular Value Decomposition," SIAM J. Sci. Comp. 14, 1007-1012.
Z. Bai and J.W. Demmel (1993). "Computing the Generalized Singular Value Decomposition," SIAM
J. Sci. Comput. 14, 1464-1486.
Other methods for computing the GSVD include:
Z. Drmae (1998). "A Tangent Algorithm for Computing the Generalized Singular Value Decomposi­
tion," SIAM J. Numer. Anal. 35, 1804-1832.
Z. Drmae and E.R. Jessup (2001). "On Accurate Quotient Singular Value Computation in Floating­
Point Arithmetic," SIAM J. Matrix Anal. Applic. 22, 853-873.
S. Friedland (2005). "A New Approach to Generalized Singular Value Decomposition," SIAM J.
Matrix Anal. Applic. 27, 434-444.
Stable methods for computing the product and restricted SVDs are discussed in the following papers:
512 Chapter 8. Symmetric Eigenvalue Problems
M.T. Heath, A.J. Laub, C.C. Paige, and R.C. Ward (1986). "Computing the Singular Value Decom­
position of a Product of Two Matrices," SIAM J. Sci. Stat. Comput. 7, 1147-1159.
K.V. Fernando and S. Hammarling (1988). "A Product-Induced Singular Value Decomposition for
Two Matrices and Balanced Realization," in Linear Algebra in Systems and Control, B.N. Datta
et al (eds), SIAM Publications, Philadelphia, PA.
B. De Moor and H. Zha (1991). "A Tree of Generalizations of the Ordinary Singular Value Decom­
position," Lin. Alg. Applic. 147, 469-500.
H. Zha (1991). "The Restricted Singular Value Decomposition of Matrix Triplets," SIAM J. Matri:x
Anal. Applic. 12, 172-194.
B. De Moor and G.H. Golub (1991). "The Restricted Singular Value Decomposition: Properties and
Applications," SIAM J. Matri:x Anal. Applic. 12, 401-425.
B. De Moor and P. Van Dooren (1992). "Generalizing the Singular Value and QR Decompositions,"
SIAM J. Matrix Anal. Applic. 13, 993-1014.
H. Zha (1992). "A Numerical Algorithm for Computing the Restricted Singular Value Decomposition
of Matrix Triplets," Lin. Alg. Applic. 168, 1-25.
G.E. Adams, A.W. Bojanczyk, and F.T. Luk (1994). "Computing the PSVD of Two 2x2 Triangular
Matrices," SIAM J. Matrix Anal. Applic. 15, 366-382.
Z. Drma.C (1998). "Accurate Computation ofthe Product-Induced Singular Value Decomposition with
Applications," SIAM J. Numer. Anal. 35, 1969- 1994.
D. Chu, L. De Lathauwer, and 13. De Moor (2000). "On the Computation of the Restricted Singular
Value Decomposition via the Cosine-Sine Decomposition," SIAM J. Matrix Anal. Applic. 22,
580-601.
D. Chu and B.De Moor (2000). "On a variational formulation of the QSVD and the RSVD," Lin.
Alg. Applic. 311, 61-78.
For coverage of structured quadratic eigenvalue problems, see:
P. Lancaster (1991). "Quadratic Eigenvalue Problems," Lin. Alg. Applic. 150, 499-506.
F. Tisseur and N.J. Higham (2001). "Structured Pseudospectra for Polynomial Eigenvalue Problems,
with Applications,'' SIAM J. Matrix Anal. Applic. 23, 187-208.
F. Tisseur and K. Meerbergen (2001). "The Quadratic Eigenvalue Problem,'' SIAM Review 43, 235-
286.
V. Mehrmann and D. Watkins (2002). "Polynomial Eigenvalue Problems with Hamiltonian Structure,"
Electr. TI-ans. Numer. Anal. 13, 106-118.
U.B. Holz, G.H. Golub, and K.H. Law (2004). "A Subspace Approximation Method for the Quadratic
Eigenvalue Problem,'' SIAM J. Matrix Anal. Applic. 26, 498-521.
D.S. Mackey, N. Mackey, C. Mehl, and V. Mehrmann (2006). "Structured Polynomial Eigenvalue
Problems: Good Vibrations from Good Linearizations,'' SIAM. J. Matrix Anal. Applic. 28, 1029-
1051.
B. Plestenjak (2006). "Numerical Methods for the Tridiagonal Hyperbolic Quadratic Eigenvalue Prob­
lem,'' SIAM J. Matrix Anal. Applic. 28, 1157-1172.
E.K.-W. Chu, T.-M. Hwang, W.-W. Lin, and C.-T. Wu (2008). "Vibration ofFast Trains, Palindromic
Eigenvalue Problems, and Structure-Preserving Doubling Algorithms,'' J. Comp. Appl. Math. 219,
237--252.
Chapter 9
Functions of Matrices
9.1 Eigenvalue Methods
9.2 Approximation Methods
9.3 The Matrix Exponential
9.4 The Sign, Square Root, and Log of a Matrix
Computing a function f(A) of an n-by-n matrix A is a common problem in many
application areas. Roughly speaking, if the scalar function f(z) is defined on .X(A), then
f(A) is defined by substituting "A" for "z" in the "formula" for f(z). For example, if
f(z) = {1 + z)/{1 - z) and 1 fj. .X(A), then f(A) = (I + A)(I - A)-1 .
The computations get particularly interesting when the function f is transcen­
dental. One approach in this more complicated situation is to compute an eigenvalue
decomposition A = YBY-1 and use the formula f(A) = Yf(B)Y-1. If B is suffi­
ciently simple, then it is often possible to calculate f(B) directly. This is illustrated in
§9.1 for the Jordan and Schur decompositions.
Another class of methods involves the approximation of the desired function f(A)
with an easy-to-calculate function g(A). For example, g might be a truncated Taylor
series approximation to f. Error bounds associated with the approximation of matrix
functions are given in §9.2.
In §9.3 we discuss the special and very important problem of computing the
matrix exponential eA. The matrix sign, square root, and logarithm functions and
connections to the polar decomposition are treated in §9.4.
Reading Notes
Knowledge of Chapters 3 and 7 is assumed. Within this chapter there are the
following dependencies:
§9.1 -+ §9.2 -+ §9.3
-!.
§9.4
513
514 Chapter 9. Functions of Matrices
Complementary references include Horn and Johnson (TMA) and the definitive text
by Higham (FOM). We mention that aspects of the f(A)-times-a-vector problem are
treated in §10.2.
9.1 Eigenvalue Methods
Here are some examples of matrix functions:
p(A) = I + A,
r(A) = (1-
�)-1 (1+ �),
oo
Ak
A - "'
e - L..,, kf ·
k=O
2 ¢ A(A),
Obviously, these are the matrix versions of the scalar-valued functions
p(z) = 1 + z,
r(z) = (1 - (z/2))-1(1 + (z/2)),
00 k
z "' z
e = L..,, kl ·
k=O
2 "I- z,
Given an n-by-n matrix A, it appears that all we have to do to define f(A) is to substi­
tute A into the formula for f. However, to make subsequent algorithmic developments
precise, we need to be a little more formal. It turns out that there are several equiv­
alent ways to define a function of a matrix. See Higham (FOM, §1.2). Because of its
prominence in the literature and its simplicity, we take as our "base" definition one
that involves the Jordan canonical form (JCF).
9.1.1 A Jordan-Based Definition
Suppose A E ccnxn and let
be its JCF with
Ai
0
Ji
0
A
1
Ai 1
0
(9.1.1)
0
E ccn; Xn;
, i = l:q. (9.1.2)
1
Ai
9.1. Eigenvalue Methods
The matrix function f(A) is defined by
f(A) = X·diag(F1, • . . , Fq) ·X-1
where
f(>.i) J(ll(>.i)
0 f(>.i)
0
J(n;-1)(>.i)
(ni - 1)!
assuming that all the required derivative evaluations exist.
9.1.2 The Taylor Series Representation
515
(9.1.3)
i = l:q, (9.1.4)
Iffcan be represented by a Taylor series on A's spectrum, then f(A) can be represented
by the same Taylor series in A. To fix ideas, assume that f is analytic in a neighborhood
of z0 E <C and that for some r > 0 we have
oo J(k)(zo) k
f(z) = L k! (z - zo) ,
k=O
Our first result applies to a single Jordan block.
lz - zol < r. (9.1.5)
Lemma 9.1.1. Suppose B E <Cmxm is a Jordan block and write B = >.Im + E where
E is its strictly upper bidiagonal part. Given (9.1.5), if I>. - zol < r, then
00
J(k)( )
f(B) = L kt
o
(B - zolm)k.
k=O
Proof. Note that powers of E are highly structured, e.g.,
[� � � �l E2
E =
0 0 0 1 '
0 0 0 0
In terms of the Kronecker delta, if 0 :::; p :::; m - 1, then [EP]ij = (c5i,j-p)· It follows
from (9.1.4) that
f(B) (9.1.6)
516 Chapter 9. Functions of Matrices
On the other hand, if p > m, then EP = 0. Thus, for any k 2: 0 we have
If N is a nonnegative integer, then
N
f(k)(zo) - k -
min{k,m-1} dP
(N
f(k)(zo) - k)EP
L k! (B zoI) - L d)..P L k! (>. zo) p! .
k=O p=O k=O
The lemma follows by taking limits with respect to N and using both (9.1.6) and the
Taylor series representation of f(z). D
A similar result holds for general matrices.
Theorem 9.1.2. Iff has the Taylor series representation {9. 1.5) and I>. - zol < r for
all >. E >.(A) where A E <Cnxn, then
00
f(k)( )
f(A) = L k!
zo (A - zoI)k.
k=O
Proof. Let the JCF of A be given by (9.1.1) and (9.1.2). From Lemma 9.1.1 we have
00
f(Ji) = L ak(Ji - zoI)k,
k=O
f(k)(zo)
k!
for i = l:q. Using the definition (9.1.3) and (9.1.4) we see that
f(A) = X · diag (fak(J1 - zoin,)k, . . . ,fak(Jq - zoin")k)-x-1
k=O k=O
= X· (fak(J - zoin)k).x-1
k=O
00
= L ak (X(J - z0In)x-1)
k
k=O
completing the proof of the theorem. D
00
L ak(A - zoin)k,
k=O
Important matrix functions that have simple Taylor series definitions include
9.1. Eigenvalue Methods
oo
Ak
exp(A) = L kl '
k=O
oo
Ak
log(J - A) = L T
•
k=l
sin(A)
oo
A2k
cos(A) = I)-l)
k
(2k)! '
k=O
517
IAI < 1, A E A(A),
For clarity in this section and the next, we consider only matrix functions that have a
Taylor series representation. In that case it is easy to verify that
A · f(A) = f(A) · A (9.1.7)
and
f(x-1AX) = x . f(A) . x-1. (9.1.8)
9.1.3 An Eigenvector Approach
If A E <Cnxn is diagonalizable, then it is particularly easy to specify f(A) in terms of
A's eigenvalues and eigenvectors.
Corollary 9.1.3. If A E <Cnxn, A = X · diag(A1 , . . . , An) ·x-1 , and f(A) is defined,
then
(9.1.9)
Proof. This result is an easy consequence of Theorem 9.1.2 since all the Jordan blocks
are l-by-1. D
Unfortunately, if the matrix of eigenvectors is ill-conditioned, then computing f(A) via
(9.1.8) is likely introduce errors of order u 11:2(X) because of the required solution of a
linear system that involves the eigenvector matrix X. For example, if
A =
[1 + 10-s 1 l
0 1 - 10-5 '
then any matrix of eigenvectors is a column-scaled version of
[1 -1
l
X =
0 2(1 - 10-5)
518 Chapter 9. Functions of Matrices
and has a 2-norm condition number of order 105. Using a computer with machine
precision u :::::: 10-7, we find
=
[2.718307 2.750000 l
fl (x-idiag(exp{l + 10-s), exp{l - 10-s))x)
0.000000 2.718254
while
eA =
[2.718309 2.718282 l·
0.000000 2.718255
The example suggests that ill-conditioned similarity transformations should be avoided
when computing a function of a matrix. On the other hand, if A is a normal matrix,
then it has a perfectly conditioned matrix of eigenvectors. In this situation, computa­
tion of f(A) via diagonalization is a recommended strategy.
9.1.4 A Schur Decomposition Approach
Some of the difficulties associated with the Jordan approach to the matrix function
problem can be circumvented by relying upon the Schur decomposition. If A = QTQH
is the Schur decomposition of A, then by {9.1.8),
f(A) = Qf(T)QH.
For this to be effective, we need an algorithm for computing functions of upper trian­
gular matrices. Unfortunately, an explicit expression for f(T) is very complicated.
Theorem 9.1.4. Let T = (tij) be an n-by-n upper triangular matrix with Ai = tii and
assume f(T) is defined. If f(T) = (fij), then fij = 0 if i > j, fo = f(Ai) for i = j,
and for all i < j we have
fo = (9.1.10)
(so , ...,sk )E S;;
where Sij is the set of all strictly increasing sequences of integers that start at i and
end at j, and f [As0 , • • • , Ask ] is the kth order divided difference off at {As0 , • • • , Ask }.
Proof. See Descloux {1963), Davis {1973), or Van Loan {1975). D
To illustrate the theorem, if

$$T \;=\; \begin{bmatrix} \lambda_1 & t_{12} & t_{13} \\ 0 & \lambda_2 & t_{23} \\ 0 & 0 & \lambda_3 \end{bmatrix},$$

then

$$f(T) \;=\; \begin{bmatrix} f(\lambda_1) & t_{12}\,\dfrac{f(\lambda_2) - f(\lambda_1)}{\lambda_2 - \lambda_1} & F_{13} \\[2mm] 0 & f(\lambda_2) & t_{23}\,\dfrac{f(\lambda_3) - f(\lambda_2)}{\lambda_3 - \lambda_2} \\[2mm] 0 & 0 & f(\lambda_3) \end{bmatrix}$$

where

$$F_{13} \;=\; t_{13}\,\frac{f(\lambda_3) - f(\lambda_1)}{\lambda_3 - \lambda_1} \;+\; t_{12}t_{23}\, \frac{\dfrac{f(\lambda_3) - f(\lambda_2)}{\lambda_3 - \lambda_2} \;-\; \dfrac{f(\lambda_2) - f(\lambda_1)}{\lambda_2 - \lambda_1}}{\lambda_3 - \lambda_1}.$$
The recipes for the upper triangular entries get increasingly complicated as we move away from the diagonal. Indeed, if we explicitly use (9.1.10) to evaluate f(T), then O(2^n) flops are required. However, Parlett (1974) has derived an elegant recursive method for determining the strictly upper triangular portion of the matrix F = f(T). It requires only 2n^3/3 flops and can be derived from the commutativity equation FT = TF. Indeed, by comparing (i, j) entries in this equation, we find

$$\sum_{k=i}^{j} f_{ik} t_{kj} \;=\; \sum_{k=i}^{j} t_{ik} f_{kj}, \qquad j > i,$$

and thus, if t_ii and t_jj are distinct,

$$f_{ij} \;=\; t_{ij}\,\frac{f_{jj} - f_{ii}}{t_{jj} - t_{ii}} \;+\; \sum_{k=i+1}^{j-1} \frac{t_{ik} f_{kj} - f_{ik} t_{kj}}{t_{jj} - t_{ii}}. \qquad (9.1.11)$$

From this we conclude that f_ij is a linear combination of its neighbors in the matrix F that are to its left and below. For example, the entry f_25 depends upon f_22, f_23, f_24, f_55, f_45, and f_35. Because of this, the entire upper triangular portion of F can be computed superdiagonal by superdiagonal beginning with diag(f(t_11), ..., f(t_nn)).
The complete procedure is as follows:
Algorithm 9.1.1 (Schur-Parlett) This algorithm computes the matrix function F = f(T) where T is upper triangular with distinct eigenvalues and f is defined on λ(T).

    for i = 1:n
        fii = f(tii)
    end
    for p = 1:n-1
        for i = 1:n-p
            j = i + p
            s = tij(fjj - fii)
            for k = i+1:j-1
                s = s + tik fkj - fik tkj
            end
            fij = s/(tjj - tii)
        end
    end
This algorithm requires 2n^3/3 flops. Assuming that A = QTQ^H is the Schur decomposition of A, f(A) = QFQ^H where F = f(T). Clearly, most of the work in computing f(A) by this approach is in the computation of the Schur decomposition, unless f is extremely expensive to evaluate.
9.1.5 A Block Schur-Parlett Approach
If A has multiple or nearly multiple eigenvalues, then the divided differences associated
with Algorithm 9.1.1 become problematic and it is advisable to use a block version of
the method. We outline such a procedure due to Parlett (1974). The first step is to
choose Q in the Schur decomposition so that we have a partitioning

$$T \;=\; \begin{bmatrix} T_{11} & T_{12} & \cdots & T_{1p} \\ 0 & T_{22} & \cdots & T_{2p} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & T_{pp} \end{bmatrix}$$

where λ(T_ii) ∩ λ(T_jj) = ∅ for i ≠ j and each diagonal block is associated with an eigenvalue cluster. The methods of §7.6 are applicable for this stage of the calculation.

Partition F = f(T) conformably,

$$F \;=\; \begin{bmatrix} F_{11} & F_{12} & \cdots & F_{1p} \\ 0 & F_{22} & \cdots & F_{2p} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & F_{pp} \end{bmatrix},$$

and notice that

$$F_{ii} \;=\; f(T_{ii}), \qquad i = 1\!:\!p.$$
Since the eigenvalues of Tii are clustered, these calculations require special methods.
Some possibilities are discussed in the next section.
Once the diagonal blocks of F are known, the blocks in the strict upper triangle
of F can be found recursively, as in the scalar case. To derive the governing equations,
we equate (i, j) blocks in FT = TF for i < j and obtain the following generalization
of (9.1.11):
$$F_{ij} T_{jj} - T_{ii} F_{ij} \;=\; T_{ij} F_{jj} - F_{ii} T_{ij} \;+\; \sum_{k=i+1}^{j-1} \left( T_{ik} F_{kj} - F_{ik} T_{kj} \right). \qquad (9.1.12)$$
This is a Sylvester system whose unknowns are the elements of the block Fij and whose
right-hand side is "known" if we compute the Fij one block superdiagonal at a time.
We can solve (9.1.12) using the Bartels-Stewart algorithm (Algorithm 7.6.2). For more
details see Higham (FOM, Chap. 9).
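
For a 2-by-2 block partitioning, one step of this recursion amounts to a single Sylvester solve. The MATLAB fragment below is a minimal sketch, not from the text; it assumes the blocks T11, T12, T22 and the diagonal blocks F11 = f(T11), F22 = f(T22) are already available, and it uses MATLAB's sylvester, which solves A*X + X*B = C.

    % Solve (9.1.12) for F12:  F12*T22 - T11*F12 = T12*F22 - F11*T12.
    C   = T12*F22 - F11*T12;                 % right-hand side of (9.1.12)
    F12 = sylvester(-T11, T22, C);           % -T11*F12 + F12*T22 = C
    F   = [F11, F12; zeros(size(T22,1), size(T11,2)), F22];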
9.1.6 Sensitivity of Matrix Functions
Does the Schur-Parlett algorithm avoid the pitfalls associated with the diagonalization
approach when the matrix of eigenvectors is ill-conditioned? The proper comparison
of the two solution frameworks requires an appreciation for the notion of condition as
applied to the f(A) problem. Toward that end we define the relative condition of f at
a matrix A ∈ ℂ^{n×n} to be

$$\mathrm{cond}_{\mathrm{rel}}(f, A) \;=\; \lim_{\epsilon \to 0}\; \sup_{\|E\|\, \le\, \epsilon\, \|A\|}\; \frac{\| f(A + E) - f(A) \|}{\epsilon\,\| f(A) \|}.$$
This quantity is essentially a normalized Fréchet derivative of the mapping A → f(A) and various heuristic methods have been developed for estimating its value.
It turns out that the careful implementation of the block Schur-Parlett algorithm
is usually forward stable in the sense that
II P - J(A) II
II f(A) II
� u·condre1(f, A)
where P is the computed version of f(A). The same cannot be said of the diagonal­
ization framework when the matrix of eigenvectors is ill-conditioned. For more details,
see Higham (FOM, Chap. 3).
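
A crude way to get a feel for cond_rel(f, A) is to sample the definition with a few random perturbations of prescribed size. The sketch below is not from the text; the choice f = exp, the perturbation level 1e-6, and the five trials are arbitrary illustrative assumptions, and the result is only a lower bound.

    FA   = expm(A);                          % A is a given square matrix
    epsr = 1e-6;                             % relative perturbation size
    est  = 0;
    for trial = 1:5
        E = randn(size(A));
        E = (epsr*norm(A)/norm(E))*E;        % scale so that norm(E) = epsr*norm(A)
        est = max(est, norm(expm(A+E) - FA)/(epsr*norm(FA)));
    end
    disp(est)                                % lower bound on cond_rel(exp, A)

The papers by Kenney and Laub and by Mathias cited below describe much more refined estimators.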
Problems
P9.1.1 Suppose
Use the power series definitions to develop closed form expressions for exp(A), sin(A), and cos(A).
P9.1.2 Rewrite Algorithm 9.1.1 so that f(T) is computed column by column.
P9.1.3 Suppose A = X diag(λ_i) X^{-1} where X = [x_1 | ··· | x_n] and X^{-1} = [y_1 | ··· | y_n]^H. Show that if f(A) is defined, then

$$f(A) \;=\; \sum_{i=1}^{n} f(\lambda_i)\, x_i y_i^H.$$
P9.1.4 Show that if

$$T \;=\; \begin{bmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{bmatrix}$$

with T_11 p-by-p and T_22 q-by-q, then

$$f(T) \;=\; \begin{bmatrix} F_{11} & F_{12} \\ 0 & F_{22} \end{bmatrix}$$

where F_11 = f(T_11) and F_22 = f(T_22). Assume f(T) is defined.

Notes and References for §9.1
As we discussed, other definitions of f(A) are possible. However, for the matrix functions typically
encountered in practice, all these definitions are equivalent, see:
R.F. Rinehart (1955). "The Equivalence of Definitions of a Matric Function," Amer. Math. Monthly 62, 395-414.
The following papers are concerned with the Schur decomposition and its relationship to the f(A) problem:
C. Davis (1973). "Explicit Functional Calculus," Lin. Alg. Applic. 6, 193-199.
J. Descloux (1963). "Bounds for the Spectral Norm of Functions of Matrices," Numer. Math. 5, 185-190.
C.F. Van Loan (1975). "A Study of the Matrix Exponential," Numerical Analysis Report No. 10, Department of Mathematics, University of Manchester, England. Available as Report 2006.397 from http://eprints.ma.man.ac.uk/.
Algorithm 9.1.1 and the various computational difficulties that arise when it is applied to a matrix having close or repeated eigenvalues are discussed in:
B.N. Parlett (1976). "A Recurrence among the Elements of Functions of Triangular Matrices," Lin.
Alg. Applic. 14, 117-121.
P.I. Davies and N.J. Higham (2003). "A Schur-Parlett Algorithm for Computing Matrix Functions," SIAM J. Matrix Anal. Applic. 25, 464-485.
A compromise between the Jordan and Schur approaches to the f(A) problem results if A is reduced to block diagonal form as described in §7.6.3, see:
B. Kågström (1977). "Numerical Computation of Matrix Functions," Department of Information Processing Report UMINF-58.77, University of Umeå, Sweden.
E.B. Davies (2007). "Approximate Diagonalization," SIAM J. Matrix Anal. Applic. 29, 1051-1064.
The sensitivity of matrix functions to perturbation is discussed in:
C.S. Kenney and A.J. Laub (1989). "Condition Estimates for Matrix Functions," SIAM J. Matrix
Anal. Applic. 10, 191-209.
C.S. Kenney and A.J. Laub (1994). "Small-Sample Statistical Condition Estimates for General Matrix
Functions," SIAM J. Sci. Comput. 15, 36-61.
R. Mathias (1995). "Condition Estimation for Matrix Functions via the Schur Decomposition," SIAM
J. Matrix Anal. Applic. 16, 565-578.
9.2 Approximation Methods
We now consider a class of methods for computing matrix functions which at first
glance do not appear to involve eigenvalues. These techniques are based on the idea
that, if g(z) approximates f(z) on λ(A), then f(A) approximates g(A), e.g.,

$$e^A \;\approx\; I + A + \frac{A^2}{2!} + \cdots + \frac{A^q}{q!}.$$
We begin by bounding ∥f(A) − g(A)∥ using the Jordan and Schur matrix function
representations. We follow this discussion with some comments on the evaluation of
matrix polynomials.
9.2.1 A Jordan Analysis
The Jordan representation of matrix functions (Theorem 9.1.2) can be used to bound
the error in an approximant g(A) of f(A).
Theorem 9.2.1. Assume that

$$A \;=\; X\cdot \mathrm{diag}(J_1, \ldots, J_q)\cdot X^{-1}$$

is the JCF of A ∈ ℂ^{n×n} with

$$J_i \;=\; \begin{bmatrix} \lambda_i & 1 & & \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_i \end{bmatrix}$$

(n_i-by-n_i) for i = 1:q. If f(z) and g(z) are analytic on an open set containing λ(A), then

$$\| f(A) - g(A) \|_2 \;\le\; \kappa_2(X)\, \max_{1\le i\le q}\; \sum_{r=0}^{n_i-1} \frac{| f^{(r)}(\lambda_i) - g^{(r)}(\lambda_i) |}{r!}.$$
Proof. Defining h(z) = f(z) − g(z) we have

$$\| f(A) - g(A) \|_2 \;=\; \| X\,\mathrm{diag}(h(J_1), \ldots, h(J_q))\,X^{-1} \|_2 \;\le\; \kappa_2(X)\, \max_{1\le i\le q} \| h(J_i) \|_2 .$$

Using Theorem 9.1.2 and equation (2.3.8) we conclude that

$$\| h(J_i) \|_2 \;\le\; \sum_{r=0}^{n_i-1} \frac{| h^{(r)}(\lambda_i) |}{r!},$$

thereby proving the theorem. □
9.2.2 A Schur Analysis
If we use the Schur decomposition A = QTQH instead of the Jordan decomposition,
then the norm of T's strictly upper triangular portion is involved in the discrepancy
between f(A) and g(A).
Theorem 9.2.2. Let Q^H A Q = T = diag(λ_i) + N be the Schur decomposition of A ∈ ℂ^{n×n}, with N being the strictly upper triangular portion of T. If f(z) and g(z) are analytic on a closed convex set Ω whose interior contains λ(A), then

$$\| f(A) - g(A) \|_F \;\le\; \sum_{r=0}^{n-1} \delta_r\, \frac{\|\, |N|^r \,\|_F}{r!},$$

where

$$\delta_r \;=\; \sup_{z\in\Omega}\, | f^{(r)}(z) - g^{(r)}(z) |.$$
Proof. Let h(z) = f(z) − g(z) and set H = (h_ij) = h(T). Let S_ij^{(r)} denote the set of strictly increasing integer sequences (s_0, ..., s_r) with the property that s_0 = i and s_r = j. Notice that

$$S_{ij} \;=\; \bigcup_{r=1}^{j-i} S_{ij}^{(r)}$$

and so from Theorem 9.1.4, we obtain the following for all i < j:

$$h_{ij} \;=\; \sum_{r=1}^{j-i}\; \sum_{s\in S_{ij}^{(r)}} n_{s_0 s_1}\, n_{s_1 s_2} \cdots n_{s_{r-1} s_r}\; h[\lambda_{s_0}, \ldots, \lambda_{s_r}].$$

Now since Ω is convex and h analytic, we have

$$| h[\lambda_{s_0}, \ldots, \lambda_{s_r}] | \;\le\; \sup_{z\in\Omega} \frac{| h^{(r)}(z) |}{r!}. \qquad (9.2.1)$$
Furthermore, if |N|^r = (n_ij^{(r)}) for r ≥ 1, then it can be shown that

$$\sum_{s\in S_{ij}^{(r)}} | n_{s_0 s_1} | \cdots | n_{s_{r-1} s_r} | \;=\; \begin{cases} 0, & j < i + r, \\ n_{ij}^{(r)}, & j \ge i + r. \end{cases} \qquad (9.2.2)$$

The theorem now follows by taking absolute values in the expression for h_ij and then using (9.2.1) and (9.2.2). □
There can be a pronounced discrepancy between the Jordan and Schur error bounds. For example, suppose

$$A \;=\; \begin{bmatrix} -.01 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & .01 \end{bmatrix}.$$

If f(z) = e^z and g(z) = 1 + z + z^2/2, then ∥f(A) − g(A)∥ ≈ 10^{-5} in either the Frobenius norm or the 2-norm. Since κ_2(X) ≈ 10^7, the error predicted by Theorem 9.2.1 is O(1), rather pessimistic. On the other hand, the error predicted by the Schur decomposition approach is O(10^{-2}).
Theorems 9.2.1 and 9.2.2 remind us that approximating a function of a nonnormal matrix is more complicated than approximating a function of a scalar. In particular, we see that if the eigensystem of A is ill-conditioned and/or A's departure from normality is large, then the discrepancy between f(A) and g(A) may be considerably larger than the maximum of |f(z) − g(z)| on λ(A). Thus, even though approximation methods avoid eigenvalue computations, they are evidently influenced by the structure of A's eigensystem. It is a perfect venue for pseudospectral analysis.
9.2.3 Taylor Approximants
A common way to approximate a matrix function such as e^A is by truncating its Taylor
series. The following theorem bounds the errors that arise when matrix functions such
as these are approximated via truncated Taylor series.
Theorem 9.2.3. If f(z) has the Taylor series

$$f(z) \;=\; \sum_{k=0}^{\infty} a_k z^k$$

on an open disk containing the eigenvalues of A ∈ ℂ^{n×n}, then

$$\left\| f(A) - \sum_{k=0}^{q} a_k A^k \right\|_2 \;\le\; \frac{n}{(q+1)!}\, \max_{0\le s\le 1} \| A^{q+1} f^{(q+1)}(As) \|_2 .$$

Proof. Define the matrix E(s) by

$$f(As) \;=\; \sum_{k=0}^{q} a_k (As)^k + E(s), \qquad 0 \le s \le 1. \qquad (9.2.3)$$
If f_ij(s) is the (i,j) entry of f(As), then it is necessarily analytic and so

$$f_{ij}(s) \;=\; \sum_{k=0}^{q} \frac{f_{ij}^{(k)}(0)}{k!}\, s^k \;+\; \frac{f_{ij}^{(q+1)}(\epsilon_{ij})}{(q+1)!}\, s^{q+1} \qquad (9.2.4)$$

where ε_ij satisfies 0 ≤ ε_ij ≤ s ≤ 1.

By comparing powers of s in (9.2.3) and (9.2.4) we conclude that e_ij(s), the (i,j) entry of E(s), has the form

$$e_{ij}(s) \;=\; \frac{f_{ij}^{(q+1)}(\epsilon_{ij})}{(q+1)!}\, s^{q+1}.$$

Now f_ij^{(q+1)}(s) is the (i,j) entry of A^{q+1} f^{(q+1)}(As) and therefore

$$| e_{ij}(s) | \;\le\; \frac{1}{(q+1)!}\, \max_{0\le s\le 1} \| A^{q+1} f^{(q+1)}(As) \|_2 .$$

The theorem now follows by applying (2.3.8). □
We mention that the factor of n in the upper bound can be removed with more careful
analysis. See Mathias (1993).
In practice, it does not follow that greater accuracy results by taking a longer Taylor approximation. For example, if

$$A \;=\; \begin{bmatrix} -49 & 24 \\ -64 & 31 \end{bmatrix},$$

then it can be shown that

$$e^A \;=\; \begin{bmatrix} -0.735759 & 0.551819 \\ -1.471518 & 1.103638 \end{bmatrix}.$$

For q = 59, Theorem 9.2.3 predicts that the truncated Taylor series is an excellent approximation to e^A. However, if u ≈ 10^{-7}, then we find

$$\mathrm{fl}\left( \sum_{k=0}^{59} \frac{A^k}{k!} \right) \;=\; \begin{bmatrix} -22.25880 & -1.4322766 \\ -61.49931 & -3.474280 \end{bmatrix}.$$
The problem is that some of the partial sums have large elements. For example, the matrix I + A + ··· + A^{17}/17! has entries of order 10^7. Since the machine precision is approximately 10^{-7}, rounding errors larger than the norm of the solution are sustained.
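
The failure is easy to reproduce by accumulating the series in IEEE single precision, whose unit roundoff (about 6·10^{-8}) is close to the u ≈ 10^{-7} above. The MATLAB sketch below is illustrative only and is not from the text.

    A = [-49 24; -64 31];
    S = single(eye(2));                 % running partial sum of the series
    T = single(eye(2));                 % current term A^k/k!
    for k = 1:59
        T = T*single(A)/k;
        S = S + T;
    end
    disp(double(S))                     % badly contaminated by roundoff
    disp(expm(A))                       % reference value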
The example highlights a well-known shortcoming of truncated Taylor series approximation: it tends to be effective only near the origin. The problem can sometimes be circumvented through a change of scale. For example, by repeatedly using the double angle formulae

$$\cos(2A) \;=\; 2\cos(A)^2 - I, \qquad \sin(2A) \;=\; 2\sin(A)\cos(A),$$

the cosine and sine of a matrix can be built up from Taylor approximations to cos(A/2^k) and sin(A/2^k):

    S_0 = Taylor approximate to sin(A/2^k)
    C_0 = Taylor approximate to cos(A/2^k)
    for j = 1:k
        S_j = 2 S_{j-1} C_{j-1}
        C_j = 2 C_{j-1}^2 - I
    end

Here k is a positive integer chosen so that, say, ∥A∥_∞ ≈ 2^k. See Serbin and Blalock (1979), Higham and Smith (2003), and Hargreaves and Higham (2005).
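
The MATLAB sketch below (not from the text) carries out this recursion with a fixed degree-16 Taylor approximant. The scaling rule, which makes the scaled matrix have infinity norm at most 1, and the choice of degree are illustrative assumptions rather than recommendations.

    % A is a given square matrix.
    k = max(0, ceil(log2(norm(A, inf))));   % scale so that norm(A/2^k, inf) <= 1
    B = A/2^k;  n = size(A,1);
    C = eye(n);  S = zeros(n);              % partial sums for cos(B) and sin(B)
    T = eye(n);                             % current term B^j/j!
    for j = 1:16
        T = T*B/j;
        switch mod(j,4)
            case 0, C = C + T;
            case 1, S = S + T;
            case 2, C = C - T;
            case 3, S = S - T;
        end
    end
    for j = 1:k                             % double-angle recursion
        S = 2*S*C;
        C = 2*C*C - eye(n);
    end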
9.2.4 Evaluating Matrix Polynomials
Since the approximation of transcendental matrix functions usually involves the evaluation of polynomials, it is worthwhile to look at the details of computing

$$p(A) \;=\; b_q A^q + \cdots + b_1 A + b_0 I$$

where the scalars b_0, ..., b_q ∈ ℝ are given. The most obvious approach is to invoke Horner's scheme:

Algorithm 9.2.1 Given a matrix A and b(0:q), the following algorithm computes the polynomial F = b_q A^q + ··· + b_1 A + b_0 I.

    F = b_q A + b_{q-1} I
    for k = q-2:-1:0
        F = A F + b_k I
    end

This requires q − 1 matrix multiplications. However, unlike the scalar case, this summation process is not optimal. To see why, suppose q = 9 and observe that

$$p(A) \;=\; A^3\big(A^3(b_9 A^3 + (b_8 A^2 + b_7 A + b_6 I)) + (b_5 A^2 + b_4 A + b_3 I)\big) + b_2 A^2 + b_1 A + b_0 I.$$

Thus, F = p(A) can be evaluated with only four matrix multiplications:

    A_2 = A^2,
    A_3 = A A_2,
    F_1 = b_9 A_3 + b_8 A_2 + b_7 A + b_6 I,
    F_2 = A_3 F_1 + b_5 A_2 + b_4 A + b_3 I,
    F   = A_3 F_2 + b_2 A_2 + b_1 A + b_0 I.
In general, if s is any integer that satisfies 1 ≤ s ≤ √q, then

$$p(A) \;=\; \sum_{k=0}^{r} B_k\cdot (A^s)^k, \qquad r = \mathrm{floor}(q/s), \qquad (9.2.5)$$

where

$$B_k \;=\; \begin{cases} b_{sk+s-1}A^{s-1} + \cdots + b_{sk+1}A + b_{sk}I, & k = 0\!:\!r-1, \\ b_q A^{\,q-sr} + \cdots + b_{sr+1}A + b_{sr} I, & k = r. \end{cases}$$
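
Expressed as MATLAB, the evaluation scheme (9.2.5) might look as follows. This is a minimal sketch, not from the text; the function name and the convention that b(k+1) stores b_k are assumptions.

    function F = polyA(b, A, s)
    % Evaluate p(A) = b(q+1)*A^q + ... + b(2)*A + b(1)*I via (9.2.5).
        n = size(A,1);  q = length(b) - 1;  r = floor(q/s);
        P = cell(s+1,1);  P{1} = eye(n);     % P{j+1} holds A^j
        for j = 1:s
            P{j+1} = A*P{j};
        end
        As = P{s+1};
        for k = r:-1:0                       % Horner recurrence in the variable A^s
            Bk = zeros(n);
            for d = 0:min(s-1, q - s*k)      % assemble B_k (no matrix multiplies)
                Bk = Bk + b(s*k + d + 1)*P{d+1};
            end
            if k == r, F = Bk; else, F = F*As + Bk; end
        end
    end

Apart from forming A^2, ..., A^s, each pass through the outer loop costs one matrix multiplication.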
9.2.5 Computing Powers of a Matrix
The problem of raising a matrix to a given power deserves special mention. Suppose it is required to compute A^{13}. Noting that A^4 = (A^2)^2, A^8 = (A^4)^2, and A^{13} = A^8 A^4 A, we see that this can be accomplished with just five matrix multiplications. In general we have
Algorithm 9.2.2 (Binary Powering) The following algorithm computes F = A^s where s is a positive integer and A ∈ ℝ^{n×n}.

    Let s = Σ_{k=0}^{t} β_k 2^k be the binary expansion of s with β_t ≠ 0
    Z = A; q = 0
    while β_q = 0
        Z = Z^2; q = q + 1
    end
    F = Z
    for k = q+1:t
        Z = Z^2
        if β_k ≠ 0
            F = F Z
        end
    end

This algorithm requires at most 2·floor(log_2(s)) matrix multiplications. If s is a power of 2, then only log_2(s) matrix multiplications are needed.
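
In MATLAB, the same idea can be written compactly by processing the binary digits of s from right to left. This is a minimal sketch, not from the text; the function name is an assumption.

    function F = binpow(A, s)
    % Compute F = A^s for a positive integer s by repeated squaring.
        Z = A;  F = [];
        while s > 0
            if mod(s,2) == 1                 % current binary digit of s is 1
                if isempty(F), F = Z; else, F = F*Z; end
            end
            s = floor(s/2);
            if s > 0, Z = Z*Z; end           % square only while digits remain
        end
    end

For s = 13 this performs the five matrix multiplications counted above.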
9.2.6 Integrating Matrix Functions
We conclude this section with some remarks about the integration of a parameterized
matrix function. Suppose A ∈ ℝ^{n×n} and that f(At) is defined for all t ∈ [a, b]. We can approximate

$$F \;=\; \int_a^b f(At)\,dt, \qquad [F]_{ij} \;=\; \int_a^b [\,f(At)\,]_{ij}\,dt,$$

by applying any suitable quadrature rule. For example, with Simpson's rule, we have

$$F \;\approx\; \tilde{F} \;=\; \frac{h}{3} \sum_{k=0}^{m} w_k\, f(A(a + kh)) \qquad (9.2.6)$$

where m is even, h = (b − a)/m, and

$$w_k \;=\; \begin{cases} 1, & k = 0, m, \\ 4, & k \text{ odd}, \\ 2, & k \text{ even},\; k \ne 0, m. \end{cases}$$

If (d^4/dz^4) f(zt) = f^{(4)}(zt) is continuous for t ∈ [a, b] and if f^{(4)}(At) is defined on this same interval, then it can be shown that F = F̃ + E where

$$\| E \|_2 \;\le\; n\,\frac{h^4 (b-a)}{180}\, \max_{a\le t\le b} \| f^{(4)}(At) \|_2 . \qquad (9.2.7)$$
Let f̃_ij and e_ij denote the (i, j) entries of F̃ and E, respectively. Under the above assumptions we can apply the standard error bounds for Simpson's rule and obtain

$$| e_{ij} | \;\le\; \frac{h^4 (b-a)}{180}\, \max_{a\le t\le b}\, | e_i^T f^{(4)}(At)\, e_j | .$$

The inequality (9.2.7) now follows since ∥E∥_2 ≤ n·max |e_ij| and

$$\max_{a\le t\le b} | e_i^T f^{(4)}(At)\, e_j | \;\le\; \max_{a\le t\le b} \| f^{(4)}(At) \|_2 .$$
Of course, in a practical application of (9.2.6), the function evaluations f(A(a + kh)) normally have to be approximated. Thus, the overall error involves the error in approximating f(A(a + kh)) as well as the Simpson rule error.
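
As a concrete illustration (not from the text), the MATLAB fragment below applies (9.2.6) with f = exp. The interval [a, b], the number of panels m, and the use of expm for each function evaluation are assumptions of the sketch.

    % A is a given square matrix.
    a = 0;  b = 1;  m = 20;                 % m must be even
    h = (b - a)/m;
    F = zeros(size(A));
    for k = 0:m
        if k == 0 || k == m
            w = 1;
        elseif mod(k,2) == 1
            w = 4;
        else
            w = 2;
        end
        F = F + w*expm(A*(a + k*h));        % w_k * f(A(a + kh))
    end
    F = (h/3)*F;                            % approximates the integral of expm(A*t)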
9.2.7 A Note on the Cauchy Integral Formulation
Yet another way to define a function of a matrix A ∈ ℂ^{n×n} is through the Cauchy integral theorem. Suppose f(z) is analytic inside and on a closed contour Γ which encloses λ(A). We can define f(A) to be the matrix

$$f(A) \;=\; \frac{1}{2\pi i} \int_{\Gamma} f(z)\,(zI - A)^{-1}\,dz.$$

The integral is defined on an element-by-element basis:

$$[\,f(A)\,]_{ij} \;=\; \frac{1}{2\pi i} \int_{\Gamma} f(z)\,\big[(zI - A)^{-1}\big]_{ij}\,dz. \qquad (9.2.8)$$

Notice that the entries of (zI − A)^{-1} are analytic on Γ and that f(A) is defined whenever f(z) is analytic in a neighborhood of λ(A). Using quadrature and other tools, Hale, Higham, and Trefethen (2007) have shown how this characterization can be used in practice to compute certain types of matrix functions.
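
To convey the flavor of such quadrature-based methods, here is a deliberately naive MATLAB sketch (not from the text) that applies the composite trapezoid rule to (9.2.8) on a circle enclosing λ(A). It assumes f is a scalar function handle (e.g., f = @exp) and that A is real with f real on the real axis, which justifies the final real( ); the center, radius, and number of nodes are illustrative choices, and the contours used by Hale, Higham, and Trefethen are far more sophisticated.

    n = size(A,1);  N = 64;                     % number of quadrature nodes
    c = trace(A)/n;                             % center of the contour
    r = 2*max(abs(eig(A) - c)) + 1;             % radius that encloses lambda(A)
    F = zeros(n);
    for k = 1:N
        w = r*exp(2i*pi*k/N);                   % r*e^{i*theta_k}
        z = c + w;                              % node on the contour
        F = F + f(z)*w*inv(z*eye(n) - A);       % f(z)(zI - A)^{-1} * dz/(i*dtheta)
    end
    F = real(F)/N;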
Problems
P9.2.1 Verify (9.2.2).
P9.2.2 Show that if ∥A∥_2 < 1, then log(I + A) exists and satisfies the bound

$$\| \log(I + A) \|_2 \;\le\; \frac{\| A \|_2}{1 - \| A \|_2}.$$

P9.2.3 Using Theorem 9.2.3, bound the error in the following approximations:

$$\sin(A) \;\approx\; \sum_{k=0}^{q} (-1)^k \frac{A^{2k+1}}{(2k+1)!}, \qquad \cos(A) \;\approx\; \sum_{k=0}^{q} (-1)^k \frac{A^{2k}}{(2k)!}.$$

P9.2.4 Suppose A ∈ ℝ^{n×n} is nonsingular and X_0 ∈ ℝ^{n×n} is given. The iteration defined by

$$X_{k+1} \;=\; X_k (2I - A X_k)$$

is the matrix analogue of Newton's method applied to the function f(x) = a − (1/x). Use the SVD to analyze this iteration. Do the iterates converge to A^{-1}?
  • 27. 1.1. Basic Algorithms and Notation addition (IRm x n X IRm x n --+ Ilf" xn), C = A + B � scalar-matri.r, multiplication (IR x IRm x n --+ IRm x n) , C = aA and matrix-matrix multiplication (Hrnx p x wx n --+ IRm x n), C =AB 1' Cij = L aikbkj. k=l 3 Pointwise matrix operations are occasionally useful, especially pointwise multiplication (Hrnx n X IRm x n --+ IRm x n), C = A .* B � and pointwise division (Hrnxn x nrnxn --+ R"x "), C = A ./B Of course, for pointwise division to make sense, the "denominator matrix" must have nonzero entries. 1.1.3 Vector Notation Let IRn denote the vector space of real n-vectors: X= Xi E IR. We refer to x; as the ith component of x. Depending upon context, the alternative notations [x]i and x(i) are sometimes used. Notice that we are identifying Hr' with m.n xi and so the members of IRn are column vectors. On the other hand, the elements of IR 1 x n are row vectors: If :c is a column vector, then y = xT is a row vector. 1.1.4 Vector Operations Assume that a E JR, x E JR", and y E JR". Basic vector operations include scalar-vector multiplication, z = ax Zi axi, vector addition, z = x + y Zi Xi + Yi,
  • 28. 4 Chapter 1. Matrix Multiplication and the inner product (or dot product), c n LXiYi· i=l A particularly important operation, which we write in update form, is the saxpy: y = ax + y ===} Yi = axi + Yi Here, the symbol "= " is used to denote assignment, not mathematical equality. The vector y is being updated. The name "saxpy" is used in LAPACK, a software package that implements many of the algorithms in this book. "Saxpy" is a mnemonic for "scalar a x plus y." See LAPACK. Pointwise vector operations are also useful, including vector multiplication, Z = X.* y Zi = XiYi, and vector division, z = x./y Zi = Xi/Yi· 1.1.5 The Computation of Dot Products and Saxpys Algorithms in the text are expressed using a stylized version of the MATLAB language. Here is our first example: Algorithm 1.1.1 (Dot Product) If x, y E JRn, then this algorithm computes their dot product c = xTy. c = O for i = l:n c = c + x(i)y(i) end It is clear from the summation that the dot product of two n-vectors involves n multi­ plications and n additions. The dot product operation is an "O(n)" operation, meaning that the amount of work scales linearly with the dimension. The saxpy computation is also O(n): Algorithm 1.1.2 (Saxpy) If x, y E JRn and a E JR, then this algorithm overwrites y with y + ax. for i = l:n y(i) = y(i) + ax(i) end We stress that the algorithms in this book are encapsulations of important computa­ tional ideas and are not to be regarded as "production codes."
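For readers who want to run the two algorithms above, here is a minimal MATLAB sketch (the function names dot_prod and saxpy_update are ours, not the text's; in practice one would simply write x'*y and y + a*x). Both carry out 2n flops, in agreement with the O(n) assessments above.

function c = dot_prod(x, y)
% Loop-based dot product c = x'*y, as in Algorithm 1.1.1.
n = length(x);
c = 0;
for i = 1:n
    c = c + x(i)*y(i);
end
end

function y = saxpy_update(a, x, y)
% Overwrites y with a*x + y, as in Algorithm 1.1.2.
n = length(x);
for i = 1:n
    y(i) = y(i) + a*x(i);
end
end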
  • 29. 1.1. Basic Algorithms and Notation 1.1.6 Matrix-Vector Multiplication and the Gaxpy Suppose A E Rmxn and that we wish to compute the update y = y+Ax 5 where x E Rn and y E Rm are given. This generalized saxpy operation is referred to as a gaxpy. A standard way that this computation proceeds is to update the components one-at-a-time: n Yi = Yi +L ai;x;, j=l This gives the following algorithm: i = l:m. Algorithm 1.1.3 (Row-Oriented Gaxpy) If A E Rmxn, x E Rn, and y E Rm, then this algorithm overwrites y with Ax + y. for i = l:m end for j = l:n y(i) = y(i) + A(i,j)x(j) end Note that this involves O(mn) work. If each dimension of A is doubled, then the amount of arithmetic increases by a factor of 4. An alternative algorithm results if we regard Ax as a linear combination of A's columns, e.g., [1 2] [1·7+2·8] [1] [2] [23] 3 4 [ � ] = 3.7+4.8 = 7 3 +8 4 = 53 5 6 5·7+6·8 5 6 83 Algorithm 1.1.4 (Column-Oriented Gaxpy) If A E Rmxn, x E Rn, and y E Rm, then this algorithm overwrites y with Ax + y. for j = l :n end for i = l:m y(i) = y(i) +A(i, j) ·x(j) end Note that the inner loop in either gaxpy algorithm carries out a saxpy operation. The column version is derived by rethinking what matrix-vector multiplication "means" at the vector level, but it could also have been obtained simply by interchanging the order of the loops in the row version. 1.1.7 Partitioning a Matrix into Rows and Columns Algorithms 1.1.3and 1.1.4access the data in A by row and by column, respectively. To highlight these orientations more clearly, we introduce the idea of a partitioned matrix.
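Before doing so, here is a minimal MATLAB sketch of the two gaxpy variants just described (the names gaxpy_row and gaxpy_col are ours). Both overwrite y with y + Ax and involve the same 2mn flops, differing only in the order of the loops.

function y = gaxpy_row(A, x, y)
% Row-oriented gaxpy (Algorithm 1.1.3): the inner loop is a dot product.
[m, n] = size(A);
for i = 1:m
    for j = 1:n
        y(i) = y(i) + A(i,j)*x(j);
    end
end
end

function y = gaxpy_col(A, x, y)
% Column-oriented gaxpy (Algorithm 1.1.4): the inner loop is a saxpy.
[m, n] = size(A);
for j = 1:n
    for i = 1:m
        y(i) = y(i) + A(i,j)*x(j);
    end
end
end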
  • 30. 6 Chapter l. Matrix Multiplication From one point of view, a matrix is a stack of row vectors: A E JRmxn A = [T l· rk E Bl" · rm This is called a row partition of A. Thus, if we row partition then we are choosing to think of A as a collection of rows with rf = [ 1 2 ], rf = [ 3 4 ], rf = [ 5 6 ] . With the row partitioning (1.1.1), Algorithm 1.1.3 can be expressed as follows: for i = l:m Yi = Yi + r[x end Alternatively, a matrix is a collection of column vectors: A E Dlmxn ¢:::::> A = [CJ I··· I Cn ] , Ck E 1Rm. (1.1.1) (1.1.2) We refer to this as a column partition of A. In the 3-by-2 example above, we thus would set c1 and c2 to be the first and second columns of A, respectively: With (1.1.2) we see that Algorithm 1.1.4 is a saxpy procedure that accesses A by columns: for j = l:n y = y +XjCj end In this formulation, we appreciate y as a running vector sum that undergoes repeated saxpy updates. 1.1.8 The Colon Notation A handy way to specify a column or row of a matrix is with the "colon" notation. If A E 1Rmxn, then A(k, :) designates the kth row, i.e., A(k, :) = [ak1, . .. , akn] .
  • 31. 1.1. Basic Algorithms and Notation The kth column is specified by A(:,k) = [a1k l a�k • With these conventions we can rewrite Algorithms 1.1.3 and 1.1.4 as for i = l:m and y(i) = y(i) + A(i, :}·x end for j = l:n y = y + x(j) ·A(:, j) end 7 respectively. By using the colon notation, we are able to suppress inner loop details and encourage vector-level thinking. 1.1.9 The Outer Product Update As a preliminary application of the colon notation, we use it to understand the outer product update A = A + xyr, The outer product operation xyT "looks funny" but is perfectly legal, e.g., [�] [ 4 5] = [ ! 1 5 0 ]. 3 12 15 This is because xyT is the product oftwo "skinny" matrices and the number ofcolumns in the left matrix x equals the number of rows in the right matrix yT. The entries in the outer product update are prescribed hy for ·i = l:m end for j = l:n ll;j = llij + XiYj end This involves 0(mn) arithmetic operations. The mission of the j loop is to add a multiple of yT to the ith row of A, i.e., for i = l:m A(i, :) = A(i, :) + x(i) ·yT end
  • 32. 8 Chapter 1. Matrix Multiplication On the other hand, if we make the i-loop the inner loop, then its task is to add a multiple of x to the jth column of A: for j = l :n A(:, j) = A(:,j) +y(j) ·x end Note that both implementations amount to a set of saxpy computations. 1.1.10 Matrix-Matrix Multiplication Consider the 2- by- 2 matrix-matrix multiplication problem. In the dot product formu­ lation, each entry is computed as a dot product: [ 1 2 ] [ 5 6 ] - [ 1 · 5 + 2 · 7 1 · 6+2 · 8 ] 3 4 7 8 - 3 · 5 + 4 · 7 3 · 6+4 · 8 . In the saxpy version, each column in the product is regarded as a linear combination of left-matrix columns: Finally, in the outer product version, the result is regarded as the sum of outer products: Although equivalent mathematically, it turns out that these versions of matrix multi­ plication can have very different levels of performance because of their memory traffic properties. This matter is pursued in §1.5. For now, it is worth detailing the various approaches to matrix multiplication because it gives us a chance to review notation and to practice thinking at different linear algebraic levels. To fix the discussion, we focus on the matrix-matrix update computation: C = C + AB, The update C = C+AB is considered instead ofjust C = AB because it is the more typical situation in practice. 1.1.11 Scalar-Level Specifications The starting point is the familiar triply nested loop algorithm: Algorithm 1.1.5 (ijk Matrix Multiplication) If A E 1Rmxr, B E 1R'"xn, and C E 1Rmxn are given, then this algorithm overwrites C with C + AB. for i = l:m end for j = l :n for k = l :r end C(i,j) = C(i,j) + A(i, k)· B(k,j) end
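For reference, here is Algorithm 1.1.5 as a complete MATLAB function (a minimal sketch; the name matmul_ijk is ours, and in practice the built-in product A*B is preferable). Permuting the three for-statements yields the other loop orderings discussed below.

function C = matmul_ijk(A, B, C)
% Overwrites C with C + A*B using the ijk loop ordering (Algorithm 1.1.5).
[m, r] = size(A);
n = size(B, 2);
for i = 1:m
    for j = 1:n
        for k = 1:r
            C(i,j) = C(i,j) + A(i,k)*B(k,j);
        end
    end
end
end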
  • 33. 1.1. Basic Algorithms and Notation 9 This computation involves O(mnr) arithmetic. If the dimensions are doubled, then work increases by a factor of 8. Each loop index in Algorithm 1.1.5 has a particular role. (The subscript i names the row, j names the column, and k handles the dot product.) Nevertheless, the ordering of the loops is arbitrary. Here is the (mathematically equivalent) jki variant: for j = l:n for k = l:r for i = l:m C(i,j) = C(i,j) + A(i, k)B(k, j) end end end Altogether, there are six (= 3!) possibilities: ijk, jik, ikj, jki, kij, kji. Each features an inner loop operation (dot product or saxpy) and each has its own pattern of data flow. For example, in the ijk variant, the inner loop oversees a dot product that requires access to a row of A and a column of B. The jki variant involves a saxpy that requires access to a column of C and a column of A. These attributes are summarized in Table 1.1.1 together with an interpretation of what is going on when Loop Inner Inner Two Inner Loop Order Loop Loops Data Access ijk dot vector x matrix A by row, B by column jik dot matrix x vector A by row, B by column ikj saxpy row gaxpy B by row, C by row jki saxpy column gaxpy A by column, C by column kij saxpy row outer product B by row, C by row kji saxpy column outer product A by column, C by column Table 1.1.1. Matrix multiplication: loop orderings and properties the middle and inner loops are considered together. Each variant involves the same amount of arithmetic, but accesses the A, B, and C data differently. The ramifications of this are discussed in §1.5. 1.1.12 A Dot Product Formulation The usual matrix multiplication procedure regards A-B as an array of dot products to be computed one at a time in left-to-right, top-to-bottom order. This is the idea behind Algorithm 1.1.5 which we rewrite using the colon notation to highlight the mission of the innermost loop:
  • 34. 10 Chapter 1. Matrix Multiplication Algorithm 1.1.6 (Dot Product Matrix Multiplication) If A E JR.mxr, B E Ill'"x", and C E JR.mxn are given, then this algorithm overwrites C with C + AB. for i = l:m for j = l:n C(i,j) = C(i,j) + A(i, :)·B(:,j) end end In the language of partitioned matrices, if and then Algorithm 1.1.6 has this interpretation: for i = l:m end for j = l:n Cij = Cij + afbj end Note that the purpose of the j-loop is to compute the ith row of the update. To emphasize this we could write for i = l:m end where c'!' =CT + a'!'B • • • is a row partitioning of C. To say the same thing with the colon notation we write for i = l:m C(i, : ) = C(i, :) + A(i, :)·B end Either way we see that the inner two loops of the ijk variant define a transposed gaxpy operation. 1.1.13 A Saxpy Formulation Suppose A and C are column-partitioned as follows: c = [ C1 I. . . I Cn l .
  • 35. 1.1. Basic Algorithms and Notation By comparing jth columns in C = C + AB we sec that 1' Cj = Cj + L akbkj, k=l j = l:n. These vector sums can be put together with a sequence of saxpy updates. 11 Algorithm 1.1.7 (Saxpy Matrix Multiplication) If the matrices A E nr xr, B E IR"xn , and C E JRmxn are given, then this algorithm overwrites C with C + AB. for j = l:n for k = l:r end C(:,j) = C(:,j) + A(:, k)·B(k,j) end Note that the k-loop oversees a gaxpy operation: for j = l:n C(:,j) = C(:,j) + AB(:,j) end 1.1.14 An Outer Product Formulation Consider the kij variant of Algorithm 1.1.5: for k = l:r end for j = l:n end for i = l:m C(i,j) = C(i,j) + A(i, k)·B(k,j) end The inner two loops oversee the outer product update where A� I a, I Ia, I and B � [:rl with ak E JRm and bk E JR n. This renders the following implementation: (1.1.3) Algorithm 1.1.8 (Outer Product Matrix Multiplication) If the matrices A E JRmxr , B E IR"xn , and C E JRmxn are given, then this algorithm overwrites C with C + AB. for k = l:r C = C + A(:, k)·B(k, :) end Matrix-matrix multiplication is a sum of outer products.
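The saxpy and outer product formulations translate directly into colon-notation MATLAB. Here is a minimal sketch of both (the function names are ours); each overwrites C with C + AB.

function C = matmul_saxpy(A, B, C)
% Column-at-a-time update (Algorithm 1.1.7): each pass of the j-loop
% performs the gaxpy C(:,j) = C(:,j) + A*B(:,j).
n = size(B, 2);
for j = 1:n
    C(:,j) = C(:,j) + A*B(:,j);
end
end

function C = matmul_outer(A, B, C)
% Outer product update (Algorithm 1.1.8): C = C + A(:,k)*B(k,:).
r = size(A, 2);
for k = 1:r
    C = C + A(:,k)*B(k,:);
end
end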
  • 36. 12 Chapter 1. Matrix Multiplication 1.1.15 Flops One way to quantify the volume ofwork associated with a computation is to count flops. A flop is a floating point add, subtract, multiply, or divide. The number of flops in a given matrix computation is usually obtained by summing the amount of arithmetic associated with the most deeply nested statements. For matrix-matrix multiplication, e.g., Algorithm 1.1.5, this is the 2-flop statement C(i,j) = C(i,j) + A(i, k) ·B(k,j). If A E Rmxr, B E m,rxn, and C E Rmxn, then this statement is executed mnr times. Table 1.1.2 summarizes the number offlops that are required for the common operations detailed above. Operation Dimension Flops a = xTy x, y E Rn 2n y = y + ax a E JR., x, y E Rn 2n y = y + Ax A E Rmxn, x E Rn, y E Rm 2mn A = A + yxT A E Rmxn, x E Rn, y E Rm 2mn C = C + AB A E Rmxr, B E m,rxn, C E Rmxn 2rnnr Table 1.1.2. Important flop counts 1.1.16 Big-Oh Notation/Perspective In certain settings it is handy to use the "Big-Oh" notation when an order-of-magnitude assessment of work suffices. (We did this in §1.1.1.) Dot products are O(n), matrix­ vector products are O(n2), and matrix-matrix products arc O(n3). Thus, to make efficient an algorithm that involves a mix of these operations, the focus should typically be on the highest order operations that are involved as they tend to dominate the overall computation. 1.1.17 The Notion of 11Level" and the BLAS The dot product and saxpy operations are examples of level-1 operations. Level-I operations involve an amount of data and an amount of arithmetic that are linear in the dimension of the operation. An m-by-n outer product update or a gaxpy operation involves a quadratic amount ofdata (0(mn)) and a quadratic amount ofwork (0(mn)). These are level-2 operations. The matrix multiplication update C = C+AB is a level-3 operation. Level-3 operations are quadratic in data and cubic in work. Important level-1, level-2, and level-3 operations are encapsulated in the "BLAS,'' an acronym that stands for H_asic Linear Algebra .Subprograms. See LAPACK. The design of matrix algorithms that are rich in level-3 BLAS operations is a major preoccupation of the field for reasons that have to do with data reuse (§1.5).
  • 37. 1.1. Basic Algorithms and Notation 1.1.18 Verifying a Matrix Equation 13 In striving to understand matrix multiplication via outer products, we essentially es­ tablished the matrix equation r AB= Lakb[, (1.1.4) k=l where the ak and bkare defined by the partitionings in (1.1.3). Numerous matrix equations are developed in subsequent chapters. Sometimes they are established algorithmically as above and other times they are proved at the ij-component level, e.g., lt,••brL � t,[••bIL; � t,...b,; � [ABlw Scalar-level verifications such as this usually provide little insight. However, they are sometimes the only way to proceed. 1.1.19 Complex Matrices On occasion we shall he concerned with computations that involve complex matrices. The vector space of m-by-n complex matrices is designated by <Cmxn. The scaling, addition, and multiplication of complex matrices correspond exactly to the real case. However, transposition becomes conjugate transposition: c =AH :::::::} Cij =llji· The vector space of complex n-vectors is designated by <Cn. The dot product ofcomplex n-vectors x and y is prescribed by n s= xHy = LXiYi· i=l IfA = B + iC E <Cmxn, then we designate the real and imaginary parts ofA by Re(A) = Band lm(A)= C, respectively. The conjugate ofA is the matrix A = (aij)· Problems Pl.1.1 Suppose A E Rnxn and x E Rr arc given. Give an algorithm for computing the first column of M = (A - :r.11} · · · (A - Xrl). Pl.1.2 In a conventional 2-by-2 matrix multiplication C = AB, there are eight multiplications: a11bu, aub12, a21bu, a21b12, ai2b21, a12�2. a22b21, and a22b22. Make a table that indicates the order that these multiplications are performed for the ijk, jik, kij, ikj, jki, and kji matrix multiplication algorithms. Pl.1.3 Give an O(n2) algorithm for computing C = (xyT)k where x and y are n-vectors. Pl.1.4 Suppose D = ABC where A E Rmxn, B E wxv, and C E wxq. Compare the flop count of an algorithm that computes D via the formula D = (AB)C versus the flop count for an algorithm that computes D using D = A(BC). Under what conditions is the former procedure more flop-efficient than the latter? Pl.1.5 Suppose we have real n-by-n matrices C, D, E, and F. Show how to compute real n-by-n matrices A and B with just three real n-by-n matrix multiplications so that A + iB = (C + iD)(E + iF).
  • 38. 14 Hint: Compute W = (C + D)(E - F). Pl.1.6 Suppose W E Rnxn is defined by n n Wij L LXipYpqZqj p=l q=l Chapter 1. Matrix Multiplication where X, Y, Z E Rnxn. If we use this formula for each Wij then it would require O(n4) operations to set up W. On the other hand, = tXip (tYpqZqj) = p=l q=l where U = YZ. Thus, W = XU = XYZ and only O(n3) operations are required. by Use this methodology to develop an O(n3) procedure for computing the n-by-n matrix A defined n n n Q;j = L L L E(k1 , i)F(k1 , i)G(k2, k1)H(k2, k3)F(k2, k3)G(k3,j) k1 =l k2=1 k3=1 where E, F, G, H E Rnxn. Hint. Transposes and pointwise products are involved. Notes and References for §1.1 For an appreciation of the BLAS and their foundational role, sec: C.L. Lawson, R.J. Hanson, D.R. Kincaid, and F.T. Krogh (1979). "Basic Linear Algebra Subprograms for FORTRAN Usage," A CM Trans. Math. Softw. 5, 308-323. J.J. Dongarra, J. Du Croz, S. Hammarling, and R.J. Hanson (1988). "An Extended Set of Fortran Basic Linear Algebra Subprograms," ACM Trans. Math. Softw. 14, 1-17. J.J. Dongarra, J. Du Croz, LS. Duff, and S.J. Hammarling (1990). "A Set of Level 3 Basic Linear Algebra Subprograms," A CM Trans. Math. Softw. 16, 1- 17. B. Kagstri:im, P. Ling, and C. Van Loan (1991). "High-Performance Level-3 BLAS: Sample Routines for Double Precision Real Data," in High Performance Computing II, M. Durand and F. El Dabaghi (eds.), North-Holland, Amsterdam, 269-281. L.S. Blackford, J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, R. Pozo, K. Remington, and R.C. Whaley (2002). "An Updated Set of Basic Linear Algebra Subprograms (BLAS)" , ACM Trans. Math. Softw. 28, 135--151. The order in which the operations in the matrix product Ai · · · A,. are carried out affects the flop count if the matrices vary in dimension. (See Pl.1.4.) Optimization in this regard requires dynamic programming, see: T.H. Corman, C.E. Leiserson, R.L. Rivest, and C. Stein (2001). Introduction to Algorithms, MIT Press and McGraw-Hill, 331-339. 1.2 Structure and Efficiency The efficiency of a given matrix algorithm depends upon several factors. Most obvious and what we treat in this section is the amount of required arithmetic and storage. How to reason about these important attributes is nicely illustrated by considering exam­ ples that involve triangular matrices, diagonal matrices, banded matrices, symmetric matrices, and permutation matrices. These are among the most important types of structured matrices that arise in practice, and various economies can be realized if they are involved in a calculation.
  • 39. 1.2. Structure and Efficiency 15 1.2.1 Band Matrices A matrix is sparse if a large fraction of its entries are zero. An important special case is the band matrix. We say that A E nrnxn has lower bandwidth p if aii = 0 whenever i > j +p and upper bandwidth q if j > i + q implies aii = 0. Here is an example of an 8-by-5 matrix that has lower bandwidth 1 and upper bandwidth 2: x x x 0 0 x x x x 0 0 x x x x A 0 0 x x x 0 0 0 x x 0 0 0 0 x 0 0 0 0 0 0 0 0 0 0 The x 's designate arbitrary nonzero entries. This notation is handy to indicate the structure of a matrix and we use it extensively. Band structures that occur frequently are tabulated in Table 1.2.1. Type Lower Upper of Matrix Bandwidth Bandwidth Diagonal 0 0 Upper triangular 0 n - l Lower triangular m - 1 0 Tridiagonal 1 1 Upper bidiagonal 0 1 Lower bidiagonal 1 0 Upper Hessenberg 1 n - l Lower Hessenberg m - 1 1 Table 1.2. 1. Band terminology for m-by-n matrices 1.2.2 Triangular Matrix Multiplication To introduce band matrix "thinking" we look at the matrix multiplication update problem C = C + AB where A, B, and C are each n-by-n and upper triangular. The 3-by-3 case is illuminating: AB It suggestH that the product is upper triangular and that its upper triangular entries are the result of abbreviated inner products. Indeed, since aikbkj = 0 whenever k < i or j < k, we see that the update has the form j Cij = Cij + Laikbki k=i
  • 40. 16 Chapter 1. Matrix Multiplication for all i and j that satisfy i � j. This yields the following algorithm: Algorithm 1.2.1 (Triangular Matrix Multiplication) Given upper triangular matrices A, B, C E Rnxn, this algorithm overwrites C with C + AB. for i = l:n for j = i:n for k = i:j end end C(i, j) = C(i, j) + A(i, k) ·B(k, j) end 1.2.3 The Colon Notation-Again The dot product that the k-loop performs in Algorithm 1.2.1 can be succinctly stated if we extend the colon notation introduced in §1.1.8. If A E Rmxn and the integers p, q, and r satisfy 1 � p � q � n and 1 � r � m, then A(r,p:q) = [ arp I · · · I arq ] E Rlx(q-p+l) . Likewise, if 1 � p � q � m and 1 � c � n, then A(p:q, c) = ; E Rq-p+I. [ape l aqc With this notation we can rewrite Algorithm 1.2.1 as for i = l:n end for j = i:n C(i, j) = C(i, j) + A(i, i:j) ·B(i:j,j) end This highlights the abbreviated inner products that are computed by the innermost loop. 1.2.4 Assessing Work Obviously, upper triangular matrix multiplication involves less arithmetic than full matrix multiplication. Looking at Algorithm 1.2.1, we see that Cij requires 2(j - i + 1) flops if (i � j). Using the approximations q LP p=I and q LP2 = p=I q(q + 1) = :::::: 2 q3 q2 q - + - + - 3 2 6 q2 2 q3 :::::: 3 '
  • 41. 1.2. Structure and Efficiency 17 we find that triangular matrix multiplication requires one-sixth the number of flops as full matrix multiplication: ��2( " . 1) � n � 1 2 · � 2(n - i + 1)2 � -2 n3 L....J L....J J - i + = L....J L....J J ::::::: L....J = L....J i ::::::: - . i=l j=i i=l j= l i=l 2 i=l 3 We throw away the low-order terms since their inclusion does not contribute to what the flop count "says." For example, an exact flop count of Algorithm 1.2.1 reveals that precisely n3/3 + n2 + 2n/3 flops are involved. For large n (the typical situation of interest) we see that the exact flop count offers no insight beyond the simple n3/3 accounting. Flop counting is a necessarily crude approach to the measurement of program efficiency since it ignores subscripting, memory traffic, and other overheads associ­ ated with program execution. We must not infer too much from a comparison of flop counts. We cannot conclude, for example, that triangular matrix multiplication is six times faster than full matrix multiplication. Flop counting captures just one dimen­ sion of what makes an algorithm efficient in practice. The equally relevant issues of vectorization and data locality are taken up in §1.5. 1.2.5 Band Storage Suppose A E JR.nx n has lower bandwidth p and upper bandwidth q and assume that p and q are much smaller than n. Such a matrix can be stored in a (p+ q + 1)-by-n array A.band with the convention that aij = A.band(·i - j + q + 1, j) (1.2.1) for all (i, j) that fall inside the band, e.g., a1 1 ai2 ai3 0 0 0 a21 a22 a23 a24 0 0 [.:, * a13 a24 a35 ll45 ] 0 a32 a33 a34 a35 0 ai2 a23 a34 a45 a55 => 0 0 a43 a44 a45 a46 a22 a33 a44 a55 a55 0 0 0 as4 a5s a55 a21 a32 a43 as4 a55 * 0 0 0 0 a55 a55 Here, the "*" entries are unused. With this data structure, our column-oriented gaxpy algorithm (Algorithm 1.1.4) transforms to the following: Algorithm 1.2.2 (Band Storage Gaxpy) Suppose A E nnx n has lower bandwidth p and upper bandwidth q and is stored in the A.band format (1.2.1). Ifx, y E JR.n, then this algorithm overwrites y with y + Ax. for j = l:n end a1 = max(l,j - q), a2 = min(n, j + p) !31 = max(l, q + 2 - j), f32 = !31 + a2 - a1 y(a1:a2) = y(a1:a2) + A.band(f31:{32, j)x(j)
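The index arithmetic in Algorithm 1.2.2 is easy to get wrong, so here is a minimal MATLAB sketch (the name band_gaxpy and the argument order are ours; x and y are column vectors). The result can be checked against y + A*x with the corresponding full matrix A.

function y = band_gaxpy(A_band, p, q, x, y)
% Band-storage gaxpy: overwrites y with y + A*x, where A is n-by-n with
% lower bandwidth p and upper bandwidth q and is stored by columns in the
% (p+q+1)-by-n array A_band according to (1.2.1), i.e.,
% a(i,j) = A_band(i-j+q+1, j) for entries inside the band.
n = length(x);
for j = 1:n
    a1 = max(1, j-q);   a2 = min(n, j+p);
    b1 = max(1, q+2-j); b2 = b1 + (a2 - a1);
    y(a1:a2) = y(a1:a2) + A_band(b1:b2, j)*x(j);
end
end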
  • 42. 18 Chapter 1. Matrix Multiplication Notice that by storing A column by column in A.band, we obtain a column-oriented saxpy procedure. Indeed, Algorithm 1.2.2 is derived from Algorithm 1.1.4 by recog­ nizing that each saxpy involves a vector with a small number of nonzeros. Integer arithmetic is used to identify the location of these nonzeros. As a result of this careful zero/nonzero analysis, the algorithm involves just 2n(p + q + 1) flops with the assump­ tion that p and q are much smaller than n. 1.2.6 Working with Diagonal Matrices Matrices with upper and lower bandwidth zero are diagonal. If D E 1Rmxn is diagonal, then we use the notation D = diag(di, . . . , dq), q = min{m, n} ¢=::::> di = dii· Shortcut notations when the dimension is clear include diag(d) and diag(di)· Note that if D = diag(d) E 1Rnxn and x E 1Rn, then Dx = d. * x. If A E 1Rmxn, then pre­ multiplication by D = diag(d1, . . . , dm) E 1Rmxm scales rows, B = DA B(i, :) = di ·A(i, :), i = l:m while post-multiplication by D = diag(d1, . . . , dn) E 1Rnxn scales columns, B = AD B(:, j) = drA(:,j), j = l:n. Both of these special matrix-matrix multiplications require mn flops. 1.2.7 Symmetry A matrix A E 1Rnxn is symmetric if AT = A and skew-symmetric if AT = -A. Likewise, a matrix A E <Cnxn is Hermitian if A" = A and skew-Hermitian if A" = -A. Here are some examples: Symmetric: 2 3 l 4 5 ' 5 6 Hermitian: 2-3i 4-5i l 6 7-8i ' 7+8i 9 Skew-Symmetric: 2 0 -5 , Skew-Hermitian: 2+3i 6i -7+8i . [ 0 -2 3 l [ i -2+3i -4+5i l -3 5 0 4+5i 7+8i 9i For such matrices, storage requirements can be halved by simply storing the lower triangle of elements, e.g., 2 3 l 4 5 5 6 A.vec = [ 1 2 3 4 5 6 ] . For general n, we set A.vec((n - j/2)(j - 1)+i) aij 1 :::; j :::; i :::; n. (1.2.2)
  • 43. 1.2. Structure and Efficiency 19 Here is a column-oriented gaxpy with the matrix A represented in A.vec. Algorithm 1.2.3 (Symmetric Storage Gaxpy) Suppose A E 1Rnxn is symmetric and stored in the A.vec style (1.2.2). If x, y E 1Rn, then this algorithm overwrites y with y + Ax. for j = l:n end for i = l:j - 1 y(i) = y(i) + A.vec((i - l)n - i(i - 1)/2 + j)x(j) end for i = j:n y(i) = y(i) + A.vec((j - l)n - j(j - 1)/2 + i)x(j) end This algorithm requires the same 2n2 flops that an ordinary gaxpy requires. 1.2.8 Permutation Matrices and the Identity We denote the n- by-n identity matrix by In, e.g., We use the notation ei to designate the 'ith column of In. If the rows ofIn are reordered, then the resulting matrix is said to be a permutation matrix, e.g., P � [� � nJ (1.2.3) The representation of an n- by-n permutation matrix requires just an n-vector of inte­ gers whose components specify where the l's occur. For example, if v E 1Rn has the property that vi specifies the column where the "l" occurs in row i, then y = Px implies that Yi = Xv; : i = l:n. In the example above, the underlying v-vector is v = [ 2 4 3 1 ]. 1.2.9 Specifying Integer Vectors and Submatrices For permutation matrix work and block matrix manipulation (§1.3) it is convenient to have a method for specifying structured integer vectors of subscripts. The MATLAB colon notation is again the proper vehicle and a few examples suffice to show how it works. If n = 8, then v = 1:2:n ==:} v = [ 1 3 5 7] , v = n:-1:1 ==:} v = [ s 1 6 5 4 3 2 1 J , v = [ (1:2:n) (2:2:n) ] ==:} v = [ 1 3 5 7 2 4 6 8 J .
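Because the subscripting in Algorithm 1.2.3 is delicate, here is a minimal MATLAB sketch of the symmetric-storage gaxpy (the name sym_gaxpy is ours; x and y are column vectors).

function y = sym_gaxpy(A_vec, x, y)
% Symmetric-storage gaxpy: overwrites y with y + A*x, where the symmetric
% n-by-n matrix A is packed in the vector A_vec according to (1.2.2), i.e.,
% A_vec((j-1)*n - j*(j-1)/2 + i) = a(i,j) for i >= j.
n = length(x);
for j = 1:n
    for i = 1:j-1
        y(i) = y(i) + A_vec((i-1)*n - i*(i-1)/2 + j)*x(j);
    end
    for i = j:n
        y(i) = y(i) + A_vec((j-1)*n - j*(j-1)/2 + i)*x(j);
    end
end
end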
  • 44. 20 Chapter 1. Matrix Multiplication Suppose A E Rmxn and that v E R'. and w E R8 are integer vectors with the property that 1 ;; Vi � m and 1 � Wi � n. If B = A(v, w), then B E wxs is the matrix defined by bi; = av,,wi for i = l:r and j = l:s. Thus, if A E R8x8, then 1.2.10 Working with Permutation Matrices Using the colon notation, the 4-by-4 permutation matrix in (1.2.3) is defined by P = I4(v, :) where v = [ 2 4 3 l ]. In general, if v E Rn is a permutation of the vector l:n = [1, 2, . . . , n] and P = In(v, :), then y = Px ===? y = x(v) ===? Yi = Xv;i i = l:n y = pTx ===? y(v) = x ===? Yv; = Xi, i = l:n The second result follows from the fact that Vi is the row index of the "l" in column i of pT_ Note that PT(Px) = x. The inverse of a permutation matrix is its transpose. The action of a permutation matrix on a given matrix A E Rmxn is easily de­ scribed. If P = Im(v, :) and Q = In(w, :), then PAQT = A(v,w). It also follows that In(v, :) · In(w, :) = In(w(v), :). Although permutation operations involve no fl.ops, they move data and contribute to execution time, an issue that is discussed in §1.5. 1.2.11 Three Famous Permutation Matrices The exchange permutation en turns vectors upside down, e.g., In general, if v = n: - 1:1, then the n-by-n exchange permutation is given by en = In(v,:). No change results if a vector is turned upside down twice and thus, eJ,en = e� = In. The downshift permutation 'Dn pushes the components of a vector down one notch with wraparound, e.g., In general, if v = [ (2:n) 1 ], then the n-by-n downshift permutation is given by 'Dn = In(v, :). Note that V'f:. can be regarded as an upshift permutation. The mod-p perfect shuffie permutation 'Pp,r treats the components of the input vector x E Rn, n = pr, as cards in a deck. The deck is cut into p equal "piles" and
  • 45. 1.2. Structure and Efficiency 21 reassembled by taking one card from each pile in turn. Thus, if p = 3 and r = 4, then the piles are x(1:4), x(5:8), and x(9:12) and x(2:4:12) y = P3,4x = Ipr([ 1 5 9 2 6 10 3 7 11 4 8 12 ], :)x = x(3:4:l2) [x(1:4:12) l In general, if n = pr, then Pp,r and it can be shown that In([ (l:r:n) (2:r:n) · · · (r:r:n)], :) P;,r = In ([ (l:p:n) (2:p:n) · · · (p:p:n) ], :). x(4:4:12) (1.2.4) Continuing with the card deck metaphor, P'{;,r reassembles the card deck by placing all the Xi having i mod p = 1 first, followed by all the Xi having i mod p = 2 second, and so on. Problems Pl.2.1 Give an algorithm that overwrites A with A2 where A E Ir x n. How much extra storage is required? Repeat for the case when A is upper triangular. Pl.2.2 Specify an algorithm that computes the first column of the matrix M = (A - >.1!) ···(A - >.rl) where A E Rn x n is upper Hessenberg and >.1 , . . . , >.,. are given scalars. How many flops are required assuming that r « n? Pl.2.3 Give a column saxpy algorithm for the n-by-n matrix multiplication problem C = C + AB where A is upper triangular and B is lower triangular. Pl.2.4 Extend Algorithm 1.2.2 so that it can handle rectangular band matrices. Be sure to describe the underlying data structure. Pl.2.5 If A = B + iC is Hermitian with B E R'' x n, then it is easy to show that BT = B and er = -C. Suppose we represent A in an array A.herm with the property that A.herm(i,j) houses b;j if i ::=:: j and Cij if j > i. Using this data structure, write a matrix-vector multiply function that computes Re(z) and lm(z) from Re(x) and lm(x) so that z = Ax. Pl.2.6 Suppose X E R'' x p and A E R'' x n arc given and that A is symmetric. Give an algorithm for computing B = xrAX assuming that both A and B are to be stored using the symmetric storage scheme presented in §1.2.7. Pl.2.7 Suppose a E Rn is given and that A E Rn x n has the property that a;j = ali-il+l· Give an algorithm that overwrites y with y + Ax where x, y E Rn are given. Pl.2.8 Suppose a E Rn is given and that A E Rn x n has the property that a;j = a((i+j-l) mod n)+l· Give an algorithm that overwrites y with y + Ax where x, y E Rn are given. Pl.2.9 Develop a compact storage scheme for symmetric band matrices and write the corresponding gaxpy algorithm. Pl.2.10 Suppose A E Rn x n, u E Rn , and v E Rn are given and that k � n is an integer. Show how to compute X E R'' x k and Y E R'' x k so that (A + uvT)k = Ak + XYT. How many flops are required? Pl.2.11 Suppose x E Rn . Write a single-loop algorithm that computes y = V�x where k is a positive integer and 'Dn is defined in §1.2.11.
  • 46. 22 Chapter 1. Matrix Multiplication Pl.2.12 (a) Verify (1.2.4). (b) Show that P'I,r = 'Pr,p · Pl.2.13 The number of n-by-n permutation matrices is n!. How many of these are symmetric? Notes and References for §1.2 See LAPACK for a discussion about appropriate data structures when symmetry and/or handedness is present in addition to F.G. Gustavson (2008). "The Relevance of New Data Structure Approaches for Dense Linear Al­ gebra in the New Multi-Core/Many-Core Environments," in Proceedings of the 7th international Conference on Parallel Processing and Applied Mathematics, Springer-Verlag, Berlin, 618-621. The exchange, downshift, and perfect shuffle permutations are discussed in Van Loan (FFT). 1.3 Block Matrices and Algorithms A block matrix is a matrix whose entries are themselves matrices. It is a point of view. For example, an 8-by-15 matrix of scalars can be regarded as a 2-by-3 block matrix with 4-by-5 entries. Algorithms that manipulate matrices at the block level are often more efficient because they are richer in level-3 operations. The derivation of many important algorithms is often simplified by using block matrix notation. 1.3.1 Block Matrix Terminology Column and row partitionings (§1.1.7) are special cases of matrix blocking. In general, we can partition both the rows and columns of an m-by-n matrix A to obtain A [ Au Aq1 where m1 + · · · + mq = m, n1 + · · · + nr = n, and A0,e designates the (a, /3) block (submatrix). With this notation, block A0,e has dimension m0-by-n,e and we say that A = (A0,e) is a q-by-r block matrix. Terms that we use to describe well-known band structures for matrices with scalar entries have natural block analogs. Thus, [ Au 0 0 l diag(A11, A22, A33) 0 A22 0 0 0 A33 is block diagonal while the matrices [Lu 0 Ll u � [Uu U12 u,, l [Tu T12 :, ,], L = L21 L22 0 U22 U23 , T = �t T22 L31 L32 0 0 U33 T32 T33 are, respectively, block lower triangular, block upper triangular, and block tridiagonal. The blocks do not have to be square in order to use this block sparse terminology.
  • 47. 1.3. Block Matrices and Algorithms 1.3.2 Block Matrix Operations Block matrices can be scaled and transposed: = [µA11 µA21 µA31 23 Note that the transpose ofthe original (i, j) block becomes the (j, i) block of the result. Identically blocked matrices can be added by summing the corresponding blocks: Block matrix multiplication requires more stipulations about dimension. For example, if [Au A12 l [ B B ] [AuBu+A12B21 AuB12+A12B22 l A21 A22 B u B 12 = A21B11+A22B21 A21B12+A22B22 A31 A32 21 22 A31B11 +A32B21 A31B12+A32B22 is to make sense, then the column dimensions of A11 , A21, and A31 must each be equal to the row dimension of both Bu and Bi2- Likewise, the column dimensions of A12, A22, and A32 must each be equal to the row dimensions of both B21 and B22· Whenever a block matrix addition or multiplication is indicated, it is assumed that the row and column dimensions of the blocks satisfy all the necessary constraints. In that case we say that the operands are partitioned conformably as in the following theorem. Theorem 1 . 3 . 1 . If A = [ Au Aq1 B = P1 and we partition the product C = AB as follows, c [ Cu Cq1 n 1 nr .. [ Bu Bs1 then for a = l:q and (3 = l:r we have Ca.{3 = L Aa.-yB-yf3· -r=l l P1 Ps n,.
  • 48. 24 Chapter 1. Matrix Multiplication Proof. The proof is a tedious exercise in subscripting. Suppose 1 $ a $ q and 1 $ f3 $ r. Set M = m1 + · · · + m0_1 and N = nl + · · · n/3-1· It follows that if 1 $ i $ m0 and 1 $ j $ n13 then P1+..·P• s P1+··+p.,, [Ca/3tj = L aM+i,kbk,N+i L:: L:: k=l s p.,, ')'=1 k=p1+···+P.,,-1+1 s = L L [Aa'Y]ik [B,,13jkj = L [Aa,,B,,13jij ')'=1 k=l ')'=I Thus, Ca/3 = A0,1B1,13 + · · · + A0,sBs,/3· D If you pay attention to dimension and remember that matrices do not commute, i.e., AuBu + A12B21 #- B11A11 + B21A12, then block matrix manipulation is just ordinary matrix manipulation with the aii's and bii's written as Aii's and Bii's! 1.3.3 Submatrices Suppose A E 1Rmxn. If a = [o:i, . . . , 0:8] and f3 = [{31, . . . , f3t] are integer vectors with distinct components that satisfy 1 $ ai $ m, and 1 $ !3i $ n, then [aa1,/31 A(a, {3) = : ao.,/31 . . . a "' l 01 ,JJt . . . . . . . aa. ,/3. is an s-by-t submatrix of A. For example, if A E JR8 x6, a = [2 4 6 8], and f3 = [4 5 6], then [::: ::: :::l A(a,{3) = . a64 a65 a66 as4 as5 as6 If a = {3, then A(a,{3) is a principal submatrix. If a = f3 = l:k and 1 $ k $ min{m, n}, then A(a,{3) is a leading principal submatrix. If A E 1Rmxn and then the colon notation can be used to specify the individual blocks. In particular, Aii = A(r + l:r + rni,µ + l:µ + ni) where T = m1 + · · · + mi-1 and µ = ni + · · · + ni-1· Block matrix notation is valuable for the way in which it hides subscript range expressions.
  • 49. 1.3. Block Matrices and Algorithms 1.3.4 The Blocked Gaxpy 25 As an exercise in block matrix manipulation and submatrix designation, we consider two block versions of the gaxpy operation y y + Ax where A E JRmxn, x E JRn, and y E JRm. If then and we obtain a = O for i = l:q A idx = a+l : a+mi y(idx) = y(idx) + A(idx, :)·x a = a + mi end and y The assignment to y(idx) corresponds to Yi = Yi + Aix. This row-blocked version of the gaxpy computation breaks the given gaxpy into q "shorter" gaxpys. We refer to Ai as the ith block row of A. Likewise, with the partitionings A = we see that and we obtain {J = 0 for j = l:r n1 n,. jdx = {J+l : {J+ni end y = y + A(:,jdx) ·x(jdx) {J = {J + ni and x r y + L AjXj j=l The assignment to y corresponds to y = y + AjXj · This column-blocked version of the gaxpy computation breaks the given gaxpy into r "thinner" gaxpys. We refer to Aj as the jth block column of A.
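Here is a minimal MATLAB sketch of the two blocked gaxpys (the function names and the explicit block-size vectors are ours); each overwrites y with y + Ax.

function y = gaxpy_rowblocked(A, x, y, m_sizes)
% Row-blocked gaxpy: m_sizes = [m1 ... mq] with m1 + ... + mq = size(A,1).
% Each pass updates one block row: y(idx) = y(idx) + A(idx,:)*x.
alpha = 0;
for i = 1:length(m_sizes)
    idx = alpha+1 : alpha+m_sizes(i);
    y(idx) = y(idx) + A(idx,:)*x;
    alpha = alpha + m_sizes(i);
end
end

function y = gaxpy_colblocked(A, x, y, n_sizes)
% Column-blocked gaxpy: n_sizes = [n1 ... nr] with n1 + ... + nr = size(A,2).
% Each pass adds the contribution of one block column: y = y + A(:,jdx)*x(jdx).
beta = 0;
for j = 1:length(n_sizes)
    jdx = beta+1 : beta+n_sizes(j);
    y = y + A(:,jdx)*x(jdx);
    beta = beta + n_sizes(j);
end
end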
  • 50. 26 Chapter 1. Matrix Multiplication 1 .3.5 Block Matrix Multiplication Just as ordinary, scalar-level matrix multiplication can be arranged in several possible ways, so can the multiplication of block matrices. To illustrate this with a minimum of subscript clutter, we consider the update C = C + AB where we regard A = (Aaf:I), B = (Baf3), and C = (Caf3) as N-by-N block matrices with t'-by-t' blocks. From Theorem 1.3.1 we have N Caf3 = Ca(3 + L Aa-yB-y(3, -y=l a = l:N, {3 = l:N. If we organize a matrix multiplication procedure around this summation, then we obtain a block analog of Algorithm 1.1.5: for a = l:N i = (a - l)t' + l:at' for {3 = l:N j = ({3 - l)t' + 1:{3£ for 'Y = l:N k = (7 - l)t' + 1:7£ C(i, j) = C(i, j) + A(i, k) ·B(k, j) end end end Note that, if t' = 1, then a = i, {3 = j, and 7 = k and we revert to Algorithm 1.1.5. Analogously to what we did in §1.1, we can obtain different variants ofthis proce­ dure by playing with loop orders and blocking strategies. For example, corresponding to where Ai E Rlxn and Bi E Rnxl, we obtain the following block outer product compu­ tation: for i = l:N end for j = l:N Cii = Cii + AiBi end
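A minimal MATLAB sketch of the blocked update with uniform square blocks follows (the name block_matmul is ours; it assumes A, B, and C are n-by-n with n = N*ell). Each innermost statement is itself a level-3 operation on ell-by-ell blocks.

function C = block_matmul(A, B, C, ell)
% Blocked version of C = C + A*B with ell-by-ell blocks (n = N*ell).
% Mirrors the block analog of Algorithm 1.1.5; each block product is
% formed with ordinary matrix multiplication.
n = size(A, 1);
N = n/ell;
for alpha = 1:N
    i = (alpha-1)*ell+1 : alpha*ell;
    for beta = 1:N
        j = (beta-1)*ell+1 : beta*ell;
        for gamma = 1:N
            k = (gamma-1)*ell+1 : gamma*ell;
            C(i,j) = C(i,j) + A(i,k)*B(k,j);
        end
    end
end
end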
  • 51. 1.3. Block Matrices and Algorithms 1.3.6 The Kronecker Product 27 It is sometimes the case that the entries in a block matrix A are all scalar multiples of the same matrix. This means that A is a Kronecker product. Formally, if B E IRm1 xni and C E 1Rm2xn2, then their Kronecker product B ® C is an m1-by-n1 block matrix whose (i,j) block is the m2-by-n2 matrix bijC. Thus, if A = then bnc11 bitC21 buc31 b21Cu A = �1C21 �iC31 b31C11 b31C21 b31C31 [bn b21 b31 b,, l [Cn b22 @ C21 b32 C31 bnc12 bnc13 bi2c11 bnc22 bnc23 bi2C21 buc32 buc33 bi2C31 b21C12 b21C13 b22C11 b21C22 b21C23 b22C21 �iC32 b21C33 �2C31 b31C12 b31C13 b32C11 b31C22 b31C23 bJ2C21 b31C32 b31C33 b32C31 C12 C13 l C22 C23 C32 C33 bi2C12 bi2C13 bi2C22 bi2C23 bi2C32 bi2C33 b22C12 b22C13 b22C22 b22C23 b22C32 �2C33 b32C12 b32C13 b32C22 b32C23 b32C32 bJ2C33 This type of highly structured blocking occurs in many applications and results in dramatic economies when fully exploited. Note that if B has a band structure, then B ® C "inherits" that structure at the block level. For example, if {diagonal } B . tridiagonal lS 1 . 1 ower tnangu ar upper triangular {block diagonal } then B ® C is block tridiago�al . block lower triangular block upper triangular Important Kronecker product properties include: (B ® C)T = BT ® Cr, (B ® C)(D ® F) = BD ® CF, (B ® c)-1 = B-1 ® c-1 , B ® (C ® D) = (B ® C) ® D. (1.3.1) (1.3.2) (1.3.3) (1.3.4) Of course, the products BD and CF must be defined for (1.3.2) to make sense. Like­ wise, the matrices B and C must be nonsingular in (1.3.3). In general, B ® C =f. C ® B. However, there is a connection between these two matrices via the perfect shuffle permutation that is defined in §1.2.11. If B E IRm1xni and C E 1Rm2xn2, then P(B ® C)QT = C ® B (1.3.5) where P
  • 52. 28 Chapter 1. Matrix Multiplication 1.3.7 Reshaping Kronecker Product Expressions A matrix-vector product in which the matrix is a Kronecker product is "secretly" a matrix-matrix-matrix product. For example, if B E ll3x2, C E llmxn, and x1,x2 E lln, then where y1, y2, y3 E ll"'. On the other hand, if we define the matrices then Y = CXBT. x = [ X1 X2 l and y = [ Y1 Y2 Y3 ] , To be precise about this reshaping, we introduce the vec operation. If X E llmxn, then vec(X) is an nm-by-1 vector obtained by "stacking" X's columns: [X(:, 1) l vec(X) = : . X(:, n) Y = CXBT *> vec(Y) = (B © C)vec(X). (1.3.6) Note that if B, C, X E llnxn, then Y = CXBT costs O(n3) to evaluate while the disregard of Kronecker structure in y = (B © C)x leads to an O(n4) calculation. This is why reshaping is central for effective Kronecker product computation. The reshape operator is handy in this regard. If A E llmxn and m1n1 = mn, then B = reshape(A, m1 , ni) is the m1-by-n1 matrix defined by vec(B) = vec(A). Thus, if A E ll3x4, then reshape(A, 2, 6) = [ au a31 a22 a13 a33 a24 ] . a21 ai2 a32 a23 ai4 a34 1.3.8 Multiple Kronecker Products Note that A = B © C ® D can be regarded as a block matrix whose entries are block matrices. In particular, bijCktD is the (k, l) block of A's (i, j) block. As an example of a multiple Kronecker product computation, let us consider the calculation of y = (B © C © D)x where B, C, D E llnxn and x E llN with N = n3 • Using (1.3.6) it follows that reshape(y, n2, n) = (C © D) · reshape(x, n2, n) · BT.
  • 53. 1.3. Block Matrices and Algorithms Thus, if F = reshape(x, n2, n) · BT, 2 then G = (C ® D)F E Rn xn can computed column-by-column using (1.3.6): G(:, k) = reshape(D · reshape(F(:, k), n, n) · CT, n2, 1) k = l:n. 29 It follows that y = reshape(G, N, 1). A careful accounting reveals that 6n4 flops are required. Ordinarily, a matrix-vector product of this dimension would require 2n6 flops. The Kronecker product has a prominent role to play in tensor computations and in §13.1 we detail more of its properties. 1.3.9 A Note on Complex Matrix Multiplication Consider the complex matrix multiplication update where all the matrices are real and i2 = -1. Comparing the real and imaginary parts we conclude that Thus, complex matrix multiplication corresponds to a structured real matrix multipli­ cation that has expanded dimension. 1.3.10 Hamiltonian and Symplectic Matrices While on the topic of 2-by-2 block matrices, we identify two classes of structured matrices that arise at various points later on in the text. A matrix M E R2nx2n is a Hamiltonian matrix if it has the form where A, F, G E Rnxn and F and G are symmetric. Hamiltonian matrices arise in optimal control and other application areas. An equivalent definition can be given in terms of the permutation matrix In particular, if JMJT = -MT, then M is Hamiltonian. A related class of matrices are the symplectic matrices. A matrix S E R2nx2n is symplectic if
  • 54. 30 Chapter l . Matrix Multiplication If S = [S11 S12 l S21 S22 where the blocks are n-by-n, then it follows that both S'fiS21 and SfiS12 are symmetric and S'fiS22 = In + SiiS12· 1.3.11 Strassen Matrix Multiplication We conclude this section with a completely different approach to the matrix-matrix multiplication problem. The starting point in the discussion is the 2-by-2 block matrix product where each block is square. In the ordinary algorithm, Cij = Ai1B11 + Ai2B21· There are 8 multiplies and 4 adds. Strassen {1969) has shown how to compute C with just 7 multiplies and 18 adds: P1 = (A11 + A22)(B11 + B22), P2 = (A21 + A22)B11, P3 Au(B12 - B22), P4 A22(B21 - Bu), P5 (Au + Ai2)B22, p6 (A21 - Au)(Bu + Bi2), P1 {A12 - A22)(B21 + B22), Cu P1 + P4 - P5 + P1, C12 P3 + P5, C21 P2 + P4, C22 Pi + P3 - P2 + P5. These equations are easily confirmed by substitution. Suppose n = 2m so that the blocks are m-by-m. Counting adds and multiplies in the computation C = AB, we find that conventional matrix multiplication involves {2m)3 multiplies and {2m)3 - {2m)2 adds. In contrast, if Strassen's algorithm is applied with conventional multiplication at the block level, then 7m3 multiplies and 7m3 + llm2 adds are required. If m » 1, then the Strassen method involves about 7/8 the arithmetic of the fully conventional algorithm. Now recognize that we can recur on the Strassen idea. In particular, we can apply the Strassen algorithm to each of the half-sized block multiplications associated with the Pi. Thus, if the original A and B are n-by-n and n = 2q, then we can repeatedly apply the Strassen multiplication algorithm. At the bottom "level,'' the blocks are l-by-1. Of course, there is no need to recur down to the n = 1 level. When the block size gets sufficiently small, (n ::; nmin), it may be sensible to use conventional matrix multiplication when finding the Pi. Here is the overall procedure:
  • 55. 1.3. Block Matrices and Algorithms 31 Algorithm 1.3.1 (Strassen Matrix Multiplication) Suppose n = 2q and that A E R.nxn and B E R.nxn. If nmin = 2d with d ::S: q, then this algorithm computes C = AB by applying Strassen procedure recursively. function C = strass(A, B, n, nmin) if n ::S: nmin else C = AB (conventionally computed) m = n/2; u = l:m; v = m + l:n Pi = strass(A{u, u) + A(v, v), B(u, u) + B(v, v), m, nmin) P2 = strass(A{v, u) + A(v, v), B(u, u), m, nmin) P3 = strass(A(u, u), B(u, v) - B(v, v), m, nmin) P4 = strass(A(v, v), B(v, u) - B(u, u), m, nmin) Ps = strass(A(u, u) + A(u, v), B(v, v), m, nmin) P6 = strass(A(v, u) - A(u, u), B(u, u) + B(u, v), m, nmin) P1 = strass(A(u, v) - A(v, v), B(v, u) + B(v, v), m, nmin) C(u, u) = Pi + P4 - Ps + P1 C(u, v) = P3 + Ps C(v, u) = P2 + P4 C(v, v) = P1 + P3 - P2 + P6 end Unlike any of our previous algorithms, strass is recursive. Divide and conquer algo­ rithms are often best described in this fashion. We have presented strass in the style of a MATLAB function so that the recursive calls can be stated with precision. The amount of arithmetic associated with strass is a complicated function ofn and nmin· If nmin » 1, then it suffices to count multiplications as the number of additions is roughly the same. If we just count the multiplications, then it suffices to examine the deepest level of the recursion as that is where all the multiplications occur. In strass there are q - d subdivisions and thus 7q-d conventional matrix-matrix multiplications to perform. These multiplications have size nmin and thus strass involves about s = (2d)37q-d multiplications compared to c = {2q)3, the number of multiplications in the conventional approach. Notice that s - (2d )3 -d - (7)q-d - - - 7q - - c 2q 8 If d = 0 , i.e., we recur on down to the l-by-1 level, then 8 = (7/8)q c = 7q = nlog2 7 � n2.807 . Thus, asymptotically, the number of multiplications in Strassen's method is O(n2·807). However, the number of additions (relative to the number of multiplications) becomes significant as nmin gets small.
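For experimentation, here is a runnable MATLAB version of strass (a minimal sketch; it drops the explicit dimension argument of Algorithm 1.3.1 since n can be obtained from size(A), and it assumes n is a power of 2). A quick check: for random A and B of order n = 256, strass(A, B, 16) should agree with A*B up to roundoff effects.

function C = strass(A, B, nmin)
% Strassen multiplication C = A*B for n-by-n matrices, n a power of 2.
% Conventional multiplication is used once the blocks satisfy n <= nmin.
n = size(A, 1);
if n <= nmin
    C = A*B;
else
    m = n/2;  u = 1:m;  v = m+1:n;
    P1 = strass(A(u,u)+A(v,v), B(u,u)+B(v,v), nmin);
    P2 = strass(A(v,u)+A(v,v), B(u,u),        nmin);
    P3 = strass(A(u,u),        B(u,v)-B(v,v), nmin);
    P4 = strass(A(v,v),        B(v,u)-B(u,u), nmin);
    P5 = strass(A(u,u)+A(u,v), B(v,v),        nmin);
    P6 = strass(A(v,u)-A(u,u), B(u,u)+B(u,v), nmin);
    P7 = strass(A(u,v)-A(v,v), B(v,u)+B(v,v), nmin);
    C = [P1+P4-P5+P7, P3+P5; P2+P4, P1+P3-P2+P6];
end
end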
  • 56. 32 Chapter 1. Matrix Multiplication Problems Pl.3.1 Rigorously prove the following block matrix equation: Pl.3.2 Suppose M E Rnxn is Hamiltonian. How many flops are required to compute N = M2? Pl.3.3 What can you say about the 2-by-2 block structure of a matrix A E R2nx2n that satisfies £2nA£2n = AT where £2n is the exchange permutation defined in §1.2.11. Explain why Ais symmetric about the "antidiagonal" that extends from the (2n, 1) entry to the (1, 2n) entry. Pl.3.4 Suppose A = [ :T � ] where B E Rnxn is upper bidiagonal. Describe the structure of T = PAPT where P = P2.n is the perfect shuffle permutation defined in §1.2.11. Pl.3.5 Show that if B and C are each permutation matrices, then B18> C is also a permutation matrix. Pl.3.6 Verify Equation (1.3.5). Pl.3.7 Verify that if x E R"' and y E Rn, then y 18> x = vec(xyT). Pl.3.8 Show that if B E Jl!' x P, C E Rqxq, and then x [::l xT (B © C) x p p LL)ij (xTCxj) . i=l j=l Pl.3.9 Suppose A(k) E Rnkxni. for k = l:r and that x E Rn where n = n1 · · · nr. Give an efficient algorithm for computing y = (A<1"> 18> • • • 18> A(2) 18> A<1>)x. Pl.3.10 Suppose n is even and define the following function from Rn to R: n/2 f(x) = x(1:2:n)Tx(2:2:n) = Lx2;-1x2; . i=l (a) Show that if x, y E Rn then n/2 xTy = L:(x2;-1 + Y2;)(x2; + Y2i-J) - f(x) - f(y). i=l (b) Now consider the n-by-n matrix multiplication C = AB. Give an algorithm for computing this product that requires n3/2 multiplies once f is applied to the rows of A and the columns of B. See Winograd (1968) for details. Pl.3.12 Adapt strass so that it can handle square matrix multiplication of any order. Hint: If the "current" A has odd dimension, append a zero row and column. Pl.3.13 Adapt strass so that it can handle nonsquare products, e.g., C = AB where A E Rmx•· and B E wxn. Is it better to augment A and B with zeros so that they become square and equal in size or to "tile" A and B with square submatrices? Pl.3.14 Let Wn be the number of flops that strass requires to compute an n-by-n product where n is a power of 2. Note that W2 = 25 and that for n 2: 4 Wn = 7Wn/2 + 18(n/2)2
  • 57. 1.4. Fast Matrix-Vector Products 33 Show that for every e > 0 there is a constant c, so W n $ c, nw+• where w = log2 7 and n is any power of two. Pl.3.15 Suppose B E Rm 1 x ni , C E irn2 x n2 , and D E Rma x na. Show how to compute the vector y = (B ® C ® D)x where :1: E Rn and n = n1n2n:i is given. Is the order of operations important from the flop point of view? Notes and References for §1.3 Useful references for the Kronecker product include Horn and Johnson (TMA, Chap. 4), Van Loan (FFT), and: C.F. Van Loan (2000). "The Ubiquitous Kronecker Product," J. Comput. Appl. Math., 129, 85-100. For quite some time fast methods for matrix multiplication have attracted a lot of attention within computer science, see: S. Winograd (1968). "A New Algorithm for Inner Product," IEEE TI-ans. Comput. C-1 7, 693-694. V. Strassen (1969) . "Gaussian Elimination is not Optimal," Numer. Math. 19, 354-356. V. Pan (1984). "How Can We Speed Up Matrix Multiplication?," SIAM Review 26, 393-416. I. Kaporin (1999). "A Practical Algorithm for Faster Matrix Multiplication," Num. Lin. Alg. 6, 687-700. . H. Cohn, R. Kleinberg, B. Szegedy, and C. Umans (2005). "Group-theoretic Algorithms for Matrix Multiplication," Proceeedings of the 2005 Conference on the Foundations of Computer Science (FOGS), 379-388. J. Demmel, I. Dumitriu, 0. Holtz, and R. Kleinberg (2007). "Fast Matrix Multiplication is Stable," Numer. Math. 1 06, 199- 224. P. D'Alberto and A. Nicolau (2009). "Adaptive Winograd's Matrix Multiplication," ACM TI-ans. Math. Softw. 96, Article 3. At first glance, many of these methods do not appear to have practical value. However, this has proven not to be the case, see: D. Bailey (1988). "Extra High Speed Matrix Multiplication on the Cray-2," SIAM J. Sci. Stat. Comput. 9, 603-607. N.J. Higham (1990). "Exploiting Fast Matrix Multiplication within the Level 3 BLAS," ACM TI-an.�. Math. Softw. 16, 352-368. C.C. Douglas, M. Heroux, G. Slishman, and R.M. Smith (1 994). "GEMMW: A Portable Level 3 BLAS Winograd Variant of Strassen's Matrix-Matrix Multiply Algorithm," J. Comput. Phys. 11 O, 1-10. Strassen's algorithm marked the beginning of a search for the fastest possible matrix multiplication algorithm from the complexity point of view. The exponent of matrix multiplication is the smallest number w such that, for all e > 0, O(nw+•) work suffices. The best known value of w has decreased over the years and is currently around 2.4. It is interesting to speculate on the existence of an O(n2+•) procedure. 1 .4 Fast Matrix-Vector Products In this section we refine our ability to think at the block level by examining some matrix-vector products y = Ax in which the n-by-n matrix A is so highly structured that the computation can be carried out with many fewer than the usual O(n2) flops. These results are used in §4.8. 1.4.1 The Fast Fourier Transform The discrete Fourier transform (DFT) of a vector x E <Cn is a matrix-vector product
  • 58. 34 Chapter 1. Matrix Multiplication where the DFT matrix Fn = (fki) E mnxn is defined by fkj = w!, k-l)(j-1) with Wn = exp(-27ri/n) = cos(27r/n) - i · sin(27r/n). Here is an example: [� �4 �l �! l = [� 1 w� wt w� 1 1 wi w� w� 1 1 -i -1 i 1 -J -1 1 -1 -i (1.4.1) (1.4.2) The DFT is ubiquitous throughout computational science and engineering and one reason has to do with the following property: If n is highly composite, then it is possible to carry out the DFT in many fewer than the O(n2) fl.ops required by conventional matrix-vector multiplication. To illustrate this we set n = 2t and proceed to develop the radix-2 fast Fourier trans- form. The starting point is to examine the block structure of an even-order DFT matrix after its columns are reordered so that the odd-indexed columns come first. Consider the case 1 1 1 1 1 1 1 1 1 w w2 w3 w4 ws w6 w1 1 w2 w4 w6 1 w2 w4 w6 Fs = 1 w3 w6 w w4 w1 w2 ws (w = ws). 1 w4 1 w4 1 w4 1 w4 1 ws w2 w1 w4 w w6 w3 1 w6 w4 w2 1 w6 w4 w2 1 w1 w6 w5 w4 w3 w2 w (Note that ws is a root of unity so that high powers simplify, e.g., [Fsk1 = w3·6 = w18 = w2.) If cols = [1 3 5 7 2 4 6 8], then 1 1 1 1 1 1 1 1 1 w2 w4 w6 w w3 ws w1 1 w4 1 w4 w2 we w2 w6 Fs(:, cols) = 1 w6 w4 w2 w3 w w1 w5 1 1 1 1 -1 -1 -1 -1 1 w2 w4 w6 -w -w3 -w5 -w7 1 w4 1 w4 -w2 -w6 -w2 -w6 1 w6 w4 w2 -w3 -w -w7 -ws The lines through the matrix are there to help us think of Fs(:, cols) as a 2-by-2 matrix with 4-by-4 blocks. Noting that w2 = w� = w4, we see that Fs(:, cols) = [ F4 04 F4 ] F4 -04F4
where Ω_4 = diag(1, ω_8, ω_8^2, ω_8^3). It follows that if x ∈ C^8, then

    F_8·x = F_8(:, cols)·x(cols) = [ F_4    Ω_4·F_4 ] [ x(1:2:8) ]  =  [ I_4    Ω_4 ] [ F_4·x(1:2:8) ]
                                   [ F_4   -Ω_4·F_4 ] [ x(2:2:8) ]     [ I_4   -Ω_4 ] [ F_4·x(2:2:8) ].

Thus, by simple scalings we can obtain the 8-point DFT y = F_8·x from the 4-point DFTs y_T = F_4·x(1:2:8) and y_B = F_4·x(2:2:8). In particular,

    y(1:4) = y_T + d .* y_B,        y(5:8) = y_T - d .* y_B,

where d = [1, ω_8, ω_8^2, ω_8^3]^T. More generally, if n = 2m, then y = F_n·x is given by

    y(1:m) = y_T + d .* y_B,        y(m+1:n) = y_T - d .* y_B,

where d = [1, ω_n, ..., ω_n^{m-1}]^T and

    y_T = F_m·x(1:2:n),        y_B = F_m·x(2:2:n).

For n = 2^t, we can recur on this process until n = 1, noting that F_1·x = x.

Algorithm 1.4.1 If x ∈ C^n and n = 2^t, then this algorithm computes the discrete Fourier transform y = F_n·x.

    function y = fft(x, n)
        if n = 1
            y = x
        else
            m = n/2
            y_T = fft(x(1:2:n), m)
            y_B = fft(x(2:2:n), m)
            ω = exp(-2πi/n)
            d = [ 1, ω, ..., ω^{m-1} ]^T
            z = d .* y_B
            y = [ y_T + z
                  y_T - z ]
        end
    end
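The recursion in Algorithm 1.4.1 is easy to prototype. The following Python/NumPy sketch (ours, not from the text) mirrors the algorithm and checks it against the DFT matrix applied directly; the names fft_radix2 and dft_matrix are ours.

    import numpy as np

    def dft_matrix(n):
        # F_n = (f_kj) with f_kj = w_n^((k-1)(j-1)), w_n = exp(-2*pi*i/n)
        k = np.arange(n).reshape(-1, 1)
        j = np.arange(n).reshape(1, -1)
        return np.exp(-2j * np.pi * k * j / n)

    def fft_radix2(x):
        # Algorithm 1.4.1: y = F_n x for n a power of two
        n = x.size
        if n == 1:
            return x.copy()
        m = n // 2
        yT = fft_radix2(x[0::2])                       # F_m * x(1:2:n)
        yB = fft_radix2(x[1::2])                       # F_m * x(2:2:n)
        d = np.exp(-2j * np.pi * np.arange(m) / n)     # [1, w_n, ..., w_n^(m-1)]
        z = d * yB
        return np.concatenate([yT + z, yT - z])

    n = 8
    x = np.random.randn(n) + 1j * np.random.randn(n)
    print(np.allclose(fft_radix2(x), dft_matrix(n) @ x))   # expect True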
The flop analysis of fft requires an assessment of complex arithmetic and the solution of an interesting recursion. We first observe that the multiplication of two complex numbers involves six (real) flops while the addition of two complex numbers involves two flops. Let f_n be the number of flops that fft needs to produce the DFT of x ∈ C^n. Scrutiny of the method reveals that

    y_T    requires f_m flops,
    y_B    requires f_m flops,
    d      requires 6m flops,
    z      requires 6m flops,
    y      requires 2n flops,

where n = 2m. Thus,

    f_n = 2·f_m + 8n        (f_1 = 0).

Conjecturing that f_n = c·n·log2(n) for some constant c, it follows that

    f_n = c·n·log2(n) = 2c·m·log2(m) + 8n = c·n·(log2(n) - 1) + 8n,

from which we conclude that c = 8. Thus, fft requires 8n·log2(n) flops. Appreciate the speedup over conventional matrix-vector multiplication. If n = 2^20, it is a factor of about 10,000. We mention that the fft flop count can be reduced to 5n·log2(n) by precomputing ω_n, ..., ω_n^{n/2-1}. See P1.4.1.

1.4.2 Fast Sine and Cosine Transformations

In the discrete sine transform (DST) problem, we are given real values x_1, ..., x_{m-1} and compute

    y_k = Σ_{j=1}^{m-1} sin(kjπ/m)·x_j        (1.4.3)

for k = 1:m-1. In the discrete cosine transform (DCT) problem, we are given real values x_0, x_1, ..., x_m and compute

    y_k = x_0/2 + Σ_{j=1}^{m-1} cos(kjπ/m)·x_j + (-1)^k·x_m/2        (1.4.4)

for k = 0:m. Note that the sine and cosine evaluations "show up" in the DFT matrix. Indeed, for k = 0:2m-1 and j = 0:2m-1 we have

    [F_{2m}]_{k+1,j+1} = ω_{2m}^{kj} = cos(kjπ/m) - i·sin(kjπ/m).        (1.4.5)

This suggests (correctly) that there is an exploitable connection between each of these trigonometric transforms and the DFT. The key observation is to block properly the real and imaginary parts of F_{2m}. To that end, define the matrices S_r ∈ R^{r×r} and C_r ∈ R^{r×r} by

    [S_r]_{kj} = sin(kjπ/(r+1)),
    [C_r]_{kj} = cos(kjπ/(r+1)),        k = 1:r,  j = 1:r.        (1.4.6)
Recalling from §1.2.11 the definition of the exchange permutation E_n, we have:

Theorem 1.4.1. Let m be a positive integer and define the vectors e, v ∈ R^{m-1} by

    e^T = ( 1, 1, ..., 1 ),        v^T = ( -1, 1, ..., (-1)^{m-1} ).

If E = E_{m-1}, C = C_{m-1}, and S = S_{m-1}, then

    F_{2m} = [ 1    e^T          1          e^T
               e    C - iS       v          (C + iS)E
               1    v^T          (-1)^m     v^T·E
               e    E(C + iS)    Ev         E(C - iS)E ].        (1.4.7)

Proof. It is clear from (1.4.5) that F_{2m}(:,1), F_{2m}(1,:), F_{2m}(:,m+1), and F_{2m}(m+1,:) are correctly specified. It remains for us to show that equation (1.4.7) holds in block positions (2,2), (2,4), (4,2), and (4,4). The (2,2) verification is straightforward:

    [F_{2m}(2:m, 2:m)]_{kj} = cos(kjπ/m) - i·sin(kjπ/m) = [C - iS]_{kj}.

A little trigonometry is required to verify correctness in the (2,4) position:

    [F_{2m}(2:m, m+2:2m)]_{kj}
        = cos(k(m+j)π/m) - i·sin(k(m+j)π/m)
        = cos(kjπ/m + kπ) - i·sin(kjπ/m + kπ)
        = cos(-kjπ/m + kπ) + i·sin(-kjπ/m + kπ)
        = cos(k(m-j)π/m) + i·sin(k(m-j)π/m)
        = [(C + iS)E]_{kj}.

We used the fact that post-multiplying a matrix by the permutation E = E_{m-1} has the effect of reversing the order of its columns. The recipes for F_{2m}(m+2:2m, 2:m) and F_{2m}(m+2:2m, m+2:2m) are derived similarly. □

Using the notation of the theorem, we see that the sine transform (1.4.3) is a matrix-vector product

    y(1:m-1) = DST(m-1)·x(1:m-1)
  • 62. 38 Chapter 1. Matrix Multiplication where DST(m-1) = Sm+ If x = x(l:m- 1) and Xsin [ � lE R2m, = -�x then since eTE = e and E2 = E we have [1 i i e C - iS 2F2mXsin = 2 l VT e E(C + iS) 1 v [_iJ Thus, the DST of x(l:m- 1) is a scaled subvector of F2mXsin· (1.4.8) (1.4.9) Algorithm 1.4.2 The following algorithm assigns the DST of xi, . . . , Xm-l to y. Set up the vector Xsin defined by (1.4.9). Use fft (e.g., Algorithm 1.4.1) to compute ii = F2mXsin y = i · y(2:m)/2 This computation involves O(m log2(m)) flops. We mention that the vector Xsin is real and highly structured, something that would be exploited in a truly efficient imple­ mentation. Now let us consider the discrete cosine transform defined by (1.4.4). Using the notation from Theorem 1.4.1, the DCT is a matrix-vector product y(O:m) = DCT(m + 1) · x(O:m) where [1/2 eT DCT(m + 1) = e/2 Cm-1 1/2 VT If x = x(l:m - 1) and 1/2 l v/2 (-1r12 (1.4.10) (1.4.11)
  • 63. 1.4. Fast Matrix-Vector Products then 1 1 e eT C - iS VT 1 v (-1 r [I 2F2mXcos = 2 : E(C + iS) Ev � [ (xo/2) + eTX + (xo/2)e + Cx + (xo/2) + VTX + (xo/2)e + ECx + eT (C + iS)E x ][Xo l vTE Xm E(C - iS)E Ex (xm/2) l (xm/2)v (-1 r (xm/2) · (xm/2)Ev 39 Notice that the top three components of this block vector define the DCT of x(O:m). Thus, the DCT is a scaled subvector of F2mXcos· Algorithm 1.4.3 The following algorithm assigns to y E Rm+l the DCT of x0, . . . , Xm· Set up the vector Xcos E R2m defined by (1.4.11). Use fft (e.g., Algorithm 1.4.1) to compute y = F2mXcos y = y(l:m + 1)/2 This algorithm requires O(m logm) fl.ops, but as with Algorithm 1.4.2, it can be more efficiently implemented by exploiting symmetries in the vector Xcos· We mention that there are important variants of the DST and the DCT that can be computed fast: DST-II: Yk DST-III: Yk DST-IV: Yk = DCT-II: Yk = DCT-III: Yk = DCT-IV: Yk = f: . ( k(2j - 1)7r ) sm 2 Xj , i= l m t . c2k _ l)j7r ) sm 2 Xj , i= l m t . c2k _ 1)(2j _ l)7r ) sm 2 Xj , i= l m m-l ( k(2j - l)7r ) L cos 2 Xj , j=O m Xo 2 m-1 c2k - l)j7r ) L cos 2 Xj , i= I m y: ( (2k - 1)(2j - 1)7r ) COS 2 Xj , j=O m k = l:m, k = l:m, k = l:m, (1.4.12) k = O:m- 1, k = O:m- 1, k = O:m - 1 . For example, if y E R2m-I is the DST of x = [ Xi , 0, x2, 0, . . . , 0, Xm-1 ' Xm ]T, then fj(l:m) is the DST-II of x E Rm. See Van Loan (FFT) for further details.
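Algorithm 1.4.2 reduces the DST to a single length-2m FFT. The Python/NumPy sketch below (ours, not from the text) builds the extended vector x_sin of (1.4.9), applies a library FFT in place of Algorithm 1.4.1, and compares the result y = i·ỹ(2:m)/2 with the direct definition (1.4.3).

    import numpy as np

    def dst_direct(x):
        # (1.4.3): y_k = sum_{j=1}^{m-1} sin(k*j*pi/m) x_j, with x = [x_1,...,x_{m-1}]
        m = x.size + 1
        k = np.arange(1, m).reshape(-1, 1)
        j = np.arange(1, m).reshape(1, -1)
        return np.sin(k * j * np.pi / m) @ x

    def dst_via_fft(x):
        # Odd extension: x_sin = [0, x_1,...,x_{m-1}, 0, -x_{m-1},...,-x_1] in R^{2m}
        m = x.size + 1
        x_sin = np.concatenate([[0.0], x, [0.0], -x[::-1]])
        y_tilde = np.fft.fft(x_sin)           # plays the role of F_{2m} x_sin
        return (1j * y_tilde[1:m] / 2).real   # y = i * y_tilde(2:m) / 2 is real

    x = np.random.randn(7)                    # m = 8
    print(np.allclose(dst_direct(x), dst_via_fft(x)))   # expect True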
1.4.3 The Haar Wavelet Transform

If n = 2^t, then the Haar wavelet transform y = W_n·x is a matrix-vector product in which the transform matrix W_n ∈ R^{n×n} is defined recursively:

    W_n = [ W_m ⊗ [1; 1]  |  I_m ⊗ [1; -1] ]    if n = 2m,
    W_1 = [ 1 ].

Here are some examples:

    W_2 = [ 1   1
            1  -1 ],

    W_4 = [ 1   1   1   0
            1   1  -1   0
            1  -1   0   1
            1  -1   0  -1 ],

    W_8 = [ 1   1   1   0   1   0   0   0
            1   1   1   0  -1   0   0   0
            1   1  -1   0   0   1   0   0
            1   1  -1   0   0  -1   0   0
            1  -1   0   1   0   0   1   0
            1  -1   0   1   0   0  -1   0
            1  -1   0  -1   0   0   0   1
            1  -1   0  -1   0   0   0  -1 ].

An interesting block pattern emerges if we reorder the rows of W_n so that the odd-indexed rows come first:

    W_n([1:2:n, 2:2:n], :) = [ W_m    I_m
                               W_m   -I_m ]        (n = 2m).        (1.4.13)

Thus, if x ∈ R^n, x_T = x(1:m), and x_B = x(m+1:n), then y = W_n·x satisfies

    [ y(1:2:n) ]   =   [ I_m    I_m ] [ W_m    0  ] [ x_T ]
    [ y(2:2:n) ]       [ I_m   -I_m ] [ 0    I_m  ] [ x_B ].

In other words,

    y(1:2:n) = W_m·x_T + x_B,        y(2:2:n) = W_m·x_T - x_B.

This points the way to a fast recursive procedure for computing y = W_n·x.
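Algorithm 1.4.4 below turns this recursion into pseudocode. The following Python/NumPy sketch (ours, not from the text) implements the same recursion and checks it against W_n built explicitly from the recursive definition.

    import numpy as np

    def haar_matrix(n):
        # W_n = [ W_m kron [1,1]^T  |  I_m kron [1,-1]^T ],  W_1 = [1]
        if n == 1:
            return np.array([[1.0]])
        m = n // 2
        left = np.kron(haar_matrix(m), np.array([[1.0], [1.0]]))
        right = np.kron(np.eye(m), np.array([[1.0], [-1.0]]))
        return np.hstack([left, right])

    def fht(x):
        # y(1:2:n) = W_m x_T + x_B,  y(2:2:n) = W_m x_T - x_B
        n = x.size
        if n == 1:
            return x.copy()
        m = n // 2
        z = fht(x[:m])
        y = np.empty(n)
        y[0::2] = z + x[m:]
        y[1::2] = z - x[m:]
        return y

    n = 8
    x = np.random.randn(n)
    print(np.allclose(fht(x), haar_matrix(n) @ x))   # expect True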
Algorithm 1.4.4 (Haar Wavelet Transform) If x ∈ R^n and n = 2^t, then this algorithm computes the Haar transform y = W_n·x.

    function y = fht(x, n)
        if n = 1
            y = x
        else
            m = n/2
            z = fht(x(1:m), m)
            y(1:2:n) = z + x(m+1:n)
            y(2:2:n) = z - x(m+1:n)
        end
    end

It can be shown that this algorithm requires 2n flops.

Problems

P1.4.1 Suppose w = [1, ω_n, ω_n^2, ..., ω_n^{n/2-1}] where n = 2^t. Using the colon notation, express [1, ω_r, ω_r^2, ..., ω_r^{r/2-1}] as a subvector of w where r = 2^q, q = 1:t. Rewrite Algorithm 1.4.1 with the assumption that w is precomputed. Show that this maneuver reduces the flop count to 5n·log2(n).

P1.4.2 Suppose n = 3m and examine

    G = [ F_n(:, 1:3:n-1) | F_n(:, 2:3:n-1) | F_n(:, 3:3:n-1) ]

as a 3-by-3 block matrix, looking for scaled copies of F_m. Based on what you find, develop a recursive radix-3 FFT analogous to the radix-2 implementation in the text.

P1.4.3 If n = 2^t, then it can be shown that F_n = (A_t·Γ_t)···(A_1·Γ_1) where, for q = 1:t and L_q = 2^q, r_q = n/L_q,

    Γ_q = P_{2,r_q} ⊗ I_{L_{q-1}},        Ω_q = diag(1, ω_{L_q}, ..., ω_{L_q}^{L_{q-1}-1}),

and A_q is block diagonal, built from I_{L_{q-1}} and Ω_q. Note that with this factorization, the DFT y = F_n·x can be computed as follows:

    y = x
    for q = 1:t
        y = A_q·(Γ_q·y)
    end

Fill in the details associated with the y updates and show that a careful implementation requires 5n·log2(n) flops.

P1.4.4 What fraction of the components of W_n are zero?

P1.4.5 Using (1.4.13), verify by induction that if n = 2^t, then the Haar transform matrix W_n has the factorization W_n = H_t···H_1 where

    H_q = [ P_{2,L_{q-1}}    0          ] [ W_2 ⊗ I_{L_{q-1}}    0          ]
          [ 0                I_{n-L_q}  ] [ 0                    I_{n-L_q}  ].

Thus, the computation of y = W_n·x may proceed as follows:
    y = x
    for q = 1:t
        y = H_q·y
    end

Fill in the details associated with the update y = H_q·y and confirm that W_n·x costs 2n flops.

P1.4.6 Using (1.4.13), develop an O(n) procedure for solving W_n·y = x where x ∈ R^n is given and n = 2^t.

Notes and References for §1.4

In Van Loan (FFT) the FFT family of algorithms is described in the language of matrix factorizations. A discussion of various fast trigonometric transforms is also included. See also:

W.L. Briggs and V.E. Henson (1995). The DFT: An Owner's Manual for the Discrete Fourier Transform, SIAM Publications, Philadelphia, PA.

The design of a high-performance FFT is a nontrivial task. An important development in this regard is a software tool known as "the fastest Fourier transform in the west":

M. Frigo and S.G. Johnson (2005). "The Design and Implementation of FFTW3," Proceedings of the IEEE 93, 216-231.

It automates the search for the "right" FFT given the underlying computer architecture. FFT references that feature interesting factorization and approximation ideas include:

A. Edelman, P. McCorquodale, and S. Toledo (1998). "The Future Fast Fourier Transform?," SIAM J. Sci. Comput. 20, 1094-1114.
A. Dutt and V. Rokhlin (1993). "Fast Fourier Transforms for Nonequally Spaced Data," SIAM J. Sci. Comput. 14, 1368-1393.
A.F. Ware (1998). "Fast Approximate Fourier Transforms for Irregularly Spaced Data," SIAM Review 40, 838-856.
N. Nguyen and Q.H. Liu (1999). "The Regular Fourier Matrices and Nonuniform Fast Fourier Transforms," SIAM J. Sci. Comput. 21, 283-293.
A. Nieslony and G. Steidl (2003). "Approximate Factorizations of Fourier Matrices with Nonequispaced Knots," Lin. Alg. Applic. 366, 337-351.
L. Greengard and J.-Y. Lee (2004). "Accelerating the Nonuniform Fast Fourier Transform," SIAM Review 46, 443-454.
K. Ahlander and H. Munthe-Kaas (2005). "Applications of the Generalized Fourier Transform in Numerical Linear Algebra," BIT 45, 819-850.

The fast multipole method and the fast Gauss transform represent another type of fast transform that is based on a combination of clever blocking and approximation:

L. Greengard and V. Rokhlin (1987). "A Fast Algorithm for Particle Simulation," J. Comput. Phys. 73, 325-348.
X. Sun and N.P. Pitsianis (2001). "A Matrix Version of the Fast Multipole Method," SIAM Review 43, 289-300.
L. Greengard and J. Strain (1991). "The Fast Gauss Transform," SIAM J. Sci. Stat. Comput. 12, 79-94.
M. Spivak, S.K. Veerapaneni, and L. Greengard (2010). "The Fast Generalized Gauss Transform," SIAM J. Sci. Comput. 32, 3092-3107.
X. Sun and Y. Bao (2003). "A Kronecker Product Representation of the Fast Gauss Transform," SIAM J. Matrix Anal. Applic. 24, 768-786.

The Haar transform is a simple example of a wavelet transform. The wavelet idea has had a profound impact throughout computational science and engineering. In many applications, wavelet basis functions work better than the sines and cosines that underlie the DFT. Excellent monographs on this subject include:

I. Daubechies (1992). Ten Lectures on Wavelets, SIAM Publications, Philadelphia, PA.
G. Strang (1993). "Wavelet Transforms Versus Fourier Transforms," Bull. AMS 28, 288-305.
G. Strang and T. Nguyen (1996). Wavelets and Filter Banks, Wellesley-Cambridge Press.
1.5 Vectorization and Locality

When it comes to designing a high-performance matrix computation, it is not enough simply to minimize flops. Attention must be paid to how the arithmetic units interact with the underlying memory system. Data structures are an important part of the picture because not all matrix layouts are "architecture friendly." Our aim is to build a practical appreciation for these issues by presenting various simplified models of execution. These models are qualitative and are just informative pointers to complex implementation issues.

1.5.1 Vector Processing

An individual floating point operation typically requires several cycles to complete. A 3-cycle addition is depicted in Figure 1.5.1. The input scalars x and y proceed along a computational "assembly line," spending one cycle at each of three work "stations." The sum z emerges after three cycles.

    Figure 1.5.1. A 3-cycle adder (x and y pass through the stages Adjust Exponents, Add, and Normalize to produce z).

Note that, during the execution of a single, "free standing" addition, only one of the three stations would be active at any particular instant.

Vector processors exploit the fact that a vector operation is a very regular sequence of scalar operations. The key idea is pipelining, which we illustrate using the vector addition computation z = x + y. With pipelining, the x and y vectors are streamed through the addition unit. Once the pipeline is filled and steady state is reached, a z-vector component is produced every cycle, as shown in Figure 1.5.2.

    Figure 1.5.2. Pipelined addition (successive components of x and y stream through the Adjust Exponents, Add, and Normalize stages).

In this case, we would anticipate vector processing to proceed at about three times the rate of scalar processing.

A vector processor comes with a repertoire of vector instructions, such as vector add, vector multiply, vector scale, dot product, and saxpy. These operations take place in vector registers with input and output handled by vector load and vector store instructions. An important attribute of a vector processor is the length vL of the vector registers that carry out the vector operations. A length-n vector operation must be broken down into subvector operations of length vL or less. Here is how such a partitioning might be managed for a vector addition z = x + y where x and y are n-vectors:
    first = 1
    while first ≤ n
        last = min{n, first + vL - 1}
        Vector load:  r1 ← x(first:last)
        Vector load:  r2 ← y(first:last)
        Vector add:   r1 = r1 + r2
        Vector store: z(first:last) ← r1
        first = last + 1
    end                                                        (1.5.1)

The vector addition is a register-register operation while the "flopless" movement of data to and from the vector registers is identified with the left arrow "←".

Let us model the number of cycles required to carry out the various steps in (1.5.1). For clarity, assume that n is very large and an integral multiple of vL, thereby making it safe to ignore the final cleanup pass through the loop. Regarding the vectorized addition r1 = r1 + r2, assume that it takes τ_add cycles to fill the pipeline and that once this happens, a component of z is produced each cycle. It follows that

    N_arith = (n/vL)·(τ_add + vL) = (τ_add/vL + 1)·n

accounts for the total number of cycles that (1.5.1) requires for arithmetic. For the vector loads and stores, assume that τ_data + vL cycles are required to transport a length-vL vector from memory to a register or from a register to memory, where τ_data is the number of cycles required to fill the data pipeline. With these assumptions we see that

    N_data = 3·(n/vL)·(τ_data + vL) = (3τ_data/vL + 3)·n

specifies the number of cycles that are required by (1.5.1) to get data to and from the registers. The arithmetic-to-data-motion ratio N_arith/N_data and the total cycle sum

    N_arith + N_data = ((τ_add + 3τ_data)/vL + 4)·n

are illuminating statistics, but they are not necessarily good predictors of performance. In practice, vector loads, stores, and arithmetic are "overlapped" through the chaining together of various pipelines, a feature that is not captured by our model. Nevertheless, our simple analysis is a preliminary reminder that data motion is an important factor when reasoning about performance.
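The cycle counts in this model are easy to tabulate. The following Python sketch (ours; the parameter values are invented purely for illustration) evaluates N_arith and N_data for the partitioned vector addition (1.5.1).

    def vector_add_cycles(n, vL, tau_add, tau_data):
        # Model of (1.5.1): n/vL passes; each pass does 2 loads, 1 add, 1 store.
        passes = n // vL                        # assume vL divides n
        N_arith = passes * (tau_add + vL)       # (tau_add/vL + 1) * n
        N_data = passes * 3 * (tau_data + vL)   # (3*tau_data/vL + 3) * n
        return N_arith, N_data

    # Hypothetical parameters: vL = 64, 4-cycle add pipeline, 20-cycle data pipeline.
    N_arith, N_data = vector_add_cycles(n=2**20, vL=64, tau_add=4, tau_data=20)
    print(N_arith, N_data, N_data / N_arith)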
1.5.2 Gaxpy versus Outer Product

Two algorithms that involve the same number of flops can have substantially different data motion properties. Consider the n-by-n gaxpy

    y = y + Ax

and the n-by-n outer product update

    A = A + yx^T.

Both of these level-2 operations involve 2n^2 flops. However, if we assume (for clarity) that n = vL, then we see that the gaxpy computation

    r_x ← x
    r_y ← y
    for j = 1:n
        r_a ← A(:,j)
        r_y = r_y + r_a·r_x(j)
    end
    y ← r_y

requires (3 + n) load/store operations, while for the outer product update

    r_x ← x
    r_y ← y
    for j = 1:n
        r_a ← A(:,j)
        r_a = r_a + r_y·r_x(j)
        A(:,j) ← r_a
    end

the corresponding count is (2 + 2n). Thus, the data motion overhead for the outer product update is worse by a factor of 2, a reality that could be a factor in the design of a high-performance matrix computation.

1.5.3 The Relevance of Stride

The time it takes to load a vector into a vector register may depend greatly on how the vector is laid out in memory, a detail that we did not consider in §1.5.1. Two concepts help frame the issue. A vector is said to have unit stride if its components are contiguous in memory. A matrix is said to be stored in column-major order if its columns have unit stride.

Let us consider the matrix multiplication update calculation

    C = C + AB

where it is assumed that the matrices C ∈ R^{m×n}, A ∈ R^{m×r}, and B ∈ R^{r×n} are stored in column-major order. Suppose the loading of a unit-stride vector proceeds much more quickly than the loading of a non-unit-stride vector. If so, then the implementation
    for j = 1:n
        for k = 1:r
            C(:,j) = C(:,j) + A(:,k)·B(k,j)
        end
    end

which accesses C, A, and B by column would be preferred to

    for i = 1:m
        for j = 1:n
            C(i,j) = C(i,j) + A(i,:)·B(:,j)
        end
    end

which accesses C and A by row. While this example points to the possible importance of stride, it is important to keep in mind that the penalty for non-unit-stride access varies from system to system and may depend upon the value of the stride itself.

1.5.4 Blocking for Data Reuse

Matrices reside in memory, but memory has levels. A typical arrangement is depicted in Figure 1.5.3.

    Figure 1.5.3. A memory hierarchy (functional units at the top, then cache, main memory, and disk).

The cache, which is a small high-speed memory situated between the functional units and main memory, plays a particularly critical role. During a matrix computation, matrix elements move up and down the memory hierarchy. The overall design of the hierarchy varies from system to system. However, two maxims always apply:

• Each level in the hierarchy has a limited capacity and for economic reasons this capacity usually becomes smaller as we ascend the hierarchy.

• There is a cost, sometimes relatively great, associated with the moving of data between two levels in the hierarchy.

The efficient implementation of a matrix algorithm requires an ability to reason about the flow of data between the various levels of storage.
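The next few paragraphs quantify how to pick block sizes for the update C = C + AB. As a concrete preview of the kind of blocked computation they analyze (the loop structure shown in (1.5.4) below), here is a Python/NumPy sketch (ours, not from the text); the block size nb is a stand-in for the (n/q)-type block dimensions derived below.

    import numpy as np

    def blocked_update(C, A, B, nb):
        # C = C + A*B processed block by block so that one block of C, A, and B
        # is "resident" at a time (in a real code, resident means in cache).
        n = C.shape[0]
        for i in range(0, n, nb):
            for j in range(0, n, nb):
                Cij = C[i:i+nb, j:j+nb].copy()       # load C-block
                for k in range(0, n, nb):
                    Aik = A[i:i+nb, k:k+nb]          # load A-block
                    Bkj = B[k:k+nb, j:j+nb]          # load B-block
                    Cij += Aik @ Bkj
                C[i:i+nb, j:j+nb] = Cij              # store C-block
        return C

    n, nb = 8, 4
    A, B = np.random.randn(n, n), np.random.randn(n, n)
    C = np.random.randn(n, n)
    print(np.allclose(blocked_update(C.copy(), A, B, nb), C + A @ B))   # expect True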
To develop an appreciation for cache utilization we again consider the update C = C + AB where each matrix is n-by-n and blocked as follows:

    C = [ C_11 ... C_1r ]      A = [ A_11 ... A_1p ]      B = [ B_11 ... B_1r ]
        [  :         :  ]          [  :         :  ]          [  :         :  ]
        [ C_q1 ... C_qr ]          [ A_q1 ... A_qp ]          [ B_p1 ... B_pr ]

Assume that these three matrices reside in main memory and that we plan to update C block by block:

    C_ij = C_ij + Σ_{k=1}^{p} A_ik·B_kj.

The data in the blocks must be brought up to the functional units via the cache, which we assume is large enough to hold a C-block, an A-block, and a B-block. This enables us to structure the computation as follows:

    for i = 1:q
        for j = 1:r
            Load C_ij from main memory into cache
            for k = 1:p
                Load A_ik from main memory into cache
                Load B_kj from main memory into cache
                C_ij = C_ij + A_ik·B_kj
            end
            Store C_ij in main memory
        end
    end                                                        (1.5.4)

The question before us is how to choose the blocking parameters q, r, and p so as to minimize memory traffic to and from the cache. Assume that the cache can hold M floating point numbers and that M ≪ 3n^2, thereby forcing us to block the computation. We assume that

    C_ij  is roughly (n/q)-by-(n/r),
    A_ik  is roughly (n/q)-by-(n/p),
    B_kj  is roughly (n/p)-by-(n/r).

We say "roughly" because if q, r, or p does not divide n, then the blocks are not quite uniformly sized (picture a 10-by-10 matrix A partitioned with q = 3 and p = 4: some blocks are 4-by-3 and others are 3-by-2).
However, nothing is lost in glossing over this detail since our aim is simply to develop an intuition about cache utilization for large-n problems. Thus, we are led to impose the following constraint on the blocking parameters:

    (n/q)(n/r) + (n/q)(n/p) + (n/p)(n/r)  ≤  M.        (1.5.5)

Proceeding with the optimization, it is reasonable to maximize the amount of arithmetic associated with the update C_ij = C_ij + A_ik·B_kj. After all, we have moved matrix data from main memory to cache and should make the most of the investment. This leads to the problem of maximizing 2n^3/(qrp) subject to the constraint (1.5.5). A straightforward Lagrange multiplier argument leads us to conclude that

    q_opt = r_opt = p_opt ≈ sqrt(3n^2/M).        (1.5.6)

That is, each block of C, A, and B should be approximately square and occupy about one-third of the cache.

Because blocking affects the amount of memory traffic in a matrix computation, it is of paramount importance when designing a high-performance implementation. In practice, things are never as simple as in our model example. The optimal choice of q_opt, r_opt, and p_opt will also depend upon transfer rates between memory levels and upon all the other architecture factors mentioned earlier in this section. Data structures are also important; storing a matrix by block rather than in column-major order could enhance performance.

Problems

P1.5.1 Suppose A ∈ R^{n×n} is tridiagonal and that the elements along its subdiagonal, diagonal, and superdiagonal are stored in vectors e(1:n-1), d(1:n), and f(2:n). Give a vectorized implementation of the n-by-n gaxpy y = y + Ax. Hint: Make use of the vector multiplication operation.

P1.5.2 Give an algorithm for computing C = C + A^T·B·A where A and B are n-by-n and B is symmetric. Innermost loops should oversee unit-stride vector operations.

P1.5.3 Suppose A ∈ R^{m×n} is stored in column-major order and that m = m1·M and n = n1·N. Regard A as an M-by-N block matrix with m1-by-n1 blocks. Give an algorithm for storing A in a vector A.block(1:mn) with the property that each block A_ij is stored contiguously in column-major order.

Notes and References for §1.5

References that address vector computation include:

J.J. Dongarra, F.G. Gustavson, and A. Karp (1984). "Implementing Linear Algebra Algorithms for Dense Matrices on a Vector Pipeline Machine," SIAM Review 26, 91-112.
B.L. Buzbee (1986). "A Strategy for Vectorization," Parallel Comput. 3, 187-192.
K. Gallivan, W. Jalby, U. Meier, and A.H. Sameh (1988). "Impact of Hierarchical Memory Systems on Linear Algebra Algorithm Design," Int. J. Supercomput. Applic. 2, 12-48.
J.J. Dongarra and D. Walker (1995). "Software Libraries for Linear Algebra Computations on High Performance Computers," SIAM Review 37, 151-180.

One way to realize high performance in a matrix computation is to design algorithms that are rich in matrix multiplication and then implement those algorithms using an optimized level-3 BLAS library. For details on this philosophy and its effectiveness, see:
  • 73. 1.6. Parallel Matrix Multiplication 49 B. Kagstrom, P. Ling, and C. Van Loan (1998). "GEMM-based Level-3 BLAS: High-Performance Model Implementations and Performance Evaluation Benchmark," ACM '.lrans. Math. Softw. 24, 268-302. M.J. Dayde and LS. Duff (1999). "The RISC BLAS: A Blocked Implementation of Level 3 BLAS for RISC Processors," ACM '.lrans. Math. Softw. 25, 316-340. E. Elmroth, F. Gustavson, I. Jonsson, and B. Kagstrom (2004). "Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Library Software,'' SIAM Review 46, 3-45. K. Goto and R. Van De Geign (2008). "Anatomy of High-Performance Matrix Multiplication,'' ACM '.lrans. Math. Softw. 34, 12:1-12:25. Advanced data structures that support high performance matrix computations are discussed in: F.G. Gustavson (1997). "Recursion Leads to Automatic Variable Blocking for Dense Linear Algebra Algorithms," IBM J. Res. Dev. 41, 737-755. V. Valsalam and A. Skjellum (2002). "A Framework for High-Performance Matrix Multiplication Based on Hierarchical Abstractions, Algorithms, and Optimized Low-Level Kernels," Concurrency Comput. Pract. E:r:per. 14, 805-839. S.R. Chatterjee, P. Patnala, and M. Thottethodi (2002). "Recursive Array Layouts and Fast Matrix Multiplication," IEEE '.lrans. Parallel. Distrib. Syst. 13, 1105-1123. F.G. Gustavson (2003). "High-Performance Linear Algebra Algorithms Using Generalized Data Struc­ tures for Matrices,'' IBM J. Res. Dev. 47, 31-54. N. Park, B. Hong, and V.K. Prasanna (2003). "Tiling, Block Data Layout, and Memory Hierarchy Performance," IEEE '.lrans. Parallel Distrib. Systems, 14, 640 -654. J.A. Gunnels, F.G. Gustavson, G.M. Henry, and R.A. van de Geijn (2005). "A Family of High­ Performance Matrix Multiplication Algorithms," PARA 2004, LNCS 3732, 256-265. P. D'Alberto and A. Nicolau (2009). "Adaptive Winograd's Matrix Multiplications," A CM '.lrans. Math. Softw. 36, 3:1-3:23. A great deal of effort has gone into the design of software tools that automatically block a matrix computation for high performance, e.g., S. Carr and R.B. Lehoucq (1997) "Compiler Blockability of Dense Matrix Factorizations," A CM '.lrans. Math. Softw. 23, 336· ·361. J.A. Gunnels, F. G. Gustavson, G.M. Henry, and R. A. van de Geijn (2001). "FLAME: Formal Linear Algebra Methods Environment," ACM '.lrans. Math. Softw. 27, 422-455. P. Bientinesi, J.A. Gunnels, M.E. Myers, E. Quintana-Orti, and R.A. van de Geijn (2005). "The Science of Deriving Dense Linear Algebra Algorithms," ACM '.lrans. Math. Softw. 31, 1-26. J. Demmel, J. Dongarra, V. Eijkhout, E. Fuentes, A. Petitet, R. Vuduc, R.C. Whaley, and K. Yelick (2005). "Self-Adapting Linear Algebra Algorithms and Software," , Proc. IEEE 93, 293-312. K. Yotov, X.Li, G. Ren, M. Garzaran, D. Padua, K. Pingali, and P. Stodghill (2005). "Is Search Really Necessary to Generate High-Performance BLAS?," Proc. IEEE 93, 358-386. For a rigorous treatment of communication lower bounds in matrix computations, see: G. Ballard, J. Demmel, 0. Holtz, and 0. Schwartz (2011). "Minimizing Communication in Numerical Linear Algebra," SIAM J. Matrix Anal. Applic. 32, 866-901. 1.6 Parallel Matrix Multiplication The impact ofmatrix computation research in many application areas depends upon the development of parallel algorithms that scale. Algorithms that scale have the property that they remain effective as problem size grows and the number of involved processors increases. 
Although powerful new programming languages and related system tools continue to simplify the process of implementing a parallel matrix computation, being able to "think parallel" is still important. This requires having an intuition about load balancing, communication overhead, and processor synchronization.
  • 74. 50 Chapter 1. Matrix Multiplication 1.6.1 A Model Computation To illustrate the major ideas associated with parallel matrix computations, we consider the following model computation: Given C E Rnxn, A E 1R mxr, and B E m;xn, effectively compute the matrix multiplication update C = C + AB assuming the availability of p processors. Each processor has its own local memory and executes its own local program. The matrix multiplication update problem is a good choice because it is a.ii inherently parallel computation and because it is at the heart of many important algorithms that we develop in later chapters. The design of a parallel procedure begins with the breaking up of the given problem into smaller parts that exhibit a measure of independence. In our problem we assume the blocking [Cu C = : · . C�11 · · : (1.6.1) m = m1 M, r = r1 R, with Cij E 1R m1 xni , Aij E 1R m1 x1•1 , and Bij E IR1"1 xni. It follows that the C + AB update partitions nicely into MN smaller tasks: n Task(i, j): Cij = Cij + L AikBkj . (1 .6.2) k=l Note that the block-block products AiJ,,Bkj arc all the same size. Because the tasks are naturally double-indexed, we double index the available processors as well. Assume that p = ProwPcol and designate the (i, j)th processor by Proc(i, j) for i = l:Prow and j = l:Pcol· The double indexing of the processors is just a notation and is not a statement about their physical connectivity. 1.6.2 Load Balancing An effective parallel program equitably partitions the work among the participating processors. Two subdivision strategies for the model computation come to mind. The 2-dimensional block distribution assigns contiguous block updates to each processor. See Figure 1.6.1. Alternatively, we can have Proc(µ, T) oversee the update of Cii for i = µ: Prow :M and j = T: Pcol :N. This is called the 2-dimensional block-cyclic distribution. See Figure 1.6.2. For the displayed exan1ple, both strategics assign twelve Cii updates to each processor and each update involves R block-block multiplications, i.e., 12(2m1n1r1 ) fl.ops. Thus, from the fl.op point of view, both strategies are load balanced, by which we mean that the amount of arithmetic computation assigned to each processor is roughly the same.
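Equation (1.6.2) is just a regrouping of the ordinary blocked product. The following Python/NumPy sketch (ours, not from the text) forms the blocking (1.6.1) with uniform block sizes, executes every Task(i,j), and confirms that the result matches C + AB.

    import numpy as np

    def block(X, nrows, ncols):
        # Partition X into an (nrows x ncols) grid of equally sized blocks.
        return [np.hsplit(r, ncols) for r in np.vsplit(X, nrows)]

    M, N, R = 4, 3, 2
    m1, n1, r1 = 5, 6, 7                       # block dimensions (our choices)
    A = np.random.randn(M * m1, R * r1)
    B = np.random.randn(R * r1, N * n1)
    C = np.random.randn(M * m1, N * n1)

    Ab, Bb, Cb = block(A, M, R), block(B, R, N), block(C.copy(), M, N)
    for i in range(M):                         # Task(i, j) for every block of C
        for j in range(N):
            for k in range(R):
                Cb[i][j] = Cb[i][j] + Ab[i][k] @ Bb[k][j]

    Cnew = np.vstack([np.hstack(row) for row in Cb])
    print(np.allclose(Cnew, C + A @ B))        # expect True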
  • 75. 1.6. Parallel Matrix Multiplication { { { { Proc{l,1) Proc{l,2) Proc{l,3) C1 1 C12 C13 } { C14 C15 c16 } { C11 C1s C19 C21 C22 C23 C24 C2s C25 C21 C2s C29 C31 C32 C33 C34 C3s c36 C31 C3s C39 C41 C42 C43 C44 C4s c46 C41 C4s C49 Proc{2,1) Proc{2,2) Proc{2,3) Cs1 Cs2 Cs3 } { Cs4 C5s Cs6 } { Cs1 Css Csg C61 C62 C53 c64 c6s c66 c61 C6s c69 C11 C12 C73 C14 C1s C16 C11 C1s C1g Cs1 Cs2 Cs3 Cs4 Css Cs6 Cs1 Css Csg Figure 1.6.1. The block distribution of tasks (M = 8, Prow = 2, N = 9, and Pcol = 3). Proc{l,1) Proc{l,2) Proc{l,3) C11 C14 C11 } { C12 C15 C1s } { C13 Crn C19 C31 C34 C31 C32 C35 C3s C33 c36 C39 C51 Cs4 Cs1 Cs2 Css Css Cs3 Cs6 Csg C11 C14 C11 C12 C1s C1s C73 C16 C1g Proc{2,1) Proc{2,2) Proc{2,3) C21 C24 C21 } { C22 C2s C2s } { C23 c26 C29 C41 C44 C41 C12 C4s C4s C43 c46 C49 C61 CM c61 C62 c6s c6s c63 c66 c69 Cs1 Cs1 Cs1 Cs2 Css Cs11 Cs3 Cs6 Csg Figure 1.6.2. The block-cyclic distribution of tasks (Al = 8, Prow = 2, N = 9, and Pcol = 3). 51 } } } }
  • 76. 52 Chapter 1. Matrix Multiplication If M is not a multiple ofProw or if N is not a multiple ofPeat . then the distribution of work among processors is no longer balanced. Indeed, if M = 0:1Prow + /31, N = 0:2Peol + /32' 0 � /31 < Prow • 0 � /32 < Peat . then the number of block-block multiplications per processor can range from 0:10:2R to (o:1 + 1)(0:2 + l)R. However, this variation is insignificant in a large-scale computation with M » Prow and N » Peal : We conclude that both the block distribution and the block-cyclic distribution strate­ gies are load balanced for the general C + AB update. This is not the case for certain block-sparse situations that arise in practice. If A is block lower triangular and B is block upper triangular, then the amount of work associated with Task(i, j) depends upon i and j. Indeed from (1.6.2) we have min{i,j,R} Cij = Cij + L AikBkj· k=l A very uneven allocation of work for the block distribution can result because the number of flops associated with Task(i,j) increases with i and j. The tasks assigned to Proc(Prow , Peoi ) involve the most work while the tasks assigned to Proc(l,1) involve the least. To illustrate the ratio of workloads, set M = N = R = NI and assume that Prow = Peal = p divides M. It can be shown that Flops assigned to Proc(jj, p) = O(p) Flops assigned to Proc(l, 1) (1.6.3) if we assume M/p » 1. Thus, load balancing does not depend on problem size and gets worse as the number of processors increase. This is not the case for the block-cyclic distribution. Again, Proc(l,1) and Proc(jj,jj) are the least busy and most busy processors. However, now it can be verified that Flops assigned to Proc(jj, p) _ 1 0 (p ) Flops assigned to Proc(l, 1) - + M ' (1.6.4) showing that the allocation of work becomes increasingly balanced as the problem size grows. Another situation where the block-cyclic distribution of tasks is preferred is the case when the first q block rows of A are zero and the first q block columns of B are zero. This situation arises in several important matrix factorization schemes. Note from Figure 1.6.1 that if q is large enough, then some processors have absolutely nothing to do if tasks are assigned according to the block distribution. On the other hand, the block-cyclic distribution is load balanced, providing further justification for this method of task distribution.
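The flop imbalance behind (1.6.3) and (1.6.4) is easy to tabulate. The Python sketch below (ours; the sizes are arbitrary) counts, for each processor, the number of block-block products Σ min{i,j,R} over its assigned tasks under the two distributions and prints the busiest-to-least-busy ratio.

    def work(i, j, R):
        # Block-block products in Task(i, j) when A is block lower triangular
        # and B is block upper triangular: min{i, j, R}.
        return min(i, j, R)

    def imbalance(M, N, R, prow, pcol, cyclic):
        loads = {}
        for mu in range(1, prow + 1):
            for tau in range(1, pcol + 1):
                if cyclic:   # block-cyclic: i = mu:prow:M, j = tau:pcol:N
                    tasks = [(i, j) for i in range(mu, M + 1, prow)
                                    for j in range(tau, N + 1, pcol)]
                else:        # block distribution: contiguous index ranges
                    tasks = [(i, j)
                             for i in range((mu - 1) * (M // prow) + 1, mu * (M // prow) + 1)
                             for j in range((tau - 1) * (N // pcol) + 1, tau * (N // pcol) + 1)]
                loads[(mu, tau)] = sum(work(i, j, R) for i, j in tasks)
        return max(loads.values()) / min(loads.values())

    M = N = R = 64
    print(imbalance(M, N, R, 4, 4, cyclic=False))   # block: noticeably imbalanced
    print(imbalance(M, N, R, 4, 4, cyclic=True))    # block-cyclic: ratio near 1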
  • 77. 1.6. Parallel Matrix Multiplication 53 1.6.3 Data Motion Overheads So far the discussion has focused on load balancing from the flop point of view. We now turn our attention to the costs associated with data motion and processor coordination. How does a processor get hold of the data it needs for an assigned task? How does a processor know enough to wait if the data it needs is the output of a computation being performed by another processor? What are the overheads associated with data transfer and synchronization and how do they compare to the costs of the actual arithmetic? The importance of data locality is discussed in §1.5. However, in a parallel com­ puting environment, the data that a processor needs can be "far away," and if that is the case too often, then it is possible to lose the multiprocessor advantage. Regarding synchronization, time spent waiting for another processor to finish a calculation is time lost. Thus, the design of an effective parallel computation involves paying attention to the number of synchronization points and their impact. Altogether, this makes it difficult to model performance, especially since an individual processor can typically compute and communicate at the same time. Nevertheless, we forge ahead with our analysis of the model computation to dramatize the cost of data motion relative to flops. For the remainder of this section we assume: (a) The block-cyclic distribution of tasks is used to ensure that arithmetic is load balanced. (b) Individual processors can perform the computation Cii = Cii + AikBki at a rate of F flops per second. Typically, a processor will have its own local memory hierarchy and vector processing capability, so F is an attempt to capture in a single number all the performance issues that we discussed in §1.5. (c) The time required to move 'f'/ floating point numbers into or out of a processor is a + f3TJ. In this model, the parameters a and /3 respectively capture the latency and bandwidth attributes associated with data transfer. With these simplifications we can roughly assess the effectiveness of assigning p pro­ cessors to the update computation C = C + AB. Let Tarith(p) be the time that each processor must spend doing arithmetic as it carries out its share of the computation. It follows from assumptions (a) and (b) that 2mnr Tarith (p) � pF · (1.6.5) Similarly, let Tdata(P) be the time that each processor must spend acquiring the data it needs to perform its tasks. Ordinarily, this quantity would vary significantly from processor to processor. However, the implementation strategies outlined below have the property that the communication overheads are roughly the same for each processor. It follows that if Tarith (P) + Tdata(P) approximates the total execution time for the p-.processor solution, then the quotient S(p) = Tarith{l) = Tarith (p) + Tdata(p) p l + Tdata(P) Tarith {p) (1.6.6)
  • 78. 54 Chapter 1. Matrix Multiplication is a reasonable measure of speedup. Ideally, the assignment of p processors to the C = C + AB update would reduce the single-processor execution time by a factor of p. However, from {l.6.6) we see that S(p) < p with the compute-to-communicate ratio Tdata{p)/Tarith(P) explaining the degradation. To acquire an intuition about this all-important quotient, we need to examine more carefully the data transfer properties associated with each task. 1 .6.4 Who Needs What If a processor carries out Task{i, j), then at some time during the calculation, blocks cij, Ail, . . . ' AiR• B13, . . . ' BRj must find their way into its local memory. Given as­ sumptions (a) and (c), Table 1.6.1 summarizes the associated data transfer overheads for an individual processor: Required Blocks Data Transfer Time per Block Ci3 i = µ:prow:M j = 'T:Pco1:N a + f3m1n1 Aii i = µ:pr0w:N/ j = l:R a + f3m1r1 Bii i = l:R j = 'T:Pco1:N a + f3r1n1 TABLE 1. 6. 1 . Communication overheads for Proc(µ, r) It follows that if then 'Ye = total number of required C-block transfers, 'YA = total number of required A-block transfers, 'Yo = total number of required B-block transfers, Tdata(P) ::::::: 'Ye(a + f:Jm1n1) + 'YA(a + f3m1r1) + 'YB(a + f3r1n1), and so from from {1.6.5) we have Tdata(P) ::::::: Fp (a'Ye + 'YA + 'Yo a ( 'Ye 'YA 'YB )) T. ( ) 2 + JJ MNr + MnR + mNR . arith P mnr {l.6.7) {1.6.8) {l.6.9) {l.6.10) To proceed further with our analysis, we need to estimate the 7-factors {l.6.7)-(1.6.9), and that requires assumptions about how the underlying architecture stores a.nd ac­ cesses the matrices A, B, and C. 1.6.5 The Shared-Memory Paradigm In a shared-memory system each processor ha.i;; access to a common, global memory. See Figure 1.6.3. During program execution, data flows to and from the global memory and this represents a significant overhead that we proceed to assess. Assume that the matrices C, A, and B are in global memory at the start and that Proc(µ, r) executes the following:
  • 79. 1.6. Parallel Matrix Multiplication Global Memory Figure 1.6.3. A four-processor shared-memory system for i = µ:Prow:A/ end for j = T:Pco1 :N C(loc) +- C;j for k = l :R end A(loc) +- Aik B(loc) +- B1.;j c<JocJ = c<toc) + A(toe)n<tocJ Cij +- c<toc) end 55 ( Method 1) As a reminder ofthe interactions between global and local memory, we use the "+-" no­ tation to indicate data transfers between these memory levels and the "loc" superscript to designate matrices in local memory. The block transfer statistics (1.6.7)-(1.6.9) for Method 1 are given by and so from (1.6.10) we obtain 'Ye � 2(MN/p), 'YA � R(!l'!N/p), 'Yn � R(MN/p), (1.6.11) By substituting this result into (1.6.6) we conclude that (a) speed-up degrades as the flop rate F increases and (b) speedup improves if the communication parameters a and /3 decrease or the block dimensions m1, n1 , and r1 increase. Note that the communicate­ to-compute ratio (1.6.11) for Method 1 does not depend upon the number of processors.
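The counts behind this observation can be scripted directly from the model (1.6.5)-(1.6.10) together with the Method 1 transfer statistics γ_C ≈ 2MN/p and γ_A ≈ γ_B ≈ RMN/p. The Python sketch below (ours; all parameter values are invented) evaluates T_data/T_arith for several processor counts to illustrate that the ratio is essentially independent of p.

    def method1_ratio(m, n, r, M, N, R, p, F, alpha, beta):
        # Model: T_arith = 2*m*n*r/(p*F); block sizes m1 = m/M, n1 = n/N, r1 = r/R.
        m1, n1, r1 = m // M, n // N, r // R
        gamma_C = 2 * M * N / p
        gamma_A = R * M * N / p
        gamma_B = R * M * N / p
        T_arith = 2 * m * n * r / (p * F)
        T_data = (gamma_C * (alpha + beta * m1 * n1)
                  + gamma_A * (alpha + beta * m1 * r1)
                  + gamma_B * (alpha + beta * r1 * n1))
        return T_data / T_arith

    for p in (4, 16, 64):
        print(p, method1_ratio(m=4096, n=4096, r=4096, M=64, N=64, R=64,
                               p=p, F=1e9, alpha=1e-6, beta=1e-9))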
  • 80. 56 Chapter 1. Matrix Multiplication Method 1 has the property that it is only necessary to store one C-block, one A­ block, and one B-block in local memory at any particular instant, i.e., C(loc), A(loc), and B(loc). Typically, a processor's local memory is much smaller than global memory, so this particular solution approach is attractive for problems that are very large relative to local memory capacity. However, there is a hidden cost associated with this economy because in Method 1, each A-block is loaded N/Pcol times and each B-block is loaded M/Prow times. This redundancy can be eliminated if each processor's local memory is large enough to house simultaneously all the C-blocks, A-blocks, and B-blocks that are required by its assigned tasks. Should this be the case, then the following method involves much less data transfer: for k = l:R end A(loc) L_ A· ik ...---- ik B(loc) L_ B . kj ...---- k3 for i = µ:prow:M for j = r:Pco1:N C(loc) +--- Cij for k = l:R (i = /L:Prow:M) (j = T:Pcol:N) C(loc) = C(loc) + A��oc)Bk�oc) end end Cij t- C(loc) end (Method 2) The block transfer statistics "{�, "{�, and "{� , for Method 2 are more favorable than for Method 1. It can be shown that I 'Ye = "fc, (1.6.12) where the quotients fcol = Pcoi/N and /row = Prow/M are typically much less than unity. As a result, the communicate-to-compute ratio for Method 2 is given by Tdata(P) ......, F ( 2 + R (fcol + /row) /3 (� 2_J. 2_ I )) ( ) ......, a + + col + Jrow ' Tarith p 2 m1n1r r ni m1 (1.6.13) which is an improvement over (1.6.11). Methods 1 and 2 showcase the trade-off that frequently exists between local memory capacity and the overheads that are associated with data transfer. 1.6.6 Barrier Synchronization The discussion in the previous section assumes that C, A, and B are available in global memory at the start. If we extend the model computation so that it includes the
  • 81. 1.6. Parallel Matrix Multiplication 57 multiprocessor initialization of these three matrices, then an interesting issue arises. How does a processor "know" when the initialization is complete and it is therefore safe to begin its share of the C = C + AB update? Answering this question is an occasion to introduce a very simple synchronization construct known as the barrier. Suppose the C-matrix is initialized in global memory by assigning to each processor some fraction of the task. For example, Proc(µ, T) could do this: for i = µ:prow:!v/ end for j = T:Pco1:N end Compute the (i, j) block of C and store in C(loc). Cij +-- C(loc) Similar approaches can be taken for the setting up of A = (Aij) and B = (Bi3). Even if this partitioning of the initialization is load balanced, it cannot be assumed that each processor completes its share of the work at exactly the same time. This is where the barrier synchronization is handy. Assume that Proc(µ, T) executes the following: Initialize Cij , i = µ : Prow : }ii [, j = T :Pcol : N Initialize Aij , i = µ : Prow : M, j = T : Pcol : R Initialize Bij , i = µ : Prow : R, j = T : Pcol : N (1.6.14) barrier Update Ci3, i = µ : prow : 1'vf, j = T : Pco1 : N To understand the barrier command, regard a processor as being either "blocked" or "free." Assume in (1.6.14) that all processors are free at the start. When it executes the barrier command, a processor becomes blocked and suspends execution. After the last processor is blocked, all the processors return to the free state and resume execution. In (1.6.14), the barrier does not allow the Ci3 updating via Methods 1 or 2 to begin until all three matrices are fully initialized in global memory. 1.6.7 The Distributed-Memory Paradigm In a distributed-memory system there is no global memory. The data is collectively housed in the local memories of the individual processors which are connected to form a network. There arc many possible network topologies. An example is displayed in Figure 1.6.4. The cost associated with sending a message from one processor to another is likely to depend upon how "close" they are in the network. For example, with the torus in Figure 1.6.4, a message from Proc(l,1) to Proc(l,4) involves just one "hop" while a message from Proc(l,1) to Proc(3,3) would involve four. Regardless, the message-passing costs in a distributed memory system have a serious impact upon performance just as the interactions with global memory affect performance in a shared memory system. Our goal is to approximate these costs as they might arise in the model computation. For simplicity, we make no assumptions about the underlying network topology.
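Returning briefly to the barrier construct of §1.6.6 before developing the distributed-memory example: the barrier has direct analogues in most threading libraries. The following Python sketch (ours, not from the text) uses threading.Barrier in the role of the barrier command in (1.6.14). Each worker initializes its share of the shared data, waits at the barrier, and only then performs a step that reads blocks owned by other workers.

    import threading
    import numpy as np

    p, n = 4, 8
    blocks = [None] * n                        # shared "global memory"
    partial = [0.0] * p
    barrier = threading.Barrier(p)

    def worker(mu):
        # Phase 1: initialize the blocks assigned to this processor (block-cyclic).
        for i in range(mu, n, p):
            blocks[i] = np.full((2, 2), float(i))
        # Phase 2 reads blocks owned by *other* workers, so wait until all are set.
        barrier.wait()
        partial[mu] = sum(blocks[i][0, 0] for i in range(n))

    threads = [threading.Thread(target=worker, args=(mu,)) for mu in range(p)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(partial)      # every entry should equal 0+1+...+7 = 28.0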
  • 82. 58 Chapter 1. Matrix Multiplication Proc(l , l ) Proc(l,4) Proc(4,l) Proc(4,4) Figure 1.6.4. A 2-Dimensional Torus Let us first assume that M = N = R = Prow = Pcol = 2 and that the C, A, and B matrices are distributed as follows: Proc(l,l} Proc(l,2} Cn, Au, Bn Proc(2,1} Proc(2,2} Assume that Proc(i, j) oversees the update of Cij and notice that the required data for this computation is not entirely local. For example, Proc(l,1) needs to receive a copy of Ai2 from Proc(l,2) and a copy of B21 from Proc(2,1) before it can complete the update C11 = C11 + AuB11 + Ai2B21· Likewise, it must send a copy of Au to Proc(l,2) and a copy of Bu to Proc(2,1) so that they can carry out their respective updates. Thus, the local programs executing on each processor involve a mix of computational steps and message-passing steps:
  • 83. 1.6. Parallel Matrix Multiplication 59 Proc(l,1) Proc(l,2) Send a copy of Au to Proc(l,2) Send a copy of A12 to Proc(l,1) Receive a copy of A12 from Proc(l,2) Receive a copy of Au from Proc(l,1) Send a copy of Bu to Proc(2,1) Send a copy of B12 to Proc(2,2) Receive a copy of B21 from Proc(2,1) Receive a copy of B22 from Proc(2,2) Cu = Cu + A1 1 B1 1 + A12B2 1 C12 = C12 + A1 1 B1 2 + A12B22 Proc(2,1) Proc(2,2) Send a copy of A21 to Proc(2,2) Send a copy of A22 to Proc(2,1) Receive a copy of A22 from Proc(2,2) Receive a copy of A21 from Proc(2,1) Send a copy of B21 to Proc(l,1) Send a copy of B22 to Proc(l,2) Receive a copy of B1 1 from Proc(l,1) Receive a copy of B12 from Proc(l,2) C21 = C2 1 + A2 1 B1 1 + A22B21 C22 = C22 + A21 B12 + A22B22 This informal specification of the local programs does a good job delineating the duties ofeach processor, but it hides several important issues that have to do with the timeline of execution. (a) Messages do not necessarily arrive at their destination in the order that they were sent. How will a receiving processor know if it is an A-block or a B­ block? (b) Receive-a-message commands can block a processor from proceeding with the rest of its calculations. As a result, it is possible for a processor to wait forever for a message that its neighbor never got around to sending. (c) Overlapping computation with communication is critical for performance. For example, after Au arrives at Proc(l,2), the "halr' update C12 = C12 + AuB12 can be carried out while the wait for B22 continues. As can be seen, distributed-memory matrix computations are quite involved and require powerful systems to manage the packaging, tagging, routing, and reception of messages. The discussion of such systems is outside the scope of this book. Neverthe­ less, it is instructive to go beyond the above 2-by-2 example and briefly anticipate the data transfer overheads ·for the general model computation. Assume that Proc(µ, r ) houses these matrices: Cii• i = µ :prow : M, j = T : Pco1 : N, Aij' i = /L =Prow : Jvf, j = T : pcol : R, Bij, i = µ :Prow : R, j = T : Pcol : N. From Table 1.6.1 we conclude that if Proc(/.t, T) is to update Cij for i = µ : Prow : M and j = r:pcol : N, then it must (a) For i = µ : prow : M and j = r : Pcol : R, send a copy of Aij to Proc(µ, 1), . . . , Proc(µ, T - 1), Proc(µ, T + 1), . . . , Proc(µ,pc01). Data transfer time � (Pcol - l)(M/Prow)(R/Pcol) (a + /3m1r1) (b) For i = µ : Prow : R and j = T : Pcol : N, send a copy of Bij to Proc(l, r), . . . , Proc(µ - 1), r), Proc(µ + 1, r ), . . . , Proc(Prow 1 r). Data transfer time � (prow - l)(R/Prow)(N/Pcol) (a + /3r1n1)
  • 84. 60 Chapter 1. Matrix Multiplication (c) Receive copies of the A-blocks that are sent by processors Proc(µ, 1), . . . , Proc(µ, r - 1), Proc(µ, r + 1), . . . , Proc(µ, Pcoi). Data transfer time � (Pcol - l)(M/Prow)(R/Pcol) (a + f3m1r1) (d) Receive copies of the E-blocks that are sent by processors Proc(l, r), . . . , Proc(µ - 1), r), Proc(µ + 1, r ), . . . , Proc(Prow, r ). Data transfer time � (Prow - 1)(R/Prow)(N/Peal) (a + f3r1ni) Let Tdata be the summation of these data transfer overheads and recall that Tarith = (2mnr)/(Fp) since arithmetic is evenly distributed around the processor network. It follows that Tdata(P) � F (a(�+ Prow ) + {3 (Pcol + Prow)). Tarith(P) m1r1n mr1n1 n m (1.6.15) Thus, as problem size grows, this ratio tends to zero and speedup approaches p accord­ ing to (1.6.6). 1.6.8 Cannon's Algorithm We close with a brief description of the Cannon (1969) matrix multiplication scheme. The method is an excellent way to showcase the toroidal network displayed in Figure 1.6.4 together with the idea of "nearest-neighbor" thinking which is quite important in distributed matrix computations. For clarity, let us assume that A = (Aij), B = (Eij), and C = (Cij) are 4-by-4 block matrices with n1-by-n1 blocks. Define the matrices [An A{ll A22 A33 A44 [A,. A(2) A21 A32 A43 [Arn A(3) A24 A31 A42 [A12 A(4) = A23 A34 A41 and note that cij Ai2 Ai3 A,. 1 [Bn E22 E33 A23 A24 A21 n<1J E21 E32 E43 A34 A31 A32 ' = E31 B42 E13 A41 A42 A43 E41 B12 E23 A11 Ai2 Arn 1 [B41 Ei2 E23 A22 A23 A24 n<2J Ell E22 E33 A33 A34 A31 ' = E21 E32 E43 A44 A41 A42 E31 E42 E13 Ai4 A11 A., 1 [B31 E42 B13 A21 A22 A23 E(3) E41 Ei2 E23 A32 A33 A34 ' = Ell E22 B33 A43 A44 A41 E21 E32 E43 A13 Ai4 An 1 [B21 E32 E43 A24 A21 A22 B(4) = E31 E42 E13 A31 A32 A33 ' E41 E12 E23 A42 A43 A44 Ell B22 E33 A(1)n<tl + A(2)E(2) + A3)E(3) •J •J •J •J •J •J + A(�)E<4l •J •J . B44 ]· Ei4 E24 E34 E34 ]· E44 B14 B24 Bu 1 B34 B44 ' B14 B,. 1 E24 E34 ' E44 (1.6.16)
  • 85. 1.6. Parallel Matrix Multiplication 61 Refer to Figure 1.6.4 and assume that Proc(i, j) is in charge of computing Cij and that at the start it houses both Ai}) and B�?. The message passing required to support the updates Cij = Cij + Ai})BU), Cij = Cij + AiJlBg), cij = cij + AWB�l , Cij = Cij + AWB�), (1.6.17) (1.6.18) (1.6.19) (1.6.20) involves communication with Proc(i, j)'s four neighbors in the toroidal network. To see this, define the block downshift permutation 1] and observe that A(k+l) = A(k)pT and B(k+l) = PB(k)_ That is, the transition from A(k) to A(k+l) involves shifting A-blocks to the right one column (with wraparound) while the transition from B(k) to B(k+l) involves shifting the B-blocks down one row (with wraparound). After each update (1.6.17)-(1.6.20), the housed A-block is passed to Proc(i, j)'s "east" neighbor and the next A-block is received from its "west" neigh­ bor. Likewise, the housed B-block is sent to its "south" neighbor and the next B-block is received from its "north" neighbor. Of course, the Cannon algorithm can be implemented on any processor network. But we see from the above that it is particularly well suited when there are toroidal connections for then communication is always between adjacent processors. Problems Pl.6.1 Justify Equations (1.6.3) and (1.6.4). Pl.6.2 Contrast the two task distribution strategics in §1.6.2 for the case when the first q block rows of A are zero and the first q block columns of B are zero. Pl.6.3 Verify Equations (1.6.13) and (1.6.15). Pl.6.4 Develop a shared memory method for overwriting A with A2 where it is assumed that A E Rn x n resides in global memory at the start. Pl.6.5 Develop a shared memory method for computing B = ATA where it is assumed that A E Rmx n resides in global memory at the start and that B is stored in global memory at the end. Pl.6.6 Prove (1.6.16) for general N. Use the block downshift matrix to define A(i) and B(il . Notes and References for §1.6 To learn more about the practical implementation of parallel matrix multiplication, see scaLAPACK as well as: L. Cannon (1969). "A Cellular Computer to Implement the Kalman Filter Algorithm," PhD Thesis, Montana State University, Bozeman, MT.
  • 86. 62 Chapter 1. Matrix Multiplication K. Gallivan, W. Jalby, and U. Meier (1987). "The Use of BLAS3 in Linear Algebra on a Parallel Processor with a Hierarchical Memory," SIAM J. Sci. Stat. Comput. 8, 1079-1084. P. Bj121rstad, F. Manne, T.S121revik, and M. Vajtersic (1992}. "Efficient Matrix Multiplication on SIMD Computers," SIAM J. Matrix Anal. Appl. 19, 386-401. S.L. Johnsson (1993). "Minimizing the Communication Time for Matrix Multiplication on Multipro­ cessors," Parallel Comput. 19, 1235--1257. K. Mathur and S.L. Johnsson (1994). "Multiplication of Matrices of Arbitrary Shape on a Data Parallel Computer," Parallel Comput. 20, 919-952. J. Choi, D.W. Walker, and J. Dongarra (1994} "Pumma: Parallel Universal Matrix Multiplication Algorithms on Distributed Memory Concurrent Computers," Concurrnncy: Pmct. Exper. 6, 543- 570. R.C. Agarwal, F.G. Gustavson, and M. Zubair (1994). "A High-Performance Matrix-Multiplication Algorithm on a Distributed-Memory Parallel Computer, Using Overlapped Communication,'' IBM J. Res. Devel. 98, 673-681. D. Irony, S. Toledo, and A. Tiskin (2004). "Communication Lower Bounds for Distributed Memory Matrix Multiplication," J. Parallel Distrib. Comput. 64, 1017-1026. Lower bounds for communication overheads are important as they establish a target for implementers, see: G. Ballard, J. Demmel, 0. Holtz, and 0. Schwartz (2011). "Minimizing Communication in Numerical Linear Algebra," SIAM. J. Matrix Anal. Applic. 92, 866-901. Matrix transpose in a distributed memory environment is surprisingly complex. The study of this central, no-flop calculation is a reminder of just how important it is to control the costs of data motion. See S.L. Johnsson and C.T. Ho (1988). "Matrix Transposition on Boolean N-cube Configured Ensemble Architectures,'' SIAM J. Matrix Anal. Applic. 9, 419-454. J. Choi, J.J. Dongarra, and D.W. Walker (1995). "Parallel Matrix Transpose Algorithms on Dis- tributed Memory Concurrent Computers,'' Parallel Comput. 21, 1387-1406. The parallel matrix computation literature is a vast, moving target. Ideas come and go with shifts in architectures. Nevertheless, it is useful to offer a small set of references that collectively trace the early development of the field: D. Heller (1978). "A Survey of Parallel Algorithms in Numerical Linear Algebra,'' SIAM Review 20, 740-777. J.M. Ortega and R.G. Voigt (1985). "Solution of Partial Differential Equations on Vector and Parallel Computers,'' SIAM Review 27, 149-240. D.P. O'Leary and G.W. Stewart (1985). "Data Flow Algorithms for Parallel Matrix Computations,'' Commun. A CM 28, 841-853. J.J. Dongarra and D.C. Sorensen (1986). "Linear Algebra on High Performance Computers,'' Appl. Math. Comput. 20, 57-88. M.T. Heath, ed. (1987}. Hypercube Multiprocessors, SIAM Publications, Philadelphia, PA. Y. Saad and M.H. Schultz (1989}. "Data Communication in Parallel Architectures,'" J. Dist. Parallel Comput. 11, 131-150. J.J. Dongarra, I. Duff, D. Sorensen, and H. van der Vorst (1990). Solving Linear Systems on Vector and Shared Memory Computers, SIAM Publications, Philadelphia, PA. K.A. Gallivan, R.J. Plemmons, and A.H. Sameh (1990). "Parallel Algorithms for Dense Linear Algebra Computations,'' SIAM Review 92, 54-135. J.W. Demmel, M.T. Heath, and H.A. van der Vorst (1993). "Parallel Numerical Linear Algebra,'' in Acta Numerica 1.999, Cambridge University Press. A. Edelman (1993). "Large Dense Numerical Linear Algebra in 1993: The Parallel Computing Influ­ ence,'' Int. J. Supercomput. Applic. 7, 113 -128.
  • 87. Chapter 2 Matrix Analysis 2.1 Basic Ideas from Linear Algebra 2.2 Vector Norms 2.3 Matrix Norms 2.4 The Singular Value Decomposition 2.5 Subspace Metrics 2.6 The Sensitivity of Square Systems 2.7 Finite Precision Matrix Computations The analysis and derivation of algorithms in the matrix computation area requires a facility with linear algebra. Some of the basics are reviewed in §2.1. Norms are particularly important, and we step through the vector and matrix cases in §2.2 and §2.3. The ubiquitous singular value decomposition is introduced in §2.4 and then used in the next section to define the CS decomposition and its ramifications for the measurement of subspace separation. In §2.6 we examine how the solution to a linear system Ax = b changes if A and b are perturbed. It is the ideal setting for introducing the concepts of problem sensitivity, backward error analysis, and condition number. These ideas are central throughout the text. To complete the chapter we develop a model of finite-precision floating point arithmetic based on the IEEE standard. Several canonical examples of roundoff error analysis are offered. Reading Notes Familiarity with matrix manipulation consistent with §1.1-§1.3 is essential. The sections within this chapter depend upon each other as follows: §2.5 t §2.1 --+ §2.2 --+ §2.3 --+ §2.4 .i §2.6 --+ §2.7 63
  • 88. 64 Chapter 2. Matrix Analysis Complementary references include Forsythe and Moler (SLAS), Stewart (IMC), Horn and Johnson (MA), Stewart (MABD), Ipsen (NMA), and Watkins (FMC). Funda­ mentals of matrix analysis that are specific to least squares problems and eigenvalue problems appear in later chapters. 2.1 Basic Ideas from Linear Algebra This section is a quick review of linear algebra. Readers who wish a more detailed coverage should consult the references at the end of the section. 2.1.1 Independence, Subspace, Basis, and Dimension A set of vectors {a1, . . . ' an} in nr is linearly independent if 2::7=1 Djaj = 0 implies a(l:n) = 0. Otherwise, a nontrivial combination of the ai is zero and {a1, . . . , an} is said to be linearly dependent. A subspace of IR.m is a subset that is also a vector space. Given a collection of vectors a1, . . . , an E IR.m, the set of all linear combinations of these vectors is a subspace referred to as the span of {a1, . . . , an}: n span{a1, . . . , an} = {L.Biai : .Bi E IR} . j=l If {a1, . . . , an} is independent and b E span{a1, . . . , an}, then b is a unique linear com­ bination of the aj. If si , . . . , sk are subspaces of IR.m' then their sum is the subspace defined by S = { a1 + a2 + · · · + ak : ai E Si, i = l:k }. S is said to be a direct sum if each v E S has a unique representation v = a1 + · · · + ak with ai E Si. In this case we write S = S1 EB · · · EB Sk. The intersection of the Si is also a subspace, S = S1 n S2 n · · · n Sk· The subset {ai1 , • • • , ah } is a maximal linearly independent subset of {a1 , . . . , an} if it is linearly independent and is not properly contained in any linearly indepen­ dent subset of {a1, . . . , an}· If {ai1 , • • • , aik} is maximal, then span{a1 , . . . , an} = span{ai, , . . . , aik} and {ail ' . . . , aik } is a basis for span{ai, . . . , an}· If S � IR.m is a subspace, then it is possible to find independent basic vectors a1, . . . , ak E S such that S = span{a1, . . . , ak}. All bases for a subspace S have the same number of clements. This number is the dimension and is denoted by dim(S). 2.1.2 Range, Null Space, and Rank There are two important subspaces associated with an m-by-n matrix A. The range of A is defined by ran(A) = {y E IR.m : y = Ax for some x E IR.n} and the nullspace of A is defined by null(A) = {x E lEe : Ax = O}. If A = [ ai I· · · Ian ] is a column partitioning, then ran(A) = span{a1, . . . , an}·
The rank of a matrix A is defined by

    rank(A) = dim(ran(A)).

If A ∈ ℝᵐˣⁿ, then

    dim(null(A)) + rank(A) = n.

We say that A ∈ ℝᵐˣⁿ is rank deficient if rank(A) < min{m, n}. The rank of a matrix is the maximal number of linearly independent columns (or rows).

2.1.3 Matrix Inverse

If A and X are in ℝⁿˣⁿ and satisfy AX = I, then X is the inverse of A and is denoted by A⁻¹. If A⁻¹ exists, then A is said to be nonsingular. Otherwise, we say A is singular.

The inverse of a product is the reverse product of the inverses:

    (AB)⁻¹ = B⁻¹A⁻¹.                                                    (2.1.1)

Likewise, the transpose of the inverse is the inverse of the transpose:

    (A⁻¹)ᵀ = (Aᵀ)⁻¹ ≡ A⁻ᵀ.                                              (2.1.2)

2.1.4 The Sherman-Morrison-Woodbury Formula

The identity

    B⁻¹ = A⁻¹ − B⁻¹(B − A)A⁻¹                                           (2.1.3)

shows how the inverse changes if the matrix changes. The Sherman-Morrison-Woodbury formula gives a convenient expression for the inverse of the matrix (A + UVᵀ) where A ∈ ℝⁿˣⁿ and U and V are n-by-k:

    (A + UVᵀ)⁻¹ = A⁻¹ − A⁻¹U(I + VᵀA⁻¹U)⁻¹VᵀA⁻¹.                        (2.1.4)

A rank-k correction to a matrix results in a rank-k correction of the inverse. In (2.1.4) we assume that both A and (I + VᵀA⁻¹U) are nonsingular.

The k = 1 case is particularly useful. If A ∈ ℝⁿˣⁿ is nonsingular, u, v ∈ ℝⁿ, and α = 1 + vᵀA⁻¹u ≠ 0, then

    (A + uvᵀ)⁻¹ = A⁻¹ − (1/α) A⁻¹uvᵀA⁻¹.                                (2.1.5)

This is referred to as the Sherman-Morrison formula.

2.1.5 Orthogonality

A set of vectors {x₁, ..., x_p} in ℝᵐ is orthogonal if xᵢᵀxⱼ = 0 whenever i ≠ j and orthonormal if xᵢᵀxⱼ = δᵢⱼ. Intuitively, orthogonal vectors are maximally independent for they point in totally different directions.

A collection of subspaces S₁, ..., S_p in ℝᵐ is mutually orthogonal if xᵀy = 0 whenever x ∈ Sᵢ and y ∈ Sⱼ for i ≠ j. The orthogonal complement of a subspace S ⊆ ℝᵐ is defined by

    S⊥ = { y ∈ ℝᵐ : yᵀx = 0 for all x ∈ S }.
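As a quick numerical check of (2.1.4), the following fragment forms (A + UVᵀ)⁻¹ both directly and via the Sherman-Morrison-Woodbury formula and compares the results. It is a Python/NumPy sketch added here for illustration only (the book's own algorithms are written in MATLAB-style pseudocode); the random test matrices and the diagonal shift that keeps A safely nonsingular are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 6, 2
    A = rng.standard_normal((n, n)) + n * np.eye(n)   # shifted so that A is comfortably nonsingular
    U = rng.standard_normal((n, k))
    V = rng.standard_normal((n, k))

    Ainv = np.linalg.inv(A)
    # (A + U V^T)^{-1} = A^{-1} - A^{-1} U (I + V^T A^{-1} U)^{-1} V^T A^{-1}   ... (2.1.4)
    smw = Ainv - Ainv @ U @ np.linalg.inv(np.eye(k) + V.T @ Ainv @ U) @ V.T @ Ainv
    direct = np.linalg.inv(A + U @ V.T)
    print(np.linalg.norm(smw - direct))               # difference is at the roundoff level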
  • 90. 66 Chapter 2. Matrix Analysis It is not hard to show that ran (A).L = null(AT) . The vectors v1 , . . . , Vk form an or­ thonormal basis for a subspace S � Rm if they are orthonormal and span S. A matrix Q E Rmxm is said to be orthogonal if QTQ = I. If Q = [ Q1 I · · · I Qm ] is orthogonal, then the Qi form an orthonormal basis for Rm. It is always possible to extend such a basis to a full orthonormal basis {v1 , • . . , Vm} for Rm: Theorem 2.1.1. IfVi E lR�xr has orthonormal columns, then there exists V2 E Rnx(n-r) such that V = [ Vi l V2 ] is orthogonal. Note that ran(Vi).L = ran(V2). Proof. This is a standard result from introductory linear algebra. It is also a corollary of the QR factorization that we present in §5.2. D 2.1.6 The Determinant If A = (a) E R1x1, then its determinant is given by det(A) = a. The determinant of A E Rnxn is defined in terms of order-(n- 1) determinants: n det(A) = L:)-1)i+la1idet(A1j)· j=l Here, A1j is an (n - 1)-by-(n - 1) matrix obtained by deleting the first row and jth col­ umn of A. Well-known properties of the determinant include det(AB) = det(A)det(B), det(AT) = det(A), and det(cA) = cndet(A) where A, B E Rnxn and c E R. In addition, det(A) =/; 0 if and only if A is nonsingular. 2.1.7 Eigenvalues and Eigenvectors Until we get to the main eigenvalue part of the book (Chapters 7 and 8), we need a handful of basic properties so that we can fully appreciate the singular value de­ composition (§2.4), positive definiteness (§4.2), and various fast linear equation solvers (§4.8). The eigenvalues of A E <Cnxn arc the zeros of the characteristic polynomial p(x) = det(A - xl). Thus, every n-by-n matrix has n eigenvalues. We denote the set of A's eigenvalues by >.(A) = { x : det(A - xl) = 0 }. If the eigenvalues of A are real, then we index them from largest to smallest as follows: In this case, we sometimes use the notation >.max(A) and >.min(A) to denote >.1 (A) and >.n(A) respectively.
  • 91. 2.1. Basic Ideas from Linear Algebra 67 If X E ccnxn is nonsingular and B = x-1AX, then A and B are similar. If two matrices are similar, then they have exactly the same eigenvalues. If .. E ..(A), then there exists a nonzero vector x so that Ax = .Xx. Such a vector is said to be an eigenvector for A associated with ... If A E ccnxn has n independent eigenvectors X1, . . . , Xn and Axi = AiXi for i = 1 :n, then A is diagonalizable. The terminology is appropriate for if then x-1AX = diag(..i, . . . , .Xn)· Not all matrices are diagonalizable. However, if A E Rnxn is symmetric, then there exists an orthogonal Q so that {2.1.6) This is called the Schur decomposition. The largest and smallest eigenvalues of a symmetric matrix satisfy and 2.1.8 Differentiation xTAx Amax{A) = max - T - x�O X X Amin{A) xTAx = min --. x�o xTx {2.1.7) {2.1.8) Suppose o is a scalar and that A(o) is an m-by-n matrix with entries aij{o). If aij(o) is a differentiable function of o for all i and j, then by A(o) we mean the matrix . d ( d ) A(o) = do A(o) = do aii (o) = (aii (o)). Differentiation is a useful tool that can sometimes provide insight into the sensitivity of a matrix problem. Problems P2.1.1 Show that if A E Rm x n has rank p, then there exists an X E R"' x p and a Y E R'' xp such that A = XYT, where rank(X) = rank(Y) = p. P2.1.2 Suppose A(a) E Rm x r and B(a) E wx n are matrices whose entries are differentiable functions of the scalar a. (a) Show � [A(a)B(a)] = [�A(a)] B(a) + A(a) [�B(a)] da da da (b) Assuming A(a) is always nonsingular, show ! (A(a)-1] = -A(a)-1 [!A(a)] A(a)-1• P2.1.3 Suppose A E R" x n , b E Rn and that <f>(x) = �xTAx - xTb. Show that the gradient of <f> is given by 'V<f>(x) = �(AT + A)x - b.
  • 92. 68 Chapter 2. Matrix Analysis P2.1.4 Assume that both A and A + uvT are nonsingular where A E Rn x n and u, v E Rn . Show that if x solves (A + uvT)x = b, then it also solves a perturbed right-hand-side problem of the form Ax = b + cm. Give an expression for a in terms of A, u, and v. P2.1.5 Show that a triangular orthogonal matrix is diagonal. P2.1.6 Suppose A E Rn x n is symmetric and nonsingular and define A = A + a(uuT + vvT) + {3(uvT + vuT) where u, v E Rn and a, {3 E R. Assuming that A is nonsingular, use the Sherman-Morrison-Woodbury formula to develop a formula for A-l. P2.1.7 Develop a symmetric version of the Sherman-Morrison-Woodbury formula that characterizes the inverse of A + USUT where A E Rn x n and S E Rk x k are symmetric and U E Rn x k. P2.l.8 Suppose Q E Rn x n is orthogonal and z E Rn . Give an efficient algorithm for setting up an m-by-m matrix A = (aij) defined by aij = vT(Qi)T(Qi)v.- P2.1.9 Show that if S is real and sT = -S, then I - S is nonsingular and the matrix (I - s)-1 (I + S) is orthogonal. This is known as the Cayley transform of S. P2.1.10 Refer to §1.3.10. (a) Show that if S E R2n x 2n is symplectic, then s-1 exists and is also symplectic. (b) Show that if M E R2n x 2n is Hamiltonian and S E R2n x 2n is symplectic, then the matrix M1 = s-1 MS is Hamiltonian. P2.1.11 Use (2.1.6) to prove (2.1.7) and (2.1.8). Notes and References for §2.1 In addition to Horn and Johnson (MA) and Horn and Johnson (TMA), the following introductory applied linear algebra texts are highly recommended: R. Bellman (1997). Introduction to Matrix Analysis, Second Edition, SIAM Publications, Philadel- phia, PA. C. Meyer (2000). Matrix Analysis and Applied Linear Algebra, SIAM Publications, Philadelphia, PA. D. Lay (2005). Linear Algebra and Its Applications, Third Edition, Addison-Wesley, Reading, MA. S.J. Leon (2007). Linear Algebra with Applications, Seventh Edition, Prentice-Hall, Englewood Cliffs, NJ. G. Strang (2009). Introduction to Linear Algebra, Fourth Edition, SIAM Publications, Philadelphia, PA. 2.2 Vector Norms A norm on a vector space plays the same role as absolute value: it furnishes a distance measure. More precisely, Rn together with a norm on Rn defines a metric space rendering the familiar notions of neighborhood, open sets, convergence, and continuity. 2.2.1 Definitions A vector norm on R" is a function f:Rn -t R that satisfies the following properties: f(x) 2: 0, f(x+y) � f(x)+f(y), f(ax) = lal/(x), x E Rn, (f(x) = 0, iff x= 0), x,y E Rn, a E R,xE Rn. We denote such a function with a double bar notation: f(x) = II x11- Subscripts on the double bar are used to distinguish between various norms. A useful class of vector
norms are the p-norms defined by

    ‖x‖_p = ( |x₁|ᵖ + ··· + |xₙ|ᵖ )^{1/p},        p ≥ 1.                  (2.2.1)

The 1-, 2-, and ∞-norms are the most important:

    ‖x‖₁ = |x₁| + ··· + |xₙ|,
    ‖x‖₂ = ( |x₁|² + ··· + |xₙ|² )^{1/2} = (xᵀx)^{1/2},
    ‖x‖_∞ = max_{1≤i≤n} |xᵢ|.

A unit vector with respect to the norm ‖·‖ is a vector x that satisfies ‖x‖ = 1.

2.2.2 Some Vector Norm Properties

A classic result concerning p-norms is the Hölder inequality:

    |xᵀy| ≤ ‖x‖_p ‖y‖_q,        1/p + 1/q = 1.                            (2.2.2)

A very important special case of this is the Cauchy-Schwarz inequality:

    |xᵀy| ≤ ‖x‖₂ ‖y‖₂.                                                    (2.2.3)

All norms on ℝⁿ are equivalent, i.e., if ‖·‖_α and ‖·‖_β are norms on ℝⁿ, then there exist positive constants c₁ and c₂ such that

    c₁ ‖x‖_α ≤ ‖x‖_β ≤ c₂ ‖x‖_α                                           (2.2.4)

for all x ∈ ℝⁿ. For example, if x ∈ ℝⁿ, then

    ‖x‖₂ ≤ ‖x‖₁ ≤ √n ‖x‖₂,                                                (2.2.5)
    ‖x‖_∞ ≤ ‖x‖₂ ≤ √n ‖x‖_∞,                                              (2.2.6)
    ‖x‖_∞ ≤ ‖x‖₁ ≤ n ‖x‖_∞.                                               (2.2.7)

Finally, we mention that the 2-norm is preserved under orthogonal transformation. Indeed, if Q ∈ ℝⁿˣⁿ is orthogonal and x ∈ ℝⁿ, then

    ‖Qx‖₂² = xᵀQᵀQx = xᵀx = ‖x‖₂².

2.2.3 Absolute and Relative Errors

Suppose x̂ ∈ ℝⁿ is an approximation to x ∈ ℝⁿ. For a given vector norm ‖·‖ we say that

    ε_abs = ‖x̂ − x‖

is the absolute error in x̂. If x ≠ 0, then

    ε_rel = ‖x̂ − x‖ / ‖x‖
  • 94. 70 Chapter 2. Matrix Analysis prescribes the relative error in x. Relative error in the oo-norm can be translated into a statement about the number of correct significant digits in x. In particular, if II x - xlloo ,..., 10-11 II xlloc ,..., ' then the largest component of x has approximately p correct significant digits. For example, if x = [ 1.234 .05674 ]T and x = [ 1.235 .05128 ]T, then II x - x1100/ll x1100 � .0043 � 10-3. Note than x1 has about three significant digits that arc correct while only one significant digit in x2 is correct. 2.2.4 Convergence We say that a sequence {x(k)} of n-vectors converges to x if lim II x(k) -xII = 0. k-+oo Because of (2.2.4), convergence in any particular norm implies convergence in all norms. Problems P2.2.1 Show that if x E Rn , then limp-+oo II x llp = II x 1100• P2.2.2 By considering the inequality 0 � (ax + by)T(a.-i: + by) for suitable scalars a and b , prove (2.2.3). P2.2.3 Verify that II · 111 , II · 112, and II · 1100 are vector norms. P2.2.4 Verify (2.2.5)-(2.2.7). When is equality achieved in each result'! P2.2.5 Show that in Rn , x!i) --+ x if and only if xii) --+ Xk for k = l:n. P2.2.6 Show that for any vector norm on Rn that I II x II - 11 y 11 I � 11 x - y 11· P2.2.7 Let II · II be a vector norm on R!" and assume A E Rm x n . Show that if rank(A) = n, then II x llA = 11 Ax II is a vector norm on R" . P2.2.8 Let x and y be in R" and define 1/J:R --+ R by 'l/l(a) = I I x - ay 112· Show that 'I/I is minimized if ct = xTyjyTy. P2.2.9 Prove or disprove: l + y'n 2 v E Rn => ll v ll i ll v lloo � -2 - ll v lb· P2.2.10 If x E R3 and y E R:i then it can be shown that lxTyl = II x 11211 y 1121 cos(8)1 where 8 is the angle between x and y. An analogous result exists for the cross product defined by x x y = [::�: = ;:�� l· XJY2 - x2y1 In particular, II x x y 112 = II x 112 11 y 1121 sin(B}I. Prove this. P2.2.11 Suppose x E Rn and y E Rrn . Show that II x ® Y llv = II x llvll Y 1111 for p = 1, 2, and co . Notes and References for §2.2 Although a vector norm is "just" a generalization of the absolute value concept, there are some noteworthy subtleties: J.D. Pryce (1984). "A New Measure of Relative Error for Vectors," SIAM .J. Numer. Anal. 21, 202-221.
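To close out the discussion of vector norms, here is a small numerical illustration. It checks the equivalence bounds (2.2.5)-(2.2.7) and reproduces the relative-error computation from the example in §2.2.3. This is a Python/NumPy sketch added for illustration only; the vectors are the ones quoted in that example.

    import numpy as np

    x    = np.array([1.234, 0.05674])     # x from the example in Section 2.2.3
    xhat = np.array([1.235, 0.05128])     # its approximation

    n = x.size
    norm = np.linalg.norm
    # equivalence bounds (2.2.5)-(2.2.7)
    assert norm(x, 2) <= norm(x, 1) <= np.sqrt(n) * norm(x, 2)
    assert norm(x, np.inf) <= norm(x, 2) <= np.sqrt(n) * norm(x, np.inf)
    assert norm(x, np.inf) <= norm(x, 1) <= n * norm(x, np.inf)

    # relative error in the infinity norm: about 4.4e-3, i.e., roughly 10^-3
    print(norm(xhat - x, np.inf) / norm(x, np.inf))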
  • 95. 2.3. Matrix Norms 71 2.3 Matrix Norms The analysis of matrix algorithms requires use of matrix norms. For example, the quality of a linear system solution may be poor if the matrix of coefficients is "nearly singular." To quanti(y the notion of near-singularity, we need a measure of distance on the space of matrices. Matrix norms can be used to provide that measure. 2.3.l Definitions Since Rmxn is isomorphic to Rmn, the definition of a matrix norm should be equivalent to the definition of a vector norm. In particular, /:Rmxn --+ R is a matrix norm if the following three properties hold: f(A) ;::: 0, I(A + B) � I(A) + J(B), f(aA) = lodf(A), A E Rmxn, (f(A) = 0 iff A = 0) A, B E Rm xn, a E 111, A E 1Rmxn. As with vector norms, we use a double bar notation with subscripts to designate matrix norms, i.e., 11 A II = f(A). The most frequently used matrix norms in numerical linear algebra are the Frobe­ nius norm and the p-norms m n ll A llF = :L:L laijl2 i= l j=l II Ax llP sup II · ii . x;i!O X p (2.3.1) (2.3.2) Note that the matrix p-norms are defined in terms of the vector p-norms discussed in the previous section. The verification that (2.3.1) and (2.3.2) are matrix norms is left as an exercise. It is clear that II A llP is the p-norm of the largest vector obtained by applying A to a unit p-norm vector: max II A:i: llP . llx llp= l It is important to understand that (2.3.2) defines a family of norms-the 2-norm on R3x2 is a different function from the 2-norm on 1R5x6• Thus, the easily verified inequality (2.3.3) is really an observation about the relationship between three different norms. Formally, we say that norms Ji , /2, and '3 on Rmxq, Rmxn, and Rnxq are mutnally consistent if for all matrices A E Rmxn and B E Rnxq we have ft (AB) � h(A)f3(B), or, in subscript-free norm notation: II AB II � II A 11 11 B 11. (2.3.4)
  • 96. 72 Chapter 2. Matrix Analysis Not all matrix norms satisfy this property. For example, if II AIla = max laiil and A=B= [ � � ] . then IIABIla > IIAllall BIla· For the most part, we work with norms that satisfy (2.3.4). The p-norms have the important property that for every A E Rmxn and xE Rn we have IIAxllP $ II AllPllxllp· More generally, for any vector norm II · Ila.on Rn and II · 11.Bon Rm we have II Ax11.B $ II Alla.,.BII xIla. where 11Alla.,.B is a matrix norm defined by II Ax11.B 11Alla.,.B = sup II II . (2.3.5) x;il'O X °' We say that II · lla.,.B is subordinate to the vector norms II · Ila. and II · 11.B· Since the set {x E Rn : IIxIla. = 1} is compact and II · 11.B is continuous, it follows that II Alla.,,B = max II Ax11.B = II Ax* 11.B llxll.,=1 (2.3.6) for some x* E Rn having unit a-norm. 2.3.2 Some Matrix Norm Properties The Frobenius and p-norms (especially p = 1, 2, oo) satisfy certain inequalities that are frequently used in the analysis of a matrix computation. If A E Rmxn we have max laiil $ II A112 $ .;mii, max laijl, iJ iJ Jn II Alloo $ II A 112 $ rm IIAlloo' 1 rm llAll1 $ llAll2 $ vnllAll1· (2.3.7) (2.3.8) (2.3.9) (2.3.10) (2.3.11) (2.3.12) (2.3.13)
  • 97. 2.3. Matrix Norms 73 The proofs of these relationships are left as exercises. We mention that a sequence {A(k)} E Rmxn converges if there exists a matrix A E 1R.mxn such that lim II A(k) - A II = 0. k-+oo The choice of norm is immaterial since all norms on 1R.mxn are equivalent. 2.3.3 The Matrix 2-Norm A nice feature of the matrix 1-norm and the matrix oo-norm is that they are easy, O(n2) computations. (See (2.3.9) and (2.3.10).) The calculation of the 2-norm is considerably more complicated. Theorem 2.3.1. If A E 1R.mxn, then there exists a unit 2-norm n-vector z such that ATAz = µ2z where µ = II A 112· Proof. Suppose z E Rn is a unit vector such that II Az 112 = II A 112· Since z maximizes the function ( ) 1 II Ax 11� 1 xTATAx 9 x = 2 11 x 11� = 2 xrx it follows that it satisfies Vg(z) = 0 where Vg is the gradient of g. A tedious differen­ tiation shows that for i = 1:n In vector notation this says that ATAz = (zTATAz)z. The theorem follows by setting µ = II Az 112· D The theorem implies that II A II� is a zero ofp(.X) = det(ATA - A/). In particular, We have much more to say about eigenvalues in Chapters 7 and 8. For now, we merely observe that 2-norm computation is itl:!rative and a more involved calculation than those of the matrix 1-norm or oo-norm. Fortunately, if the object is to obtain an order-of-magnitude estimate of II A 112, then (2.3.7), {2.3.8), {2.3.11), or {2.3.12) can be used. As another example of norm analysis, here is a handy result for 2-norm estimation. Corollary 2.3.2. If A E 1R.mxn, then II A 112 � Jll A 111 11 A 1100 • Proof. If z =F 0 is such that ATAz = µ2z with µ = II A 112, then µ2 11 z 111 = ll ATAz ll1 � ll AT ll1 ll A ll1 ll z ll1 = ll A ll00ll A ll1 ll z ll1· D
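The estimate of Corollary 2.3.2 is cheap to evaluate because the 1- and ∞-norms are simple column and row sums. The following fragment compares the true 2-norm with this estimate; it is an illustrative Python/NumPy sketch (not from the text), and the test matrix is random.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((50, 30))

    two_norm = np.linalg.norm(A, 2)        # sigma_max(A); an iterative, SVD-based computation
    estimate = np.sqrt(np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    print(two_norm, estimate)              # Corollary 2.3.2: two_norm <= estimate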
  • 98. 74 Chapter 2. Matrix Analysis 2.3.4 Perturbations and the Inverse We frequently use norms to quantify the effect of perturbations or to prove that a sequence of matrices converges to a specified limit. As an illustration of these norm applications, let us quantify the change in A-1 as a function of change in A. Lemma 2.3.3. If P E 1Rnxn and IIP llP < 1, then I - P is nonsingular and 00 (I - P)-1 = Lpk k=O with Proof. Suppose I- P is singular. It follows that (I - P)x = 0 for some nonzero x. But then IIx llP = II Px llP implies II F llP � 1, a contradiction. Thus, I - P is nonsingular. To obtain an expression for its inverse consider the identity (tpk)(I - P) = I - PN+l. k=O Since II P llP < 1 it follows that lim pk=0 because II pkllP � IIF 11;. Thus, k-+oo (lim tpk)(I - P) = I. N-+ook=O N It follows that (I - F)-1 = lim Lpk.From this it is easy to show that N-+ook=O 00 II (I - F)-l llp � L II F 11; k=O 1 completing the proof of the theorem. 0 Note that II (I - F)-1 - I llP � IIF llP/(1 - IIP llP) is a consequence of the lemma. Thus, if f « 1, then 0(€) perturbations to the identity matrix induce O(E) perturba­ tions in the inverse. In general, we have Theorem 2.3.4. IfA is nonsingular and r = II A-1E llP < 1, then A+E is nonsingular and II(A + E)-1 - A-1 II � II E llp II A-l II� P 1 - r Proof. Note that A + E = (I + P)A where P = -EA-1• Since IIP llP =r < 1, it follows from Lemma 2.3.3 that I + P is nonsingular and II (I + P)-1 llP � 1/(1 - r).
  • 99. 2.3. Matrix Norms Thus, (A + E)-1 = A-1(! + F)-1 is nonsingular and The theorem follows by taking norms. D 2.3.5 Orthogonal Invariance If A E R.mxn and the matrices Q E R.mxm and Z E R.nxnare orthogonal, then and II QAZ 112 = II A 112 . 75 (2.3.14) (2.3.15) These properties readily follow from the orthogonal invariance of the vector 2-norm. For example, n n II QA 11�- = L II QA(:,j) II� L ll A(:,j) ll� = ll A ll! j=l j=l and so II Q(AZ) II! = II (AZ) II! = II zTAT II! = II AT II! = II A II!· Problems P2.3.1 Show II AB llp � II A llp ll B llp where 1 � p � oo. P2.3.2 Let B be any submatrix of A. Show that II B llp � 11 A llp· P2.3.3 Show that if D = diag(µi , . . . , µk) E RTn x n with k = min{m, n}, then II D llP = max lµ; I . P2.3.4 Verify (2.3.7) and (2.3.8). P2.3.5 Verify (2.3.9) and (2.3.10). P2.3.6 Verify (2.3.11) and (2.3.12). P2.3.7 Show that if 0 f. .� E Rn and E E w x n, then II Es ll� - --;T';- . P2.3.B Suppose u E RTn and v E Rn . Show that if E = uvT, then II E ll F = II E lb = II u 11211 v 112 and II E llCX> � II u ll CX> ll v 111 · P2.3.9 Suppose A E RTn x n, y E RTn , and 0 f. s E Rn . Show that E = (y - As)sT/sTs has the smallest 2-norm of all m-by-n matrices E that satisfy (A + E)s = y. P2.3.10 Verify that there exists a scalar c > 0 such that II A 116.,c = max claij I i, j satisfies the submultiplicative property (2.3.4) for matrix norms on Rn x n. What is the smallest value for such a constant? Referring to this value as c. , exhibit nonzero matrices B and C with the property that II BC 116.,c. = II B 116.,c. II C 116.,c. · P2.3.11 Show that if A and B are matrices, then 11 A ® B ll F = II A 11"" 11 B llF ·
  • 100. 76 Chapter 2. Matrix Analysis Notes and References for §2.3 For further discussion of matrix norms, see Stewart (IMC) as well as: F.L. Bauer and C.T. Fike (1960). "Norms and Exclusion Theorems," Numer. Math. 2, 137-144. L. Mirsky (1960). "Symmetric Gauge Functions and Unitarily Invariant Norms," Quart. J. Math. 11, 50-59. A.S. Householder (1964). The Theory of Matrices in Numerical Analysis, Dover Publications, New York. N.J. Higham (1992). "Estimating the Matrix p-Norm," Numer. Math. 62, 539-556. 2.4 The Singular Value Decomposition It is fitting that the first matrix decomposition that we present in the book is the singular value decomposition (SVD). The practical and theoretical importance of the SVD is hard to overestimate. It has a prominent role to play in data analysis and in the characterization of the many matrix "nearness problems." 2.4.1 Derivation The SVD is an orthogonal matrix reduction and so the 2-norm and Frobenius norm figure heavily in this section. Indeed, we can prove the existence of the decomposition using some elementary facts about the 2-norm developed in the previous two sections. Theorem 2.4.1 (Singular Value Decomposition ). If A is a real m-by-n matrix, then there exist orthogonal matrices such that U = [ U1 I · · · I Um ] E 1Rmxm and V = [ V1 I · · · I Vn ] E JRnxn UTAV = E = diag(ui . . . . , up) E 1Rmxn, where u1 � u2 � • . • � O'p � 0. p = min{m, n}, Proof. Let x E 1Rn and y E 1Rm be unit 2-norm vectors that satisfy Ax = uy with O' = II A 112· From Theorem 2.1.1 there exist V2 E 1Rnx(n-l) and U2 E 1Rmx(m-l) so V = [ x I '2 ] E 1Rnxn and U = [ y I U2 ] E 1Rmxm are orthogonal. It is not hard to show that where w E Rn-l and B E R(m-l)x(n-1>. Since we have 11 A1 11� � (u2+wTw). But u2 = II A II� = II A1 II�, and so we must have w = 0. An obvious induction argument completes the proof of the theorem. D The O'i are the singular values of A, the Ui are the left singular vectors of A, and the Vi are right singular vectors of A. Separate visualizations of the SVD are required
  • 101. 2.4. The Singular Value Decomposition 77 depending upon whether A has more rows or columns. Here are the 3-by-2 and 2-by-3 examples: [U11 U12 U13 lT [au U21 U22 U23 a21 U31 U32 U33 ll31 [ 'U11 U21 a12 l [ v a u 22 v a32 21 V12 Q ] [0"1 0 l V22 = 0 � 2 ' ] [V11 V12 V13 l [ V21 V22 V23 = �l V31 V32 V33 0 0 ] 0"2 0 . Jn later chapters, the notation cri(A) is used to designate the ith largest singular value of a matrix A. The largest and smallest singular values are important and for them we also have a special notation: O"max(A) = the largest singular value of matrix A, O"min(A) = the smallest singular value of matrix A. 2.4.2 Properties We establish a number of important corollaries to the SVD that are used throughout the book. Corollary 2.4.2. !JUTAV = E is the SVD ofA E JRmxn and m 2:: n, then for i = l:n Avi = O"iUi and ATUi = O"(Vi. Proof. Compare columns in AV = UE and ATU = VET. D There is a nice geometry behind this result. The singular values of a matrix A are the lengths of the semiaxes of the hyperellipsoid E defined by E = { Ax : II x 112 = 1 }. The semiaxis directions are defined by the 'Ui and their lengths are the singular values. It follows immediately from the corollary that ATAvi = cr'fvi, AATUi = cr'fui (2.4.1) (2.4.2) for i = l:n. This shows that there is an intimate connection between the SVD of A and the eigensystems of the symmetric matrices ATA and AAT. See §8.6 and §10.4. The 2-norm and the Frobenius norm have simple SVD characterizations. Corollary 2.4.3. If A E JRmxn, then where p = min{m, n}. Proof. These results follow immediately from the fact that II UTAV II = II E II for both the 2-norm and the Frobenius norm. D
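Corollaries 2.4.2 and 2.4.3 are easy to verify numerically. The fragment below is an illustrative Python/NumPy sketch (the book's A = UΣVᵀ corresponds to NumPy's U, s, Vt, with the singular values returned in descending order); the test matrix is random.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((5, 3))
    U, s, Vt = np.linalg.svd(A)            # A = U @ diag(s) @ Vt,  s[0] >= s[1] >= ...

    # Corollary 2.4.3: ||A||_2 = sigma_1 and ||A||_F = sqrt(sigma_1^2 + ... + sigma_p^2)
    print(np.linalg.norm(A, 2) - s[0])
    print(np.linalg.norm(A, 'fro') - np.sqrt(np.sum(s**2)))

    # Corollary 2.4.2: A v_i = sigma_i u_i and A^T u_i = sigma_i v_i
    i = 0
    print(np.allclose(A @ Vt[i], s[i] * U[:, i]), np.allclose(A.T @ U[:, i], s[i] * Vt[i]))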
  • 102. 78 Chapter 2. Matrix Analysis We show in §8.6 that if A is perturbed by a matrix E, then no singular value can move by more than II E 112 . The following corollary identifies two useful instances of this result. Corollary 2.4.4. If A E IRmxn and E E IRmxn, then O"max(A + E) < O"max(A) + II E 112, O"min(A + E) > O"min(A) - II E 112· Proof. Using Corollary 2.4.2 it is easy to show that O"min(A) · II X 112 � II Ax 112 � O"max(A) · II X 112· The required inequalities follow from this result. D If a column is added to a matrix, then the largest singular value increases and the smallest singular value decreases. Corollary 2.4.5. If A E IRmxn, m > n, and z E IRm, then O"max ( [ A I Z ] ) > O"max(A), O"min ( [ A I Z ] ) < O"min(A). Proof. Suppose A = UEVT is the SVD of A and let x = V(:, 1) and A = [ A I z ]. Using Corollary 2.4.4, we have O"max(A) = II Ax 112 = IIA [ � ] 112 < O"max(A). The proof that O"min(A) 2: O"min(A) is similar. D The SVD neatly characterizes the rank of a matrix and orthonormal bases for both its nullspace and its range. Corollary 2.4.6. If A has r positive singular values, then rank(A) = r and null(A) = span{Vr+li . . . , Vn}, ran(A) = span{u1 , . . . , Ur}. Proof. The rank of a diagonal matrix equals the number of nonzero diagonal entries. Thus, rank(A) = rank(E) = r. The assertions about the nullspace and range follow from Corollary 2.4.2. D
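Corollary 2.4.6 is the basis for the standard way of extracting rank, range, and nullspace information in floating point: count the singular values above a tolerance and read off the corresponding singular vectors. Here is a hedged Python/NumPy sketch; the rank-2 test matrix and the particular tolerance are illustrative choices, not prescriptions from the text.

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # a 6-by-4 matrix of rank 2

    U, s, Vt = np.linalg.svd(A)
    tol = max(A.shape) * np.finfo(float).eps * s[0]   # a common numerical-rank threshold
    r = int(np.sum(s > tol))
    print(r, np.linalg.matrix_rank(A))                # both report 2

    range_basis = U[:, :r]     # orthonormal basis for ran(A):  u_1, ..., u_r
    null_basis  = Vt[r:].T     # orthonormal basis for null(A): v_{r+1}, ..., v_n
    print(np.linalg.norm(A @ null_basis))             # ~ 0

The threshold mirrors the default used by np.linalg.matrix_rank, which is why the two rank reports agree.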
  • 103. 2.4. The Singular Value Decomposition 79 If A has rank r , then it can be written as the sum of r rank-I matrices. The SVD gives us a particularly nice choice for this expansion. Corollary 2.4.7. If A E :nrxn and rank(A) = r, then r A = I >iUiVr. i=l Proof. This is an exercise in partitioned matrix multiplication: r [vf l (UE) VT = ([ a1u1 I a2u2 I . . · I UrUr I 0 I · · · I 0 ] ) v � = L:aiuivf. i=l D The intelligent handling of rank degeneracy is an important topic that we discuss in Chapter 5. The SVD has a critical role to play because it can be used to identify nearby matrices of lesser rank. Theorem 2.4.8 (The Eckhart-Young Theorem). If k < r = rank(A) and k then Ak = Luiuiv[, i=l min II A - B 112 = II A - Ad2 rank(B)=k (2.4.3) {2.4.4) Proof. Since UTAkV = diag(a1, . . . , ak, O, . . . , O) it follows that Ak is rank k. More- over, UT(A - Ak )V = diag(O, . . . , 0, O"k+i. . . . ,up) and so II A - Ak 112 = O"k+l· Now suppose rank(B) = k for some B E Rmxn. It follows that we can find orthonormal vectors x1, . . . ,Xn-k so null(B) = span{x1, . . . ,Xn-k}· A dimension argu­ ment shows that span{x1, . . . , Xn-d n span{v1, . . . , Vk+i} '# {O}. Let z be a unit 2-norm vector in this intersection. Since Bz = 0 and k+l Az = L ui(v[z)ui, we have i=l II A - B II� � II (A - B)z II� = II Az II� completing the proof of the theorem. D k+l L: ut(v[z)2 > a�+l• i=l Note that this theorem says that the smallest singular value of A is the 2-norm distance of A to the set of all rank-deficient matrices. We also mention that the matrix Ak defined in (2.4.3) is the closest rank-k matrix to A in the Frobenius norm.
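Theorem 2.4.8 translates directly into the familiar truncated-SVD recipe for low-rank approximation. The following Python/NumPy sketch (illustrative only; random test matrix, k = 2) checks the 2-norm identity (2.4.4) and the corresponding Frobenius-norm error √(σ_{k+1}² + ··· + σ_p²).

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((8, 6))
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    k = 2
    Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]                  # closest rank-k matrix to A
    print(np.linalg.norm(A - Ak, 2), s[k])                   # both equal sigma_{k+1}
    print(np.linalg.norm(A - Ak, 'fro'), np.sqrt(np.sum(s[k:]**2)))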
  • 104. 80 Chapter 2. Matrix Analysis 2.4.3 The Thin SVD If A = UEVT E 1Rmxn is the SYD of A and m � n, then A = U1E1VT where U1 = U(:, l:n) = [ U1 I · · · I Un ] E 1Rmxn and E1 = E(l:n, l:n) We refer to this abbreviated version of the SYD as the thin SVD. 2.4.4 Unitary Matrices and the Complex SVD Over the complex field the unitary matrices correspond to the orthogonal matrices. In particular, Q E <Cnxn is unitary if QHQ = QQH = In. Unitary transformations preserve both the 2-norm and the Frobenius norm. The SYD of a complex matrix involves unitary matrices. If A E ccmxn, then there exist unitary matrices U E ccmxm and V E <Cnxn such that p = min{m , n} where a1 � a2 � • • • � up � 0. All of the real SYD properties given above have obvious complex analogs. Problems P2.4.1 Show that if Q = Q1 + iQ2 is unitary with Q1 , Q2 E Rn xn, then the 2n-by-2n real matrix is orthogonal. P2.4.2 Prove that if A E Rmxn, then Umax(A) max y E Rm x e nn P2.4.3 For the 2-by-2 matrix A = [ � � ], derive expressions for Umax(A) and Umin(A) that are functions of w, x, y, and z. P2.4.4 Show that any matrix in Rmx " is the limit of a sequence of full rank matrices. P2.4.5 Show that if A E R"'xn has rank n, then II A(ATA)-1AT 112 = 1. P2.4.6 What is the nearest rank-1 matrix to in the Frobenius norm? A - [ 1 M ] - 0 1 P2.4.7 Show that if A E R"' xn, then II A llF � Jrank{A) II A 112, thereby sharpening {2.3.7). P2.4.8 Suppose A E R'xn. Give an SVD solution to the following problem: min ll A - B ll F· det(B)=idet(A)I
  • 105. 2.5. Subspace Metrics 81 P2.4.9 Show that if a nonzero row is added to a matrix, then both the largest and smallest singular values increase. P2.4.10 Show that if Bu and 8v are real numbers and then UTAV = E where u = [ cos(7r/4) sin(7r/4) A = [ cos(8u) cos(8v) sin(8u) ] sin(8v) ' - sin(7r/4) ] _ [ cos(a) , V - cos(7r/4) sin(a) - sin(a) ] cos(a) ' and E = diag(v'2 cos(b), v'2 sin(b)) with a= (8v + 8u )/2 and b = (8v - 8u)/2. Notes and References for §2.4 Forsythe and Moler (SLAS) offer a good account of the SVD's role in the analysis of the Ax = b problem. Their proof of the decomposition is more traditional than ours in that it makes use of the eigenvalue theory for symmetric matrices. Historical SYD references include: E. Beltrami (1873). "Sulle Funzioni Bilineari," Gionale di Mathematiche 11, 98-106. C. Eckart and G. Young (1939). "A Principal Axis Transformation for Non-Hermitian Matrices," Bull. AMS 45, 1 1 8-21. G.W. Stewart (1993). "On the Early History of the Singular Value Decomposition," SIAM Review 35, 551 566. One of the most significant developments in scientific computation has been the increased use of the SYD in application areas that require the intelligent handling of matrix rank. This work started with: C. Eckart, and G. Young (1936). "The Approximation of One Matrix by Another of Lower Rank," Psychometrika 1, 211- 218. For generalizations ofthe SYD to infinite dimensional Hilbert space, see: I.C. Gohberg and M.G. Krein (1969). Introduction to the Theory of Linear Non-Self Adjoint Opera­ tors, Amer. Math. Soc., Providence, RI. F. Smithies (1970). Integral Equations, Cambridge University Press, Cambridge. Reducing the rank of a matrix as in Corollary 2.4.6 when the perturbing matrix is constrained is discussed in: J.W. Demmel (1987). "The Smallest Perturbation of a Submatrix which Lowers the Rank and Con­ strained Total Least Squares Problems, SIAM J. Numer. Anal. �4, 199-206. G.H. Golub, A. Hoffman, and G.W. Stewart (1988). "A Generalization of the Eckart-Young-Mirsky Approximation Theorem." Lin. Alg. Applic. 88/89, 317-328. G.A. Watson (1988). "The Smallest Perturbation of a Submatrix which Lowers the Rank of the Matrix," IMA J. Numer. Anal. 8, 295-304. 2.5 Subspace Metrics If the object of a computation is to compute a matrix or a vector, then norms are useful for assessing the accuracy of the answer or for measuring progress during an iteration. If the object of a computation is to compute a subspace, then to make similar comments we need to be able to quantify the distance between two subspaces. Orthogonal projections are critical in this regard. After the elementary concepts are established we discuss the CS decomposition. This is an SYD-like decomposition that is handy when we have to compare a pair of subspaces.
  • 106. 82 Chapter 2. Matrix Analysis 2.5.1 Orthogonal Projections Let S � Rn be a subspace. P E Rnxn is the orthogonal projection onto S if ran(P) = S, P2 = P, and pT = P. From this definition it is easy to show that if x E Rn, then Px E S and (I - P)x E Sl.. If P1 and P2 are each orthogonal projections, then for any z E Rn we have If ran(P1) = ran(P2) = S, then the right-hand side of this expression is zero, show­ ing that the orthogonal projection for a subspace is unique. If the columns of V = [ v1 I · · · I Vk ] are an orthonormal basis for a subspace S, then it is easy to show that P = VVT is the unique orthogonal projection onto S. Note that if v E Rn, then P = vvTjvTv is the orthogonal projection onto S = span{v}. 2.5.2 SVD-Related Projections There are several important orthogonal projections associated with the singular value decomposition. Suppose A = UEVT E Rmxn is the SVD of A and that r = rank(A). If we have the U and V partitionings then VrVrT - - T VrVr = UrU'f = - - T UrUr r m-r r n-r projection on to null(A)l. = ran(AT), projection on to null(A), projection on to ran(A), projection on to ran(A)l. = null(AT). 2.5.3 Distance Between Subspaces The one-to-one correspondence between subspaces and orthogonal projections enables us to devise a notion of distance between subspaces. Suppose 81 and 82 are subspaces of Rn and that dim(S1) = dim(S2). We define the distance between these two spaces by (2.5.1) where Pi is the orthogonal projection onto Si. The distance between a pair of subspaces can be characterized in terms of the blocks of a certain orthogonal matrix. Theorem 2.5.1. Suppose ' z = [ Z1 I Z2 ] k n-k are n-by-n orthogonal matrices. If 81 = ran(W1) and 82 = ran(Z1), then
  • 107. 2.S. Subspace Metrics 83 proof. We first observe that dist(S1, 82) = II W1w[ - Z1Z[ 112 = II wT(W1W[ - Z1Z[)z112 Note that the matrices WfZ1 and W[Z2 are submatrices of the orthogonal matrix (2.5.2) Our goal is to show that II Q21 112 = II Q12 112 . Since Q is orthogonal it follows from Q [x l = [Qux l 0 Q21X that 1 = II Q11x II� + II Q21X II� for all unit 2-norm x E JR.k. Thus, II Q21 II� = max II Q21X II� = 1 - min II Qux II� = 1 - Umin(Qu)2. llxll2=l llxll2=l Analogously, by working with QT (which is also orthogonal) it is possible to show that and therefore II Q12 II� = 1 - Umin(Q11)2. Thus, II Q21 1'2 = II Q12 112· D Note that if 81 and 82 are subspaces in Ile with the same dimension, then It is easy to show that dist(S1 , 82) = 0 ::::} 81 = 82, dist(Si . 82) = 1 ::::} 81 n S:f -:/; {O}. A more refined analysis of the blocks of the matrix Q in (2.5.2) sheds light on the dif­ ference between a pair of subspaces. A special, SVD-like decomposition for orthogonal matrices is required.
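Definition (2.5.1) and Theorem 2.5.1 suggest two equivalent ways to compute the distance between equidimensional subspaces: form the difference of the orthogonal projections, or work with the singular values of W₁ᵀZ₁. The fragment below is an illustrative Python/NumPy sketch using random 3-dimensional subspaces of ℝ¹⁰.

    import numpy as np

    rng = np.random.default_rng(5)
    n, k = 10, 3
    W1, _ = np.linalg.qr(rng.standard_normal((n, k)))   # orthonormal basis for S1
    Z1, _ = np.linalg.qr(rng.standard_normal((n, k)))   # orthonormal basis for S2

    P1, P2 = W1 @ W1.T, Z1 @ Z1.T                       # the orthogonal projections
    dist = np.linalg.norm(P1 - P2, 2)                   # definition (2.5.1)

    # via the proof of Theorem 2.5.1: dist = sqrt(1 - sigma_min(W1^T Z1)^2)
    smin = np.linalg.svd(W1.T @ Z1, compute_uv=False)[-1]
    print(dist, np.sqrt(1.0 - smin**2))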
  • 108. 84 Chapter 2. Matrix Analysis 2.5.4 The CS Decomposition The blocks of an orthogonal matrix partitioned into 2-by-2 form have highly related SVDs. This is the gist of the CS decomposition. We prove a very useful special case first. Theorem 2.5.2 (The CS Decomposition (Thin Version)). Consider the matri.i: where m1 � ni and m2 � ni . If the columns of Q are orthonormal, then there exist orthogonal matrices U1 E Rm1 xmi , U2 E Rm2 xm2 , and Vi E Rn1 xni such that where and Co = diag( cos(Oi), . . . , cos(On1 ) ) E Rm1 xn, , So = diag( sin(01), . . . , sin(On1 ) ) E Rm2 xni , Proof. Since 11 Qi 112 $ II Q 112 = 1, the singular values of Qi arc all in the interval [O, 1]. Let Co = diag(c1, . . . , cn1 ) = t n1 - t be the SVD of Q1 where we assume 1 = CJ = · · · = Ct > Ct+l � · · · � Cn1 � 0. To complete the proof of the theorem we must construct the orthogonal matrix U2. If Q2Vi = [ W1 I W2 l n1 - t then Since the columns of this matrix have unit 2-norm, W1 = 0. The columns of W2 are nonzero and mutually orthogonal because W[W2 = ln1 -t - ETE = diag(l - c�+l• . . . , 1 - c;1 )
  • 109. 2.5. Subspace Metrics 85 is nonsingular. If sk = Jl - c% for k = l:n1 , then the columns of Z = W2 diag(l/St+1 , . . . , 1/Sn) are orthonormal. By Theorem 2.1.1 there exists an orthogonal matrix U2 E 1Rm2xm2 with U2(:,t + l:ni) = Z. It is easy to verify that u:rQ2Vi = diag(si, . . . , Sn1 ) = So. Since c� + s� = 1 for k = l:n1, it follows that these quantities are the required cosines and sines. D By using the same techniques it is possible to prove the following, more general version of the decomposition: Theorem 2.5.3 (CS Decomposition). Suppose Q - - - - is a square orthogonal matrix and that m1 2 n1 and m1 2 m2. Define the nonnegative integers p and q by p = max{O, n1 - m2} and q = max{O, m2 - ni }. There exist orthogonal U1 E 1R"'1 xm, , U2 E Drn2x"'2, Vi E 1Rn1 xni , and V2 E 1Rn2xn2 such that if U = [*l and then I 0 0 0 0 p 0 c s 0 0 ni-P urQv 0 0 0 0 I m1-n1 0 s - C 0 0 n1-P 0 0 0 I 0 q p ni-P n1-p q m1-n1 where C = diag( cos(Op+i), . . . , cos(On, ) ) = diag(ep+i , . . . , Cn, ), S = diag( sin(Op+l) , . . . , sin(On, ) ) = diag(sp+l , . . . , Sn1 ), and 0 � Op+l � · · · � Bn1 � rr/2. Proof. See Paige and Saunders (1981) for details. D We made the assumptions m1 2 n1 and m1 2 m2 for clarity. Through permutation and transposition, any 2-by-2 block orthogonal matrix can be put into the form required
  • 110. 86 Chapter 2. Matrix Analysis by the theorem. Note that the blocks in the transformed Q, i.e., the U[QiiV,, are diagonal-like but not necessarily diagonal. Indeed, as we have presented it, the CS decomposition gives us four unnormalized SVDs. If Q21 has more rows than columns, then p = 0 and the reduction looks like this (for example): C1 0 81 0 0 0 0 0 C2 0 82 0 0 0 0 0 0 0 0 1 0 UTQV 0 0 0 0 0 0 1 81 0 -C1 0 0 0 0 0 82 0 -c2 0 0 0 0 0 0 0 1 0 0 On the other hand, if Q21 has more columns than rows, then q = 0 and the decompo­ sition has the form 1 0 0 0 0 I Q C2 Q S2 Q 0 0 C3 Q 83 • 0 82 0 -C2 0 0 0 83 0 -C3 Regardless of the partitioning, the essential message of the CS decomposition is that the SVDs of the Q-blocks are highly related. Problems P2.5.1 Show that if P is an orthogonal projection, then Q = I - 2P is orthogonal. P2.5.2 What are the singular values of an orthogonal projection? P2.5.3 Suppose S1 = span{x} and S2 = span{y}, where x and y are unit 2-norm vectors in R2. Working only with the definition of dist(., ·), show that dist(S1 , S2) = Ji - (xTy)2, verifying that the distance between S1 and S2 equals the sine of the angle between x and y. P2.5.4 Refer to §1.3.10. Show that if Q E R2nx2n is orthogonal and symplectic, then Q has the form P2.5.5 Suppose P E R"xn and P2 = P. Show that II P 112 > 1 if null(P) is not a subspace of ran(A).l.. Such a matrix is called an oblique projector. Sec Stewart (2011). Notes and References for §2.5 The computation of the CS decomposition is discussed in §8.7.6. For a discussion of its analytical properties, see: C. Davis and W. Kahan (1970). "The Rotation of Eigenvectors by a Perturbation Ill," SIAM J. Numer. Anal. 7, 1-46. G.W. Stewart (1977). "On the Perturbation of Pseudo-Inverses, Projections and Linear Least Squares Problems," SIAM Review 19, 634-662. C.C. Paige and M. Saunders (1981). "Toward a Generalized Singular Value Decomposition,'' SIAM J. Numer. Anal. 18, 398-405. C.C. Paige and M. Wei (1994). "History and Generality of the CS Decomposition,'' Lin. Alg. Applic. 208/209, 303-326. A detailed numerical discussion of oblique projectors (P2.5.5) is given in: G.W. Stewart (2011). "On the Numerical Analysis of Oblique Projectors,'' SIAM J. Matrix Anal. Applic. 32, 309-348.
2.6 The Sensitivity of Square Systems

We use tools developed in previous sections to analyze the linear system problem Ax = b where A ∈ ℝⁿˣⁿ is nonsingular and b ∈ ℝⁿ. Our aim is to examine how perturbations in A and b affect the solution x. Higham (ASNA) offers a more detailed treatment.

2.6.1 An SVD Analysis

If A = UΣVᵀ = Σ_{i=1}^{n} σᵢuᵢvᵢᵀ is the SVD of A, then

    x = A⁻¹b = Σ_{i=1}^{n} (uᵢᵀb / σᵢ) vᵢ.                                   (2.6.1)

This expansion shows that small changes in A or b can induce relatively large changes in x if σₙ is small.

It should come as no surprise that the magnitude of σₙ should have a bearing on the sensitivity of the Ax = b problem. Recall from Theorem 2.4.8 that σₙ is the 2-norm distance from A to the set of singular matrices. As the matrix of coefficients approaches this set, it is intuitively clear that the solution x should be increasingly sensitive to perturbations.

2.6.2 Condition

A precise measure of linear system sensitivity can be obtained by considering the parameterized system

    (A + εF) x(ε) = b + εf,        x(0) = x,

where F ∈ ℝⁿˣⁿ and f ∈ ℝⁿ. If A is nonsingular, then it is clear that x(ε) is differentiable in a neighborhood of zero. Moreover, ẋ(0) = A⁻¹(f − Fx) and so the Taylor series expansion for x(ε) has the form

    x(ε) = x + ε ẋ(0) + O(ε²).

Using any vector norm and consistent matrix norm we obtain

    ‖x(ε) − x‖ / ‖x‖ ≤ |ε| ( ‖A⁻¹‖ ‖f‖ / ‖x‖ + ‖A⁻¹‖ ‖F‖ ) + O(ε²).          (2.6.2)

For square matrices A define the condition number κ(A) by

    κ(A) = ‖A‖ ‖A⁻¹‖                                                          (2.6.3)

with the convention that κ(A) = ∞ for singular A. From ‖b‖ ≤ ‖A‖ ‖x‖ and (2.6.2) it follows that

    ‖x(ε) − x‖ / ‖x‖ ≤ κ(A) (ρ_A + ρ_b) + O(ε²)                               (2.6.4)
where ρ_A = |ε| ‖F‖/‖A‖ and ρ_b = |ε| ‖f‖/‖b‖ represent the relative errors in A and b, respectively. Thus, the relative error in x can be κ(A) times the relative error in A and b. In this sense, the condition number κ(A) quantifies the sensitivity of the Ax = b problem.

Note that κ(·) depends on the underlying norm and subscripts are used accordingly, e.g.,

    κ₂(A) = ‖A‖₂ ‖A⁻¹‖₂ = σ_max(A)/σ_min(A).                                  (2.6.5)

Thus, the 2-norm condition of a matrix A measures the elongation of the hyperellipsoid {Ax : ‖x‖₂ = 1}.

We mention two other characterizations of the condition number. For p-norm condition numbers, we have

    1/κ_p(A) = min_{A+ΔA singular} ‖ΔA‖_p / ‖A‖_p.                            (2.6.6)

This result may be found in Kahan (1966) and shows that κ_p(A) measures the relative p-norm distance from A to the set of singular matrices. For any norm, we also have

    κ(A) = lim_{ε→0}  sup_{‖ΔA‖ ≤ ε‖A‖}  ‖(A+ΔA)⁻¹ − A⁻¹‖ / (ε ‖A⁻¹‖).         (2.6.7)

This imposing result merely says that the condition number is a normalized Fréchet derivative of the map A → A⁻¹. Further details may be found in Rice (1966). Recall that we were initially led to κ(A) through differentiation.

If κ(A) is large, then A is said to be an ill-conditioned matrix. Note that this is a norm-dependent property.¹ However, any two condition numbers κ_α(·) and κ_β(·) on ℝⁿˣⁿ are equivalent in that constants c₁ and c₂ can be found for which

    c₁ κ_α(A) ≤ κ_β(A) ≤ c₂ κ_α(A).

For example, on ℝⁿˣⁿ we have

    (1/n) κ₂(A) ≤ κ₁(A) ≤ n κ₂(A).                                            (2.6.8)

Thus, if a matrix is ill-conditioned in the α-norm, it is ill-conditioned in the β-norm modulo the constants c₁ and c₂ above.

For any of the p-norms, we have κ_p(A) ≥ 1. Matrices with small condition numbers are said to be well-conditioned. In the 2-norm, orthogonal matrices are perfectly conditioned because if Q is orthogonal, then κ₂(Q) = ‖Q‖₂ ‖Qᵀ‖₂ = 1.

¹It also depends upon the definition of "large." The matter is pursued in §3.5.
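A numerical illustration of (2.6.3): the fragment below solves a Hilbert-matrix system, perturbs the right-hand side, and compares the error amplification with κ₂(A). It is a Python/NumPy sketch added for illustration; the Hilbert matrix, the perturbation size, and the cross-check against np.linalg.cond are choices made here, not prescriptions from the text.

    import numpy as np

    n = 8
    idx = np.arange(n)
    A = 1.0 / (idx[:, None] + idx[None, :] + 1.0)   # Hilbert matrix, a classically ill-conditioned example
    x = np.ones(n)
    b = A @ x

    kappa2 = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)   # definition (2.6.3)
    print(kappa2, np.linalg.cond(A, 2))             # roughly 1.5e10; the two agree

    db = np.zeros(n)
    db[0] = 1e-10 * np.linalg.norm(b)               # a tiny perturbation of b
    y = np.linalg.solve(A, b + db)
    amplification = (np.linalg.norm(y - x) / np.linalg.norm(x)) / (np.linalg.norm(db) / np.linalg.norm(b))
    print(amplification)        # bounded by kappa2; how close it gets depends on the direction of db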
  • 113. 2.6. The Sensitivity of Square Systems 89 2.6.3 Determinants and Nearness to Singularity It is natural to consider how well determinant size measures ill-conditioning. Ifdet(A) = o is equivalent to singularity, is det(A) ::::::: 0 equivalent to near singularity? Unfortu­ nately, there is little correlation between det(A) and the condition of Ax = b. For example, the matrix Bn defined by Bn � [! -: ::: =I:lE ir'" (2.6.9) has unit determinant, but K,00(Bn) = n · 2n-1. On the other hand, a very well­ conditioned matrix can have a very small determinant. For example, Dn = diag(l0-1 , . . . ' 10-1) E JR.nxn satisfies "'p(Dn) = 1 although det(Dn) = 10-n. 2.6.4 A Rigorous Norm Bound Recall that the derivation of (2.6.4) was valuable because it highlighted the connection between "'(A) and the rate ofchange of x(i:) at f = 0. However, it is a little unsatisfying because it is contingent on f being "small enough" and because it sheds no light on the size of the O(i:2) term. In this and the next subsection we develop some additional Ax = b perturbation theorems that are completely rigorous. We first establish a lemma that indicates in terms of "'(A) when we can expect a perturbed system to be nonsingular. Lemma 2.6.1. Suppose Ax b, (A + �A)y b + �b, �A E 1Rnxn, �b E JR.n, with ll �A ll $ i: ll A ll and ll �b ll $ i: ll b ll· lf e "'(A) = r < 1, then A + �A is nonsingular and 11 Y II 1 + r IIx II ::::; 1 - r Proof. Since 11 A-1�A 11 $ f II A-1 II II A II = r < 1 it follows from Theorem 2.3.4 that (A + �A) is nonsingular. Using Lemma 2.3.3 and the equality we find (I + A-1�A)y = x + A-1�b IIY II ::::; II (I + A-1�A)-1 II (II x II +i: II A-1 11 11 b II) ::::; 1 �r (II x II + i: II A-1 11 11 b 11) = 1 �r (11 x 11 + r 1 1 !1 1i).
  • 114. 90 Chapter 2. Matrix Analysis Since 11 b 11 = 11 Ax 11 :5 II A 11 11x II it follows that 1 IIy II :5 1 - r (IIx II+ rll x II) and this establishes the required inequality. 0 We are now set to establish a rigorous Ax = b perturbation bound. Theorem 2.6.2. If the conditions of Lemma 2.6. 1 hold, then ll y - x II < �K(A). IIx II 1 - r Proof. Since we have Thus, ll y - x ll ll b ll ll Y ll ( l + r) II x II :5 f K(A) II A 11 11x II + f K(A) w :5 f 1 + 1 - r K(A), from which the theorem readily follows. 0 (2.6.10) (2.6.11) A small example helps put this result in perspective. The Ax = b problem has solution x = [ 1 , 1 jT and condition K00(A) = 106. If .6.b = [ 10-6 , 0 ]T, .6.A = 0, and (A + .6.A)y = b + .6.b, then y = [ 1 + 10-5 , 1 jT and the inequality (2.6.10) says 10_6 = IIX - Y lloo « II .6.b lloo Koo{A) = 10-6106 = 1. IIx lloo II b lloo Thus, the upper bound in {2.6.10) can be a gross overestimate of the error induced by the perturbation. On the other hand, if .6.b = ( 0 , 10-6 f, .6.A = 0, and (A + .6.A)y = b + .6.b, then this inequality says that Thus, there are perturbations for which the bound in {2.6.10) is essentially attained.
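The 2-by-2 example above is easy to reproduce. The displayed coefficient matrix and right-hand side did not survive the scan, so the sketch below assumes A = diag(1, 10⁻⁶) and b = (1, 10⁻⁶)ᵀ, a choice consistent with the quoted facts x = (1, 1)ᵀ and κ_∞(A) = 10⁶; treat that as an assumption. The code (Python/NumPy, for illustration only) evaluates both sides of (2.6.10) for the two perturbations Δb discussed in the text.

    import numpy as np

    A = np.diag([1.0, 1e-6])        # assumed A, consistent with x = (1,1)^T and kappa_inf(A) = 1e6
    b = np.array([1.0, 1e-6])       # assumed b
    x = np.linalg.solve(A, b)

    kappa_inf = np.linalg.norm(A, np.inf) * np.linalg.norm(np.linalg.inv(A), np.inf)

    for db in (np.array([1e-6, 0.0]), np.array([0.0, 1e-6])):
        y = np.linalg.solve(A, b + db)
        lhs = np.linalg.norm(y - x, np.inf) / np.linalg.norm(x, np.inf)
        rhs = kappa_inf * np.linalg.norm(db, np.inf) / np.linalg.norm(b, np.inf)
        print(lhs, rhs)   # first db: bound is a gross overestimate; second db: bound essentially attained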
  • 115. 2.6. The Sensitivity of Square Systems 91 2.6.5 More Refined Bounds An interesting refinement of Theorem 2.6.2 results if we extend the notion of absolute value to matrices: This notation together with a matrix-level version of ":'.S" makes it easy to specify componentwise error bounds. If F, G E Rmxn, then IFI :'.S IGI for all i and j. Also note that if F E Rmxq and G E Rqxn, then IFGI :'.S IFI · IGI. With these definitions and facts we obtain the following refinement of Theorem 2.6.2. Theorem 2.6.3. Suppose Ax = b, and that l�AI :'.S EIAI and l�bl :'.S Elbl. If81too(A) = r < 1, then (A+�A) is nonsingular and (2.6.12) Proof. Since II �A lloo :'.S Ell A lloo and II �b lloo :'.S Ell b lloo the conditions of Lemma 2.6.1 are satisfied in the infinity norm. This implies that A + �A is nonsingular and Now using (2.6.11) we find II y 1100 1 + r -- < -­ II X lloo - 1 - r· jy - xi :'.S IA-1 1 l�bl + IA-1 1 l�AI IYI :'.S EIA-1 1 lbl + EIA-11 IAI IYI :'.S EIA-1 1 IAI (lxl + jyl) . If we take norms, then ( 1 + r ) ll y - x lloo :'.S Ell lA-1l lAl ll00 ll x lloo + 1 _ r ll x lloo · The theorem follows upon division by II x II00. 0 The quantity II IA-11 IAI 1100 is known as the Skeel condition number and there are examples where it is considerably less than 1t00(A). In these situations, (2.6.12) is more informative than (2.6.10). Norm bounds are frequently good enough when assessing error, but sometimes it is desirable to examine error at the component level. Oettli and Prager (1964) have an interesting result that indicates if an approximate solution x E Rn to the n-by-n
  • 116. 92 Chapter 2. Matrix Analysis system Ax = b satisfies a perturbed system with prescribed structure. Consider the problem of finding �A E Rnxn, �b E Rn, and w ;::: 0 such that (A + �A)x = b + �b l�AI � wlEI , l�bl � wl/I . (2.6.13) where E E Rnxn and f E Rn are given. With proper choice of E and /, the perturbed system can take on certain qualities. For example, if E = A and f = b and w is small, then x satisfies a nearby system in the componentwise sense. The authors show that for a given A, b, x, E, and f the smallest w possible in (2.6.13) is given by IAx - bli Wmin = (IEI · lxl + l/l)i . If Ax = b, then Wmin = 0. On the other hand, if Wmin = oo, then x does not satisfy any system of the prescribed perturbation structure. Problems P2.6.1 Show that if II I II � 1, then 1t(A) � 1. P2.6.2 Show that for a given norm, 11:(AB) :::; 1t(A)1t(B) and that 1t(oA) = it(A) for all nonzero o. P2.6.3 Relate the 2-norm condition of X E ll'"'xn (m � n) to the 2-norm condition of the matrices and C = [ � ] · P2.6.4 Suppose A E ll"x n is nonsingular. Assume for a particular i and j that there is no way to make A singular by changing the value of aii · What can you conclude about A-1? Hint: Use the Sherman-Morrison formula. P2.6.5 Suppose A E Rnxn is nonsingular, b E Rn , Ax = b, and C = A-1 . Use the Sherman-Morrison formula to show that Notes and References for §2.6 The condition concept is thoroughly investigated in: J. Rice (1966). "A Theory of Condition," SIAM J. Nu.mer. Anal. 3, 287-310. W. Kahan (1966). "Numerical Linear Algebra," Canadian Math. Bull. 9, 757-801. References for componentwise perturbation theory include: W. Oettli and W. Prager (1964). "Compatibility of Approximate Solutions of Linear Equations with Given Error Bounds for Coefficients and Right Hand Sides," Nu.mer. Math. 6, 405-409. J.E. Cope and B.W. Rust (1979). "Bounds on Solutions of Systems with Accurate Data," SIAM J. Nu.mer. Anal. 16, 95Q-63. R.D. Skeel (1979). "Scaling for Numerical Stability in Gaussian Elimination," J. ACM 26, 494-526. J.W. Demmel (1992). "The Componentwise Distance to the Nearest Singular Matrix," SIAM J. Matrix Anal. Applic. 13, 10--19. D.J. Higham and N.J. Higham (1992). "Componentwise Perturbation Theory for Linear Systems with Multiple Right-Hand Sides," Lin. Alg. Applic. 174, 111- 129. N.J. Higham (1994). "A Survey of Componentwise Perturbation Theory in Numerical Linear Algebra," in Mathematics of Computation 1943-1993: A Half Century of Computational Mathematics, W. Gautschi (ed.), Volume 48 of Proceedings of Symposia in Applied Mathematics, American Mathe­ matical Society, Providence, RI.
  • 117. 2.7. Finite Precision Matrix Computations 93 s. Chandrasekaren and l.C.F. Ipsen (1995). "On the Sensitivity of Solution Components in Linear Systems of Equations," SIAM J. Matrix Anal. Applic. 16, 93- 112. S.M. Rump (1999). "Ill-Conditioned Matrices Are Componentwise Near to Singularity," SIAM Review 41, 102-112. The reciprocal of the condition number measures how near a given Ax = b problem is to singularity. The importance of knowing how near is a given problem to a difficult or insoluble problem has come to be appreciated in many computational settings, see: A. Laub(1985). "Numerical Linear Algebra Aspects of Control Design Computations,'' IEEE Trans. Autom. Control. AC-30, 97-108. J.W. Demmel (1987). "On the Distance to the Nearest Ill-Posed Problem,'' Numer. Math. 51, 251-289. N.J. Higham (1989). "Matrix Nearness Problems and Applications," in Applications ofMatrix Theory, M.J.C. Gover and S. Barnett (eds.), Oxford University Press, Oxford, UK, 1-27. Much has been written about problem sensitivity from the statistical point of view, see: J.W. Demmel (1988). "The Probability that a Numerical Analysis Problem is Difficult," Math. Com­ put. 50, 449-480. G.W. Stewart (1990). "Stochastic Perturbation Theory," SIAM Review 82, 579-610. C. S. Kenney, A.J. Laub, and M.S. Reese (1998). "Statistical Condition Estimation for Linear Sys­ tems," SIAM J. Sci. Comput. 1.9, 566-583. The problem of minimizing ii:z(A + UVT) where UVT is a low-rank matrix is discussed in: C. Greif and J.M. Varah (2006). "Minimizing the Condition Number for Small Rank Modifications," SIAM J. Matrix Anal. Applic. 2.9, 82 97. 2.7 Finite Precision Matrix Computations Rounding errors are part ofwhat makes the field of matrix computations so challenging. In this section we describe a model of floating point arithmetic and then use it to develop error bounds for floating point dot products, saxpys, matrix-vector products, and matrix-matrix products. 2.7.1 A 3-digit Calculator Suppose we have a base-10 calculator that represents nonzero numbers in the following style: where { 1 � do � 9, 0 � di � 9, 0 � d2 � 9, -9 � e � 9. Let us call these numbers floating point numbers. After playing around a bit we make a number of observations: • The precision of the calculator has to do with the "length" of the significand do.d1d2. For example, the number 7r would be represented as 3.14 x 10°, which has a relative error approximately equal to 10-3. • There is not enough "room" to store exactly the results from most arithmetic operations between floating point numbers. Sums and products like (1.23 x 106) + (4.56 x 104) = 1275600, (1.23 x 101) * (4.56 x 102) = 5608.8
  • 118. 94 Chapter 2. Matrix Analysis involve more than three significant digits. Results must be rounded in order to "fit" the 3-digit format, e.g., round{1275600) = 1.28 x 106, round{5608.8) = 5.61 x 103. • If zero is to be a floating point number (and it must be), then we need a special convention for its representation, e.g., 0.00 x 10°. • In contrast to the real numbers, there is a smallest positive floating point number {Nmin = 1.00x 10-9) and there is a largest positive floating point number (Nmax = 9.99 x 109). • Some operations yield answers whose exponents exceed the 1-digit allocation, e.g., {1.23 x 104) * (4.56 x 107) and {1.23 x 10-2)/(4.56 x 108). • The set of floating point numbers is finite. For the toy calculator there are 2 x 9 x 10 x 10 x 19 + 1 = 34201 floating point numbers. • The spacing between the floating point numbers varies. Between 1.00 x 10e and 1.00 x 10e+l the spacing is 10e-2• The careful design and analysis of a floating point computation requires an understand­ ing of these inexactitudes and limitations. How are results rounded? How accurate is floating point arithmetic? What can we say about a sequence of floating point operations? 2.7.2 I EEE Floating Point Arithmetic To build a solid, practical understanding of finite precision computation, we set aside our toy, motivational base-10 calculator and consider the key ideas behind the widely accepted IEEE floating point standard. The IEEE standard includes a 32-bit single format and a 64-bit double format. We will illustrate concepts using the latter as an example because typical accuracy requirements make it the format of choice. The importance of having a standard for floating point arithmetic that is upheld by hardware manufacturers cannot be overstated. After all, floating point arithmetic is the foundation upon which all of scientific computing rests. The IEEE standard pro­ motes software reliability and enables numerical analysts to make rigorous statements about computed results. Our discussion is based on the excellent book by Overton {2001). The 64-bit double format allocates a single bit for the sign of the floating point number, 52 bits for the mantissa , and eleven bits for the exponent: {2.7.1) The "formula" for the value of this representation depends upon the exponent bits: If ai . . . an is neither all O's nor all 1's, then x is a normalized floating point number with value {2.7.2) The "1023 biaK' in the exponent supports the graceful inclusion of various "unnormal­ ized" floating numbers which we describe shortly. Several important quantities capture
  • 119. 2.7. Finite Precision Matrix Computations 95 the finiteness of the representation. The machine epsilon is the gap between 1 and the next largest floating point number. Its value is 2-52 � 10-15 for the double format. Among the positive normalized floating point numbers, Nmin = 2-1022 � 10-308 is the smallest and Nmax = (2 - 2-52)21023 � 10308 is the largest. A real number x is within the normalized range if Nmin � lxl � Nmax· If a1 . . . au is all O's, then the value of the representation (2.7.1) is (2.7.3) This includes 0 and the subnormal floating point numbers. This feature creates a uniform spacing of the floating point numbers between -Nmin and +Nmin· If a1 . . . au is all l's, then the encoding (2.7.1) represents inf for +oo, -inf for -oo, or NaN for "not-a-number." The determining factor is the value of the bi. (If the bi are not all zero, then the value of x is NaN.) Quotients like 1/0, -1/0, and 0/0 produce these special floating point numbers instead of prompting program termination. There are four rounding modes: round down (toward -oo), round up (toward +oo), round-toward-zero, and round-toward-nearest. We focus on round-toward-nearest since it is the mode ahnost always used in practice. then If a real number x is outside the range of the normalized floating point numbers { -OO if X < -Nmax , round(x) = +oo if X > Nmax· Otherwise, the rounding process depends upon its floating point "neighbors" : x_ is the nearest floating point number to x that is � x, x+ is the nearest floating point number to x that is 2::: x. Define d_ = x - x _ and d+ = x+ - x and let "lsb" stand for "least significant bit." If Nmin � lxl � Nmax1 then ( ) { x_ if d- < d+ or d_ = d+ and lsb(x-) = 0, round x = x+ if d+ < d_ or d+ = d_ and lsb(x+) = O. The tie-breaking criteria is well-defined because x_ and x+ are adjacent floating point numbers and so must differ in their least significant bit. Regarding the accuracy of the round-to-nearest strategy, suppose x is a real num­ ber that satisfies Nmin � lxl � Nmax· Thus, 2-52 2-52 lround(x) - xi � - 2 -2e � - 2 -lxl which says that relative error is bounded by half of the machine epsilon: lround(x) - xi 2_53 lxl � · The IEEE standard stipulates that each arithmetic operation be correctly rounded, meaning that the computed result is the rounded version of the exact result. The implementation of correct rounding is far from trivial and requires registers that are equipped with several extra bits of precision. We mention that the IEEE standard also requires correct rounding in the square root operation, the remainder operation, and various format conversion operations.
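The quantities just described are easy to inspect from any language that uses the IEEE double format. The fragment below is a Python/NumPy sketch added for illustration; np.finfo simply reports the format parameters, and the last two lines show the special values produced in lieu of program termination.

    import numpy as np

    fi = np.finfo(np.float64)
    print(fi.eps)     # machine epsilon 2^-52 ~ 2.2e-16 (gap between 1 and the next float)
    print(fi.tiny)    # Nmin = 2^-1022 ~ 2.2e-308, smallest positive normalized number
    print(fi.max)     # Nmax = (2 - 2^-52) 2^1023 ~ 1.8e308

    # round-to-nearest (ties to even): 1 + eps/2 lies halfway between 1 and 1 + eps and rounds to 1
    print(1.0 + fi.eps > 1.0, 1.0 + fi.eps / 2 == 1.0)

    with np.errstate(divide='ignore', invalid='ignore'):
        print(np.float64(1.0) / np.float64(0.0))   # inf
        print(np.float64(0.0) / np.float64(0.0))   # nan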
  • 120. 96 Chapter 2. Matrix Analysis 2.7.3 The "fl" Notation With intuition gleaned from the toy calculator example and an understanding of IEEE arithmetic, we are ready to move on to the roundoff analysis of some basic algebraic calculations. The challenge when presenting the effects of finite precision arithmetic in this section and throughout the book is to communicate essential behavior without excessive detail. To that end we use the notation fl(· ) to identify a floating point storage and/or computation. Unless exceptions are a critical part of the picture, we freely invoke the fl notation without mentioning "-oo," "oo," "NaN," etc. If x E IR, then fl(x) is its floating point representation and we assume that fl(x) = x(l + 8), (2.7.4) where u is the unit roundoff defined by u = � x (gap between 1 and next largest floating point number). (2.7.5) The unit roundoff for IEEE single format is about 10-7 and for double format it is about 10-16. If x and y are floating point numbers and "op" is any of the four arithmetic oper­ ations, then fl(x op y) is the floating point result from the floating point op. Following Trefethen and Bau (NLA), the fundamental axiom offloating point arithmetic is that fl(x op y) = (x op y)(l + 8), 181 :::; u, (2.7.6) where x and y are floating point numbers and the "op" inside the fl operation means "floating point operation." This shows that there is small relative error associated with individual arithmetic operations: lfl(x op y) - (x op y)I < u, Ix op YI - x op y =f. 0. Again, unless it is particularly relevant to the discussion, it will be our habit not to bring up the possibilities of an exception arising during the floating point operation. 2.7.4 Become a Floating Point Thinker It is a good idea to have a healthy respect for the subleties of floating point calculation. So before we proceed with our first serious roundoff error analysis we offer three maxims to keep in mind when designing a practical matrix computation. Each reinforces the distinction between computer arithmetic and exact arithmetic. Maxim 1 . Order is Important. Floating point arithmetic is not associative. For example, suppose x = 1.24 x 10°, y = -1.23 x 10°, z = 1.00 x 10-3, Using toy calculator arithmetic we have fl(fl(x + y) + z)) = 1.10 x 10-2
  • 121. 2.7. Finite Precision Matrix Computations 97

while

    fl(x + fl(y + z)) = 1.00 × 10^{-2}.

A consequence of this is that mathematically equivalent algorithms may produce different results in floating point.

Maxim 2. Larger May Mean Smaller. Suppose we want to compute the derivative of f(x) = sin(x) using a divided difference. Calculus tells us that d = (sin(x + h) − sin(x))/h satisfies |d − cos(x)| = O(h), which argues for making h as small as possible. On the other hand, any roundoff error sustained in the sine evaluations is magnified by 1/h. By setting h = √u, the sum of the calculus error and the roundoff error is approximately minimized. In other words, a value of h much greater than u renders a much smaller overall error. See Overton (2001, pp. 70-72).

Maxim 3. A Math Book Is Not Enough. The explicit coding of a textbook formula is not always the best way to design an effective computation. As an example, we consider the quadratic equation x² − 2px − q = 0 where both p and q are positive. Here are two methods for computing the smaller (necessarily real) root:

    Method 1:  r_min = p − √(p² + q),

    Method 2:  r_min = −q / (p + √(p² + q)).

The first method is based on the familiar quadratic formula while the second uses the fact that −q is the product of r_min and the larger root. Using IEEE double format arithmetic with input p = 12345678 and q = 1 we obtain these results:

    Method 1:  r_min = −4.097819328308106 × 10^{-8},
    Method 2:  r_min = −4.050000332100021 × 10^{-8}  (correct).

Method 1 produces an answer that has almost no correct significant digits. It attempts to compute a small number by subtracting a pair of nearly equal large numbers. Almost all correct significant digits in the input data are lost during the subtraction, a phenomenon known as catastrophic cancellation. In contrast, Method 2 produces an answer that is correct to full machine precision. It computes a small number as a division of one number by a much larger number. See Forsythe (1970).

Keeping these maxims in mind does not guarantee the production of accurate, reliable software, but it helps.
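These effects are easy to reproduce in IEEE double format. The Python fragment below is an illustrative sketch only (the 3-digit toy-calculator data are replaced by a double precision analogue): it prints the unit roundoff, exhibits the non-associativity of Maxim 1, and compares the two root-finding methods of Maxim 3 for p = 12345678, q = 1.

    import sys
    import math

    print(sys.float_info.epsilon / 2)        # unit roundoff u = 2**-53, about 1.1e-16

    # Maxim 1: floating point addition is not associative.
    x, y, z = 1.24, -1.23, 1.0e-16
    print((x + y) + z == x + (y + z))        # False

    # Maxim 3: two ways to compute the smaller root of t**2 - 2*p*t - q = 0.
    p, q = 12345678.0, 1.0
    r1 = p - math.sqrt(p * p + q)            # Method 1: catastrophic cancellation
    r2 = -q / (p + math.sqrt(p * p + q))     # Method 2: accurate
    print(r1)                                # about -4.0978e-08 (few correct digits)
    print(r2)                                # about -4.05000003e-08 (correct)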
  • 122. 98 Chapter 2. Matrix Analysis

2.7.5 Application: Storing a Real Matrix

Suppose A ∈ IR^{m×n} and that we wish to quantify the errors associated with its floating point representation. Denoting the stored version of A by fl(A), we see that

    [fl(A)]_{ij} = a_{ij}(1 + ε_{ij}),    |ε_{ij}| ≤ u        (2.7.7)

for all i and j, i.e.,

    |fl(A) − A| ≤ u|A|.

A relation such as this can be easily turned into a norm inequality, e.g., ‖fl(A) − A‖_1 ≤ u‖A‖_1. However, when quantifying the rounding errors in a matrix manipulation, the absolute value notation is sometimes more informative because it provides a comment on each entry.

2.7.6 Roundoff in Dot Products

We begin our study of finite precision matrix computations by considering the rounding errors that result in the standard dot product algorithm:

    s = 0
    for k = 1:n
        s = s + x_k y_k
    end                                                         (2.7.8)

Here, x and y are n-by-1 floating point vectors.

In trying to quantify the rounding errors in this algorithm, we are immediately confronted with a notational problem: the distinction between computed and exact quantities. If the underlying computations are clear, we shall use the fl(·) operator to signify computed quantities. Thus, fl(xᵀy) denotes the computed output of (2.7.8).

Let us bound |fl(xᵀy) − xᵀy|. If

    s_p = fl( Σ_{k=1}^{p} x_k y_k ),

then s_1 = x_1 y_1 (1 + δ_1) with |δ_1| ≤ u and for p = 2:n

    s_p = fl(s_{p−1} + fl(x_p y_p)) = (s_{p−1} + x_p y_p (1 + δ_p))(1 + ε_p),    |δ_p|, |ε_p| ≤ u.

A little algebra shows that

    fl(xᵀy) = s_n = Σ_{k=1}^{n} x_k y_k (1 + γ_k)        (2.7.9)

where

    1 + γ_k = (1 + δ_k) Π_{j=k}^{n} (1 + ε_j)

with the convention that ε_1 = 0. Thus,

    |fl(xᵀy) − xᵀy| ≤ Σ_{k=1}^{n} |x_k y_k| |γ_k|.        (2.7.10)
  • 123. 2.7. Finite Precision Matrix Computations 99 To proceed further, we must bound the quantities l'YkI in terms of u. The following result is useful for this purpose. n Lemma 2.7.1. If (1 + o:) = IT(1 + O:k) where lo:kl $ u and nu $ .01, then lo:I $ k=l 1.0lnu. Proof. See Higham (ASNA, p. 75). D Application of this result to (2.7.10) under the "reasonable" assumption nu $ .01 gives lfl(xTy) - xTYI $ 1.0lnulxlTIYI· (2.7.11) Notice that if lxTYI « lxlTlyl, then the relative error in fl(xTy) may not be small. 2. 7.7 Alternative Ways to Quantify Roundoff Error An easier but less rigorous way ofbounding o: in Lemma 2.7.1 is to say lo:I $ nu+O(u2). With this convention we have (2.7.12) Other ways of expressing the same result include lfl(xTy) - xTYI $ </>(n)ulxlTIYI (2.7.13) and (2.7.14) where </>(n) is a "modest" function of n and c is a constant of order unity. We shall not express a preference for any of the error bounding styles shown in {2.7.11)-(2.7.14). This spares us the necessity of translating the roundoff results that appear in the literature into a fixed format. Moreover, paying overly close attention to the details of an error bound is inconsistent with the "philosophy" of roundoff analysis. As Wilkinson {1971, p. 567) says, There is still a tendency to attach too much importance to the precise error bounds obtained by an a priori error analysis. In my opinion, the bound itself is usually the least important part of it. The main object of such an analysis is to expose the potential instabilities, if any, of an algorithm so that hopefully from the insight thus obtained one might be led to improved algorithms. Usually the bound itself is weaker than it might have been because of the necessity of restricting the mass of detail to a reasonable level and because of the limitations imposed by expressing the errors in terms of matrix norms. A priori bounds are not, in general, quantities that should be used in practice. Practical error bounds should usually be determined by some form of a posteriori error analysis, since this takes full advantage of the statistical distribution of rounding errors and of any special features, such as sparseness, in the matrix. It is important to keep these perspectives in mind.
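A small experiment makes a bound like (2.7.11) concrete and also illustrates Wilkinson's point that such bounds are usually very pessimistic. The Python sketch below is illustrative only: the vectors are random, the exact dot product is obtained with rational arithmetic, recursion (2.7.8) is run in double precision, and the observed error is compared with the first-order quantity n·u·|x|ᵀ|y|.

    import sys
    import random
    from fractions import Fraction

    random.seed(1)
    n = 200
    x = [random.uniform(-1, 1) for _ in range(n)]
    y = [random.uniform(-1, 1) for _ in range(n)]

    s = 0.0                                  # recursion (2.7.8) in double precision
    for xk, yk in zip(x, y):
        s = s + xk * yk

    exact = sum(Fraction(xk) * Fraction(yk) for xk, yk in zip(x, y))
    err = abs(Fraction(s) - exact)

    u = sys.float_info.epsilon / 2
    bound = n * u * sum(abs(xk * yk) for xk, yk in zip(x, y))   # n*u*|x|'|y|, cf. (2.7.12)
    print(float(err), bound, float(err) <= bound)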
  • 124. 100 Chapter 2. Matrix Analysis 2.7.8 Roundoff in Other Basic Matrix Computations It is easy to show that if A and B are floating point matrices and a is a floating point number, then fl(aA) = aA + E, IEI $ ulaAI, (2.7.15) and fl(A + B) = (A + B) + E, IEI $ ulA + BI. (2.7.16) As a consequence of these two results, it is easy to verify that computed saxpy's and outer product updates satisfy fl(y + ax) = y + ax + z, fl(C + uvT) = C + uvT + E, lzl $ u (IYI + 2laxl) + O(u2), IEI $ u (ICI + 2luvTI) + O(u2). (2.7.17) (2.7.18) Using (2.7.11) it is easy to show that a dot-product-based multiplication of two floating point matrices A and B satisfies fl(AB) = AB + E, IEI $ nulAllBI + O(u2). (2.7.19) The same result applies if a gaxpy or outer product based procedure is used. Notice that matrix multiplication does not necessarily give small relative error since IABI may be much smaller than IAllBI, e.g., [ 1 1 ] [ 1 0 ] = [ .01 0 ] 0 0 -.99 0 0 0 . It is easy to obtain norm bounds from the roundoff results developed thus far. If we look at the 1-norm error in floating point matrix multiplication, then it is ea..<>y to show from (2.7.19) that (2.7.20) 2.7.9 Forward and Backward Error Analyses Each roundoff bound given above is the consequence of a forward error analysis. An alternative style of characterizing the roundoff errors in an algorithm is accomplished through a technique known as backward error analysis. Here, the rounding errors are related to the input data rather than the answer. By way of illustration, consider the n = 2 version of triangular matrix multiplication. It can be shown that: [a11b11(1 + f1) (a11b12(l + f2) + ai2b22(l + f3))(l + f4) l fl(AB) = 0 a22�2(l + Es) where kil $ u, for i = 1:5. However, if we define
  • 125. 2.7. Finite Precision Matrix Computations and , _ [bn(l + Ei ) b12(l + E2)(l + E4) l B - ' 0 b22 then it is easily verified that fl(AB) = AB. Moreover, A = A + E, fJ = B + F, IEI < 2ulAI + O(u2), !Fl < 2ulBI + O(u2). 101 which shows that the computed product is the exact product of slightly perturbed A and B. 2.7.10 Error in Strassen Multiplication In §1.3.11 we outlined a recursive matrix multiplication procedure due to Strassen. It is instructive to compare the effect of roundoff in this method with the effect of roundoff in any of the conventional matrix multiplication methods of §1.1. It can be shown that the Strassen approach (Algorithm 1.3.1) produces a C = fl(AB) that satisfies an inequality �f the form (2.7.20). This is perfectly satisfactory in many applications. However, the C that Strassen's method produces does not always satisfy an inequality of the form (2.7.19). To see this, suppose that A = B = [ .99 .0010 l .0010 .99 and that we execute Algorithm 1.3.1 using 2-digit floating point arithmetic. Among other things, the following quantities are computed: F3 = fl(.99(.001 - .99)) = -.98, A = fl((.99 + .001).99) = .98, C12 = fl(F3 + A) = 0.0. In exact arithmetic c12 = 2(.001)(.99) = .00198 and thus Algorithm 1.3.l produces a c12 with no correct significant digits. The Strassen approach gets into trouble in this example because small off-diagonal entries are combined with large diagonal entries. Note that in conventional matrix multiplication the sums b12 +b22 and an +ai2 do not arise. For that reason, the contribution of the small off-diagonal elements is not lost in this example. Indeed, for the above A and B a conventional matrix multiplication gives C12 = .0020. Failure to produce a componentwise accurate C can be a serious shortcoming in some applications. For example, in Markov processes the aij, bij, and Cij are transition probabilities and are therefore nonnegative. It may be critical to compute Cij accurately if it reflects a particularly important probability in the modeled phenomenon. Note t�at if A � 0 and B � 0, then conventional matrix multiplication produces a product C that has small componentwise relative error: IC - Cl :::; nulAI IBI + O(u2) = nulCI + O(u2) .
  • 126. 102 Chapter 2. Matrix Analysis This follows from (2.7.19). Because we cannot say the same for the Strassen approach, we conclude that Algorithm 1.3.1 is not attractive for certain nonnegative matrix mul­ tiplication problems if relatively accurate Ci; are required. Extrapolating from this discussion we reach two fairly obvious but important conclusions: • Different methods for computing the same quantity can produce substantially different results. • Whether or not an algorithm produces satisfactory results depends upon the type of problem solved and the goals of the user. These observations are clarified in subsequent chapters and are intimately related to the concepts of algorithm stability and problem condition. See §3.4.10. 2.7.11 Analysis of an Ideal Equation Solver A nice way to conclude this chapter and to anticipate the next is to analyze the quality of a "make-believe" Ax = b solution process in which all floating point operations are performed exactly except the storage of the matrix A and the right-hand-side b. It follows that the computed solution x satisfies where (A + E)x = (b + e), II E llao :::; u II A llao. II e lloo :::; u II b lloo . fl(b) = b + e, fl(A) = A + E. If u 1t00(A) :::; ! (say), then by Theorem 2.6.2 it can be shown that 11x - x lloo II X lloo :::; 4uKao(A) . (2.7.21) (2.7.22) The bounds (2.7.21) and (2.7.22) are "best possible" norm bounds. No general oo­ norm error analysis of a linear equation solver that requires the storage of A and b can render sharper bounds. As a consequence, we cannot justifiably criticize an algorithm for returning an inaccurate x if A is ill-conditioned relative to the unit roundoff, e.g., u1t00(A) � 1. On the other hand, we have every "right" to pursue the development of a linear equation solver that renders the exact solution to a nearby problem in the style of (2.7.21). Problems P2.7.1 Show that if (2.7.8) is applied with y = x, then fl(xTx) = xTx(l + a) where lal $ nu + O(u2). P2.7.2 Prove (2.7.4) assuming that fl(x) is the nearest floating point number to x E R. P2.7.3 Show that if E E Rmxn with m � n, then 11 IEI 112 $ v'nll E 112. This result is useful when deriving norm bounds from absolute value bounds. P2.7.4 Assume the existence of a square root function satisfying fl(JX) = JX(l + e) with lei $ u. Give an algorithm for computing II x 112 and bound the rounding errors. P2.7.5 Suppose A and B are n-by-n upper triangular floating point matrices. If 6 = fl(AB) is computed using one of the conventional §1.1 algorithms, does it follow that 6 = AB where A and fJ are close to A and B?
  • 127. 2.7. Finite Precision Matrix Computations 103 P2.7.6 Suppose A and B are n-by-n floating point matrices and that II IA-1 J JAi lloo = T. Show that if (J = fl(AB) is obtained using any of the §1.1 algorithms, then there exists a B so that C = AB and JI f:J _ B lloo � nur lJ B lloo + O(u2). P2.7.7 Prove (2.7.19). P2.7.8 For the IEEE double format, what is the largest power of 10 that can be represented exactly? What is the largest integer that can be represented exactly? P2.7.9 For k = 1:62 , what is the largest power of 10 that can be stored exactly if k bits are are allocated for the mantissa and 63 - k are allocated for the exponent? P2.7.10 Consider the quadratic equation This quadratic has two real roots r1 and r2. Assume that Jri - zl � Jr2 - zl. Give an algorithm that computes r1 to full machine precision. Notes and References for §2.7 For an excellent, comprehensive treatment of IEEE arithmetic and its implications, see: M.L. Overton (2001). Numerical Computing with IEEE Arithmetic, SIAM Publications, Philadelphia, PA. The following basic references are notable for the floating point insights that they offer: Wilkinson (AEP), Stewart (IMC), Higham (ASNA), and Demmel (ANLA). For high-level perspectives we rec­ ommend: J.H. Wilkinson (1963). Rounding Errors in Algebraic Processes, Prentice-Hall, Englewood Cliffs, NJ. G.E. Forsythe (1970). "Pitfalls in Computation or Why a Math Book is Not Enough," Amer. Math. Monthly 77, 931-956. J.H. Wilkinson (1971). "Modern Error Analysis," SIAM Review 13, 548-68. U.W. Kulisch and W.L. Miranker (1986). "The Arithmetic of the Digital Computer," SIAM Review 28, 1-40. F. Chaitin-Chatelin and V. Fraysee (1996). Lectures on Finite Precision Computations, SIAM Pub­ lications, Philadelphia, PA. The design of production software for matrix computations requires a detailed understanding of finite precision arithmetic, see: J.W. Demmel (1984). "Underflow and the Reliability of Numerical Software," SIAM J. Sci. Stat. Comput. 5, 887-919. W.J. Cody (1988). "ALGORITHM 665 MACHAR: A Subroutine to Dynamically Determine Machine Parameters," ACM Trans. Math. Softw. 14, 303-311. D. Goldberg (1991). "What Every Computer Scientist Should Know About Floating Point Arith- metic," ACM Surveys 23, 5-48. Other developments in error analysis involve interval analysis, the building of statistical models of roundoff error, and the automating of the analysis itself: J. Larson and A. Sameh (1978). "Efficient Calculation of the Effects of Roundoff Errors," ACM Trans. Math. Softw. 4, 228-36. W. Miller and D. Spooner (1978). "Software for Roundoff Analysis, II," ACM Trans. Math. Softw. 4, 369-90. R.E. Moore (1979). Methods and Applications ofInterval Analysis, SIAM Publications, Philadelphia, PA. J.M. Yohe (1979). "Software for Interval Arithmetic: A Reasonable Portable Package," ACM Trans. Math. Softw. 5, 50-63. The accuracy of floating point summation is detailed in: S.M. Rump, T. Ogita, and S. Oishi (2008). "Accurate Floating-Point Summation Part I: Faithful Rounding," SIAM J. Sci. Comput. 31, 189-224.
  • 128. 104 Chapter 2. Matrix Analysis S.M. Rump, T. Ogita, and S. Oishi (2008). "Accurate Floating-Point Summation Part II: Sign, K-fold Faithful and Rounding to Nearest," SIAM J. Sci. Comput. 31, 1269-1302. For an analysis of the Strassen algorithm and other "fast" linear algebra procedures, see: R.P. Brent (1970). "Error Analysis of Algorithms for Matrix Multiplication and Triangular Decom­ position Using Winograd's Identity," Numer. Math. 16, 145-156. W. Miller (1975). "Computational Complexity and Numerical Stability," SIAM J. Comput. 4, 97-107. N.J. Higham (1992). "Stability of a Method for Multiplying Complex Matrices with Three Real Matrix Multiplications," SIAM J. Matrix Anal. Applic. 13, 681-687. J.W. Demmel and N.J. Higham (1992). "Stability of Block Algorithms with Fast Level-3 BLAS," ACM 7rans. Math. Softw. 18, 274-291. B. Dumitrescu (1998). "Improving and Estimating the Accuracy of Strassen's Algorithm," Numer. Math. 79, 485-499. The issue of extended precision has received considerable attention. For example, a superaccurate dot product results if the summation can be accumulated in a register that is "twice as wide" as the floating representation of vector components. The overhead may be tolerable in a given algorithm if extended precision is needed in only a few critical steps. For insights into this topic, see: R.P. Brent (1978). "A Fortran Multiple Precision Arithmetic Package," ACM 7rans. Math. Softw. 4, 57-70. R.P. Brent (1978). "Algorithm 524 MP, a Fortran Multiple Precision Arithmetic Package," ACM TI-ans. Math. Softw. 4, 71-81. D.H. Bailey (1993). "Algorithm 719: Multiprecision Translation and Execution of FORTRAN Pro­ grams," ACM 7rans. Math. Softw. 19, 288··319. X.S. Li, J.W. Demmel, D.H. Bailey, G. Henry, Y. Hida, J. lskandar, W. Kahan, S.Y. Kang, A. Kapur, M.C. Martin, B.J. Thompson, T. Tung, and D.J. Yoo (2002). "Design, Implementation and Testing of Extended and Mixed Precision BLAS," ACM 7rans. Math. Softw. 28, 152-205. J.W. Demmel and Y. Hida (2004). "Accurate and Efficient Floating Point Summation," SIAM J. Sci. Comput. 25, 1214-1248. M. Baboulin, A. Buttari, J. Dongarra, J. Kurzak, J. Langou, J. Langou, P. Luszczek, and S. Tomov (2009). "Accelerating Scientific Computations with Mixed Precision Algorithms," Comput. Phys. Commun. 180, 2526-2533.
  • 129. Chapter 3 General Linear Systems 3.1 Triangular Systems 3.2 The LU Factorization 3.3 Roundoff Error in Gaussian Elimination 3.4 Pivoting 3.5 Improving and Estimating Accuracy 3.6 Parallel LU The problem of solving a linear system Ax = b is central to scientific computation. In this chapter we focus on the method of Gaussian elimination, the algorithm of choice if A is square, dense, and unstructured. Other methods are applicable if A does not fall into this category, see Chapter 4, Chapter 11, §12.1, and §12.2. Solution procedures for triangular systems are discussed first. These are followed by a derivation of Gaussian elimination that makes use of Gauss transformations. The process of eliminating unknowns from equations is described in terms of the factorization A = LU where L is lower triangular and U is upper triangular. Unfortunately, the derived method behaves poorly on a nontrivial class of problems. An error analysis pinpoints the difficulty and sets the stage for a discussion of pivoting, a permutation strategy that keeps the numbers "nice" during the elimination. Practical issues associated with scaling, iterative improvement, and condition estimation are covered. A framework for computing the LU factorization in parallel is developed in the final section. Reading Notes Familiarity with Chapter 1, §§2.1-2.5, and §2.7 is assumed. The sections within this chapter depend upon each other as follows: §3.5 t §3.1 --+ §3.2 --+ §3.3 --+ §3.4 .!. §3.6 105
  • 130. 106 Chapter 3. General Linear Systems

Useful global references include Forsythe and Moler (SLAS), Stewart (MABD), Higham (ASNA), Watkins (FMC), Trefethen and Bau (NLA), Demmel (ANLA), and Ipsen (NMA).

3.1 Triangular Systems

Traditional factorization methods for linear systems involve the conversion of the given square system to a triangular system that has the same solution. This section is about the solution of triangular systems.

3.1.1 Forward Substitution

Consider the following 2-by-2 lower triangular system:

    [ ℓ_11   0    ] [ x_1 ]   =   [ b_1 ]
    [ ℓ_21  ℓ_22  ] [ x_2 ]       [ b_2 ].

If ℓ_11 ℓ_22 ≠ 0, then the unknowns can be determined sequentially:

    x_1 = b_1/ℓ_11,
    x_2 = (b_2 − ℓ_21 x_1)/ℓ_22.

This is the 2-by-2 version of an algorithm known as forward substitution. The general procedure is obtained by solving the ith equation in Lx = b for x_i:

    x_i = ( b_i − Σ_{j=1}^{i−1} ℓ_ij x_j ) / ℓ_ii.

If this is evaluated for i = 1:n, then a complete specification of x is obtained. Note that at the ith stage the dot product of L(i, 1:i − 1) and x(1:i − 1) is required. Since b_i is involved only in the formula for x_i, the former may be overwritten by the latter.

Algorithm 3.1.1 (Row-Oriented Forward Substitution) If L ∈ IR^{n×n} is lower triangular and b ∈ IR^n, then this algorithm overwrites b with the solution to Lx = b. L is assumed to be nonsingular.

    b(1) = b(1)/L(1, 1)
    for i = 2:n
        b(i) = (b(i) − L(i, 1:i − 1)·b(1:i − 1))/L(i, i)
    end

This algorithm requires n² flops. Note that L is accessed by row. The computed solution x̂ can be shown to satisfy

    (L + F)x̂ = b,    |F| ≤ nu|L| + O(u²).        (3.1.1)

For a proof, see Higham (ASNA, pp. 141-142). It says that the computed solution exactly satisfies a slightly perturbed system. Moreover, each entry in the perturbing matrix F is small relative to the corresponding element of L.
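A direct transcription of Algorithm 3.1.1 into Python/NumPy might look as follows; this is a sketch, the function name forward_subst_row is ours, and the input checking of production software is omitted.

    import numpy as np

    def forward_subst_row(L, b):
        """Row-oriented forward substitution: return the solution of L x = b.
        L is assumed square, lower triangular, and nonsingular."""
        n = L.shape[0]
        b = b.astype(float).copy()
        b[0] = b[0] / L[0, 0]
        for i in range(1, n):
            b[i] = (b[i] - L[i, :i] @ b[:i]) / L[i, i]
        return b

    L = np.array([[2.0, 0.0], [1.0, 3.0]])
    b = np.array([4.0, 5.0])
    print(forward_subst_row(L, b))        # [2. 1.]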
  • 131. 3.1. Triangular Systems 107 3.1.2 Back Substitution The analogous algorithm for an upper triangular system Ux = b is called back substi­ tution. The recipe for Xi is prescribed by and once again bi can be overwritten by Xi . Algorithm 3.1.2 (Row-Oriented Back Substitution) If U E Rnxn is upper triangular and b E Rn, then the following algorithm overwrites b with the solution to Ux = b. U is assumed to be nonsingular. b(n) = b(n)/U(n, n) for i = n - 1: -1:1 b(i) = (b(i) - U(i, i + l:n) ·b(i + l:n))/U(i, i) end This algorithm requires n2 flops and accesses U by row. The computed solution x obtained by the algorithm can be shown to satisfy (U + F)x = b, IFI � nulUI + O(u2). (3.1.2) 3.1.3 Column-Oriented Versions Column-oriented versions of the above procedures can be obtained by reversing loop orders. To understand what this means from the algebraic point of view, consider forward substitution. Once x1 is resolved, it can be removed from equations 2 through n leaving us with the reduced system L(2:n, 2:n)x(2:n) = b(2:n) - x(l) ·L(2:n, 1). We next compute x2 and remove it from equations 3 through n, etc. Thus, if this approach is applied to we find x1 = 3 and then deal with the 2-by-2 system Here is the complete procedure with overwriting. Algorithm 3.1.3 (Column-Oriented Forward Substitution) If the matrix L E Rnxn is lower triangular and b E Rn, then this algorithm overwrites b with the solution to Lx = b. L is assumed to be nonsingular.
  • 132. 108 Chapter 3. General Linear Systems for j = l:n - 1 b(j) = b(j)/L(j, j) b(j + l:n) = b(j + l:n) - b(j) ·L(j + l:n,j) end b(n) = b(n)/L(n, n) It is also possible to obtain a column-oriented saxpy procedure for back substitution. Algorithm 3.1.4 (Column-Oriented Back Substitution) If U E 1Rnxn is upper trian­ gular and b E 1Rn, then this algorithm overwrites b with the solution to Ux = b. U is assumed to be nonsingular. for j = n: - 1:2 b(j) = b(j)/U(j,j) b(l:j - 1) = b(l:j - 1) - b(j) ·U(l:j - 1, j) end b(l) = b(l)/U(l, 1) Note that the dominant operation in both Algorithms 3.1.3 and 3.1.4 is the saxpy operation. The roundoff behavior of these implementations is essentially the same as for the dot product versions. 3.1.4 Multiple Right-Hand Sides Consider the problem of computing a solution X E 1Rnxq to LX = B where L E 1Rnxn is lower triangular and B E 1Rnxq. This is the multiple-right-hand-side problem and it amounts to solving q separate triangular systems, i.e., LX(:, j) = B(:, j), j = l:q. Interestingly, the computation can be blocked in such a way that the resulting algorithm is rich in matrix multiplication, assuming that q and n are large enough. This turns out to be important in subsequent sections where various block factorization schemes are discussed. It is sufficient to consider just the lower triangular case as the derivation of block back substitution is entirely analogous. We start by partitioning the equation LX = B as follows: 0 0 [f�� L�2 LNl LN2 LNN (3.1.3) Assume that the diagonal blocks are square. Paralleling the development of Algorithm 3.1.3, we solve the system L11X1 = B1 for X1 and then remove X1 from block equations 2 through N: Continuing in this way we obtain the following block forward elimination scheme:
  • 133. 3.1. Triangular Systems for j = l:N end Solve L11X1 = Bi for i = j + l:N Bi = Bi - LiiXi end Notice that the i-loop oversees a single block saxpy update of the form [Bi ; +i l [B1 : +1 l [L1�1,1 lXi. BN BN LN,j 109 (3.1.4) To realize level-3 performance, the submatrices in (3.1.3) must be sufficiently large in dimension. 3.1.5 The Level-3 Fraction It is handy to adopt a measure that quantifies the amount of matrix multiplication in a given algorithm. To this end we define the level-3fraction of an algorithm to be the fraction of flops that occur in the context of matrix multiplication. We call such flops level-3 flops. Let us determine the level-3 fraction for (3.1.4) with the simplifying assumption that n = rN. (The same conclusions hold with the unequal blocking described above.) Because there are N applications of r-by-r forward elimination (the level-2 portion of the computation) and n2 flops overall, the level-3 fraction is approximately given by Nr2 1 1 - � = l - N . Thus, for large N almost all flops are level-3 flops. It makes sense to choose N as large as possible subject to the constraint that the underlying architecture can achieve a high level of performance when processing block saxpys that have width r = n/N or greater. 3.1.6 Nonsquare Triangular System Solving The problem of solving nonsquare, m-by-n triangular systems deserves some attention. Consider the lower triangular case when m ;::::: n, i.e., [ Lu ]x L21 Ln E JRnxn, L E R(m-n)xn 21 ' Assume that L11 is lower triangular and nonsingular. If we apply forward elimination to Lux = bi , then x solves the system provided L21 (L!"lb1) = b2. Otherwise, there is no solution to the overall system. In such a case least squares minimization may be appropriate. See Chapter 5. Now consider the lower triangular system Lx = b when the number of columns n exceeds the number of rows m. We can apply forward substitution to the square
  • 134. 110 Chapter 3. General Linear Systems system L(l:m, l:m)x(l:m, l:m) = b and prescribe an arbitrary value for x(m + l:n). See §5.6 for additional comments on systems that have more unknowns than equations. The handling of nonsquare upper triangular systems is similar. Details are left to the reader. 3.1.7 The Algebra of Triangular Matrices A unit triangular matrix is a triangular matrix with l's on the diagonal. Many of the triangular matrix computations that follow have this added bit of structure. It clearly poses no difficulty in the above procedures. For future reference we list a few properties about products and inverses of tri- angular and unit triangular matrices. • The inverse of an upper (lower) triangular matrix is upper (lower) triangular. • The product of two upper (lower) triangular matrices is upper (lower) triangular. • The inverse of a unit upper (lower) triangular matrix is unit upper (lower) trian­ gular. • The product of two unit upper (lower) triangular matrices is unit upper (lower) triangular. Problems P3.l.1 Give an algorithm for computing a nonzero z E Rn such that Uz = 0 where U E Rnxn is upper triangular with Unn = 0 and uu · · · Un-1,n-l ¥: 0. P3.1.2 Suppose L = In - N is unit lower triangular where N E Rnxn. Show that L-1 = In + N + N2 + · · · + Nn-1. What is the value of II L-1 llF if Nii = 1 for all i > j? P3.l.3 Write a detailed version of (3.1.4). Do not assume that N divides n. P3.l.4 Prove all the facts about triangular matrices that are listed in §3.1.7. P3.l.5 Suppose S, T E Rnxn are upper triangular and that (ST - ),,I)x = b is a nonsingular system. Give an O(n2) algorithm for computing x. Note that the explicit formation of ST - ),,I requires O(n3) flops. Hint: Suppose S+ = [ � T+ = [ � � ] , b+ = [ � ] , where S+ = S(k - l:n, k - l:n), T+ = T(k - l:n, k - l:n), b+ = b(k - l:n), and u, T, f3 E R. Show that if we have a vector Xe such that and We = Texe is available, then /3 - uvTXe - UTWe "( = UT - ),, solves (S+T+ - ),,I)x+ = b+· Observe that x+ and w+ = T+x+ each require O(n - k) flops. P3.l.6 Suppose the matrices Ri , . . . , Rp E Rnxn are all upper triangular. Give an O(pn2) algorithm for solving the system (R1 · · · Rp - ),,I)x = b assuming that the matrix of coefficients is nonsingular. Hint. Generalize the solution to the previous problem. P3.l.7 Suppose L, K E R"'xn are lower triangular and B E Rnxn. Give an algorithm for computing X E Rnxn so that LXK = B.
  • 135. 3.2. The LU Factorization 111 Notes and References for §3.1 The accuracy of a computed solution to a triangular system is often surprisingly good, see: N.J. Higham (1989). "The Accuracy of Solutions to Triangular Systems,'' SIAM J. Numer. Anal. 26, 1252-1265. Solving systems of the form (Tp · · · Ti - >.I)x = b where each Ti is triangular is considered in: C.D. Martin and C.F. Van Loan (2002). "Product Triangular Systems with Shift," SIAM J. Matrix Anal. Applic. 24, 292-301. The trick to obtaining an O(pn2) procedure that does not involve any matrix-matrix multiplications is to look carefully at the back-substitution recursions. See P3.1.6. A survey of parallel triangular system solving techniques and their stabilty is given in: N.J. Higham (1995). "Stability of Parallel Triangular System Solvers,'' SIAM J. Sci. Comput. 16, 400-413. 3.2 The LU Factorization Triangular system solving is an easy O(n2) computation. The idea behind Gaussian elimination is to convert a given system Ax = b to an equivalent triangular system. The conversion is achieved by taking appropriate linear combinations of the equations. For example, in the system 3x1 + 5x2 = 9, 6x1 + 7x2 = 4, ifwe multiply the first equation by 2 and subtract it from the second we obtain 3x1 + 5x2 = 9, -3x2 = -14. This is n = 2 Gaussian elimination. Our objective in this section is to describe the procedure in the language of matrix factorizations. This means showing that the algo­ rithm computes a unit lower triangular matrix L and an upper triangular matrix U so that A = LU, e.g., [ � � ] = [ ; � ] [ � -� ] . The solution to the original Ax = b problem is then found by a two-step triangular solve process: Ly = b, Ux = y Ax = LUx = Ly = b. (3.2.1) The LU factorization is a "high-level" algebraic description of Gaussian elimination. Linear equation solving is not about the matrix vector product A-1b but about com­ puting LU and using it effectively; see §3.4.9. Expressing the outcome of a matrix algorithm in the "language" of matrix factorizations is a productive exercise, one that is repeated many times throughout this book. It facilitates generalization and high­ lights connections between algorithms that can appear very different at the scalar level.
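To see the two-step solve (3.2.1) in action on the 2-by-2 example above, consider the following illustrative Python/NumPy fragment; for brevity a generic solver stands in for the triangular solvers of §3.1.

    import numpy as np

    A = np.array([[3.0, 5.0], [6.0, 7.0]])
    L = np.array([[1.0, 0.0], [2.0, 1.0]])     # A = L*U from the example above
    U = np.array([[3.0, 5.0], [0.0, -3.0]])
    b = np.array([9.0, 4.0])

    y = np.linalg.solve(L, b)                  # forward substitution step: L y = b
    x = np.linalg.solve(U, y)                  # back substitution step:    U x = y
    print(x)                                   # [-4.7777...  4.6666...]
    print(np.allclose(A @ x, b))               # True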
  • 136. 112 Chapter 3. General Linear Systems 3.2.1 Gauss Transformations To obtain a factorization description of Gaussian elimination as it is traditionally pre­ sented, we need a matrix description of the zeroing process. At the n = 2 level, if vi # 0 and T = 112/v1 , then [ -� �] [ �� ] [ � ] . More generally, suppose v E nr· with Vk # o. If TT = [ 0, .. . , 0, Tk+b · · · , Tn], Vi i = k + l:n, Ti , � Vk k and we define Mk = In - rer, (3.2.2) then 1 0 0 0 Vt VJ Nhv 0 1 0 0 Vk Vk 0 0 = 0 -Tk+l 1 Vk+l 0 -Tn 0 1 Vn 0 A matrix of the form Mk = In - ref E Rnxn is a Gauss transformation if the first k components ofT E Rn are zero. Such a matrix is unit lower triangular. The components of r(k + l:n) are called multipliers. The vector r is called the Gauss vector. 3.2.2 Applying Gauss Transformations Multiplication by a Gauss transformation is particularly simple. If C E Rnxr and Mk = In - ref is a Gauss transformation, then is an outer product update. Since r(l:k) = 0 only C(k + l:n, :) is affected and the update C = lvhC can be computed row by row as follows: for i = k + l:n C(i, :) = C(i, :) - Ti·C(k, :) end This computation requires 2(n - k)r flops. Here is an example: C = [� ! �], r = [ �i 3 6 10 - 1 (I - re[)C = [4 1 1 i �l· 10 17
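The zeroing property and the rowwise update are easy to exercise. The Python/NumPy sketch below uses a small example of our own (not the one printed in the text); the helper gauss_vector is a name we introduce, and 0-based indexing replaces the text's 1-based indexing.

    import numpy as np

    def gauss_vector(v, k):
        """Multipliers tau for the Gauss transformation M = I - tau*e_k^T that
        zeroes v[k+1:]; 0-based indexing, so this k corresponds to the text's k+1."""
        tau = np.zeros_like(v, dtype=float)
        tau[k+1:] = v[k+1:] / v[k]
        return tau

    v = np.array([2.0, 4.0, -6.0])
    tau = gauss_vector(v, 0)                 # tau = [0, 2, -3]
    e1 = np.zeros(3)
    e1[0] = 1.0
    M = np.eye(3) - np.outer(tau, e1)        # explicit Gauss transformation
    print(M @ v)                             # [2. 0. 0.]

    C = np.array([[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]])
    MC = C.copy()
    for i in range(1, 3):                    # rowwise update: C(i,:) -= tau_i * C(k,:)
        MC[i] -= tau[i] * C[0]
    print(np.allclose(MC, M @ C))            # True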
  • 137. 3.2. The LU Factorization 113 3.2.3 Roundoff Properties of Gauss Transformations If f is the computed version of an exact Gauss vector r, then it is easy to verify that f = r + e, lei ::; ujrj. If f is used in a Gauss transform update and H((In - fe'f}C) denotes the computed result, then fl ((In - fe'f)C) = (l - rek}C + E , where IEI � 3u (ICI + lrllC(k, :)I) + O(u2). Clearly, if r has large components, then the errors in the update may be large in comparison to ICI· For this reason, care must be exercised when Gauss transformations are employed, a matter that is pursued in §3.4. 3.2.4 Upper Triangularizing Assume that A E R."'xn. Gauss transformations M1, . . . , Mn-1 can usually be found such that Mn-1 · · · M2111A = U is upper triangular. To see this we first look at the n = 3 case. Suppose and note that M1 = [-� � �l -3 0 1 Likewise, in the second step we have [1 0 0 l M2 = 0 1 0 0 -2 1 => => 4 7l 5 8 6 10 -! - � l -6 -11 M2(M1A) = 0 -3 -6 . [1 4 7l 0 0 1 Extrapolating from this example to the general n case we conclude two things. • At the start of the kth step we have a matrix A(k-l} = Mk-l · · · M1A that is upper triangular in columns 1 through k - 1. • The multipliers in the kth Gauss transform Mk are based on A(k-l} (k + l:n, k) and ai�-l} must be nonzero in order to proceed. Noting that complete upper triangularization is achieved after n - 1 steps, we obtain the following rough draft of the overall process: A<1> = A for k = l:n - 1 end For i = k + l:n, determine the multipliers ri(k} = a�Z>/ak�. Apply Mk = I - r<k>ef to obtain A(k+i) = MkA(k}. (3.2.3)
  • 138. 114 Chapter 3. General Linear Systems For this process to be well-defined, the matrix entries ag>,a��, . . . ,a��--;_��-l must be nonzero. These quantities are called pivots. 3.2.5 Existence If no zero pivots are encountered in (3.2.3), then Gauss transformations Mi , . . . , Mn- l are generated such that Mn-1 · · · M1A = U is upper triangular. It is easy to check that if Mk = In - r<k>eI, then its inverse is prescribed by M;;1 = In + r <k>eI and so A = LU (3.2.4) where (3.2.5) It is clear that L is a unit lower triangular matrix because each M;;1 is unit lower triangular. The factorization (3.2.4) is called the LUfactorization. The LU factorization may not exist. For example, it is impossible to find lij and Uij so [1 2 3 l [ 1 0 0 l [U11 U12 U13 l 2 4 7 = f21 1 Q Q U22 U23 . 3 5 3 f31 f32 1 Q Q U33 To see this, equate entries and observe that we must have u11 = 1, u12 = 2, £21 = 2, u22 = 0, and £31 = 3. But then the (3,2) entry gives us the contradictory equation 5 = f31U12 + f32U22 = 6. For this example, the pivot a��) = a22 - (a2i/au)a12 is zero. It turns out that the kth pivot in (3.2.3) is zero if A(l:k, l:k) is singular. A submatrix of the form A(l:k, l:k) is called a leading principal submatrix. Theorem 3.2.1. (LU Factorization). If A E Rnxn and det(A(l:k, l:k)) # 0 for k = l:n- 1, then there exists a unit lower triangular L E Rnxn and an upper triangular U E Rnxn such that A = LU. If this is the case and A is nonsingular, then the factorization is unique and det(A) = uu · · · Unn· Proof. Suppose k - 1 steps in (3.2.3) have been executed. At the beginning of step k the matrix A has been overwritten by Mk-l · · · M1 A = A(k-l). Since Gauss transfor­ mations are unit lower triangular, it follows by looking at the leading k-by-k portion of this equation that ( ( )) (k-1) (k-1) det A l:k, l:k = a11 • • • akk . Thus, if A(l:k, l:k) is nonsingular, then the kth pivot ai�-l) is nonzero. (3.2.6) As for uniqueness, if A = LiUi and A = L2U2 are two LU factorizations of a nonsingular A, then L"2iLi = U2U11 . Since L21Li is unit lower triangular and U2U11 is upper triangular, it follows that both of these matrices must equal the identity. Hence, Li = L2 and U1 = U2. Finally, if A = LU, then det(A) = det(LU) = det(L)det(U) = det(U). It follows that det(A) = uu · · · Unn· D
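The nonexistence example given before Theorem 3.2.1 is easy to check numerically. The short Python/NumPy sketch below (illustrative only) confirms that the 2-by-2 leading principal submatrix of that matrix is singular, so the hypothesis of Theorem 3.2.1 fails, and that one step of elimination produces a zero second pivot.

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 7.0], [3.0, 5.0, 3.0]])
    print(np.linalg.matrix_rank(A[:2, :2]))            # 1: A(1:2,1:2) is singular
    pivot2 = A[1, 1] - (A[1, 0] / A[0, 0]) * A[0, 1]   # second pivot after one step
    print(pivot2)                                      # 0.0: elimination breaks down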
  • 139. 3.2. The LU Factorization 3.2.6 L Is the Matrix of Multipliers 115 It turns out that the construction of L is not nearly so complicated as Equation (3.2.5) suggests. Indeed, L M-1 M-1 = 1 · · · n-1 = (In - T(l)er)-l · · · (In - T(n-l)e�-l)-l = (In + T(l)er) · · · (In + T(n-l)e�_1) n-1 = In + L: r(k>er k=l showing that L(k + l:n, k) = r(k)(k + l:n) k = l:n - 1. (3.2.7) In other words, the kth column of L is defined by the multipliers that arise in the k-th step of (3.2.3). Consider the example in §3.2.4: 3.2.7 The Outer Product Point of View Since the application of a Gauss transformation to a matrix involves an outer product, we can regard (3.2.3) as a sequence of outer product updates. Indeed, if A = [ a WT ] 1 v B n-1 n-1 then the first step in Gaussian elimination results in the decomposition [Q WT l [1 0 l [1 0 l [Q WT l z B = z/a In-1 0 B - zwT/a 0 In-1 . Steps 2 through n - 1 compute the LU factorization for then
  • 140. 116 Chapter 3. General Linear Systems 3.2.8 Practical Implementation Let us consider the efficient implementation of (3.2.3). First, because zeros have already been introduced in columns 1 through k - 1, the Gauss transformation update need only be applied to columns k through n. Of course, we need not even apply the kth Gauss transform to A(:, k) since we know the result. So the efficient thing to do is simply to update A(k + l:n, k + l:n). Also, the observation (3.2.7) suggests that we can overwrite A(k + l:n, k) with L(k + l:n, k) since the latter houses the multipliers that are used to zero the former. Overall we obtain: Algorithm 3.2.1 (Outer Product LU) Suppose A E 1Rnxn has the property that A(l:k, l:k) is nonsingular for k = l:n - 1. This algorithm computes the factorization A = LU where L is unit lower triangular and U is upper triangular. For i = l:n - 1, A(i, i:n) is overwritten by U(i, i:n) while A(i + l:n, i) is overwritten by L(i + l:n, i). for k = l:n - 1 p = k + l:n A(p, k) = A(p, k)/A(k, k) A(p, p) = A(p, p) - A(p, k) ·A(k, p) end This algorithm involves 2n3/3 flops and it is one of several formulations of Gaussian elimination. Note that the k-th step involves an (n - k)-by-(n - k) outer product. 3.2.9 Other Versions Similar to matrix-matrix multiplication, Gaussian elimination is a triple-loop procedure that can be arranged in several ways. Algorithm 3.2.1 corresponds to the "kij'' version of Gaussian elimination if we compute the outer product update row by row: for k = l:n - 1 end A(k + l:n, k) = A(k + l:n, k)/A(k, k) for i = k + l:n end for j = k + l:n A(i, j) = A(i, j) - A(i, k) ·A(k, j) end There are five other versions: kji, ikj, ijk, jik, and jki. The last of these results in an implementation that features a sequence of gaxpys and forward eliminations which we now derive at the vector level. The plan is to compute the jth columns of L and U in step j. If j = 1, then by comparing the first columns in A = LU we conclude that L(2:n, j) = A(2:n, 1)/A(l, 1) and U(l, 1) = A(l, 1). Now assume that L(:, l:j - 1) and U(l:j - 1, l:j - 1) are known. To get the jth columns of L and U we equate the jth columns in the equation A = LU
  • 141. 3.2. The LU Factorization 117 and infer from the vector equation A(:,j) = LU(:,j) that A(l:j - l,j) = L(l:j - 1, l:j - l) ·U(l:j - l,j) and j A(j:n, j) = L, L(j:n, k) ·U(k,j). k= l The first equation is a lower triangular linear system that can be solved for the vector U(l:j - 1, j). Once this is accomplished, the second equation can be rearranged to produce recipes for U(j,j) and L(j + l:n, j). Indeed, if we set j- 1 v(j:n) =A(j:n, j) - L, L(j:n, k)U(k, j) k= l = A(j:n, j) - L(j:n, l:j - l) ·U(l:j - 1, j), then L(j + l:n, j) = v(j + l:n)/v(j) and U(j, j) = v(j). Thus, L(j + l:n,j) is a scaled gaxpy and we obtain the following alternative to Algorithm 3.2.1: Algorithm 3.2.2 (Gaxpy LU) Suppose A E m_nxn has the property that A(l:k, l:k) is nonsingular for k = l:n - 1. This algorithm computes the factorization A = LU where L is unit lower triangular and U is upper triangular. Initialize L to the identity and U to the zero matrix. for j = l:n end if j = 1 else v = A(:, l) ii = A(:, j) Solve L(l:j - 1, l:j - l) ·z = ii(l:j - 1) for z E ]Ri-1. U(l:j - 1, j) = z v(j:n) = ii(j:n) - L(j:n, l:j - l) ·z end U(j, j) = v(j) L(j+l:n, j) = v(j+l:n)/v(j) (We chose to have separate arrays for L and U for clarity; it is not necessary in practice.) Algorithm 3.2.2 requires 2n3/3 flops, the same volume of floating point work required by Algorithm 3.2.1. However, from §1.5.2 there is less memory traffic associated with a gaxpy than with an outer product, so the two implementations could perform differently in practice. Note that in Algorithm 3.2.2, the original A(:, j) is untouched until step j. The terms right-looking and left-looking are sometimes applied to Algorithms 3.2.1 and 3.2.2. In the outer-product implementation, after L(k:n, k) is determined, the columns to the right of A(:, k) are updated so it is a right-looking procedure. In contrast, subcolumns to the left of A(:, k) are accessed in gaxpy LU before L(k+ l:n, k) is produced so that implementation left-looking.
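Both formulations are short in Python/NumPy. The sketch below is illustrative only: the function names are ours, no pivoting is performed (so nonsingular leading principal submatrices are assumed and the instabilities discussed in §3.3 and §3.4 are inherited), and a generic solver is used for the small unit lower triangular system in the gaxpy version. The test matrix is the one used in the §3.2.4 example.

    import numpy as np

    def lu_outer(A):
        """Algorithm 3.2.1 (right-looking). Returns a copy of A overwritten with the
        multipliers of L below the diagonal and U on and above it."""
        A = A.astype(float).copy()
        n = A.shape[0]
        for k in range(n - 1):
            A[k+1:, k] /= A[k, k]                               # multipliers
            A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])   # rank-1 update
        return A

    def lu_gaxpy(A):
        """Algorithm 3.2.2 (left-looking). Returns separate L (unit lower) and U."""
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        L, U = np.eye(n), np.zeros((n, n))
        for j in range(n):
            v = A[:, j].copy()
            if j > 0:
                z = np.linalg.solve(L[:j, :j], A[:j, j])   # unit lower triangular solve
                U[:j, j] = z
                v[j:] = A[j:, j] - L[j:, :j] @ z           # gaxpy
            U[j, j] = v[j]
            L[j+1:, j] = v[j+1:] / v[j]
        return L, U

    A = np.array([[1.0, 4.0, 7.0], [2.0, 5.0, 8.0], [3.0, 6.0, 10.0]])
    F = lu_outer(A)
    L1 = np.tril(F, -1) + np.eye(3)
    U1 = np.triu(F)
    L2, U2 = lu_gaxpy(A)
    print(np.allclose(L1 @ U1, A), np.allclose(L2, L1), np.allclose(U2, U1))   # True True True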
  • 142. 118 Chapter 3. General Linear Systems 3.2.10 The LU Factorization of a Rectangular Matrix The LU factorization of a rectangular matrix A E IRnxr can also be performed. The n > r case is illustrated by while [! !] � [! !] [ � -� l [ ! � : ] = [ ! � ] [ � -� _: ] depicts the n < r situation. The LU factorization of A E IRnxr is guaranteed to exist if A(l:k, l:k) is nonsingular for k = l:min{n, r}. The square LU factorization algorithms above needs only minor alterations to handle the rectangular case. For example, if n > r, then Algorithm 3.2.1 modifies to the following: for k = l:r end p = k + l:n A(p, k) = A(p, k)/A(k, k) if k < r (3.2.8) µ = k + l:r A(p, µ) = A(p, µ) - A(p, k) ·A(k, µ) end This calculation requires nr2 - r3/3 flops. Upon completion, A is overwritten by the strictly lower triangular portion of L E IRnxr and the upper triangular portion of U E Ilfxr. 3.2.11 Block LU It is possible to organize Gaussian elimination so that matrix multiplication becomes the dominant operation. Partition A E IRnxn as follows: A = r n-r where r is a blocking parameter. Suppose we compute the LU factorization [�:: l = [�:: lUn. Here, Ln E m;xr is unit lower triangular and U11 E m;xr is upper triangular and assumed to be nonsingular. If we solve LnU12 = Ai2 for U12 E wxn-r, then [An Ai2 ] = [Ln A21 A22 L21 0 l [Ir 0 l [U11 U12 l ln-r 0 A 0 ln-r '
  • 143. 3.2. The LU Factorization where A = A22 - L21U12 = A22 - A21AilAi2 is the Schur complement of An in A. Note that if A = L22U22 is the LU factorization of A, then [Lu A = L21 119 (3.2.9) is the LU factorization of A. This lays the groundwork for a recursive implementation. Algorithm 3.2.3 (Recursive Block LU) Suppose A E Rnxn has an LU factorization and r is a positive integer. The following algorithm computes unit lower triangular L E Rnxn and upper triangular U E Rnxn so A = LU. function [L, U] = BlockLU{A, n, r) if n :5 r else end end Compute the LU factorization A = LU using (say) Algorithm 3.2.1. Use (3.2.8) to compute the LU factorization A(:, l:r) = [ f�� ]Uu. Solve LuU12 = A{l:r, r + l:n) for U12· A = A(r + l:n, r + l:n) - L21U12 [L22, U22) = BlockLU{A, n - r, r) L = [ f�� L2� ]' U = [ Ui b g�: ] The following table explains where the flops come from: Activity Flops Lu , L2i. Uu U12 .A nr2 - r3/3 (n - r)r2 2{n - r)2 If n » r, then there are a total of about 2n3/3 flops, the same volume of atithmetic as Algorithms 3.2.1 and 3.2.2. The vast majority of these flops are the level-3 flops associated with the production of A. The actual level-3 fraction, a concept developed in §3.1.5, is more easily derived from a nonrecursive implementation. Assume for clarity that n = Nr where N is a positive integer and that we want to compute (3.2.10)
  • 144. 120 Chapter 3. General Linear Systems where all blocks are r-by-r. Analogously to Algorithm 3.2.3 we have the following. Algorithm 3.2.4 (Nonrecursive Block LU) Suppose A E Rnxn has an LU factoriza­ tion and r is a positive integer. The following algorithm computes unit lower triangular L E Rnxn and upper triangular U E Rnxn so A = LU. for k = l:N end Rectangular Gaussian elimination: [A�k l [L�k l : = : ukk ANk LNk Multiple right hand side solve: Lkk [ uk,k+1 I . . . I ukN ] = [ Ak,k+i I . . . I AkN ] Level-3 updates: Aii = Aii - LikUki• i = k + I:N, j = k + l:N Here is the flop situation during the kth pass through the loop: Activity Flops Gaussian elimination (N - k + l)r3 - r3/3 Multiple RHS solve (N - k)r3 Level-3 updates 2(N - k)2r2 Summing these quantities for k = l:N we find that the level-3 fraction is approximately 2n3/3 _ 1 _ � 2n3/3 + n2r - 2N " Thus, for large N almost all arithmetic takes place in the context of matrix multipli­ cation. This ensures a favorable amount of data reuse as discussed in §1.5.4. Problems P3.2.1 Verify Equation (3.2.6}. P3.2.2 Suppose the entries of A(E} E E'xn are continuously differentiable functions of the scalar E. Assume that A = A(O) and all its principal submatrices are nonsingular. Show that for sufficiently small E, the matrix A(E} has an LU factorization A(E} = L(E)U(E} and that L(E) and U(E} are both continuously differentiable. P3.2.3 Suppose we partition A E Rnxn A = [ An Ai2 ] A21 A22 where An is r-by-r and nonsingular. Let S be the Schur complement of An in A as defined in (3.2.9). Show that after r steps of Algorithm 3.2.1, A(r + l:n, r + l:n) houses S. How could S be obtained after r steps of Algorithm 3.2.2?
  • 145. 3.2. The LU Factorization 121 p3.2.4 Suppose A E R"x" has an LU factorization. Show how Ax = b can be solved without storing the multipliers by computing the LU factorization of the n-by-(n + 1) matrix (A b]. p3.2.5 Describe a variant of Gaussian elimination that introduces zeros into the columns of A in the order, n: - 1:2 and which produces the factorization A = UL where U is unit upper triangular and L is lower triangular. p3.2.6 Matrices in Rnxn of the form N(y, k) = I - yef where y E R" are called Gauss-Jordan transformations. (a) Give a formula for N(y, k)-1 assuming it exists. (b) Given x E Rn, under what conditions can y be found so N(y, k)x = ek? (c) Give an algorithm using Gauss-Jordan transformations that overwrites A with A-1 . What conditions on A ensure the success of your algorithm? P3.2.7 Extend Algorithm 3.2.2 so that it can also handle the case when A has more rows than columns. P3.2.8 Show how A can be overwritten with L and U in Algorithm 3.2.2. Give a 3-loop specification so that unit stride access prevails. P3.2.9 Develop a version of Gaussian elimination in which the innermost of the three loops oversees a dot product. Notes and References for §3.2 The method of Gaussian elimination has a long and interesting history, see: J.F. Grear (2011). "How Ordinary Elimination Became Gaussian Elimination," Historica Mathemat- ica, 98, 163-218. J.F. Grear (2011). "Mathematicians of Gaussian Elimination," Notices of the AMS 58, 782--792. Schur complements (3.2.9) arise in many applications. For a survey of both practical and theoretical interest, see: R.W. Cottle (1974). "Manifestations of the Schur Complement," Lin. Alg. Applic. 8, 189-211. Schur complements are known as "Gauss transforms" in some application areas. The use of Gauss­ Jordan transformations (P3.2.6) is detailed in Fox (1964). See also: T. Dekker and W. Hoffman (1989). "Rehabilitation of the Gauss-Jordan Algorithm," Numer. Math. 54, 591-599. AB we mentioned, inner product versions of Gaussian elimination have been known and used for some time. The names of Crout and Doolittle are associated with these techniques, see: G.E. Forsythe (1960). "Crout with Pivoting," Commun. ACM 9, 507-508. W.M. McKeeman (1962). "Crout with Equilibration and Iteration," Commun. A CM. 5, 553-555. Loop orderings and block issues in LU computations are discussed in: J.J. Dongarra, F.G. Gustavson, and A. Karp (1984). "Implementing Linear Algebra Algorithms for Dense Matrices on a Vector Pipeline Machine," SIAM Review 26, 91-112. .J.M. Ortega (1988). "The ijk Forms of Factorization Methods I: Vector Computers," Parallel Comput. 7, 135-147. D.H. Bailey, K.Lee, and H.D. Simon (1991). "Using Strassen's Algorithm to Accelerate the Solution of Linear Systems," J. Supercomput. 4, 357-371. J.W. Demmel, N.J. Higham, and R.S. Schreiber (1995). "Stability ofBlock LU Factorization," Numer. Lin. Alg. Applic. 2, 173-190. Suppase A = LU and A+AA = (L+AL)(U+AU) are LU factorizations. Bounds on the perturbations j,.L and AU in terms of AA are given in: G.W. Stewart (1997). "On the Perturbation of LU and Cholesky Factors," IMA J. Numer. Anal. 17, 1-6. X.-W. Chang and C.C. Paige (1998). "On the Sensitivity of the LU factorization," BIT 98, 486-501.
  • 146. 122 Chapter 3. General Linear Systems In certain limited domains, it is possible to solve linear systems exactly using rational arithmetic. For a snapshot of the challenges, see: P. Alfeld and D.J. Eyre (1991). "The Exact Analysis of Sparse Rectangular Linear Systems," ACM '.lrans. Math. Softw. 1 7, 502-518. P. Alfeld (2000). "Bivariate Spline Spaces and Minimal Determining Sets," J. Comput. Appl. Math. 119, 13-27. 3.3 Roundoff Error in Gaussian Elimination We now assess the effect of rounding errors when the algorithms in the previous two sections are used to solve the linear system Ax = b. A much more detailed treatment of roundoff error in Gaussian elimination is given in Higham (ASNA). 3.3.1 Errors in the LU Factorization Let us see how the error bounds for Gaussian elimination compare with the ideal bounds derived in §2.7.11. We work with the infinity norm for convenience and focus our attention on Algorithm 3.2.1, the outer product version. The error bounds that we derive also apply to the gaxpy formulation (Algorithm 3.2.2). Our first task is to quantify the roundoff errors associated with the computed triangular factors. Theorem 3.3.1. Assume that A is an n-by-n matrix offloating point numbers. Ifno zero pivots are encountered during the execution ofAlgorithm 3.2.1, then the computed triangular matrices L and 0 satisfy i,0 = A+H, JHJ :::; 2(n - l)u (JAi+JLJJOJ) + O(u2) . (3.3.1) (3.3.2) Proof. The proof is by induction on n. The theorem obviously holds for n = 1. Assume that n � 2 and that the theorem holds for all (n -1)-by-(n - 1) floating point matrices. If A is partitioned as follows A = WT ] 1 B n-1 n-1 then the first step in Algorithm 3.2.1 is to compute z = fl(v/a), from which we conclude that z = v/a +f, lfl < uJv/aJ, A.1 = fl(B - C), (3.3.3) (3.3.4) (3.3.5) (3.3.6)
  • 147. 3.3. Roundoff Error in Gaussian Elimination A T A1 = B - (zw + F1) + F2, IF2I :::; u (IBI + lzl lwTI) + O(u2), IA1 I :::; IBI + lzllwTI + O(u). 123 (3.3.7) (3.3.8) (3.3.9) The algorithm proceeds to compute the LU factorization of A1 . By induction, the computed factors L1 and (Ji satisfy where If (; = [a: w__Tl 0 U1 ' then it is easy to verify that LU = A + H where To prove the theorem we must verify (3.3.2), i.e., Considering (3.3.12), this is obviously the case if Using (3.3.9) and (3.3.11) we have IH1 I :::; 2(n - 2)u (IBI + lzl lwTI + IL1 llU1 I) + O(u2), while (3.3.6) and (3.3.8) imply IF1 I + IF2I :::; u(IBI + 2lzllwl) + O(u2). These last two results establish (3.3.13) and therefore the theorem. D (3.3.10) (3.3.11) (3.3.12) (3.3.13) We mention that if A is m-by-n, then the theorem applies with n replaced by the smaller of n and m in Equation 3.3.2.
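The componentwise bound (3.3.2) can be probed numerically. The Python/NumPy sketch below is illustrative only: it factors a random, diagonally dominant matrix (so that no pivoting is needed) with an unpivoted outer-product LU and compares |L̂Û − A| with the right-hand side of (3.3.2); as Wilkinson's remark in §2.7.7 suggests, the observed error is typically far below the bound.

    import numpy as np

    def lu_nopivot(A):
        """Unpivoted outer-product LU (Algorithm 3.2.1); returns (L, U)."""
        A = A.astype(float).copy()
        n = A.shape[0]
        for k in range(n - 1):
            A[k+1:, k] /= A[k, k]
            A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
        return np.tril(A, -1) + np.eye(n), np.triu(A)

    rng = np.random.default_rng(0)
    n = 100
    A = rng.standard_normal((n, n)) + n * np.eye(n)    # safe to factor without pivoting
    L, U = lu_nopivot(A)
    H = L @ U - A                                      # models the residual L*U - A
    u = np.finfo(float).eps / 2
    bound = 2 * (n - 1) * u * (np.abs(A) + np.abs(L) @ np.abs(U))
    print(np.abs(H).max(), np.all(np.abs(H) <= bound)) # error far below the bound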
  • 148. 124 Chapter 3. General Linear Systems 3.3.2 Triangular Solving with Inexact Triangles We next examine the effect of roundoff error when L and 0are used by the triangular system solvers of §3.1. Theorem 3.3.2. Let L and 0 be the computed LU factors obtained by Algorithm 3.2.1 when it is applied to an n-by-n floating point matrix A. If the methods of§3.1 are used to produce the computed solution if to Ly = b and the computed solution x to 0x = if, then (A + E)x = b with Proof. From (3.1.1) and (3.1.2) we have (L + F)iJ = b, (0+ e)x = iJ, and thus IFI < nulLI + O(u2), 1e1 < nulOI + O(u2), (L + F)(O + e)x = (LO + FO + Le + Fe)x = b. If follows from Theorem 3.3.1 that LO = A + H with IHI :S 2(n - l)u(IAI + ILllOI) + O(u2), and so by defining E = H + F0 + Le + Fe we find (A + E)x = b. Moreover, IEI < IHI + IFI 101 + ILi 1e1 + O(u2) < 2nu (IAI + ILllOI) + 2nu (ILllOI) + O(u2), completing the proof of the theorem. D (3.3.14) If it were not for the possibility of a large ILllOI term, (3.3.14) would compare favorably with the ideal bound (2.7.21). (The factor n is of no consequence, cf. the Wilkinson quotation in §2.7.7.) Such a possibility exists, for there is nothing in Gaussian elimi­ nation to rule out the appearance of small pivots. If a small pivot is encountered, then we can expect large numbers to be present in L and 0. We stress that small pivots are not necessarily due to ill-conditioning as the example A = [� � l = [l�E � l [� _;jE l shows. Thus, Gaussian elimination can give arbitrarily poor results, even for well­ conditioned problems. The method is unstable. For example, suppose 3-digit floating point arithmetic is used to solve [.001 1.00 l [ X1 ] = [1.00 l· 1.00 2.00 X2 3.00
  • 149. 3.4. Pivoting {See §2.7.1.) Applying Gaussian elimination we get L - . [ 1 1000 � l 0 = [ and a calculation shows that to = [.001 1 .001 0 -1�00 l· A + H. 125 If we go on to solve the problem using the triangular system solvers of §3.1, then using the same precision arithmetic we obtain a computed solution x =[O , 1jT. This is in contrast to the exact solution x =[1.002 . . . , . 99 8 . . .jT. Problems P3.3.1 Show that if we drop the assumption that A is a floating point matrix in Theorem 3.3. 1 , then Equation 3.3.2 holds with the coefficient "2"replaced by "3." P3.3.2 Suppose A is an n-by-n matrix and that L and 0 are produced by Algorithm 3.2.1. (a) How many flops are required to compute II ILi IUl lloo? (b) Show fl(ILl lUI) � (1 + 2nu) ILl lUI + O(u2). Notes and References for §3.3 The original roundoff analysis of Gaussian elimination appears in: J.H. Wilkinson (1961). "Error Analysis of Direct Methods of Matrix Inversion," J. ACM 8, 281-330. Various improvements and insights regarding the bounds and have been ma.de over the years, see: 8.A. Chartres and J.C. Geuder (1967). "Computable Error Bounds for Direct Solution of Linear Equations," J. A CM 14, 63-71. J.K. Reid (1971). "A Note on the Stability of Gaussian Elimination," J. Inst. Math. Applic. 8, 374-75. C.C. Paige (1973). "An Error Analysis of a Method for Solving Matrix Equations,'' Math. Comput. 27, 355-59. H.H. Robertson (1977). "The Accuracy ofError Estimates for Systems ofLinear Algebraic Equations,'' J. Inst. Math. Applic. 20, 409-14. J.J. Du Croz and N.J. Higham (1992). "Stability of Methods for Matrix Inversion,'' IMA J. Numer. Anal. 12, 1-19. J.M. Banoczi, N.C. Chiu, G.E. Cho, and l.C.F. Ipsen (1998). "The Lack of Influence ofthe Right-Hand Side on the Accuracy of Linear System Solution,'' SIAM J. Sci. Comput. 20, 203-227. P. Amodio and F. Mazzia (1999). "A New Approach to Backward Error Analysis of LU Factorization BIT 99, 385-402. An interesting account ofvon Neuman's contributions to the numerical analysis ofGaussian elimination is detailed in: J.F. Grear (2011). "John von Neuman's Analysis of Gaussian Elimination and the Origins of Modern Numerical Analysis,'' SIAM Review 59, 607 · 682. 3.4 Pivoting The analysis in the previous section shows that we must take steps to ensure that no large entries appear in the computed triangular factors L and 0. The example A = [ .0001 1 � ] [ 1 10000 �] [ .0001 0 -9�99 ] = LU
  • 150. 126 Chapter 3. General Linear Systems correctly identifies the source of the difficulty: relatively small pivots. A way out of this difficulty is to interchange rows. For example, if P is the permutation p = [ � � ] then pA = [ .0�01 � ] = [ .0�01 �] [ � .9�99 ] = LU. Observe that the triangular factors have modestly sized entries. In this section we show how to determine a permuted version of A that has a reasonably stable LU factorization. There arc several ways to do this and they each corresponds to a different pivoting strategy. Partial pivoting, complete pivoting, and rook pivoting are considered. The efficient implementation of these strategies and their properties are discussed. We begin with a few comments about permutation matrices that can be used to swap rows or columns. 3.4.1 Interchange Permutations The stabilizations of Gaussian elimination that are developed in this section involve data movements such as the interchange of two matrix rows. In keeping with our desire to describe all computations in "matrix terms," we use permutation matrices to describe this process. (Now is a good time to review §1.2.8-§1.2.11.) Interchange permutations are particularly important. These are permutations obtained by swapping two rows in the identity, e.g., rr� [n ! i] Interchange permutations can be used to describe row and column swapping. If A E R4x4, then II·A is A with rows 1 and 4 interchanged while A·II is A with columns 1 and 4 swapped. If P = IIm · · · II1 and each Ilk is the identity with rows k and piv(k) interchanged, then piv(l:m) encodes P. Indeed, x E Rn can be overwritten by Px as follows: for k = l:m x(k) tt x(piv(k)) end Here, the "tt" notation means "swap contents." Since each Ilk is symmetric, we have pT = II1 · · · IIm. Thus, the piv representation can also be used to overwrite x with pTX: for k = m: - 1:1 x(k) tt x(piv(k)) end We remind the reader that although no floating point arithmetic is involved in a per­ mutation operation, permutations move data and have a nontrivial effect upon perfor­ mance.
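The piv encoding is compact and cheap to apply. The Python/NumPy sketch below is illustrative; the helpers apply_piv and apply_piv_transpose are names of our choosing and use 0-based indices. It overwrites a vector with Px and then recovers the original vector by applying Pᵀ.

    import numpy as np

    def apply_piv(x, piv):
        """Overwrite x with P*x, where P = Pi_m ... Pi_1 and Pi_k swaps entries
        k and piv[k] (0-based)."""
        for k in range(len(piv)):
            x[k], x[piv[k]] = x[piv[k]], x[k]
        return x

    def apply_piv_transpose(x, piv):
        """Overwrite x with P'*x by applying the swaps in reverse order."""
        for k in reversed(range(len(piv))):
            x[k], x[piv[k]] = x[piv[k]], x[k]
        return x

    x = np.array([10.0, 20.0, 30.0, 40.0])
    piv = [3, 2, 3]                    # Pi_1 swaps 1,4; Pi_2 swaps 2,3; Pi_3 swaps 3,4
    y = apply_piv(x.copy(), piv)
    print(y)                           # [40. 30. 10. 20.]
    z = apply_piv_transpose(y, piv)
    print(z)                           # [10. 20. 30. 40.]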
3.4. Pivoting    127

3.4.2 Partial Pivoting

Interchange permutations can be used in LU computations to guarantee that no multiplier is greater than 1 in absolute value. Suppose

    A  =  [ 3   17   10 ]
          [ 2    4   -2 ]
          [ 6   18  -12 ]

To get the smallest possible multipliers in the first Gauss transformation, we need a11 to be the largest entry in the first column. Thus, if Π1 is the interchange permutation

    Π1  =  [ 0  0  1 ]
           [ 0  1  0 ]
           [ 1  0  0 ]

then

    Π1A  =  [ 6   18  -12 ]
            [ 2    4   -2 ]
            [ 3   17   10 ]

It follows that

    M1  =  [   1    0   0 ]                         [ 6   18  -12 ]
           [ -1/3   1   0 ]     ⇒     M1Π1A    =    [ 0   -2    2 ]
           [ -1/2   0   1 ]                         [ 0    8   16 ]

To obtain the smallest possible multiplier in M2, we need to swap rows 2 and 3. Thus, if

    Π2  =  [ 1  0  0 ]                M2  =  [ 1    0    0 ]
           [ 0  0  1 ]      and              [ 0    1    0 ]
           [ 0  1  0 ]                       [ 0   1/4   1 ]

then

    M2Π2M1Π1A  =  [ 6   18  -12 ]
                  [ 0    8   16 ]
                  [ 0    0    6 ]

For general n we have

    for k = 1:n-1
        Find an interchange permutation Πk ∈ R^{n×n} that swaps A(k, k) with
            the largest element in |A(k:n, k)|.
        A = ΠkA                                                            (3.4.1)
        Determine the Gauss transformation Mk = I_n - τ^(k) e_k^T such that if v
            is the kth column of MkA, then v(k+1:n) = 0.
        A = MkA
    end

This particular row interchange strategy is called partial pivoting and upon completion, we have

    M_{n-1}Π_{n-1} · · · M1Π1A  =  U                                       (3.4.2)

where U is upper triangular. As a consequence of the partial pivoting, no multiplier is larger than one in absolute value.
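The worked example above is easy to verify numerically. The following MATLAB sketch (ours) forms Π1, M1, Π2, M2 explicitly and checks that M2Π2M1Π1A is the displayed upper triangular matrix and that PA = LU with P = Π2Π1.

A  = [3 17 10; 2 4 -2; 6 18 -12];
I3 = eye(3);
P1 = I3([3 2 1], :);                      % interchange rows 1 and 3
B  = P1*A;
M1 = I3;  M1(2:3,1) = -B(2:3,1)/B(1,1);   % multipliers 1/3 and 1/2
B  = M1*B;
P2 = I3([1 3 2], :);                      % interchange rows 2 and 3
B  = P2*B;
M2 = I3;  M2(3,2) = -B(3,2)/B(2,2);       % multiplier -1/4
U  = M2*B;                                % = [6 18 -12; 0 8 16; 0 0 6]
P  = P2*P1;
L  = (P*A)/U;                             % unit lower triangular, |L(i,j)| <= 1
disp(norm(P*A - L*U))                     % zero up to roundoff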
  • 152. 128 Chapter 3. General Linear Systems 3.4.3 Where is L? It turns out that (3.4.1) computes the factorization PA = LU (3.4.3) where P = IIn-l · · · 111, U is upper triangular, and L is unit lower triangular with lli; I � 1. We show that L(k + l:n, k) is a permuted version of Mk's multipliers. From (3.4.2) it can be shown that where Jiih = (IIn-1 . . . IIk+i)Mk(IIk+l . . . IIn-1) for k = l:n - 1. For example, in the n = 4 case we have since the Ili are symmetric. Moreover, (3.4.4) (3.4.5) with f(k) = IIn-l · · · Ilk+lT(k). This shows that Nh is a Gauss transformation. The transformation from T(k) to f(k) is easy to implement in practice. Algorithm 3.4.1 (Outer Product LU with Partial Pivoting) This algorithm computes the factorization PA = LU where P is a permutation matrix encoded by piv(l:n - 1), L is unit lower triangular with lli; I $ 1, and U is upper triangular. For i = l:n, A(i, i:n) is overwritten by U(i, i:n) and A(i + l:n, i) is overwritten by L(i + l:n, i). The permutation P is given by P = IIn-1 · · · 111 where Ilk is an interchange permutation obtained by swapping rows k and piv(k) of In. for k = l:n - 1 end Determine µ with k $ µ $ n so IA(µ, k) I = II A(k:n, k) lloo piv(k) = µ A(k, :) H A(µ, :) if A(k, k) =f 0 p = k + l:n A(p, k) = A(p, k)/A(k, k) A(p, p) = A(p, p) - A(p, k)A(k, p) end The floating point overhead a..'isociated with partial pivoting is minimal from the stand­ point of arithmetic as there are only O(n2) comparisons associated with the search for the pivots. The overall algorithm involves 2n3/3 flops.
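For experimentation, here is one possible MATLAB rendering of Algorithm 3.4.1 in factored form. It is only a sketch (no blocking, no error checking), and the function name lupp is ours.

function [L, U, piv] = lupp(A)
% LUPP  LU with partial pivoting via outer products, in the spirit of
% Algorithm 3.4.1.  On return P*A = L*U where P is encoded by piv:
% P = Pi_{n-1}*...*Pi_1 and Pi_k swaps rows k and piv(k) of the identity.
n = size(A,1);
piv = zeros(n-1,1);
for k = 1:n-1
    [~, mu] = max(abs(A(k:n,k)));        % largest entry in |A(k:n,k)|
    mu = mu + k - 1;
    piv(k) = mu;
    A([k mu], :) = A([mu k], :);         % A(k,:) <-> A(mu,:)
    if A(k,k) ~= 0
        rows = k+1:n;
        A(rows,k) = A(rows,k)/A(k,k);                      % multipliers
        A(rows,rows) = A(rows,rows) - A(rows,k)*A(k,rows); % rank-1 update
    end
end
L = tril(A,-1) + eye(n);
U = triu(A);
end

Applied to the 3-by-3 example of §3.4.2, [L, U, piv] = lupp(A) returns piv = [3; 3] and the L and U factors displayed just below.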
3.4. Pivoting    129

If Algorithm 3.4.1 is applied to

    A  =  [ 3   17   10 ]
          [ 2    4   -2 ]
          [ 6   18  -12 ]

then upon completion

    A  =  [  6    18   -12 ]
          [ 1/2    8    16 ]
          [ 1/3  -1/4    6 ]

and piv = [3, 3]. These two quantities encode all the information associated with the reduction:

    [ 1  0  0 ] [ 0  0  1 ]         [  1     0    0 ] [ 6   18  -12 ]
    [ 0  0  1 ] [ 0  1  0 ] A   =   [ 1/2    1    0 ] [ 0    8   16 ]
    [ 0  1  0 ] [ 1  0  0 ]         [ 1/3  -1/4   1 ] [ 0    0    6 ]

To compute the solution to Ax = b after invoking Algorithm 3.4.1, we solve Ly = Pb for y and Ux = y for x. Note that b can be overwritten by Pb as follows:

    for k = 1:n-1
        b(k) ↔ b(piv(k))
    end

We mention that if Algorithm 3.4.1 is applied to the problem of §3.3.2, using 3-digit floating point arithmetic, then

    P  =  [ 0  1 ]        L̂  =  [ 1.00     0   ]        Û  =  [ 1.00   2.00 ]
          [ 1  0 ]              [ .001    1.00 ]               [  0     1.00 ]

and x̂ = [1.00, .996]ᵀ. Recall from §3.3.2 that if Gaussian elimination without pivoting is applied to this problem, then the computed solution has O(1) error.

We mention that Algorithm 3.4.1 always runs to completion. If A(k:n, k) = 0 in step k, then Mk = I_n.

3.4.4 The Gaxpy Version

In §3.2 we developed outer product and gaxpy schemes for computing the LU factorization. Having just incorporated pivoting in the outer product version, it is equally straightforward to do the same with the gaxpy approach. Referring to Algorithm 3.2.2, we simply search the vector |v(j:n)| in that algorithm for its maximal element and proceed accordingly.
  • 154. 130 Chapter 3. General Linear Systems Algorithm 3.4.2 (Gaxpy LU with Partial Pivoting) This algorithm computes the factorization PA = LU where P is a permutation matrix encoded by piv(l:n - 1), L is unit lower triangular with lliil $ 1, and U is upper triangular. For i = l:n, A(i, i:n) is overwritten by U(i, i:n) and A(i + l:n, i) is overwritten by L(i + l:n, i). The permutation P is given by P = IIn-1 · · · II1 where Ilk is an interchange permutation obtained by swapping rows k and piv(k) of In. Initialize L to the identity and U to the zero matrix. for j = l:n ifj = 1 v = A(:, 1) else end ii = IIj-1 · · · II1A(:,j) Solve L(l:j - 1, l:j - l)z = ii(l:j - 1) for z E R?-1 U(l:j- 1, j) = z, v(j:n) = ii(j:n) - L(j:n, l:j - 1) · z Determine µ with j $ µ $ n so lv(µ)I = 11 v(j:n) 1100 and set piv(j) = µ v(j) ++ v(µ), L(j, l:j - 1) ++ L(µ, l:j - 1), U(j, j) = v(j) end if v(j) "# 0 L(j+l:n, j) = v(j+l:n)/v(j) end As with Algorithm 3.4.1, this procedure requires 2n3/3 flops and O(n2) comparisons. 3.4.5 Error Analysis and the Growth Factor We now examine the stability that is obtained with partial pivoting. This requires an accounting of the rounding errors that are sustained during elimination and during the triangular system solving. Bearing in mind that there are no rounding errors associated with permutation, it is not hard to show using Theorem 3.3.2 that the computed solution x satisfies (A + E)x = b where (3.4.6) Here we are assuming that P, l, and (J are the computed analogs of P, L, and U as produced by the above algorithms. Pivoting implies that the elements of l are bounded by one. Thus II l 1100 $ n and we obtain the bound II E lloo $ nu (211 A lloo + 4nll (J lloo) + O(u2). The problem now is to bound II (J 1100• Define the growth factor p by p = max i,j,k (3.4.7) (3.4.8)
3.4. Pivoting    131

where Â^(k) is the computed version of the matrix A^(k) = MkΠk · · · M1Π1A. It follows from (3.4.7) and (3.4.8) that

    ‖ E ‖_∞  ≤  nu ( 2 + 4n²ρ ) ‖ A ‖_∞  +  O(u²).                         (3.4.9)

Whether or not this compares favorably with the ideal bound (2.7.20) hinges upon the size of the growth factor ρ. (The factor n³ is not an operating factor in practice and may be ignored in this discussion.) The growth factor measures how large the A-entries become during the process of elimination. Whether or not we regard Gaussian elimination with partial pivoting as safe to use depends upon what we can say about this quantity. From an average-case point of view, experiments by Trefethen and Schreiber (1990) suggest that ρ is usually in the vicinity of n^{2/3}. However, from the worst-case point of view, ρ can be as large as 2^{n-1}. In particular, if A ∈ R^{n×n} is defined by

    a_ij  =    1   if i = j or j = n,
              -1   if i > j,
               0   otherwise,

then there is no swapping of rows during Gaussian elimination with partial pivoting. We emerge with A = LU and it can be shown that u_nn = 2^{n-1}. For example,

    [  1   0   0   1 ]       [  1   0   0   0 ] [ 1   0   0   1 ]
    [ -1   1   0   1 ]   =   [ -1   1   0   0 ] [ 0   1   0   2 ]
    [ -1  -1   1   1 ]       [ -1  -1   1   0 ] [ 0   0   1   4 ]
    [ -1  -1  -1   1 ]       [ -1  -1  -1   1 ] [ 0   0   0   8 ]

Understanding the behavior of ρ requires an intuition about what makes the U-factor large. Since PA = LU implies U = L^{-1}PA, it would appear that the size of L^{-1} is relevant. However, Stewart (1997) discusses why one can expect the L-factor to be well conditioned. Although there is still more to understand about ρ, the consensus is that serious element growth in Gaussian elimination with partial pivoting is extremely rare. The method can be used with confidence.

3.4.6 Complete Pivoting

Another pivot strategy called complete pivoting has the property that the associated growth factor bound is considerably smaller than 2^{n-1}. Recall that in partial pivoting, the kth pivot is determined by scanning the current subcolumn A(k:n, k). In complete pivoting, the largest entry in the current submatrix A(k:n, k:n) is permuted into the (k, k) position. Thus, we compute the upper triangularization

    M_{n-1}Π_{n-1} · · · M1Π1AΓ1 · · · Γ_{n-1}  =  U.

In step k we are confronted with the matrix

    A^(k-1)  =  M_{k-1}Π_{k-1} · · · M1Π1AΓ1 · · · Γ_{k-1}

and determine interchange permutations Πk and Γk such that the (k, k) entry of ΠkA^(k-1)Γk is the largest entry in absolute value of A^(k-1)(k:n, k:n).
  • 156. 132 Chapter 3. General Linear Systems Algorithm 3.4.3 (Outer Product LU with Complete Pivoting) This algorithm com­ putes the factorization PAQT = LU where P is a permutation matrix encoded by piv(l:n - 1), Q is a permutation matrix encoded by colpiv(l:n - 1), L is unit lower triangular with l£i;I ::;: 1, and U is upper triangular. For i = l:n, A(i, i:n) is overwritten by U(i, i:n) and A(i + l:n, i) is overwritten by L(i+ l:n, i). The permutation P is given by P = Iln-l · · · Il1 where Ilk is an interchange permutation obtained by swapping rows k and rowpiv(k) of In. The permutation Q is given by Q = rn-1 · · · ri where rk is an interchange permutation obtained by swapping rows k and colpiv(k) of In. for k = l:n - 1 end Determine µ with k ::;: µ ::;: n and ..X with k ::;: A ::;: n so IA(µ, .X)I = max{ IA(i, j)I : i = k:n, j = k:n } rowpiv(k) = µ A(k, l:n) ++ A(µ, l:n) colpiv(k) = A A(l:n, k) ++ A(l:n, .X) if A(k, k) =f: 0 p = k + l:n A(p, k) = A(p, k)/A(k, k) A(p, p) = A(p, p) - A(p, k)A(k, p) end This algorithm requires 2n3/3 fl.ops and O(n3) comparisons. Unlike partial pivoting, complete pivoting involves a significant floating point arithmetic overhead because of the two-dimensional search at each stage. With the factorization PAQT = LU in hand the solution to Ax = b proceeds as follows: Step 1. Solve Lz = Pb for z. Step 2. Solve Uy = z for y. Step 3. Set x = QTy. The rowpiv and colpiv representations can be used to form Pb and Qy, respectively. Wilkinson (1961) has shown that in exact arithmetic the elements of the matrix A(k) = MkITk · · · M1Il1Ar1 · · · rk satisfy (3.4.10) The upper bound is a rather slow-growing function of k. This fact coupled with vast empirical evidence suggesting that p is always modestly sized (e.g, p = 10) permit us to conclude that Gaussian elimination with complete pivoting is stable. The method solves a nearby linear system (A + E)x = b in the sense of (2.7.21). However, in general there is little reason to choose complete pivoting over partial pivoting. A possible exception is when A is rank deficient. In principal, complete pivoting can be used to reveal the rank of a matrix. Suppose rank(A) = r < n. It follows that at the beginning of step
3.4. Pivoting    133

r + 1, A(r+1:n, r+1:n) = 0. This implies that Πk = Γk = Mk = I for k = r+1:n and so the algorithm can be terminated after step r with the following factorization in hand:

    P A Qᵀ  =  LU  =  [ L11      0     ] [ U11   U12 ]
                      [ L21   I_{n-r}  ] [  0     0  ]

Here, L11 and U11 are r-by-r and L21 and U12ᵀ are (n - r)-by-r. Thus, Gaussian elimination with complete pivoting can in principle be used to determine the rank of a matrix. Nevertheless, roundoff errors make the probability of encountering an exactly zero pivot remote. In practice one would have to "declare" A to have rank k if the pivot element in step k+1 was sufficiently small. The numerical rank determination problem is discussed in detail in §5.5.

3.4.7 Rook Pivoting

A third type of LU stabilization strategy called rook pivoting provides an interesting alternative to partial pivoting and complete pivoting. As with complete pivoting, it computes the factorization PAQ = LU. However, instead of choosing as pivot the largest value in |A(k:n, k:n)|, it searches for an element of that submatrix that is maximal in both its row and column. Thus, if

    A(k:n, k:n)  =  [ 24   36   13   61 ]
                    [ 42   67   72   50 ]
                    [ 38   11   36   43 ]
                    [ 52   37   48   16 ]

then "72" would be identified by complete pivoting while "52," "72," or "61" would be acceptable with the rook pivoting strategy. To implement rook pivoting, the scan-and-swap portion of Algorithm 3.4.3 is changed to

    µ = k,  λ = k,  τ = |a_µλ|,  s = 0
    while τ < ‖ A(k:n, λ) ‖_∞  or  τ < ‖ A(µ, k:n) ‖_∞
        if mod(s, 2) = 0
            Update µ so that |a_µλ| = ‖ A(k:n, λ) ‖_∞ with k ≤ µ ≤ n.
        else
            Update λ so that |a_µλ| = ‖ A(µ, k:n) ‖_∞ with k ≤ λ ≤ n.
        end
        τ = |a_µλ|
        s = s + 1
    end
    rowpiv(k) = µ,  A(k, :) ↔ A(µ, :)
    colpiv(k) = λ,  A(:, k) ↔ A(:, λ)

The search for a larger |a_µλ| involves alternate scans of A(k:n, λ) and A(µ, k:n). The value of τ is monotone increasing and that ensures termination of the while-loop. In theory, the exit value of s could be O((n - k)²), but in practice its value is O(1). See Chang (2002). The bottom line is that rook pivoting represents the same O(n²) overhead as partial pivoting, but it induces the same level of reliability as complete pivoting.
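The alternating row/column scan is easy to prototype. The MATLAB fragment below (ours) runs the rook search on the 4-by-4 submatrix displayed above; mu and lambda index an entry that is maximal in both its row and its column.

B = [24 36 13 61; 42 67 72 50; 38 11 36 43; 52 37 48 16];
mu = 1;  lambda = 1;  tau = abs(B(mu,lambda));  s = 0;
while tau < norm(B(:,lambda),inf) || tau < norm(B(mu,:),inf)
    if mod(s,2) == 0
        [tau, mu] = max(abs(B(:,lambda)));      % scan the current column
    else
        [tau, lambda] = max(abs(B(mu,:)));      % scan the current row
    end
    s = s + 1;
end
% For this B the search stops at B(4,1) = 52, one of the acceptable
% rook pivots mentioned above; a real implementation would then swap
% row and column accordingly.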
  • 158. 134 Chapter 3. General Linear Systems 3.4.8 A Note on Underdetermined Systems If A E nmxn with m < n, rank(A) = m, and b E nm, then the linear system Ax = b is said to be underdetermined. Note that in this case there are an infinite number of solutions. With either complete or rook pivoting, it is possible to compute an LU factorization of the form (3.4.11) where P and Q are permutations, L E nmxm is unit lower triangular, and U1 E nmxm is nonsingular and upper triangular. Note that where c = Pb and [ ;� ] = Qx. This suggests the following solution procedure: Step 1. Solve Ly = Pb for y E nm. Step 2. Choose Z2 E nn-m and solve U1Z1 = y - U2z2 for Z1. Step 3. Set Setting z2 = 0 is a natural choice. We have more to say about underdetermined systems in §5.6.2. 3.4.9 The LU Mentality We offer three examples that illustrate how to think in terms of the LU factorization when confronted with a linear equation situation. Example 1. Suppose A is nonsingular and n-by-n and that B is n-by-p. Consider the problem of finding X (n-by-p) so AX = B. This is the multiple right hand side problem. If X = [ X1 I · · · I Xp ] and B = [ bi I · · · I bp ] are column partitions, then Compute PA = LU for k = l:p Solve Ly = Pbk and then UXk = y. end If B = In, then we emerge with an approximation to A-1 • (3.4.12) Example 2. Suppose we want to overwrite b with the solution to Akx = b where A E nnxn, b E nn, and k is a positive integer. One approach is to compute C = Ak and then solve Cx = b. However, the matrix multiplications can be avoided altogether:
  • 159. 3.4. Pivoting Compute PA = LU. for j = l:k end Overwrite b with the solution to Ly = Pb. Overwrite b with the solution to Ux = b. As in Example 1, the idea is to get the LU factorization "outside the loop." 135 (3.4.13) Example 3. Suppose we are given A E IRnxn, d E IRn, and c E IRn and that we want to compute s = cTA-1d. One approach is to compute X = A-1 as discussed in (i) and then compute s = cTXd. However, it is more economical to proceed as follows: Compute PA = LU. Solve Ly = Pd and then Ux = y. S = CTX An "A-1" in a formula almost always means "solve a linear system" and almost never means "compute A-1." 3.4.10 A Model Problem for Numerical Analysis We are now in possession of a very important and well-understood algorithm (Gaus­ sian elimination) for a very important and well-understood problem (linear equations). Let us take advantage of our position and formulate more abstractly what we mean by ''problem sensitivity" and "algorithm stability." Our discussion follows Higham (ASNA, §1.5-1.6), Stewart (MA, §4.3), and Trefethen and Bau (NLA, Lectures 12, 14, 15, and 22). A problem is a function /:D --+ S from "data/input space" D to "solution/output space" S. A problem instance is f together with a particular d E D. We assume D and S are normed vector spaces. For linear systems, D is the set of matrix-vector pairs (A, b) where A E IRnxn is nonsingular and b E IRn. The function f maps (A, b) to A-1b, an element of S. For a particular A and b, Ax = b is a problem instance. A perturbation theory for the problem f sheds light on the difference between f(d) and f(d + Ad) where d E D and d + Ad E D. For linear systems, we discussed in §2.6 the difference between the solution to Ax = b and the solution to (A + AA)(x + Ax) = (b + Ab). We bounded II Ax 11/11 x II in terms of II AA 11/11 A II and II Ab 11/11 b II . The conditioning of a problem refers to the behavior of f under perturbation at d. A condition number of a problem quantifies the rate of change of the solution with respect to the input data. If small changes in d induce relatively large changes in f(d), then that problem instance is ill-conditioned. If small changes in d do not induce relatively large changes in f(d), then that problem instance is well-conditioned. Definitions for "small" and "large" are required. For linear systems we showed in §2.6 that the magnitude of the condition number K(A) = II A 11 11 A-1 II determines whether an Ax = b problem is ill-conditioned or well-conditioned. One might say that a linear equation problem is well-conditioned if K(A) :::::: 0(1) and ill-conditioned if 11:(A) :::::: 0(1/u). An algorithm for computing f(d) produces an approximation f(d). Depending on the situation, it may be necessary to identify a particular software implementation
  • 160. 136 Chapter 3. General Linear Systems of the underlying method. The j function for Gaussian elimination with partial pivot­ ing, Gaussian elimination with rook pivoting, and Gaussian elimination with complete pivoting are all different. An algorithm for computing f(d) is stable if for some small Ad, the computed solution j(d) is close to f(d + Ad). A stable algorithm nearly solves a nearby problem. An algorithm for computing f(d) is backward stable if for some small Ad, the computed solution j(d) satisfies j(d) = f(d + Ad). A backward stable algorithm exactly solves a nearby problem. Applied to a given linear system Ax = b, Gaussian elimination with complete pivoting is backward stable because the computed solution x satisfies (A + A)x = b and II A 11/11 A II � O(u). On the other hand, if b is specified by a matrix-vector product b = Mv, then (A + A)x = Mv + 8 where II A 11/11 A II � O(u) and 8/(11 M 11 11 v II) � O(u). Here, the underlying f is defined by f:(A, M, v) � A-1 (Mv). In this case the algorithm is stable but not backward stable. Problems P3.4.l Let A = LU be the LU factorization of n-by-n A with liii l � 1. Let af and uf denote the ith rows of A and U, respectively. Verify the equation i-1 uf = af - Li,iu] j=l and use it to show that II U lloo � 2n-l ll A lloo . {Hint: Take norms and use induction.) P3.4.2 Show that if PAQ = LU is obtained via Gaussian elimination with complete pivoting, then no element of U(i, i:n) is larger in absolute value than luiil· Is this true with rook pivoting? P3.4.3 Suppose A E Rnxn has an LU factorization and that L and U are known. Give an algorithm which can compute the {i, j) entry of A-1 in approximately (n - j)2 + (n - i)2 flops. P3.4.4 Suppose X is thecomputed inverseobtained via (3.4.12). Give an upper bound for II AX - I llr P3.4.5 Extend Algorithm 3.4.3 so that it can produce the factorization (3.4.11). How many flops are required? Notes and References for §3.4 Papers concerned with element growth and pivoting include: C.W. Cryer {1968). "Pivot Size in Gaussian Elimination," Numer. Math. 12, 335-345. J.K. Reid {1971). "A Note on the Stability of Gaussian Elimination," .!.Inst. Math. Applic. 8, 374-375. P.A. Businger (1971). "Monitoring the Numerical Stability of Gaussian Elimination," Numer. Math. 16, 360-361. A.M. Cohen (1974). "A Note on Pivot Size in Gaussian Elimination," Lin. Alg. Applic. 8, 361-68. A.M. Erisman and J.K. Reid {1974). "Monitoring the Stability of the Triangular Factorization of a Sparse Matrix," Numer. Math. 22, 183-186. J. Day and B. Peterson {1988). "Growth in Gaussian Elimination,'' Amer. Math. Monthly 95, 489-513. N.J. Higham and D.J. Higham {1989). "Large Growth Factors in Gaussian Elimination with Pivoting," SIAM J. Matrix Anal. Applic. 10, 155-164. L.N. Trefethen and R.S. Schreiber {1990). "Average-Case Stability of Gaussian Elimination,'' SIAM J. Matrix Anal. Applic. 11, 335-360.
  • 161. 3.5. Improving and Estimating Accuracy 137 N. Gould (1991). "On Growth in Gaussian Elimination with Complete Pivoting," SIAM J. Matrix Anal. Applic. 12, 354-361. A. Edelman (1992). "The Complete Pivoting Conjecture for Gaussian Elimination is False," Mathe­ matica J. 2, 58-61. S.J. Wright (1993). "A Collection of Problems for Which Gaussian Elimination with Partial Pivoting is Unstable," SIAM J. Sci. Stat. Comput. 14, 231-238. L.V. Foster (1994). "Gaussian Elimination with Partial Pivoting Can Fail in Practice," SIAM J. Matrix Anal. Applic. 15, 1354-1362. A. Edelman and W. Mascarenhas (1995). "On the Complete Pivoting Conjecture for a Hadamard Matrix of Order 12," Lin. Multilin. Alg. 38, 181-185. J.M. Pena (1996). "Pivoting Strategies Leading to Small Bounds of the Errors for Certain Linear Systems," IMA J. Numer. Anal. 16, 141-153. J.L. Barlow and H. Zha (1998). "Growth in Gaussian Elimination, Orthogonal Matrices, and the 2-Norm," SIAM J. Matrix Anal. Applic. 19, 807-815. P. Favati, M. Leoncini, and A. Martinez (2000). "On the Robustness of Gaussian Elimination with Partial Pivoting,'' BIT 40, 62-73. As we mentioned, the size of L-1 is relevant to the growth factor. Thus, it is important to have an understanding of triangular matrix condition, see: D. Viswanath and L.N. Trefethen (1998). "Condition Numbers of Random Triangular Matrices," SIAM J. Matrix Anal. Applic. 19, 564-581. The connection between small pivots and near singularity is reviewed in: T.F. Chan (1985). "On the Existence and Computation of LU Factorizations with Small Pivots,'' Math. Comput. 42, 535-548. A pivot strategy that we did not discuss is pairwise pivoting. In this approach, 2-by-2 Gauss trans­ formations are used to zero the lower triangular portion of A. The technique is appealing in certain multiprocessor environments because only adjacent rows are combined in each step, see: D. Sorensen (1985). "Analysis of Pairwise Pivoting in Gaussian Elimination,'' IEEE Trans. Comput. C-34, 274· 278. A related type of pivoting called tournament pivoting that is of interest in distributed memory com­ puting is outlined in §3.6.3. For a discussion of rook pivoting and its properties, see: L.V. Foster (1997). "The Growth Factor and Efficiency of Gaussian Elimination with Rook Pivoting," J. Comput. Appl. Math., 86, 177-194. G. Poole and L. Neal (2000). "The Rook's Pivoting Strategy," J. Comput. Appl. Math. 123, 353-369. X-W Chang (2002) "Some Features of Gaussian Elimination with Rook Pivoting,'' BIT 42, 66-83. 3.5 Improving and Estimating Accuracy Suppose we apply Gaussian elimination with partial pivoting to the n-by-n system Ax = b and that IEEE double precision arithmetic is used. Equation (3.4.9) essentially says that if the growth factor is modest then the computed solution x satisfies (A + E)x = b, II E lloo � ull A lloo· (3.5.1) In this section we explore the practical ramifications of this result. We begin by stress­ ing the distinction that should be made between residual size and accuracy. This is followed by a discussion of scaling, iterative improvement, and condition estimation. See Higham (ASNA) for a more detailed treatment of these topics. We make two notational remarks at the outset. The infinity norm is used through­ out since it is very handy in roundoff error analysis and in practical error estimation. Second, whenever we refer to "Gaussian elimination" in this section we really mean Gaussian elimination with some stabilizing pivot strategy such as partial pivoting.
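As a small illustration of equation (3.5.1) in IEEE double precision, the following MATLAB experiment (ours; the Hilbert matrix is just a convenient ill-conditioned example) compares the normwise relative residual with the relative error for a system whose solution is known.

% Residual versus accuracy in IEEE double precision (u is about 1.1e-16).
n = 12;
A = hilb(n);                     % notoriously ill conditioned
x = ones(n,1);
b = A*x;
xhat = A\b;                      % Gaussian elimination with partial pivoting
relres = norm(b - A*xhat, inf)/(norm(A, inf)*norm(xhat, inf));
relerr = norm(x - xhat, inf)/norm(x, inf);
fprintf('relative residual = %.1e, relative error = %.1e\n', relres, relerr)
% In a typical run the residual is O(u) while the error is far larger,
% reflecting the condition of A.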
  • 162. 138 Chapter 3. General Linear Systems 3.5.1 Residual Size versus Accuracy The residual of a computed solution x to the linear system Ax = b is the vector b - Ax. A small residual means that Ax effectively "predicts" the right hand side b. From Equation 3.5.1 we have II b - Ax 1100 � ull A 110011 x lloo and so we obtain Heuristic I. Gaussian elimination produces a solution x with a relatively small resid­ ual. Small residuals do not imply high accuracy. Combining Theorem 2.6.2 and (3.5.1), we see that 11 x - x lloo II X lloo � Ull:oo(A) • This justifies a second guiding principle. (3.5.2) Heuristic II. If the unit roundoff and condition satisfy u � 10-d and 11:00(A) � lOq, then Gaussian elimination produces a solution x that has about d - q correct decimal digits. If u 11:00(A) is large, then we say that A is ill-conditioned with respect to the machine precision. As an illustration of the Heuristics I and II, consider the system [.986 .579 l [X1 l = [.235l .409 .237 X2 .107 in which 11:00(A) � 700 and x = [ 2, -3 ]T. Here is what we find for various machine precisions: u x1 x2 10-3 2.11 -3.17 10-4 1.986 -2.975 10-5 2.0019 -3.0032 10-6 2.00025 -3.00094 II x - x lloo II x lloo 5 . 10-2 8 . 10-3 1 · 10-3 3 . 10-4 II b - Ax lloo II A lloo ll X lloo 2.0 . 10-3 1.5 . 10-4 2.1 . 10-6 4.2 . 10-7 Whether or not to be content with the computed solution x depends on the require­ ments of the underlying source problem. In many applications accuracy is not im­ portant but small residuals are. In such a situation, the x produced by Gaussian elimination is probably adequate. On the other hand, if the number of correct dig­ its in x is an issue, then the situation is more complicated and the discussion in the remainder of this section is relevant. 3.5.2 Scaling Let /3 be the machine base (typically /3 = 2) and define the diagonal matrices D1 == diag(/37"1 , • • • , 13r.,.) and D2 = diag(/3ci , . . . , 13c.,.). The solution to the n-by-n linear system Ax = b can be found by solving the scaled system (D11AD2)y = D11b using
  • 163. 3.5. Improving and Estimating Accuracy 139 Gaussian elimination and then setting x = D2y. The scalings of A, b, and y require only O(n2) flops and may be accomplished without roundoff. Note that D1 scales equations and D2 scales unknowns. It follows from Heuristic II that if x and y are the computed versions of x and y, then (3.5.3) Thus, if K00(D11AD2) can be made considerably smaller than K00(A), then we might expect a correspondingly more accurate x, provided errors are measured in the "D2" norm defined by II z llv2 = II D2iz lloo· This is the objective of scaling. Note that it encompasses two issues: the condition of the scaled problem and the appropriateness of appraising error in the D2-norm. An interesting but very difficult mathematical problem concerns the exact mini­ mization of Kp(D1iAD2) for general diagonal Di and various p. Such results as there are in this direction are not very practical. This is hardly discouraging, however, when we recall that (3.5.3) is a heuristic result, it makes little sense to minimize exactly a heuristic bound. What we seek is a fast, approximate method for improving the quality of the computed solution x. One technique of this variety is simple row scaling. In this scheme D2 is the identity and Di is chosen so that each row in DiiA has approximately the same oo­ norm. Row scaling reduces the likelihood of adding a very small number to a very large number during elimination-an event that can greatly diminish accuracy. Slightly more complicated than simple row scaling is row-column equilibration. Here, the object is to choose Di and D2 so that the oo-norm of each row and column of DiiAD2 belongs to the interval [1/.B, 1) where .B is the base of the floating point system. For work along these lines, see McKeeman (1962). It cannot be stressed too much that simple row scaling and row-column equilibra­ tion do not "solve" the scaling problem. Indeed, either technique can render a worse z than if no scaling whatever is used. The ramifications of this point are thoroughly discussed in Forsythe and Moler (SLE, Chap. 11). The basic recommendation is that the scaling of equations and unknowns must proceed on a problem-by-problem basis. General scaling strategies are unreliable. It is best to scale (if at all) on the basis of what the source problem proclaims about the significance of each ai;. Measurement units and data error may have to be considered. 3.5.3 Iterative Improvement Suppose Ax = b has been solved via the partial pivoting factorization PA = LU and that we wish to improve the accuracy of the computed solution x. If we execute r = b - Ax Solve Ly = Pr. Solve Uz = y. Xnew = x + z (3.5.4) then in exact arithmetic Axnew = Ax + Az = (b - r) + r = b. Unfortunately, the naive floating point execution of these formulae renders an Xnew that is no more accurate
  • 164. 140 Chapter 3. General Linear Systems than x. This is to be expected since f = fl(b - Ax) has few, if any, correct significant digits. (Recall Heuristic I.) Consequently, z = fl(A-1r) � A-1 · noise � noise is a very poor correction from the standpoint of improving the accuracy of x. However, Skeel (1980) has an error analysis that indicates when (3.5.4) gives an improved Xnew from the standpoint of backward error. In particular, if the quantity is not too big, then (3.5.4) produces an Xncw such that (A + E)xnew = b for very small E. Of course, if Gaussian elimination with partial pivoting is used, then the computed x already solves a nearby system. However, this may not be the case for certain pivot strategies used to preserve sparsity. In this situation, the fixed precision iterative improvement step (3.5.4) can be worthwhile and cheap. See Arioli, Demmel, and Duff (1988). In general, for (3.5.4) to produce a more accurate x, it is necessary to compute the residual b - Ax with extended precision floating point arithmetic. Typically, this means that if t-digit arithmetic is used to compute PA = LU, x, y, and z, then 2t-digit arithmetic is used to form b - AX. The process can be iterated. In particular, once we have computed PA = LU and initialize x = 0, we repeat the following: r = b - Ax (higher precision) Solve Ly = Pr for y and Uz = y for z. (3.5.5) x = x + z We refer to this process as mixed-precision iterative improvement. The original A must be used in the high-precision computation of r. The basic result concerning the performance of (3.5.5) is summarized in the following heuristic: Heuristic III. Ifthe machine precision u and condition satisfy u = 10-d and K00(A) RS lOq, then after k executions of {3.5.5), x has approximately min{d,k(d - q)} cor­ rect digits if the residual computation is performed with precision u2 • Roughly speaking, if u it00(A) � 1, then iterative improvement can ultimately produce a solution that is correct to full (single) precision. Note that the process is relatively cheap. Each improvement costs O(n2), to be compared with the original O(n3) invest­ ment in the factorization PA = LU. Of course, no improvement may result if A is badly conditioned with respect to the machine precision. 3.5.4 Condition Estimation Suppose that we have solved Ax = b via PA = LU and that we now wish to ascertain the number ofcorrect digits in the computed solution x. It follows from Heuristic II that in order to do this we need an estimate of the condition K00(A) = II A 110011 A-1 lloo· Computing II A lloo poses no problem as we merely use the O(n2) formula (2.3.10). The challenge is with respect to the factor II A-1 lloo· ConceiV'<1.bly, we could esti­ mate this quantity by II X 11001 where X = [ X1 I · · · I Xn ] and Xi is the computed solution to Axi = ei. (See §3.4.9.) The trouble with this approach is its expense: P;,00 = II A 110011 X 1100 costs about three times as much as x.
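Returning to the mixed-precision iteration (3.5.5): one concrete way to realize it in MATLAB is to factor and solve in single precision while accumulating the residual in double precision, which then plays the role of the extended-precision residual. The sketch below is ours; the test matrix is arbitrary and well conditioned.

% Mixed-precision iterative improvement in the spirit of (3.5.5).
% Working precision: single (u about 6e-8).  Residual precision: double.
n = 100;
A = randn(n) + n*eye(n);              % well-conditioned test matrix
xexact = randn(n,1);
b = A*xexact;
[L, U, P] = lu(single(A));            % PA = LU computed in single precision
x = zeros(n,1);
for k = 1:5
    r = b - A*x;                      % residual in double precision
    z = double(U\(L\(P*single(r))));  % correction via the single-precision factors
    x = x + z;
    fprintf('step %d: relative error = %.1e\n', k, norm(x - xexact)/norm(xexact))
end
% For this A the error typically settles near double precision after a
% few steps, in line with Heuristic III.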
  • 165. 3.5. Improving and Estimating Accuracy 141 The central problem of condition estimation is how to estimate reliably the con­ dition number in O(n2) flops assuming the availability of PA = LU or one of the factorizations that are presented in subsequent chapters. An approach described in Forsythe and Moler (SLE, p. 51) is based on iterative improvement and the heuristic UK:oo(A) � II Z lloo/ll X lloo where z is the first correction of x in (3.5.5). Cline, Moler, Stewart, and Wilkinson (1979) propose an approach to the condition estimation problem thatis based on the implication Ay = d :=::} II A-1 lloo � II y lloo/ll d lloo· The idea behind their estimator is to choose d so that the solution y is large in norm and then set Koo = II A lloo ll Y lloo/ll d lloo· The success of this method hinges on how close the ratio II y 1100/ll d 1100 is to its maxi­ mum value II A-1 lloo· Consider the case when A = T is upper triangular. The relation between d and y is completely specified by the following column version of back substitution: p(l:n) = 0 for k = n: - 1:1 Choose d(k). y(k) = (d(k) - p(k))/T(k, k) p(l:k - 1) = p(l:k - 1) + y(k)T(l:k - 1, k) end (3.5.6) Normally, we use this algorithm to solve a given triangular system Ty = d. However, in the condition estimation setting we are free to pick the right-hand side d subject to the "constraint" that y is large relative to d. One way to encourage growth in y is to choose d(k) from the set {-1, +1} so as to maximize y(k). If p(k) � 0, then set d(k) = -1. If p(k) < 0, then set d(k) = +l. In other words, (3.5.6) is invoked with d(k) = -sign(p(k)). Overall, the vector d has the form d(l:n) = [±1, . . . , ±lf. Since this is a unit vector, we obtain the estimate f;,oo = II T llooll Y lloo· A more reliable estimator results if d(k) E {-1, +1} is chosen so as to encourage growth both in y(k) and the running sum update p(l:k - 1, k) + T(l:k - 1, k)y(k). In particular, at step k we compute y(k)+ = (1 - p(k))/T(k, k), s(k)+ = ly(k)+I + II p(l:k - 1) + T(l:k - 1, k)y(k)+ 111 , y(k)- = (-1 - p(k))/T(k, k), s(k)- = ly(k)- 1 + II p(l:k - 1) + T(l:k - 1, k)y(k)- 111,
  • 166. 142 Chapter 3. General Linear Systems and set { y(k)+ if s(k)+ ;::: s(k)- , y(k) = y(k)- if s(k)+ < s(k)- . This gives the following procedure. Algorithm 3.5.1 (Condition Estimator) Let T E Rnxn be a nonsingular upper trian­ gular matrix. This algorithm computes unit co-norm y and a scalar K so II Ty 1100 � 1/11 T- 1 lloo and K � Koo(T) p(l:n) = 0 for k = n: - 1:1 end y(k)+ = (1 - p(k))/T(k, k) y(k)- = (-1 - p(k))/T(k, k) p(k)+ = p(l:k - 1) + T(l:k - 1, k)y(k)+ p(k)- = p(l:k - 1) + T(l:k - 1, k)y(k)- if ly(k)+ I + ll P(k)+ 11 1 ;::: ly(k)- 1 + ll P(k)- 11 1 y(k) = y(k)+ p(l:k - 1) = p(k)+ else y(k) = y(k)­ p(l:k - 1) = p(k)­ end K = II Y lloo II T lloo Y = Y/11 Y lloo The algorithm involves several times the work of ordinary back substitution. We are now in a position to describe a procedure for estimating the condition of a square nonsingular matrix A whose PA = LU factorization is available: Step 1. Apply the lower triangular version of Algorithm 3.5.1 to UT and obtain a large-norm solution to UTy = d. Step 2. Solve the triangular systems LTr = y, Lw = Pr, and Uz = w. Step 3. Set Koo = II A llooll z lloo/ll r lloo· Note that II z 1100 :$ II A-1 1100 11 r 1100• The method is based on several heuristics. First, ifA is ill-conditioned and PA = LU, then it is usually the casethat U is correspondingly ill-conditioned. The lower triangle L tends to be fairly well-conditioned. Thus, it is more profitable to apply the condition estimator to U than to L. The vector r, because it solves ATpTr = d, tends to be rich in the direction of the left singular vector associated with CTmin(A). Right-hand sides with this property render large solutions to the problem Az = r. In practice, it is found that the condition estimation technique that we have outlined produces adequate order-of-magnitude estimates of the true condition number.
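For readers who want to experiment, here is a MATLAB sketch (ours) of the simpler d(k) = -sign(p(k)) estimator (3.5.6) for an upper triangular T; Algorithm 3.5.1 refines the choice of d(k) but has the same structure. The function name condest_upper is ours.

function kappa = condest_upper(T)
% CONDEST_UPPER  Order-of-magnitude estimate of kappa_inf(T) for a
% nonsingular upper triangular T.  Solves T*y = d, choosing d(k) = +/-1
% to encourage growth in y as in (3.5.6), and returns norm(T,inf)*norm(y,inf).
n = size(T,1);
p = zeros(n,1);
y = zeros(n,1);
for k = n:-1:1
    if p(k) >= 0, d = -1; else, d = 1; end       % d(k) = -sign(p(k))
    y(k) = (d - p(k))/T(k,k);
    p(1:k-1) = p(1:k-1) + y(k)*T(1:k-1,k);
end
kappa = norm(T,inf)*norm(y,inf);   % ||d||_inf = 1, so ||y||_inf <= ||inv(T)||_inf
end

Because ‖y‖_∞ ≤ ‖T^{-1}‖_∞ ‖d‖_∞ and ‖d‖_∞ = 1, the returned value never exceeds κ_∞(T); the practical question, as discussed above, is how close it comes.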
  • 167. 3.5. Improving and Estimating Accuracy 143 problems p3,5.l Show by example that there may be more than one way to equilibrate a matrix. p3,5.2 Suppose P(A + E) = LU, where P is a permutation, L is lower triangular with liij I ::; 1, and (J is upper triangular. Show that t'toc(A) 2: 11 A lloo/(11 E !loo + µ) where µ = min luiil· Conclude that if a small pivot is encountered when Gaussian elimination with pivoting is applied to A, then A is ill-conditioned. The converse is not true. (Hint: Let A be the matrix Bn defined in (2.6.9)). p3,5.3 (Kahan (1966)) The system Ax = b where [ 2 -1 1 l [2(1 + 10-10) l A = -1 10-10 10-10 , b = -10-10 1 10-10 10-10 10-10 has solution x = [10-10 - 1 l]T. (a) Show that if (A + E)y = b and IEI ::; 10-8IAI, then Ix - YI ::; 10-71xl. That is, small relative changes in A's entries do not induce large changes in x even though 1'oo(A) = 1010. (b) Define D = diag(l0-5, 105, 105). Show that Koo(DAD) ::; 5. (c) Explain what is going on using Theorem 2.6.3. P3.5.4 Consider the matrix: 0 1 0 0 M -M 1 0 -M l 1 M E R . What estimate of 11:00(T) is produced when (3.5.6) is applied with d(k) = -sign(p(k))? What estimate does Algorithm 3.5.1 produce? What is the true K00(T)? P3.5.5 What does Algorithm 3.5.1 produce when applied to the matrix Bn given in (2.6.9)? Notes and References for §3.5 The following papers are concerned with the scaling of Ax = b problems: F.L. Bauer (1963). "Optimally Scaled Matrices," Numer. Math. 5, 73-87. P.A. Businger (1968). "Matrices Which Can Be Optimally Scaled,'' Numer. Math. 12, 346-48. A. van der Sluis (1969). "Condition Numbers and Equilibration Matrices," Numer. Math. 14, 14-23. A. van der Sluis (1970). "Condition, Equilibration, and Pivoting in Linear Algebraic Systems," Numer. Math. 15, 74-86. C. McCarthy and G. Strang (1973). "Optimal Conditioning of Matrices," SIAM J. Numer. Anal. 10, 370-388. T. Fenner and G. Loizou (1974). "Some New Bounds on the Condition Numbers of Optimally Scaled Matrices,"J. A CM 21, 514-524. G.H. Golub and J.M. Varah (1974). "On a Characterization of the Best L2-Scaling of a Matrix,'' SIAM J. Numer. Anal. 1 1, 472-479. R. Skeel (1979). "Scaling for Numerical Stability in Gaussian Elimination," J. ACM 26, 494-526. R. Skeel (1981). "Effect of Equilibration on Residual Size for Partial Pivoting,'' SIAM J. Numer. Anal. 18, 449-55. V. Balakrishnan and S. Boyd (1995). "Existence and Uniqueness of Optimal Matrix Scalings,'' SIAM J. Matrix Anal. Applic. 16, 29-39. Part of the difficulty in scaling concerns the selection of a norm in which to measure errors. An interesting discussion of this frequently overlooked point appears in: W. Kahan (1966). "Numerical Linear Algebra,'' Canadian Math. Bull. 9, 757-801. For a rigorous analysis of iterative improvement and related matters, see: C.B. Moler (1967). "Iterative Refinement in Floating Point," J. ACM 14, 316-371. M. Jankowski and M. Wozniakowski (1977). "Iterative Refinement Implies Numerical Stability," BIT 17, 303-311. ll.D. Skeel (1980). "Iterative Refinement Implies Numerical Stability for Gaussian Elimination," Math. Comput. 35, 817-832. N.J. Higham (1997). "Iterative Refinement for Linear Systems and LAPACK,'' IMA J. Numer. Anal. 17, 495-509.
  • 168. 144 Chapter 3. General Linear Systems A. Dax {2003). "A Modified Iterative Refinement Scheme," SIAM J. Sci. Comput. 25, 1199-1213. J. Demmel, Y. Hida, W. Kahan, X.S. Li, S. Mukherjee, and E.J. Riedy {2006). "Error Bounds from Extra-Precise Iterative Refinement," ACM TI-ans. Math. Softw. 32, 325-351. The condition estimator that we described is given in: A.K. Cline, C.B. Moler, G.W. Stewart, and J.H. Wilkinson {1979). "An Estimate for the Condition Number of a Matrix," SIAM J. Numer. Anal. 16, 368-75. Other references concerned with the condition estimation problem include: C.G. Broyden {1973). "Some Condition Number Bounds for the Gaussian Elimination Process," J. Inst. Math. Applic. 12, 273-286. F. Lemeire {1973). "Bounds for Condition Numbers of Triangular Value of a Matrix," Lin. Alg. Applic. 1 1, 1-2. D.P. O'Leary {1980). "Estimating Matrix Condition Numbers," SIAM J. Sci. Stat. Comput. 1, 205-209. A.K. Cline, A.R. Conn, and C. Van Loan {1982). "Generalizing the LINPACK Condition Estimator," in Numerical Analysis , J.P. Hennart {ed.), Lecture Notes in Mathematics No. 909, Springer-Verlag, New York. A.K. Cline and R.K. Rew {1983). "A Set of Counter examples to Three Condition Number Estima­ tors," SIAM J. Sci. Stat. Comput. 4, 602-611. W. Hager {1984). "Condition Estimates," SIAM J. Sci. Stat. Comput. 5, 311-316. N.J. Higham {1987). "A Survey of Condition Number Estimation for Triangular Matrices," SIAM Review 29, 575-596. N.J. Higham {1988). "FORTRAN Codes for Estimating the One-Norm of a Real or Complex Matrix with Applications to Condition Estimation {Algorithm 674)," A CM TI-ans. Math. Softw. 14, 381-396. C.H. Bischof {1990). "Incremental Condition Estimation," SIAM J. Matrix Anal. Applic. 11, 312- 322. G. Auchmuty {1991). "A Posteriori Error Estimates for Linear Equations," Numer. Math. 61, 1-6. N.J. Higham {1993). "Optimization by Direct Search in Matrix Computations," SIAM J. Matri:i; Anal. Applic. 14, 317-333. D.J. Higham {1995). "Condition Numbers and Their Condition Numbers," Lin. Alg. Applic. 214, 193-213. G.W. Stewart {1997). "The Triangular Matrices of Gaussian Elimination and Related Decomposi­ tions," IMA J. Numer. Anal. 1 7, 7-16. 3.6 Parallel LU In §3.2.11 we show how to organize a block version of Gaussian elimination (without pivoting) so that the overwhelming majority of flops occur in the context of matrix multiplication. It is possible to incorporate partial pivoting and maintain the same level-3 fraction. After stepping through the derivation we proceed to show how the process can be effectively parallelized using the block-cyclic distribution ideas that were presented in §1.6. 3.6.1 Block LU with Pivoting Throughout this section assume A E lRnxn and for clarity that n = rN: (3.6.1) We revisit Algorithm 3.2.4 (nonrecursive block LU) and show how to incorporate partial pivoting.
  • 169. J.6. Parallel LU 145 The first step starts by applying scalar Gaussian elimination with partial pivoting to the first block column. Using an obvious rectangular matrix version of Algorithm 3,4.l we obtain the following factorization: (3.6.2) In this equation, P1 E JR.nxn is a permutation, Lu E wxr is unit lower triangular, and Un E wxr is upper triangular. The next task is to compute the first block row of U. To do this we set .J. . E wxr i,3 ' (3.6.3) a.nd solve the lower triangular multiple-right-hand-side problem Lu [ U12 I · · · I U1N ] = [ A12 I · · · J A1N ] (3.6.4) for U12, . . . 'U1N E wxr. At this stage it is easy to show that we have the partial factorization [Ln 0 L21 Ir P1A LNl 0 0 ] [Uu 0 Ir 0 0 : [ 0 A{new) ] : Ir 0 0 where [A22 A{new) = _: AN2 (3.6.5) Note that the computation of A(new) is a level-3 operation as it involves one matrix multiplication per A-block. if and The remaining task is to compute the pivoted LU factorization of A(new). Indeed, p(new)A(new) = L(ncw)u(new)
  • 170. 146 Chapter 3. General Linear Systems then pA = [t: 0 L<•••l 0l[�t LN1 0 is the pivoted block LU factorization of A with P = Pi . [Ir 0 l 0 p(new) In general, the processing of each block column in A is a four-part calculation: Part A. Apply rectangular Gaussian Elimination with partial pivoting to a block column of A. This produces a permutation, a block column of L, and a diagonal block of U. See (3.6.2). Part B. Apply the Part A permutation to the "rest of A." See (3.6.3). Part C. Complete the computation of U's next block row by solving a lower trian­ gular multiple right-hand-side problem. See (3.6.4). Part D. Using the freshly computed £-blocks and U-blocks, update the "rest of A." See (3.6.5). The precise formulation of the method with overwriting is similar to Algorithm 3.2.4 and is left as an exercise. 3.6.2 Parallelizing the Pivoted Block LU Algorithm Recall the discussion of the block-cyclic distribution in §1.6.2 where the parallel com­ putation of the matrix multiplication update C = C + AB was outlined. To provide insight into how the pivoted block LU algorithm can be parallelized, we examine a rep­ resentative step in a small example that also makes use of the block-cyclic distribution. Assume that N = 8 in (3.6.1) and that we have a Prow-bY-PcoI processor network with Prow = 2 and Pcol = 2. At the start, the blocks of A = (Aij) are cyclically distributed as shown in Figure 3.6.1. Assume that we have carried out two steps of block LU and that the computed Lij and Uij have overwritten the corresponding A­ blocks. Figure 3.6.2 displays the situation at the start of the third step. Blocks that are to participate in the Part A factorization [A33 ] [£33 ] P3 A � 3 = L � 3 Ua3 are highlighted. Typically, Prow processors are involved and since the blocks are each r-by-r, there are r steps as shown in (3.6.6).
  • 171. 3.6. Parallel LU 147 Figure 3.6.1. Part A: Figure 3.6.2.
  • 172. 148 Chapter 3. General Linear Systems for j = l:r end Columns Akk(:,j), . . . , AN,k(:,j) are assembled in the processor housing Akk, the "pivot processor" The pivot processor determines the required row interchange and the Gauss transform vector The swapping of the two A-rows may require the involvement of two processors in the network The appropriate part of the Gauss vector together with Akk(j,j:r) is sent by the pivot processor to the processors that house Ak+l,k, . . . , AN,k The processors that house Akk, . . . , AN,k carry out their share of the update, a local computation {3.6.6) Upon completion, the parallel execution of Parts B and C follow. In the Part B compu­ tation, those blocks that may be involved in the row swapping have been highlighted. See Figure 3.6.3. This overhead generally engages the entire processor network, al­ though communication is local to each processor column. Part B: Figure 3.6.3. Note that Part C involves just a single processor row while the "big" level-three update that follows typically involves the entire processor network. See Figures 3.6.4 and 3.6.5.
  • 173. 3.6. Parallel LU 149 Part C: Figure 3.6.4. Part D: Figure 3.6.5.
  • 174. 150 Chapter 3. General Linear Systems The communication overhead associated with Part D is masked by the matrix multi­ plications that are performed on each processor. This completes the k = 3 step of parallel block LU with partial pivoting. The process can obviously be repeated on the trailing 5-by-5 block matrix. The virtues of the block-cyclic distribution are revealed through the schematics. In particular, the dominating level-3 step (Part D) is load balanced for all but the last few values of k. Subsets of the processor grid are used for the "smaller," level-2 portions of the computation. We shall not attempt to predict the fraction of time that is devoted to these computations or the propagation of the interchange permutations. Enlightenment in this direction requires benchmarking. 3.6.3 Tournament Pivoting The decomposition via partial pivoting in Step A requires a lot of communication. An alternative that addresses this issue involves a strategy called tournament pivoting. Here is the main idea. Suppose we want to compute PW = LU where the blocks of are distributed around some network of processors. Assume that each Wi has many more rows than columns. The goal is to choose r rows from W that can serve as pivot rows. If we compute the "local" factorizations via Gaussian elimination with partial pivoting, then the top r rows of the matrices P1Wi , P2W2, PaWa, are P4W4 are pivot row candidates. Call these square matrices W{, W�, w3, and W� and note that we have reduced the number of possible pivot rows from n to 4r. Next we compute the factorizations = P12 [ W{ ] w� W' 3 W' 4 ] = and recognize that the top r rows of P12W{2 and the top r rows of Pa4W34 are even better pivot row candidates. Assemble these 2r rows into a matrix W1234 and compute P1234W12a4 = Li234U1234. The top r rows of P1234W1234 are then the chosen pivot rows for the LU reduction of w. Of course, there are communication overheads associated with each round of the "tournament," but the volume of interprocessor data transfers is much reduced. See Demmel, Grigori, and Xiang (2010).
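To convey the data flow, here is a serial MATLAB sketch (ours) of the two-round tournament for a tall matrix W split into four row blocks; ordinary Gaussian elimination with partial pivoting is the building block in each round. A real implementation would also carry along the original row indices and run the rounds on separate processors.

% Tournament pivoting, serial sketch: choose r pivot-row candidates
% from W, which has 4*m rows partitioned into four m-by-r blocks.
r = 3;  m = 20;
W = randn(4*m, r);
blocks = {W(1:m,:), W(m+1:2*m,:), W(2*m+1:3*m,:), W(3*m+1:4*m,:)};
cand = cell(1,4);
for i = 1:4                                % round 1: local GEPP
    [~, ~, P] = lu(blocks{i});             % P*blocks{i} = L*U
    Wi = P*blocks{i};
    cand{i} = Wi(1:r, :);                  % top r rows are the nominees
end
W12 = [cand{1}; cand{2}];  [~, ~, P] = lu(W12);  W12 = P*W12;  W12 = W12(1:r,:);
W34 = [cand{3}; cand{4}];  [~, ~, P] = lu(W34);  W34 = P*W34;  W34 = W34(1:r,:);
Wf  = [W12; W34];          [~, ~, P] = lu(Wf);   Wf  = P*Wf;
pivot_rows = Wf(1:r, :);                   % the r rows chosen to pivot W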
  • 175. 3.6. Parallel LU 151 Problems p3.6.l In §3.6.1 we outlined a single step of block LU with partial pivoting. Specify a complete version of the algorithm. p3.6.2 Regarding parallel block LU with partial pivoting, why is it better to "collect" all the per­ mutations in Part A before applying them across the remaining block columns? In other words, why not propagate the Part A permutations as they are produced instead of having Part B, a separate permutation application step? P3.6.3 Review the discussion about parallel shared memory computing in §1.6.5 and §1.6.6. Develop a shared memory version of Algorithm 3.2.1. Designate one processor for computation of the multipliers and a load-balanced scheme for the rank-1 update in which all the processors participate. A barrier is necessary because the rank-1 update cannot proceed until the multipliers are available. What if partial pivoting is incorporated? Notes and References for §3. 6 See the scaLAPACK manual for a discussion of parallel Gaussian elimination as well as: J. Ortega (1988). Introduction to Parallel and Vector Solution of Linear Systems, Plenum Press, New York. K. Gallivan, W. Jalby, U. Meier, and A.H. Sameh (1988). "Impact of Hierarchical Memory Systems on Linear Algebra Algorithm Design," Int. J. Supercomput. Applic. 2, 12---48. J. Dongarra, I. Duff, D. Sorensen, and H. van der Vorst (1990). Solving Linear Systems on Vector and Shared Memory Computers, SIAM Publications, Philadelphia, PA. Y. Robert (1990). The Impact of Vector and Parallel Architectures on the Gaussian Elimination Algorithm, Halsted Press, New York. J. Choi, J.J. Dongarra, L.S. Osttrouchov, A.P. Petitet, D.W. Walker, and R.C. Whaley (1996). "Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines," Scientific Programming, 5, 173-184. X.S. Li (2005). "An Overview of SuperLU: Algorithms, Implementation, and User Interface," ACM Trans. Math. Softw. 31, 302-325. S. Tomov, J. Dongarra, and M. Baboulin (2010). "Towards Dense Linear Algebra for Hybrid GPU Accelerated Manycore Systems," Parallel Comput. 36, 232-240. The tournament pivoting strategy is a central feature of the optimized LU implementation discussed in: J. Demmel, L. Grigori, and H. Xiang (2011). "CALU: A Communication Optimal LU Factorization Algorithm," SIAM J. Matrix Anal. Applic. 32, 1317-1350. E. Solomonik and J. Demmel (2011). "Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms," Euro-Par 201 1 Parallel Processing Lecture Notes in Computer Science, 2011, Volume 6853/2011, 90-109.
  • 177. Chapter 4 Special Linear Systems 4. 1 Diagonal Dominance and Symmetry 4.2 Positive Definite Systems 4.3 Banded Systems 4.4 Symmetric Indefinite Systems 4.5 Block Tridiagonal Systems 4.6 Vandermonde Systems 4.7 Classical Methods for Toeplitz Systems 4.8 Circulant and Discrete Poisson Systems It is a basic tenet of numerical analysis that solution procedures should exploit structure whenever it is present. In numerical linear algebra, this translates into an ex­ pectation that algorithms for general linear systems can be streamlined in the presence of such properties as symmetry, definiteness, and handedness. Two themes prevail: • There are important classes of matrices for which it is safe not to pivot when computing the LU or a related factorization. • There are important classes of matrices with highly structured LU factorizations that can be computed quickly, sometimes, very quickly. Challenges arise when a fast, but unstable, LU factorization is available. Symmetry and diagonal dominance are prime examples of exploitable matrix structure and we use these properties to introduce some key ideas in §4.1. In §4.2 we examine the case when A is both symmetric and positive definite, deriving the stable Cholesky factorization. Unsymmetric positive definite systems are also investigated. In §4.3, banded versions of the LU and Cholesky factorizations are discussed and this is followed in §4.4 with a treatment of the symmetric indefinite problem. Block ma­ trix ideas and sparse matrix ideas come together when the matrix of coefficients is block tridiagonal. This important class of systems receives a special treatment in §4.5. 153
  • 178. 154 Chapter 4. Special Linear Systems Classical methods for Vandermonde and Toeplitz systems are considered in §4.6 and §4.7. In §4.8 we connect the fast transform discussion in §1.4 to the problem of solving circulant systems and systems that arise when the Poisson problem is discretized using finite differences. Before we get started, we clarify some terminology associated with structured problems that pertains to this chapter and beyond. Banded matrices and block-banded matrices are examples ofsparse matrices, meaning that the vast majority of their entries are zero. Linear equation methods that are appropriate when the zero-nonzero pattern is more arbitrary are discussed in Chapter 11. Toeplitz, Vandermonde, and circulant matrices are data sparse. A matrix A E IRmxn is data sparse if it can be parameterized with many fewer than O(mn) numbers. Cauchy-like systems and semiseparable systems are considered in §12.1 and §12.2. Reading Notes Knowledge of Chapters 1, 2, and 3 is assumed. Within this chapter there are the following dependencies: §4.1 -+ -1- §4.6 §4.2 -+ §4.3 -+ §4.4 -1- §4.5 -+ §4.7 -+ §4.8 Global references include Stewart( MABD), Higham (ASNA), Watkins (FMC), Tre­ fethen and Bau (NLA), Demmel (ANLA), and Ipsen (NMA). 4.1 Diagonal Dominance and Symmetry Pivoting is a serious concern in the context of high-performance computing because the cost of moving data around rivals the cost of computation. Equally important, pivoting can destroy exploitable structure. For example, if A is symmetric, then it involves half the data of a general A. Our intuition (correctly) tells us that we should be able to solve a symmetric Ax = b problem with half the arithmetic. However, in the context of Gaussian elimination with pivoting, symmetry can be destroyed at the very start of the reduction, e.g., [� � �] [� � �] [� � �]· 1 0 0 c e f a b c Taking advantage of symmetry and other patterns and identifying situations where pivoting is unnecessary are typical activities in the realm of structured Ax = b solving. The goal is to expose computational shortcuts and to justify their use through analysis. 4.1.1 Diagonal Dominance and the LU Factorization If A's diagonal entries are large compared to its off-diagonal entries, then we anticipate that it is safe to compute A = LU without pivoting. Consider the n = 2 case:
4.1. Diagonal Dominance and Symmetry    155

    [ a   b ]      [   1    0 ] [ a        b      ]
    [ c   d ]  =   [ c/a    1 ] [ 0   d - (c/a)b  ]

If a and d "dominate" b and c in magnitude, then the elements of L and U will be nicely bounded. To quantify this we make a definition. We say that A ∈ R^{n×n} is row diagonally dominant if

    |a_ii|  ≥  Σ_{j≠i} |a_ij| ,      i = 1:n.                              (4.1.1)

Similarly, column diagonal dominance means that |a_jj| is larger than the sum of all off-diagonal element magnitudes in the same column. If these inequalities are strict, then A is strictly (row/column) diagonally dominant. A diagonally dominant matrix can be singular, e.g., the 2-by-2 matrix of 1's. However, if a nonsingular matrix is diagonally dominant, then it has a "safe" LU factorization.

Theorem 4.1.1. If A is nonsingular and column diagonally dominant, then it has an LU factorization and the entries in L = (ℓ_ij) satisfy |ℓ_ij| ≤ 1.

Proof. We proceed by induction. The theorem is obviously true if n = 1. Assume that it is true for (n-1)-by-(n-1) nonsingular matrices that are column diagonally dominant. Partition A ∈ R^{n×n} as follows:

    A  =  [ α   wᵀ ]          α ∈ R,   v, w ∈ R^{n-1},   C ∈ R^{(n-1)×(n-1)}.
          [ v    C ]

If α = 0, then v = 0 and A is singular. Thus, α ≠ 0 and we have the factorization

    A  =  [  1       0     ] [ α   wᵀ ]        where   B  =  C - (1/α)vwᵀ.        (4.1.2)
          [ v/α   I_{n-1}  ] [ 0    B ]

Since det(A) = α·det(B), it follows that B is nonsingular. It is also column diagonally dominant because for any j,

    Σ_{i≠j} |b_ij|  =  Σ_{i≠j} | c_ij - v_i w_j / α |

                    ≤  Σ_{i≠j} |c_ij|  +  ( |w_j| / |α| ) Σ_{i≠j} |v_i|

                    ≤  ( |c_jj| - |w_j| )  +  ( |w_j| / |α| ) ( |α| - |v_j| )

                    =  |c_jj|  -  |w_j v_j| / |α|

                    ≤  | c_jj - w_j v_j / α |   =   |b_jj| .

By induction, B has an LU factorization L1U1 and so from (4.1.2) we have

    A  =  [  1      0 ] [ α   wᵀ ]   =   LU.
          [ v/α    L1 ] [ 0   U1 ]

The entries in |v/α| are bounded by 1 because A is column diagonally dominant. By induction, the same can be said about the entries in |L1|. Thus, the entries in |L| are all bounded by 1, completing the proof.  □
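In MATLAB terms, Theorem 4.1.1 can be checked experimentally. The fragment below (ours; the random construction simply forces strict column dominance) runs elimination with no row interchanges and confirms that the multipliers stay bounded by 1.

% Column diagonal dominance and unpivoted LU.
n = 6;
A = randn(n);
A = A - diag(diag(A)) + diag(sum(abs(A),1)' + 1);   % force strict column dominance
offdiag = sum(abs(A),1) - abs(diag(A))';
assert(all(abs(diag(A))' > offdiag))                % column diagonally dominant
L = eye(n);  U = A;
for k = 1:n-1                                       % outer-product LU, no pivoting
    L(k+1:n,k) = U(k+1:n,k)/U(k,k);
    U(k+1:n,k:n) = U(k+1:n,k:n) - L(k+1:n,k)*U(k,k:n);
end
max(abs(L(:)))                  % bounded by 1 (up to roundoff), per Theorem 4.1.1
norm(A - L*U)/norm(A)           % small relative factorization error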
156    Chapter 4. Special Linear Systems

The theorem shows that Gaussian elimination without pivoting is a stable solution procedure for a column diagonally dominant matrix. If the diagonal elements strictly dominate the off-diagonal elements, then we can actually bound ‖A^{-1}‖.

Theorem 4.1.2. If A ∈ R^{n×n} and the quantity

    δ  =  min_{1≤j≤n}  ( |a_jj|  -  Σ_{i≠j} |a_ij| )                       (4.1.3)

is positive, then ‖ A^{-1} ‖_1 ≤ 1/δ.

Proof. Define D = diag(a_11, ..., a_nn) and E = A - D. If e is the column n-vector of 1's, then eᵀ|E| ≤ eᵀ|D| - δeᵀ. If x ∈ Rⁿ, then Dx = Ax - Ex and |D||x| ≤ |Ax| + |E||x|. Thus,

    eᵀ|D||x|  ≤  eᵀ|Ax|  +  eᵀ|E||x|  ≤  ‖ Ax ‖_1  +  ( eᵀ|D| - δeᵀ ) |x|

and so δ‖x‖_1 = δeᵀ|x| ≤ ‖Ax‖_1. The bound on ‖A^{-1}‖_1 follows from the fact that for any y ∈ Rⁿ,

    ‖ A^{-1}y ‖_1  ≤  (1/δ) ‖ A(A^{-1}y) ‖_1  =  ‖ y ‖_1 / δ .   □

The "dominance" factor δ defined in (4.1.3) is important because it has a bearing on the condition of the linear system. Moreover, if it is too small, then diagonal dominance may be lost during the elimination process because of roundoff. That is, the computed version of the B matrix in (4.1.2) may not be column diagonally dominant.

4.1.2 Symmetry and the LDLᵀ Factorization

If A is symmetric and has an LU factorization A = LU, then L and U have a connection. For example, if n = 2 we have

    [ a   c ]     [  1    0 ] [ a       c      ]     [  1    0 ]  (  [ a       0      ] [ 1   c/a ]  )
    [ c   d ]  =  [ c/a   1 ] [ 0   d - (c/a)c ]  =  [ c/a   1 ]  (  [ 0   d - (c/a)c ] [ 0    1  ]  )

It appears that U is a row scaling of Lᵀ. Here is a result that makes this precise.
  • 181. 4.1. Diagonal Dominance and Symmetry 157 Theorem 4.1.3. (LDLT Factorization) IfA E 1Rnxn is symmetric and the principal submatrix A(l:k, l:k) is nonsingular for k = l:n - 1, then there exists a unit lower triangular matrix L and a diagonal matrix such that A = LDLT. The factorization is unique. Proof. By Theorem 3.2.1 we know that A has an LU factorization A = LU. Since the matrix is both symmetric and upper triangular, it must he diagonal. The theorem follows by setting D = UL-T and the uniqueness of the LU factorization. D Note that once we have the LDLT factorization, then solving Ax = b is a 3-step process: Lz = b, Dy = z, This works because Ax = L(D(LTx)) = L(Dy) = Lz = b. Because there is only one triangular matrix to compute, it is not surprising that the factorization A = LDLT requires half as many flops to compute as A = LU. To see this we derive a Gaxpy-rich procedure that, for j = l:n, computes L(j + l:n, j) and d; in step j. Note that A(j:n, j) = L(j:n, l:j) ·v(l:j) where v(l:j) From this we conclude that j-1 d; = a;; - 'L dk.e;k. k=l With d; available, we can rearrange the equation A(j + l:n, j) = L(j + l:n, l:j) ·v(l:j) = L(j + l:n, l:j - l) ·v(l:j - 1) + drL(j + l:n, j) to get a recipe for L(j + l:n, j): L(j + l:n, j) = :. (A(j + l:n, j) - L(j + l:n, l:j - l) ·v(l:j - 1)) . J Properly sequenced, we obtain the following overall procedure:
  • 182. 158 Chapter 4. Special Linear Systems for j = l:n for i = l:j - 1 v(i) = L(j, i) · d(i) end d(j) = A(j, j) - L(j, l:j - l) ·v(l:j - 1) L(j + l:n,j) = (A(j + l:n,j) - L(j + l:n, l:j - l)·v(l:j - 1))/d(j) end With overwriting we obtain the following procedure. Algorithm 4.1.l (LDLT) If A E JR,nxn is symmetric and has an LU factorization, then this algorithm computes a unit lower triangular matrix L and a diagonal matrix D = diag(d1, . . . , dn) so A = LDLT. The entry aij is overwritten with fij if i > j and with di if i = j. for j = l:n for i = l:j - 1 v(i) = A(j, i)A(i, i) end A(j,j) = A(j,j) - A(j, l:j - l)·v(l:j - 1) A(j + l:n,j) = (A(j + l:n,j) - A(j + l:n, l:j - l) ·v(l:j - 1))/A(j, j) end This algorithm requires n3/3 flops, about half the number of flops involved in Gaussian elimination. The computed solution x to Ax = b obtained via Algorithm 4.1.1 and the usual triangular system solvers of §3.l can be shown to satisfy a perturbed system (A+E)x = b, where (4.1.4) and L and b are the computed versions of L and D, respectively. As in the case of the LU factorization considered in the previous chapter, the upper bound in (4.1.4) is without limit unless A has some special property that guar­ antees stability. In the next section, we show that if A is symmetric and positive definite, then Algorithm 4.1.1 not only runs to completion, but is extremely stable. If A is symmetric but not positive definite, then, as we discuss in §4.4, it is necessary to consider alternatives to the LDLT factorization. Problems P4.1.1 Show that if all the inequalities in (4.1.1) are strict inequalities, then A is nonsingular. P4.1.2 State and prove a result similar to Theorem 4.1.2 that applies to a row diagonally dominant matrix. In particular, show that II A-1 1100 � 1/8 where 8 measures the strength of the row diagonal dominance as defined in Equation 4.1.3. P4.l.3 Suppose A is column diagonally dominant, symmetric, and nonsingular and that A = LDLT.
  • 183. 4.2. Positive Definite Systems 159 What can you say about the size of entries in L and D? Give the smallest upper bound you can for II L 111 · Notes and References for §4. l The unsymmetric analog of Algorithm 4.1.2 is related to the methods of Crout and Doolittle. See Stewart (IMC, pp. 131 - 149) and also: G.E. Forsythe {1960). "Crout with Pivoting," Commun. ACM 3, 507-508. W.M. McKeeman {1962). "Crout with Equilibration and Iteration," Commun. A CM 5, 553-555. H.J. Bowdler, R.S. Martin, G. Peters, and J.H. Wilkinson {1966), "Solution of Real and Complex Systems of Linear Equations," Numer. Math. 8, 217-234. Just as algorithms can be tailored to exploit structure, so can error analysis and perturbation theory: C. de Boor and A. Pinkus {1977). "A Backward Error Analysis for Totally Positive Linear Systems," Numer. Math. 27, 485 490. J.R. Bunch, J.W. Demmel, and C.F. Van Loan {1989). "The Strong Stability ofAlgorithms for Solving Symmetric Linear Systems," SIAM J. Matrix Anal. Applic. 10, 494-499. A. Barrlund {1991). "Perturbation Bounds for the LDLT and LU Decompositions," BIT 31, 358-363. D.J. Higham and N.J. Higham {1992). "Backward Error and Condition ofStructured Linear Systems," SIAM J. Matrix Anal. Applic. 13, 162-175. J.M. Pena {2004). "LDU Decompositions with L and U Well Conditioned," ETNA 18, 198-208. J-G. Sun (2004). "A Note on Backward Errors for Structured Linear Systems," Numer. Lin. Alg. 12, 585-603. R. Canto, P. Koev, B. Ricarte, and M. Urbano {2008). "LDU Factorization of Nonsingular Totally Positive Matrices," SIAM J. Matrix Anal. Applic. 30, 777-782. Numerical issues that associated with the factorization of a diagonaly dominant matrix are discussed in: J.M. Pena {1998). "Pivoting Strategies Leading to Diagonal Dominance by Rows," Numer. Math. 81, 293-304. M. Mendoza, M. Raydan, and P. Tarazaga {1999). "Computing the Nearest Diagonally Dominant Matrix," Numer. Lin. Alg. 5, 461 -474. A. George and K.D. Ikramov {2005). "Gaussian Elimination Is Stable for the Inverse of a Diagonally Dominant Matrix," Math. Comput. 73, 653-657. J.M. Pena {2007). "Strict Diagonal Dominance and Optimal Bounds for the Skeel Condition Number," SIAM J. Numer. Anal. 45, 1107-1108. F. Dopico and P. Koev {2011). "Perturbation Theory for the LDU Factorization and Accurate Com­ putations for Diagonally Dominant Matrices," Numer. Math. 119, 337-371. 4.2 Positive Definite Systems A matrix A E JR.nxn is positive definite if xTAx > 0 for all nonzero x E JR.n, positive semidefinite if xTAx ;::: 0 for all x E JR.n, and indefinite if we can find x, y E JR.n so (xTAx) (yTAy) < 0. Symmetric positive definite systems constitute one of the most important classes of special Ax = b problems. Consider the 2-by-2 symmetric case. If is positive definite then x = [ 1, o f x = [ O, l f X = ( 1, l JT x = [ 1, -l jT A = [; � l => xTAx = a: > 0, => xTAx = 'Y > 0, => xTAx = a: + 2(3 + 'Y > 0, => xTAx = a: - 2(3 + 'Y > 0.
  • 184. 160 Chapter 4. Special Linear Systems The last two equations imply I.Bl � (a+-y)/2. From these results wesee that the largest entry in A is on the diagonal and that it is positive. This turns out to be true in general. (See Theorem 4.2.8 below.) A symmetric positive definite matrix has a diagonal that is sufficiently "weighty" to preclude the need for pivoting. A special factorization called the Cholesky factorization is available for such matrices. It exploits both symmetry and definiteness and its implementation is the main focus of this section. However, before those details are pursued we discuss unsymmetric positive definite matrices. This class of matrices is important in its own right and and presents interesting pivot-related issues. 4.2.1 Positive Definiteness Suppose A E JR.nxn is positive definite. It is obvious that a positive definite matrix is nonsingular for otherwise we could find a nonzero x so xTAx = 0. However, much more is implied by the positivity of the quadratic form xTAx as the following results show. Theorem 4.2.1. If A E JR.nxn is positive definite and X E JR.nxk has rank k, then B = xrAX E JR.kxk is also positive definite. Proof. If z E JR.k satisfies 0 � zTBz = (Xz)TA(Xz), then Xz = 0. But since X has full column rank, this implies that z = 0. D Corollary 4.2.2. IfA is positive definite, then all its principal submatrices are positive definite. In particular, all the diagonal entries are positive. Proof. If v is an integer length-k vector with 1 � v1 < · · · < Vk � n, then X = In(:,v) is a rank-k matrix made up of columns v1 , • • • , Vk of the identity. It follows from Theorem 4.2.1 that A(v, v) = XTAX is positive definite. D Theorem 4.2.3. The matrix A E JR.nxn is positive definite if and only ifthe symmetric matrix has positive eigenvalues. Proof. Note that xTAx = xTTx. If Tx = AX then xTAx = A · xTx. Thus, if A is positive definite then A is positive. Conversely, suppose T has positive eigenvalues and QTTQ = diag(Ai) is its Schur decomposition. (See §2.1.7.) It follows that if x E JR.n and y = QTx, then n xTAx = xTTx = yT(QTTQ)y = L AkY� > 0, completing the proof of the theorem. D k=l
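Theorem 4.2.3 gives a simple computational test for positive definiteness of an unsymmetric matrix: examine the eigenvalues of its symmetric part. The sketch below is our own illustration (the helper name `is_positive_definite` is not from the text), assuming NumPy.

```python
import numpy as np

def is_positive_definite(A, tol=0.0):
    """Test x^T A x > 0 for all nonzero x by checking that the
    symmetric part T = (A + A^T)/2 has all eigenvalues > tol."""
    T = (A + A.T) / 2.0
    return np.all(np.linalg.eigvalsh(T) > tol)

# An unsymmetric matrix whose symmetric part is diag(3, 2).
A = np.array([[ 3.0, 1.0],
              [-1.0, 2.0]])
print(is_positive_definite(A))       # True: x^T A x = 3*x1^2 + 2*x2^2

# Enlarging the skew-symmetric part does not change x^T A x at all.
S = np.array([[0.0, 10.0], [-10.0, 0.0]])
print(is_positive_definite(A + S))   # still True
```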
  • 185. 4.2. Positive Definite Systems 161 Corollary 4.2.4. If A is positive definite, then it has an LU factorization and the diagonal entries ofU are positive. Proof. From Corollary 4.2.2, it follows that the submatrices A(l:k, l:k) arc nonsingular for k = l:n and so from Theorem 3.2.1 the factorization A = LU exists. If we apply Theorem 4.2.1 with X = (L-1)T = L-r, then B = XTAX = L-1 (LU)L-1 = UL-T is positive definite and therefore has positive diagonal entries. The corollary follows because L-T is unit upper triangular and this implies bi; = u;; , i = l:n. D The mere existence of an LU factorization does not mean that its computation is advisable because the resulting factors may have unacceptably large elements. For example, if E > 0, then the matrix A = [ -� � ] [ 1 -m/E � ] [ � is positive definite. However, if m/E » 1, then it appears that some kind of pivoting is in order. This prompts us to pose an interesting question. Are there conditions that guarantee when it is safe to compute the LU-without-pivoting factorization of a positive definite matrix? 4.2.2 Unsymmetric Positive Definite Systems The positive definiteness of a general matrix A is inherited from its symmetric part: A + AT T = 2 Note that for any square matrix we have A = T + S where A - AT s = 2 is the skew-symmetric part of A. Recall that a matrix S is skew symmetric if sr = -S. If S is skew-symmetric, then xTSx = 0 for all x E IR.n and s ; i = 0, i = l:n. It follows that A is positive definite if and only if its symmetric part is positive definite. The derivation and analysis of methods for positive definite systems require an understanding about how the symmetric and skew-symmetric parts interact during the LU process. Theorem 4.2.5. Suppose is positive definite and that B E JR,(n-l)x(n-l) is symmetric and C E JR,(n-l)x(n-l) is skew-symmetric. Then it fallows that A - [ 1 (v + w)/o: � l [� (4.2.1)
  • 186. 162 Chapter 4. Special Linear Systems where (4.2.2) is symmetric positive definite and (4.2.3) is skew-symmetric. Proof. Since a -:f. 0 it follows that (4.2.1) holds. It is obvious from their definitions that B1 is symmetric and that C1 is skew-symmetric. Thus, all we have to show is that B1 is positive definite i.e., for all nonzero z E Rn-1. For any µ E R and 0 -:f. z E Rn-l we have If µ = -(vTz)/a, then which establishes the inequality (4.2.4). D (4.2.4) From (4.2.1) we see that if B1 +C1 = L1U1 is the LU factorization, then A = LU where [ 1 O ] [a (v - w)T l L = (v + w)/a Ll 0 U1 · Thus, the theorem shows that triangular factors in A = LU are nicely bounded if S is not too big compared to r-1. Here is a result that makes this precise: Theorem 4.2.6. Let A E Rnxn be positive definite and set T = (A + AT)/2 and S = (A - AT)/2. If A = LU is the LU factorization, then Proof. See Golub and Van Loan (1979). D (4.2.5) The theorem suggests when it is safe not to pivot. Assume that the computed factors L and (J satisfy (4.2.6)
  • 187. 4.2. Positive Definite Systems 163 where c is a constant of modest size. It follows from (4.2.1) and the analysis in §3.3 that if these factors are used to compute a solution to Ax = b, then the computed solution x satisfies (A + E)x = b with II E llF :5 u (2nll A llF + 4cn2 (II T 112 + II sr-1s 112)) + O(u2). It is easy to show that II T 112 :5 II A 112, and so it follows that if n = II sr-1s 112 II A 112 (4.2.7) (4.2.8) is not too large, then it is safe not to pivot. In other words, the norm of the skew­ symmetric part S has to be modest relative to the condition of the symmetric part T. Sometimes it is possible to estimate n in an application. This is trivially the case when A is symmetric for then n = o. 4.2.3 Symmetric Positive Definite Systems If we apply the above results to a symmetric positive definite matrix we know that the factorization A = LU exists and is stable to compute. The computation of the factorization A = LDLT via Algorithm 4.1.2 is also stable and exploits symmetry. However, for symmetric positive definite systems it is often handier to work with a variation of LDLT. Theorem 4.2.7 (Cholesky Factorization). If A E IRnxn is symmetric positive definite, then there exists a unique lower triangular G E IRnxn with positive diagonal entries such that A = ca r . Proof. From Theorem 4.1.3, there exists a unit lower triangular L and a diagonal D = diag(d1, . . . , dn) such that A = LDLT. Theorem 4.2.1 tells us that L-1AL-T = D is positive definite. Thus, the dk are positive and the matrix G = L diag(.;d;., . . . , ../d.::.) is real and lower triangular with positive diagonal entries. It also satisfies A = GGT. Uniqueness follows from the uniqueness of the LDLT factorization. D The factorization A = GGT is known as the Cholesky factorization and G is the Cholesky factor. Note that if we compute the Cholesky factorization and solve the triangular systems Gy = b and GTx = y, then b = Gy = G(GTx) = (GGT)x = Ax. 4.2.4 The Cholesky Factor is not a Square Root A matrix X E IRnxn that satisfies A = X2 is a square root of A. Note that if A symmetric, positive definite, and not diagonal, then its Cholesky factor is not a square root. However, if A = GGT and X = UEUT where G = UEVT is the SVD, then X2 = (UEUT)(UEUT) = UE2UT = (UEVT)(UEVT)T = GGT = A. Thus, a symmetric positive definite matrix A has a symmetric positive definite square root denoted by A112• We have more to say about matrix square roots in §9.4.2.
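The following small NumPy experiment (our own, not from the text) makes the distinction concrete: it builds the symmetric square root from the SVD of the Cholesky factor G, exactly as in the identity above, and confirms that G itself is not a square root.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B @ B.T + 4 * np.eye(4)          # symmetric positive definite

G = np.linalg.cholesky(A)            # lower triangular, A = G G^T
U, sig, Vt = np.linalg.svd(G)        # G = U diag(sig) V^T
X = U @ np.diag(sig) @ U.T           # symmetric square root of A

print(np.allclose(G @ G.T, A))       # True
print(np.allclose(X @ X, A))         # True: X really is a square root
print(np.allclose(G @ G, A))         # False: the Cholesky factor is not
```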
  • 188. 164 Chapter 4. Special Linear Systems 4.2.5 A Gaxpy-Rich Cholesky Factorization Our proof of the Cholesky factorization in Theorem 4.2.7 is constructive. However, we can develop a more effective procedure by comparing columns in A = GGT. If A E R.nxn and 1 ::; j ::; n, then j A(:,j) = L G(j, k)·G(:, k). k= l This says that j - 1 G(j,j)G(:,j) = A(:,j) - L G(j, k)·G(:, k) = v. (4.2.9) k=l If the first j - 1 columns of G are known, then v is computable. It follows by equating components in (4.2.9) that G(j:n,j) = v(j:n)/� and so we obtain for j = l:n v(j:n) = A(j:n,j) for k = l:j - 1 v(j:n) = v(j:n) - G(j, k)·G(j:n, k) end G(j:n,j) = v(j:n)/y'V(]) end It is possible to arrange the computations so that G overwrites the lower triangle of A. Algorithm 4.2.1 (Gaxpy Cholesky) Given a symmetric positive definite A E R.nxn, the following algorithm computes a lower triangular G such that A = GGT. For all i � j, G(i,j) overwrites A(i,j). for j = l:n ifj > 1 A(j:n,j) = A(j:n,j) - A(j:n, l:j - l)·A(j, l:j - l)T end A(j:n,j) = A(j:n,j)/JA(j,j) end This algorithm requires n3/3 flops. 4.2.6 Stability of the Cholesky Process In exact arithmetic, we know that a symmetric positive definite matrix has a Cholesky factorization. Conversely, if the Cholesky process runs to completion with strictly positive square roots, then A is positive definite. Thus, to find out if a matrix A is
  • 189. 4.2. Positive Definite Systems 165 positive definite, we merely try to compute its Cholesky factorization using any of the methods given above. The situation in the context of roundoff error is more interesting. The numerical stability of the Cholesky algorithm roughly follows from the inequality i 9I; � L9Ik = aii· k= l This shows that the entries in the Cholesky triangle are nicely bounded. The same conclusion can be reached from the equation II G II� = II A 112. The roundoff errors associated with the Cholesky factorization have been exten­ sively studied in a classical paper by Wilkinson (1968). Using the results in this paper, it can be shown that if x is the computed solution to Ax = b, obtained via the Cholesky process, then x solves the perturbed system (A + E)x = b where Cn is a small constant that depends upon n. Moreover, Wilkinson shows that if QnU K2(A) � 1 where qn is another small constant, then the Cholesky process runs to completion, i.e., no square roots of negative numbers arise. It is important to remember that symmetric positive definite linear systems can be ill-conditioned. Indeed, the eigenvalues and singular values of a symmetric positive definite matrix are the same. This follows from (2.4.1) and Theorem 4.2.3. Thus, (A) = Amax(A) K2 Amin(A) . The eigenvalue Amin(A) is the "distance to trouble" in the Cholesky setting. This prompts us to consider a permutation strategy that steers us away from using small diagonal elements that jeopardize the factorization process. 4.2.7 The LDLT Factorization with Symmetric Pivoting With an eye towards handling ill-conditioned symmetric positive definite systems, we return to the LDLT factorization and develop an outer product implementation with pivoting. We first observe that if A is symmetric and P1 is a permutation, then P1A is not symmetric. On the other hand, P1AP'[ is symmetric suggesting that we consider the following factorization: where [ l [ l [ l [ ] T 0: VT 1 0 0: 0 1 0 v B v/o: ln-1 0 A v/o: In-1 - 1 T A = B - - vv . 0: Note that with this kind of symmetric pivoting, the new (1,1) entry o: is some diagonal entry aii· Our plan is to choose Pi so that o: is the largest of A's diagonal entries. If we apply the same strategy recursively to A and compute - - - r - - - r PAP = LDL ,
  • 190. 166 Chapter 4. Special Linear Systems then we emerge with the factorization PAPT = LDLT (4.2.10) where P = [� � lPi , L D = [� � ]· By virtue of this pivot strategy, di � d2 � . . . � dn > 0. Here is a nonrecursive implementation of the overall algorithm: Algorithm 4.2.2 {Outer Product LDLT with Pivoting) Given a symmetric positive semidefinite A E R.nxn, the following algorithm computes a permutation P, a unit lower triangular L, and a diagonal matrix D = diag{di , . . . , dn) so PAPT = LDLT with di � d2 � · · · � dn > 0. The matrix element ai; is overwritten by di if i = j and by ii; if i > j. P = Pi · · · Pn where Pk is the identity with rows k and piv(k) interchanged. for k = l:n piv(k) = j where a;; = max{akk, . . . , ann} A(k, :) t-t A(j, :) A(:, k) +.+ A(:,j) a = A(k, k) v = A(k + l:n, k) A(k + l:n, k) = v/a A(k + l:n, k + l:n) = A(k + l:n, k + l:n) - vvT/a end If symmetry is exploited in the outer product update, then n3/3 flops are required. To solve Ax = b given PAPT = LDLT, we proceed as follows: Lw = Pb, Dy = w, x = PTz. We mention that Algorithm 4.2.2 can be implemented in a way that only references the lower trianglar part of A. It is reasonable to ask why we even bother with the LDLT factorization given that it appears to offer no real advantage over the Cholesky factorization. There are two reasons. First, it is more efficient in narrow band situations because it avoids square roots; see §4.3.6. Second, it is a graceful way to introduce factorizations of the form p APT = ( lower ) ( simple ) ( lower )T triangular x matrix x triangular ' where P is a permutation arising from a symmetry-exploiting pivot strategy. The symmetric indefinite factorizations that we develop in §4.4 fall under this heading as does the "rank revealing" factorization that we are about to discuss for semidefinite problems.
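The outer-product LDL^T with diagonal pivoting (Algorithm 4.2.2) can be prototyped in a few lines of NumPy. The sketch below is our own rendering: for clarity it updates a full copy of A rather than just the lower triangle, the function name `ldlt_pivot` and the permutation-vector convention P = I(perm, :) are our choices, and it is exercised on a positive definite test matrix.

```python
import numpy as np

def ldlt_pivot(A):
    """P A P^T = L D L^T with diagonal (symmetric) pivoting.
    Returns (perm, L, d) where P = I[perm, :] and D = diag(d)."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    perm = np.arange(n)
    for k in range(n):
        # Bring the largest remaining diagonal entry to position k.
        j = k + np.argmax(np.diag(A)[k:])
        A[[k, j], :] = A[[j, k], :]
        A[:, [k, j]] = A[:, [j, k]]
        perm[[k, j]] = perm[[j, k]]
        alpha = A[k, k]
        v = A[k+1:, k].copy()
        A[k+1:, k] = v / alpha
        A[k+1:, k+1:] -= np.outer(v, v) / alpha
    d = np.diag(A).copy()
    L = np.tril(A, -1) + np.eye(n)
    return perm, L, d

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5 * np.eye(5)                       # symmetric positive definite
perm, L, d = ldlt_pivot(A)
PAPt = A[np.ix_(perm, perm)]
print(np.allclose(L @ np.diag(d) @ L.T, PAPt))    # True
print(np.all(np.diff(d) <= 1e-12))                # d1 >= d2 >= ... >= dn
```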
  • 191. 4.2. Positive Definite Systems 4.2.8 The Symmetric Semidefinite Case A symmetric matrix A E Rnxn is positive semidefinite if xTAx � 0 167 for every x E Rn. It is easy to show that if A E Rnxn is symmetric and positive semidefinite, then its eigenvalues satisfy 0 = An(A) = . . . = Ar+i (A) < Ar(A) � . . . � Ai(A) {4.2.11) where r is the rank of A. Our goal is to show that Algorithm 4.2.2 can be used to estimate r and produce a streamlined version of {4.2.10). But first we establish some useful properties. Theorem 4.2.8. If A E Rnxn is symmetric positive semidefinite, then {i ;:/; j), aii = 0 '* A(i, :) = 0, A(:, i) = 0. Proof. Let ei denote the ith column of In. Since it follows that x = ei + e; '* 0 � xTAx = aii + 2ai; + a;;, x = ei - e; '* 0 � xTAx = aii - 2aii + a;;, These two equations confirm {4.2.12), which in turn implies (4.2.14). To prove {4.2.13), set x = Tei + e; where T E R. It follows that 0 < xTAx = aiiT2 + 2ai;T + a;; {4.2.12) (4.2.13) (4.2.14) (4.2.15) must hold for all T. This is a quadratic equation in T and for the inequality to hold, the discriminant 4a�i - 4aiia;; must be negative, i.e., lai;I � Jaiiaii· The implication in {4.2.15) follows immediately from (4.2.13). D Let us examine what happens when Algorithm 4.2.2 is applied to a rank-r positive semidefinite matrix. If k � r, then after k steps we have the factorization {4.2.16)
  • 192. 168 Chapter 4. Special Linear Systems where Dk = diag(di. . . . , dk) E Rkxk and di � · · · � dk � 0. By virtue of the pivot strategy, if dk = 0, then Ak has a zero diagonal. Since Ak is positive semidefinite, it follows from (4.2.15) that Ak = 0. This contradicts the assumption that A has rank r unless k = r. Thus, if k � r, then dk > 0. Moreover, we must have Ar = 0 since A has the same rank as diag(Dr. Ar)· It follows from (4.2.16) that T [Ln ] [ T I T ) PAP = Dr L11 L21 L21 (4.2.17) where Dr = diag(di, . . . , dr) has positive diagonal entries, £11 E m_rxr is unit lower triangular, and £21 E IR(n- r) xr. If ii is the jth column of the £-matrix, then we can rewrite (4.2.17) as a sum of rank-1 matrices: r PAPT = L di iilf. j=l This can be regarded as a relatively cheap alternative to the SVD rank-1 expansion. It is important to note that our entire semidefinite discussion has been an exact arithmetic discussion. In practice, a threshold tolerance for small diagonal entries has to be built into Algorithm 4.2.2. If the diagonal of the computed Ak in (4.2.16) is sufficiently small, then the loop can be terminated and r can be regarded as the numerical rank of A. For more details, see Higham (1989). 4.2.9 Block Cholesky Just as there are block methods for computing the LU factorization, so are there are block methods for computing the Cholesky factorization. Paralleling the derivation of the block LU algorithm in §3.2.11, we start by blocking A = GGT as follows (4.2.18) Here, A11 E m_rxr, A22 E IR(n-r) x(n- ..) , r is a blocking parameter, and G is partitioned conformably. Comparing blocks in (4.2.18) we conclude that Au = G11 Gf1, A21 = G21Gf1, A22 = G21GI1 + G22GI2. which suggests the following 3-step procedure: Step 1: Compute the Cholesky factorization of Au to get Gn . Step 2: Solve a lower triangular multiple-right-hand-side system for G21· Step 3: Compute the Cholesky factor G22 of A22 - G21Gf1 = A22 - A21A]i1Af1· In recursive form we obtain the following algorithm.
  • 193. 4.2. Positive Definite Systems 169 Algorithm 4.2.3 (Recursive Block Cholesky) Suppose A E nnxn is symmetric pos­ itive definite and r is a positive integer. The following algorithm computes a lower triangular G E Rnxn so A = GG1'. function G = BlockCholesky(A, n, r) if n � r Compute the Cholesky factorization A = GGr. else Compute the Cholesky factorization A(l:r, l:r) = G11Gf1 . Solve G21Gf1 = A(r + l:n, l:r) for G21· A = A(r + l:n, r + l:n) - G21Gf1 G22 = BlockCholesky(A, n - r, r) G = [�:: G�2 ] end end If symmetry is exploited in the computation of A, then this algorithm requires n3/3 flops. A careful accounting of flops reveals that the level-3 fraction is about 1 - 1/N2 where N � n/r. The "small" Cholesky computation for Gu and the "thin" solution process for G21 are dominated by the "large" level-3 update for A. To develop a nonrccursive implementation, we assume for clarity that n = Nr where N is a positive integer and consider the partitioning 0 l[Gu G�1 0 r (4.2.19) where all blocks are r-by-r. By equating (i, j) blocks with i ;:::: j it follows that j Aij = L:GikG]k· k=l Define j-1 s Aij - L GikG% k=1 If i = j, then Gjj is the Cholesky factor of S. If i > j, then GijG'£ = S and Gij is the solution to a triangular multiple right hand side problem. Properly sequenced, these equations can be arranged to compute all the G-blocks.
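Before turning to the nonrecursive formulation, here is a NumPy sketch of the recursive procedure (Algorithm 4.2.3); the function name `block_cholesky` is ours, and a production code would use a dedicated triangular multiple-right-hand-side solver in place of the general solve used for the G21 block.

```python
import numpy as np

def block_cholesky(A, r=2):
    """Recursive block Cholesky: returns lower triangular G with A = G G^T."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n <= r:
        return np.linalg.cholesky(A)
    G11 = np.linalg.cholesky(A[:r, :r])
    # Solve G21 G11^T = A21, i.e. G11 G21^T = A21^T.
    G21 = np.linalg.solve(G11, A[r:, :r].T).T
    A22 = A[r:, r:] - G21 @ G21.T                 # symmetric level-3 update
    G22 = block_cholesky(A22, r)
    G = np.zeros((n, n))
    G[:r, :r], G[r:, :r], G[r:, r:] = G11, G21, G22
    return G

rng = np.random.default_rng(2)
B = rng.standard_normal((7, 7))
A = B @ B.T + 7 * np.eye(7)
G = block_cholesky(A, r=2)
print(np.allclose(G @ G.T, A))                    # True
print(np.allclose(G, np.linalg.cholesky(A)))      # same factor, by uniqueness
```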
  • 194. 170 Chapter 4. Special Linear Systems Algorithm 4.2.4 (Nonrecursive Block Cholesky) Given a symmetric positive definite A E Rnxn with n = Nr with blocking (4.2.19), the following algorithm computes a lower triangular G E Rnxn such that A = GGT. The lower triangular part of A is overwritten by the lower triangular part of G. for j = l:N end for i = j:N j-1 Compute S = Aii - L GikG%. k=l if i = j Compute Cholesky factorization S = GiiGh. else Solve GiiGh = S for Gii· end Aii = Gii· end The overall process involves n3/3 flops like the other Cholesky procedures that we have developed. The algorithm is rich in matrix multiplication with a level-3 fraction given by 1 - (1/N2). The algorithm can be easily modified to handle the case when r does not divide n. 4.2.10 Recursive Blocking It is instructive to look a little more deeply into the implementation of a block Cholesky factorization as it is an occasion to stress the importance of designing data structures that are tailored to the problem at hand. High-performance matrix computations are filled with tensions and tradeoffs. For example, a successful pivot strategy might balance concerns about stability and memory traffic. Another tension is between per­ formance and memory constraints. As an example of this, we consider how to achieve level-3 performance in a Cholesky implementation given that the matrix is represented in packed format. This data structure houses the lower (or upper) triangular portion of a matrix A E Rnxn in a vector of length N = n(n + 1)/2. The symvec arrangement stacks the lower triangular subcolumns, e.g., (4.2.20) This layout is not very friendly when it comes to block Cholesky calculations because the assembly of an A-block (say A(i1:i2,j1:h)) involves irregular memory access pat­ terns. To realize a high-performance matrix multiplication it is usually necessary to have the matrices laid out conventionally as full rectangular arrays that are contiguous in memory, e.g., vec(A) = [ a11 a21 aa1 a41 a12 a22 aa2 a42 a13 a23 aaa a43 a14 a24 aa4 a44 f. (4.2.21) (Recall that we introduced the vec operation in §1.3.7.) Thus, the challenge is to de­ velop a high performance block algorithm that overwrites a symmetric positive definite A in packed format with its Cholesky factor G in packed format. Toward that end, we
  • 195. 4.2. Positive Definite Systems 171 present the main ideas behind a recursive data structure that supports level-3 compu­ tation and is storage efficient. As memory hierarchies get deeper and more complex, recursive data structures are an interesting way to address the problem of blocking for performance. The starting point is once again a 2-by-2 blocking of the equation A = GGT: However, unlike in (4.2.18) where A11 has a chosen block size, we now assume that A11 E nrxm where m = ceil(n/2). In other words, the four blocks are roughly the same size. As before, we equate entries and identify the key subcomputations: half-sized Cholesky. multiple-right-hand-side triangular solve. - T A22 = A22 - G21G21 symmetric matrix multiplication update. T - G22G22 = A22 half-sized Cholesky. Our goal is to develop a symmetry-exploiting, level-3-rich procedure that overwrites A with its Cholesky factor G. To do this we introduce the mixed packed format. An n = 9 example with A11 E R5x5 serves to distinguish this layout from the conventional packed format layout: 1 1 2 10 2 6 3 11 18 3 'l 10 4 12 19 25 4 8 11 13 5 13 20 26 31 5 9 12 14 15 6 14 21 27 32 36 16 20 24 28 32 36 7 15 22 28 33 37 40 17 21 25 29 33 37 40 8 16 23 29 34 38 41 43 18 22 26 30 34 38 41 43 9 17 24 30 35 39 42 44 45 19 23 27 31 35 39 42 44 45 Packed format Mixed packed format Notice how the entries from A11 and A21 are shuffled with the conventional packed format layout. On the other hand, with the mixed packed format layout, the 15 entries that define A11 are followed by the 20 numbers that define A21 which in turn are followed by the 10 numbers that define A22. The process can be repeated on A11 and
  • 196. 172 Chapter 4. Special Linear Systems 1 2 4 3 5 6 7 9 11 13 8 10 12 14 15 16 20 24 28 32 36 17 21 25 29 33 37 38 18 22 26 30 34 39 41 43 19 23 27 31 35 40 42 44 45 Thus, the key to this recursively defined data layout is the idea of representing square diagonal blocks in a mixed packed format. To be precise, recall the definition of vec and symvec in (4.2.20) and (4.2.21). If C E 1Rqxq is such a block, then [symvec(C11) l mixvec(C) = vec(C21) symvec(C22) (4.2.22) where m = ceil(q/2), C11 = C(l:m, l:m), C22 = C(m + l:n, m + l:n), and C21 = C(m + l:n, l:m). Notice that since C21 is conventionally stored, it is ready to be engaged in a high-performance matrix multiplication. We now outline a recursive, divide-and-conquer block Cholesky procedure that works with A in packed format. To achieve high performance the incoming A is con­ verted to mixed format at each level of the recursion. Assuming the existence of a triangular system solve procedure TriSol (for the system G21Gf1 = A21) and a sym­ metric update procedure SymUpdate (for A22 +-- A22 - G21Gf1) we have the following framework: function G = PackedBlockCholesky(A) {A and G in packed format} n = size(A) if n ::::; Tl.min G is obtained via any levcl-2, packed-format Cholesky method . else Set m = ceil(n/2) and overwrite A's packed-format representation with its mixed-format representation. G11 = PackedBlockCholesky(A11) G21 = TriSol(G11 , A21) A22 = SymUpdate(A22, G21) G22 = PackedBlockCholesky(A22) end
  • 197. 4.2. Positive Definite Systems 173 Here, nmin is a threshold dimension below which it is not possible to achieve level- 3 performance. To take full advantage of the mixed format, the procedures TriSol and SymUpdate require a recursive design based on blackings that halve problem size. For example, TriSol should take the incoming packed format A11 , convert it to mixed format, and solve a 2-by-2 blocked system of the form This sets up a recursive solution based on the half-sized problems X1Lf1 = Bi, X2LI2 = B2 - X1LI1· Likewise, SymUpdate should take the incoming packed format A22, convert it to mixed format, and block the required update as follows: The evaluation is recursive and based on the half-sized updates Cn = C11 - Yi Yt, C21 = C21 - Y2Yt, C22 = C22 - Y2Y{. Of course, if the incoming matrices are small enough relative to nmin, then TriSol and SymUpdate carry out their tasks conventionally without any further subdivisions. Overall, it can be shown that PackedBlockCholesky has a level-3 fraction approx­ imately equal to 1 - O(nmin/n). Problems P4.2.1 Suppose that H = A + iB is Hermitian and positive definite with A, B E Rnxn. This means that xHHx > 0 whenever x -:f. 0. (a) Show that c = [ ; -� ] is symmetric and positive definite. {b) Formulate an algorithm for solving (A + iB)(x + iy) = (b + ic), where b, c, x, and y are in Rn. It should involve 8n3/3 flops. How much storage is required? P4.2.2 Suppose A E Rn xn is symmetric and positive definite. Give an algorithm for computing an upper triangular matrix R E Rnxn such that A = RRT. P4.2.3 Let A E Rnxn be positive definite and set T = (A + AT)/2 and S = (A - AT)/2. (a) Show that II A-1 112 $ II r-1 112 and XTA-1x $ xTr-1x for all x E Rn . (b) Show that if A = LDMT, then dk � 1/11 r-1 112 for k = l:n. P4.2.4 Find a 2-by-2 real matrix A with the property that xTAx > 0 for all real nonzero 2-vectors but which is not positive definite when regarded as a member of {:2x2 • P4.2.5 Suppose A E E'xn has a positive diagonal. Show that if both A and AT are strictly diagonally
  • 198. 174 Chapter 4. Special Linear Systems dominant, then A is positive definite. P4.2.6 Show that the function f(x) =v'xTAx/2 is a vector norm on Rn if and only if A is positive definite. P4.2.7 Modify Algorithm 4.2.1 so that if the square root of a negative number is encountered, then the algorithm finds a unit vector x so that xTAx < 0 and terminates. · P4.2.8 Develop an outer product implementation of Algorithm 4.2.1 and a gaxpy implementation of Algorithm 4.2.2. P4.2.9 Assume that A E ccnxn is Hermitian and positive definite. Show that if an =· · · =ann = 1 and lai; I < 1 for all i =j: j, then diag(A-1) :;::: diag((Re(A))-1). P4.2.10 Suppose A =I+uuT where A E Rnxn and II u 112 =1. Give explicit formulae for the diagonal and subdiagonal of A's Cholesky factor. P4.2.11 Suppose A E Rnxn is symmetric positive definite and that its Cholesky factor is available. Let ek = In(:, k). For 1 � i < j � n, let Ot.ij be the smallest real that makes A + a(eief + e;e'[) singular. Likewise, let Ot.ii be the smallest real that makes (A+aeie'[) singular. Show how to compute these quantities using the Sherman-Morrison-Woodbury formula. How many flops are required to find all the Ot.i;? P4.2.12 Show that if M = [ :T � ] is symmetric positive definite and A and C are square, then [ A-1 + A-1BS- 1BTA- 1 M-1 = 5-1 BTA-1 -A-1 Bs-1 ] s-1 ' P4.2.13 Suppose u E R and u E Rn. Under what conditions can we find a matrix X E Jr'Xn so that X(I + uuuT)X = In? Give an efficient algorithm for computing X if it exists. P4.2.14 Suppose D =diag(di , . . . , dn) with d; > 0 for all i. Give an efficient algorithm for computing the largest entry in the matrix (D + CCT)-1 where C E Fxr. Hint: Use the Sherman-Morrison­ Woodbury formula. P4.2.15 Suppose A(.>.) has continuously differentiable entries and is always symmetric and positive definite. If /(.>.) =log(det(A(.>.))), then how would you compute f'(O)? P4.2.16 Suppose A E Rnxn is a rank-r symmetric positive semidefinite matrix. Assume that it costs one dollar to evaluate each aij· Show how to compute the factorization (4.2.17) spending only O(nr) dollars on ai; evaluation. P4.2.17 The point of this problem is to show that from the complexity point of view, if you have a fast matrix multiplication algorithm, then you have an equally fast matrix inversion algorithm, and vice versa. (a) Suppose Fn is the number of flops required by some method to form the inverse of an n-by-n matrix. Assume that there exists a constant c1 and a real number 0t. such that Fn � c1n"' for all n. Show that there is a method that can compute the n-by-n matrix product AB with fewer than c2n"' flops where c2 is a constant independent of n. Hint: Consider the inverse of A In 0 � ]· In (b) Let Gn be the number of flops required by some method to form the n-by-n matrix product AB. Assume that there exists a constant c1 and a real number 0t. such that Gn � c1n"' for all n. Show that there is a method that can invert a nonsingular n-by-n matrix A with fewer than c2n"' flops where c2 is a constant. Hint: First show that the result applies for triangular matrices by applying recursion to Then observe that for general A, A-1 = AT(AAT)-1 = ATa-Ta- 1 where AAT = GGT is the Cholesky factorization.
  • 199. 4.2. Positive Definite Systems 175 Notes and References for §4.2 For an in-depth theoretical treatment of positive definiteness, see: R. Bhatia (2007). Positive Definite Matrices, Princeton University Press, Princeton, NJ. The definiteness of the quadratic form xTAx can frequently be established by considering the math­ ematics of the underlying problem. For example, the discretization of certain partial differential op­ erators gives rise to provably positive definite matrices. Aspects of the unsymmetric positive definite problem are discussed in: A. Buckley (1974). "A Note on Matrices A = I + H, H Skew-Symmetric," Z. Angew. Math. Mech. 54, 125-126. A. Buckley (1977). "On the Solution of Certain Skew-Symmetric Linear Systems," SIAM J. Numer. Anal. 14, 566-570. G.H. Golub and C. Van Loan (1979). "Unsymmetric Positive Definite Linear Systems," Lin. Alg. Applic. 28, 85-98. R. Mathias (1992). "Matrices with Positive Definite Hermitian Part: Inequalities and Linear Systems," SIAM J. Matrix Anal. Applic. 13, 640-654. K.D. Ikramov and A.B. Kucherov (2000). "Bounding the growth factor in Gaussian elimination for Buckley's class of complex symmetric matrices," Numer. Lin. Alg. 7, 269-274. Complex symmetric matrices have the property that their real and imaginary parts are each symmetric. The following paper shows that if they are also positive definite, then the LDLT factorization is safe to compute without pivoting: S. Serbin (1980). "On Factoring a Class of Complex Symmetric Matrices Without Pivoting," Math. Comput. 35, 1231-1234. Historically important Algol implementations of the Cholesky factorization include: R.S. Martin, G. Peters, and J.H. Wilkinson {1965). "Symmetric Decomposition of a Positive Definite Matrix," Numer. Math. 7, 362-83. R.S. Martin, G. Peters, and .J.H. Wilkinson (1966). "Iterative Refinement of the Solution of a Positive Definite System of Equations," Numer. Math. 8, 203-16. F.L. Bauer and C. Reinsch (1971). "Inversion of Positive Definite Matrices by the Gauss-Jordan Method," in Handbook /or Automatic Computation Vol. 2, Linear Algebra, J.H. Wilkinson and C. Reinsch (eds.), Springer-Verlag, New York, 45-49. For roundoff error analysis of Cholesky, see: J.H. Wilkinson (1968). "A Priori Error Analysis of Algebraic Processes," Proceedings of the Interna­ tional Congress on Mathematics, Izdat. Mir, 1968, Moscow, 629-39. J. Meinguet (1983). "Refined Error Analyses of Cholesky Factorization," SIAM .J. Numer. Anal. 20, 1243-1250. A. Kielbasinski (1987). "A Note on Rounding Error Analysis of Cholesky Factorization," Lin. Alg. Applic. 88/89, 487-494. N.J. Higham (1990). "Analysis of the Cholesky Decomposition of a Semidefinite Matrix," in Reliable Numerical Computation, M.G. Cox and S.J. Hammarling (eds.), Oxford University Press, Oxford, U.K., 161 -185. J-Guang Sun (1992). "Rounding Error and Perturbation Bounds for the Cholesky and LDLT Factor- izations," Lin. Alg. Applic. 1 73, 77-97. The floating point determination of positive definiteness is an interesting problem, see: S.M. Rump (2006). "Verification of Positive Definiteness," BIT 46, 433-452. The question of how the Cholesky triangle G changes when A = GGT is perturbed is analyzed in: G.W. Stewart (1977). "Perturbation Bounds for the QR Factorization of a Matrix," SIAM J. Num. Anal. 14, 509-18. Z. Dramac, M. Omladic, and K. Veselic (1994). "On the Perturbation of the Cholesky Factorization," SIAM J. Matrix Anal. Applic. 15, 1319-1332. X-W. Chang, C.C. Paige, and G.W. Stewart (1996). 
"New Perturbation Analyses for the Cholesky Factorization," IMA J. Numer. Anal. 16, 457-484.
  • 200. 176 Chapter 4. Special Linear Systems G.W. Stewart (1997) "On the Perturbation of LU and Cholesky Factors," IMA J. Numer. Anal. 1 7, 1-6. Nearness/sensitivity issues associated with positive semidefiniteness are presented in: N.J. Higham (1988). "Computing a Nearest Symmetric Positive Semidefinite Matrix," Lin. Alg. Applic. 103, 103-118. The numerical issues associated with semi-definite rank determination are covered in: P.C. Hansen and P.Y. Yalamov (2001). "Computing Symmetric Rank-Revealing Decompositions via Triangular Factorization," SIAM J. Matrix Anal. Applic. 23, 443-458. M. Gu and L. Miranian (2004). "Strong Rank-Revealing Cholesky Factorization," ETNA 1 7, 76-92. The issues that surround level-3 performance of packed-format Cholesky arc discussed in: F.G. Gustavson (1997). "Recursion Leads to Automatic Variable 13locking for Dense Linear-Algebra Algorithms," IBM J. Res. Dev. 41, 737-756. F.G. Gustavson, A. Henriksson, I. Jonsson, B. Kagstrom, , and P. Ling (1998). "Recursive Blocked Data Formats and BLAS's for Dense Linear Algebra Algorithms," Applied Parallel Computing Large Scale Scientific and Industrial Problems, Lecture Notes in Computer Science, Springer­ Verlag, 1541/1998, 195-206. F.G. Gustavson and I. Jonsson (2000). "Minimal Storage High-Performance Cholesky Factorization via Blocking and Recursion," IBM J. Res. Dev. 44, 823-849. B.S. Andersen, J. Wasniewski, and F.G. Gustavson (2001). "A Recursive Formulation of Cholesky Factorization of a Matrix in Packed Storage," ACM Trans. Math. Softw. 27, 214-244. E. Elmroth, F. Gustavson, I. Jonsson, and B. Kagstrom, (2004). "Recursive 13locked Algorithms and Hybrid Data Structures for Dense Matrix Library Software," SIAM Review 46, 3-45. F.G. Gustavson, J. Wasniewski, J.J. Dongarra, and J. Langou (2010). "Rectangular Full Packed Format for Cholesky's Algorithm: Factorization, Solution, and Inversion," ACM Trans. Math. Softw. 37, Article 19. Other high-performance Cholesky implementations include: F.G. Gustavson, L. Karlsson, and B. Kagstrom, (2009). "Distributed SBP Cholesky Factorization Algorithms with Near-Optimal Scheduling," ACM Trans. Math. Softw. 36, Article 11. G. Ballard, J. Demmel, 0. Holtz, and 0. Schwartz (2010). "Communication-Optimal Parallel and Sequential Cholesky," SIAM ./. Sci. Comput. 32, 3495-3523. P. Bientinesi, B. Gunter, and R.A. van de Geijn (2008). "Families of Algorithms Related to the Inversion of a Symmetric Positive Definite Matrix," ACM Trans. Math. Softw. 35, Article 3. M.D. Petkovic and P.S. Stanimirovic (2009). "Generalized Matrix Inversion is not Harder than Matrix Multiplication," J. Comput. Appl. Math. 280, 270-282. 4.3 Banded Systems In many applications that involve linear systems, the matrix of coefficients is banded. This is the case whenever the equations can be ordered so that each unknown Xi appears in only a few equations in a "neighborhood"of the ith equation. Recall from §1.2.l that A = (aij) has upper bandwidth q if aij = 0 whenever j > i + q and lower bandwidth p if llij = 0 whenever i > j +p. Substantial economies can be realized when solving banded systems because the triangular factors in LU, GGT, and LDLT are also banded. 4.3.1 Band LU Factorization Our first result shows that if A is banded and A = LU, then L inherits the lower bandwidth of A and U inherits the upper bandwidth of A.
  • 201. 4.3. Banded Systems 177 Theorem 4.3.1. Suppose A E Rnxn has an LUfactorization A = LU. IfA has upper bandwidth q and lower bandwidth p, then U has upper bandwidth q and L has lower bandwidth p. Proof. The proof is by induction on n. Since [a WT l [ 1 0 l [1 0 l [a WT l A = v B = v/a In-1 0 B - vwT/a 0 In-1 It is clear that B - vwT/a has upper bandwidth q and lower bandwidth p because only the first q components of w and the first p components of v are nonzero. Let L1U1 be the LU factorization of this matrix. Using the induction hypothesis and the sparsity of w and v, it follows that have the desired bandwidth properties and satisfy A = LU. D The specialization of Gaussian elimination to banded matrices having an LU factoriza­ tion is straightforward. Algorithm 4.3.1 (Band Gaussian Elimination) Given A E Rnxn with upper band­ width q and lower bandwidth p, the following algorithm computes the factorization A = LU, assuming it exists. A(i, j) is overwritten by L(i, j) if i > j and by U(i, j) otherwise. for k = l:n - 1 end for i = k + l:min{k + p, n} A(i, k) = A(i, k)/A(k, k) end for j = k + l:min{k + q, n} end for i = k + l:min{k + p, n} A(i, j) = A(i, j) - A(i, k) ·A(k, j) end If n » p and n » q, then this algorithm involves about 2npq flops. Effective imple­ mentations would involve band matrix data structures; see §1.2.5. A band version of Algorithm 4.1.1 (LDLT) is similar and we leave the details to the reader. 4.3.2 Band Triangular System Solving Banded triangular system solving is also fast. Here are the banded analogues of Algo­ rithms 3.1.3 and 3.1.4:
  • 202. 178 Chapter 4. Special Linear Systems Algorithm 4.3.2 (Band Forward Substitution) Let L E Rnxn be a unit lower triangu­ lar matrix with lower bandwidth p. Given b E Rn, the following algorithm overwrites b with the solution to Lx = b. for j = l:n end for i = j + l:min{j + p, n} b(i) = b(i) - L(i,j)·b(j) end If n »p, then this algorithm requires about 2np fl.ops. Algorithm 4.3.3 (Band Back Substitution) Let U E Rnxn be a nonsingular upper triangular matrix with upper bandwidth q. Given b E R", the following algorithm overwrites b with the solution to Ux = b. for j = n: - 1:1 end b(j) = b(j)/U(j,j) for i = max{l,j - q}:j - 1 b(i) = b(i) - U(i,j) ·b(j) end If n » q, then this algorithm requires about 2nq fl.ops. 4.3.3 Band Gaussian Elimination with Pivoting Gaussian elimination with partial pivoting can also be specialized to exploit band structure in A. However, if PA = LU, then the band properties ofL and U are not quite so simple. For example, if A is tridiagonal and the first two rows are interchanged at the very first step of the algorithm, then u1a is nonzero. Consequently, row interchanges expand bandwidth. Precisely how the band enlarges is the subject of the following theorem. Theorem 4.3.2. Suppose A E Rnxn is nonsingular and has upper and lower band­ widths q and p, respectively. If Gaussian elimination with partial pivoting is used to compute Gauss transformations M; = I - o<i>eJ j = l:n - 1 and permutations Pi, . . . , Pn-1 such that Mn-1Pn-1 · · · M1P1A = U is upper triangu­ lar, then U has upper bandwidth p + q and o�i)= 0 whenever i � j or i > j + p. Proof. Let PA = LU be the factorization computed by Gaussian elimination with partial pivoting and recall that P = Pn-1 · · · P1. Write pT = [ e81 I · · · I e8" ] , where {si. ..., sn} is a permutation of {1, 2, ..., n}. If Si > i +p then it follows that the leading i- by-i principal submatrix of PA is singular, since [PA]ii = as;,i for j = l:si - p - 1 and Si - p - 1 � i. This implies that U and A are singular, a contradiction. Thus,
  • 203. 4.3. Banded Systems 179 Si ::; i + p for i = l:n and therefore, PA has upper bandwidth p + q. It follows from Theorem 4.3.1 that U has upper bandwidth p + q. The assertion about the aW can be verified by observing that Mi need only zero elements (j + 1,j), . . . , (j +p, j) of the partially reduced matrix PjMj-1Pj-1 · · ·1 P1A. 0 Thus, pivoting destroys band structure in the sense that U becomes "wider" than A's upper triangle, while nothing at all can be said about the bandwidth of L. However, since the jth column of L is a permutation of the jth Gauss vector Oj , it follows that L has at most p + 1 nonzero elements per column. 4.3.4 Hessenberg LU As an example of an unsymmetric band matrix computation, we show how Gaussian elimination with partial pivoting can be applied to factor an upper Hessenberg matrix H. (Recall that if H is upper Hessenberg then hii = 0, i > j + L) After k - 1 steps of Gaussian elimination with partial pivoting we are left with an upper Hessenberg matrix of the form [� x x x x x x 0 x x 0 x x 0 0 x k = 3, n = 5. By virtue of the special structure of this matrix, we see that the next permutation, Pa, is either the identity or the identity with rows 3 and 4 interchanged. Moreover, the next Gauss transformation Mk has a single nonzero multiplier in the (k+ 1, k) position. This illustrates the kth step of the following algorithm. Algorithm 4.3.4 (Hessenberg LU) Given an upper Hessenberg matrix H E lRnxn, the following algorithm computes the upper triangular matrix Mn-1Pn-1 · · · Mi P1H = U where each Pk is a permutation and each Mk is a Gauss transformation whose entries are bounded by unity. H(i, k) is overwritten with U(i, k) if i :$ k and by -[Mk]k+l,k if i = k + 1. An integer vector piv(l:n - 1) encodes the permutations. If Pk = I, then piv(k) = 0. If Pk interchanges rows k and k + 1, then piv(k) = 1. for k = l:n - 1 end if IH(k, k)I < IH(k + 1, k)I piv(k) = 1; H(k, k:n) t-t H(k + 1, k:n) else piv(k) = 0 end if H(k, k) "::f 0 end T = H(k + l, k)/H(k, k) H(k + 1, k + l:n) = H(k + 1, k + l:n) - r·H(k, k + l:n) H(k + l, k) = T This algorithm requires n2 flops.
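A NumPy rendering of this procedure is short. The sketch below is our own (the name `hessenberg_lu` is hypothetical); instead of overwriting H in place it returns the triangular factor, the multipliers, and the pivot vector separately.

```python
import numpy as np

def hessenberg_lu(H):
    """Reduce an upper Hessenberg H to upper triangular U by Gaussian
    elimination with partial pivoting, in the style of Algorithm 4.3.4.
    Returns (U, tau, piv): tau[k] is the single multiplier of step k and
    piv[k] = 1 if rows k and k+1 were swapped, else 0."""
    U = np.array(H, dtype=float)
    n = U.shape[0]
    piv = np.zeros(n - 1, dtype=int)
    tau = np.zeros(n - 1)
    for k in range(n - 1):
        if abs(U[k, k]) < abs(U[k + 1, k]):
            piv[k] = 1
            U[[k, k + 1], k:] = U[[k + 1, k], k:]
        if U[k, k] != 0.0:
            tau[k] = U[k + 1, k] / U[k, k]
            U[k + 1, k + 1:] -= tau[k] * U[k, k + 1:]
            U[k + 1, k] = 0.0
    return U, tau, piv

rng = np.random.default_rng(3)
H = np.triu(rng.standard_normal((6, 6)), -1)   # upper Hessenberg test matrix
U, tau, piv = hessenberg_lu(H)
print(np.allclose(np.tril(U, -1), 0))          # True: U is upper triangular
print(np.max(np.abs(tau)) <= 1.0)              # True: multipliers bounded by 1
```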
  • 204. 180 Chapter 4. Special Linear Systems 4.3.5 Band Cholesky The rest of this section is devoted to banded Ax = b problems where the matrix A is also symmetric positive definite. The fact that pivoting is unnecessary for such matrices leads to some very compact, elegant algorithms. In particular, it follows from Theorem 4.3.1 that if A = GGT is the Cholesky factorization of A, then G has the same lower bandwidth as A. This leads to the following banded version of Algorithm 4.2.1. Algorithm 4.3.5 (Band Cholesky) Given a symmetric positive definite A E 1Rnxn with bandwidth p, the following algorithm computes a lower triangular matrix G with lower bandwidth p such that A = GGT. For all i � j, G(i,j) overwrites A(i,j). for j = l:n end for k = max(l,j - p):j - 1 >. = min(k + p, n) A(j:>.,j) = A(j:>.,j) - A(j, k) ·A(j:>., k) end >. = min(j + p, n) A(j:>.,j) = A(j:>.,j)/JA(j,j) If n » p, then this algorithm requires about n(p2 + 3p) flops and n square roots. Of course, in a serious implementation an appropriate data structure for A should be used. For example, if we just store the nonzero lower triangular part, then a (p + 1)-by-n array would suffice. If our band Cholesky procedure is coupled with appropriate band triangular solve routines, then approximately np2 + 7np + 2n flops and n square roots are required to solve Ax = b. For small p it follows that the square roots represent a significant portion of the computation and it is preferable to use the LDLT approach. Indeed, a careful flop count of the steps A = LDLT, Ly = b, Dz = y, and LTx = z reveals that np2 + Bnp + n flops and no square roots arc needed. 4.3.6 Tridiagonal System Solving As a sample narrow band LDLT solution procedure, we look at the case of symmetric positive definite tridiagonal systems. Setting L = [i 0 1 �I and D = diag(d1 , . . • , dn), we deduce from the equation A = LDLT that au = di, ak,k-1 = lk-1dk-i. akk = dk + �-idk-1 = dk + lk-1ak,k-i. k = 2:n, k = 2:n.
  • 205. 4.3. Banded Systems Thus, the di and ii can be resolved as follows: di = au for k = 2:n end ik-t = ak,k-i/dk-1 dk = akk - ik-1ak,k-1 181 To obtain the solution to Ax = b we solve Ly = b, Dz = y, and LTx = z. With overwriting we obtain Algorithm 4.3.6 (Symmetric, Tridiagonal, Positive Definite System Solver) Given an n- by-n symmetric, tridiagonal, positive definite matrix A and b E Rn, the following algorithm overwrites b with the solution to Ax = b. It is assumed that the diagonal of A is stored in a(l:n) and the superdiagonal in .B(l:n - 1). for k = 2:n t = .B(k - 1), .B(k - 1) = t/a(k - 1), a(k) = a(k) - t·.B(k - 1) end for k = 2:n b(k) = b(k) - (:J(k - 1)·.B(k - 1) end b(n) = b(n)/a(n) for k = n - 1: - 1:1 b(k) = b(k)/a(k) - .B(k) ·b(k + 1) end This algorithm requires 8n flops. 4.3.7 Vectorization Issues The tridiagonal example brings up a sore point: narrow band problems and vectoriza­ tion do not mix. However, it is sometimes the case that large, independent sets of such problems must be solved at the same time. Let us examine how such a computation could be arranged in light of the issues raised in §1.5. For simplicity, assume that we must solve the n-by-n unit lower bidiagonal systems k = l:m, and that m � n. Suppose we have arrays E(l:n - 1, l:m) and B(l:n, l:m) with the property that E(l:n - 1, k) houses the subdiagonal of A(k) and B(l:n, k) houses the kth right-hand side b(k) . We can overwrite b(k) with the solution x(k) as follows: for k = l:m end for i = 2:n B(i, k) = B(i, k) - E(i - 1, k)·B(i - 1, k) end
  • 206. 182 Chapter 4. Special Linear Systems This algorithm sequentially solves each bidiagonal system in turn. Note that the inner loop does not vectorize because of the dependence of B(i, k) on B(i - 1, k). However, if we interchange the order of the two loops, then the calculation does vectorize: for i = 2:n B(i, :) = B(i, : ) - E(i - 1, : ) . * B(i - 1, : ) end A column-oriented version can be obtained simply by storing the matrix subdiagonals by row in E and the right-hand sides by row in B: for i = 2:n B( : , i) = B( : , i) - E( : , i - 1) . * B( : , i - 1) end Upon completion, the transpose of solution x(k} is housed on B(k, : ). 4.3.8 The Inverse of a Band Matrix In general, the inverse of a nonsingular band matrix A is full. However, the off-diagonal blocks of A-1 have low rank. Theorem 4.3.3. Suppose A = [Au A12 l A21 A22 is nonsingular and has lower bandwidth p and upper bandwidth q. Assume that the diagonal blocks are square. If is partitioned conformably, then rank(X21) :5 p, rank(X12) :5 q. (4.3.1) (4.3.2) Proof. Assume that Au and A22 are nonsingular. From the equation AX = I we conclude that and so A21Xu + A22X21 = 0, AuX12 + A12X22 = 0, rank(X21) = rank(A221A21X11) :5 rank(A21) rank(X12) = rank(AilA12X22) < rank(A12).
  • 207. 4.3. Banded Systems 183 From the handedness assumptions it follows that A21 has at most p nonzero rows and A12 has at most q nonzero rows. Thus, rank(A21) � p and rank(A12) � q which proves the theorem for the case when both An and A22 are nonsingular. A simple limit argument can be used to handle the situation when An and/or A22 are singular. See P4.3.ll. D It can actually be shown that rank(A21) = rank(X21) and rank(A12) = rank(X12). See Strang and Nguyen (2004). As we will see in §11.5.9 and §12.2, the low-rank, off­ diagonal structure identified by the theorem has important algorithmic ramifications. 4.3.9 Band Matrices with Banded Inverse If A E 1Rnxn is a product (4.3.3) and each Fi E JRnxn is block diagonal with 1-by-l and 2-by-2 diagonal blocks, then it follows that both A and A-1 _ F-1 F-1 - N • • • 1 are banded, assuming that N is not too big. For example, if x 0 0 0 0 0 0 0 0 x x 0 0 0 0 0 0 0 0 x x 0 0 0 0 0 0 x x 0 0 0 0 0 0 0 0 x x 0 0 0 0 0 0 0 0 x x 0 0 0 0 0 0 0 0 x x 0 0 0 0 0 0 x x 0 0 0 0 0 A 0 0 0 x x 0 0 0 0 0 0 0 0 x x 0 0 0 0 0 0 0 0 x x 0 0 0 0 0 0 x x 0 0 0 0 0 0 0 0 x x 0 0 0 0 0 0 0 0 x x 0 0 0 0 0 0 0 0 x x 0 0 0 0 0 0 x x 0 0 0 0 0 0 0 0 x x 0 0 0 0 0 0 0 0 x then x x 0 0 0 0 0 0 0 x x x 0 0 0 0 0 0 x x x x 0 0 0 0 0 x x x 0 0 0 0 0 0 x x x x 0 0 0 0 0 0 x x x x 0 0 0 0 0 0 x x x x 0 0 0 0 x x x x 0 0 0 0 A = 0 0 x x x x 0 0 0 A-1 0 0 0 x x x x 0 0 0 0 0 0 x x x x 0 0 0 0 x x x x 0 0 0 0 0 0 x x x x 0 0 0 0 0 0 x x x x 0 0 0 0 0 0 x x x 0 0 0 0 0 x x x x 0 0 0 0 0 0 x x x 0 0 0 0 0 0 0 x x Strang (2010a, 2010b) has pointed out a very important "reverse" fact. If A and A-1 are banded, then there is a factorization of the form (4.3.3) with relatively small N. Indeed, he shows that N is very small for certain types of matrices that arise in signal processing. An important consequence of this is that both the forward transform Ax and the inverse transform A-1x can be computed very fast.
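The following NumPy experiment (our own construction, not from the text) builds A = F1 F2 from two staggered block-diagonal factors with 2-by-2 blocks and confirms that both A and its inverse are banded, as claimed.

```python
import numpy as np

def random_block_diag(n, offset, rng):
    """Block diagonal with 2-by-2 blocks; offset=0 starts the blocks at
    row 0, offset=1 leaves a 1-by-1 identity block first."""
    F = np.eye(n)
    for i in range(offset, n - 1, 2):
        # Shift by 2I so each block is (almost surely) nonsingular.
        F[i:i+2, i:i+2] = rng.standard_normal((2, 2)) + 2 * np.eye(2)
    return F

def bandwidth(A, tol=1e-12):
    """Largest |i - j| over the entries with |a_ij| > tol."""
    idx = np.argwhere(np.abs(A) > tol)
    return np.max(np.abs(idx[:, 0] - idx[:, 1]))

rng = np.random.default_rng(4)
n = 10
F1 = random_block_diag(n, 0, rng)      # blocks on rows (1,2), (3,4), ...
F2 = random_block_diag(n, 1, rng)      # blocks on rows (2,3), (4,5), ...
A = F1 @ F2
Ainv = np.linalg.inv(A)                # equals inv(F2) @ inv(F1)
print(bandwidth(A), bandwidth(Ainv))   # both at most 2
```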
  • 208. 184 Chapter 4. Special Linear Systems Problems P4.3.l Develop a version of Algorithm 4.3.1 which assumes that the matrix A is stored in band format style. (See §1.2.5.) P4.3.2 Show how the output of Algorithm 4.3.4 can be used to solve the upper Hessenberg system Hx = b. P4.3.3 Show how Algorithm 4.3.4 could be used to solve a lower hessenberg system Hx = b. P4.3.4 Give an algorithm for solving an unsymmetric tridiagonal system Ax = b that uses Gaussian elimination with partial pivoting. It should require only four n-vectors of floating point storage for the factorization. P4.3.5 (a) For C E Rnxn define the profile indices m(C, i) = min{j:C;j # O}, where i = l:n. Show that if A = GGT is the Cholesky factorization of A, then m(A, i) = m(G, i) for i = l:n. (We say that G has the same profile as A.) (b) Suppose A E Rnxn is symmetric positive definite with profile indices m; = m(A, i) where i = l:n. Assume that A is stored in a one-dimensional array v as follows: v = (a11 , a2,m2 , . . . , a22, a3,rn3 , . . . , a33, . . . , an,mn , . . . , ann)· Give an algorithm that overwrites v with the corresponding entries of the Cholesky factor G and then uses this factorization to solve Ax = b. How many flops are required? (c) For C E Rnxn define p(C, i) = max{j:c;j # 0}. Suppose that A E Rnxn has an LU factorization A = LU and that m(A, 1) p(A, 1) � m(A, 2) � p(A, 2) Show that m(A, i) = m(L, i) and p(A, i) = p(U, i) for i = l:n. P4.3.6 Develop a gaxpy version of Algorithm 4.3.1. � m(A, n), � p(A, n). P4.3.7 Develop a unit stride, vectorizable algorithm for solving the symmetric positive definite tridi­ agonal systems A(k)x(k) = b(k). Assume that the diagonals, superdiagonals, and right hand sides are stored by row in arrays D, E, and B and that b(k) is overwritten with x(k). P4.3.8 Give an example of a 3-by-3 symmetric positive definite matrix whose tridiagonal part is not positive definite. P4.3.9 Suppose a symmetric positive definite matrix A E Rnxn has the "arrow structure" , e.g., x x 0 0 0 x 0 x 0 0 x 0 0 x 0 (a) Show how the linear system Ax = b can be solved with O(n) flops using the Sherman-Morrison­ Woodbury formula. (b) Determine a permutation matrix P so that the Cholesky factorization PAPT = GGT can be computed with O(n) flops. P4.3.10 Suppose A E Rnxn is tridiagonal, positive definite, but not symmetric. Give an efficient algorithm for computing the largest entry of 1sr-1s1 where S = (A - AT)/2 and T = (A + AT)/2. P4.3.ll Show that if A E R''xn and f > 0, then there is a B E Rnxn such that II A - B II � f and B has the property that all its principal submatrices arc nonsingular. Use this result to formally complete the proof of Theorem 4.3.3. P4.3.12 Give an upper bound on the bandwidth of the matrix A in (4.3.3). P4.3.13 Show that AT and A-1 have the same upper and lower bandwidths in (4.3.3). P4.3.14 For the A = FiF2 example in §4.3.9, show that A(2:3, :), A(4:5, :), A(6:7, :), . . . each consist of two singular 2-by-2 blocks.
  • 209. 4.3. Banded Systems 185 Notes and References for §4.3 Representative papers on the topic of banded systems include: R.S. Martin and J.H. Wilkinson (1965). "Symmetric Decomposition of Positive Definite Band Matri­ ces," Numer. Math. 7, 355-61. R. S. Martin and J.H. Wilkinson (1967). "Solution of Symmetric and Unsymmetric Band Equations and the Calculation of Eigenvalues of Band Matrices," Numer. Math. 9, 279-301. E.L. Allgower (1973). "Exact Inverses of Certain Band Matrices," Numer. Math. 21, 279-284. z. Bohte (1975). "Bounds for Rounding Errors in the Gaussian Elimination for Band Systems," J. Inst. Math. Applic. 16, 133-142. L. Kaufman (2007}. "The Retraction Algorithm for Factoring Banded Symmetric Matrices," Numer. Lin. Alg. Applic. 14, 237-254. C. Vomel and J. Slemons (2009}. "Twisted Factorization of a Banded Matrix," BIT 49, 433-447. Tridiagonal systems are particularly important, see: C. Fischer and R.A. Usmani (1969). "Properties of Some Tridiagonal Matrices and Their Application to Boundary Value Problems," SIAM J. Numer. Anal. 6, 127-142. D.J. Rose (1969). "An Algorithm for Solving a Special Class of Tridiagonal Systems of Linear Equa­ tions," Commun. ACM 12, 234-236. M.A. Malcolm and J. Palmer (1974}. "A Fast Method for Solving a Class of Tridiagonal Systems of Linear Equations," Commun. A CM 1 7, 14-17. N.J. Higham (1986). "Efficient Algorithms for Computing the Condition Number of a Tridiagonal Matrix," SIAM J. Sci. Stat. Comput. 7, 150-165. N.J. Higham (1990}. "Bounding the Error in Gaussian Elimination for Tridiagonal Systems," SIAM J. Matrix Anal. Applic. 11, 521-530. I.S. Dhillon (1998}. "Reliable Computation of the Condition Number of a Tridiagonal Matrix in O(n) Time," SIAM J. Matrix Anal. Applic. 19, 776-796. I. Bar-On and M. Leoncini (2000). "Reliable Solution of Tridiagonal Systems of Linear Equations,'' SIAM J. Numer. Anal. 98, 1134 -1153. M.I. Bueno and F.M. Dopico (2004}. "Stability and Sensitivity ofTridiagonal LU Factorization without Pivoting," BIT 44, 651-673. J.R. Bunch and R.F. Marcia (2006). "A Simplified Pivoting Strategy for Symmetric Tridiagonal Matrices,'' Numer. Lin. Alg. 19, 865-867. For a discussion of parallel methods for banded problems, see: H.S. Stone (1975). "Parallel Tridiagonal Equation Solvers," ACM Trans. Math. Softw. 1, 289-307. I. Bar-On, B. Codenotti and M. Leoncini (1997). "A Fast Parallel Cholesky Decomposition Algorithm for Tridiagonal Symmetric Matrices," SIAM J. Matrix Anal. Applic. 18, 403-418. G.H. Golub, A.H. Sameh, and V. Sarin (2001}. "A Parallel Balance Scheme for Banded Linear Systems," Num. Lin. Alg. 8, 297-316. S. Rao and Sarita (2008). "Parallel Solution of Large Symmetric Tridiagonal Linear Systems," Parallel Comput. 94, 177-197. Papers that are concerned with the structure of the inverse of a band matrix include: E. Asplund (1959). "Inverses of Matrices {a;; } Which Satisfy a;; = 0 for j > i + p,'' Math. Scand. 7, 57-60. C.A. Micchelli (1992). "Banded Matrices with Banded Inverses,'' J. Comput. Appl. Math. 41, 281-300. G. Strang and T. Nguyen (2004). "The Interplay ofRanks of Submatrices," SIAM Review 46, 637-648. G. Strang (2010a). "Fast Transforms: Banded Matrices with Banded Inverses," Proc. National Acad. Sciences 107, 12413-12416. G. Strang (2010b). "Banded Matrices with Banded Inverses and A = LPU," Proceedings International Congress of Chinese Mathematicians, Beijing. A pivotal result in this arena is the nullity theorem, a more general version of Theorem 4.3.3, see: R. Vandebril, M. Van Bare!, and N. 
Mastronardi (2008). Matrix Computations and Semiseparable Matrices, Volume I: Linear Systems, Johns Hopkins University Press, Baltimore, MD, 37-40.
186 Chapter 4. Special Linear Systems

4.4 Symmetric Indefinite Systems

Recall that a matrix whose quadratic form x^T Ax takes on both positive and negative values is indefinite. In this section we are concerned with symmetric indefinite linear systems. The LDL^T factorization is not always advisable, as a 2-by-2 example such as

    A = [ 0  1 ]
        [ 1  0 ]

illustrates: this matrix has no LDL^T factorization at all. Of course, any of the pivot strategies in §3.4 could be invoked. However, they destroy symmetry and, with it, the chance for a "Cholesky speed" symmetric indefinite system solver. Symmetric pivoting, i.e., data reshufflings of the form A ← PAP^T, must be used as we discussed in §4.2.8. Unfortunately, symmetric pivoting does not always stabilize the LDL^T computation. If ε1 and ε2 are small, then regardless of P, the matrix

    A = [ ε1  1  ]
        [ 1   ε2 ]

has small diagonal entries and large numbers surface in the factorization. With symmetric pivoting, the pivots are always selected from the diagonal and trouble results if these numbers are small relative to what must be zeroed off the diagonal. Thus, LDL^T with symmetric pivoting cannot be recommended as a reliable approach to symmetric indefinite system solving.

It seems that the challenge is to involve the off-diagonal entries in the pivoting process while at the same time maintaining symmetry. In this section we discuss two ways to do this. The first method is due to Aasen (1971) and it computes the factorization

    PAP^T = LTL^T                                                   (4.4.1)

where L = (ℓij) is unit lower triangular and T is tridiagonal. P is a permutation chosen such that |ℓij| ≤ 1. In contrast, the diagonal pivoting method due to Bunch and Parlett (1971) computes a permutation P such that

    PAP^T = LDL^T                                                   (4.4.2)

where D is a direct sum of 1-by-1 and 2-by-2 pivot blocks. Again, P is chosen so that the entries in the unit lower triangular L satisfy |ℓij| ≤ 1. Both factorizations involve n^3/3 flops and once computed, can be used to solve Ax = b with O(n^2) work:

    PAP^T = LTL^T,  Lz = Pb,  Tw = z,  L^T y = w,  x = P^T y  ⟹  Ax = b,
    PAP^T = LDL^T,  Lz = Pb,  Dw = z,  L^T y = w,  x = P^T y  ⟹  Ax = b.

A few comments need to be made about the Tw = z and Dw = z systems that arise when these methods are invoked. In Aasen's method, the symmetric indefinite tridiagonal system Tw = z is solved in O(n) time using band Gaussian elimination with pivoting. Note that there is no serious price to pay for the disregard of symmetry at this level since the overall process is O(n^3).
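As a concrete, hedged illustration of the second solve sequence, the sketch below uses SciPy's ldl routine (a LAPACK-based diagonal-pivoting factorization of the kind discussed later in this section) to compute the factors and then solve Ax = b. The function name symm_indef_solve is ours, and for clarity the block diagonal system Dw = z is handled by a general dense solver rather than by exploiting its 1-by-1/2-by-2 structure.

    import numpy as np
    from scipy.linalg import ldl, solve_triangular

    def symm_indef_solve(A, b):
        # ldl returns L, D, perm with A = L @ D @ L.T and L[perm] unit
        # lower triangular, so that P A P^T = (P L) D (P L)^T.
        L, D, perm = ldl(A, lower=True)
        Lp = L[perm]                      # unit lower triangular factor of P A P^T
        z = solve_triangular(Lp, b[perm], lower=True, unit_diagonal=True)
        w = np.linalg.solve(D, z)         # D w = z; D is block diagonal (1x1/2x2 pivots)
        y = solve_triangular(Lp.T, w, lower=False, unit_diagonal=True)
        x = np.empty_like(y)
        x[perm] = y                       # undo the symmetric permutation
        return x

As a quick check, for a random symmetric indefinite A and right-hand side b one should find np.allclose(A @ symm_indef_solve(A, b), b); a production code would replace the general solve for Dw = z with an O(n) sweep over the pivot blocks.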
  • 211. 4.4. Symmetric Indefinite Systems 187 In the diagonal pivoting approach, the Dw = z system amounts to a set of l-by-1 and 2-by-2 symmetric indefinite systems. The 2-by-2 problems can be handled via Gaussian elimination with pivoting. Again, there is no harm in disregarding symmetry during this 0(n) phase of the calculation. Thus, the central issue in this section is the efficient computation of the factorizations (4.4.1) and (4.4.2). 4.4.1 The Parlett-Reid Algorithm Parlett and Reid (1970) show how to compute (4.4.1) using Gauss transforms. Their algorithm is sufficiently illustrated by displaying the k = 2 step for the case n = 5. At the beginning of this step the matrix A has been transformed to I 0'.1 /31 0 0 0 I /31 0'.2 V3 V4 V5 A(l} = MiPiAP'[M'[ = 0 V3 x x x ' 0 V4 X X X 0 V5 X X X where P1 is a permutation chosen so that the entries in the Gauss transformation M1 are bounded by unity in modulus. Scanning the vector [v3 V4 v5 jT for its largest entry, we now determine a 3-by-3 permutation P2 such that If this maximal element is zero, we set M2 = P2 = I and proceed to the next step. Otherwise, we set P2 = diag(/2, P2) and M2 = I - a:(2>ef with a:(2) = ( 0 0 0 v4/v3 v5/v3 ]T . Observe that Ia , /31 0 0 n /31 0'.2 v3 0 A(') � M2F,,A0lp{M[ � � V3 x x 0 x x 0 x x In general, the process continues for n - 2 steps leaving us with a tridiagonal matrix T = A(n-2> = (Mn-2Pn-2 . . . M1P1)A(Mn-2Pn-2 . . . M1P1)T . It can be shown that (4.4.1) holds with P = Pn-2 · · · P1 and L = (Mn-2Pn-2 · · · MiP1PT)-1 . Analysis of L reveals that its first column is e1 and that its subdiagonal entries in column k with k > 1 are "made up" of the multipliers in Mk-1·
  • 212. 188 Chapter 4. Special Linear Systems The efficient implementation of the Parlett-Reid method requires care when com­ puting the update (4.4.3) To see what is involved with a minimum of notation, suppose B = BT E JR(n-k)x (n-k) has and that we wish to form B+ = (I - wef)B(I - wef)T, where w E 1Rn-k and e1 is the first column of In-k· Such a calculation is at the heart of (4.4.3). If we set b11 u = Be1 - 2w, then B+ = B - wuT - uwT and its lower triangular portion can be formed in 2(n - k)2 flops. Summing this quantity as k ranges from 1 to n-2 indicates that the Parlett-Reid procedure requires 2n3/3 flops-twice the volume of work associated with Cholesky. 4.4.2 The Method of Aasen An n3/3 approach to computing (4.4.1) due to Aasen (1971) can be derived by re­ considering some of the computations in the Parlett-Reid approach. We examine the no-pivoting case first where the goal is to compute a unit lower triangular matrix L with L(:, 1) = e1 and a tridiagonal matrix 0 T /3n-1 0 f3n-1 On such that A = LTLT. The Aasen method is structured as follows: for j = l:n end {a(l:j - 1), /3(1:j - 1) and L(:, l:j) are known} Compute ai. if j :'.S n - 1 Compute /3i. end if j :'.S n - 2 Compute L(j + 2:n, j + 1). end (4.4.4)
  • 213. 4.4. Symmetric Indefinite Systems 189 To develop recipes for aj , /3j, and L(j + 2:n, j + 1), we compare the jth columns in the equation A = LH where H = TLT. Noting that H is an upper Hessenberg matrix we obtain j+l A(:,j) = LH(:, j) = L L(:, k) · h(k), (4.4.5) k=l where h(l:j + 1) = H(l:j + 1, j) and we assume that j � n - 1. It follows that hj+I ·L(j + l:n, j + 1) = v(j + l:n), (4.4.6) where v(j + l:n) = A(j + l:n, j) - L(j + l:n, l:j) · h(l:j). (4.4.7) Since L is unit lower triangular and £(:, l:j) is known, this gives us a working recipe for L(j + 2:n, j + 1) provided we know h(l:j). Indeed, from (4.4.6) and (4.4.7) it is easy to show that L(j + 2:n, j + 1) = v(j + 2:n)/v(j + 1). (4.4.8) To compute h(l:j) we turn to the equation H = TLT and examine its jth column. The case j = 5 amply displays what is going on: h1 a1 /31 0 0 0 /31l52 0 h2 /31 a2 /32 0 0 a2ls2 + /32lsa ha 0 /32 {33 0 ls2 /32ls2 + aalsa + /3als4 aa f53 = (4.4.9) h4 0 0 {33 a4 {34 /3alsa + a4ls4 + {34 hs 0 0 0 {34 f54 f34l54 + as as 1 h5 0 0 0 0 /3s {35 At the start of step j, we know a(l:j - 1), /3(1:j - 1) and £(:, l:j). Thus, we can determine h(l:j - 1) as follows h1 = f31lj2 for k = l:j - 1 hk = f3k-1lj,k-1 + akljk + f3klj,k+I end Equation (4.4.5) gives us a formula for hj: From (4.4.9) we infer that j-1 hj = A(j, j) - L L(j, k)hk. k=l aj = hj - /3j-1lj,j-1 ' /3j = hj+l · (4.4.10) (4.4.11) (4.4.12) (4.4.13) Combining these equations with (4.4.4), (4.4.7), (4.4.8), (4.4.10), and (4.4.11) we obtain the Aasen method without pivoting:
  • 214. 190 Chapter 4. Special Linear Systems L = In for j = l:n if j = 1 end v(2:n) = A(2:n, 1) else h1 = /31 ·l12 for k = 2:j - 1 hk = /Jk-lfj,k-1 + O:k(jk + fJkfj,k+1 end h1 = a11 - L(j, l:j - l) ·h(l:j - 1) CXj = hj - /Jj-lfj,j-1 v(j + l:n) = A(j + l:n,j) - L(j + l:n, l:j) ·h(l:j) end if j <= n - 1 (31 = v(j + l) end if j <= n - 2 L(j + 2:n,j + 1) end v(j + 2:n)/v(j + 1) (4.4.14) The dominant operation each pass through the j-loop is an (n-j)- by-j gaxpy operation. Accounting for the associated flops we see that the overall Aasen ccomputation involves n3/3 flops, the same as for the Cholesky factorization. As it now stands, the columns of L are scalings of the v-vectors in (4.4.14). If any of these scalings are large, i.e., if any v(j + 1) is small, then we are in trouble. To circumvent this problem, it is only necessary to permute the largest component of v(j + l:n) to the top position. Of course, this permutation must be suitably applied to the unreduced portion of A and the previously computed portion of L. With pivoting, Aasen's method is stable in the same sense that Gaussian elimination with partial pivoting is stable. In a practical implementation ofthe Aasen algorithm, the lower triangular portion of A would be overwritten with L and T, e.g., a1 /31 a2 A f- f32 fJ2 a3 f42 (43 fJ3 0:4 f52 (53 (54 fJ4 0'.5 Notice that the columns of L are shifted left in this arrangement.
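A minimal NumPy transcription of the no-pivoting recursion (4.4.14) may help fix the indexing. It is a sketch only: the function name is ours, there are no safeguards, and, as noted above, a practical code must pivot so that the divisions by v(j+1) are safe. It returns L together with the diagonal alpha and subdiagonal beta of T, so that A = LTL^T.

    import numpy as np

    def aasen_nopivot(A):
        # Aasen tridiagonalization without pivoting; assumes the
        # factorization exists (every v(j+1) nonzero).
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        L = np.eye(n)
        alpha = np.zeros(n)
        beta = np.zeros(max(n - 1, 0))
        h = np.zeros(n)
        for j in range(n):
            if j == 0:
                alpha[0] = A[0, 0]
                v = A[1:, 0].copy()
            else:
                h[0] = beta[0] * L[j, 1]
                for k in range(1, j):
                    h[k] = beta[k-1]*L[j, k-1] + alpha[k]*L[j, k] + beta[k]*L[j, k+1]
                h[j] = A[j, j] - L[j, :j] @ h[:j]
                alpha[j] = h[j] - beta[j-1] * L[j, j-1]
                v = A[j+1:, j] - L[j+1:, :j+1] @ h[:j+1]
            if j <= n - 2:
                beta[j] = v[0]                  # v(j+1)
            if j <= n - 3:
                L[j+2:, j+1] = v[1:] / v[0]     # v(j+2:n)/v(j+1)
        return L, alpha, beta

One can check the output by forming T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1) and verifying that L @ T @ L.T reproduces A.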
  • 215. 4.4. Symmetric Indefinite Systems 191 4.4.3 Diagonal Pivoting Methods We next describe the computation of the block LDLT factorization (4.4.2). We follow the discussion in Bunch and Parlett (1971). Suppose [ E CT ] s C B n-s s n-s where P1 is a permutation matrix and s = 1 or 2. If A is nonzero, then it is always possible to choose these quantities so that E is nonsingular, thereby enabling us to write PiAP[ = [c;-1 Ino-s l [� B - c�-lCT l [; E�l-�T l· For the sake of stability, the s-by-s "pivot" E should be chosen so that the entries in (4.4.15) are suitably bounded. To this end, let a E (0, 1) be given and define the size measures µo = max laijl, i,j µi = max laiil . The Bunch-Parlett pivot strategy is as follows: if µi � aµo s = l Choose Pi so leuI = µi. else s = 2 Choose P1 so le21 I = /Lo. end It is easy to verify from (4.4.15) that if s = 1, then laiil :::; (1 + a-1) µo, while s = 2 implies (4.4.16) (4.4.17) By equating (1 + a-1)2, the growth factor that is associated with two s = 1 steps, and (3 - a)/(1 - a), the corresponding s = 2 factor, Bunch and Parlett conclude that a = (1 + ../f7)/8 is optimum from the standpoint of minimizing the bound on element growth. The reductions outlined above can be repeated on the order-(n - s) symmetric matrix A. A simple induction argument establishes that the factorization (4.4.2) exists and that n3/3 flops are required if the work associated with pivot determination is ignored.
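The pivot-size test is easy to mistranscribe, so here is a hedged NumPy sketch of just the Bunch-Parlett selection step applied to the current unreduced block. The function and variable names are ours; it returns the pivot size s and a symmetric permutation, not the full factorization.

    import numpy as np

    ALPHA = (1.0 + np.sqrt(17.0)) / 8.0      # growth-balancing parameter

    def bunch_parlett_pivot(A):
        # Return (s, p): s in {1, 2} is the pivot size and p is a
        # permutation of range(n) bringing the chosen pivot to the front.
        n = A.shape[0]
        i, j = np.unravel_index(np.argmax(np.abs(A)), A.shape)   # location of mu0
        mu0 = abs(A[i, j])
        k = np.argmax(np.abs(np.diag(A)))                        # location of mu1
        mu1 = abs(A[k, k])
        p = list(range(n))
        if mu1 >= ALPHA * mu0:
            s = 1
            p[0], p[k] = p[k], p[0]       # largest diagonal entry becomes the 1-by-1 pivot
        else:
            s = 2
            i, j = max(i, j), min(i, j)   # bring the entry of size mu0 to position (2,1)
            p[0], p[j] = p[j], p[0]
            p[1], p[i] = p[i], p[1]
        return s, p

Applying the permutation as A[np.ix_(p, p)] gives the symmetrically reordered matrix whose leading s-by-s block is the pivot E in the partitioning above.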
  • 216. 192 Chapter 4. Special Linear Systems 4.4.4 Stability and Efficiency Diagonal pivoting with the above strategy is shown by Bunch (1971) to be as stable as Gaussian elimination with complete pivoting. Unfortunately, the overall process requires between n3/12 and n3/6 comparisons, since µ0 involves a two-dimensional search at each stage of the reduction. The actual number of comparisons depends on the total number of 2-by-2 pivots but in general the Bunch-Parlett method for computing (4.4.2) is considerably slower than the technique of Aasen. See Barwell and George (1976). This is not the case with the diagonal pivoting method of Bunch and Kaufman (1977). In their scheme, it is only necessary to scan two columns at each stage of the reduction. The strategy is fully illustrated by considering the very first step in the reduction: a = (1 + ffi)/8 A = lard = max{la2il, · · · , land} if . > 0 end if IanI � a. Set s = 1 and P1 = I. else end a = lavrl = max{la1r, . . . , lar-1,rl, lar+i,rl, . . . , ianrl} if ala11 I � a.2 Set s = 1 and P1 = I elseif IarrI � aa Set s = 1 and choose P1 so (P[AP1h1 = arr· else Set s = 2 and choose P1 so (P'{AP1h1 = arp· end Overall, the Bunch-Kaufman algorithm requires n3/3 fl.ops, O(n2) comparisons, and, like all the methods of this section, n2/2 storage. 4.4.5 A Note on Equilibrium Systems A very important class of symmetric indefinite matrices have the form A = [ C B ]n BT 0 p n P (4.4.18) where C is symmetric positive definite and B has full column rank. These conditions ensure that A is nonsingular. Of course, the methods of this section apply to A. However, they do not exploit its structure because the pivot strategies "wipe out" the zero (2,2) block. On the other hand, here is a tempting approach that does exploit A's block structure:
  • 217. 4.4. Symmetric Indefinite Systems Step 1. Compute the Cholesky factorization C = GGT. Step 2. Solve GK = B for K E IRnxv. Step 3. Compute the Cholesky factorization HHT = KTK = BTc-1B. From this it follows that 193 In principle, this triangular factorization can be used to solve the equilibrium system (4.4.19) However, it is clear by considering steps (b) and (c) above that the accuracy of the computed solution depends upon K(C) and this quantity may be much greater than K(A). The situation has been carefully analyzed and various structure-exploiting algo­ rithms have been proposed. A brief review of the literature is given at the end of the section. It is interesting to consider a special case of (4.4.19) that clarifies what it means for an algorithm to be stable and illustrates how perturbation analysis can structure the search for better methods. In several important applications, g = 0, C is diagonal, and the solution subvector y is of primary importance. A manipulation shows that this vector is specified by (4.4.20) Looking at this we are again led to believe that K(C) should have a bearing on the accuracy of the computed y. However, it can be shown that (4.4.21) where the upper bound 'lj;8 is independent ofC, a result that (correctly) suggests that y is not sensitive to perturbations in C. A stable method for computing this vector should respect this, meaning that the accuracy of the computed y should be independent of C. Vavasis (1994) has developed a method with this property. It involves the careful assembly of a matrix V E nex(n-p) whose columns are a basis for the nullspace of BTc-1 . The n-by-n linear system [ B / V ] [ � ] = f is then solved implying f = By+ Vq. Thus, BTc-11 = BTC-1By and (4.4.20) holds. Problems P4.4.1 Show that if all the 1-by-1 and 2-by-2 principal submatrices of an n-by-n symmetric matrix A are singular, then A is zero. P4.4.2 Show that no 2-by-2 pivots can arise in the Bunch-Kaufman algorithm if A is positive definite.
  • 218. 194 Chapter 4. Special Linear Systems P4.4.3 Arrange (4.4.14) so that only the lower triangular portion of A is referenced and so that a(j) overwrites A(j, j) for j = l:n, {3(j) overwrites A(j + 1,j) for j = l:n - 1, and L(i, j) overwrites A(i, j - 1) for j = 2:n - 1 and i = j + 1:n. P4.4.4 Suppose A E ff• X n. is symmetric and strictly diagonally dominant. Give an algorithm that computes the factorization nAnT = [ R 0 ] [ RT S -M 0 where n is a permuation and the diagonal blocks R and Pvf are lower triangular. P4.4.5 A symmetric matrix A is quasidefinite if it has the form A = n p with Au and A22 positive definite. (a) Show that such a matrix has an LDLT factorization with the property that D = [ �1 - �2 ] where Di E Rnxn and D2 E wxp have positive diagonal entries. (b) Show that if A is quasidefinite then all its principal submatrices arc nonsingular. This means that PAPT has an LDLT factorization for any permutation matrix P. P4.4.6 Prove (4.4.16) and (4.4.17). P4.4.7 Show that -(BTc-1 B)-1 is the (2,2) block of A-1 where A is given by equation (4.4.18). P4.4.8 The point of this problem is to consider a special case of (4.4.21). Define the matrix M(a) = (BTc-1B)-1 BTc-1 where C = (In + aekek), n > -1, and ek = In (: , k). (Note that C is just the identity with a added to the (k, k) entry.) Assume that B E Rn xp has rank p and show that M(a) = (BTB)-1BT (In- a T ekwT) I + aw w where Show that if 11 w lb = 0 or 11 w 112 = 1, then 11 M(a) lb = 1/amin(B). Show that if 0 < 11 w 112 < 1, then 11 M(a) 112 � max {1 - I�w lb , 1 + II �lb }Iamin(B). Thus, II M(a) 112 has an a-independent upper bound. Notes and References for §4.4 The basic references for computing (4.4.1) are as follows: J.O. Aasen (1971). "On the Reduction of a Symmetric Matrix to Tridiagonal Form," BIT 11, 233-242. B.N. Parlett and J.K. Reid (1970). "On the Solution of a System of Linear Equations Whose Matrix Is Symmetric but not Definite," BIT 10, 386· 397. The diagonal pivoting literature includes: J.R. Bunch and B.N. Parlett (1971). "Direct Methods for Solving Symmetric Indefinite Systems of Linear Equations,'' SIAM J. Numer. Anal. 8, 639-655. J.R. Bunch (1971) . "Analysis of the Diagonal Pivoting Method," SIAM J. Numer. Anal. 8, 656-680. J.R. Bunch (1974). "Partial Pivoting Strategies for Symmetric Matrices,'' SIAM J. Numer. Anal. 1 1, 521-528.
  • 219. 4.4. Symmetric Indefinite Systems 195 J.R. Bunch, L. Kaufman, and B.N. Parlett (1976). "Decomposition of a Symmetric Matrix,'' Numer. Math. 27, 95-109. J.R. Bunch and L. Kaufman (1977). "Some Stable Methods for Calculating Inertia and Solving Symmetric Linear Systems," Math. Comput. 31, 162-79. M.T. .Jones and M.L. Patrick (1993). "Bunch-Kaufman Factorization for Real Symmetric Indefinite Banded Matrices,'' SIAM J. Matrix Anal. Applic. 14, 553-559. Because "future" columns must be scanned in the pivoting process, it is awkward (but possible) to obtain a gaxpy-rich diagonal pivoting algorithm. On the other hand, Aasen's method is naturally rich in gaxpys. Block versions of both procedures are possible. Various performance issues are discussed in: V. Barwell and J.A. George (1976). "A Comparison of Algorithms for Solving Symmetric Indefinite Systems of Linear Equations," ACM Trans. Math. Softw. 2, 242-251. M.T. Jones and M.L. Patrick (1994). "Factoring Symmetric Indefinite Matrices on High-Performance Architectures," SIAM J. Matrix Anal. Applic. 15, 273-283. Another idea for a cheap pivoting strategy utilizes error bounds based on more liberal interchange criteria, an idea borrowed from some work done in the area of sparse elimination methods, see: R. Fletcher (1976). "Factorizing Symmetric Indefinite Matrices,'' Lin. Alg. Applic. 14, 257-272. Before using any symmetric Ax = b solver, it may be advisable to equilibrate A. An O(n2) algorithm for accomplishing this task is given in: J.R. Bunch (1971). "Equilibration of Symmetric Matrices in the Max-Norm," J. ACM 18, 566-·572. N.J. Higham (1997). "Stability of the Diagonal Pivoting Method with Partial Pivoting," SIAM J. Matrix Anal. Applic. 18, 52-65. Procedures for skew-symmetric systems similar to the methods that we have presented in this section also exist: J.R. Bunch (1982). "A Note on the Stable Decomposition of Skew Symmetric Matrices," Math. Comput. 158, 475-480. J. Bunch (1982). "Stable Decomposition of Skew-Symmetric Matrices," Math. Comput. 38, 475-479. P. Benner, R. Byers, H. Fassbender, V. Mehrmann, and D. Watkins (2000). "Cholesky-like Factoriza­ tions of Skew-Symmetric Matrices,'' ETNA 1 1, 85-93. For a discussion of symmetric indefinite system solvers that are also banded or sparse, see: C. Ashcraft, R.G. Grimes, and J.G. Lewis (1998). "Accurate Symmetric Indefinite Linear Equation Solvers,'' SIAM J. Matrix Anal. Applic. 20, 513--561. S.H. Cheng and N.J. Higham (1998). "A Modified Cholesky Algorithm Based on a Symmetric Indef­ inite Factorization,'' SIAM J. Matrix Anal. Applic. 19, 1097--1110. J. Zhao, W. Wang, and W. Ren (2004). "Stability of the Matrix Factorization for Solving Block Tridiagonal Symmetric Indefinite Linear Systems,'' BIT 44, 181 -188. H. Fang and D.P. O'Leary (2006). "Stable Factorizations of Symmetric Tridiagonal and Triadic Matrices," SIAM J. Matrix Anal. Applic. 28, 576-595. D. Irony and S. Toledo (2006). "The Snap-Back Pivoting Method for Symmetric Banded Indefinite Matrices,'' SIAM J. Matrix Anal. Applic. 28, 398-424. The equilibrium system literature is scattered among the several application areas where it has an important role to play. Nice overviews with pointers to this literature include: G. Strang (1988). "A Framework for Equilibrium Equations," SIAM Review .'JO, 283-297. S.A. Vavasis (1994). "Stable Numerical Algorithms for Equilibrium Systems," SIAM J. Matrix Anal. Applic. 15, 1108-1131. P.E. Gill, M.A. Saunders, and J.R. Shinnerl (1996). 
"On the Stability of Cholesky Factorization for Symmetric Quasidefinite Systems," SIAM J. Matrix Anal. Applic. 1 7, 35-46. G.H. Golub and C. Greif (2003). "On Solving Block-Structured Indefinite Linear Systems,'' SIAM J. Sci. Comput. 24, 2076-2092. For a discussion of (4.4.21), see: G.W. Stewart (1989). "On Scaled Projections and Pseudoinverses," Lin. Alg. Applic. 1 12, 189-193.
196 Chapter 4. Special Linear Systems

D.P. O'Leary (1990). "On Bounds for Scaled Projections and Pseudoinverses," Lin. Alg. Applic. 132, 115-117.
M.J. Todd (1990). "A Dantzig-Wolfe-like Variant of Karmarkar's Interior-Point Linear Programming Algorithm," Oper. Res. 38, 1006-1018.

An equilibrium system is a special case of a saddle point system. See §11.5.10.

4.5 Block Tridiagonal Systems

Block tridiagonal linear systems of the form

    [ D1   F1                    ] [ x1      ]   [ b1      ]
    [ E1   D2   F2               ] [ x2      ]   [ b2      ]
    [      .     .     .         ] [ :       ] = [ :       ]        (4.5.1)
    [           E_{N-2} D_{N-1} F_{N-1} ] [ x_{N-1} ]   [ b_{N-1} ]
    [                   E_{N-1} D_N     ] [ x_N     ]   [ b_N     ]

frequently arise in practice. We assume for clarity that all blocks are q-by-q. In this section we discuss both a block LU approach to this problem as well as a pair of divide-and-conquer schemes.

4.5.1 Block Tridiagonal LU Factorization

If

    A = [ D1   F1                 ]
        [ E1   D2    .            ]
        [       .    .    F_{N-1} ]                                 (4.5.2)
        [           E_{N-1}  D_N  ]

then by comparing blocks in

    A = [ I                   ] [ U1   F1                ]
        [ L1   I               ] [      U2    .           ]
        [       .     .        ] [             .   F_{N-1} ]        (4.5.3)
        [           L_{N-1}  I ] [                  U_N    ]

we formally obtain the following algorithm for computing the Li and Ui:

    U1 = D1
    for i = 2:N
        Solve L_{i-1} U_{i-1} = E_{i-1} for L_{i-1}.
        U_i = D_i - L_{i-1} F_{i-1}                                 (4.5.4)
    end
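A hedged NumPy sketch of (4.5.4) follows; the function name is ours, each block solve is carried out with a general dense solver, and every Ui is assumed nonsingular. The factors it returns feed directly into the block forward elimination and back substitution described next.

    import numpy as np

    def block_tridiag_lu(D, E, F):
        # D[0..N-1]: diagonal blocks, E[0..N-2]: subdiagonal blocks,
        # F[0..N-2]: superdiagonal blocks (all q-by-q).
        # Returns the block factors L[0..N-2] and U[0..N-1] of (4.5.3).
        N = len(D)
        U = [np.array(D[0], dtype=float)]
        L = []
        for i in range(1, N):
            Ei = np.array(E[i-1], dtype=float)
            # Solve L_{i-1} U_{i-1} = E_{i-1}, i.e. U_{i-1}^T L_{i-1}^T = E_{i-1}^T
            L.append(np.linalg.solve(U[i-1].T, Ei.T).T)
            U.append(np.array(D[i], dtype=float) - L[i-1] @ np.array(F[i-1], dtype=float))
        return L, U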
  • 221. 4.5. Block Tridiagonal Systems 197 The procedure is defined as long as the Ui are nonsingular. Having computed the factorization (4.5.3), the vector x in {4.5.1) can be obtained via block forward elimination and block back substitution: Yi = bi for i = 2:N Yi = bi - Li-iYi-i end Solve UNxN = YN for xN for i = N - 1: - 1:1 Solve Uixi = Yi - Fixi+i for Xi end {4.5.5) To carry out both (4.5.4) and (4.5.5), each Ui must be factored since linear systems involving these submatrices are solved. This could be done using Gaussian elimination with pivoting. However, this does not guarantee the stability of the overall process. 4.5.2 Block Diagonal Dominance In order to obtain satisfactory bounds on the Li and Ui it is necessary to make addi­ tional assumptions about the underlying block matrix. For example, if we have {4.5.6) for i = l:N, then the factorization (4.5.3) exists and it is possible to show that the Li and ui satisfy the inequalities II Li Iii :::; 1, II ui 11i :::; II An 11i . The conditions (4.5.6) define a type of block diagonal dominance. 4.5.3 Block-Cyclic Reduction {4.5.7) {4.5.8) We next describe the method of block-cyclic reduction that can be used to solve some important special instances of the block tridiagonal system (4.5.1). For simplicity, we assume that A has the form A D F F D 0 0 F F D E nrvqxNq {4.5.9) where F and D are q-by-q matrices that satisfy DF = FD. We also assume that N = 2k - 1. These conditions hold in certain important applications such as the discretization of Poisson's equation on a rectangle. (See §4.8.4.)
  • 222. 198 Chapter 4. Special Linear Systems The basic ideabehind cyclic reduction is to halve repeatedly the dimension of the problem on hand repeatedly until we are left with a single q-by-q system for the un­ known subvector x2k-t . This system is then solved by standard means. The previously eliminated Xi are found by a back-substitution process. The general procedure is adequately illustrated by considering the case N = 7: bi Dx1 + Fx2, b2 Fx1 + Dx2 + Fx3, b3 Fx2 + Dx3 + Fx4, b4 Fx3 + Dx4 + Fx5, b5 Fx4 + Dxf> + Fx6, b6 Fx5 + Dx6 + Fxr, br Fx6 + Dxr. For i = 2, 4, and 6 we multiply equations i - 1, i, and i + 1 by F, -D, and F, respectively, and add the resulting equations to obtain (2F2 - D2)x2 + F2x4 = F(b1 + b3) - Db2, F2x2 + (2F2 - D2)x4 + F2x6 = F(b3 + b5) - Db4, F2x4 + (2F2 - D2)x6 = F(b5 + br) - Db6. Thus, with this tactic we have removed the odd-indexed Xi and are left with a reduced block tridiagonal system of the form D(llx2 + p(llx4 p(l)x2 + D(llx4 + p<l)x6 p(llx4 + D(llx6 where D(l) = 2F2 - D2 and p(l) = F2 commute. Applying the same elimination strategy as above, we multiply these three equations respectively by p(I), -D<1l, and p(l). When these transformed equations are added together, we obtain the single equation (2[F(l)J2 - D(1)2)X4 = p(l) (b�l) + b�l)) - D(l)bil)' which we write as DC2lx4 = b(2). This completes the cyclic reduction. We now solve this (small) q-by-q system for x4. The vectors x2 and x6 are then found by solving the systems D(l)x - b(l) - p(I)x 2 - 2 ,4 , D(l)X6 = b�l) - p(l)X4. Finally, we use the first, third, fifth, and seventh equations in the original system to compute X1, x3, X5, and ;i:7 , respectively. The amount of work required to perform these recursions for general N depends greatly upon the sparsity of the fl(P) and p(P). In the worst case when these matrices are full, the overall flop count has order log(N)q3. Care must be exercised in order to ensure stability during the reduction. For further details, see Buneman (1969).
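The recursion is compact enough to state as code. The sketch below (our function name, NumPy) implements the textbook reduction for the constant-block system (4.5.9) with N = 2^k - 1, assuming DF = FD and that the reduced diagonal blocks remain nonsingular; it is the unstabilized recursion, so the cautions about right-hand-side accumulation and Buneman's variant apply.

    import numpy as np

    def cyclic_reduction(D, F, b):
        # Solve the block tridiagonal system with constant q-by-q diagonal
        # block D and off-diagonal block F; b has length N*q, N = 2^k - 1.
        b = np.asarray(b, dtype=float)
        q = D.shape[0]
        N = b.size // q
        B = b.reshape(N, q)
        if N == 1:
            return np.linalg.solve(D, B[0])
        # eliminate the odd-numbered unknowns x1, x3, ... (even 0-based indices)
        D1 = 2.0 * F @ F - D @ D
        F1 = F @ F
        B1 = np.array([F @ (B[i-1] + B[i+1]) - D @ B[i] for i in range(1, N, 2)])
        X = np.zeros((N, q))
        X[1::2] = cyclic_reduction(D1, F1, B1.reshape(-1)).reshape(-1, q)
        # back-substitute for the eliminated unknowns using the original equations
        for i in range(0, N, 2):
            rhs = B[i].copy()
            if i > 0:
                rhs -= F @ X[i-1]
            if i < N - 1:
                rhs -= F @ X[i+1]
            X[i] = np.linalg.solve(D, rhs)
        return X.reshape(-1)

With q-by-q blocks D and F satisfying the commutativity assumption and b of length Nq, cyclic_reduction(D, F, b) returns x with the same block layout as in (4.5.1).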
  • 223. 4.5. Block Tridiagonal Systems 199 4.5.4 The SPIKE Framework A bandwidth-p matrix A E R.Nqx Nq can also be regarded as a block tridiagonal matrix with banded diagonal blocks and low-rank off-diagonal blocks. Herc is an example where N = 4, q = 7, and p = 2: x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x A = x x x x x x x x x x (4.5.11) x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x Note that the diagonal blocks have bandwidth p and the blocks along the subdiagonal and superdiagonal have rank p. The low rank of the off-diagonal blocks makes it possible to formulate a divide-and-conquer procedure known as the "SPIKE" algorithm. The method is of interest because it parallelizes nicely. Our brief discussion is based on Polizzi and Sameh (2007). Assume for clarity that the diagonal blocks D1 , • • • , D4 are sufficiently well con­ ditioned. If we premultiply the above matrix by the inverse of diag(D1 , D2, Da, D4), then we obtain I + + I + + 1 + + 1 + + l + + I + + I + + + + I + + + + I + + + + I + + + + 1 + + + + I + + + + I + + + + I + + (4.5.12) + + I + + + + 1 + + + + I + + + + I + + + + 1 + + + + I + + + + I + + + + 1 + + I + + 1 + + 1 + + 1 + + 1 + + 1
  • 224. 200 Chapter 4. Special Linear Systems With this maneuver, the original linear system (4.5.13) which corresponds to (4.5.11), transforms to (4.5.14) where DJJi = bi, DJ!i = Fi, and Di+lEi = Ei. Next, we refine the blocking (4.5.14) by turning each submatrix into a 3-by-3 block matrix and each subvector into a 3-by-1 block vector as follows: l2 0 0 Ki 0 0 0 0 0 0 0 0 W1 Ci 0 Ia 0 H1 0 0 0 0 0 0 0 0 Yi di 0 0 I2 G1 0 0 0 0 0 0 0 0 Zi Ji 0 0 Ri l2 0 0 K2 0 0 0 0 0 W2 C2 0 0 S1 0 Ia 0 H2 0 0 0 0 0 Y2 d2 0 0 Ti 0 0 l2 G2 0 0 0 0 0 0 0 0 0 0 R2 l2 0 0 K3 0 0 z2 h (4.5.15) W3 C3 0 0 0 0 0 S2 0 [3 0 H3 0 0 Y3 d3 0 0 0 0 0 T2 0 0 l2 G3 0 0 Z3 f3 0 0 0 0 0 0 0 0 R3 Iq 0 0 W4 C4 0 0 0 0 0 0 0 0 S3 0 Im 0 Y4 d4 0 0 0 0 0 0 0 0 T3 0 0 lq Z4 f4 The block rows and columns in this equation can be reordered to produce the following equivalent system: l2 0 Ki 0 0 0 0 0 0 0 0 0 Wi Ci 0 l2 Gi 0 0 0 0 0 0 0 0 0 Zi Ji 0 Ri l2 0 K2 0 0 0 0 0 0 0 W2 C2 0 Ti 0 l2 G2 0 0 0 0 0 0 0 Z2 h 0 0 0 R2 l2 0 K3 0 0 0 0 0 W3 C3 0 0 0 T2 0 l2 G3 0 0 0 0 0 Z3 f3 (4.5.16) 0 0 0 0 0 R3 l2 0 0 0 0 0 = W4 C4 0 0 0 0 0 T3 0 l2 0 0 0 0 Z4 f4 0 0 Hi 0 0 0 0 0 [3 0 0 0 Yi T 0 Si 0 0 H2 0 0 0 0 [3 0 0 Y2 d2 0 0 0 S2 0 0 H3 0 0 0 [3 0 Y3 d3 0 0 0 0 0 S3 0 0 0 0 0 [3 Y4 d4
  • 225. 4.5. Block Tridiagonal Systems 201 Ifwe assume that N » 1, then the (1,1) block is a relatively small banded matrix that define the Zi and Wi. Once these quantities are computed, then the remaining unknowns follow from a decoupled set of large matrix-vector multiplications, e.g., Yi = di -Hiw2, y2 = d2 - Sizi - H2w3, y3 = d3 - S2z2 - H3w4, and Y4 = d4 - S3z3. Thus, in a four­ processor execution of this method, there are (short) communications that involves the Wi and Zi and a lot of large, local gaxpy computations. Problems P4.5.l (a) Show that a block diagonally dominant matrix is nonsingular. (b) Verify that (4.5.6) implies (4.5.7) and (4.5.8). P4.5.2 Write a recursive function x = CR(D,F, N, b) that returns the solution to Ax = b where A is specified by (4.5.9). Assume that N = 2k - 1 for some positive integer k, D, F E R'lxq, and b E RNq. P4.5.3 How would you solve a system of the form where D1 and D2 are diagonal and F1 and Ei are tridiagonal? Hint: Use the perfect shuffle permu­ tation. P4.5.4 In the simplified SPIKE framework that we presented in §4.5.4, we treat A as an N-by-N block matrix with q-by-q blocks. It is assumed that A E R_Nqx Nq has bandwidth p and that p « q. For this general case, describe the block sizes that result when the transition from (4.5.11) to (4.5.16) is carried out. Assuming that A's band is dense, what fraction of flops are gaxpy flops? Notes and References for §4.5 The following papers provide insight into the various nuances of block matrix computations: J.M. Varah (1972). "On the Solution of Block-Tridiagonal Systems Arising from Certain Finite­ Difference Equations," Math. Comput. 26, 859-868. R. Fourer (1984). "Staircase Matrices and Systems," SIAM Review 26, 1-71. M.L. Merriam (1985). "On the Factorization of Block Tridiagonals With Storage Constraints," SIAM J. Sci. Stat. Comput. 6, 182-192. The property of block diagonal dominance and its various implications is the central theme in: D.G. Feingold and R.S. Varga (1962). "Block Diagonally Dominant Matrices and Generalizations of the Gershgorin Circle Theorem,"Pacific J. Math. 12, 1241-1250. R.S. Varga (1976). "On Diagonal Dominance Arguments for Bounding II A-1 1100 ," Lin. Alg. Applic. 14, 211-217. Early methods that involve the idea of cyclic reduction are described in: R.W. Hockney (1965). "A Fast Direct Solution of Poisson's Equation Using Fourier Analysis, " J. ACM 12, 95-113. B.L. Buzbee, G.H. Golub, and C.W. Nielson (1970). "On Direct Methods for Solving Poisson's Equations," SIAM J. Numer. Anal. 7, 627-656. The accumulation of the right-hand side must be done with great care, for otherwise there would be asignificant loss of accuracy. A stable way of doing this is described in: 0. Buneman (1969). "A Compact Non-Iterative Poisson Solver," Report 294, Stanford University Institute for Plasma Research, Stanford, CA. Other literature concerned with cyclic reduction includes: F.W. Dorr (1970). "The Direct Solution of the Discrete Poisson Equation on a Rectangle," SIAM Review 12, 248-263.
  • 226. 202 Chapter 4. Special Linear Systems B.L. Buzbee, F.W. Dorr, J.A. George, and G.H. Golub (1971). "The Direct Solution of the Discrete Poisson Equation on Irregular Regions," SIAM J. Nu.mer. Anal. 8, 722-736. F.W. Dorr (1973). "The Direct Solution of the Discrete Poisson Equation in O(n2) Operations," SIAM Review 15, 412-415. P. Concus and G.H. Golub (1973). "Use of Fast Direct Methods for the Efficient Numerical Solution of Nonseparable Elliptic Equations," SIAM J. Numer. Anal. 10, 1103-1120. B.L. Buzbee and F.W. Dorr (1974). "The Direct Solution ofthe Biharmonic E!=tuation on Rectangular Regions and the Poisson Equation on Irregular Regions," SIAM J. Numer. Anal. 11, 753-763. D. Heller (1976). "Some Aspects of the Cyclic Reduction Algorithm for Block Tridiagonal Linear Systems," SIAM J. Numer. Anal. 13, 484-496. Various generalizations and extensions to cyclic reduction have been proposed: P.N. Swarztrauber and R.A. Sweet (1973). "The Direct Solution of the Discrete Poisson Equation on a Disk," SIAM J. Numer. Anal. 10, 900-907. R.A. Sweet (1974). "A Generalized Cyclic Reduction Algorithm," SIAM J. Num. Anal. 11, 506-20. M.A. Diamond and D.L.V. Ferreira (1976). "On a Cyclic Reduction Method for the Solution of Poisson's Equation," SIAM J. Numer. Anal. 13, 54--70. R.A. Sweet (1977). "A Cyclic Reduction Algorithm for Solving Block Tridiagonal Systems of Arbitrary Dimension," SIAM J. Numer. Anal. 14, 706-720. P.N. Swarztrauber and R. Sweet (1989). "Vector and Parallel Methods for the Direct Solution of Poisson's Equation," J. Comput. Appl. Math. 27, 241-263. S. Bondeli and W. Gander (1994). "Cyclic Reduction for Special Tridiagonal Systems," SIAM J. Matri.x Anal. Applic. 15, 321-330. A 2-by-2 block system with very thin (1,2) and (2,1) blocks is referred to as a bordered linear system. Special techniques for problems with this structure are discussed in: W. Govaerts and J.D. Pryce (1990). "Block Elimination with One Iterative Refinement Solves Bor­ dered Linear Systems Accurately," BIT 30, 490-507. W. Govaerts (1991). "Stable Solvers and Block Elimination for Bordered Systems," SIAM J. Matri.x Anal. Applic. 12, 469-483. W. Govaerts and J.D. Pryce (1993). "Mixed Block Elimination for Linear Systems with Wider Bor­ ders," IMA J. Numer. Anal. 13, 161-180. Systems that are block bidiagonal, block Hessenberg, and block triangular also occur, see: G. Fairweather and I. Gladwell (2004). "Algorithms for Almost Block Diagonal Linear Systems," SIAM Review 46, 49-58. U. von Matt and G. W. Stewart (1996). "Rounding Errors in Solving Block Hessenberg Systems," Math. Comput. 65, 115 -135. L. Gemignani and G. Lotti (2003). "Efficient and Stable Solution of M-Matrix Linear Systems of (Block) Hessenberg Form," SIAM J. Matri.x Anal. Applic. 24, 852-876. M. Hegland and M.R. Osborne (1998). "Wrap-Around Partitioning for Block Bidiagonal Linear Sys.. terns," IMA J. Numer. Anal. 18, 373-383. T. Rossi and J. Toivanen (1999). "A Parallel Fast Direct Solver for Block Tridiagonal Systems with Separable Matrices of Arbitrary Dimension," SIAM J. Sci. Comput. 20, 1778-1793. 1.M. Spitkovsky and D. Yong (2000). "Almost Periodic Factorization of Certain Block Triangular Matrix Functions," Math. Comput. 69, 1053--1070. The SPIKE framework supports many different options according to whether the band is sparse or dense. Also, steps have to be taken if the diagonal blocks are ill-conditioned, see: E. Polizzi and A. Sameh (2007). "SPIKE: A Parallel Environment for Solving Banded Linear Systems," Comput. 
Fluids 36, 113-120. C.C.K. Mikkelsen and M. Manguoglu (2008). "Analysis of the Truncated SPIKE Algorithm," SIAM J. Matri.x Anal. Applic. 30, 1500-1519.
  • 227. 4.6. Vandermonde Systems 203 4.6 Vandermonde Systems Supposex(O:n) E :nr+1. A matrix V E IR.(n+l)x(n+l}of the form I:, 1 1 X1 Xn v V(xo, . . . , Xn) = xn xn xn 0 1 n is said to be a Vandermonde matrix. Note that the discrete Fourier transform matrix (§1.4.1) is a very special complex Vandermonde matrix. In this section, we show how the systems VTa = f = f(O:n) and Vz = b = b(O:n) can be solved in O(n2) flops. For convenience, vectors and matrices are subscripted from 0 in this section. 4.6.1 Polynomial Interpolation: vra = f Vandermonde systems arise in many approximation and interpolation problems. In­ deed, the key to obtaining a fast Vandermonde solver is to recognize that solving vTa = f is equivalent to polynomial interpolation. This follows because if VTa = f and n p(x) = :Lajxj, (4.6.1) j=O then p(xi) = fi for i = O:n. Recall that if the Xi are distinct then there is a unique polynomial of degree n that interpolates (xo, Jo), . . . , (xn, fn)· Consequently, V is nonsingular as long as the Xi are distinct. We assume this throughout the section. The first step in computing the aj of (4.6.1) is to calculate the Newton represen­ tation of the interpolating polynomial p: n (k-1 ) p(x) = �Ck !!(x - Xi) . The constants ck are divided differences and may be determined as follows: c(O:n) = f(O:n) for k = O : n- 1 end for i = n : - 1 : k+1 Ci = (ci - Ci-1)/(xi - Xi-k-d end See Conte and deBoor (1980). (4.6.2) (4.6.3) The next task is to generate the coefficients ao, . . . , an in (4.6.1) from the Newton representation coefficients Co, . . . , Cn. Define the polynomials Pn(x), . . . ,p0(x) by the iteration
204 Chapter 4. Special Linear Systems

    p_n(x) = c_n
    for k = n-1:-1:0
        p_k(x) = c_k + (x - x_k)·p_{k+1}(x)
    end

and observe that p_0(x) = p(x). Writing

    p_k(x) = a_k^(k) + a_{k+1}^(k) x + · · · + a_n^(k) x^{n-k}

and equating like powers of x in the equation p_k = c_k + (x - x_k)p_{k+1} gives the following recursion for the coefficients a_i^(k):

    a_n^(n) = c_n
    for k = n-1:-1:0
        a_k^(k) = c_k - x_k·a_{k+1}^(k+1)
        for i = k+1:n-1
            a_i^(k) = a_i^(k+1) - x_k·a_{i+1}^(k+1)
        end
        a_n^(k) = a_n^(k+1)
    end

Consequently, the coefficients a_i = a_i^(0) can be calculated as follows:

    a(0:n) = c(0:n)
    for k = n-1:-1:0
        for i = k:n-1
            a_i = a_i - x_k·a_{i+1}                                  (4.6.4)
        end
    end

Combining this iteration with (4.6.3) gives the following algorithm.

Algorithm 4.6.1 Given x(0:n) ∈ ℝ^{n+1} with distinct entries and f = f(0:n) ∈ ℝ^{n+1}, the following algorithm overwrites f with the solution a = a(0:n) to the Vandermonde system V(x_0, . . . , x_n)^T a = f.

    for k = 0:n-1
        for i = n:-1:k+1
            f(i) = (f(i) - f(i-1))/(x(i) - x(i-k-1))
        end
    end
    for k = n-1:-1:0
        for i = k:n-1
            f(i) = f(i) - f(i+1)·x(k)
        end
    end

This algorithm requires 5n^2/2 flops.
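A NumPy transcription of Algorithm 4.6.1 is sketched below; the function name is ours, the input f is copied rather than overwritten, and the x_i are assumed distinct.

    import numpy as np

    def vandermonde_dual(x, f):
        # Solve V(x0,...,xn)^T a = f in O(n^2) flops.
        a = np.array(f, dtype=float)
        x = np.asarray(x, dtype=float)
        n = len(x) - 1
        # Newton divided differences (4.6.3)
        for k in range(n):
            for i in range(n, k, -1):
                a[i] = (a[i] - a[i-1]) / (x[i] - x[i-k-1])
        # Newton-to-power-basis conversion (4.6.4)
        for k in range(n - 1, -1, -1):
            for i in range(k, n):
                a[i] = a[i] - a[i+1] * x[k]
        return a

Since np.vander(x, increasing=True) is precisely V^T in the notation above, the result can be checked via np.allclose(np.vander(x, increasing=True) @ vandermonde_dual(x, f), f).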
  • 229. 4.6. Vandermonde Systems 205 4.6.2 The System Vz = b Now consider the system Vz = b. To derive an efficient algorithm for this problem, we describe what Algorithm 4.6.1 does in matrix-vector language. Define the lower bidiagonal matrix Lk(a) E lll(n+l)x(n+l) by 0 and the diagonal matrix Dk by 1 0 -a 1 0 0 0 0 1 -a 1 Dk = diag( 1, . . . , 1 ,Xk+i - Xo, . . . ,Xn - Xn-k-1). '--v-" k+l With these definitions it is easy to verify from (4.6.3) that, if f = f(O : n) and c = c(O : n) is the vector of divided differences, then where U is the upper triangular matrix defined by UT = D;�1Ln-1(1) · · · D01Lo(l). Similarly, from (4.6.4) we have a = LTc, where L is the unit lower triangular matrix defined by LT = Lo(xo? · · · Ln-1(Xn-1)T. It follows that a = v-Tf is given by a = LTUTf. Thus, y-T = LTUT which shows that Algorithm 4.6.1 solves VTa = f by tacitly computing the "UL factorization" of v-1. Consequently, the solution to the system Vz = b is given by z = v-1b = U(Lb) = (Lo(l)TDQ"1 . . . Ln-1(l)TD;�1) (Ln-1(Xn-1) . . . Lo(xo)b).
  • 230. 206 Chapter 4. Special Linear Systems This observation gives rise to the following algorithm: Algorithm 4.6.2 Given x(O : n) E Rn+l with distinct entries and b = b(O : n) E Rn+l, the following algorithm overwrites b with the solution z = z(O : n) to the Vandermonde system V(xo, . . . , Xn)z = b. for k = O : n - 1 end for i = n: - 1 : k + 1 b(i) = b(i) - x(k)b(i - 1) end for k = n - 1: - 1 : 0 for i = k + 1 : n end b(i) = b(i)/(x(i) - x(i - k - 1)) end for i = k : n - 1 b(i) = b(i) - b(i + 1) end This algorithm requires 5n2/2 fl.ops. Algorithms 4.6.1 and 4.6.2 are discussed and analyzed by Bjorck and Pereyra {1970). Their experience is that these algorithms frequently produce surprisingly ac­ curate solutions, even if V is ill-conditioned. We mention that related techniques have been developed and analyzed for con­ fluent Vandennonde systems, e.g., systems of the form See Higham (1990). Problems P4.6.l Show that if V = V(xo, . . . , Xn), then det(V) = II (xi - x; ). n2:'.i>j2:'.0 P4.6.2 (Gautschi 1975) Verify the following inequality for the n = 1 case above: n II v-1 lloo ::; max II 1 + lxi l . 09�n lxk - xii i=O i�k Equality results if the Xi are all on the same ray in the complex plane.
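In the same hedged spirit as the sketch given after Algorithm 4.6.1, here is a NumPy transcription of Algorithm 4.6.2; the function name is ours, the x_i are assumed distinct, and the right-hand side is copied rather than overwritten.

    import numpy as np

    def vandermonde_primal(x, b):
        # Solve V(x0,...,xn) z = b in O(n^2) flops.
        z = np.array(b, dtype=float)
        x = np.asarray(x, dtype=float)
        n = len(x) - 1
        for k in range(n):
            for i in range(n, k, -1):
                z[i] = z[i] - x[k] * z[i-1]
        for k in range(n - 1, -1, -1):
            for i in range(k + 1, n + 1):
                z[i] = z[i] / (x[i] - x[i-k-1])
            for i in range(k, n):
                z[i] = z[i] - z[i+1]
        return z

Here the check is np.allclose(np.vander(x, increasing=True).T @ vandermonde_primal(x, b), b), since the book's V is the transpose of np.vander(x, increasing=True).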
  • 231. 4.6. Vandermonde Systems 207 Notes and References for §4.6 Our discussion of Vandermonde linear systems is drawn from the following papers: A. Bjorck and V. Pereyra (1970). "Solution of Vandermonde Systems of Equations," Math. Comput. 24, 893-903. A. Bjorck and T. Elfving (1973). "Algorithms for Confluent Vandermonde Systems,'' Numer. Math. 21, 130-37. The divided difference computations we discussed are detailed in: S.D. Conte and C. de Boor (1980). Elementary Numerical Analysis: An Algorithmic Approach, Third Edition, McGraw-Hill, New York, Chapter 2. Error analyses of Vandermonde system solvers include: N.J. Higham (1987). "Error Analysis of the Bjorck-Pereyra Algorithms for Solving Vandermonde Systems," Numer. Math. 50, 613-632. N.J. Higham (1988). "Fast Solution of Vandermonde-like Systems Involving Orthogonal Polynomials,'' IMA J. Numer. Anal. 8, 473-486. N.J. Higham (1990). "Stability Analysis of Algorithms for Solving Confluent Vandermonde-like Sys­ tems," SIAM J. Matrix Anal. Applic. 11, 23- 41. S.G. Bartels and D.J. Higham (1992). "The Structured Sensitivity of Vandermonde-Like Systems," Numer. Math. 62, 17-34. J.M. Varah (1993). "Errors and Perturbations in Vandermonde Systems," IMA J. Numer. Anal. 13, 1-12. Interesting theoretical results concerning the condition of Vandermonde systems may be found in: W. Gautschi (1975). "Norm Estimates for Inverses of Vandermonde Matrices," Numer. Math. 23, 337-347. W. Gautschi (1975). "Optimally Conditioned Vandermonde Matrices,'' Numer. Math. 24, 1-12. J-G. Sun (1998). "Bounds for the Structured Backward Errors of Vandermonde Systems,'' SIAM J. Matrix Anal. Applic. 20, 45-59. B.K. Alpert (1996). "Condition Number of a Vandermonde Matrix,'' SIAM Review 38, 314--314. B. Beckermarm (2000). "The condition number of real Vandermonde, Krylov and positive definite Hankel matrices," Numer. Math. 85, 553-577. The basic algorithms presented can be extended to cover confluent Vandermonde systems, block Vandermonde systems, and Vandermonde systems with other polynomial bases: G. Galimberti and V. Pereyra (1970). "Numerical Differentiation and the Solution of Multidimensional Vandermonde Systems,'' Math. Comput. 24, 357-364. G. Galimberti and V. Pereyra (1971). "Solving Confluent Vandermonde Systems of Hermitian Type,'' Numer. Math. 18, 44-60. H. Van de Ve! (1977). "Numerical Treatment of Generalized Vandermonde Systems of Equations," Lin. Alg. Applic:. 1 7, 149-174. G.H. Golub and W.P Tang (1981). "The Block Decomposition of a Vandermonde Matrix and Its Applications," BIT 21, 505-517. D. Calvetti and L. Reichel (1992). "A Chebychev-Vandermonde Solver," Lin. Alg. Applic. 172, 219-229. D. Calvetti and L. Reichel (1993). "Fast Inversion ofVandermonde-Like Matrices Involving Orthogonal Polynomials," BIT SS, 473-484. H. Lu (1994). "Fast Solution of Confluent Vandermonde Linear Systems," SIAM J. Matrix Anal. Applic. 15, 1277-1289. H. Lu (1996). "Solution of Vandermonde-like Systems and Confluent Vandermonde-Iike Systems,'' SIAM J. Matrix Anal. Applic. 1 7, 127-138. M.-R. Skrzipek (2004). "Inversion of Vandermonde-Like Matrices,'' BIT 44, 291-306. J.W. Demmel and P. Koev (2005). "The Accurate and Efficient Solution of a Totally Positive Gener­ alized Vandermonde Linear System,'' SIAM J. Matrix Anal. Applic. 27, 142-152. The displacement rank idea that we discuss in §12.1 can also be used to develop fast methods for Vandermonde systems.
  • 232. 208 Chapter 4. Special Linear Systems 4.7 Classical Methods for Toeplitz Systems Matrices whose entries are constant along each diagonal arise in many applications and are called Toeplitz matrices. Formally, T E IRnxn is Toeplitz if there exist scalars r-n+l, . . . , ro, . . . , rn-1 such that aij = rj-i for all i and j. Thus, [ro rl r_ 1 ro T - r_2 r_1 r_3 r_2 is Toeplitz. In this section we show that Toeplitz systems can be solved in O(n2) flops The discussion focuses on the important case when T is also symmetric and positive definite, but we also include a few comments about general Toeplitz systems. An alternative approach to Toeplitz system solving based on displacement rank is given in §12.1. 4.7.1 Persymmetry The key fact that makes it possible to solve a Toeplitz system Tx = b so fast has to do with the structure of r-1. Toeplitz matrices belong to the larger class of persymmetric matrices. We say that B E IRnxn is persymmetric if £nB£n = BT where t:n is the n-by-n exchange matrix defined in §1.2.11, e.g., If B is persymmetric, then t:nB is symmetric. This means that B is symmetric about its antidiagonal. Note that the inverse of a persymmetric matrix is also pcrsymmetric: Thus, the inverse of a nonsingular Toeplitz matrix is persymmetric. 4.7.2 Three Problems Assume that we have scalars ri, . . . , rn such that for k = l:n the matrices 1 rl rl 1 Tk = rk-2 rk-1 rk-2
  • 233. 4.7. Classical Methods for Toeplitz Systems 209 are positive definite. (There is no loss of generality in normalizing the diagonal.) We set out to describe three important algorithms: • Durbin's algorithm for the Yule-Walker problem Tny = -[r1, . . . , rnf· • Levinson's algorithm for the general right-hand-side problem Tnx = b. • Trench's algorithm for computing B = T.;;1. 4.7.3 Solving the Yule-Walker Equations We begin by presenting Durbin's algorithm for the Yule-Walker equations which arise in conjunction with certain linear prediction problems. Suppose for some k that sat­ isfies 1 :S k :S n - 1 we have solved the kth order Yule-Walker system Tky = -r = -[ri, . . . , rk]T. We now show how the (k + l)st order Yule-Walker system can be solved in O(k) flops. First observe that and a = - Tk+l - TTEkz. Since T/;1 is persymmetric, T/;1Ek = Ek T/;1 and thus By substituting this into the above expression for a we find The denominator is positive because Tk+l is positive definite and because We have illustrated the kth step ofan algorithm proposed by Durbin (1960). It proceeds by solving the Yule-Walker systems for k = l:n as follows:
  • 234. 210 Chapter 4. Special Linear Systems y(l} = -r1 for k = l:n - 1 end !A = 1 + [r<klfy<kl Gk = -(rk+l + r(k)T£ky(kl)/f3k z(k) = y(k) + Gkt:ky(k) y(k+l} = [ �:) ] (4.7.1) As it stands, this algorithm would require 3n2 flops to generate y = y(n) . It is possible, however, to reduce the amount of work even further by exploiting some of the above expressions: f3k = 1 + [r(k)jTy(k) = 1 + [ r(k-I) ]T [ y(k-1) + Gk-1£k- IY(k-l) ] Tk Gk-I = (1 + [r(k-l)j'I'y(k-1} ) + Ctk-1 (rr(k-l)jT£k-IY(k-l) + rk) = f3k-l + Ctk-1 (-f3k-1Ctk-1 ) = (1 - aL1 )!3k-l · Using this recursion we obtain the following algorithm: Algorithm 4.7.1 (Durbin) Given real numbers r0, r1 , . . . , rn with r0 = 1 such that T = (rli-jl ) E JRn x n is positive definite, the following algorithm computes y E JRn such that Ty = - [r1 , . . . , rnf· y(l) = -r(l); f3 = 1; a = -r(l) for k = l:n - 1 end f3 = (1 - G2)/3 a = - (r(k + 1) + r(k: - l:l)Ty(l:k)) //3 z(l:k) = y(l :k) + ay(k: - 1:1) y( l:k + 1) = [ z(�'.k) ] This algorithm requires 2n2 flops. We have included an auxiliary vector z for clarity, but it can be avoided. 4.7.4 The General Right-Hand-Side Problem With a little extra work, it is possible to solve a symmetric positive definite Toeplitz system that has an arbitrary right-hand side. Suppose that we have solved the system (4.7.2)
  • 235. 4.7. Classical Methods for Toeplitz Systems for some k satisfying 1 :::; k < n and that we now wish to solve [ Tk £kr l [v l [ b ] rT£k 1 /1 - bk+l . 211 (4.7.3) Here, r = [r1, . . . , rk]T as above. Assume also that the solution to the order-k Yule­ Walker system Tky = -r is also available. From Tkv + µ£kr = b it follows that and so µ = bk+l - rT£kv = bk+l - rT£kx - µrTy = (bk+l - rT£kx) / (1 + rTy) . Consequently, we can effect the transition from (4.7.2) to (4.7.3) in O(k) flops. Overall, we can efficiently solve the system Tnx = b by solving the systems Tkx(k) = b(k) = [b1, . . . , bk]T and Tky(k) = -r(k) = -[ri' . . . ' rk]T "in parallel" for k = l:n. This is the gist of the Levinson algorithm. Algorithm 4.7.2 (Levinson) Given b E IR" and real numbers 1 = r0, r1, . . . , rn such that T = (rli-jl ) E IRnxn is positive definite, the following algorithm computes x E IRn such that Tx = b. y(l) = -r(l); x(l) = b(l); /3 = l; a = -r(l) for k = 1 : n - 1 end (3 = (1 - u2)(3 µ = (b(k + 1) - r(l:k)Tx(k: - 1:1)) /,8 v(l:k) = x(l:k) + µ-y(k: - 1:1) x(l:k + 1) = [ v(��k) ] if k < n - 1 end a = - (r(k + 1) + r(l:k)Ty(k: - 1:1)) /(3 z(l:k) = y(l:k) + a·y(k: - 1:1) y(l:k + 1) = [ z(!�k) ] This algorithm requires 4n2 flops. The vectors z and v are for clarity and can be avoided in a detailed implementation.
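Hedged NumPy transcriptions of Algorithms 4.7.1 and 4.7.2 are given below; the function names are ours, the auxiliary vectors z and v are eliminated as suggested, there are no safeguards, and positive definiteness is assumed. durbin expects r = [r_1, ..., r_n] and levinson expects r = [r_1, ..., r_{n-1}] together with a right-hand side b of length n.

    import numpy as np

    def durbin(r):
        # Solve the Yule-Walker system T_n y = -[r_1,...,r_n]^T, r_0 = 1.
        r = np.asarray(r, dtype=float)
        n = len(r)
        y = np.zeros(n)
        y[0] = -r[0]
        beta, alpha = 1.0, -r[0]
        for k in range(n - 1):
            beta = (1.0 - alpha**2) * beta
            alpha = -(r[k+1] + r[k::-1] @ y[:k+1]) / beta
            y[:k+1] = y[:k+1] + alpha * y[k::-1]
            y[k+1] = alpha
        return y

    def levinson(r, b):
        # Solve T_n x = b with T_n = (r_{|i-j|}), r_0 = 1, positive definite.
        r = np.asarray(r, dtype=float)
        b = np.asarray(b, dtype=float)
        n = len(b)
        x = np.zeros(n)
        y = np.zeros(n)
        x[0] = b[0]
        y[0] = -r[0]
        beta, alpha = 1.0, -r[0]
        for k in range(n - 1):
            beta = (1.0 - alpha**2) * beta
            mu = (b[k+1] - r[:k+1] @ x[k::-1]) / beta
            x[:k+1] = x[:k+1] + mu * y[k::-1]
            x[k+1] = mu
            if k < n - 2:
                alpha = -(r[k+1] + r[:k+1] @ y[k::-1]) / beta
                y[:k+1] = y[:k+1] + alpha * y[k::-1]
                y[k+1] = alpha
        return x

As a check, with T = scipy.linalg.toeplitz(np.r_[1.0, r[:len(b)-1]]) one should find np.allclose(T @ levinson(r, b), b); SciPy also exposes a Levinson-type solver directly as scipy.linalg.solve_toeplitz.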
  • 236. 212 Chapter 4. Special Linear Systems 4.7.5 Computing the Inverse One of the most surprising properties of a symmetric positive definite Toeplitz matrix Tn is that its complete inverse can be calculated in O(n2) flops. To derive the algorithm for doing this, partition T;1 as follows: T_1 = [ A Er i-1 = [ B v l n rTE 1 VT 'Y (4.7.4) where A = Tn-1, E = Cn-t. and r = [ri. . . . , rn-1f. From the equation it follows that Av = -7Er = -7E(r1, . . . ,rn-l)T and 'Y = 1 - rTEv. If y solves the order-(n-1) Yule-Walker system Ay = -r, then these expressions imply that 'Y = 1/(1 + rTy), v = 'YEy. Thus, the last row and column of T,.;-1 are readily obtained. It remains for us to develop working formulae for the entries of the submatrix B in (4.7.4). Since AB + &rvT = ln_1, it follows that vvT B = A-1 - (A-1Er)vT = A-1 + - . 'Y Now since A = Tn-l is nonsingular and Toeplitz, its inverse is persymmetric. Thus, 1) ViVj bii = (A- ii + - 'Y = (A-1)n-j,n-i + 'Y = bn-j,n-i Vn-jVn-i + ViVj 'Y 'Y 1 = bn-j,n-i + - (viVj - Vn-jVn-i) · 'Y (4.7.5) This indicates that although B is not persymmetric, we can readily compute an element bii from its reflection across the northeast-southwest axis. Coupling this with the fact that A-1 is persymmetric enables us to determine B from its "edges" to its "interior." Because the order of operations is rather cumbersome to describe, we preview the formal specification of the algorithm pictorially. To this end, assume that we know the last column and row of T,.;-1: u u u u u k u u u u u k T-1 u u u u u k = k n u u 1L u u u u u u u k k k k k k k
  • 237. 4.7. Classical Methods for Toeplitz Systems 213 Here " u " and "k" denote the unknown and the known entries, respectively, and n = 6. Alternately exploiting the persymmetry of T;1 and the recursion (4.7.5), we can compute B, the leading (n - 1)-by-(n - 1) block of T;1, as follows: k k k k k k k k k k k k k k k k k k k u u u u k k u u u k k k k k k k k p�m k u u u u k (�) k u u u k k p�m k k u u k k k u u u u k k u u u k k k k u u k k k u u u u k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k (�) k k u k k k p�m k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k k Of course, when computing a matrix that is both symmetric and persymmetric, such as T;1, it is only necessary to compute the "upper wedge" of the matrix-e.g., x x x x x x x x x x (n = 6). x x With this last observation, we are ready to present the overall algorithm. Algorithm 4.7.3 (Trench) Given real numbers 1 = ro,ri, . . . , rn such that T = (rli-il) E Rnxn is positive definite, the following algorithm computes B = T;1. Only those bij for which i $ j and i + j $ n + 1 are computed. Use Algorithm 4.7.1 to solve Tn-IY = -(r1, . . . , Tn-i )T. 'Y = 1/(1 + r(l:n - l)Ty(l:n - 1)) v(l:n - 1) = -yy(n - 1: - 1:1) B(l, 1) = -y B(l, 2:n) = v(n - 1: - l:l)T for i = 2 : floor((n - 1)/2) + 1 end for j = i:n - i + 1 B(i,j) = B(i - 1,j - 1) + (v(n+l -j)v(n + 1 - i) - v(i - l)v(j - 1)) h end This algorithm requires 13n2/4 flops.
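A hedged NumPy transcription of Algorithm 4.7.3 is sketched below; the function name is ours, it calls the durbin sketch given earlier, it assumes n ≥ 2 and positive definiteness, and, going slightly beyond the algorithm as stated, it fills in the remaining entries of B by symmetry and persymmetry so that the full inverse is returned.

    import numpy as np

    def trench(r):
        # r = [r_1, ..., r_{n-1}] (normalized so r_0 = 1); returns B = T_n^{-1}.
        r = np.asarray(r, dtype=float)
        n = len(r) + 1
        y = durbin(r)                       # solves T_{n-1} y = -[r_1,...,r_{n-1}]^T
        gamma = 1.0 / (1.0 + r @ y)
        v = gamma * y[::-1]                 # v = gamma * E_{n-1} y
        B = np.zeros((n, n))
        B[0, 0] = gamma
        B[0, 1:] = v[::-1]
        # fill the "upper wedge" via the recursion (4.7.5)
        for i in range(1, (n - 1) // 2 + 1):
            for j in range(i, n - i):
                B[i, j] = B[i-1, j-1] + (v[n-1-j]*v[n-1-i] - v[i-1]*v[j-1]) / gamma
        # complete the upper triangle by persymmetry, then symmetrize
        for i in range(n):
            for j in range(i, n):
                if i + j > n - 1:
                    B[i, j] = B[n-1-j, n-1-i]
        B = np.triu(B) + np.triu(B, 1).T
        return B

A check: trench(r) should agree with np.linalg.inv(scipy.linalg.toeplitz(np.r_[1.0, r])).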
  • 238. 214 Chapter 4. Special Linear Systems 4.7.6 Stability Issues Error analyses for the above algorithms have been performed by Cybenko (1978), and we briefly report on some of his findings. The key quantities turn out to be the O:k in (4.7.1). In exact arithmetic these scalars satisfy and can be used to bound 11 r-1 111: Moreover, the solution to the Yule-Walker system TnY = -r(l:n) satisfies provided all the O:k are nonnegative. (4.7.6) (4.7.7) Now if x is the computed Durbin solution to the Yule-Walker equations, then the vector To = Tnx + T can be bounded as follows n II ro II � u IT(1 + l&kl), k=l where &k is the computed version of o:k. By way of comparison, since each ITil is bounded by unity, it follows that 11 Tc II � ull y 111 where Tc is the residual associated with the computed solution obtained via the Cholesky factorization. Note that the two residuals are of comparable magnitude provided (4.7.7) holds. Experimental evidence suggests that this is the case even if some of the O:k are negative. Similar comments apply to the numerical behavior of the Levinson algorithm. For the Trench method, the computed inverse fJ of T;;1 can be shown to satisfy In light of (4.7.7) we see that the right-hand side is an approximate upper bound for ull T;;1 II which is approximately the size of the relative error when T;;1 is calculated using the Cholesky factorization. 4. 7.7 A Toeplitz Eigenvalue Problem Our discussion of the symmetric eigenvalue problem begins in Chapter 8. However, we are able to describe a solution procedure for an important Toeplitz eigenvalue problem that does not require the heavy machinery from that later chapter. Suppose T = [� �]
  • 239. 4.7. Classical Methods for Toeplitz Systems 215 is symmetric, positive definite, and Toeplitz with r E R.n-l. Cybenko and Van Loan (1986) show how to pair the Durbin algorithm with Newton's method to compute Amin(T) assuming that Amin(T) < Amin(B). This assumption is typically the case in practice. If [ 1 rT ] [ o: ] _ A . [ a ] r B y - mm y ' then y = -a(B - Aminl)-1r, a =/:- 0, and a + rT [-a(B - Am;nl)-1r) = Amino:. Thus, Amin is a zero of the rational function f(A) = 1 - A - rT(B - A/)-1 r. Note that if A < Amin(B), then !'(A) = -1 - 11 (B - AI)-1r 11� ::; -1, J"(A) = -2rT(B - AI)-3r $ 0. Using these facts it can be shown that if Amin(T) $ >,(O) < Amin(B), then the Newton iteration (4.7.8) (4.7.9) (4.7.10) converges to Amin(T) monotonically from the right. The iteration has the form >,(k+1) = >,(k) + l + rTw - A(k>, l + wTw where w solves the "shifted" Yule-Walker system (B - >,(k)I)w = -r. Since >.(k) < >.min(B), this system is positive definite and the Durbin algorithm (Algo­ rithm 4.7.1) can be applied to the normalized Toeplitz matrix (B - A(k)J)/(1 - A(kl). The Durbin algorithm can also be used to determine a starting value A(O) that satisfies (4.7.9). If that algorithm is applied to T>. = (T - .AI)/(1 - A) then it runs to completion if T>. is positive definite. In this case, the fA defined in (4.7.1) are all positive. On the other hand, if k $ n - 1, f3k $ 0 and f31, . . . , fJk-l are all positive, then it follows that T>.(l:k, l:k) is positive definite but that T>.(l:k+ 1, k+ 1) is not. Let m(>.) be the index ofthe first nonpositive (3 and observe that ifm(A<0l) = n-1, then B - >.<0lI is positive definite and T - A(O)I is not, thereby establishing (4.7.9). A bisection scheme can be formulated to compute A(O) with this property:
  • 240. 216 Chapter 4. Special Linear Systems L = O R = l - lr1I µ = (L + R)/2 while m(µ) f= n - 1 if m(µ) < n - 1 R = µ else L = µ end µ = (L + R)/2 end _x(O) = µ (4.7.11) At all times during the iteration we have m(L) :::; n - 1 :::; m(R). The initial value for R follows from the inequality Note that the iterations in (4.7.10) and (4.7.11) involve at most O(n2) flops per pass. A heuristic argument that O(log n) iterations are required is given by Cybenko and Van Loan (1986). 4. 7.8 Unsymmetric Toeplitz System Solving We close with some remarks about unsymmetric Toeplitz system-solving. Suppose we are given scalars r1, . . . , rn-1, P1, . . . ,Pn-l , and bi, . . . , bn and that we want to solve a linear system Tx = b of the form I 1 r1 P1 1 P2 P1 P3 P2 p4 p3 �� �: �: II�� I I��I 1 r1 r2 x3 b3 P1 1 r1 X4 b4 P2 Pl 1 X5 b5 (n = 5). Assume that Tk = T(l:k, l:k) is nonsingular for k = l:n. It can shown that if we have the solutions to the k-by-k systems T'{y -r = - h T2 · · · Tk f , Tkw -p = - (p1 P2 · · · Pk f , (4.7.12) Tkx = b = [b1 b2 . . . bk f ,
  • 241. 4.7. Classical Methods for Toeplitz Systems then we can obtain solutions to n:J � - [r:H ]• l [: l [P:+1 ]• l [: l [bk:l l 217 (4.7.13) in O(k) flops. The update formula derivations are very similar to the Levinson algo­ rithm derivations in §4.7.3. Thus, if the process is repeated for k = l:n - 1, then we emerge with the solution to Tx = Tnx = b. Care must be exercised if a Tk matrix is singular or ill-conditioned. One strategy involves a lookahead idea. In this framework, one might transition from the Tk problem directly to the Tk+2 problem if it is deemed that the Tk+l problem is dangerously ill-conditioned. See Chan and Hansen (1992). An alternative approach based on displacement rank is given in §12.1. Problems P4.7.1 For any v E R" define the vectors v+ = (v+env)/2 and v_ = (v -e..v)/2. Suppose A E Fxn is symmetric and persymmetric. Show that if Ax = b then Ax+ = b+ and Ax_ = b_ . P4.7.2 Let U E Rnxn be the unit upper triangular matrix with the property that U(l:k - 1, k) = Ck-lY(k- l) where y(k) is defined by (4.7.1). Show that UTTnU = diag(l, ,Bi , . . . , .Bn-i). P4.7.3 Suppose that z E Rn and that S E R"'xn is orthogonal. Show that if X = [z, Sz, . . . , sn-l z] , then XTX is Toeplitz. P4.7.4 Consider the LDLT factorization of an n-by-n symmetric, tridiagonal, positive definite Toeplitz matrix. Show that dn and ln,n-1 converge as n -+ oo. P4.7.5 Show that the product of two lower triangular Toeplitz matrices is Toeplitz. P4.7.6 Give an algorithm for determining µ E R such that Tn + µ (enef + e1e:Z:) is singular. Assume Tn = (rli-j l ) is positive definite, with ro = 1. P4.7.7 Suppose T E Rnxn is symmetric, positive definite, and Tocplitz with unit diagonal. What is the smallest perturbation of the the ith diagonal that makes T semidefinite? P4.7.8 Rewrite Algorithm 4.7.2 so that it does not require the vectors z and v. P4.7.9 Give an algorithm for computing it00(Tk) for k = l:n. P4.7.10 A p-by-p block matrix A = (Aij ) with m-by-m blocks is block Toeplitz if there exist A-p+i. . . . , A-1,Ao,At , . . . ,Ap-1 E R"''xm so that Aij = Ai-i • e.g., [ Ao A = A-1 A-2 A-a (a) Show that there is a permutation II such that [Tu T T21 II AII = : : Tm1 T1m l Tmm T12
  • 242. 218 Chapter 4. Special Linear Systems where each Tij is p-by-p and Toeplitz. Each Tij should be "made up" of (i, j) entries selected from the Ak matrices. (b) What can you say about the Tij if Ak = A-k • k = l:p - 1? P4.7.ll Show how to compute the solutions to the systems in (4.7.13) given that the solutions to the systems in (4.7.12) are available. Assume that all the matrices involved are nonsingular. Proceed to develop a fast unsymmetric Toeplitz solver for Tx = b assuming that T's leading principal submatrices are all nonsingular. P4.7.12 Consider the order-k Yule-Walker system Tky(k) = -r(k) that arises in (4.7.1). Show that if y(k) = [Ykt .. . • ,Ykk]T for k = l:n - 1 and 1 0 0 0 n Yl l 1 0 0 L � [ Y22 Y21 1 0 Yn-�,n-l Yn-l,n-2 Yn-1,n-3 Yn-1,1 then LTTnL = diag(l,,81, . . ., .Bn-1) where f3k = 1 + rCk)Ty(k) . Thus, the Durbin algorithm can be thought of as a fast method for computing and LDLT factorization of T,;-1. P4.7.13 Show how the Trench algorithm can be used to obtain an initial bracketing interval for the bisection scheme (4.7.11). Notes and References for §4.7 The original references for the three algorithms described in this section are as follows: J. Durbin (1960). "The Fitting of Time Series Models," Rev. Inst. Int. Stat. 28, 233-243. N. Levinson (1947). ''The Weiner RMS Error Criterion in Filter Design and Prediction," J. Math. Phys. 25, 261-278. W.F. Trench (1964). "An Algorithm for the Inversion of Finite Toeplitz Matrices," J. SIAM 12, 515-522. As is true with the "fast algorithms" area in general, unstable Toeplitz techniques abound and caution must be exercised, see: G. Cybenko (1978). "Error Analysis of Some Signal Processing Algorithms," PhD Thesis, Princeton University. G. Cybenko (1980). "The Numerical Stability of the Levinson-Durbin Algorithm for Toeplitz Systems of Equations," SIAM J. Sci. Stat. Compu.t. 1, 303-319. J.R. Bunch (1985). "Stability of Methods for Solving Toeplitz Systems of Equations," SIAM J. Sci. Stat. Compu.t. 6, 349-364. E. Linzer (1992). "On the Stability of Solution Methods for Band Toeplitz Systems," Lin. Alg. Applic. 1 70, 1-32. J.M. Varah (1994). "Backward Error Estimates for Toeplitz Systems," SIAM J. Matrix Anal. Applic. 15, 408-417. A.W. Bojanczyk, R.P. Brent, F.R. de Hoog, and D.R. Sweet (1995). "On the Stability of the Bareiss and Related Toeplitz Factorization Algorithms," SIAM J. Matrix Anal. Applic. 16, 40 -57. M.T. Chu, R.E. Funderlic, and R.J. Plemmons (2003). "Structured Low Rank Approximation," Lin. Alg. Applic. 366, 157-172. A. Bottcher and S. M. Grudsky (2004). "Structured Condition Numbers of Large Toeplitz Matrices are Rarely Better than Usual Condition Numbers," Nu.m. Lin. Alg. 12, 95-102. J.-G. Sun (2005). "A Note on Backwards Errors for Structured Linear Systems," Nu.mer. Lin. Alg. Applic. 12, 585-603. P. Favati, G. Lotti, and 0. Menchi (2010). "Stability of the Levinson Algorithm for Toeplitz-Like Systems," SIAM J. Matrix Anal. Applic. 31, 2531-2552. Papers concerned with the lookahead idea include: T.F. Chan and P. Hansen {1992). "A Look-Ahead Levinson Algorithm for Indefinite Toeplitz Systems," SIAM J. Matrix Anal. Applic. 13, 490-506. M. Gutknecht and M. Hochbruck (1995). "Lookahead Levinson and Schur Algorithms for Nonhermi­ tian Toeplitz Systems," Nu.mer. Math. 70, 181-227.
  • 243. 4.8. Circulant and Discrete Poisson Systems 219 M. Van Bare! and A. Bulthecl (1997). "A Lookahead Algorithm for the Solution of Block Toeplitz Systems,'' Lin. Alg. Applic. 266, 291-335. Various Toeplitz eigenvalue computations are presented in: G. Cybenko and C. Van Loan (1986). "Computing the Minimum Eigenvalue of a Symmetric Positive Definite Toeplitz Matrix,'' SIAM J. Sci. Stat. Comput. 7, 123-131. W.F. Trench (1989). "Numerical Solution ofthe Eigenvalue Problem for Hermitian Toeplitz Matrices,'' SIAM J. Matrix Anal. Appl. 10, 135-146. H. Voss (1999). "Symmetric Schemes for Computingthe Minimum Eigenvalue ofa Symmetric Toeplitz Matrix,'' Lin. Alg. Applic. 287, 359-371. A. Melman (2004). "Computation of the Smallest Even and Odd Eigenvalues of a Symmetric Positive­ Definite Toeplitz Matrix,'' SIAM J. Matrix Anal. Applic. 25, 947-963. 4.8 Circulant and Discrete Poisson Systems If A E <Cnxn has a factorization of the form v-1AV = A = diag(>.1, . . . , An), (4.8.1) then the columns of V are eigenvectors and the Ai are the corresponding eigenvalues2. In principle, such a decomposition can be used to solve a nonsingular Au = b problem: (4.8.2) However, if this solution framework is to rival the efficiency of Gaussian elimination or the Cholesky factorization, then V and A need to be very special. We say that A has a fast eigenvalue decomposition (4.8.1) if (1) Matrix-vector products of the form y = Vx require O(n logn) flops to evaluate. (2) The eigenvalues A1, . . . , An require O(n log n) flops to evaluate. (3) Matrix-vector products of the form b = v-1b require O(n log n) flops to evaluate. If these three properties hold, then it follows from (4.8.2) that O(n logn) flops are required to solve Au = b. Circulant systems and related discrete Poisson systems lend themselves to this strategy and are the main concern of this section. In these applications, the V-matrices are associated with the discrete Fourier transform and various sine and cosine trans­ forms. (Now is the time to review §1.4.1 and §1.4.2 and to recall that we have n logn methods for the DFT, DST, DST2, and DCT.) It turns out that fast methods ex­ ist for the inverse of these transforms and that is important because of (3). We will not be concerned with precise flop counts because in the fast transform "business" , some n arc friendlier than others from the efficiency point of view. While this issue may be important in practice, it is not something that we have to worry about in our brief, proof-of-concept introduction. Our discussion is modeled after §4.3-§4.5 in Van Loan (FFT) where the reader can find complete derivations and greater algorithmic de­ tail. The interconnection between boundary conditions and fast transforms is a central theme and in that regard we also recommend Strang (1999). 2This section does not depend on Chapters 7 and 8 which deal with computing eigenvalues and eigenvectors. The eigensystems that arise in this section have closed-form expressions and thus the algorithms in those later chapters are not relevant to the discussion.
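The three requirements above are easiest to appreciate next to a generic, and deliberately slow, rendering of (4.8.2). The sketch below uses a dense eigensolver, so every step costs O(n³); the circulant and Poisson solvers developed in this section replace V, V^{-1}, and the eigenvalue computation with O(n log n) transforms.

```python
import numpy as np

def eig_solve(A, b):
    """Illustrative only: solve A u = b via (4.8.2), u = V (Lambda^{-1} (V^{-1} b)).
    Here V and the eigenvalues come from a dense eigensolver, so nothing is fast;
    the fast solvers below perform the same three steps with FFT-like transforms."""
    lam, V = np.linalg.eig(A)            # A = V diag(lam) V^{-1}
    return V @ (np.linalg.solve(V, b) / lam)
```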
  • 244. 220 Chapter 4. Special Linear Systems 4.8.1 The Inverse of the OFT Matrix Recall from §1.4.1 that the DFT matrix Fn E <Cnxn is defined by [ I:" I . _ w<k-l)(j-1) L"n kJ - n ' Wn = cos (2 :) - i sin (2 :). It is easy to verify that H - Fn = Fn and so for all p and q that satisfy 0 ::; p < n and 0 ::; q < n we have n-1 n-1 Fn(:,p + l)HFn(:, q + 1) = :�::::c<)�Pw�q = L W�(q-p). k=O k=O If q = p, then this sum equals n. Otherwise, n-1 L W�(q-p) k=O It follows that 1 - w:!(q-p) 1 - wi-p 1 - 1 = 0. 1 - wi-p H - nln = Fn Fn = FnFn. Thus, the DFT matrix is a scaled unitary matrix and - 1 1 - Fn = -Fn. n A fast Fourier transform procedure for Fnx can be turned into a fast inverse Fourier transform procedure for F.;;1x. Since 1 1 - y = F.;; X = -Fnx, n simply replace each reference to Wn with a reference to Wn and scale. See Algorithm 1.4.1. 4.8.2 Circulant Systems A circulant matrix is a Toeplitz matrix with "wraparound", e.g., [Zc Z4 Z1 zo C(z) = Z2 Z1 Z3 Z2 Z4 Z3 Z3 Z2 Z4 Z3 zo Z4 Z1 Zo z2 Z1 Z1 I Z2 Z3 . Z4 Zo We assume that the vector z is complex. Any circulant C(z) E <Cnxn is a linear combi­ nation of In, Vn, . . . , v;-:-1 where Vn is the downshift permutation defined in §1.2.11. For example, if n = 5, then
  • 245. 4.8. and Circulant and Discrete Poisson Systems 0 0 0 0 1 0 0 0 1 0 0 V� = 1 0 0 0 0 , V� = 0 0 0 0 1 , V� = 0 [o 0 0 I O J [o 0 I 0 O J [o 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 Thus, the 5-by-5 circulant matrix displayed above is given by C(z) = zol + z1 Vn + z2v; + Z3V� + z4V�. Note that Vg = /s. More generally, n-1 :::? C(z) = L ZkV�. k=O Note that if v-1vnV = A is diagonal, then 221 1 0 0 n 0 1 0 0 0 1 0 0 0 0 0 0 (4.8.3) v-1c(z)V = v-1 (�zkv�)v = �zk (V-1vnv-1)k = �zkAk (4.8.4) k=O k=O k=O is diagonal. It turns out that the DFT matrix diagonalizes the downshift permutation. for j= O:n - 1. j (2j7f) . . (2j7f) Aj+l = Wn = COS --;;:- + t Slll --;;:- Proof. For j= O:n - 1 we have 1 w2i n (n-l)j Wn (n-l)j Wn 1 (n-2)j Wn = w!, 1 w/, w;/ (n-l)j Wn This vector is precisely FnA(: , j+ 1). Thus, VnV = VA, i.e., v-1vnV = A. D It follows from (4.8.4) that any circulant C(z) is diagonalized by Fn and the eigenvalues of C(z) can be computed fast.
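Theorem 4.8.2 below makes this precise. As a quick computational check, note that C(z)x is the cyclic convolution of z and x, so taking DFTs turns C(z)x = y into a diagonal system; the sketch below uses that formulation to sidestep conjugation bookkeeping, and it assumes SciPy's circulant builds the matrix whose first column is z, which matches C(z) here.

```python
import numpy as np
from numpy.fft import fft, ifft
from scipy.linalg import circulant      # C(z) with first column z

def circ_solve(z, y):
    """Solve C(z) x = y in O(n log n): the DFT diagonalizes any circulant."""
    return ifft(fft(y) / fft(z))

z = np.array([4.0, 1.0, 0.5, 0.25, 0.1])   # strictly diagonally dominant, so C(z) is nonsingular
y = np.arange(1.0, 6.0)
x = circ_solve(z, y)                        # complex with ~zero imaginary part
print(np.allclose(circulant(z) @ x, y))     # True
```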
  • 246. 222 Chapter 4. Special Linear Systems Theorem 4.8.2. Suppose z E ccn and C(z) are defined by (4.8.3}. If V = Fn and >. = Fnz, then v-1c(z)V = diag(>.i . . . . , >.n)· Proof. Define and note that the columns of Fn are componentwise powers ofthis vector. In particular, Fn(:, k + 1) = rk where [rkJ; = Jj. Since A = diag(J), it follows from Lemma 4.8.1 that n-1 n-1 n-1 v-1c(z)V = L.>kAk = :�::>k diag(J)k = :�::>k diag(J:k) k=O k=O k=O (n-1 ) = diag L: zk rk k=O completing the proof of the theorem D Thus, the eigenvalues of the circulant matrix C(z) are the components of the vector Fnz. Using this result we obtain the following algorithm. Algorithm 4.8.1 If z E ccn, y E ccn, and C(z) is nonsingular, then the following algorithm solves the linear system C(z)x = y. Use an FFT to compute c = FnY and d = P..z. w = c./d Use an FFT to compute u = Fnw. x = u/n This algorithm requires 0(n log n) flops. 4.8.3 The Discretized Poisson Equation in One Dimension We now turn our attention to a family of real matrices that have real, fast eigenvalue decompositions. The starting point in the discussion is the differential equation cFu dx2 = -f(x) a � u(x) � {3, together with one of four possible specifications of u(x) on the boundary. Dirichlet-Dirichlet (DD): u(a) = Uo:, Dirichlet-Neumann (DN): u(o:) = Uo:, Neumann-Neumann (NN): u'(a) = u�, Periodic (P): u(a) = u(f3). u(f3) = Uf3, u'(f3) = u!J, u'({3) = u!J, (4.8.5)
  • 247. 4.8. Circulant and Discrete Poisson Systems 223 By replacing the derivatives in (4.8.5) with divided differences, we obtain a system of linear equations. Indeed, if m is a positive integer and then for i = l:m - 1 we have h h = (J - Ot Ui - 1Li-l h m _ Ui-1 - 2Ui + Ui+l _ -f· - h2 - • (4.8.6) where fi = f(a+ih) and Ui � u(a+ih). To appreciate this discretization we display the linear equations that result when m = 5 for the various possible boundary conditions. The matrices tiDD>, tiDN), tiNN), and ,,.jP) are formally defined afterwards. For the Dirichlet-Dirichlet problem, the system is 4-by-4 and tridiagonal: ...-(DD) . • = -1 2 -1 Q U2 _ h2h [ 2 -1 0 0 l[U1 l [h2fi + Uo: l 14' u(l.4) - - 2 • 0 -1 2 -1 U3 h f3 0 0 -1 2 U4 h2f4 + u,a For the Dirichlet-Neumann problem the system is still tridiagonal, but us joins u1, . . . , U4 as an unknown: 2 -1 0 0 0 U1 h2fi + Uo: -1 2 -1 0 0 U2 h2h 75(DN) . u(1:5) = 0 -1 2 -1 0 U3 = h2h 0 0 -1 2 -1 U4 h2f4 0 0 0 -2 2 U5 2hu'.B The new equation on the bottom is derived from the approximation u'((J) � (u5-u4)/h. (The scaling of this equation by 2 simplifies some of the derivations below.) For the Neumann-Neumann problem, us and uo need to be determined: 2 -2 0 0 0 0 Uo -2hu� -1 2 -1 0 0 0 U1 h2Ji 16(NN) • U(0:5) 0 -1 2 -1 0 0 U2 h2h = h2h 0 0 -1 2 -1 0 U3 0 0 0 -1 2 -1 U4 h2h 0 0 0 0 -2 2 U5 2hu� Finally, for the periodic problem we have 2 -1 0 0 -1 U1 h2fi -1 2 -1 0 0 U2 h2h 75<1·) • u(1:5) 0 -1 2 -1 0 'U3 = h2h 0 0 -1 2 -1 U4 h2f4 -1 0 0 -1 2 U5 h2fs
  • 248. 224 Chapter 4. Special Linear Systems The first and last equations use the conditions uo = us and u1 = u6• These constraints follow from the assumption that u has period /3 - a. As we show below, the n-by-n matrix T,_(DD) n and its low-rank adjustments tiDN) = tiDD) - ene'f:_ 1 , tiNN) = tiDD) - ene'f:_ l - e1 er, tiP) = ti00> - ei e'f: - enef. (4.8.7) (4.8.8) (4.8.9) (4.8.10) have fast eigenvalue decompositions. However, the existence of O(n logn) methods for. these systems is not very interesting because algorithms based on Gaussian elimina­ tion are faster: O(n) versus O(n logn). Things get much more interesting when we discretize the 2-dimensional analogue of (4.8.5). 4.8.4 The Discretized Poisson Equation in Two Dimensions To launch the 2D discussion, suppose F(x, y) is defined on the rectangle R = {(x, y) : ax � X � /3x, O!y � y � /3y} and that we wish to find a function u that satisfies [J2u 82u 8x2 + 8y2 = -F(x, y) (4.8.11) on R and has its value prescribed on the boundary of R. This is Poisson's equation with Dirichlet boundary conditions. Our plan is to approximate u at the grid points (ax + ihx, ay + jhy) where i = l:m1 - 1, j = l:m2 - 1, and h - f3x - O!x h - /3y - O!y x - m1 Y - m2 · Refer to Figure 4.8.1, which displays the case when m1 = 6 and m2 = 5. Notice that there are two kinds of grid points. The function ·u is known at the "•" grid points on the boundary. The function u is to be determined at the " o " grid points in the interior. The interior grid points have been indexed in a top-to-bottom, left-to-right order. The idea is to have Uk approximate the value of u(x, y) at grid point k. As in the one-dimensional problem considered §4.8.3, we use divided differences to obtain a set of linear equations that define the unknowns. An interior grid point P has a north (N), east (E), south (S), and west (W) neighbor. Using this "compass point" notation we obtain the following approximation to (4.8.11) at P: u(E) - u(P) u(P) - u(W) u(N) - u(P) u(P) - u(S) + = -F(P)
  • 249. 4.8. Circulant and Discrete Poisson Systems 225 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Figure 4.8.1. A grid with m1 = 6 and m2 = 5. The x-partial and y-partial have been replaced by second-order divided differences. Assume for clarity that the horizontal and vertical grid spacings are equal, i.e., hx = hy = h. With this assumption, the linear equation at point P has the form 4u(P) - u(N) - u(E) - u(S) - u(W) = h2F(P). In our example, there are 20 such equations. It should be noted that some of P's neighbors may be on the boundary, in which case the corresponding linear equation involves fewer than 5 unknowns. For example, if P is the third grid point then we see from Figure 4.8.1 that the north neighbor N is on the boundary. It follows that the associated linear equation has the form 4u(P) - u(E) - u(S) - u(W) = h2F(P) + u(N). Reasoning like this, we conclude that the matrix of coefficients has the following block tridiagonal form /..(DD) 5 0 0 0 2fs -[5 0 0 0 Ti;DD) 0 0 -[5 2[5 -[5 0 A + 0 0 tiDD) 0 0 -[5 2[5 -[5 0 0 0 /..(DD) 5 0 0 -[5 2Is i.e., A = [4 ® tiDD) + �(DD) ® fs . Notice that the first matrix is associated with the x-partials while the second matrix is associated with the y-partials. The right-hand side in Au = b is made up of F­ evaluations and specified values of u(x, y) on the boundary.
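The Kronecker structure just described is easy to reproduce; the short sketch below builds T_n^(DD) and assembles the model-problem matrix (the function and variable names are ours, not the text's).

```python
import numpy as np

def t_dd(n):
    """The n-by-n second-difference matrix T_n^(DD) = tridiag(-1, 2, -1)."""
    return 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def poisson_dd_matrix(n1, n2):
    """A = I_{n2} (x) T_{n1}^(DD) + T_{n2}^(DD) (x) I_{n1}, the coefficient matrix of the
    discrete Dirichlet Poisson problem (Figure 4.8.1 corresponds to n1 = 5, n2 = 4)."""
    return np.kron(np.eye(n2), t_dd(n1)) + np.kron(t_dd(n2), np.eye(n1))

A = poisson_dd_matrix(5, 4)   # 20-by-20, block tridiagonal with 5-by-5 blocks
```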
  • 250. 226 Chapter 4. Special Linear Systems Extrapolating from our example, we conclude that the matrix of coefficients is an (m2 - l)-by-(m2 - 1) block tridiagonal matrix with (mi - l)-by-(m1 - 1) blocks: A I '°' -r(DD) -r(DD) '°' I = m2-i 'OI 1m1-i + 'm2-1 'OI m1-l· Alternative specifications along the boundary lead to systems with similar structure, e.g., (4.8.12) For example, if we impose Dirichet-Neumann, Neumann-Neumann, or periodic bound­ ary conditions along the left and right edges of the rectangular domain R, then A1 will equal Tm�N), ti.7�L or r< ,;;.> accordingly. Likewise, if we impose Dirichet-Neumann, Neumann-Neumann, or periodic boundary conditions along the bottom and top edges of R, then A2 will equal r.Ji�N>, r�:�L or r.J:/ If the system (4.8.12) is nonsingular and Ai and A2 have fast eigenvalue decomposition&, then it can be solved with just O(N log N) flops where N = nin2. To see why this is possible, assume that v-iAiV = Di = diag(Ai. . . . , An, ), w-1A2W = D2 = diag(µ1 , . . . , µn2) (4.8.13) (4.8.14) are fast eigenvalue decompositions. Using facts about the Kronecker product that are set forth in §1.3.6-§1.3.8, we can reformulate (4.8.12) as a matrix equation AiU + UAf = B where U = reshape(u, ni, n2) and B = reshape(b, ni, n2). Substituting the above eigen­ value decompositions into this equation we obtain DiU + fjD2 = B, where U = (iii;) = v-iuw-T and B = (bi;) = v-iBw-T. Note how easy it is to solve this transformed system because Di and D2 are diagonal: - bij . 1 . 1 Uij = i = :ni. J = :n2. Ai + µ; For this to be well-defined, no eigenvalue of Ai can be the negative of an eigenvalue of A2. In our example, all the Ai and µi are positive. Overall we obtain Algorithm 4.8.2 (Fast Poisson Solver Framework) Assume that Ai E IRn1 xn, and A2 E IR.n2xn2 have fast eigenvalue decompositions (4.8.13) and (4.8.14) and that the matrix A = In2 ® Ai + A2 ® In, is nonsingular. The following algorithm solves the linear system Au = b where b E IR.n1n2• fJ = (w-i(V-1B)T)T where B = reshape(b, ni, n2) for i = l:ni end for j = l:n2 iii; = bi;/(Ai + µ;) end u = reshape(U, nin2, 1) where U = (W(VU)T)T
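Here is a NumPy sketch of Algorithm 4.8.2 specialized to A1 = T_{n1}^(DD) and A2 = T_{n2}^(DD), using SciPy's DST-I (type=1) for the fast transforms; the eigenvalue formula it uses is derived in §4.8.6. Because each axis receives one idst (the V^{-1} step) and one dst (the V step), SciPy's normalization of the DST-I cancels and need not be tracked.

```python
import numpy as np
from scipy.fft import dst, idst                 # DST-I when type=1

def fast_poisson_dd(B):
    """Solve A1 U + U A2 = B with A1 = T_{n1}^(DD), A2 = T_{n2}^(DD) in O(N log N),
    N = n1*n2.  Transform, divide by lambda_i + mu_j, transform back."""
    n1, n2 = B.shape
    lam1 = 4 * np.sin(np.arange(1, n1 + 1) * np.pi / (2 * (n1 + 1))) ** 2
    lam2 = 4 * np.sin(np.arange(1, n2 + 1) * np.pi / (2 * (n2 + 1))) ** 2
    Bt = idst(idst(B, type=1, axis=0), type=1, axis=1)     # B~ = V1^{-1} B V2^{-T}
    Ut = Bt / (lam1[:, None] + lam2[None, :])
    return dst(dst(Ut, type=1, axis=0), type=1, axis=1)    # U = V1 U~ V2^T

# check against a dense solve of (I (x) T1 + T2 (x) I) u = b
n1, n2 = 5, 4
tdd = lambda n: 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A = np.kron(np.eye(n2), tdd(n1)) + np.kron(tdd(n2), np.eye(n1))
b = np.arange(1.0, n1 * n2 + 1)
U = fast_poisson_dd(b.reshape(n2, n1).T)                   # column-major reshape of b
print(np.allclose(U.T.reshape(-1), np.linalg.solve(A, b))) # True
```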
  • 251. 4.8. Circulant and Discrete Poisson Systems 227 The following table accounts for the work involved: Operation How Many? Work v-1 times ni-vector n2 O(n2·n1 ·log n1) w-1 times n2-vector ni O(n1 ·n2·logn2) V times ni-vector n2 O(n2 ·n1 ·logn1) W times n2-vector ni O(n1 ·n2 ·logn2) Adding up the operation counts, we see that O(n1n2 log(n1n2)) = O(N log N) flops are required where N = nin2 is the size of the matrix A. Below we show that the matrices TnDD), TnDN), TnNN), and TnP) have fast eigen­ value decompositions and this means that Algorithm 4.8.2 can be used to solve dis­ crete Poisson systems. To appreciate the speedup over conventional methods, suppose A1 = Tn�o) and A2 = 'T.!:w). It can be shown that A is symmetric positive definite with bandwidth n1 + 1. Solving Au = b using Algorithm 4.3.5 (band Cholesky) would require O(n� n2) = O(N n�) flops. 4.8.5 The Inverse of the DST and OCT Matrices The eigenvector matrices for Tn°0), TnDN), TnNN), and TnP) are associated with the fast trigonometric transforms presented in §1.4.2. It is incumbent upon us to show that the inverse of these transforms can also be computed fast. We do this for the discrete sine transform (DST) and the discrete cosine transform (DCT) and leave similar fast inverse verifications to the exercises at the end of the section. By considering the blocks of the DFT matrix F2m, we can determine the inverses of the transform matrices DST(m - 1) and DCT(m + 1). Recall from §1.4.2 that if Cr E nrxr and Sr E nrxr are defined by then C - iS VT E(C + iS) 1 v (-l)m Ev where C = Cm-1' S = Sm-1, E = Cm-1' and eT l (C + iS)E vTE E(C - iS)E eT = ( 1, 1, . . . , 1 ) -...-­ m-1 VT = ( -1, 1, . . . , (-lr-l ). m-1 By comparing the (2,1), (2,2), (2,3), and (2,4) blocks in the equation 2mJ = F2mF2m we conclude that 0 = 2Ce + e + v,
  • 252. 228 Chapter 4. Special Linear Systems 2mim-1 = 2C2 + 282 + eeT + vvT, o = 2Cv + e + (-1rv, 0 = 2C2 - 282 + eeT + VVT. It follows that 282 = mim-1 and 2C2 = mim-l - eeT - vvT. Using these equations it is easy to verify that and [1/2 eT e/2 Cm-1 1/2 VT s-1 2 = -Sm-1 m-1 m 1/2 r r 1/2 2 v/2 = - e/2 (-l)m/2 m 1/2 eT Cm-1 VT 1/2 l v/2 . {-l)m/2 Thus, it follows from the definitions {1.4.8) and {l.4.10) that V = DST(m - 1) =? v-1 = _! DST(m - 1), m V = DCT{m + 1) =? v-1 = _! DCT{m + 1). m In both cases, the inverse transform is a multiple of the "forward" transform and can be computed fast. See Algorithms 1.4.2 and 1.4.3. 4.8.6 Four Fast Eigenvalue Decompositions The matrices TnDD), TnDN), TnNN), and 7:/iP) do special things to vectors of sines and cosines. Lemma 4.8.3. Define the real n-vectors s(O) and c(O) by s(O) � [:J (4.8.15) where Sk = sin{kO) and Ck = cos(kO). If ek = In(:, k) and .X = 4sin2{B/2), then ti_DD) ·s(O) = A·s(B) + Bn+1en, ti_DD) .c(O) = A·c(B) + C1e1 + Cnen, ti_DN) ·s(B) = .X·s(B) + (sn+l - Bn-i)en, ti_NN) ·c(B) = .X·c(B) + (Cn - Cn-2)en, ti_P) ·s(B) = .X·s(B) - Bne1 + (sn+l - s1)en, ti_P) ·c(B) = A·c(B) + (c1 - Cn-1)e1 + (en - l)en. (4.8.16) (4.8.17) (4.8.18) (4.8.19) (4.8.20) (4.8.21)
  • 253. 4.8. Circulant and Discrete Poisson Systems Proof. The proof is mainly an exercise in using the trigonometric identities Sk-1 = C1Sk - S1Ck, Sk+l = C1Sk + S1Ck, For example, if y = ti00>s(9), then Ck-1 C1Ck + S1Sk, Ck+l = C1Ck - S1Sk· if k = 1, {2s 1 - s2 = 2s1(l - c1), Yk = -Sk-1 + 2sk �sk+1 = 2sk(l - c1), -Sn-1 + 2sn - 2sn(l - c1) + Sn+l• if 2 :::; k :::; n - 1, if k = n. 229 Equation (4.8.16) follows since (1 -c1) = 1 -cos(9) = 2 sin2(9/2). The proof of (4.8.17) is similar while the remaining equations follow from Equations (4.8.8)-(4.8.10). D Notice that (4.8.16)-(4.8.21) are eigenvector equations except for the "e1" and "en" terms. By choosing the right value for 9, we can make these residuals disappear, thereby obtaining recipes for the eigensystems of Tn°0>, TnDN) , TnNN>, and TnP). The Dirichlet-Dirichlet Matrix If j is an integer and (J = j7r/(n + 1), then Sn+l = sin((n + 1)9) = 0. It follows from (4.8.16) that j7r 83· = n + 1 ' for j = l:n. Thus, the columns of the matrix v�vD) E R.nxn defined by [v.:(DD)] . _ · ( kj'Tr ) n kJ - sm n + l are eigenvectors for ti00> and the corresponding eigenvalues are given by Aj = 4 sin2 ( j7r ) 2(n + l) ' for j = l:n. Note that v�DD) = DST(n). It follows that ti00> has a fast eigenvalue decomposition. The Dirichlet-Neumann Matrix If j is an integer and 9 = (2j - l)7r/(2n), then Sn+l - Sn-1 follows from (4.8.18) that Ll . - (2j - l)7r U3 - ' 2n for j = l:n. Thus, the columns of the matrix v�DN) E nnxn defined by [V�DN)]kj = sin (k(2j 2 : 1)7r)
  • 254. 230 Chapter 4. Special Linear Systems areeigenvectorsofthematrixrj_DN) andthecorrespondingeigenvaluesaregivenby . 2 ((2j- l)7r) Aj = 4sm 4n forj= l:n. Comparing with (1.4.13) wesee that that V�DN) =DST2(n). Theinverse DST2canbeevaluatedfast. SeeVanLoan(FFT, p. 242) fordetails,butalsoP4.8.11. It followsthat /(DN) hasafast eigenvaluedecomposition. The Neumann-Neumann Matrix Ifj is an integer and () = (j- 1)7r/(n- 1), then Cn - Cn-2 = -2s1sn-1 = 0. It follows from (4.8.19) that (j- 1)7r ()j = n - 1 Thus, thecolumnsofthematrix v�DN) E Rnxn definedby [v:{NN)J . _ ((k-l)(j-1)7r) n kJ - COS n-1 areeigenvectorsofthematrixrj_DN) andthe correspondingeigenvaluesaregivenby 4 . 2 ((j-1)7r) Aj = sm 2(n- 1) forj = l:n. Comparingwith (1.4.10) weseethat v�NN) = DCT(n) .diag(2,In-2, 2) andtherefore/(NN) hasafast eigenvaluedecomposition. The Periodic Matrix Wecan proceed to work out the eigenvalue decomposition forrj_P) as we did in thepreviousthreecases,i.e.,byzeroingtheresidualsin(4.8.20)and(4.8.21). However, rj_P) isacirculant matrix andsoweknowfromTheorem4.8.2 that where Itcanbeshownthat F,;;1ti_P)Fn = diag(>-.1,... , An) 2 -1 Pn o -1 2Fn(:,1) -Fn(:,2)- Fn(:, n). . - 4 . 2 ((j- l)7r) A1 - sm n
  • 255. 4.8. Circulant and Discrete Poisson Systems 231 forj = l:n. It follows that ']jP) has a fast eigenvaluedecomposition. However, since thismatrixisrealit is preferabletohaveareal V-matrix. Usingthefactsthat and Fn(:,j) = Fn(:, (n + 2 - j)) forj = 2:n,itcanbeshownthatifm = ceil((n + 1)/2) and V�P) = (Re(Fn(:, l:m) J lm(Fn(:,m + l:n)) ] then (4.8.22) (4.8.23) (4.8.24) (4.8.25) for j = l:n. Manipulations with this real matrix and its inverse can be carried out rapidly asdiscussedinVanLoan (FFT, Chap. 4). 4.8.7 A Note on Symmetry and Boundary Conditions Inourpresentation,thematrices']jDN) and']jNN) arenotsymmetric. However,asim­ plediagonalsimilaritytransformationchangesthis. Forexample,ifD = diag(In-1, J2), then v-1-,jDN)D is symmetric. Working with symmetric second difference matrices hascertainattractions,i.e.,theautomaticorthogonalityoftheeigenvectormatrix. See Strang (1999). Problems P4.8.1 Suppose z E Rn has the property that z(2:n) = t'n-1z(2:n). Show that C(z) is symmetric and Fnz is real. P4.8.2 As measured in the Frobenius norm, what is the nearest real circulant matrix to a given real Toeplitz matrix? P4.8.3 Given x, z E <Cn, show how to compute y = C(z)·x in O(n logn) flops. In this case, y is the cyclic convolution of x and z. P4.8.4 Suppose a = ( a-n+1 , . . . , a-1, ao, ai, . . . , an-1 ] and let T = (tk;) be the n-by-n Toeplitz matrix defined by tk; = ak-j· Thus, if a = ( a-2, a-i, ao, a1 , a2 ], then [ao T = T(a) = al a2 It is possible to "embed" T into a circulant, e.g., ao a-1 a-2 0 al ao a-1 a-2 a2 al ao a-1 C = 0 a2 al ao 0 0 a2 al 0 0 0 a2 a-2 0 0 0 a-1 a-2 0 0 a-1 ao al 0 0 a-2 a-1 ao al a2 0 a-2 i a-1 . ao 0 a2 0 0 0 0 a-2 0 a-1 a-2 ao a-1 al ao a2 ai al a2 0 0 0 a-2 a-1 ao Given a-n+l · · . . , a-1 , lo, a1, . . . , an-1 and m ;:::: 2n - 1, show how to construct a vector v E <Cm so that if C = C(v), then C(l:n, l:n) = T. Note that v is not unique if m > 2n - 1. P4.8.5 Complete the proof of Lemma 4.8.3.
  • 256. 232 Chapter 4. Special Linear Systems P4.8.6 Show how to compute a Toeplitz-vector product y = Tu in n logn time using the embedding idea outlined in the previous problem and the fact that circulant matrices have a fast eigenvalue decomposition. P4.8.7 Give a complete specification of the vector b in (4.8.12) if A1 = T J�D), A2 = -r,l�o), and u(x, y) = 0 on the boundary of the rectangular domain R. In terms of the underlying grid, n1 = m1 - 1 and n2 = m2 - 1. P4.8.8 Give a complete specification of the vector b in (4.8.12) if A1 = T,l�N), A2 = -r,i�N), u(x, y) = 0 on the bottom and left edge of R, u.,(x, y) = 0 along the right edge of R, and uy(x, y) = 0 along the top edge of R. In terms of the underlying grid, n1 = m1 and n2 = m2. P4.8.9 Define a Neumann-Dirichlet matrix TJND) that would arise in conjunction with (4.8.5) if u'(o) and u(.B) were specified. Show that T JND) has a fast eigenvalue decomposition. P4.B.10 . The matrices -r,lNN) and T JPl arc singular. (a) Assuming that b is in the range of A = In2 ® TJil + r J:> ® In1 , how would you solve the linear system Au = b subject to the constraint that the mean of u's components is zero? Note that this constraint makes the system solvable. (b) Repeat part (a) replacing T J�) with ti�N) and r.�:) with r J:N). P4.B.11 Let V be the matrix that defines the DST2(n) transformation in (1.4.12). (a) Show that T n 1 T V V = -I,. + -vv 2 2 where v = [1, -1, 1, . . . , (-l)n]T. (b) Verify that v-I = � (1- _!._vvT)V1' . n 2n (c) Show how to compute v-1x rapidly. P4.8.12 Verify (4.8.22), (4.8.23), and (4.8.25). P4.B.13 Show that if V = v2<;> defined in (4.8.24), then vTv = m Un + e1ef + em+1e?:.+d· What can you say about VTV if V = V2!:�1'? Notes and References for §4.8 As we mentioned, this section is based on Van Loan (FFT). For more details about fast Poisson solvers, see: R.W. Hockney (1965). "A Fast Direct Solution of Poisson's Equation Using Fourier Analysis," J. Assoc. Comput. Mach. 12, 95-113. B. Buzbee, G. Golub, and C. Nielson (1970). "On Direct Methods for Solving Poisson's Equation," SIAM J. Numer. Anal. 7, 627-656. F. Dorr (1970). "The Direct Solution of the Discrete Poisson Equation on a Rectangle,'' SIAM Review 12, 248-263. R. Sweet (1973). "Direct Methods for the Solution of Poisson's Equation on a Staggered Grid,'' J. Comput. Phy.9. 12, 422-428. P.N. Swarztrauber (1974). "A Direct Method for the Discrete Solution of Separable Elliptic Equa­ tions,'' SIAM J. Nu.mer. Anal. 11, 1136-1150. P.N. Swarztrauber (1977). "The Methods of Cyclic Reduction, Fourier Analysis and Cyclic Reduction­ Fourier Analysis for the Discrete Solution of Poisson's Equation on a Rectangle," SIAM Review 19, 490-501. There are actually eight variants of the discrete cosine transform each of which corresponds to the location of the Neumann conditions and how the divided difference approximations are set up. For a unified, matrix-based treatment, see: G. Strang (1999). "The Discrete Cosine Transform,'' SIAM Review 41, 135-147.
Chapter 5

Orthogonalization and Least Squares

5.1 Householder and Givens Transformations
5.2 The QR Factorization
5.3 The Full-Rank Least Squares Problem
5.4 Other Orthogonal Factorizations
5.5 The Rank-Deficient Least Squares Problem
5.6 Square and Underdetermined Systems

This chapter is primarily concerned with the least squares solution of overdetermined systems of equations, i.e., the minimization of ‖Ax - b‖_2 where A ∈ R^{m×n}, b ∈ R^m, and m ≥ n. The most reliable solution procedures for this problem involve the reduction of A to various canonical forms via orthogonal transformations. Householder reflections and Givens rotations are central to this process and we begin the chapter with a discussion of these important transformations. In §5.2 we show how to compute the factorization A = QR where Q is orthogonal and R is upper triangular. This amounts to finding an orthonormal basis for the range of A. The QR factorization can be used to solve the full-rank least squares problem as we show in §5.3. The technique is compared with the method of normal equations after a perturbation theory is developed. In §5.4 and §5.5 we consider methods for handling the difficult situation when A is (nearly) rank deficient. QR with column pivoting and other rank-revealing procedures including the SVD are featured. Some remarks about underdetermined systems are offered in §5.6.

Reading Notes
Knowledge of Chapters 1, 2, and 3 and §4.1-§4.3 is assumed. Within this chapter there are the following dependencies:

    §5.1 → §5.2 → §5.3 → §5.4 → §5.5 → §5.6

233
234 Chapter 5. Orthogonalization and Least Squares

For more comprehensive treatments of the least squares problem, see Björck (NMLS) and Lawson and Hanson (SLS). Other useful global references include Stewart (MABD), Higham (ASNA), Watkins (FMC), Trefethen and Bau (NLA), Demmel (ANLA), and Ipsen (NMA).

5.1 Householder and Givens Transformations

Recall that Q ∈ R^{m×m} is orthogonal if

    Q^T Q = Q Q^T = I_m.

Orthogonal matrices have an important role to play in least squares and eigenvalue computations. In this section we introduce Householder reflections and Givens rotations, the key players in this game.

5.1.1 A 2-by-2 Preview

It is instructive to examine the geometry associated with rotations and reflections at the m = 2 level. A 2-by-2 orthogonal matrix Q is a rotation if it has the form

    Q = [  cos(θ)   sin(θ) ]
        [ -sin(θ)   cos(θ) ].

If y = Q^T x, then y is obtained by rotating x counterclockwise through an angle θ.
A 2-by-2 orthogonal matrix Q is a reflection if it has the form

    Q = [  cos(θ)   sin(θ) ]
        [  sin(θ)  -cos(θ) ].

If y = Q^T x = Qx, then y is obtained by reflecting the vector x across the line defined by

    S = span{ [ cos(θ/2) , sin(θ/2) ]^T }.

Reflections and rotations are computationally attractive because they are easily constructed and because they can be used to introduce zeros in a vector by properly choosing the rotation angle or the reflection plane.

5.1.2 Householder Reflections

Let v ∈ R^m be nonzero. An m-by-m matrix P of the form

    P = I - βvv^T,    β = 2/(v^T v),                               (5.1.1)

is a Householder reflection. (Synonyms are Householder matrix and Householder transformation.) The vector v is the Householder vector. If a vector x is multiplied by P, then it is reflected in the hyperplane span{v}⊥. It is easy to verify that Householder matrices are symmetric and orthogonal.
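A three-line numerical check of these claims (the vector v here is arbitrary, and nothing beyond NumPy is assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
v = rng.standard_normal(5)
P = np.eye(5) - (2.0 / (v @ v)) * np.outer(v, v)           # P = I - beta*v*v^T
print(np.allclose(P, P.T), np.allclose(P @ P, np.eye(5)))  # symmetric and orthogonal
print(np.allclose(P @ v, -v))                              # v itself is reflected to -v
```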
5.1. Householder and Givens Transformations 235

Householder reflections are similar to Gauss transformations introduced in §3.2.1 in that they are rank-1 modifications of the identity and can be used to zero selected components of a vector. In particular, suppose we are given 0 ≠ x ∈ R^m and want

    Px = (I - (2vv^T)/(v^Tv)) x = x - (2v^Tx)/(v^Tv) v

to be a multiple of e_1 = I_m(:,1). From this we conclude that v ∈ span{x, e_1}. Setting

    v = x + αe_1

gives

    v^Tx = x^Tx + αx_1    and    v^Tv = x^Tx + 2αx_1 + α².

Thus,

    Px = (1 - 2(x^Tx + αx_1)/(x^Tx + 2αx_1 + α²)) x - 2α (v^Tx)/(v^Tv) e_1.

In order for the coefficient of x to be zero, we set α = ±‖x‖_2, for then

    v = x ± ‖x‖_2 e_1   ⟹   Px = ∓‖x‖_2 e_1.                        (5.1.2)

It is this simple determination of v that makes the Householder reflection so useful.

5.1.3 Computing the Householder Vector

There are a number of important practical details associated with the determination of a Householder matrix, i.e., the determination of a Householder vector. One concerns the choice of sign in the definition of v in (5.1.2). Setting

    v_1 = x_1 - ‖x‖_2

leads to the nice property that Px is a positive multiple of e_1. But this recipe is dangerous if x is close to a positive multiple of e_1 because severe cancellation would occur. However, the formula

    v_1 = x_1 - ‖x‖_2 = (x_1² - ‖x‖_2²)/(x_1 + ‖x‖_2) = -(x_2² + ··· + x_m²)/(x_1 + ‖x‖_2)

suggested by Parlett (1971) does not suffer from this defect in the x_1 > 0 case.
In practice, it is handy to normalize the Householder vector so that v(1) = 1. This permits the storage of v(2:m) where the zeros have been introduced in x, i.e., x(2:m). We refer to v(2:m) as the essential part of the Householder vector. Recalling
236 Chapter 5. Orthogonalization and Least Squares

that β = 2/(v^Tv) and letting length(x) specify vector dimension, we may encapsulate the overall process as follows:

Algorithm 5.1.1 (Householder Vector) Given x ∈ R^m, this function computes v ∈ R^m with v(1) = 1 and β ∈ R such that P = I_m - βvv^T is orthogonal and Px = ‖x‖_2 e_1.

    function [v, β] = house(x)
        m = length(x), σ = x(2:m)^T x(2:m), v = [ 1 ; x(2:m) ]
        if σ = 0 and x(1) ≥ 0
            β = 0
        elseif σ = 0 and x(1) < 0
            β = 2
        else
            µ = sqrt(x(1)² + σ)
            if x(1) ≤ 0
                v(1) = x(1) - µ
            else
                v(1) = -σ/(x(1) + µ)
            end
            β = 2v(1)²/(σ + v(1)²)
            v = v/v(1)
        end

Here, length(·) returns the dimension of a vector. This algorithm involves about 3m flops. The computed Householder matrix is orthogonal to machine precision, a concept discussed below.

5.1.4 Applying Householder Matrices

It is critical to exploit structure when applying P = I - βvv^T to a matrix A. Premultiplication involves a matrix-vector product and a rank-1 update:

    PA = (I - βvv^T)A = A - (βv)(v^TA).

The same is true for post-multiplication,

    AP = A(I - βvv^T) = A - (Av)(βv)^T.

In either case, the update requires 4mn flops if A ∈ R^{m×n}. Failure to recognize this and to treat P as a general matrix increases work by an order of magnitude. Householder updates never entail the explicit formation of the Householder matrix.
In a typical situation, house is applied to a subcolumn or subrow of a matrix and (I - βvv^T) is applied to a submatrix. For example, if A ∈ R^{m×n}, 1 ≤ j < n, and A(j:m, 1:j-1) is zero, then the sequence

    [v, β] = house(A(j:m, j))
    A(j:m, j:n) = A(j:m, j:n) - (βv)(v^T A(j:m, j:n))
    A(j+1:m, j) = v(2:m-j+1)
  • 261. 5.1. Householder and Givens Transformations 237 applies Um-i+l - f3vvT) to A(j:m, l:n) and stores the essential part ofv where the "new" zerosareintroduced. 5.1.5 Roundoff Properties TheroundoffpropertiesassociatedwithHouseholdermatricesareveryfavorable. Wilkin­ son (AEP, pp. 152-162) shows that house produces a Householder vector v that is very close totheexact v. IfP = I - 2VvTjvTv then II P - P 112 = O(u). Moreover, the computedupdateswith P arecloseto the exact updateswith P : fl(FA) = P(A + E), fl(AF) = (A + E)P, II E 112 = O(ull A 112), II E 112 = O(ull A 112). Foramoredetailedanalysis, seeHigham(ASNA, pp. 357-361). 5.1.6 The Factored-Form Representation Many Householder-based factorization algorithms that are presented in the following sectionscompute productsofHouseholdermatrices wheren :::; m andeachv<i> hastheform (j) - [ 0 0 0 1 (j) (j) T v - ' ' . . . ' Vj+l • .. . ' vm ] . ...___,_....... j-1 (5.1.3) It is usually not necessary to compute Q explicitly evenifit is involved insubsequent calculations. Forexample,ifC E Rmxp andwewishtocomputeQTC ,thenwemerely executethe loop for j = l:n C = QiC end Thestorageofthe Householder vectors v<1> · · · v<n) and thecorresponding/3i amounts to afactored-form representation ofQ. Toillustratetheeconomiesofthefactored-formrepresentation, supposewehave an arrayA and thatforj = l:n, A(j + l:m,j) housesv<i>(j + l:m), the essentialpart ofthejth Householder vector. The overwritingofC E nmxp with QTC can then be implementedas follows: for j =·l:n v(j:m) = [ A(j + l l:m,j) ] /3i = 2/(1 + 11 A(j + l:m,j) 11� C(j:m, :) = C(j:m, :) - (f3rv(j:m)) · (v(j:m)TC(j:m, :)) end (5.1.4)
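Before continuing, here is a NumPy transcription of house (Algorithm 5.1.1) and of the factored-form product (5.1.4). The names house and apply_qt are ours, and apply_qt takes the β_j as an explicit argument rather than recomputing them from the stored vectors; it performs the same pn(2m - n)-flop computation as the loop above.

```python
import numpy as np

def house(x):
    """Algorithm 5.1.1: return (v, beta) with v[0] = 1 so that
    (I - beta*v*v^T) x = ||x||_2 * e_1."""
    sigma = x[1:] @ x[1:]
    v = np.concatenate(([1.0], x[1:]))
    if sigma == 0:
        beta = 0.0 if x[0] >= 0 else 2.0
    else:
        mu = np.sqrt(x[0] ** 2 + sigma)
        v0 = x[0] - mu if x[0] <= 0 else -sigma / (x[0] + mu)   # Parlett's stable formula
        beta = 2 * v0 ** 2 / (sigma + v0 ** 2)
        v = np.concatenate(([1.0], x[1:] / v0))                 # normalize so v[0] = 1
    return v, beta

def apply_qt(A_fact, betas, C):
    """Loop (5.1.4): return Q^T C, where the essential parts of the Householder
    vectors sit below the diagonal of A_fact and betas[j] = beta_j."""
    m, n = A_fact.shape
    C = C.copy()
    for j in range(n):
        v = np.concatenate(([1.0], A_fact[j + 1:, j]))
        C[j:, :] -= np.outer(betas[j] * v, v @ C[j:, :])
    return C
```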
  • 262. 238 Chapter 5. Orthogonalization and Least Squares This involves about pn{2m - n) flops. If Q is explicitly represented as an m-by-m matrix, then QTC would involve 2m2p flops. The advantage of the factored form representationisapparant ifn << m. Ofcourse, in some applications, it isnecessary to explicitlyformQ (or parts of it). Therearetwo possible algorithmsfor computing thematrixQ in (5.1.3): Forward accumulation Backward accumulation Q = Im Q = Im for j = l:n for j = n: - 1:1 Q = Q Qj Q = QjQ end end Recall that the leading (j - 1)-by-(j - 1) portion of Qi is the identity. Thus, at the beginningofbackward accumulation, Q is "mostly the identity" and it gradually becomes full as the iteration progresses. This pattern can be exploited to reduce the numberofrequiredflops. Incontrast, Q isfullinforwardaccumulation afterthefirst step. For this reason, backward accumulation is cheaper and the strategy ofchoice. HerearethedetailswiththeprovisothatweonlyneedQ(:, l:k) where 1 � k � m: Q = Im{:, l:k) for j = n: - 1:1 v{j:m) = [ A(j + l l:m,j) ] f3j = 2/(1 + 11 A(j + l:m, j) II� Q(j:m,j:k) = Q(j:m,j:k) - ({3jv(j:m))(v(j:m)TQ(j:m, j:k)) end Thisinvolvesabout4mnk - 2{m + k)n2 + {4/3)n3 flops. 5.1.7 The WY Representation (5.1.5) SupposeQ = Qi · · · Qr isaproductofm-by-m Householdermatrices. SinceeachQi is arank-1 modificationofthe identity, it followsfromthestructureoftheHouseholder vectorsthat Q isarank-rmodificationoftheidentityandcanbe written intheform (5.1.6) where W and Y are m-by-r matrices. The key to computing the WY representation (5.1.6) isthefollowinglemma. Lemma 5.1.1. Suppose Q = Im - WYT is an m-by-m orthogonal matrix with W,Y E Rmxj. If P = Im - {3vvT with v E Rm and z = {3Qv, then Q+ = QP = Im - W+YJ where W+ = [ W I z ] and Y+ = [ Y I v ] are each m-by-(j + 1).
  • 263. 5.1. Householder and Givens Transformations Proof. Since it followsfromthedefinitionofz that 239 Q+ = Im - WYT - zvT = Im - [ W l z ] [ Y l v f = Im - W+Y.r 0 Byrepeatedlyapplyingthelemma,wecantransitionfromafactored-formrepresenta­ tion to ablock representation. Algorithm 5.1.2 Suppose Q = Q1 · · · Qr wherethe Qj = Im - /3jv(i)v(j)T arestored in factored form. This algorithm computes matrices W,Y E 1Rmxr such that Q = Im - WYT. Y = vCll; W = f31vC1) for j = 2:r end z = /3j(/m - WYT)vUl W = [W l z] Y = [ Y I vUl ] Thisalgorithminvolves about 2r2m-2r3/3flops ifthe zeros inthevUl areexploited. Note that Y is merely the matrix ofHouseholder vectors and is therefore unit lower triangular. Clearly, thecentraltaskinthegenerationoftheWYrepresentation (5.1.6) isthecomputationofthe matrix W. The block representation for products of Householder matrices is attractive in situations where Q must be applied to amatrix. Suppose C E 1Rmxp. It follows that theoperation is rich in level-3 operations. On the other hand, if Q is in factored form, then the formationofQTC isjust rich in thelevel-2operationsofmatrix-vectormultiplication andouterproductupdates. Ofcourse, in thiscontext, the distinctionbetweenlevel-2 andlevel-3 diminishes as C getsnarrower. WementionthattheWY representation (5.1.6) isnotageneralizedHouseholder transformationfromthegeometricpoint ofview. Trueblockreflectorshavetheform Q = I - 2VVT where V E 1Rnxr satisfies vrv = Ir. See Schreiberand Parlett (1987). 5.1.8 Givens Rotations Householder reflections are exceedingly useful for introducing zeros on agrand scale, e.g., the annihilation ofall but the first component ofa vector. However, in calcula­ tionswhereit isnecessarytozeroelementsmoreselectively, Givens rotations are the transformation ofchoice. Thesearerank-2 corrections tothe identity oftheform
240 Chapter 5. Orthogonalization and Least Squares

                 [ 1  ···  0  ···  0  ···  0 ]
                 [ :       :       :       : ]
                 [ 0  ···  c  ···  s  ···  0 ]   i
    G(i, k, θ) = [ :       :       :       : ]                        (5.1.7)
                 [ 0  ··· -s  ···  c  ···  0 ]   k
                 [ :       :       :       : ]
                 [ 0  ···  0  ···  0  ···  1 ]
                          i       k

where c = cos(θ) and s = sin(θ) for some θ. Givens rotations are clearly orthogonal.
Premultiplication by G(i, k, θ)^T amounts to a counterclockwise rotation of θ radians in the (i, k) coordinate plane. Indeed, if x ∈ R^m and y = G(i, k, θ)^T x, then

    y_j =  c·x_i - s·x_k,   j = i,
           s·x_i + c·x_k,   j = k,
           x_j,             j ≠ i, k.

From these formulae it is clear that we can force y_k to be zero by setting

    c = x_i / sqrt(x_i² + x_k²),    s = -x_k / sqrt(x_i² + x_k²).     (5.1.8)

Thus, it is a simple matter to zero a specified entry in a vector by using a Givens rotation. In practice, there are better ways to compute c and s than (5.1.8), e.g.,

Algorithm 5.1.3 Given scalars a and b, this function computes c = cos(θ) and s = sin(θ) so that

    [  c  s ]^T [ a ]   [ r ]
    [ -s  c ]   [ b ] = [ 0 ].

    function [c, s] = givens(a, b)
        if b = 0
            c = 1; s = 0
        else
            if |b| > |a|
                τ = -a/b; s = 1/sqrt(1 + τ²); c = sτ
            else
                τ = -b/a; c = 1/sqrt(1 + τ²); s = cτ
            end
        end

This algorithm requires 5 flops and a single square root. Note that inverse trigonometric functions are not involved.
  • 265. 5.1. Householder and Givens Transformations 5.1.9 Applying Givens Rotations 241 ItiscriticalthatthesimplestructureofaGivensrotationmatrixbeexploitedwhenit isinvolvedinamatrixmultiplication. SupposeA E lllmxn, c = cos(O), and s = sin(O). IfG(i, k, 0) E lllmxm, then the updateA = G(i, k, O)TA affectsjust tworows, [ c s ]T A([i, k], :) = A([i, k], :), -s c and involves6n flops: for j = l:n end T1 = A(i, j) T2 = A(k, j) A(i,j) = CT1 - ST2 A(k,j) = ST1 + CT2 Likewise,ifG(i, k, 0) E lllnxn, thentheupdateA = AG(i, k, 0) affectsjusttwocolumns, A(:, [i, k]) = A(:, [i, k]) , [ c s ] - s c and involves6m flops: for j = l:m end 5.1.10 T1 = A(j, i) T2 = A(j, k) A(j, 'i) = CT1 - ST2 A(j, k) = ST1 + CT2 Roundoff Properties ThenumericalpropertiesofGivensrotationsareas favorableas thoseforHouseholder reflections. Inparticular, it canbeshownthat thecomputed c and s ingivens satisfy c s c(l + Ee), s(l + Es), O(u), O(u). Ifc ands aresubsequentlyusedina Givens update, thenthecomputedupdateisthe exact updateofanearbymatrix: fl[G(i, k, OfA] fl[AG(i, k, O)] G(i, k, O)r(A + E), (A + E)G(i, k, 0), II E 112 � ull A 112, II E 112 � ull A 112- Detailederror analysis ofGivens rotationsmaybe found inWilkinson (AEP, pp. 131- 39), Higham(ASNA,pp. 366-368), andBindel, Demmel, Kahan, andMarques (2002).
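Putting §5.1.8 and §5.1.9 together, a short NumPy sketch (the function name givens and the 2-by-3 example are ours; a real implementation would touch only rows i and k):

```python
import numpy as np

def givens(a, b):
    """Algorithm 5.1.3: c, s with [c s; -s c]^T [a; b] = [r; 0]."""
    if b == 0:
        return 1.0, 0.0
    if abs(b) > abs(a):
        tau = -a / b
        s = 1.0 / np.sqrt(1.0 + tau * tau)
        return s * tau, s
    tau = -b / a
    c = 1.0 / np.sqrt(1.0 + tau * tau)
    return c, c * tau

# zero the (2,1) entry against the (1,1) entry with a rotation in rows 1 and 2
A = np.array([[4.0, 1.0, 2.0],
              [3.0, 5.0, 6.0]])
c, s = givens(A[0, 0], A[1, 0])
G = np.array([[c, s], [-s, c]])
A = G.T @ A
print(A[1, 0])        # ~0, and A[0, 0] has become 5
```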
  • 266. 242 Chapter 5. Orthogonalization and Least Squares 5.1.11 Representing Products of Givens Rotations Suppose Q = G1 · · · Gt is a product of Givens rotations. As with Householder re­ flections, it is sometimes more economical to keep Q in factored form rather than to computeexplicitly the product ofthe rotations. Stewart (1976) has shown how todo this in avery compact way. The ideais to associate asingle floating point number p witheachrotation. Specifically, if z = [ _: : ] ' then we define the scalarpby if c= 0 p = 1 elseif Isl < lei p = sign(c)· s/2 else p = 2· sign(s)/c end c2 + s2 = 1, (5.1.9) Essentially, thisamounts to storing s/2 ifthesineis smaller and 2/c ifthecosine is smaller. With thisencoding, it is possible toreconstruct Z (or -Z) as follows: if p= 1 c = O; s = 1 elseif IPI < 1 s = 2p; c = v'l - s2 else c = 2/p; s = v'l - c2 end (5.1.10) Note that the reconstruction of -Z is not a problem, for ifZ introduces a strategic zerothensodoes - Z. Thereasonforessentiallystoringthesmallerofc and s isthat theformulav'l - x2 renderspoorresultsifx isnearunity. Moredetailsmaybefound in Stewart (1976). Ofcourse, to "reconstruct" G(i, k, 0) we need i and k in addition totheassociatedp. Thisposesnodifficultyifweagreetostorepinthe (i, k) entryof somearray. 5.1.12 Error Propagation An m-by-m floating point matrix Q is orthogonal to working precision ifthere exists anorthogonal Q E Rmxm suchthat A corollaryofthisisthat 11 Q - Q II = O(u).
  • 267. 5.1. Householder and Givens Transformations 243 Thematricesdefinedbythe floating point output of house and givens areorthogonal toworkingprecision. In many applications, sequences ofHouseholders and/or Given transformations aregenerated and applied. In thesesettings, the roundingerrors are nicely bounded. Tobeprecise, suppose A = Ao E 1Rmxn isgivenandthat matricesA1, . . . , AP = B are generated viatheformula k = l:p . AssumethattheaboveHouseholderand Givens algorithmsareusedfor both thegen­ eration and application ofthe Qk and Zk. Let Qk and Zk betheorthogonal matrices that would beproduced intheabsenceofroundoff. It canbeshownthat (5.1.11) where II E 112 :::; c · ull A 112 and c is a constant that depends mildly on n, m, and p. In other words, B is an exact orthogonal update of a matrix near to A. For a comprehensive error analysis of Householder and Givens computations, see Higham (ASNA, §19.3, §19.6). 5.1.13 The Complex Case Most ofthe algorithms that we present in this book have complex versions that are fairly straightforward to derive from their real counterparts. (This is not to say that everything is easy and obvious at the implementation level.) As an illustration we brieflydiscusscomplexHouseholderandcomplexGivenstransformations. RecallthatifA = (aij) E <Cmxn, thenB = AH E <Cnxm isitsconjugatetranspose. The2-normofa vector x E <Cn isdefinedby andQ E <Cnxn is unitary ifQHQ = In· Unitary matrices preservethe 2-norm. AcomplexHouseholdertransformationis aunitary matrixoftheform where/3 = 2/vHv. Given anonzerovector x E <C"', it iseasytodetermine v so that ify = Px, theny(2:m) = 0. Indeed, if wherer, () E 1R and then Px = =fei011 x ll2e1. The sign can be determined to maximize 11 v 112 for the sake ofstability. Regardingcomplex Givens rotations, it is easy to verify that a 2-by-2 matrix of theform Q = . [ cos(O) -sin(O)e-•<f> sin(O)ei<f> l cos(O)
  • 268. 244 Chapter 5. Orthogonalization and Least Squares where 9, ¢ER is unitary. Weshow howto compute c = cos(fJ) and s = sin(fJ)eitf> so that (5.1.12) whereu = u1 +iu2 andv = v1 +iv2 aregivencomplexnumbers. First,givens isapplied tocomputerealcosine-sinepairs {c0, s0}, {c.a, s.a}, and {co, so} sothat and [-� � n:: l [r ; l· r_:; : n: i r� l· r_:: :: n:: i r� i Note that u = r.,e-ia: andv = rve-if3 . If weset eitf> = ei(f3-a:) (c c + s s ) + •(c s c s ) = -a: {J a: {J • o. {3 - f3 a: , which confirms (5.1.12). Problems PS.1.1 Let x and y be nonzero vectors in Rm. Give an algorithm for determining a Householder matrix P such that Px is a multiple of y. PS.1.2 Use Householder matrices to show that det(I + xyT) = 1 + xTy where x and y are given m-vectors. PS.1.3 (a) Assume that x, y E R2 have unit 2-norm. Give an algorithm that computes a Givens rotation Q so that y = QTx. Make effective use of givens. (b) Suppose x and y arc unit vectors in Rm. Give an algorithm using Givens transformations which computes an orthogonal Q such that QTx = y. PS.1.4 By generalizing the ideas in §5.1.11, develop a compact representation scheme for complex givens rotations. PS.1.5 Suppose that Q = I-YTYT is orthogonal where Y E nmxj and T E Jl! xj is upper triangular. Show that if Q+ = QP where P = I - 2vvT/vTv is a Householder matrix, then Q+ can be expressed in the form Q+ = I - Y+T+YJ where Y+ E Rmx (j+I) and T+ E R(j+l)x (j+I) is upper triangular. This is the main idea behind the compact WY representation. See Schreiber and Van Loan (1989). PS.1.6 Suppose Qi = Im - Y1T1Y1 and Q2 = Im - Y2T2Yl are orthogonal where Y1 E R""xri, Y2 E Rmxr2, T1 E Wt xr1, and T2 E w2 xr2. Assume that T1 and T2 arc upper triangular. Show how to compute Y E R""xr and upper triangular T E wxr with r = r1 + r2 so that Q2Q1 = Im - YTYT. PS.1.7 Give a detailed implementation of Algorithm 5.1.2 with the assumption that v<il(j + l:m), the essential part of the jth Householder vector, is stored in A(j + l:m, j). Since Y is effectively represented in A, your procedure need only set up the W matrix.
  • 269. 5.1. Householder and Givens Transformations 245 P5.l.8 Show that if S is skew-symmetric (ST = -S), then Q = (I + S)(I - S)-1 is orthogonal. (The matrix Q is called the Cayley transform of S.) Construct a rank-2 S so that if x is a vector, then Qx is zero except in the first component. PS.1.9 Suppose P E F'xm satisfies II pTP - Im 112 = E < 1. Show that all the singular values of P are in the interval [1 - E, 1 + e] and that II p - uvT 112 :5 E where p = UEVT is the SVD of P. PS.1.10 Suppose A E R.2x2. Under what conditions is the closest rotation to A closer than the closest reflection to A? Work with the Frobenius norm. PS.1.11 How could Algorithm 5.1.3 be modified to ensure r � O? PS.1.12 (Fast Givens Transformations) Suppose [ X J ] D = [ d1 x = and x2 0 with d1 and d2 positive. Show how to compute M1 = [ !31 :1 ] 1 0 ] d2 so that if y = Mix and D = M'[D!vli , then Y2 = 0 and D is diagonal. Repeat with !v/1 replaced by M2 = [ 1 a2 ] !32 1 . (b) Show that either 11 M'[D!v/1 112 :5 211 D 112 or 11 M:{D!v/2 112 :5 211 D 112. (c) Suppose x E Rm and that D E irixn is diagonal with positive diagonal entries. Given indices i and j with 1 :5 i < j :5 m, show how to compute !vI E Rnxn so that if y = !vlx and D = !vJTDM, then Yi = 0 and D is diagonal with 11 D 112 :5 211 D 112. (d) From part (c) conclude that Q = D112Mb-1/2 is orthogonal and that the update y = Mx can be diagonally transformed to (D112y) = Q(D112x). Notes and References for §5.1 Householder matrices are named after A.S. Householder, who popularized their use in numerical analysis. However, the properties of these matrices have been known for quite some time, see: H.W. Turnbull and A.C. Aitken (1961). An Introduction to the Theory of Canonical Matrices, Dover Publications, New York, 102-105. Other references concerned with Householder transformations include: A.R. Gourlay (1970). "Generalization of Elementary Hermitian Matrices," Comput. J. 13, 411-412. B.N. Parlett (1971). "Analysis of Algorithms for Reflections in Bisectors," SIAM Review 13, 197-208. N.K. Tsao (1975). "A Note on Implementing the Householder Transformations." SIAM J. Numer. Anal. 12, 53-58. B. Danloy (1976). "On the Choice of Signs for Householder Matrices," J. Comput. Appl. Math. 2, 67-69. J.J.M. Cuppen (1984). "On Updating Triangular Products of Householder Matrices," Numer. Math. 45, 403-410. A.A. Dubrulle (2000). "Householder Transformations Revisited," SIAM J. Matrix Anal. Applic. 22, 33-40. J.W. Demmel, M. Hoemmen, Y. Hida, and E.J. Riedy (2009). "Nonnegative Diagonals and High Performance On Low-Profile Matrices from Householder QR," SIAM J. Sci. Comput. 31, 2832- 2841. A detailed error analysis of Householder transformations is given in Lawson and Hanson (SLE, pp. 83-89). The basic references for block Householder representations and the associated computations include: C.H. Bischof and C. Van Loan (1987). "The WY Representation for Products of Householder Matri­ ces," SIAM J. Sci. Stat. Comput. 8, s2-·s13.
  • 270. 246 Chapter 5. Orthogonalization and Least Squares B.N. Parlett and R. Schreiber (1988). "Block Reflectors: Theory and Computation," SIAM J. Numer. Anal. 25, 189-205. R.S. Schreiber and C. Van Loan (1989). "A Storage-Efficient WY Representation for Products of Householder Transformations," SIAM J. Sci. Stat. Comput. 10, 52-57. C. Puglisi (1992). "Modification of the Householder Method Based on the Compact WY Representa­ tion," SIAM J. Sci. Stat. Comput. 13, 723-726. X. Sun and C.H. Bischof (1995). "A Basis-Kernel Representation of Orthogonal Matrices," SIAM J. Matrix Anal. Applic. 1 6, 1184-1196. T. Joffrain, T.M. Low, E.S. Quintana-Orti, R. van de Geijn, and F.G. Van Zee (2006). "Accumulating Householder Transformations, Revisited," A CM TI-ans. Math. Softw. 32, 169-179. M. Sadkane and A. Salam (2009). "A Note on Symplectic Block Reflectors," ETNA 33, 45-52. Givens rotations are named after Wallace Givens. There are some subtleties associated with their computation and representation: G.W. Stewart (1976). "The Economical Storage of Plane Rotations," Numer. Math. 25, 137-138. D. Bindel, J. Demmel, W. Kahan, and 0. Marques (2002). "On computing givens rotations reliably and efficiently," ACM TI-ans. Math. Softw. 28, 206-238. It is possible to aggregate rotation transformations to achieve high performance, see: B. Lang (1998). "Using Level 3 BLAS in Rotation-·Based Algorithms," SIAM J. Sci. Comput. 19, 626--634. Fast Givens transformations (see P5.l.11) are also referred to as square-root-free Givens transfor­ mations. (Recall that a square root must ordinarily be computed during the formation of Givens transformation.) There are several ways fast Givens calculations can be arranged, see: M. Gentleman (1973). "Least Squares Computations by Givens Transformations without Square Roots," J. Inst. Math. Appl. 12, 329-336. C.F. Van Loan (1973). "Generalized Singular Values With Algorithms and Applications," PhD Thesis, University of Michigan, Ann Arbor. S. Hammarling (1974). "A Note on Modifications to the Givens Plane Rotation," J. Inst. Math. Applic. 13, 215-218. J.H. Wilkinson (1977). "Some Recent Advances in Numerical Linear Algebra," in The State of the Art in Numerical Analysis, D.A.H. Jacobs (ed.), Academic Press, New York, 1-53. A.A. Anda and H. Park (1994). "Fast Plane Rotations with Dynamic Scaling," SIAM J. Matrix Anal. Applic. 15, 162-174. R.J. Hanson and T. Hopkins (2004). "Algorithm 830: Another Visit with Standard and Modified Givens Transformations and a Remark on Algorithm 539," ACM TI-ans. Math. Softw. 20, 86-94. 5.2 The Q R Factorization A rectangularmatrixA ERmxn canbefactoredintoaproductofanorthogonalmatrix Q ERmxm and an upper triangular matrix RERmxn: A = QR. This factorization is referred to as the QR factorization and it has a central role to playinthelinearleastsquaresproblem. Inthissectionwegivemethodsforcomputing QR based on Householder, block Householder, and Givens transformations. The QR factorizationisrelatedtothewell-knownGram-Schmidt process. 5.2.1 Existence and Properties Westartwithaconstructiveproofofthe QR factorization. Theorem 5.2.1 (QR Factorization). If A ERmxn, then there exists an orthogonal Q ERmxm and an upper triangular RE Rmxn so that A = QR.
  • 271. 5.2. The QR Factorization 247 Proof. We use induction. Suppose n = 1 and that Q is a Householder matrix so that if R = QTA, then R(2:m) = 0. It follows that A = QR is a QR factorization of A. For general n we partition A, A = [ Ai I v ], where v = A(:, n). By induction, there exists an orthogonal Qi E Rmxm so that Ri = QfAi is upper triangular. Set w = QTv and let w(n:m) = Q2R2 be the QR factorization of w(n:m). If then is a QR factorization of A. D w(l:n - 1) ] R2 The columns of Q have an important connection to the range of A and its orthogonal complement. Theorem 5.2.2. If A = QR is a QR factorization of a full column rank A E Rmxn and A = [ ai I · · · I an ] , Q = [ qi I · · · I Qm l are column partitionings, then for k = l:n (5.2.1) and rkk =fa 0. Moreover, if Qi = Q(l:m, l:n), Q2 = Q(l:m, n + l:m), and Ri = R(l:n, l:n), then and ran(A) ran(A).L = ran(Qi), = ran(Q2), Proof. Comparing the kth columns in A = QR we conclude that and so k ak = L rikQi E span{Qi, . . . , Qk}, i=i span{a1, . . . , ak} � span{q1 , . . . , qk}· (5.2.2) (5.2.3) If rkk = 0, then ai, . . . , ak are dependent. Thus, R cannot have a zero on its diagonal and so span{a1, . . . , ak} has dimension k. Coupled with (5.2.3) this establishes (5.2.1). To prove (5.2.2) we note that
The matrices Q1 = Q(1:m, 1:n) and Q2 = Q(1:m, n+1:m) can be easily computed from a factored form representation of Q. We refer to (5.2.2) as the thin QR factorization. The next result addresses its uniqueness.

Theorem 5.2.3 (Thin QR Factorization). Suppose A ∈ R^{m×n} has full column rank. The thin QR factorization A = Q1R1 is unique where Q1 ∈ R^{m×n} has orthonormal columns and R1 is upper triangular with positive diagonal entries. Moreover, R1 = G^T where G is the lower triangular Cholesky factor of A^T A.

Proof. Since A^T A = (Q1R1)^T(Q1R1) = R1^T R1 we see that G = R1^T is the Cholesky factor of A^T A. This factor is unique by Theorem 4.2.7. Since Q1 = A R1^{-1} it follows that Q1 is also unique. □

How are Q1 and R1 affected by perturbations in A? To answer this question we need to extend the notion of 2-norm condition to rectangular matrices. Recall from §2.6.2 that for square matrices, κ2(A) is the ratio of the largest to the smallest singular value. For rectangular matrices A with full column rank we continue with this definition:

    κ2(A) = σmax(A)/σmin(A).                                                         (5.2.4)

If the columns of A are nearly dependent, then this quotient is large. Stewart (1993) has shown that O(ε) relative error in A induces O(ε·κ2(A)) error in Q1 and R1.

5.2.2 Householder QR

We begin with a QR factorization method that utilizes Householder transformations. The essence of the algorithm can be conveyed by a small example. Suppose m = 6, n = 5, and assume that Householder matrices H1 and H2 have been computed so that

               [ x  x  x  x  x ]
               [ 0  x  x  x  x ]
    H2 H1 A =  [ 0  0  x  x  x ]
               [ 0  0  x  x  x ]
               [ 0  0  x  x  x ]
               [ 0  0  x  x  x ]

Concentrating on the highlighted entries (rows 3 through 6 of column 3), we determine a Householder matrix H̃3 ∈ R^{4×4} such that

         [ x ]     [ x ]
    H̃3  [ x ]  =  [ 0 ]
         [ x ]     [ 0 ]
         [ x ]     [ 0 ].
If H3 = diag(I2, H̃3), then

                  [ x  x  x  x  x ]
                  [ 0  x  x  x  x ]
    H3 H2 H1 A =  [ 0  0  x  x  x ]
                  [ 0  0  0  x  x ]
                  [ 0  0  0  x  x ]
                  [ 0  0  0  x  x ]

After n such steps we obtain an upper triangular Hn Hn-1 ··· H1 A = R and so by setting Q = H1 ··· Hn we obtain A = QR.

Algorithm 5.2.1 (Householder QR) Given A ∈ R^{m×n} with m ≥ n, the following algorithm finds Householder matrices H1, ..., Hn such that if Q = H1 ··· Hn, then Q^T A = R is upper triangular. The upper triangular part of A is overwritten by the upper triangular part of R and components j+1:m of the jth Householder vector are stored in A(j+1:m, j), j < m.

    for j = 1:n
        [v, β] = house(A(j:m, j))
        A(j:m, j:n) = (I - βvv^T) A(j:m, j:n)
        if j < m
            A(j+1:m, j) = v(2:m-j+1)
        end
    end

This algorithm requires 2n^2(m - n/3) flops. To clarify how A is overwritten, if

    v^(j) = [ 0, ..., 0, 1, v_{j+1}^(j), ..., v_m^(j) ]^T
              \___ j-1 ___/

is the jth Householder vector, then upon completion (m = 6, n = 5)

         [ r11      r12      r13      r14      r15     ]
         [ v2^(1)   r22      r23      r24      r25     ]
    A =  [ v3^(1)   v3^(2)   r33      r34      r35     ]
         [ v4^(1)   v4^(2)   v4^(3)   r44      r45     ]
         [ v5^(1)   v5^(2)   v5^(3)   v5^(4)   r55     ]
         [ v6^(1)   v6^(2)   v6^(3)   v6^(4)   v6^(5)  ]

If the matrix Q = H1 ··· Hn is required, then it can be accumulated using (5.1.5). This accumulation requires 4(m^2 n - m n^2 + n^3/3) flops. Note that the β-values that arise in Algorithm 5.2.1 can be retrieved from the stored Householder vectors:

    β_j = 2 / (1 + ‖A(j+1:m, j)‖₂²).
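For readers who want to experiment with Algorithm 5.2.1, here is a minimal NumPy sketch (ours, not part of the text). The helper follows the house(·) recipe of §5.1 for the Householder vector, but the driver forms each H_j explicitly and accumulates Q on the fly, which is simpler, though costlier, than the factored-form storage scheme just described.

    import numpy as np

    def house(x):
        """Householder vector v (v[0] = 1) and beta with (I - beta v v^T) x = +-||x||_2 e_1."""
        x = np.asarray(x, dtype=float)
        sigma = x[1:] @ x[1:]
        v = x.copy()
        v[0] = 1.0
        if sigma == 0.0:
            beta = 0.0                              # x already has the required form
        else:
            mu = np.sqrt(x[0]**2 + sigma)
            v0 = x[0] - mu if x[0] <= 0.0 else -sigma / (x[0] + mu)
            beta = 2.0 * v0 * v0 / (sigma + v0 * v0)
            v[1:] = x[1:] / v0
        return v, beta

    def householder_qr(A):
        """Return Q (m-by-m orthogonal) and R (m-by-n upper triangular) with A = Q R."""
        A = np.array(A, dtype=float)
        m, n = A.shape
        Q = np.eye(m)
        for j in range(n):
            v, beta = house(A[j:, j])
            H = np.eye(m - j) - beta * np.outer(v, v)
            A[j:, j:] = H @ A[j:, j:]               # update the trailing submatrix
            Q[:, j:] = Q[:, j:] @ H                 # accumulate Q = H1 H2 ... Hn
        return Q, np.triu(A)

    # Example: A = np.random.randn(8, 5); Q, R = householder_qr(A)
    # np.allclose(Q @ R, A) and np.allclose(Q.T @ Q, np.eye(8)) should both hold.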
We mention that the computed upper triangular matrix R̂ is the exact R for a nearby A in the sense that Z^T(A + E) = R̂ where Z is some exact orthogonal matrix and ‖E‖₂ ≈ u‖A‖₂.

5.2.3 Block Householder QR Factorization

Algorithm 5.2.1 is rich in the level-2 operations of matrix-vector multiplication and outer product updates. By reorganizing the computation and using the WY representation discussed in §5.1.7 we can obtain a level-3 procedure. The idea is to apply the underlying Householder transformations in clusters of size r. Suppose n = 12 and r = 3. The first step is to generate Householders H1, H2, and H3 as in Algorithm 5.2.1. However, unlike Algorithm 5.2.1 where each Hi is applied across the entire remaining submatrix, we apply only H1, H2, and H3 to A(:, 1:3). After this is accomplished we generate the block representation H1H2H3 = I - W1Y1^T and then perform the level-3 update

    A(:, 4:12) = (I - W1Y1^T) A(:, 4:12).

Next, we generate H4, H5, and H6 as in Algorithm 5.2.1. However, these transformations are not applied to A(:, 7:12) until their block representation H4H5H6 = I - W2Y2^T is found. This illustrates the general pattern.

Algorithm 5.2.2 (Block Householder QR) If A ∈ R^{m×n} and r is a positive integer, then the following algorithm computes an orthogonal Q ∈ R^{m×m} and an upper triangular R ∈ R^{m×n} so that A = QR.

    Q = I_m; λ = 1; k = 0
    while λ ≤ n
        τ = min(λ + r - 1, n); k = k + 1
        Use Algorithm 5.2.1 to upper triangularize A(λ:m, λ:τ),
            generating Householder matrices H_λ, ..., H_τ.
        Use Algorithm 5.1.2 to get the block representation I - W_k Y_k^T = H_λ ··· H_τ.
        A(λ:m, τ+1:n) = (I - W_k Y_k^T)^T A(λ:m, τ+1:n)
        Q(:, λ:m) = Q(:, λ:m)(I - W_k Y_k^T)
        λ = τ + 1
    end

The zero-nonzero structure of the Householder vectors that define H_λ, ..., H_τ implies that the first λ - 1 rows of W_k and Y_k are zero. This fact would be exploited in a practical implementation.

The proper way to regard Algorithm 5.2.2 is through the partitioning

    A = [ A_1 | A_2 | ··· | A_N ],        N = ceil(n/r),

where block column A_k is processed during the kth step. In the kth step of the reduction, a block Householder is formed that zeros the subdiagonal portion of A_k. The remaining block columns are then updated.
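The WY mechanics behind Algorithm 5.2.2 can be prototyped as follows. This is an illustrative sketch (ours, not the text's implementation); it reuses the house function from the earlier sketch and ignores the zero-row structure of W_k and Y_k that a production code would exploit.

    import numpy as np

    def wy_block_qr(A, r=32):
        """Block Householder QR via the WY representation: returns Q, R with A = Q R."""
        A = np.array(A, dtype=float)
        m, n = A.shape
        Q = np.eye(m)
        lam = 0
        while lam < n:
            tau = min(lam + r, n)                      # current block: columns lam..tau-1
            W = np.zeros((m - lam, 0))
            Y = np.zeros((m - lam, 0))
            for j in range(lam, tau):
                # bring column j up to date with the Householders already in this block
                A[lam:, j] -= Y @ (W.T @ A[lam:, j])
                v, beta = house(A[j:, j])              # house() from the Algorithm 5.2.1 sketch
                A[j:, j] -= beta * v * (v @ A[j:, j])  # triangularize column j
                vfull = np.zeros(m - lam)
                vfull[j - lam:] = v
                # WY update: (I - W Y^T)(I - beta v v^T) = I - [W, beta(I - W Y^T)v][Y, v]^T
                q = beta * (vfull - W @ (Y.T @ vfull))
                W = np.column_stack([W, q])
                Y = np.column_stack([Y, vfull])
            # one level-3 update of the remaining columns, and of Q
            A[lam:, tau:] -= Y @ (W.T @ A[lam:, tau:])
            Q[:, lam:] -= (Q[:, lam:] @ W) @ Y.T
            lam = tau
        return Q, np.triu(A)

As in the text, almost all of the arithmetic lands in the two matrix-matrix products at the bottom of the while-loop, which is the point of the blocking.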
The roundoff properties of Algorithm 5.2.2 are essentially the same as those for Algorithm 5.2.1. There is a slight increase in the number of flops required because of the W-matrix computations. However, as a result of the blocking, all but a small fraction of the flops occur in the context of matrix multiplication. In particular, the level-3 fraction of Algorithm 5.2.2 is approximately 1 - O(1/N). See Bischof and Van Loan (1987) for further details.

5.2.4 Block Recursive QR

A more flexible approach to blocking involves recursion. Suppose A ∈ R^{m×n} and assume for clarity that A has full column rank. Partition the thin QR factorization of A as follows:

    A = [ A1 | A2 ] = [ Q1 | Q2 ] [ R11  R12 ]
                                  [  0   R22 ]

where n1 = floor(n/2), n2 = n - n1, A1, Q1 ∈ R^{m×n1} and A2, Q2 ∈ R^{m×n2}. From the equations Q1R11 = A1, R12 = Q1^T A2, and Q2R22 = A2 - Q1R12 we obtain the following recursive procedure:

Algorithm 5.2.3 (Recursive Block QR) Suppose A ∈ R^{m×n} has full column rank and nb is a positive blocking parameter. The following algorithm computes Q ∈ R^{m×n} with orthonormal columns and upper triangular R ∈ R^{n×n} such that A = QR.

    function [Q, R] = BlockQR(A, n, nb)
        if n ≤ nb
            Use Algorithm 5.2.1 to compute the thin QR factorization A = QR.
        else
            n1 = floor(n/2)
            [Q1, R11] = BlockQR(A(:, 1:n1), n1, nb)
            R12 = Q1^T A(:, n1+1:n)
            A(:, n1+1:n) = A(:, n1+1:n) - Q1 R12
            [Q2, R22] = BlockQR(A(:, n1+1:n), n - n1, nb)
            Q = [ Q1 | Q2 ],    R = [ R11  R12 ]
                                    [  0   R22 ]
        end
    end

This divide-and-conquer approach is rich in matrix-matrix multiplication and provides a framework for the effective parallel computation of the QR factorization. See Elmroth and Gustavson (2001). Key implementation ideas concern the representation of the Q-matrices and the incorporation of the §5.2.3 blocking strategies.
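A NumPy transcription of Algorithm 5.2.3 (our sketch, with numpy.linalg.qr standing in for the Algorithm 5.2.1 base case):

    import numpy as np

    def block_qr(A, nb=8):
        """Recursive thin QR: Q (m-by-n, orthonormal columns), R (n-by-n upper triangular)."""
        A = np.asarray(A, dtype=float)
        m, n = A.shape
        if n <= nb:
            return np.linalg.qr(A)                  # base case: any thin QR will do
        n1 = n // 2
        Q1, R11 = block_qr(A[:, :n1], nb)
        R12 = Q1.T @ A[:, n1:]
        B = A[:, n1:] - Q1 @ R12                    # remove the ran(Q1) component
        Q2, R22 = block_qr(B, nb)
        Q = np.hstack([Q1, Q2])
        R = np.block([[R11, R12], [np.zeros((n - n1, n1)), R22]])
        return Q, R

    # Example: A = np.random.randn(100, 20); Q, R = block_qr(A)
    # np.allclose(Q @ R, A) should hold up to roundoff.

Note that a serious implementation carries the recursion out on Householder (WY) representations of the Q-blocks rather than on explicit Q1; the explicit-projection form above is the easiest to read but behaves numerically like block Gram-Schmidt.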
5.2.5 Givens QR Methods

Givens rotations can also be used to compute the QR factorization and the 4-by-3 case illustrates the general idea. In the following schematic, the integer k marks the entry that is zeroed by the kth rotation; each rotation is determined by the 2-vector made up of that entry and the entry directly above it:

    [ x  x  x ]
    [ 3  x  x ]
    [ 2  5  x ]
    [ 1  4  6 ]

If Gj denotes the jth Givens rotation in the reduction, then Q^T A = R is upper triangular, where Q = G1 ··· Gt and t is the total number of rotations. For general m and n we have:

Algorithm 5.2.4 (Givens QR) Given A ∈ R^{m×n} with m ≥ n, the following algorithm overwrites A with Q^T A = R, where R is upper triangular and Q is orthogonal.

    for j = 1:n
        for i = m:-1:j+1
            [c, s] = givens(A(i-1, j), A(i, j))
            A(i-1:i, j:n) = [ c  s ; -s  c ]^T A(i-1:i, j:n)
        end
    end

This algorithm requires 3n^2(m - n/3) flops. Note that we could use the representation ideas from §5.1.11 to encode the Givens transformations that arise during the calculation. Entry A(i, j) can be overwritten with the associated representation.

With the Givens approach to the QR factorization, there is flexibility in terms of the rows that are involved in each update and also the order in which the zeros are introduced. For example, we can replace the inner loop body in Algorithm 5.2.4 with

    [c, s] = givens(A(j, j), A(i, j))
    A([j i], j:n) = [ c  s ; -s  c ]^T A([j i], j:n)

and still emerge with the QR factorization. It is also possible to introduce zeros by row. Whereas Algorithm 5.2.4 introduces zeros by column, the implementation
    for i = 2:m
        for j = 1:i-1
            [c, s] = givens(A(j, j), A(i, j))
            A([j i], j:n) = [ c  s ; -s  c ]^T A([j i], j:n)
        end
    end

introduces zeros by row, e.g.,

    [ x  x  x ]
    [ 1  x  x ]
    [ 2  3  x ]
    [ 4  5  6 ]

5.2.6 Hessenberg QR via Givens

As an example of how Givens rotations can be used in a structured problem, we show how they can be employed to compute the QR factorization of an upper Hessenberg matrix. (Other structured QR factorizations are discussed in Chapter 6 and §11.1.8.) A small example illustrates the general idea. Suppose n = 6 and that after two steps we have computed

                                  [ x  x  x  x  x  x ]
                                  [ 0  x  x  x  x  x ]
    G(2,3,θ2)^T G(1,2,θ1)^T A  =  [ 0  0  x  x  x  x ]
                                  [ 0  0  x  x  x  x ]
                                  [ 0  0  0  x  x  x ]
                                  [ 0  0  0  0  x  x ]

Next, we compute G(3, 4, θ3) to zero the current (4,3) entry, thereby obtaining

                                              [ x  x  x  x  x  x ]
                                              [ 0  x  x  x  x  x ]
    G(3,4,θ3)^T G(2,3,θ2)^T G(1,2,θ1)^T A  =  [ 0  0  x  x  x  x ]
                                              [ 0  0  0  x  x  x ]
                                              [ 0  0  0  x  x  x ]
                                              [ 0  0  0  0  x  x ]

Continuing in this way we obtain the following algorithm.

Algorithm 5.2.5 (Hessenberg QR) If A ∈ R^{n×n} is upper Hessenberg, then the following algorithm overwrites A with Q^T A = R where Q is orthogonal and R is upper triangular. Q = G1 ··· G_{n-1} is a product of Givens rotations where Gj has the form Gj = G(j, j+1, θj).
    for j = 1:n-1
        [c, s] = givens(A(j, j), A(j+1, j))
        A(j:j+1, j:n) = [ c  s ; -s  c ]^T A(j:j+1, j:n)
    end

This algorithm requires about 3n^2 flops.

5.2.7 Classical Gram-Schmidt Algorithm

We now discuss two alternative methods that can be used to compute the thin QR factorization A = Q1R1 directly. If rank(A) = n, then equation (5.2.3) can be solved for qk:

    qk = ( ak - Σ_{i=1}^{k-1} r_ik q_i ) / r_kk.

Thus, we can think of qk as a unit 2-norm vector in the direction of

    zk = ak - Σ_{i=1}^{k-1} r_ik q_i

where, to ensure zk ∈ span{q1, ..., q_{k-1}}⊥, we choose

    r_ik = q_i^T a_k,        i = 1:k-1.

This leads to the classical Gram-Schmidt (CGS) algorithm for computing A = Q1R1.

    R(1,1) = ‖A(:,1)‖₂
    Q(:,1) = A(:,1)/R(1,1)
    for k = 2:n
        R(1:k-1, k) = Q(1:m, 1:k-1)^T A(1:m, k)
        z = A(1:m, k) - Q(1:m, 1:k-1)·R(1:k-1, k)
        R(k,k) = ‖z‖₂
        Q(1:m, k) = z/R(k,k)
    end

In the kth step of CGS, the kth columns of both Q and R are generated.
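A direct NumPy transcription of the CGS loop above (an illustrative sketch, not the text's code):

    import numpy as np

    def cgs_qr(A):
        """Classical Gram-Schmidt thin QR of a full-column-rank A: returns Q (m-by-n), R (n-by-n)."""
        A = np.asarray(A, dtype=float)
        m, n = A.shape
        Q = np.zeros((m, n))
        R = np.zeros((n, n))
        R[0, 0] = np.linalg.norm(A[:, 0])
        Q[:, 0] = A[:, 0] / R[0, 0]
        for k in range(1, n):
            R[:k, k] = Q[:, :k].T @ A[:, k]        # project onto the current basis
            z = A[:, k] - Q[:, :k] @ R[:k, k]      # remaining component
            R[k, k] = np.linalg.norm(z)
            Q[:, k] = z / R[k, k]
        return Q, R

As the next subsection explains, the computed columns of Q produced this way can drift far from orthogonality when the columns of A are nearly dependent.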
5.2.8 Modified Gram-Schmidt Algorithm

Unfortunately, the CGS method has very poor numerical properties in that there is typically a severe loss of orthogonality among the computed qi. Interestingly, a rearrangement of the calculation, known as modified Gram-Schmidt (MGS), leads to a more reliable procedure. In the kth step of MGS, the kth column of Q (denoted by qk) and the kth row of R (denoted by rk^T) are determined. To derive the MGS method, define the matrix A^(k) ∈ R^{m×(n-k+1)} by

    [ 0 | A^(k) ] = A - Σ_{i=1}^{k-1} q_i r_i^T.

It follows that if

    A^(k) = [ z | B ],        z ∈ R^m,    B ∈ R^{m×(n-k)},

then r_kk = ‖z‖₂, q_k = z/r_kk, and [ r_{k,k+1}, ..., r_kn ] = q_k^T B. We then compute the outer product update

    A^(k+1) = B - q_k [ r_{k,k+1}, ..., r_kn ]

and proceed to the next step. This completely describes the kth step of MGS.

Algorithm 5.2.6 (Modified Gram-Schmidt) Given A ∈ R^{m×n} with rank(A) = n, the following algorithm computes the thin QR factorization A = Q1R1 where Q1 ∈ R^{m×n} has orthonormal columns and R1 ∈ R^{n×n} is upper triangular.

    for k = 1:n
        R(k,k) = ‖A(1:m, k)‖₂
        Q(1:m, k) = A(1:m, k)/R(k,k)
        for j = k+1:n
            R(k,j) = Q(1:m, k)^T A(1:m, j)
            A(1:m, j) = A(1:m, j) - Q(1:m, k)R(k,j)
        end
    end

This algorithm requires 2mn^2 flops. It is not possible to overwrite A with both Q1 and R1. Typically, the MGS computation is arranged so that A is overwritten by Q1 and the matrix R1 is stored in a separate array.

5.2.9 Work and Accuracy

If one is interested in computing an orthonormal basis for ran(A), then the Householder approach requires 2mn^2 - 2n^3/3 flops to get Q in factored form and another 2mn^2 - 2n^3/3 flops to get the first n columns of Q. (This requires "paying attention" to just the first n columns of Q in (5.1.5).) Therefore, for the problem of finding an orthonormal basis for ran(A), MGS is about twice as efficient as Householder orthogonalization. However, Bjorck (1967) has shown that MGS produces a computed Q̂1 = [ q̂1 | ··· | q̂n ] that satisfies

    ‖ Q̂1^T Q̂1 - I_n ‖₂  ≈  u·κ2(A),

whereas the corresponding result for the Householder approach is of the form

    ‖ Q̂1^T Q̂1 - I_n ‖₂  ≈  u.

Thus, if orthonormality is critical, then MGS should be used to compute orthonormal bases only when the vectors to be orthogonalized are fairly independent.
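The corresponding sketch of Algorithm 5.2.6 follows (ours, not the text's). The commented experiment at the end suggests one way to observe the u·κ2(A) versus u orthogonality behavior cited above; the Vandermonde test matrix is our own choice, used purely as an example of ill conditioning.

    import numpy as np

    def mgs_qr(A):
        """Modified Gram-Schmidt thin QR; works on a copy of A, column by column."""
        A = np.array(A, dtype=float)
        m, n = A.shape
        Q = np.zeros((m, n))
        R = np.zeros((n, n))
        for k in range(n):
            R[k, k] = np.linalg.norm(A[:, k])
            Q[:, k] = A[:, k] / R[k, k]
            for j in range(k + 1, n):
                R[k, j] = Q[:, k] @ A[:, j]
                A[:, j] -= R[k, j] * Q[:, k]       # immediately purge the q_k component
        return Q, R

    # Orthogonality experiment (illustrative):
    #   A = np.vander(np.linspace(0, 1, 60), 12)   # moderately ill conditioned
    #   for f in (cgs_qr, mgs_qr, np.linalg.qr):
    #       Q = f(A)[0]
    #       print(f.__name__, np.linalg.norm(Q.T @ Q - np.eye(Q.shape[1])))

Typically the CGS loss of orthogonality is far worse than the MGS loss, and the Householder-based factorization keeps ‖Q^TQ - I‖ at roundoff level.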
  • 280. 256 Chapter 5. Orthogonalization and Least Squares WealsomentionthatthecomputedtriangularfactorR producedbyMGSsatisfies II A - QR II :::::: u ll A II and that there exists a Q with perfectly orthonormal columns suchthat II A - QR II :::::: u ll A II· SeeHigham (ASNA, p. 379) andadditionalreferences givenat the end ofthis section. 5.2.10 A Note on Complex Householder QR Complex Householder transformations (§5.1.13) canbeusedtocompute the QRfac­ torizationofacomplexmatrix A E <Cm xn. AnalogoustoAlgorithm5.2.1 wehave for j = l:n ComputeaHouseholdermatrixQ; sothat Q;A isuppertriangular throughitsfirstj columns. A = Q;A end Upontermination, A has been reduced to an upper triangularmatrix R E <Cm x n and we have A = QR where Q = Q1 • • • Qn is unitary. The reduction requires about four times the numberofflops as the realcase. Problems P5.2.1 Adapt the Householder QR algorithm so that it can efficiently handle the case when A E Rmxn has lower bandwidth p and upper bandwidth q. PS.2.2 Suppose A E Rnxn and let £ be the exchange permutation £n obtained by reversing the order of the rows in In. (a) Show that if R E Rnxn is upper triangular, then L = £R£ is lower triangular. (b) Show how to compute an orthogonal Q E Rnxn and a lower triangular L E nnxn so that A = QL assuming the availability of a procedure for computing the QR factorization. PS.2.3 Adapt the Givens QR factorization algorithm so that the zeros are introduced by diagonal. That is, the entries are zeroed in the order {m, 1), (m - 1, 1), {m, 2), (m - 2, 1), (m - 1, 2}, {m, 3) , etc. PS.2.4 Adapt the Givens QR factorization algorithm so that it efficiently handles the case when A is n-by-n and tridiagonal. Assume that the subdiagonal, diagonal, and superdiagona.l of A are stored in e{l:n - 1), a(l:n), /{l:n - 1}, respectively. Design your algorithm so that these vectors are overwritten by the nonzero portion of T. PS.2.5 Suppose L E Rmxn with m ;:: n is lower triangular. Show how Householder matrices Hi . . . . , Hn can be used to determine a lower triangular L1 E Rnxn so that Hn · · · H1L = [ �1 ] . Hint: The second step in the 6-by-3 case involves finding H2 so that with the property that rows 1 and 3 are left alone. PS.2.6 Suppose A E Rnxn and D = diag(di , . . . , d,.) E Rnx". Show how to construct an orthogonal Q such that is upper triangular. Do not worry about efficiency-this is just an exercise in QR manipulation.
  • 281. 5.2. The QR Factorization P5.2.7 Show how to compute the QR factorization of the product A = Av · · · A2Ai 257 without explicitly multiplying the matrices Ai , . . . , Ap together. Assume that each A; is square. Hint: In the p = 3 case, write QfA = QfAaQ2QfA2QiQfAi and determine orthogonal Q; so that Qf(A;Q;-I } is upper triangular. (Qo = I. ) P5.2.8 MGS applied to A E Rmxn is numerically equivalent to the first step in Householder QR applied to A = [ � ] where On is the n-by-n zero matrix. Verify that this statement is true after the first step of each method is completed. P5.2.9 Reverse the loop orders in Algorithm 5.2.6 (MGS) so that R is computed column by column. P5.2.10 How many flops are required by the complex QR factorization procedure outlined in §5.10? P5.2.ll Develop a complex version of the Givens QR factorization in which the diagonal of R is nonnegative. See §5.1.13. PS.2.12 Show that if A E Rnxn and a; = A(:, i), then ldet(A)I � II ai 1'2 . . · II an 112· Hint: Use the QR factorization. P5.2.13 Suppose A E Rmxn with m 2': n. Construct an orthogonal Q E R(m+n)x(m+n) with the property that Q(l:m, l:n) is a scalar multiple of A. Hint. If a E R is chosen properly, then I - a2ATA has a Cholesky factorization. P5.2.14 Suppose A E Rmxn. Analogous to Algorithm 5.2.4, show how fast Givens transformations (P5.1.12) can be used to compute 1'.1 E Rmxm and a diagonal D E R?"xm with positive diagonal entries so that MTA = S is upper triangular and MJv/T = D. Relate M and S to A's QR factors. PS.2.15 (Parallel Givens QR) Suppose A E R9X3 and that we organize a Givens QR so that the subdiagonal entries arc zeroed over the course of ten "time steps" as follows: Step Entries Zeroed T = l (9,1) T = 2 (8,1) T = 3 (7,1) (9,2) T = 4 (6,1) (8,2) T = 5 (5,1) (7,2) (9,3) T = 6 (4,1) (6,2) (8,3) T = 7 (3,1) (5,2) (7,3) T = 8 (2,1) (4,2) (6,3) T = 9 (3,2) (5,3) T = 10 (4,3) Assume that a rotation in plane (i - 1, i) is used to zero a matrix entry (i, j). It follows that the rotations associated with any given time step involve disjoint pairs of rows and may therefore be computed in parallel. For example, during time step T = 6, there is a (3,4), (5,6), and (7,8) rotation. Three separate processors could oversee the three updates. Extrapolate from this example to the m-by-n case and show how the QR factorization could be computed in O(m + n) time steps. How many of those time steps would involve n "nonoverlapping" rotations? Notes and References for §5.2 The idea of using Householder transformations to solve the least squares problem was proposed in:
  • 282. 258 Chapter 5. Orthogonalization and Least Squares A.S. Householder (1958). "Unitary Triangularization of a Nonsymmetric Matrix," J. ACM 5, 339-342. The practical details were worked out in: P. Businger and G.H. Golub (1965). "Linear Least Squares Solutions by Householder Transforma­ tions," Nu.mer. Math. 7, 269-276. G.H. Golub (1965). "Numerical Methods for Solving Linear Least Squares Problems," Nu.mer. Math. 7, 206-216. The basic references for Givens QR include: W. Givens (1958). "Computation of Plane Unitary Rotations Transforming a General Matrix to Triangular Form," SIAM J. Appl. Math. 6, 26-50. M. Gentleman (1973). "Error Analysis of QR Decompositions by Givens Transformations," Lin. Alg. Applic. 10, 189-197. There are modifications for the QR factorization that make it more attractive when dealing with rank deficiency. See §5.4. Nevertheless, when combined with the condition estimation ideas in §3.5.4, the traditional QR factorization can be used to address rank deficiency issues: L.V. Foster (1986). "Rank and Null Space Calculations Using Matrix Decomposition without Column Interchanges," Lin. Alg. Applic. 74, 47-71. The behavior of the Q and R factors when A is perturbed is of interest. A main result is that the resulting changes in Q and R are bounded by the condition of A times the relative change in A, see: G.W. Stewart (1977). "Perturbation Bounds for the QR Factorization of a Matrix," SIAM J. Numer. Anal. 14, 509-518. H. Zha (1993). "A Componentwise Perturbation Analysis ofthe QR Decomposition," SIAM J. Matrix Anal. Applic. 4, 1124-1131. G.W. Stewart (1993). "On the Perturbation ofLU Cholesky, and QR Factorizations," SIAM J. Matrix Anal. Applic. 14, 1141--1145. A. Barrlund (1994). "Perturbation Bounds for the Generalized QR Factorization," Lin. Alg. Applic. 207, 251-271. J.-G. Sun (1995). "On Perturbation Bounds for the QR Factorization," Lin. Alg. Applic. 215, 95-112. X.-W. Chang and C.C. Paige (2001). "Componentwise Perturbation Analyses for the QR factoriza­ tion," Nu.mer. Math. 88, 319-345. Organization of the computation so that the entries in Q depend continuously on the entries in A is discussed in: T.F. Coleman and D.C. Sorensen (1984). "A Note on the Computation of an Orthonormal Basis for the Null Space of a Matrix," Mathematical Programming 2.9, 234-242. References for the Gram-Schmidt process and various ways to overcome its shortfalls include: J.R. Rice (1966). "Experiments on Gram-Schmidt Orthogonalization," Math. Comput. 20, 325-328. A. Bjorck (1967). "Solving Linear Least Squares Problems by Gram-Schmidt Orthogonalization," BIT 7, 1-21. N.N. Abdelmalek (1971). "Roundoff Error Analysis for Gram-Schmidt Method and Solution of Linear Least Squares Problems," BIT 11, 345-368. A. Ruhe (1983). "Numerical Aspects of Gram-Schmidt Orthogonalization of Vectors," Lin. Alg. Applic. 52/53, 591-601. W. Jalby and B. Philippe (1991). "Stability Analysis and Improvement of the Block Gram-Schmidt Algorithm," SIAM J. Sci. Stat. Comput. 12, 1058--1073. A. Bjorck and C.C. Paige (1992). "Loss and Recapture ofOrthogonality in the Modified Gram-Schmidt Algorithm," SIAM J. Matrix Anal. Applic. 13, 176-190. A. Bjorck (1994). "Numerics of Gram-Schmidt Orthogonalization," Lin. Alg. Applic. 197/198, 297-316. L. Giraud and J. Langon (2003). "A Robust Criterion for the Modified Gram-Schmidt Algorithm with Selective Reorthogonalization," SIAM J. Sci. Comput. 25, 417-441. G.W. Stewart (2005). "Error Analysis ofthe Quasi-Gram-Schmidt Algorithm," SIAM J. Matrix Anal. Applic. 
27, 493-506.
  • 283. 5.2. The QR Factorization 259 L. Giraud, J. Langou, M. R.ozlonk, and J. van den Eshof (2005). "Rounding Error Analysis of the Classical Gram-Schmidt Orthogonalization Process," Nu.mer. Math. 101, 87- 100. A. Smoktunowicz, J.L. Barlow and J. Langou (2006). "A Note on the Error Analysis of Classical Gram-Schmidt," Nu.mer. Math. 105, 299-313. Various high-performance issues pertaining to the QR factorization are discussed in: B. Mattingly, C. Meyer, and J. Ortega (1989). "Orthogonal Reduction on Vector Computers," SIAM J. Sci. Stat. Comput. 10, 372-381. P.A. Knight (1995). "Fast Rectangular Matrix Multiplication and the QR Decomposition,'' Lin. Alg. Applic. 221, 69-81. J.J. Carrig, Jr. and G.L. Meyer (1997). "Efficient Householder QR Factorization for Superscalar Processors,'' ACM '.lhms. Math. Softw. 23, 362-378. D. Vanderstraeten (2000). "An Accurate Parallel Block Gram-Schmidt Algorithm without Reorthog­ onalization,'' Nu.mer. Lin. Alg. 7, 219-236. E. Elmroth and F.G. Gustavson (2000). "Applying Recursion to Serial and Parallel QR Factorization Leads to Better Performance,'' IBM J. Res. Dev. .44, 605-624. Many important high-performance implementation ideas apply equally to LU, Cholesky, and QR, see: A. Buttari, J. Langou, J. Kurzak, and .J. Dongarra (2009). "A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures,'' Parallel Comput. 35, 38-53. J. Kurzak, H. Ltaief, and J. Dongarra (2010). "Scheduling Dense Linear Algebra Operations on Multicore Processors," Concurrency Comput. Pract. Exper. 22, 15-44. J. Demmel, L. Grigori, M, Hoemmen, and J. Langou (2012). "Methods and Algorithms for Scientific Computing Communication-optimal Parallel and Sequential QR and LU Factorizations," SIAM J. Sci. Comput. 34, A206-A239. Historical references concerned with parallel Givens QR include: W.M. Gentleman and H.T. Kung (1981). "Matrix Triangularization by Systolic Arrays,'' SPIE Proc. 298, 19-26. D.E. Heller and I.C.F. Ipsen (1983). "Systolic Networks for Orthogonal Decompositions,'' SIAM J. Sci. Stat. Comput. 4, 261-269. M. Costnard, J.M. Muller, and Y. Robert (1986). "Parallel QR Decomposition of a Rectangular Matrix,'' Nu.mer. Math. 48, 239-250. L. Eldin and R. Schreiber (1986). "An Application of Systolic Arrays to Linear Discrete Ill-Posed Problems,'' SIAM J. Sci. Stat. Comput. 7, 892-903. F.T. Luk (1986). "A Rotation Method for Computing the QR Factorization," SIAM J. Sci. Stat. Comput. 7, 452-459. J.J. Modi and M.R.B. Clarke (1986). "An Alternative Givens Ordering," Nu.mer. Math. 43, 83-90. The QR factorization of a structured matrix is usually structured itself, see: A.W. Bojanczyk, R.P. Brent, and F.R. de Hoog (1986). "QR Factorization of Toeplitz Matrices,'' Nu.mer. Math. 49, 81-94. S. Qiao (1986). "Hybrid Algorithm for Fast Toeplitz Orthogonalization,'' Nu.mer. Math. 53, 351-366. C.J. Demeure (1989). "Fast QR Factorization ofVandermonde Matrices,'' Lin. Alg. Applic. 122/123/124 165-194. L. Reichel (1991). "Fast QR Decomposition of Vandermonde-Like Matrices and Polynomial Least Squares Approximation,'' SIAM J. Matrix Anal. Applic. 12, 552-564. D.R. Sweet (1991). "Fast Block Toeplitz Orthogonalization," Nu.mer. Math. 58, 613-629. Quantum computation has an interesting connection to complex Givens rotations and their application to vectors, see: G. Cybenko (2001). "Reducing Quantum Computations to Elementary Unitary Transformations," Comput. Sci. Eng. 3, 27-32. D.P. O'Leary and S.S. Bullock (2005). 
"QR Factorizations Using a Restricted Set of Rotations," ETNA 21, 20-27. N.D. Mermin (2007). Quantum Computer Science, Cambridge University Press, New York.
5.3 The Full-Rank Least Squares Problem

Consider the problem of finding a vector x ∈ R^n such that Ax = b where the data matrix A ∈ R^{m×n} and the observation vector b ∈ R^m are given and m ≥ n. When there are more equations than unknowns, we say that the system Ax = b is overdetermined. Usually an overdetermined system has no exact solution since b must be an element of ran(A), a proper subspace of R^m.

This suggests that we strive to minimize ‖Ax - b‖_p for some suitable choice of p. Different norms render different optimum solutions. For example, if A = [1, 1, 1]^T and b = [b1, b2, b3]^T with b1 ≥ b2 ≥ b3 ≥ 0, then it can be verified that

    p = 1   ⇒   x_opt = b2,
    p = 2   ⇒   x_opt = (b1 + b2 + b3)/3,
    p = ∞   ⇒   x_opt = (b1 + b3)/2.

Minimization in the 1-norm and infinity-norm is complicated by the fact that the function f(x) = ‖Ax - b‖_p is not differentiable for these values of p. However, there are several good techniques available for 1-norm and ∞-norm minimization. See Coleman and Li (1992), Li (1993), and Zhang (1993).

In contrast to general p-norm minimization, the least squares (LS) problem

    min_{x ∈ R^n} ‖Ax - b‖₂                                                          (5.3.1)

is more tractable for two reasons:

• φ(x) = (1/2)‖Ax - b‖₂² is a differentiable function of x and so the minimizers of φ satisfy the gradient equation ∇φ(x) = 0. This turns out to be an easily constructed symmetric linear system which is positive definite if A has full column rank.

• The 2-norm is preserved under orthogonal transformation. This means that we can seek an orthogonal Q such that the equivalent problem of minimizing ‖(Q^T A)x - (Q^T b)‖₂ is "easy" to solve.

In this section we pursue these two solution approaches for the case when A has full column rank. Methods based on normal equations and the QR factorization are detailed and compared.

5.3.1 Implications of Full Rank

Suppose x ∈ R^n, z ∈ R^n, α ∈ R, and consider the equality

    ‖A(x + αz) - b‖₂²  =  ‖Ax - b‖₂²  +  2αz^T A^T(Ax - b)  +  α²‖Az‖₂²

where A ∈ R^{m×n} and b ∈ R^m. If x solves the LS problem (5.3.1), then we must have A^T(Ax - b) = 0. Otherwise, if z = -A^T(Ax - b) and we make α small enough, then we obtain the contradictory inequality ‖A(x + αz) - b‖₂ < ‖Ax - b‖₂. We may also conclude that if x and x + αz are LS minimizers, then z ∈ null(A).

Thus, if A has full column rank, then there is a unique LS solution x_LS and it solves the symmetric positive definite linear system

    A^T A x_LS = A^T b.
These are called the normal equations. Note that if

    φ(x) = (1/2)‖Ax - b‖₂²,

then

    ∇φ(x) = A^T(Ax - b),

so solving the normal equations is tantamount to solving the gradient equation ∇φ = 0. We call

    r_LS = b - A x_LS

the minimum residual and we use the notation

    ρ_LS = ‖A x_LS - b‖₂

to denote its size. Note that if ρ_LS is small, then we can do a good job "predicting" b by using the columns of A.

Thus far we have been assuming that A ∈ R^{m×n} has full column rank, an assumption that is dropped in §5.5. However, even if rank(A) = n, trouble can be expected if A is nearly rank deficient. The SVD can be used to substantiate this remark. If

    A = UΣV^T = Σ_{i=1}^{n} σ_i u_i v_i^T

is the SVD of a full rank matrix A ∈ R^{m×n}, then

    ‖Ax - b‖₂²  =  ‖(U^T A V)(V^T x) - U^T b‖₂²  =  Σ_{i=1}^{n} (σ_i y_i - u_i^T b)²  +  Σ_{i=n+1}^{m} (u_i^T b)²

where y = V^T x. It follows that this summation is minimized by setting y_i = u_i^T b / σ_i, i = 1:n. Thus,

    x_LS = Σ_{i=1}^{n} (u_i^T b / σ_i) v_i                                           (5.3.2)

and

    ρ_LS² = Σ_{i=n+1}^{m} (u_i^T b)².                                                (5.3.3)

It is clear that the presence of small singular values means LS solution sensitivity. The effect of perturbations on the minimum sum of squares is less clear and requires further analysis which we offer below.

When assessing the quality of a computed LS solution x̂_LS, there are two important issues to bear in mind:

• How close is x̂_LS to x_LS?
• How small is r̂_LS = b - A x̂_LS compared to r_LS = b - A x_LS?
The relative importance of these two criteria varies from application to application. In any case it is important to understand how x_LS and r_LS are affected by perturbations in A and b. Our intuition tells us that if the columns of A are nearly dependent, then these quantities may be quite sensitive. For example, one can construct a 3-by-2 matrix A with singular values 1 and 10^{-6} (so that κ2(A) = 10^6) together with vectors b, δA, and δb of relative size about 10^{-6} with the following property: if x_LS and x̂_LS minimize ‖Ax - b‖₂ and ‖(A + δA)x - (b + δb)‖₂, respectively, and r_LS and r̂_LS are the corresponding minimum residuals, then

    ‖x̂_LS - x_LS‖₂ / ‖x_LS‖₂  ≈  .9999·10^4   ≤  κ2(A)²·‖δA‖₂/‖A‖₂  =  10^{12}·10^{-6}

and

    ‖r̂_LS - r_LS‖₂ / ‖b‖₂     ≈  .7070·10^{-2}  ≤  κ2(A)·‖δA‖₂/‖A‖₂   =  10^{6}·10^{-6}.

(Recall that the 2-norm condition of a rectangular matrix is the ratio of its largest to smallest singular values.) The example suggests that the sensitivity of x_LS can depend upon κ2(A)². Below we offer an LS perturbation theory that confirms the possibility.

5.3.2 The Method of Normal Equations

A widely used method for solving the full-rank LS problem is the method of normal equations.

Algorithm 5.3.1 (Normal Equations) Given A ∈ R^{m×n} with the property that rank(A) = n and b ∈ R^m, this algorithm computes a vector x_LS that minimizes ‖Ax - b‖₂.

    Compute the lower triangular portion of C = A^T A.
    Form the matrix-vector product d = A^T b.
    Compute the Cholesky factorization C = GG^T.
    Solve Gy = d and G^T x_LS = y.

This algorithm requires (m + n/3)n² flops. The normal equation approach is convenient because it relies on standard algorithms: Cholesky factorization, matrix-matrix multiplication, and matrix-vector multiplication. The compression of the m-by-n data matrix A into the (typically) much smaller n-by-n cross-product matrix C is attractive.

Let us consider the accuracy of the computed normal equations solution x̂_LS. For clarity, assume that no roundoff errors occur during the formation of C = A^T A and
d = A^T b. It follows from what we know about the roundoff properties of the Cholesky factorization (§4.2.6) that

    (A^T A + E) x̂_LS = A^T b,

where

    ‖E‖₂ ≈ u‖A^T A‖₂.

Thus, we can expect

    ‖x̂_LS - x_LS‖₂ / ‖x_LS‖₂  ≈  u·κ2(A^T A)  =  u·κ2(A)².                           (5.3.4)

In other words, the accuracy of the computed normal equations solution depends on the square of the condition. See Higham (ASNA, §20.4) for a detailed roundoff analysis of the normal equations approach.

It should be noted that the formation of A^T A can result in a significant loss of information. If

    A = [ 1    1  ]
        [ √u   0  ]
        [ 0   √u  ],

then κ2(A) ≈ √(2/u). However,

    fl(A^T A) = [ 1  1 ]
                [ 1  1 ]

is exactly singular. Thus, the method of normal equations can break down on matrices that are not particularly close to being numerically rank deficient.

5.3.3 LS Solution Via QR Factorization

Let A ∈ R^{m×n} with m ≥ n and b ∈ R^m be given and suppose that an orthogonal matrix Q ∈ R^{m×m} has been computed such that

    Q^T A = [ R1 ]   n
            [  0 ]   m-n

is upper triangular. If

    Q^T b = [ c ]   n
            [ d ]   m-n,

then

    ‖Ax - b‖₂²  =  ‖Q^T A x - Q^T b‖₂²  =  ‖R1 x - c‖₂²  +  ‖d‖₂²                    (5.3.5)

for any x ∈ R^n. Since rank(A) = rank(R1) = n, it follows that x_LS is defined by the upper triangular system

    R1 x_LS = c.

Note that ρ_LS = ‖d‖₂.
We conclude that the full-rank LS problem can be readily solved once we have computed the QR factorization of A. Details depend on the exact QR procedure. If Householder matrices are used and Q^T is applied in factored form to b, then we obtain:

Algorithm 5.3.2 (Householder LS Solution) If A ∈ R^{m×n} has full column rank and b ∈ R^m, then the following algorithm computes a vector x_LS ∈ R^n such that ‖A x_LS - b‖₂ is minimum.

    Use Algorithm 5.2.1 to overwrite A with its QR factorization.
    for j = 1:n
        v = [ 1 ; A(j+1:m, j) ]
        β = 2/(v^T v)
        b(j:m) = b(j:m) - β(v^T b(j:m))v
    end
    Solve R(1:n, 1:n)·x_LS = b(1:n).

This method for solving the full-rank LS problem requires 2n²(m - n/3) flops. The O(mn) flops associated with the updating of b and the O(n²) flops associated with the back substitution are not significant compared to the work required to factor A.

It can be shown that the computed x̂_LS solves

    min ‖(A + δA)x - (b + δb)‖₂                                                      (5.3.6)

where

    ‖δA‖_F  ≤  (6m - 3n + 41)·n·u·‖A‖_F + O(u²)                                      (5.3.7)

and

    ‖δb‖₂  ≤  (6m - 3n + 40)·n·u·‖b‖₂ + O(u²).                                       (5.3.8)

These inequalities are established in Lawson and Hanson (SLS, p. 90ff) and show that x̂_LS satisfies a "nearby" LS problem. (We cannot address the relative error in x̂_LS without an LS perturbation theory, to be discussed shortly.) We mention that similar results hold if Givens QR is used.

5.3.4 Breakdown in Near-Rank-Deficient Case

As with the method of normal equations, the Householder method for solving the LS problem breaks down in the back-substitution phase if rank(A) < n. Numerically, trouble can be expected if κ2(A) = κ2(R) ≈ 1/u. This is in contrast to the normal equations approach, where completion of the Cholesky factorization becomes problematical once κ2(A) is in the neighborhood of 1/√u, as we showed above. Hence the claim in Lawson and Hanson (SLS, pp. 126-127) that for a fixed machine precision, a wider class of LS problems can be solved using Householder orthogonalization.

5.3.5 A Note on the MGS Approach

In principle, MGS computes the thin QR factorization A = Q1R1. This is enough to solve the full-rank LS problem because it transforms the normal equation system
  • 289. 5.3. The Full-Rank Least Squares Problem 265 (ATA)x = ATb to the upper triangular system Rix = Qfb. But an analysis ofthis approachwhenQ[b isexplicit!�fo�medintroducesal'i:2(A)2 term. Thisisbecausethe computedfactor Qi satisfies 11 QfQi - In 112 � Ul'i:2(A) aswementioned in §5.2.9. However, ifMGSisappliedtotheaugmentedmatrix then z = Qfb. ComputingQfb inthisfashionandsolvingRixLs = z producesanLS solution ::l5 that is "just as good" as the Householder QRmethod. That is tosay, a result ofthe form (5.3.6)-(5.3.8) applies. See Bjorckand Paige (1992). It should be noted that the MGSmethodisslightlymoreexpensivethanHouse­ holderQRbecauseitalwaysmanipulatesm-vectorswhereasthelatterproceduredeals withvectorsthatbecomeshorter inlengthasthealgorithmprogresses. 5.3.6 The Sensitivity of the LS Problem Wenowdevelopaperturbationtheoryforthefull-rankLSproblemthat assistsinthe comparison ofthe normal equations and QR approaches. LS sensitivity analysis has along and fascinating history. Grear (2009, 2010) compares about a dozen different results that have appeared in the literature over the decades and the theorem below follows his analysis. It examines how the LS solution and its residual are affected by changes inA and b and thereby sheds light on the conditionofthe LS problem. Four factsabout A E llmxn areused inthe proof, whereit is assumed thatm > n: 1 1 II A(ATA)-iAT 112, II I - A(ATA)-iAT 112, 1 an(A) 1 Theseequationsare easily verified using the SVD. Theorem 5.3.1. Suppose that XLs, rL5, ±Ls, and fLs satisfy II AxLs - b 112 min, (5.3.9) II (A + 8A)i:Ls - (b + 8b) 112 min, fLs = (b + 8b) - (A + 8A)xLs, where A has rank n and II 8A 112 < an(A). Assume that b, rL5, and XLs are not zero. Let (}Ls E (0, 7r/2) be defined by If and {II 8A 112 II 8b 112 } max II A 112 ' lfblr; II AxLs 1'2 O"n(A) ll XLs 112 ' (5.3.1.0)
  • 290. 266 Chapter 5. Orthogonalization and Least Squares then (5.3.11) and (5.3.12) Proof. LetE andf bedefinedbyE = 8A/e andf= 8b/e. ByTheorem2.5.2wehave rank(A + tE) = n for allt E [O, e]. It follows that the solutionx(t) to (5.3.13) iscontinuouslydifferentiableforallt E [O, e]. SinceXLs = x(O) andXLs = x(e), wehave XLs = XLs + f.X(O) + O(e2). Bytakingnormsanddividingby II XLs 112 weobtain (5.3.14) In order to bound II :i:(O) 112, wedifferentiate (5.3.13) and set t = 0 in theresult. This gives i.e., (5.3.15) Using (5.3.9) andtheinequalities II f112 :::; II b 112 and II E 112 :::; II A 112, it follows that II :i:(O) II < II (ATA)-1ATf 112 + II (ATA)-1ATExLs lb + II (ATA)-1ETrLs 112 :::; Jl!Jk_ + II A 11211 XLs 112 + II A 11211 rLs 112 <1n(A) <1n(A) <1n(A)2 By substituting this into (5.3.14) weobtain II XLs - XLs 112 < E ( II b 112 + II A 112 + II A 11211 rLs 112 ) + O(e2) II XLs 112 - <1n(A)ll XLs 112 <1n(A) <1n(A)211 XLs 112 . . Inequality (5.3.11) followsfromthedefinitionsofii:2(A) and VLs and theidentities (() ) II Tr,s 112 tan Ls = II A II · X1.s 2 (5.3.16) Theproofoftheresidualbound(5.3.12)issimilar. Definethedifferentiablevector function r(t) by r(t) = (b + ti) - (A + tE)x(t)
  • 291. 5.3. The Full-Rank Least Squares Problem andobservethatrLs = r(O) andfLs = r(e). Thus, From (5.3.15) wehave IIfLs - rLs 112 II T1,s 112 r(O) = (I - A(ATA)-1AT) (! - ExLs) - A(ATA)-1ETrLs· 267 (5.3.17) Bytakingnorms, using (5.3.9) andtheinequalities II f 112 � II b 112 and II E 112 � II A 112, weobtain II r(O) 112 � II b 112 + II A 11211XLs 112 + II A!:��)s 112 andthusfrom (5.3.17) wehave II fLs - T1,s 112 < II rLs 112 - Theinequality (5.3.12) followsfromthedefinitionsofK2(A) and llLs and theidentities (5.3.16). 0 It is instructive to identify conditions that turn the upper bound in (5.3.11) into a bound that involves K2(A)2. The example in §5.3.1 suggests that this factor might figure in the definitionofan LS condition number. However, thetheoremshowsthat thesituationis moresubtle. Notethat II AxLs 112 < II A 112 = K2(A). O'n(A)ll X1,s 112 - O'n(A) The SVD expansion (5.3.2) suggests that ifb hasamodest component in thedirection oftheleftsingularvectorUn, then Ifthis is the case and (}Ls is sufficiently bounded away from 7r/2, then the inequality (5.3.11) essentiallysaysthat II X1,s- X1,s 112 ( (A) P1,s (A)2) II XLs 112 � € K2 + ifblf;K2 . (5.3.18) AlthoughthissimpleheuristicassessmentofLSsensitivityisalmostalwaysapplicable, it important torememberthat thetrueconditionofaparticularLS problemdepends on llLS• (}LS• and K2(A). Regarding the perturbationoftheresidual, observe that the upper bound in the residual result (5.3.12) is less than the upper bound in the solution result (5.3.11) by afactor ofllLstan(OLs)· We also observe that if(}Ls is sufficiently bounded away from both 0 and7r/2, then (5.3.12) essentiallysaysthat II fLs - rLs 112 (A) (53 19) II T1,s 112 � € • K2 . . . For more insights into the subtleties behind Theorem 5.3.1., see Wedin (1973), Van­ dersluis (1975), Bjorck (NMLS, p. 30), Higham (ASNA, p. 382), and Grcar(2010).
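To see the κ2(A) versus κ2(A)² behavior concretely, the following sketch (ours, not from the text) solves one ill-conditioned full-rank LS problem by the method of normal equations (Algorithm 5.3.1) and by QR (Algorithm 5.3.2, here via numpy.linalg.qr), comparing both against an SVD-based reference computed as in (5.3.2). The test matrix with logarithmically graded singular values is our own choice.

    import numpy as np

    def ls_normal_equations(A, b):
        """Algorithm 5.3.1 sketch: Cholesky on the cross-product matrix."""
        C = A.T @ A                        # only the lower triangle is really needed
        d = A.T @ b
        G = np.linalg.cholesky(C)          # C = G G^T, G lower triangular
        y = np.linalg.solve(G, d)
        return np.linalg.solve(G.T, y)

    def ls_qr(A, b):
        """Algorithm 5.3.2 sketch: thin QR, then an upper triangular solve."""
        Q, R = np.linalg.qr(A)             # Householder-based under the hood
        return np.linalg.solve(R, Q.T @ b)

    # Illustrative experiment with kappa_2(A) = 1e6.
    rng = np.random.default_rng(0)
    m, n = 60, 10
    U, _ = np.linalg.qr(rng.standard_normal((m, m)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    sig = np.logspace(0, -6, n)
    A = U[:, :n] @ np.diag(sig) @ V.T
    b = rng.standard_normal(m)
    x_ref = V @ ((U[:, :n].T @ b) / sig)   # SVD solution, cf. (5.3.2)
    for solver in (ls_normal_equations, ls_qr):
        x = solver(A, b)
        print(solver.__name__, np.linalg.norm(x - x_ref) / np.linalg.norm(x_ref))

On a problem like this the normal-equations solution typically loses roughly twice as many digits as the QR solution, in line with the u·κ2(A)² versus κ2(A) discussion above and the comparison that follows.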
  • 292. 268 Chapter 5. Orthogonalization and Least Squares 5.3.7 Normal Equations Versus QR It is instructive to comparethe normal equation and QR approaches to the full-rank LS probleminlightofTheorem5.3.1. • The method ofnormal equations produces an :i:Ls whose relative error depends on ,.,;2(A)2, a factor that can be considerably largerthan the condition number associated with a "smallresidual" LS problem. • The QR approach (Householder, Givens, careful MGS) solvesanearbyLS prob­ lem. Therefore, these methods produce acomputed solution with relative error that is "predicted" bythe condition oftheunderlyingLSproblem. Thus, the QR approachismoreappealinginsituationswherebis close to the spanof A's columns. Finally,wementiontwootherfactorsthatfigureinthedebateabout QR versus normal equations. First, the normal equations approach involves about half of the arithmeticwhenm » n anddoesnot requireasmuchstorage, assumingthatQ(:, l:n) is required. Second, QR approaches are applicable to a wider class ofLS problems. This is because the Cholesky solve in the methodofnormal equations is "in trouble" if,.,;2(A) � 1/JU whiletheR-solvestepinaQR approachisintroubleonlyif,.,;2(A) � 1/u. Choosingthe "right" algorithmrequireshavinganappreciationforthesetradeoffs. 5.3.8 Iterative Improvement AtechniqueforrefininganapproximateLSsolutionhasbeenanalyzedbyBjorck(1967, 1968). It isbasedontheideathat if (5.3.20) then II b-Ax112 = min. Thisfollowsbecauser +Ax=bandATr = 0 implyATAx= Arb. The above augmented system is nonsingular ifrank(A) = n, which we hereafter assume. BycastingtheLS problem intheformofasquare linearsystem,theiterative improvementscheme §3.5.3 canbeapplied: r<0l =0, x<0l =0 for k = 0, 1, . . . [�;:� l = [ � ] - [ A: � l [:::�l end [A: � l [p(k) l z(k) [r<k+t) l = x<k+t) [r(k) l + [p(k) l x(k) z(k)
  • 293. 5.3. The Full-Rank Least Squares Problem 269 TheresidualsJ(k) andg(k) mustbecomputedinhigherprecision, andanoriginalcopy ofA must be around for thispurpose. If the QR factorization of A is available, then the solution of the augmented system is readily obtained. In particular, if A = QR and Ri = R(l:n, l:n), then a system ofthe form [A; � l [:l [� l transformsto [l !�-· 1·][� l � [�l where [ Ji ] n h m-n [ � ] n m-n Thus, p and z can be determined by solving the triangular systems Rfh = g and R1z = Ji -handsetting Assumingthat Q isstored infactoredform,eachiterationrequires8mn-2n2 flops. Thekeyto the iteration'ssuccessis that both the LS residual and solution are updated-notjust the solution. Bjorck (1968) shows that if 11:2(A) � f3q and t-digit, {3-base arithmetic is used, then x(k) has approximately k(t-q) correct base-/3 digits, provided the residuals arecomputedindoubleprecision. Notice that it is 11:2(A), not 11:2(A)2, that appearsinthis heuristic. 5.3.9 Some Point/Line/Plane Nearness Problems in 3-Space Thefieldsofcomputergraphicsandcomputervisionarerepletewithmanyinteresting matrix problems. Below we pose three geometric "nearness" problems that involve points, lines, andplanes in3-space. Eachisahighly structured least squaresproblem withasimple,closed-formsolution. Theunderlyingtrigonometryleadsrathernaturally tothevectorcrossproduct,sowestartwithaquickreviewofthisimportantoperation. The cross product ofavectorp E 1R3 withavectorq E 1R3 isdefinedby [P2Q3 - p3q2 l p x q = p3q1 - Plq3 · P1Q2 - P2Q1 Thisoperation canbeframed as amatrix-vector product. Foranyv E 1R3, define the skew-symmetric matrix vc by
  • 294. 270 Chapter 5. Orthogonalization and Least Squares It follows that Using theskew-symmetryofpc and qc, it iseasytoshowthat p x q E span{p , q}l.. Otherpropertiesinclude (p x q)T(r x s) = (pcq)T· (rcs) = det((p qf[r s]), PcPc = PPT - II P ll� ·la, II pcq II� = II P II�· IIq II�· (1 - (II�I���q 112 ) 2 )· (5.3.21) (5.3.23) (5.3.24) (5.3.25) Wearenowsettostatethethreeproblemsandspecifytheirtheoretical solutions. For hints at howtoestablishthecorrectnessofthe solutions, seeP5.3.13-P5.3.15. Problem 1. GivenalineL and apoint y, findthepoint z0Pt on Lthat isclosesttoy, i.e., solve min IIz - y 112• zE L IfLpasses throughdistinctpointsPl andP2, then it can beshownthat v = P2 - p1. (5.3.26) Problem 2. Given lines L1 and L2,findthepoint z?t onL1 thatisclosesttoL2 and the point z�pt on L2 that is closest toLi, i.e., solve min II z1 - z2 112· z1 E L1 , z2 EL2 IfL1 passesthroughdistinctpointsPl andP2 andL2 passesthroughdistinctpointsqi andq2, then itcanbeshownthat opt 1 T c( ) z1 = PI + 1' · vw · r qi - P1 , r r opt 1 T c( ) Z2 = qi + -· WV · T ql - Pl , rTr where v = P2 - pi, w = q2 - qi, andr = vcw. (5.3.27) (5.3.28) Problem 3. Givenaplane P andapoint y, findthepoint z0Pt on P that isclosestto y, i.e., solve min IIz - y 112• zEP
  • 295. 5.3. The Full-Rank Least Squares Problem 271 If P passesthroughthreedistinctpointsp1 , p2, andp3, then itcanbeshownthat opt 1 c c ( ) z = P1 - -- · v v y - P1 vTv wherev = (p2 - P1 )c (p3 - P1). (5.3.29) Thenice closed-formsolutions (5.3.26)-(5.3.29) are deceptivelysimpleandgreat care must beexercisedwhen computing with theseformulaeortheirmathematicalequiva­ lents. SeeKahan (2011). Problems P5.3.l Assume ATAx = Arb, (ATA + F)x = Arb, and 211 F 112 � an(A)2. Show that if r = b - Ax and f = b - Ax, then f - r = A(ATA + F)-1Fx and P5.3.2 Assume that ATAx = ATb and that ATAx = Arb + f where II f 1'2 � cull AT 11211 b 1'2 and A has full column rank. Show that P5.3.3 Let A E Rmxn (m 2 n), w E Rn, and define Show that an(B) 2 a,.(A) and a1 (B) � Jll A II� + I w II�· Thus, the condition of a matrix may increase or decrease if a row is added. P5.3.4 (Cline 1973) Suppose that A E Rmxn has rank n and that Gaussian elimination with partial pivoting is used to compute the factorization PA = LU, where L E R'nxn is unit lower triangular, U E Rnxn is upper triangular, and P E Rmxm is a permutation. Explain how the decomposition in P5.2.5 can be used to find a vector z E Rn such that II Lz - Pb 112 is minimized. Show that if Ux = z, then II Ax - b '2 is minimum. Show that this method of solving the LS problem is more efficient than Householder QR from the flop point of view whenever m � 5n/3. P5.3.5 The matrix C = (ATA)-1 , where rank(A) = n, arises in many statistical applications. Assume that the factorization A = QR is available. (a) Show C = (RTR)-1. (b) Give an algorithm for computing the diagonal of C that requires n3/3 flops. (c) Show that R = [ � v; ] � C = (RTR)-1 = [ (1 +���f�/cx2 -v7g1/cx ] where C1 = (srs)-1 . (d) Using (c), give an algorithm that overwrites the upper triangular portion of R with the upper triangular portion of C. Your algorithm should require 2n3/3 flops. P5.3.6 Suppose A E Rnxn is symmetric and that r = b - Ax where r, b, x E Rn and x is nonzero. Show how to compute a symmetric E E Rnxn with minimal Frobenius norm so that (A + E)x = b. Hint: Use the QR factorization of [ x I r ] and note that Ex = r � (QTEQ)(QTx) = QTr. P5.3.7 Points P1 , . . . , Pn on the x-axis have x-coordinates x1, . . . , Xn. We know that x1 = 0 and wish to compute x2, . . . , Xn given that we have estimates dij of the separations: 1 � i < j � n. Using the method of normal equations, show how to minimize n-1 n ¢(x1, . . . , xn) = L L (xi - Xj - d;j)2 i=l j=i+l
  • 296. 272 Chapter 5. Orthogonalization and Least Squares subject to the constraint x1 = 0. PS.3.8 Suppose A E Rmxn has full rank and that b E Rm and c E Rn are given. Show how to compute a = cTXLs without computing XLS explicitly. Hint: Suppose Z is a Householder matrix such that zTc is a multiple of In(:, n). It follows that a = (ZTc)TYLs where YLs minimizes II Ay -b 1'2 with y = zTx and A = AZ. PS.3.9 Suppose A E R"'xn and b E Rm with m 2'. n . How would you solve the full rank least squares problem given the availability of a matrix !vi E Rmxm such that MTA = S is upper triangular and MTM = D is diagonal? PS.3.10 Let A E Rmxn have rank n and for a 2'. 0 define Show that [ aim M(a) = AT CTm+n(M(a)) = min {a , -�+ crn(A)2 + (�)2 } and determine the value of a that minimizes �2(M(a)). P5.3.ll Another iterative improvement method for LS problems is the following: x(O) = 0 for k = 0, 1, ... end r(k) = b - Ax(k) (double precision) II Az(k) - rCk) 112 = min x<k+l) = x(k) + z(k) (a) Assuming that the QR factorization of A is available, how many flops per iteration are required? (b) Show that the above iteration results by setting g(k) = 0 in the iterative improvement scheme given in §5.3.8. P5.3.12 Verify (5.3.21)-(5.3.25). P5.3.13 Verify (5.3.26) noting that L = {Pl + T(p2 -PI) : T E R }. P5.3.14 Verify (5.3.27) noting that the minimizer Topt E R2 of II (p1 - q1) - [ p2 -PI I Q2 -QI ]T 1'2 is relevant. P5.3.15 Verify (5.3.29) noting that P = { x : xT((p2 - p1) x (p3 -P1)) = 0. Notes and References for §5.3 Some classical references for the least squares problem include: F.L. Bauer (1965). "Elimination with Weighted Row Combinations for Solving Linear Equations and Least Squares Problems,'' Numer. Math. 7, 338-352. G.H. Golub and J.H. Wilkinson (1966). "Note on the Iterative Refinement of Least Squares Solution,'' Numer. Math. 9, 139-148. A. van der Sluis (1975). "Stability of the Solutions of Linear Least Squares Problem," Numer. Math. 29, 241-254. The use of Gauss transformations to solve the LS problem has attracted some attention because they are cheaper to use than Householder or Givens matrices, see: G. Peters and J.H. Wilkinson (1970). "The Least Squares Problem and Pseudo-Inverses,'' Comput. J. 19, 309-16. A.K. Cline (1973). "An Elimination Method for the Solution of Linear Least Squares Problems,'' SIAM J. Numer. Anal. 1 0, 283-289. R.J. Plemmons (1974). "Linear Least Squares by Elimination and MGS," J. ACM 21, 581-585.
  • 297. 5.3. The Full-Rank Least Squares Problem 273 The seminormal equations are given by RTRx = ATb where A = QR. It can be shown that by solving the serninormal equations an acceptable LS solution is obtained if one step of fixed precision iterative improvement is performed, see: A. Bjorck (1987). "Stability Analysis of the Method of Seminormal Equations," Lin. Alg. Applic. 88/89, 31-48. Survey treatments of LS perturbation theory include Lawson and Hanson (SLS), Stewart and Sun (MPT), and Bjorck (NMLS). See also: P.-A. Wedin (1973). "Perturbation Theory for Pseudoinverses," BIT 13, 217-232. A. Bjorck (1991). "Component-wise Perturbation Analysis and Error Bounds for "Linear Least Squares Solutions," BIT 31, 238-244. B. Walden, R. Karlson, J. Sun (1995). "Optimal Backward Perturbation Bounds for the Linear Least Squares Problem," Numerical Lin. Alg. Applic. 2, 271-286. J.-G. Sun (1996). "Optimal Backward Perturbation Bounds for the Linear Least-Squares Problem with Multiple Right-Hand Sides," IMA J. Numer. Anal. 16, 1-11. J.-G. Sun (1997). "On Optimal Backward Perturbation Bounds for the Linear Least Squares Problem," BIT 37, 179-188. R. Karlson and B. Walden (1997). "Estimation of Optimal Backward Perturbation Bounds for the Linear Least Squares Problem," BIT 37, 862-869. J.-G. Sun (1997). "On Optimal Backward Perturbation Bounds for the Linear Least Squares Problem," BIT 37, 179-188. M. Gu (1998). "Backward Perturbation Bounds for Linear Least Squares Problems," SIAM J. Matrix Anal. Applic. 20, 363-372. M. Arioli, M. Baboulin and S. Gratton (2007). "A Partial Condition Number for Linear Least Squares Problems," SIAM J. Matrix Anal. Applic. 29, 413 433. M. Baboulin, J. Dongarra, S. Gratton, and J. Langon (2009). "Computing the Conditioning of the Components of a Linear Least-Squares Solution," Num. Lin. Alg. Applic. 16, 517-533. M. Baboulin and S. Gratton (2009). "Using Dual Techniques to Derive Componentwise and Mixed Condition Numbers for a Linear Function of a Least Squares Solution," BIT 49, 3-19. J. Grear (2009). "Nuclear Norms of Rank-2 Matrices for Spectral Condition Numbers of Rank Least Squares Solutions," ArXiv:l003.2733v4. J. Grear (2010). "Spectral Condition Numbers of Orthogonal Projections and Full Rank Linear Least Squares Residuals," SIAM J. Matri.1: Anal. Applic. 31, 2934-2949. Practical insights into the accuracy of a computed least squares solution can be obtained by applying the condition estimation ideas of §3.5. to the R matrix in A = QR or the Cholesky factor of ATA should a normal equation approach be used. For a discussion of LS-specific condition estimation, see: G.W. Stewart (1980). "The Efficient Generation of Random Orthogonal Matrices with an Application to Condition Estimators," SIAM J. Numer. Anal. 1 7, 403-9. S. Gratton (1996). "On the Condition Number of Linear Least Squares Problems in a Weighted Frobenius Norm," BIT 36, 523-530. C.S. Kenney, A. .J. Laub, and M.S. Reese (1998). "Statistical Condition Estimation for Linear Least Squares," SIAM J. Matrix Anal. Applic. 19, 906-923. Our restriction to least squares approximation is not a vote against minimization in other norms. There are occasions when it is advisable to minimize II Ax - b llP for p = 1 and oo. Some algorithms for doing this are described in: A.K. Cline (1976). "A Descent Method for the Uniform Solution to Overdetermined Systems of Equations," SIAM J. Numer. Anal. 13, 293-309. R.H. Bartels, A.R. Conn, and C. Charalambous (1978). 
"On Cline's Direct Method for Solving Overdetermined Linear Systems in the L00 Sense," SIAM J. Numer. Anal. 15, 255-270. T.F. Coleman and Y. Li (1992). "A Globally and Quadratically Convergent Affine Scaling Method for Linear Li Problems," Mathematical Programming 56, Series A, 189-222. Y. Li (1993). "A Globally Convergent Method for Lp Problems," SIAM J. Optim. 3, 609-629. Y. Zhang (1993). ·'A Primal-Dual Interior Point Approach for Computing the Li and L00 Solutions of Overdeterrnirwd Linear Systems," J. Optim. Theory Applic. 77, 323-341. Iterative improvement in the least squares context is discussed in:
  • 298. 274 Chapter 5. Orthogonalization and Least Squares G.H. Golub and J.H. Wilkinson (1966). "Note on Iterative Refinement of Least Squares Solutions," Numer. Math. 9, 139-148. A. Bjorck and G.H. Golub (1967). "Iterative Refinement of Linear Least Squares Solutions by House- holder Transformation," BIT 7, 322-337. A. Bjorck (1967). "Iterative Refinement of Linear Least Squares Solutions I," BIT 7, 257--278. A. Bjorck (1968). "Iterative Refinement of Linear Least Squares Solutions II," BIT 8, 8-30. J. Gluchowska and A. Smoktunowicz (1999). "Solving the Linear Least Squares Problem with Very High Relative Acuracy," Computing 45, 345-354. J. Demmel, Y. Hida, and E.J. Riedy (2009). "Extra-Precise Iterative Refinement for Overdetermined Least Squares Problems," ACM Trans. Math. Softw. 35, Article 28. The following texts treat various geometric matrix problems that arise in computer graphics and vision: A.S. Glassner (1989). An Introduction to Ray Tracing, Morgan Kaufmann, Burlington, MA. R. Hartley and A. Zisserman (2004). Multiple View Geometry in Computer Vision, Second Edition, Cambridge University Press, New York. M. Pharr and M. Humphreys (2010). Physically Based Rendering, from Theory to Implementation, Second Edition, Morgan Kaufmann, Burlington, MA. For a numerical perspective, see: W. Kahan (2008). "Computing Cross-Products and Rotations in 2- and 3-dimensional Euclidean Spaces," http://www.cs.berkeley.edu/ wkahan/MathHl 10/Cross.pdf. 5.4 Other Orthogonal Factorizations SupposeA E 1R.mx4 hasathin QRfactorizationofthefollowingform: A Note that ran(A) hasdimension3 but does not equal span{q1, Q2, q3}, span{q1, Q2, q4}, span{Q1,q3, q4}, orspan{Q2, q3,q4} becausea4 docsnotbelongtoanyofthesesubspaces. Inthiscase, theQRfactorizationrevealsneithertherangenorthenullspaceofA and the number of nonzeros on R's diagonal does not equal its rank. Moreover, the LS solutionprocessbasedontheQRfactorization (Algorithm5.3.2) breaksdownbecause the upper triangular portion ofRissingular. Westart thissectionby introducingseveraldecompositions that overcomethese shortcomings. They all have the form QTAZ = T where T is a structured block triangular matrix that sheds light on A's rank, range, and nullspacc. We informally refertomatrixreductionsofthisformasrank revealing. SeeChandrasckarenandIpsen (1994) foramorepreciseformulationoftheconcept. Our focus is on a modification of the QR factorization that involves column pivoting. The resulting R-matrix has a structure that supports rank estimation. To set thestage for updatingmethods, we briefly discus the ULV and UTV frameworks Updatingisdiscussedin§6.5 andreferstotheefficientrecomputationofafactorization afterthematrixundergoesalow-rankchange. AllthesemethodscanberegardedasinexpensivealternativestotheSVD,which represents the "gold standard" in the area of rank determination. Nothing "takes apart" a matrix so conclusively as the SVD and so we include an explanation ofits airtightreliability. Thecomputationofthefull SVD, whichwediscussin §8.6, begins
with the reduction to bidiagonal form using Householder matrices. Because this decomposition is important in its own right, we provide some details at the end of this section.

5.4.1 Numerical Rank and the SVD

Suppose A ∈ R^{m×n} has SVD U^T A V = Σ = diag(σ_i). If rank(A) = r < n, then according to the exact arithmetic discussion of §2.4 the singular values σ_{r+1}, ..., σ_n are zero and

    A = Σ_{k=1}^{r} σ_k u_k v_k^T.                                                   (5.4.1)

The exposure of rank degeneracy could not be more clear.

In Chapter 8 we describe the Golub-Kahan-Reinsch algorithm for computing the SVD. Properly implemented, it produces nearly orthogonal matrices Û and V̂ so that

    Û^T A V̂  ≈  Σ̂  =  diag(σ̂_1, ..., σ̂_n),        σ̂_1 ≥ ··· ≥ σ̂_n.

(Other SVD procedures have this property as well.) Unfortunately, unless remarkable cancellation occurs, none of the computed singular values will be zero because of roundoff error. This forces an issue. On the one hand, we can adhere to the strict mathematical definition of rank, count the number of nonzero computed singular values, and conclude from

    A ≈ Σ_{k=1}^{n} σ̂_k û_k v̂_k^T                                                    (5.4.2)

that A has full rank. However, working with every matrix as if it possessed full column rank is not particularly useful. It is more productive to liberalize the notion of rank by setting small computed singular values to zero in (5.4.2). This results in an approximation of the form

    A ≈ Σ_{k=1}^{r̂} σ̂_k û_k v̂_k^T,                                                   (5.4.3)

where we regard r̂ as the numerical rank. For this approach to make sense we need to guarantee that |σ_i - σ̂_i| is small.

For a properly implemented Golub-Kahan-Reinsch SVD algorithm, it can be shown that

    Û = W + ΔU,        W^T W = I_m,        ‖ΔU‖₂ ≤ ε,
    V̂ = Z + ΔV,        Z^T Z = I_n,        ‖ΔV‖₂ ≤ ε,                                (5.4.4)
    Σ̂ = W^T(A + ΔA)Z,                      ‖ΔA‖₂ ≤ ε‖A‖₂,

where ε is a small multiple of u, the machine precision. In other words, the SVD algorithm computes the singular values of a nearby matrix A + ΔA.
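A small NumPy sketch (ours) of the rank decision just described: compute the singular values, choose a tolerance tied to the machine precision or to the error level of the data, and truncate. The tolerance is formalized as the δ of (5.4.5) below; the default used in the sketch is a common heuristic, not the text's prescription.

    import numpy as np

    def numerical_rank(A, delta=None):
        """Numerical rank of A: the number of computed singular values exceeding delta."""
        A = np.asarray(A, dtype=float)
        s = np.linalg.svd(A, compute_uv=False)      # sigma_hat_1 >= ... >= sigma_hat_n
        if delta is None:
            # common default, similar in spirit to delta = u*||A||
            delta = np.finfo(A.dtype).eps * max(A.shape) * (s[0] if s.size else 0.0)
        return int(np.sum(s > delta))

    def truncated_svd(A, r):
        """Best rank-r approximation of A, cf. (5.4.3)."""
        U, s, Vt = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
        return (U[:, :r] * s[:r]) @ Vt[:r, :]

    # Example: a 3-by-3 matrix that is "numerically" rank 2.
    # B = np.outer([1, 2, 3.], [1, 0, 1.]) + np.outer([0, 1, 1.], [2, 1, 0.])
    # B += 1e-12 * np.random.randn(3, 3)
    # print(numerical_rank(B, delta=1e-8))          # -> 2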
Note that \hat{U} and \hat{V} are not necessarily close to their exact counterparts. However, we can show that \hat{\sigma}_k is close to \sigma_k as follows. Using Corollary 2.4.6 we have

    \hat{\sigma}_k = \min_{rank(B)=k-1} \| \hat{\Sigma} - B \|_2 = \min_{rank(B)=k-1} \| (\tilde{\Sigma} - B) + E \|_2

where

    \tilde{\Sigma} = W^T A Z    and    E = W^T \Delta A Z.

Since

    \min_{rank(B)=k-1} \| \tilde{\Sigma} - B \|_2 - \| E \|_2 \le \min_{rank(B)=k-1} \| (\tilde{\Sigma} - B) + E \|_2 \le \min_{rank(B)=k-1} \| \tilde{\Sigma} - B \|_2 + \| E \|_2

and

    \min_{rank(B)=k-1} \| \tilde{\Sigma} - B \|_2 = \sigma_k,    \| E \|_2 \le \epsilon \| A \|_2 = \epsilon \sigma_1,

it follows that

    | \sigma_k - \hat{\sigma}_k | \le \epsilon \sigma_1    for k = 1:n.

Thus, if A has rank r, then we can expect n - r of the computed singular values to be small. Near rank deficiency in A cannot escape detection if the SVD of A is computed.

Of course, all this hinges on having a definition of "small." This amounts to choosing a tolerance \delta > 0 and declaring A to have numerical rank \hat{r} if the computed singular values satisfy

    \hat{\sigma}_1 \ge \cdots \ge \hat{\sigma}_{\hat{r}} \ge \delta > \hat{\sigma}_{\hat{r}+1} \ge \cdots \ge \hat{\sigma}_n.    (5.4.5)

We refer to the integer \hat{r} as the \delta-rank of A. The tolerance should be consistent with the machine precision, e.g., \delta = u \| A \|_\infty. However, if the general level of relative error in the data is larger than u, then \delta should be correspondingly bigger, e.g., \delta = 10^{-2} \| A \|_\infty if the entries in A are correct to two digits.

For a given \delta it is important to stress that, although the SVD provides a great deal of rank-related insight, it does not change the fact that the determination of numerical rank is a sensitive computation. If the gap between \hat{\sigma}_{\hat{r}} and \hat{\sigma}_{\hat{r}+1} is small, then A is also close (in the \delta sense) to a matrix with rank \hat{r} - 1. Thus, the amount of confidence we have in the correctness of \hat{r} and in how we proceed to use the approximation (5.4.3) depends on the gap between \hat{\sigma}_{\hat{r}} and \hat{\sigma}_{\hat{r}+1}.

5.4.2 QR with Column Pivoting

We now examine alternative rank-revealing strategies to the SVD, starting with a modification of the Householder QR factorization procedure (Algorithm 5.2.1). In exact arithmetic, the modified algorithm computes the factorization

    Q^T A \Pi = R = [ R_{11}  R_{12} ; 0  0 ],    R_{11} \in \mathbb{R}^{r\times r},  R_{12} \in \mathbb{R}^{r\times(n-r)},    (5.4.6)
where r = rank(A), Q is orthogonal, R_{11} is upper triangular and nonsingular, and \Pi is a permutation. If we have the column partitionings A\Pi = [ a_{c_1} | \cdots | a_{c_n} ] and Q = [ q_1 | \cdots | q_m ], then for k = 1:n we have

    a_{c_k} = \sum_{i=1}^{\min\{r,k\}} r_{ik} q_i \in span\{ q_1, ..., q_r \},

implying

    ran(A) = span\{ q_1, ..., q_r \}.

To see how to compute such a factorization, assume for some k that we have computed Householder matrices H_1, ..., H_{k-1} and permutations \Pi_1, ..., \Pi_{k-1} such that

    (H_{k-1} \cdots H_1) A (\Pi_1 \cdots \Pi_{k-1}) = R^{(k-1)} = [ R_{11}^{(k-1)}  R_{12}^{(k-1)} ; 0  R_{22}^{(k-1)} ],    (5.4.7)

where R_{11}^{(k-1)} is a nonsingular and upper triangular (k-1)-by-(k-1) matrix and R_{22}^{(k-1)} is (m-k+1)-by-(n-k+1). Now suppose that

    R_{22}^{(k-1)} = [ z_k^{(k-1)} | \cdots | z_n^{(k-1)} ]

is a column partitioning and let p \ge k be the smallest index such that

    \| z_p^{(k-1)} \|_2 = \max\{ \| z_k^{(k-1)} \|_2, ..., \| z_n^{(k-1)} \|_2 \}.    (5.4.8)

Note that if rank(A) = k-1, then this maximum is zero and we are finished. Otherwise, let \Pi_k be the n-by-n identity with columns p and k interchanged and determine a Householder matrix H_k such that if R^{(k)} = H_k R^{(k-1)} \Pi_k, then R^{(k)}(k+1:m, k) = 0. In other words, \Pi_k moves the largest column in R_{22}^{(k-1)} to the lead position and H_k zeroes all of its subdiagonal components.

The column norms do not have to be recomputed at each stage if we exploit the property

    Q^T z = [ \alpha ; w ],    \alpha \in \mathbb{R},  w \in \mathbb{R}^{s-1}    \Longrightarrow    \| w \|_2^2 = \| z \|_2^2 - \alpha^2,

which holds for any orthogonal matrix Q \in \mathbb{R}^{s\times s}. This reduces the overhead associated with column pivoting from O(mn^2) flops to O(mn) flops because we can get the new column norms by updating the old column norms, e.g.,

    \| z_i^{(k)} \|_2^2 = \| z_i^{(k-1)} \|_2^2 - r_{ki}^2,    i = k+1:n.

Combining all of the above, we obtain the following algorithm, first presented by Businger and Golub (1965):
Algorithm 5.4.1 (Householder QR With Column Pivoting) Given A \in \mathbb{R}^{m\times n} with m \ge n, the following algorithm computes r = rank(A) and the factorization (5.4.6) with Q = H_1 \cdots H_r and \Pi = \Pi_1 \cdots \Pi_r. The upper triangular part of A is overwritten by the upper triangular part of R and components j+1:m of the jth Householder vector are stored in A(j+1:m, j). The permutation \Pi is encoded in an integer vector piv. In particular, \Pi_j is the identity with rows j and piv(j) interchanged.

    for j = 1:n
        c(j) = A(1:m, j)^T A(1:m, j)
    end
    r = 0
    τ = max{c(1), ..., c(n)}
    while τ > 0 and r < n
        r = r + 1
        Find smallest k with r ≤ k ≤ n so c(k) = τ.
        piv(r) = k
        A(1:m, r) ↔ A(1:m, k)
        c(r) ↔ c(k)
        [v, β] = house(A(r:m, r))
        A(r:m, r:n) = (I_{m-r+1} - β v v^T) A(r:m, r:n)
        A(r+1:m, r) = v(2:m-r+1)
        for i = r+1:n
            c(i) = c(i) - A(r, i)^2
        end
        τ = max{c(r+1), ..., c(n)}
    end

This algorithm requires 4mnr - 2r^2(m+n) + 4r^3/3 flops where r = rank(A).

5.4.3 Numerical Rank and A\Pi = QR

In principle, QR with column pivoting reveals rank. But how informative is the method in the context of floating point arithmetic? After k steps we have computed

    \hat{R}^{(k)} = [ \hat{R}_{11}^{(k)}  \hat{R}_{12}^{(k)} ; 0  \hat{R}_{22}^{(k)} ],    \hat{R}_{11}^{(k)} \in \mathbb{R}^{k\times k},  \hat{R}_{22}^{(k)} \in \mathbb{R}^{(m-k)\times(n-k)}.    (5.4.9)

If \hat{R}_{22}^{(k)} is suitably small in norm, then it is reasonable to terminate the reduction and declare A to have rank k. A typical termination criterion might be

    \| \hat{R}_{22}^{(k)} \|_2 \le \epsilon_1 \| A \|_2
  • 303. 5.4. Other Orthogonal Factorizations 279 for some small machine-dependent parameter €1. In view of the roundoff properties associated with Householder matrix computation (cf. §5.1.12), we know that R(k) is the exact R-factor of a matrix A + Ek, where €2 = O(u). Using Corollary 2.4.4 we have O'k+l(A + Ek) = O'k+l(R(k)) � II Ei�) 112 . Since O'k+i(A) � O"k+l (A + Ek) + 11 Ek 112, it follows that In other words, a relative perturbation of 0(€1 + €2) in A can yield a rank-k matrix. With this termination criterion, we conclude that QR with column pivoting discovers rank deficiency if R��) is small for some k < n. However, it does not follow that the matrix R��) in (5.4.9) is small if rank(A) = k. There are examples of nearly rank deficient matrices whose R-factor look perfectly "normal." A famous example is the Kahan matrix 1 -c -c -c 0 1 -c -c Kahn(s) diag(l, s, . . . ' sn-l) 1 -c 0 1 Here, c2+s2 = 1 with c, s > 0. (See Lawson and Hanson (SLS, p. 31).) These matrices are unaltered by Algorithm 5.4.1 and thus II ��) 112 � sn-l for k = l:n - 1 . This inequality implies (for example) that the matrix Kah300(.99) has no particularly small trailing principal submatrix since s299 � .05. However, a calculation shows that 0'300 = 0(10-19). Nevertheless, in practice, small trailing R-suhmatrices almost always emerge that correlate well with the underlying rank. In other words, it is almost always the case that R��) is small if A has rank k. 5.4.4 Finding a Good Column Ordering It is important to appreciate that Algorithm 5.4.1 is just one way to determine the column pemmtation II. The following result sets the stage for a better way. Theorem 5.4.1. If A E Rmxn and v E Rn is a unit 2-norm vector, then there exists a permutation II so that the QR factorization AII = QR satisfies lrnnl < ..fii,a where a = 11 Av 112.
  • 304. 280 Chapter 5. Orthogonalization and Least Squares Proof. Suppose II E JR.nxn is a permutation such that if w = IITv, then lwnl = max lvil· Since Wn is the largest component of a unit 2-norm vector, lwnl � 1/.Jii,. If AII = QR is a QR factorization, then u = II Av 112 = II (QTAII)(IITv) 112 = II R(l:n, l:n)w 112 � lrnnWnl � lrnnl/.Jii,. D Note that if v = Vn is the right singular vector corresponding to Umin (A), then lrnnl ::; .Jii,an. This suggests a framework whereby the column permutation matrix II is based on an estimate of Vn: Step 1. Compute the QR factorization A = Qollo and note that Ro has the same right singular vectors as A. Step 2. Use condition estimation techniques to obtain a unit vector v with II flov 112 :::::: Un. Step 3. Determine II and the QR factorization AII = QR. See Chan (1987) for details about this approach to rank determination. The permu­ tation II can be generated as a sequence of swap permutations. This supports a very economical Givens rotation method for generating of Q and R from Qo and Ro. 5.4.5 More General Rank-Revealing Decompositions Additional rank-revealing strategies emerge if we allow general orthogonal recombina­ tions ofthe A's columns instead ofjust permutations. That is, we look for an orthogonal Z so that the QR factorization AZ = QR produces a rank-revealing R. To impart the spirit of this type of matrix reduction, we show how the rank-revealing properties of a given AZ = QR factorization can be improved by replacing Z, Q, and R with respectively, where Qa and Za are products of Givens rotations and Rnew is upper triangular. The rotations are generated by introducing zeros into a unit 2-norm n­ vector v which we assume approximates the n-th right singular vector of AZ. In particular, if Z'{;v = en = In(:, n) and 11 Rv 112 :::::: Un, then II Rnew€n 112 = II Q'[;RZaen 112 = II Q�Rv 112 = II Rv 112 :::::: Un This says that the norm of the last column of Rnew is approximately the smallest singular value of A, which is certainly one way to reveal the underlying matrix rank. We use the case n = 4 to illustrate how the Givens rotations arise and why the overall process is economical. Because we are transforming v to en and not e1, we need to "flip" the mission of the 2-by-2 rotations in the Za computations so that top components are zeroed, i.e., [ � l = [_: : l [ : l·
  • 305. 5.4. Other Orthogonal Factorizations This requires only a slight modification of Algorithm 5.1.3. In the n = 4 case we start with and proceed to compute and x x 0 0 x x x 0 �l v [n 281 as products of Givens rotations. The first step is to zero the top component of v with a "flipped" (1,2) rotation and update R accordingly: x x 0 0 x x x 0 rn To remove the unwanted subdiagonal in R, we apply a conventional (nonflipped) Givens rotation from the left to R (but not v): The next step is analogous: [� And finally, [� x x 0 0 x x x 0 x x 0 0 x x 0 0 x x x 0 x x x 0 x x x 0 x x x x v v [�l [�l v [H [H
  • 306. 282 Chapter 5. Orthogonalization and Least Squares = [� � � �1 0 0 x x ' 0 0 0 x v [H The pattern is clear, for i = l:n - 1, a Gi,i+l is used to zero the current Vi and an Hi,i+l is used to zero the current ri+l,i· The overall transition from {Q, Z, R} to {Qnew, Znew, Rnew} involves O(mn) flops. If the Givens rotations are kept in factored form, this flop count is reduced to O(n2). We mention that the ideas in this subsection can be iterated to develop matrix reductions that expose the structure of matrices whose rank is less than n - 1. "Zero-chasing" with Givens rotations is at the heart of many important matrix algorithms; see §6.3, §7.5, and §8.3. 5.4.6 The UTV Framework As mentioned at the start of this section, we are interested in factorizations that are cheaper than the SVD but which provide the same high quality information about rank, range, and nullspace. Factorizations of this type are referred to as UTV factorizations where the "T" stands for triangular and the "U" and "V" remind us of the SVD and orthogonal U and V matrices of singular vectors. The matrix T can be upper triangular (these are the URV factorizations) or lower triangular (these are the ULV factorizations). It turns out that in a particular application one may favor a URV approach over a ULV approach, see §6.3. More­ over, the two reductions have different approximation properties. For example, sup­ pose <7k(A) > O"k+l(A) and S is the subspace spanned by A's right singular vectors Vk+i,. . . , vn. Think of S as an approximate nullspace of A. Following Stewart (1993), if UTAV =R = [ �1 �:: ]m�k k n-k and V = [ Vi I V2 ] is partitioned conformably, then where . ( ( ) S) II R12 112 dist ran l'2 , $ (l _ 2 ) . (R ) Pn <1mm 11 II R22 112 pn = <1min(R11) is assumed to be less than 1. On the other hand, in the ULV setting we have UTAV =L = [ Lu 0 ] k L21 L22 m-k k n-k (5.4.10)
  • 307. 5.4. Other Orthogonal Factorizations If V = [ Vi I V2 ] is partitioned conformably, then where dist(ran(V2), S) � II £12 112 PL (1 - PDCTmin(Lu) II L22 ll2 PL = CTmin(Lu) 283 (5.4.11) is also assumed to be less than 1. However, in practice the p-factors in both (5.4.10) and (5.4.11) are often much less than 1. Observe that when this is the case, the upper bound in (5.4.11) is much smaller than the upper bound in (5.4.10). 5.4.7 Complete Orthogonal Decompositions Related to the UTV framework is the idea of a complete orthogonal factorization. Here we compute orthogonal U and V such that UTAV = [ Tu 0 0 ] r 0 m-r r n-r (5.4.12) where r = rank(A). The SVD is obviously an example of a decomposition that has this structure. However, a cheaper, two-step QR process is also possible. We first use Algorithm 5.4.1 to compute r n-r and then follow up with a second QR factorization via Algorithm 5.2.1. If we set V = IIQ, then (5.4.12) is realized with Tu = S'{. Note that two important subspaces are defined by selected columns of U = [ u1 I · · · I Um ] and V = [ V1 I · · · I Vn ] : ran(A) = span{u1, . . . , Ur }, null(A) = span{Vr+ 1 , . . . , Vn }· Of course, the computation of a complete orthogonal decomposition in practice would require the careful handling of numerical rank.
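The two-step construction just described is easy to prototype. The sketch below is an illustration, not the book's code: it uses SciPy's column-pivoted QR in place of Algorithm 5.4.1, and the diagonal-based rank tolerance is an assumption made for the example. It produces orthogonal U and V with U^T A V block structured as in (5.4.12).

    import numpy as np
    from scipy.linalg import qr

    def complete_orth_decomp(A, tol=None):
        """Complete orthogonal decomposition U.T @ A @ V = [[T11, 0], [0, 0]] (up to roundoff).
        Step 1: column-pivoted QR, A P = Q [R11 R12; 0 ~0], rank r estimated from diag(R).
        Step 2: QR of [R11 R12].T annihilates R12; T11 = S[:r,:].T is lower triangular."""
        m, n = A.shape
        Q, R, piv = qr(A, pivoting=True)              # A[:, piv] = Q @ R
        if tol is None:
            tol = max(m, n) * np.finfo(A.dtype).eps * abs(R[0, 0])
        r = int(np.sum(np.abs(np.diag(R)) > tol))     # heuristic numerical rank
        P = np.eye(n)[:, piv]                          # permutation matrix with A @ P = Q @ R
        Qbar, S = qr(R[:r, :].T)                       # [R11 R12].T = Qbar @ S, S is n-by-r
        U = Q
        V = P @ Qbar
        T = U.T @ A @ V                                # leading r-by-r block is S[:r,:].T
        return U, T, V, r

    rng = np.random.default_rng(1)
    B = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 5))   # 8-by-5, rank 3
    U, T, V, r = complete_orth_decomp(B)
    print(r)                                                        # 3
    print(np.linalg.norm(T[:r, r:]), np.linalg.norm(T[r:, :]))      # both at roundoff level

As the surrounding text stresses, the reliability of such a routine in practice rests entirely on how the numerical rank r is chosen.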
  • 308. 284 Chapter 5. Orthogonalization and Least Squares 5.4.B Bidiagonalization There is one other two-sided orthogonal factorization that is important to discuss and that is the bidiagonal factorization. It is not a rank-revealing factorization per se, but it has a useful role to play because it rivals the SVD in terms of data compression. Suppose A E 1Rmxn and m � n. The idea is to compute orthogonal Un (m-by-m) and V8 (n-by-n) such that d1 Ji 0 0 0 d2 h 0 U'{;AVB 0 dn-l fn-1 (5.4.13) 0 0 dn 0 Un = U1 · · · Un and V8 = Vi · · · Vn-2 can each be determined as a product of House­ holder matrices, e.g., x 0 0 0 0 [� x 0 0 0 0 x 0 0 x x 0 0 x x 0 x x 0 x x x x x x x x x x x x x x x x x 0 x x x x x x x x x �I x 0 � o0 0 x x 0 x 0 0 0 0 0 0 0 x x 0 0 x x x x x x x 0 0 0 x x x x x 0 x x x x �I 0 x x x x In general, Uk introduces zeros into the kth column, while Vk zeros the appropriate entries in row k. Overall we have: Algorithm 5.4.2 (Householder Bidiagonalization) Given A E 1Rmxn with m � n, the following algorithm overwrites A with U'{;AV8 = B where B is upper bidiagonal and Un = U1 · · · Un and V8 = V1 · · · Vn-2· The essential part of Uj's Householder vector is stored in A(j + l:m,j) and the essential part of Vj's Householder vector is stored in A(j, j + 2:n).
    for j = 1:n
        [v, β] = house(A(j:m, j))
        A(j:m, j:n) = (I_{m-j+1} - β v v^T) A(j:m, j:n)
        A(j+1:m, j) = v(2:m-j+1)
        if j ≤ n-2
            [v, β] = house(A(j, j+1:n)^T)
            A(j:m, j+1:n) = A(j:m, j+1:n)(I_{n-j} - β v v^T)
            A(j, j+2:n) = v(2:n-j)^T
        end
    end

This algorithm requires 4mn^2 - 4n^3/3 flops. Such a technique is used by Golub and Kahan (1965), where bidiagonalization is first described. If the matrices U_B and V_B are explicitly desired, then they can be accumulated in 4m^2n - 4n^3/3 and 4n^3/3 flops, respectively. The bidiagonalization of A is related to the tridiagonalization of A^T A. See §8.3.1.

5.4.9 R-Bidiagonalization

If m ≫ n, then a faster bidiagonalization method results if we upper triangularize A first before applying Algorithm 5.4.2. In particular, suppose we compute an orthogonal Q \in \mathbb{R}^{m\times m} such that

    Q^T A = [ R_1 ; 0 ],    R_1 \in \mathbb{R}^{n\times n},

is upper triangular. We then bidiagonalize the square matrix R_1,

    U_R^T R_1 V_B = B_1,

where U_R and V_B are orthogonal and B_1 is upper bidiagonal. If U_B = Q diag(U_R, I_{m-n}), then

    U_B^T A V_B = [ B_1 ; 0 ] = B

is a bidiagonalization of A. The idea of computing the bidiagonalization in this manner is mentioned by Lawson and Hanson (SLS, p. 119) and more fully analyzed by Chan (1982). We refer to this method as R-bidiagonalization and it requires (2mn^2 + 2n^3) flops. This is less than the flop count for Algorithm 5.4.2 whenever m ≥ 5n/3.

Problems

P5.4.1 Let x, y \in \mathbb{R}^m and Q \in \mathbb{R}^{m\times m} be given with Q orthogonal. Show that if

    Q^T x = [ α ; u ],    Q^T y = [ β ; v ],    α, β \in \mathbb{R},  u, v \in \mathbb{R}^{m-1},

then u^T v = x^T y - αβ.
  • 310. 286 Chapter 5. Orthogonalization and Least Squares P5.4.2 Let A = [ a1 I · · · I an ) E Rmxn and b E Rm be given. For any column subset {ac1 , . . . , ack } define min II [ Uc1 I · · · I ack ) X - b 112 x E Rk Describe an alternative pivot selection procedure for Algorithm 5.4.1 such that if QR All [ ac1 I · · · I acn J in the final factorization, then for k = l:n: min res ([aq , . . . , ack-l , ac.J) . i � k P5.4.3 Suppose T E Rnxn is upper triangular and tkk = lTmin(T). Show that T(l:k - 1, k) = 0 and T(k, k + l:n) = 0. P5.4.4 Suppose A E Rmxn with m 2 n. Give an algorithm that uses Householder matrices to compute an orthogonal Q E Rmxm so that if QTA = L, then L(n + l:m, :) = 0 and L(l:n, l:n) is lower triangular. P5.4.5 Suppose R E Rnxn is upper triangular and Y E Rnxj has orthonormal columns and satisfies II RY 112 = u. Give an algorithm that computes orthogonal U and V, each products of Givens rotations, so that UTRV = Rnew is upper triangular and vTy = Ynew has the property that Ynew(n -j + l:n, :) = diag(±l). What can you say about Rncw(n - j + l:n, n - j + l:n)? P5.4.6 Give an algorithm for reducing a complex matrix A to real bidiagonal form using complex Householder transformations. P5.4.7 Suppose B E Rnxn is upper bidiagonal with bnn = 0. Show how to construct orthogonal U and V (product of Givens rotations) so that UTBV is upper bidiagonal with a zero nth column. P5.4.8 Suppose A E Rmxn with m < n. Give an algorithm for computing the factorization uTAv = [ B I O J where B is an m-by-m upper bidiagonal matrix. (Hint: Obtain the form x x 0 0 0 x x 0 0 0 x x 0 0 0 x using Householder matrices and then "chase" the (m, m+ 1) entry up the (m+ l)st column by applying Givens rotations from the right.) P5.4.9 Show how to efficiently bidiagonalize an n-by-n upper triangular matrix using Givens rotations. P5.4.10 Show how to upper bidiagonalize a tridiagonal matrix T E Rnxn using Givens rotations. P5.4.ll Show that if B E R'xn is an upper bidiagonal matrix having a repeated singular value, then B must have a zero on its diagonal or superdiagonal. Notes and References for §5.4 QR with column pivoting was first discussed in: P.A. Businger and G.H. Golub (1965). "Linear Least Squares Solutions by Householder Transforma- tions," Numer. Math. 7, 269-276. In matters that concern rank deficiency, it is helpful to obtain information about the smallest singular value of the upper triangular matrix R. This can be done using the techniques of §3.5.4 or those that are discussed in: I. Karasalo (1974). "A Criterion for Truncation of the QR Decomposition Algorithm for the Singular Linear Least Squares Problem,'' BIT 14, 156-166. N. Anderson and I. Karasalo (1975). "On Computing Bounds for the Least Singular Value of a Triangular Matrix,'' BIT 15, 1-4.
  • 311. 5.4. Other Orthogonal Factorizations 287 C.-T. Pan and P.T.P. Tang (1999). "Bounds on Singular Values Revealed by QR Factorizations," BIT 39, 740-756. C.H. Bischof (1990). "Incremental Condition Estimation," SIAM J. Matrix Anal. Applic., 11, 312- 322. Revealing the rank of a matrix through a carefully implementated factorization has prompted a great deal of research, see: T.F. Chan (1987). "Rank Revealing QR Factorizations," Lin. Alg. Applic. 88/8g, 67-82. T.F. Chan and P. Hansen (1992). "Some Applications of the Rank Revealing QR Factorization," SIAM J. Sci. Stat. Comp. 13, 727-741. S. Chandrasekaren and l.C.F. Ipsen (1994). "On Rank-Revealing Factorizations," SIAM J. Matrix Anal. Applic. 15, 592-622. M. Gu and S.C. Eisenstat (1996). "Efficient Algorithms for Computing a Strong Rank-Revealing QR Factorization," SIAM J. Sci. Comput. 1 7, 848-869. G.W. Stewart (1999). "The QLP Approximation to the Singular Value Decomposition," SIAM J. Sci. Comput. 20, 1336-1348. D.A. Huckaby and T.F. Chan {2005). "Stewart's Pivoted QLP Decomposition for Low-Rank Matri­ ces," Num. Lin. Alg. Applic. 12, 153- 159. A. Dax (2008). "Orthogonalization via Deflation: A Minimum Norm Approach to Low-Rank Approx­ imation of a Matrix," SIAM J. Matrix Anal. Applic. 30, 236-260. z. Drma.C and Z. Bujanovic (2008). "On the Failure of Rank-Revealing QR Factorization Software-A Case Study," ACM Trans. Math. Softw. 35, Article 12. We have more to say about the UTV framework in §6.5 where updating is discussed. Basic references for what we cover in this section include: G.W. Stewart (1993). "UTV Decompositions," in Numerical Analysis 1993, Proceedings of the 15th Dundee Conference, June-July 1993, Longman Scientic & Technical, Harlow, Essex, UK, 225-236. P.A. Yoon and J.L. Barlow (1998) "An Efficient Rank Detection Procedure for Modifying the ULV Decomposition," BIT 38, 781-801. J.L. Barlow, H. Erbay, and I. Slapnicar (2005). "An Alternative Algorithm for the Refinement of ULV Decompositions," SIAM J. Matrix Anal. Applic. 27, 198-211. Column-pivoting makes it more difficult to achieve high performance when computing the QR factor­ ization. However, it can be done: C.H. Bischof and P.C. Hansen (1992). "A Block Algorithm for Computing Rank-Revealing QR Fac­ torizations," Numer. Algorithms 2, 371-392. C.H. Bischof and G. Quintana-Orti (1998). "Computing Rank-revealing QR factorizations of Dense Matrices,'' ACM Trans. Math. Softw. 24, 226-253. C.H. Bischof and G. Quintana-Orti (1998). "Algorithm 782: Codes for Rank-Revealing QR factoriza­ tions of Dense Matrices," A CM Trans. Math. Softw. 24, 254-257. G. Quintana-Orti, X. Sun, and C.H. Bischof (1998). "A BLAS-3 Version ofthe QR Factorization with Column Pivoting," SIAM J. Sci. Comput. 19, 1486-1494. A carefully designed LU factorization can also be used to shed light on matrix rank: T-M. Hwang, W-W. Lin, and E.K. Yang (1992). "Rank-Revealing LU Factorizations," Lin. Alg. Applic. 175, 115-141. T.-M. Hwang, W.-W. Lin and D. Pierce (1997). "Improved Bound for Rank Revealing LU Factoriza­ tions," Lin. Alg. Applic. 261, 173-186. L. Miranian and M. Gu (2003). "Strong Rank Revealing LU Factorizations,'' Lin. Alg. Applic. 367, 1-16. Column pivoting can be incorporated into the modified Gram-Schmidt process, see: A. Dax (2000). "A Modified Gram-Schmidt Algorithm with Iterative Orthogonalization and Column Pivoting," Lin. Alg. Applic. 310, 25-42. M. Wei and Q. Liu (2003). "Roundoff Error Estimates of the Modified GramSchmidt Algorithm with Column Pivoting," BIT 43, 627-645. 
Aspects of the complete orthogonal decomposition are discussed in:
  • 312. 288 Chapter 5. Orthogonalization and Least Squares R.J. Hanson and C.L. Lawson (1969). "Extensions and Applications of the Householder Algorithm for Solving Linear Least Square Problems," Math. Comput. 23, 787-812. P.A. Wedin (1973). "On the Almost Rank-Deficient Case of the Least Squares Problem," BIT 13, 344-354. G.H. Golub and V. Pereyra (1976). "Differentiation of Pseudo-Inverses, Separable Nonlinear Least Squares Problems and Other Tales," in Generalized Inverses and Applications, M.Z. Nashed (ed.), Academic Press, New York, 303-324. The quality of the subspaces that are exposed through a complete orthogonal decomposition are analyzed in: R.D. Fierro and J.R. Bunch (1995). "Bounding the Subspaces from Rank Revealing Two-Sided Or­ thogonal Decompositions," SIAM J. Matrix Anal. Applic. 16, 743-759. R.D. Fierro (1996). "Perturbation Analysis forTwo-Sided (or Complete) Orthogonal Decompositions," SIAM J. Matrix Anal. Applic. 1 7, 383-400. The bidiagonalization is a particularly important decomposition because it typically precedes the computation of the SVD as we discuss in §8.6. Thus, there has been a strong research interest in its efficient and accurate computation: B. Lang (1996). "Parallel Reduction of Banded Matrices to Bidiagonal Form,'' Parallel Comput. 22, 1-18. J.L. Barlow (2002). "More Accurate Bidiagonal Reduction for Computing the Singular Value Decom­ position," SIAM J. Matrix Anal. Applic. 23, 761-798. J.L. Barlow, N. Bosner and Z. Drmal (2005). "A New Stable Bidiagonal Reduction Algorithm," Lin. Alg. Applic. 397, 35-84. B.N. Parlett (2005). "A Bidiagonal Matrix Determines Its Hyperbolic SVD to Varied Relative Accu­ racy," SIAM J. Matrix Anal. Applic. 26, 1022-1057. N. Bosner and J.L. Barlow (2007). "Block and Parallel Versions of One-Sided Bidiagonalization,'' SIAM J. Matrix Anal. Applic. 29, 927-953. G.W. Howell, J.W. Demmel, C.T. Fulton, S. Hammarling, and K. Marmol (2008). "Cache Efficient Bidiagonalization Using BLAS 2.5 Operators," ACM TI-ans. Math. Softw. 34, Article 14. H. Ltaief, J. Kurzak, and J. Dongarra (2010). "Parallel Two-Sided Matrix Reduction to Band Bidi­ agonal Form on Multicorc Architectures," IEEE TI-ans. Parallel Distrib. Syst. 21, 417-423. 5.5 The Rank-Deficient Least Squares Problem If A is rank deficient, then there are an infinite number of solutions to the LS problem. We must resort to techniques that incorporate numerical rank determination and iden­ tify a particular solution as "special." In this section we focus on using the SVD to compute the minimum norm solution and QR-with-column-pivoting to compute what is called the basic solution. Both of these approaches have their merits and we conclude with a subset selection procedure that combines their positive attributes. 5.5.1 The Minimum Norm Solution Suppose A E Rmxn and rank(A) = r < n. The rank-deficient LS problem has an infinite number of solutions, for if x is a minimizer and z E null(A), then x + z is also a minimizer. The set of all minimizers X = {x E Rn : 11 Ax - b 112 = min } is convex and so if x1 , X2 E X and >. E (0, 1], then min 11 Ax - b 112 · xERn
Thus, λx_1 + (1-λ)x_2 \in X. It follows that X has a unique element having minimum 2-norm and we denote this solution by x_{LS}. (Note that in the full-rank case, there is only one LS solution and so it must have minimal 2-norm. Thus, we are consistent with the notation in §5.3.)

Any complete orthogonal factorization (§5.4.7) can be used to compute x_{LS}. In particular, if Q and Z are orthogonal matrices such that

    Q^T A Z = [ T_{11}  0 ; 0  0 ],    T_{11} \in \mathbb{R}^{r\times r},    r = rank(A),

then

    \| Ax - b \|_2^2 = \| (Q^T A Z)(Z^T x) - Q^T b \|_2^2 = \| T_{11} w - c \|_2^2 + \| d \|_2^2

where

    Z^T x = [ w ; y ],    w \in \mathbb{R}^r,  y \in \mathbb{R}^{n-r},        Q^T b = [ c ; d ],    c \in \mathbb{R}^r,  d \in \mathbb{R}^{m-r}.

Clearly, if x is to minimize the sum of squares, then we must have w = T_{11}^{-1} c. For x to have minimal 2-norm, y must be zero, and thus

    x_{LS} = Z [ T_{11}^{-1} c ; 0 ].

Of course, the SVD is a particularly revealing complete orthogonal decomposition. It provides a neat expression for x_{LS} and the norm of the minimum residual ρ_{LS} = \| A x_{LS} - b \|_2.

Theorem 5.5.1. Suppose U^T A V = \Sigma is the SVD of A \in \mathbb{R}^{m\times n} with r = rank(A). If U = [ u_1 | \cdots | u_m ] and V = [ v_1 | \cdots | v_n ] are column partitionings and b \in \mathbb{R}^m, then

    x_{LS} = \sum_{i=1}^{r} \frac{u_i^T b}{\sigma_i} v_i    (5.5.1)

minimizes \| Ax - b \|_2 and has the smallest 2-norm of all minimizers. Moreover,

    ρ_{LS}^2 = \| A x_{LS} - b \|_2^2 = \sum_{i=r+1}^{m} (u_i^T b)^2.

Proof. For any x \in \mathbb{R}^n we have

    \| Ax - b \|_2^2 = \| (U^T A V)(V^T x) - U^T b \|_2^2 = \| \Sigma α - U^T b \|_2^2 = \sum_{i=1}^{r} (\sigma_i α_i - u_i^T b)^2 + \sum_{i=r+1}^{m} (u_i^T b)^2,    (5.5.2)

where α = V^T x. Clearly, if x solves the LS problem, then α_i = (u_i^T b / \sigma_i) for i = 1:r. If we set α(r+1:n) = 0, then the resulting x has minimal 2-norm. □
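Formula (5.5.1) translates directly into code. The sketch below is illustrative NumPy, not the book's code; the rank tolerance is an assumption made for the example. It computes x_{LS} and ρ_{LS} for a rank-deficient A and checks the result against numpy.linalg.lstsq, which also returns the minimum 2-norm solution.

    import numpy as np

    def min_norm_ls(A, b, tol=None):
        """Minimum 2-norm least squares solution via the SVD, as in (5.5.1)."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        if tol is None:
            tol = max(A.shape) * np.finfo(A.dtype).eps * s[0]
        r = int(np.sum(s > tol))
        c = U[:, :r].T @ b                    # u_i^T b, i = 1..r
        x = Vt[:r, :].T @ (c / s[:r])         # sum of (u_i^T b / sigma_i) v_i
        rho = np.linalg.norm(A @ x - b)       # equals sqrt(sum_{i>r} (u_i^T b)^2)
        return x, rho

    rng = np.random.default_rng(2)
    A = rng.standard_normal((10, 4)) @ rng.standard_normal((4, 6))   # 10-by-6, rank 4
    b = rng.standard_normal(10)

    x, rho = min_norm_ls(A, b)
    x_ref = np.linalg.lstsq(A, b, rcond=None)[0]                     # also the min-norm solution
    print(np.allclose(x, x_ref), rho)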
5.5.2 A Note on the Pseudoinverse

If we define the matrix A^+ \in \mathbb{R}^{n\times m} by A^+ = V \Sigma^+ U^T where

    \Sigma^+ = diag( 1/\sigma_1, ..., 1/\sigma_r, 0, ..., 0 ) \in \mathbb{R}^{n\times m},    r = rank(A),

then x_{LS} = A^+ b and ρ_{LS} = \| (I - AA^+) b \|_2. A^+ is referred to as the pseudoinverse of A. It is the unique minimal Frobenius norm solution to the problem

    \min_{X \in \mathbb{R}^{n\times m}} \| AX - I_m \|_F.    (5.5.3)

If rank(A) = n, then A^+ = (A^T A)^{-1} A^T, while if m = n = rank(A), then A^+ = A^{-1}. Typically, A^+ is defined to be the unique matrix X \in \mathbb{R}^{n\times m} that satisfies the four Moore-Penrose conditions:

    (i) AXA = A,    (ii) XAX = X,    (iii) (AX)^T = AX,    (iv) (XA)^T = XA.

These conditions amount to the requirement that AA^+ and A^+A be orthogonal projections onto ran(A) and ran(A^T), respectively. Indeed, AA^+ = U_1 U_1^T where U_1 = U(1:m, 1:r) and A^+A = V_1 V_1^T where V_1 = V(1:n, 1:r).

5.5.3 Some Sensitivity Issues

In §5.3 we examined the sensitivity of the full-rank LS problem. The behavior of x_{LS} in this situation is summarized in Theorem 5.3.1. If we drop the full-rank assumption, then x_{LS} is not even a continuous function of the data and small changes in A and b can induce arbitrarily large changes in x_{LS} = A^+ b. The easiest way to see this is to consider the behavior of the pseudoinverse. If A and δA are in \mathbb{R}^{m\times n}, then Wedin (1973) and Stewart (1975) show that

    \| (A + δA)^+ - A^+ \|_F \le 2 \| δA \|_F \max\{ \| A^+ \|_2^2 , \| (A + δA)^+ \|_2^2 \}.

This inequality is a generalization of Theorem 2.3.4 in which perturbations in the matrix inverse are bounded. However, unlike the square nonsingular case, the upper bound does not necessarily tend to zero as δA tends to zero. If

    A = [ 1 0 ; 0 0 ]    and    δA = [ 0 0 ; 0 ε ],

then

    A^+ = [ 1 0 ; 0 0 ]    and    (A + δA)^+ = [ 1 0 ; 0 1/ε ],
  • 315. 5.5. The Rank-Deficient Least Squares Problem and II A+ - (A + 8A)+ 112 = 1/f. 291 The numerical determination ofan LS minimizer in the presence ofsuch discontinuities is a major challenge. 5.5.4 The Truncated SVD Solution Suppose ff, E, and V are the computed SVD factors of a matrix A and r is accepted as its 8-rank, i.e., un :::; · · · :::; u;- :::; 8 < a;- :::; · · · :::; u1. It follows that we can regard f ATb x;- = L u� Vi i=l O'i as an approximation to XLs · Since II x;- 112 � 1/u;- :::; 1/8, then 8 may also be chosen with the intention of producing an approximate LS solution with suitably small norm. In §6.2.1, we discuss more sophisticated methods for doing this. If a;- » 8, then we have reason to be comfortable with x;- because A can then be unambiguously regarded as a rank(A;-) matrix (modulo 8). On the other hand, {u1, . . . ,Un} might not clearly split into subsets of small and large singular values, making the determination of r by this means somewhat arbitrary. This leads to more complicated methods for estimating rank, which we now discuss in the context of the LS problem. The issues are readily communicated by making two simplifying assumptions. Assume that r = n, and that LlA = 0 in (5.4.4), which implies that WTAZ = E = I:: is the SVD. Denote the ith columns of the matrices ff, W, V, and Z by Ui, Wi, Vi, and Zi, respectively. Because n Tb f ATb xLs - x;- = L wi Zi - L ui Vi i=l O'i i=l O'i = t ((wi - ui)Tb)zi + (ufb)(zi - vi) + t wfbzi i=l O'i i=f+l O'i it follows from II Wi - Ui 112 :::; f, II Ui 112 :::; 1 + f, and II Zi - Vi 112 :::; f that n ( Tb)2 :L: � . i=r+l O'i The parameter r can be determined as that integer which minimizes the upper bound. Notice that the first term in the bound increases with r, while the second decreases. On occasions when minimizing the residual is more important than accuracy in the solution, we can determine r on the basis of how close we surmise II b - Ax;- 112 is to the true minimum. Paralleling the above analysis, it can be shown that II b - Ax;- 112 :::; II b - AxLs 112 + (n - r) ll b 112 + frll b 112 (1 + (1 + €) ::). Again r could be chosen to minimize the upper bound. See Varah (1973) for practical details and also LAPACK.
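The norm/residual trade-off that drives the choice of r̂ can be seen in a few lines of code. The following NumPy sketch is illustrative only; the graded test matrix and the helper name are assumptions, not part of the text. It forms the truncated SVD solutions x_r̂ for r̂ = 1, ..., n on a badly conditioned problem and prints how the solution norm grows while the residual shrinks as smaller singular values are admitted.

    import numpy as np

    def truncated_svd_solutions(A, b):
        """Yield (r, x_r, ||x_r||, ||b - A x_r||) for the truncated SVD solutions x_r."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        x = np.zeros(A.shape[1])
        for i in range(len(s)):
            x = x + (U[:, i] @ b / s[i]) * Vt[i, :]     # add the i-th term of the SVD expansion
            yield i + 1, x.copy(), np.linalg.norm(x), np.linalg.norm(b - A @ x)

    # An ill-conditioned test matrix with graded singular values 1, 1e-2, ..., 1e-10
    rng = np.random.default_rng(3)
    m, n = 30, 6
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    A = U @ np.diag(10.0 ** -np.arange(0, 2 * n, 2)) @ V.T
    b = rng.standard_normal(m)

    for r, _, xnorm, res in truncated_svd_solutions(A, b):
        print(f"r = {r}:  ||x_r|| = {xnorm:9.2e}   ||b - A x_r|| = {res:9.2e}")

Choosing r̂ amounts to deciding where in this printed table to stop.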
5.5.5 Basic Solutions via QR with Column Pivoting

Suppose A \in \mathbb{R}^{m\times n} has rank r. QR with column pivoting (Algorithm 5.4.1) produces the factorization A\Pi = QR where

    R = [ R_{11}  R_{12} ; 0  0 ],    R_{11} \in \mathbb{R}^{r\times r},  R_{12} \in \mathbb{R}^{r\times(n-r)}.

Given this reduction, the LS problem can be readily solved. Indeed, for any x \in \mathbb{R}^n we have

    \| Ax - b \|_2^2 = \| (Q^T A \Pi)(\Pi^T x) - Q^T b \|_2^2 = \| R_{11} y - (c - R_{12} z) \|_2^2 + \| d \|_2^2,

where

    \Pi^T x = [ y ; z ],    y \in \mathbb{R}^r,  z \in \mathbb{R}^{n-r},        Q^T b = [ c ; d ],    c \in \mathbb{R}^r,  d \in \mathbb{R}^{m-r}.

Thus, if x is an LS minimizer, then we must have

    x = \Pi [ R_{11}^{-1}(c - R_{12} z) ; z ].

If z is set to zero in this expression, then we obtain the basic solution

    x_B = \Pi [ R_{11}^{-1} c ; 0 ].

Notice that x_B has at most r nonzero components and so A x_B involves a subset of A's columns.

The basic solution is not the minimal 2-norm solution unless the submatrix R_{12} is zero, since

    \| x_{LS} \|_2 = \min_{z \in \mathbb{R}^{n-r}} \| x_B - \Pi [ R_{11}^{-1} R_{12} ; -I_{n-r} ] z \|_2.    (5.5.4)

Indeed, this characterization of \| x_{LS} \|_2 can be used to show that

    1 \le \frac{\| x_B \|_2}{\| x_{LS} \|_2} \le \sqrt{ 1 + \| R_{11}^{-1} R_{12} \|_2^2 }.    (5.5.5)

See Golub and Pereyra (1976) for details.

5.5.6 Some Comparisons

As we mentioned, when solving the LS problem via the SVD, only \Sigma and V have to be computed assuming that the right-hand side b is available. The table in Figure 5.5.1 compares the flop efficiency of this approach with the other algorithms that we have presented.
  • 317. 5.5. The Rank-Deficient Least Squares Problem LS Algorithm Normal equations Householder QR Modified Gram-Schmidt Givens QR Householder Bidiagonalization R-Bidiagonalization SVD R-SVD Flop Count mn2 + n3/3 n3/3 2mn2 3mn2 - n3 4mn2 - 2n3 2mn2 + 2n3 4mn2 + 8n3 2mn2 + lln3 Figure 5.5.1. Flops associated with various least squares methods 5.5.7 SVD-Based Subset Selection 293 Replacing A by Ar in the LS problem amounts to filtering the small singular values and can make a great deal of sense in those situations where A is derived from noisy data. In other applications, however, rank deficiency implies redundancy among the factors that comprise the underlying model. In this case, the model-builder may not be interested in a predictor such as ArXr that involves all n redundant factors. Instead, a predictor Ay may be sought where y has at most r nonzero components. The position of the nonzero entries determines which columns of A, i.e., which factors in the model, are to be used in approximating the observation vector b. How to pick these columns is the problem of subset selection. QR with column pivoting is one way to proceed. However, Golub, Klema, and Stewart (1976) have suggested a technique that heuristically identifies a more indepen­ dent set of columns than arc involved in the predictor Ax8• The method involves both the SVD and QR with column pivoting: Step 1. Compute the SVD A = UEVT and use it to determine a rank estimate r. Step 2. Calculate a permutation matrix P such that the columns of the matrix Bi E Rmxr in AP = [ Bi I B2 ] are "sufficiently independent." Step 3. Predict b with Ay where y = P [ � ] and z E R" minimizes II Biz - b112- The second step is key. Because min II Biz - b112 zERr II Ay - b 112 > min II Ax - b 112 xER" it can be argued that the permutation P should be chosen to make the residual r = (I - BiBt}b as small as possible. Unfortunately, such a solution procedure can be
  • 318. 294 Chapter 5. Orthogonalization and Least Squares unstable. For example, if f = 2, and P = I, then min II Biz - b 112 = 0, but II Bib 112 = 0(1/€). On the other hand, any proper subset involving the third column of A is strongly independent but renders a much larger residual. This example shows that there can be a trade-off between the independence of the chosen columns and the norm of the residual that they render. How to proceed in the face of this trade-off requires useful bounds on u;:(Bi), the smallest singular value of Bi . Theorem 5.5.2. Let the SVD of A E Rmxn be given by UTAV = E = diag(ui) and define the matrix Bi E Rmxr, f � ran k(A), by where P E Rnxn is a permutation. If r n-T pTy = [ �: �: ]n�i' and Vu is nonsingular, then i' n-i' (5.5.6) Proof. The upper bound follows from Corollary 2.4.4. To establish the lower bound, partition the diagonal matrix of singular values as follows: i' n-i' If w E R" is a unit vector with the property that II Biw 112 = u;:(Bi), then u;:(Bi)2 = ll Biw ll� = llUEVTP [ � Jll: = ll EiVi}w ll� + ll E2Vi�w ll�· The theorem now follows because II EiVi}w 112 2:: u;:(A)/11 Vii i 112. D This result suggests that in the interest of obtaining a sufficiently independent subset of columns, we choose the permutation P such that the resulting Vu submatrix is as
  • 319. 5.5. The Rank-Deficient Least Squares Problem 295 well-conditioned as possible. A heuristic solution to this problem can be obtained by computing the QR with column-pivoting factorization of the matrix [ V11 V21 J, where f n-f is a partitioning of the matrix V, A's matrix of right singular vectors. In particular, if we apply QR with column pivoting (Algorithm 5.4.1) to compute QT[ V11 V21 ]P = [ Ru I Ri2] f n-f where Q is orthogonal, P is a permutation matrix, and Ru is upper triangular, then (5.5.6) implies Note that Ru is nonsingular and that II Vi11 112 = II K]} 112. Heuristically, column pivoting tends to produce a well-conditioned Ru, and so the overall process tends to produce a well-conditioned Vu. Algorithm 5.5.1 Given A E n:rxn and bE IEr the following algorithm computes a permutation P, a rank estimate r, and a vector z E IRf' such that the first r columns of B = AP are independent and II B(:, l:r)z -b112 is minimized. Compute the SVD urAV = diag(ai, . . . , an) and save V. Determine r :::; rank(A) . Apply QR with column pivoting: QTV(:, l:f)TP = [ R11 I R12 ] and set AP = [ B1 I B2 ] with B1 E IRmxr and B2 E IRmx {n-fl . Determine z E It such that II b-B1z 112 = min. 5.5.8 Column Independence Versus Residual Size We return to the discussion of the trade-off between column independence and norm of the residual. In particular, to assess the above method of subset selection we need to examine the residual of the vector y that it produces Here, B1 = B(:, l:r) with B = AP. To this end, it is appropriate to compare ry with rx;- = b-Axr since we are regarding A as a rank-r matrix and since Xr solves the nearest rank-r LS problem min II Arx -b112-
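A rough transcription of Algorithm 5.5.1 into NumPy/SciPy is given below. It is an illustrative sketch, not the book's code: the rank estimate r̂ is supplied by the caller, and Step 3 is solved with numpy.linalg.lstsq rather than a dedicated routine. It returns the indices of the selected columns, the coefficient vector z, and the padded predictor coefficients y with P encoded implicitly by the index set.

    import numpy as np
    from scipy.linalg import qr

    def svd_subset_selection(A, b, r):
        """SVD-based subset selection in the spirit of Algorithm 5.5.1.
        Step 1: SVD of A (rank estimate r supplied by the caller).
        Step 2: column-pivoted QR of V[:, :r].T picks r 'sufficiently independent' columns.
        Step 3: solve the reduced LS problem using only those columns."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        V = Vt.T
        _, _, piv = qr(V[:, :r].T, pivoting=True)       # QR with column pivoting of V(:,1:r)^T
        cols = piv[:r]                                   # indices of the selected columns of A
        B1 = A[:, cols]
        z, *_ = np.linalg.lstsq(B1, b, rcond=None)
        y = np.zeros(A.shape[1])
        y[cols] = z                                      # predictor A @ y uses only the chosen columns
        return cols, z, y

    rng = np.random.default_rng(4)
    A = rng.standard_normal((40, 4)) @ rng.standard_normal((4, 8))   # 8 columns, numerical rank 4
    b = rng.standard_normal(40)
    cols, z, y = svd_subset_selection(A, b, r=4)
    print(sorted(cols), np.linalg.norm(b - A @ y))

The point of pivoting on V(:,1:r̂)^T rather than on A itself is precisely the conditioning of V_{11} analyzed next.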
  • 320. 296 Chapter 5. Orthogonalization and Least Squares Theorem 5.5.3. Assume that urAV = � is the SVD ofA E IRmxn and that ry and rx,, are defined as above. If V11 is the leading r-by-r principal submatrix ofpTv, then II II ar+i(A) II V:-1 II II b II rx;. - ry 2 :5 ar(A) 11 2 2· Proof. Note that rx;- = (I - U1U'{)b and ry = (I - Q1Q[)b where r m-T is a partitioning of the matrix U and Q1 = B1(BfB1)-112• Using Theorem 2.6.1 we obtain II rx,, - ry 112 :5 II U1U[ - QiQT 112 II b 112 = II U'[Q1 112 II b 112 while Theorem 5.5.2 permits us to conclude 1 < ar+1(A)ar(B1) and this establishes the theorem. 0 Noting that II r., - r, II, � ,,B,y - t,(ufb)ul we see that Theorem 5.5.3 sheds light on how well B1y can predict the "stable" compo­ nent of b, i.e., U[b. Any attempt to approximate U'{b can lead to a large norm solution. Moreover, the theorem says that if ar+i(A) « ar(A), then any reasonably independent subset of columns produces essentially the same-sized residual. On the other hand, if there is no well-defined gap in the singular values, then the determination of r becomes difficult and the entire subset selection problem becomes more complicated. Problems P5.5.1 Show that if A = [ T S ] r 0 0 m-r r n-r where r = rank(A) and Tis nonsingular, then X = [ T�l � ]n�r 7· m-r satisfies AXA = A and (AX)T = (AX). In this case, we say that X is a (1,3) pseudoinverse of A. Show that for general A, x n = Xb where X is a (1,3) pseudoinverse of A. P5.5.2 Define B(>.) E Rnxm by
  • 321. 5.5. The Rank-Deficient Least Squares Problem where .>. > 0. Show that II B(.>.) - A+ 112 O'r(A)[ar(A)2 + .>.] ' and therefore that B(.>.) -+ A+ as .>. -+ 0. P5.5.3 Consider the rank-deficient LS problem 297 r = rank(A), where R E wxr, S E wxn-r, y E Rr, and z E Rn-r. Assume that R is upper triangular and nonsin­ gular. Show how to obtain the minimum norm solution to this problem by computing an appropriate QR factorization without pivoting and then solving for the appropriate y and z. P5.5.4 Show that if Ak -+ A and At -+ A+, then there exists an integer ko such that rank(Ak) is constant for all k :'.:'. ko. P5.5.5 Show that if A E Rmxn has rank n, then so does A + E if 11 E lbll A+ 112 < 1. P5.5.6 Suppose A E Rmxn is rank deficient and b E R"'. Assume for k = 0, 1, . . . that x<k+l) mini- mizes <l>k(x) = II Ax - b 11� + .>.II x - x<kJ 11; where .>. > 0 and xC0l = O. Show that x(k) -+ XLS· P5.5.8 Suppose A E Rmxn and that II uTA lb = er with uTu = 1. Show that if uT(Ax - b) = 0 for x E Rn and b E Rm, then II x 112 2'. luTbl/a. P5.5.9 In Equation (5.5.6) we know that the matrix pTv is orthogonal. Thus, II V1J:1 112 = II V221 112 from the CS decomposition (Theorem 2.5.3). Show how to compute P by applying the QR with column-pivoting algorithm to [ V2� Ivl�]. (For i' > n/2, this procedure would be more economical than the technique discussed in the text.) Incorporate this observation in Algorithm 5.5.1. P5.5.10 Suppose F E R""xr and G E Rnxr each have rank r. (a) Give an efficient algorithm for computing the minimum 2-norm minimizer of II FGTx - b 112 where b E Rm. (b) Show how to compute the vector x8 . Notes and References for §5.5 For a comprehensive treatment of the pseudoinverse and its manipulation, see: M.Z. Na.shed (1976). Generalized Inverses and Applications, Academic Press, New York. S.L. Campbell and C.D. Meyer (2009). Generalized Inverses of Linear Transformations, SIAM Pub- lications, Philadelphia, PA. For an analysis of how the pseudo-inverse is affected by perturbation, sec: P.A. Wedin (1973). "Perturbation Theory for Pseudo-Inverses," BIT 13, 217-232. G.W. Stewart (1977). "On the Perturbation ofPseudo-Inverses, Projections, and Linear Least Squares," SIAM Review 19, 634-662. Even for full rank problems, column pivoting seems to produce more accurate solutions. The error analysis in the following paper attempts to explain why: L.S. Jennings and M.R. Osborne (1974). "A Direct Error Analysis for Least Squares," Numer. Math. 22, 322-332. Various other aspects of the rank-deficient least squares problem are discussed in: J.M. Varah (1973). "On the Numerical Solution of Ill-Conditioned Linear Systems with Applications to Ill-Posed Problems," SIAM J. Numer. Anal. 1 0, 257-67. G.W. Stewart (1984). "Rank Degeneracy," SIAM J. Sci. Stat. Comput. 5, 403-413. P.C. Hansen (1987). "The Truncated SVD as a Method for Regularization," BIT 27, 534-553. G.W. Stewart (1987). "Collinearity and Least Squares Regression," Stat. Sci. 2, 68-100.
  • 322. 298 Chapter 5. Orthogonalization and Least Squares R.D. Fierro and P.C. Hansen (1995). "Accuracy of TSVD Solutions Computed from Rank-Revealing Decompositions," Nu.mer. Math. 70, 453-472. 1 P.C. Hansen (1997). Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, SIAM Publications, Philadelphia, PA. A. Dax and L. Elden (1998). "Approximating Minimum Norm Solutions of Rank-Deficient Least Squares Problems," Nu.mer. Lin. Alg. 5, 79-99. G. Quintana-Orti, E.S. Quintana-Orti, and A. Petitet (1998). "Efficient Solution of the Rank-Deficient Linear Least Squares Problem," SIAM J. Sci. Comput. 20, 1155-1163. L.V. Foster (2003). "Solving Rank-Deficient and Ill-posed Problems Using UTV and QR Factoriza­ tions," SIAM J. Matrix Anal. Applic. 25, 582-600. D.A. Huckaby and T.F. Chan (2004). "Stewart's Pivoted QLP Decomposition for Low-Rank Matri­ ces," Nu.mer. Lin. Alg. 12, 153-159. L. Foster and R. Kommu (2006). "Algorithm 853: An Efficient Algorithm for Solving Rank-Deficient Least Squares Problems," ACM TI-ans. Math. Softw. .?2, 157-165. For a sampling of the subset selection literature, we refer the reader to: H. Hotelling (1957). "The Relations ofthe Newer Multivariate Statistical Methods to Factor Analysis," Brit. J. Stat. Psych. 10, 69-79. G.H. Golub, V. Klema and G.W. Stewart (1976). "Rank Degeneracy and Least Squares Problems," Technical Report TR-456, Department of Computer Science, University of Maryland, College Park, MD. S. Van Huffel and J. Vandewalle (1987). "Subset Selection Using the Total Least Squares Approach in Collinearity Problems with Errors in the Variables," Lin. Alg. Applic. 88/89, 695-714. M.R. Osborne, B. Presnell, and B.A. Turla.ch (2000}. "A New Approach to Variable Selection in Least Squares Problems,'' IMA J. Nu.mer. Anal. 20, 389-403. 5.6 Square and Underdetermined Systems The orthogonalization methods developed in this chapter can be applied to square systems and also to systems in which there are fewer equations than unknowns. In this brief section we examine the various possibilities. 5.6.1 Square Systems The least squares solvers based on the QR factorization and the SVD can also be used to solve square linear systems. Figure 5.6.1 compares the associated flop counts. It is Method Flops Gaussian elimination 2n3/3 Householder QR 4n3/3 Modified Gram-Schmidt 2n3 Singular value decomposition 12n3 Figure 5.6.1. Flops associated with various methods for square linear systems assumed that the right-hand side is available at the time of factorization. Although Gaussian elimination involves the least amount of arithmetic, there are three reasons why an orthogonalization method might be considered:
  • 323. 5.6. Square and Underdetermined Systems 299 • The flop counts tend to exaggerate the Gaussian elimination advantage. When memory traffic and vectorization overheads are considered, the QR approach is comparable in efficiency. • The orthogonalization methods have guaranteed stability; there is no "growth factor" to worry about as in Gaussian elimination. • In cases of ill-conditioning, the orthogonal methods give an added measure of reliability. QR with condition estimation is very dependable and, of course, SVD is unsurpassed when it comes to producing a meaningful solution to a nearly singular system. We are not expressing a strong preference for orthogonalization methods but merely suggesting viable alternatives to Gaussian elimination. We also mention that the SVD entry in the above table assumes the availability of b at the time of decomposition. Otherwise, 20n3 flops are required because it then becomes necessary to accumulate the U matrix. If the QR factorization is used to solve Ax = b, then we ordinarily have to carry out a back substitution: Rx = QTb. However, this can be avoided by "preprocessing" b. Suppose H is a Householder matrix such that Hb = /Jen where en is the last column of In. If we compute the QR factorization of (HA)T, then A = HTRTQT and the system transforms to RTy = /Jen where y = QTx. Since RT is lower triangular, y = (/3/rnn)en and so /3 x = -Q(:, n). Tnn 5.6.2 Underdetermined Systems In §3.4.8 we discussed how Gaussian elimination with either complete pivoting or rook pivoting can be used to solve a full-rank, underdetermincd linear system Ax = b, (5.6.1) Various orthogonal factorizations can also be used to solve this problem. Notice that (5.6.1) either has no solution or has an infinity of solutions. In the second case, it is important to distinguish between algorithms that find the minimum 2-norm solution and those that do not. The first algorithm we present is in the latter category. Assume that A has full row rank and that we apply QR with column pivoting to obtain QTAii = [ R1 I R2 ] where R1 E 1Rmxm is upper triangular and R2 E 1Rmx (n-rn) _ Thus, Ax = b transforms to where
  • 324. 300 Chapter 5. Orthogonalization and Least Squares with z1 E lRm and z2 E lR(n-m) . By virtue of the column pivoting, R1 is nonsingular because we are assuming that A has full row rank. One solution to the problem is therefore obtained by setting z1 = R"11QTb and z2 = 0. Algorithm 5.6.1 Given A E lRmxn with rank(A) = m and b E lRm, the following algorithm finds an x E JR" such that Ax = b. Compute QR-with-column-pivoting factorization: QTAII = R. Solve R(l:m, l:m)z1 = QTb. Set x = II [� l· This algorithm requires 2m2n - m3/3 flops. The minimum norm solution is not guar­ anteed. (A different II could render a smaller zi-) However, if we compute the QR factorization AT = QR = Q [�1 l with Ri E lRmxm, then Ax = b becomes where In this case the minimum norm solution does follow by setting z2 = 0. Algorithm 5.6.2 Given A E lRmxn with rank(A) = m and b E lRm, the following algo­ rithm finds the minimum 2-norm solution to Ax = b. Compute the QR factorization AT = QR. Solve R(l:m, l:m)Tz = b. Set x = Q(:, l:m)z. This algorithm requires at most 2m2n - 2m3/3 flops. The SVD can also be used to compute the minimum norm solution of an under­ determined Ax = b problem. If is the SVD of A, then r A = L aiuiv"[, r = rank(A) i=l � ufb X = � -Vi· i=l ai As in the least squares problem, the SVD approach is desirable if A is nearly rank deficient.
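Algorithm 5.6.2 is only a few lines in NumPy/SciPy. The sketch below is illustrative, not the book's code: it computes the minimum 2-norm solution of a full-row-rank underdetermined system from the QR factorization of A^T and compares it with the SVD-based solution obtained through the pseudoinverse.

    import numpy as np
    from scipy.linalg import qr, solve_triangular

    def min_norm_underdetermined(A, b):
        """Minimum 2-norm solution of Ax = b with A (m-by-n, m <= n) of full row rank,
        via the QR factorization A.T = Q [R1; 0] as in Algorithm 5.6.2."""
        m, n = A.shape
        Q, R = qr(A.T, mode='economic')          # Q: n-by-m, R: m-by-m upper triangular
        z = solve_triangular(R, b, trans='T')    # solve R1.T z = b
        return Q @ z                             # x = Q(:, 1:m) z

    rng = np.random.default_rng(5)
    A = rng.standard_normal((3, 7))
    b = rng.standard_normal(3)

    x = min_norm_underdetermined(A, b)
    x_svd = np.linalg.pinv(A) @ b                # the SVD route gives the same minimum-norm solution
    print(np.linalg.norm(A @ x - b), np.allclose(x, x_svd))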
  • 325. 5.6. Square and Underdetermined Systems 5.6.3 Perturbed Underdetermined Systems 301 We conclude this section with a perturbation result for full-rank underdetermined sys­ tems. Theorem 5.6.1. Suppose rank(A) = m :::; n and that A E JR.mxn, oA E JR.mxn, O =f b E 1R.m, and ob E 1R.m satisfy € = max{t:A, t:b} < O"m(A), where €A = II oA 112/ll A 112 and fb = II ob 112/ll b 112· If x and x are minimum norm solutions that satisfy then Ax = b, (A + oA)x = b + ob, < K2(A) (t:A min{2, n - m + 1} + fb ) + O(t:2). Proof. Let E and f be defined by oA/t: and ob/t:. Note that rank(A + tE) = m for all 0 < t < " and that x(t) = (A + tE)T ((A + tE)(A + tE)r) -1 (b+ tf) satisfies (A+tE)x(t) = b+tf. By differentiating this expression with respect to t and setting t = 0 in the result we obtain Because and we have II X 112 = IIAT(AAT)-1b 112 2:: O"m(A)ll (AAT)-1b 1121 III - AT(AAT)-1A 112 = min(l, n - m), ll x - x ll2 = x(t:) - x(O) = "11 ±(0) 112 + O(t:2) II x 112 II x(o) 112 11 x 112 . ){IIE 112 II f 112 II E 112 } 2 :::; € mm(l, n - m IIA 112 + TiblG' + II A 112 K2(A) + O(t: ), from which the theorem follows. 0 Note that there is no K2(A)2 factor as in the case of overdetermined systems. Problems PS.6.1 Derive equation (5.6.2). PS.6.2 Find the minimal norm solution to the system Ax = b where A = [ 1 2 3 ] and b = 1. (5.6.2) PS.6.3 Show how triangular system solving can be avoided when using the QR factorization to solve an underdetermined system. PS.6.4 Suppose b, x E Rn are given and consider the following problems:
  • 326. 302 Chapter 5. Orthogonalization and Least Squares (a) Find an unsymmetric Toeplitz matrix T so Tx = b. (b) Find a symmetric Toeplitz matrix T so Tx = b. (c) Find a circulant matrix C so Ox = b. Pose each problem in the form Ap = b where A is a matrix made up of entries from x and p is the vector of sought-after parameters. Notes and References for §5.6 For an analysis of linear equation solving via QR, see: N.J. Higham (1991}. "Iterative Refinement Enhances the Stability of QR Factorization Methods for Solving Linear Equations," BIT 31, 447-468. Interesting aspects concerning singular systems are discussed in: T.F. Chan (1984}. "Deflated Decomposition Solutions of Nearly Singular Systems," SIAM J. Nu.mer. Anal. 21, 738-754. Papers concerned with underdetermined systems include: R.E. Cline and R.J. Plemmons (1976). "L2-Solutions to Underdetermined Linear Systems," SIAM Review 18, 92-106. M.G. Cox (1981}. "The Least Squares Solution of Overdetermined Linear Equations having Band or Augmented Band Structure," IMA J. Nu.mer. Anal. 1, 3-22. M. Arioli and A. Laratta (1985). "Error Analysis of an Algorithm for Solving an Underdetermined System," Nu.mer. Math. 46, 255-268. J.W. Demmel and N.J. Higham (1993). "Improved Error Bounds for Underdetermined System Solvers," SIAM J. Matrix Anal. Applic. 14, 1-14. S. Joka.r and M.E. Pfetsch (2008}. "Exact and Approximate Sparse Solutions of Underdetermined Linear Equations," SIAM J. Sci. Comput. 31, 23-44. The central matrix problem in the emerging field of compressed sensing is to solve an underdetermined system Ax = b such that the I-norm of x is minimized, see: E. Candes, J. Romberg, and T. Tao (2006}. "Robust Uncertainty Principles: Exact Signal Recon­ struction from Highly Incomplete Frequency Information," IEEE Trans. Information Theory 52, 489-509. D. Donoho (2006}. "Compressed Sensing," IEEE Trans. Information Theory 52, 1289-1306. This strategy tends to produce a highly sparse solution vector x.
  • 327. Chapter 6 Modified Least Squares Problems and Methods 6.1 Weighting and Regularization 6.2 Constrained Least Squares 6.3 Total Least Squares 6.4 Subspace Computations with the SVD 6.5 Updating Matrix Factorizations In this chapter we discuss an assortment of least square problems that can be solved using QR and SVD. We also introduce a generalization of the SVD that can be used to simultaneously diagonalize a pair of matrices, a maneuver that is useful in certain applications. The first three sections deal with variations of the ordinary least squares problem that we treated in Chapter 5. The unconstrained minimization of II Ax - b lb does not always make a great deal of sense. How do we balance the importance of each equation in Ax = b? How might we control the size of x if A is ill-conditioned? How might we minimize II Ax - b 112 over a proper subspace of 1R11? What if there are errors in the "data matrix" A in addition to the usual errors in the "vector of observations" b? In §6.4 we consider a number of multidimensional subspace computations includ­ ing the problem of determining the principal angles between a pair of given subspaces. The SVD plays a prominent role. The final section is concerned with the updating of matrix factorizations. In many applications, one is confronted with a succession of least squares (or linear equation) problems where the matrix associated with the current step is highly related to the matrix associated with the previous step. This opens the door to updating strategies that can reduce factorization overheads by an order of magnitude. Reading Notes Knowledge of Chapter 5 is assumed. The sections in this chapter are independent of each other except that §6.1 should be read before §6.2. Excellent global references include Bjorck (NMLS) and Lawson and Hansen (SLS). 303
  • 328. 304 Chapter 6. Modified Least Squares Problems and Methods 6.1 Weighting and Regularization We consider two basic modifications to the linear least squares problem. The first concerns how much each equation "counts" in the II Ax - b 112 minimization. Some equations may be more important than others and there are ways to produce approx­ imate minimzers that reflect this. Another situation arises when A is ill-conditioned. Instead of minimizing II Ax - b 112 with a possibly wild, large norm x-vector, we settle for a predictor Ax in which x is "nice" according to some regularizing metric. 6.1.1 Row Weighting In ordinary least squares, the minimization of II Ax - b 112 amounts to minimizing the sum of the squared discrepancies in each equation: m II Ax - b 112 = L (afx - bi)2 • i=l We assume that A E R.m x n , b E R.m , and af = A(i, :). In the weighted least squares problem the discrepancies are scaled and we solve m min ll D(Ax - b) ll2 = min Ld� (afx - bi)2 xER" xERn i= l (6.1.1) where D = diag(di , . . . , dm) is nonsingular. Note that if Xv minimizes this summation, then it minimizes II Ax - b 112 where A = DA and b = Db. Although there can be numerical issues associated with disparate weight values, it is generally possible to solve the weighted least squares problem by applying any Chapter 5 method to the "tilde problem." For example, if A has full column rank and we apply the method of normal equations, then we are led to the following positive definite system: (ATD2A)xv = ATD2b. (6.1.2) Subtracting the unweighted system ATAxLs = ATb we see that Xv - X1.s = (ATD2A)-1AT(D2 - I)(b - AxL5). (6.1.3) Note that weighting has less effect if b is almost in the range of A. At the component level, increasing dk relative to the other weights stresses the importance of the kth equation and the resulting residual r = b - Axv tends to be smaller in that component. To make this precise, define D(8) = diag(d1, . . . , dk-t. dk v'f+1 , dk+l• . . . , dm) where 8 > -1. Assume that x(8) minimizes II D(8)(Ax - b) 112 and set rk(8) = e[(b - Ax(8)) = bk - ak(ATD(8)2A)-1ATD(8)2b where ek = Im(:, k). We show that the penalty for disagreement between a[x and bk increases with 8. Since
  • 329. 6.1. Weighting and Regularization and :8 [(ATD(8)2A)-1] = -(ATD(8)2A)-1(AT(d%eke[}A)(ATD(8)2A)-1, it can be shown that d d8rk(8) = -d% (ak(ATD(8)2A)-1ak) rk(8). 305 {6.1.4) Assuming that A has full rank, the matrix (ATD(8)A)-1 is positive definite and so �[rk(8)2] = 2rk(8) · �rk(8) = -2d% (ak(ATD(8)2A)-1ak) rk(8)2 < 0. It follows that lrk(8)1 is a monotone decreasing function of 8. Of course, the change in rk when all the weights are varied at the same time is much more complicated. Before we move on to a more general type of row weighting, we mention that (6.1.1) can be framed as a symmetric indefinite linear system. In particular, if [:�2 : l [: l = [� l· then x minimizes (6.1.1). Compare with (5.3.20). 6.1.2 Generalized Least Squares (6.1.5) In statistical data-fitting applications, the weights in (6.1.1) are often chosen to increase the relative importance of accurate measurements. For example, suppose the vector of observations b has the form btrue + D.. where D..i is normally distributed with mean zero and standard deviation <1i· If the errors are uncorrelated, then it makes statistical sense to minimize (6.1.1) with di = 1/ai. In more general estimation problems, the vector b is related to x through the equation b = Ax+w (6.1.6) where the noise vector w has zero mean and a symmetric positive definite covariance matrix a2W. Assume that W is known and that W = BBT for some B E Rmxm. The matrix B might be given or it might be W's Cholesky triangle. In order that all the equations in (6.1.6) contribute equally to the determination of x, statisticians frequently solve the LS problem min 11 B-1(Ax - b) 112 . (6.1.7) xERn An obvious computational approach to this problem is to form A = B-1Aand b = B-1b and then apply any ofour previous techniques to minimize II Ax - b 112· Unfortunately, if B is ill-conditioned, then x will be poorly determined by such a procedure. A more stable way of solving (6.1.7) using orthogonal transformations has been suggested by Paige {1979a, 1979b). It is based on the idea that (6.1.7) is equivalent to the generalized least squares problem, min vTv . b=Ax+Bv (6.1.8)
  • 330. 306 Chapter 6. Modified Least Squares Problems and Methods Notice that this problem is defined even if A and B are rank deficient. Although in the Paige technique can be applied when this is the case, we shall describe it under the assumption that both these matrices have full rank. The first step is to compute the QR factorization of A: n m-n Next, an orthogonal matrix Z E IRmxm is determined such that (QfB)Z = [ 0 I s l ' n m-n n m-n where S is upper triangular. With the use of these orthogonal matrices, the constraint in (6.1.8) transforms to [Qfb l = [Ri lx + [QfBZ1 QfBZ2 ] [zrv l · Qfb 0 0 S Zfv The bottom half of this equation determines v while the top half prescribes x: Su = Qfb, v = Z2u, (6.1.9) Rix = Qfb - (QfBZ1Z[ + QfBZ2Zf)v = Qfb - QfBZ2u. (6.1.10) The attractiveness of this method is that all potential ill-conditioning is concentrated in the triangular systems (6.1.9) and (6.1.10). Moreover, Paige (1979b) shows that the above procedure is numerically stable, something that is not true of any method that explicitly forms B-1A. 6.1.3 A Note on Column Weighting Suppose G E IRnxn is nonsingular and define the G-norm 11 • Ila on IRn by If A E IRmxn, b E IRm, and we compute the minimum 2-norm solution YLs to min II (AG-1)y - b 112 , xERn then Xa = c-1YLs is a minimizer of II Ax - b 112• If rank(A) < n, then within the set of minimizers, Xa has the smallest G-norm. The choice of G is important. Sometimes its selection can be based upon a priori knowledge of the uncertainties in A. On other occasions, it may be desirable to normalize the columns of A by setting G = Go = diag(ll A(:, 1) 112, . • . , II A(:, n) 112).
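The Paige procedure (6.1.9)-(6.1.10) is straightforward to prototype with standard QR and RQ factorizations. The SciPy sketch below is illustrative only: it assumes m > n, that A has full column rank and B is nonsingular, and it relies on scipy.linalg.rq to supply the orthogonal Z with (Q2^T B)Z = [0 | S]; the function name paige_gls is ours.

import numpy as np
from scipy.linalg import qr, rq, solve_triangular

def paige_gls(A, B, b):
    # Solve min v^T v subject to b = Ax + Bv, i.e., (6.1.8), without
    # ever forming B^{-1}.
    m, n = A.shape
    Q, R = qr(A)                         # full QR, Q = [Q1 | Q2]
    Q1, Q2, R1 = Q[:, :n], Q[:, n:], R[:n, :]
    Rw, Zt = rq(Q2.T @ B)                # Q2^T B = Rw @ Zt with Rw = [0 | S]
    S = Rw[:, n:]                        # (m-n)-by-(m-n) upper triangular
    Z2 = Zt.T[:, n:]
    u = solve_triangular(S, Q2.T @ b)    # (6.1.9):  S u = Q2^T b
    v = Z2 @ u
    x = solve_triangular(R1, Q1.T @ (b - B @ v))   # (6.1.10)
    return x, v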
  • 331. 6.1. Weighting and Regularization 307 Van der Sluis (1969) has shown that with this choice, 11:2(AG-1) is approximately minimized. Since the computed accuracy of YLs depends on 11:2(AG-1), a case can be made for setting G = Go. We remark that column weighting affects singular values. Consequently, a scheme for determining numerical rank may not return the same estimate when applied to A and Ao-1• See Stewart (1984). 6.1.4 Ridge Regression In the ridge regression problem we are given A E Rmxn and b E Rm and proceed to solve min 11 Ax - b 11� + All x II� . x (6.1.11) where the value of the ridge parameter A is chosen to "shape" the solution x = x(A) in some meaningful way. Notice that the normal equation system for this problem is given by It follows that if r A = uEvT = :L:ui uivr i=l is the SYD of A, then (6.1.12) converts to and so By inspection, it is clear that lim x(A) = XLs .>.-+O (6.1.12) (6.1.13) (6.1.14) and 11 x(A) 112 is a monotone decreasing function of A. These two facts show how an ill-conditioned least squares solution can be regularized by judiciously choosing A. The idea is to get sufficiently close to XLs subject to the constraint that the norm of the ridge regression minimzer x(A) is sufficiently modest. Regularization in this context is all about the intelligent balancing of these two tensions. The ridge parameter can also be chosen with an eye toward balancing the "im­ pact" ofeach equation in the overdetermined system Ax = b. We describe a A-selection procedure due to Golub, Heath, and Wahba (1979). Set Dk = I - ekef = diag(l, . . . , 1, 0, 1, . . . , 1) E Rmxm and let xk(A) solve min II Dk(Ax - b) II� + All x II� . xER" (6.1.15)
  • 332. 308 Chapter 6. Modified Least Squares Problems and Methods Thus, xk(A) is the solution to the ridge regression problem with the kth row of A and kth component of b deleted, i.e., the kth equation in the overdetermined system Ax = b is deleted. Now consider choosing A so as to minimize the cross-validation weighted square error C(A) defined by C(A) = !fWk(aIxk(A) - bk)2 • k=l Here, W1' . . . 'Wm are nonnegative weights and ar is the kth row of A. Noting that we see that (aIXk(A) - bk)2 is the increase in the sum ofsquares that results when the kth row is "reinstated." Minimizing C(A) is tantamount to choosing A such that the final model is not overly dependent on any one experiment. A more rigorous analysis can make this statement precise and also suggest a method for minimizing C(A). Assuming that A > 0, an algebraic manipulation shows that (') (') aix(A) - bk Xk A = X A + T Zk 1 - zk ak (6.1.16) where Zk = (ATA + AJ)-1ak and x(A) = (ATA + AJ)-1ATb. Applying -aI to (6.1.16) and then adding bk to each side of the resulting equation gives eT(J - A(ATA + AJ)-1AT)b rk = bk - aIXk(A) = ef(I _ A(ATA + AJ)-1AT)ek · (6.1.17) Noting that the residual r = [r1, . . . ,rm jT = b - Ax(A) is given by the formula we see that 1 m ( )2 C(A) = m �Wk ar:iabk (6.1.18) The quotient rk/(ork/obk) may be regarded as an inverse measure of the "impact" of the kth observation bk on the model. If ork/obk is small, then this says that the error in the model's prediction of bk is somewhat independent of bk. The tendency for this to be true is lessened by basing the model on the A* that minimizes C(A). The actual determination of A* is simplified by computing the SYD of A. Using the SYD (6.1.13) and Equations (6.1.17) and (6.1.18), it can be shown that 1 m = - Lwk m k=l 1 - tu�i ( 2 aJ ) i=l ui + A (6.1.19) where b = UTb. The minimization of this expression is discussed in Golub, Heath, and Wahba (1979).
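Once the SVD of A is available, both the ridge solution x(lambda) and the leave-one-out score assembled from (6.1.17)-(6.1.18) can be evaluated cheaply for many values of lambda. The NumPy sketch below is a direct, illustrative transcription of those formulas with unit weights as the default; the function names are ours and the grid search at the end merely stands in for a proper minimization of C(lambda).

import numpy as np

def ridge_svd(A, b, lam):
    # Ridge solution x(lam) from the SVD of A.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt.T @ ((s / (s**2 + lam)) * (U.T @ b))

def cross_validation_C(A, b, lam, w=None):
    # Weighted leave-one-out score C(lam) built from (6.1.17)-(6.1.18).
    m = len(b)
    w = np.ones(m) if w is None else np.asarray(w, dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    f = s**2 / (s**2 + lam)              # filter factors sigma_i^2/(sigma_i^2+lam)
    r = b - U @ (f * (U.T @ b))          # r = (I - A(A^T A + lam I)^{-1} A^T) b
    drdb = 1.0 - (U**2) @ f              # diagonal entries dr_k/db_k
    return np.sum(w * (r / drdb)**2) / m

# Crude selection of the ridge parameter by sampling C(lam) on a grid:
# lams = np.logspace(-8, 2, 200)
# lam_star = min(lams, key=lambda t: cross_validation_C(A, b, t))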
  • 333. 6.1. Weighting and Regularization 309 6.1.5 Tikhonov Regularization In the Tikhonov regularization problem, we are given A E Rmxn, B E Rnxn, and b E Rm and solve min x min II Ax - b II� + Ail Bx 11�- x The normal equations for this problem have the form (6.1.20) (6.1.21) This system is nonsingular if null(A) n null(B) = {O}. The matrix B can be chosen in several ways. For example, in certain data-fitting applications second derivative smoothness can be promoted by setting B = Too , the second difference matrix defined in Equation 4.8.7. To analyze how A and B interact in the Tikhonov problem, it would be handy to transform (6.1.21) into an equivalent diagonal problem. For the ridge regression problem (B = In) the SVD accomplishes this task. For the Tikhonov problem, we need a generalization of the SVD that simultaneously diagonalizes both A and B. 6.1.6 The Generalized Singular Value Decomposition The generalized singular value decomposition (GSVD) set forth in Van Loan (1974) provides a useful way to simplify certain two-matrix problems such as the Tychanov regularization problem. Theorem 6.1.1 (Generalized Singular Value Decomposition). Assume that A E Rm1 xni and B E Rm2xni with m1 ;:::: ni and r = rank ([ � ]) . There exist orthogonal U1 E 1Rm1 xmi and U2 E Rm2xm2 and invertible X E Rn1xni such that [� 0 n p U'{AX DA = diag(O:p+l• . . . ,O:r) r-p 0 m1-r (6.1.22) p r-p n1-r [� 0 n p U'{BX = DIJ diag(,BP+l• . . . ,.Br) r-p 0 m2-r (6.1.23) p r-p n1-r where p = max{r - m2, O}.
  • 334. 310 Chapter 6. Modified Least Squares Problems and Methods Proof. The proof makes use of the SVD and the CS decomposition (Theorem 2.5.3). Let [A l = [Qu Q12 l [Er 0 lzT B Q21 Q22 0 0 (6.1.24) be the SVD where Er E IR'"xr is nonsingular, Qu E IRm,xr, and Q21 E IRm2xr. Using the CS decomposition, there exist orthogonal matrices U1 (m1-by-m1), U2 (mrby-m2), and Vi (r-by-r) such that (6.1.25) where DA and D8 have the forms specified by (6.1.21) and (6.1.22). It follows from (6.1.24) and (6.1.25) that By setting the proof is complete. D Note that if B = In, and we set X = U2, then we obtain the SVD of A. The GSVD is related to the generalized eigenvalue problem ATAx = µ2BTBx which is considered in §8.7.4. As with the SVD, algorithmic issues cannot be addressed until we develop procedures for the symmetric eigenvalue problem in Chapter 8. To illustrate the insight that can be provided by the GSVD, we return to the Tikhonov regularization problem (6.1.20). If B is square and nonsingular, then the GSVD defined by (6.1.22) and (6.1.23) transforms the system (6.1.21) to T T T- (DA DA + >.D8 DB)y = DA b where x = Xy, b = U[b, and (DIDA + >.D�D8) = diag(a� + >.,B�, . . . , a� + >.,B�).
  • 335. 6.1. Weighting and Regularization 311 Thus, if is a column partitioning, then (6.1.26) solves (6.1.20). The "calming influence" of the regularization is revealed through this representation. Use of A to manage "trouble" in the direction of Xk depends on the values of ak and f3k· Problems P6.1.l Verify (6.1.4). P6.l.2 What is the inverse of the matrix in (6.1.5)? P6.1.3 Show how the SVD can be used to solve the generalized LS problem (6.1.8) if the matrices A and B are rank deficient. P6.l.4 Suppose A is the m-by-1 matrix of 1's and letb E Rm. Show that the cross-validation technique with unit weights prescribes an optimal A given by where b = (b1 + · · · + bm)/m and m s = L:<bi - b)21cm - 1). i= l P6.1.5 Using the GSVD, give bounds for II x(.A) - x(O) II and II Ax(.A) - b II� - II Ax(O) - b II� where x(.A) is defined by (6.1.26). Notes and References for §6.1 Row and column weighting in the LS problem is discussed in Lawson and Hanson (SLS, pp. 180-88). Other analyses include: A. van der Sluis (1969). "Condition Numbers and Equilibration of Matrices," Numer. Math. 14, 14-23. G.W. Stewart (1984). "On the Asymptotic Behavior of Scaled Singular Value and QR Decomposi­ tions," Math. Comput. 43, 483-490. A. Forsgren (1996). "On Linear Least-Squares Problems with Diagonally Dominant Weight Matrices," SIAM J. Matrix Anal. Applic. 1 7, 763-788. P.D. Hough and S.A. Vavasis (1997). "Complete Orthogonal Decomposition for Weighted Least Squares," SIAM J. Matrix Anal. Applic. 18, 551-555. J.K. Reid (2000). "Implicit Scaling of Linear Least Squares Problems,'' BIT 40, 146-157. For a discussion of cross-validation issues, see: G.H. Golub, M. Heath, and G. Wahba (1979). "Generalized Cross-Validation as a Method for Choosing a Good Ridge Parameter,'' Technometrics 21, 215-23. L. Elden (1985). "A Note on the Computation of the Generalized Cross-Validation Function for Ill-Conditioned Least Squares Problems,'' BIT 24, 467-472. Early references concerned with the generalized singular value decomposition include: C.F. Van Loan (1976). "Generalizing the Singular Value Decomposition,'' SIAM J. Numer. Anal. 13, 76-83.
  • 336. 312 Chapter 6. Modified Least Squares Problems and Methods C.C. Paige and M.A. Saunders (1981). "Towards A Generalized Singular Value Decomposition," SIAM J. Numer. Anal. 18, 398-405. The theoretical and computational aspects of the generalized least squares problem appear in: C.C. Paige (1979). "Fast Numerically Stable Computations for Generalized Linear Least Squares Problems," SIAM J. Numer. Anal. 16, 165-171. C.C. Paige (1979b). "Computer Solution and Perturbation Analysis of Generalized Least Squares Problems," Math. Comput. 33, 171-84. S. Kourouklis and C.C. Paige (1981). "A Constrained Least Squares Approach to the General Gauss­ Markov Linear Model," J. Amer. Stat. Assoc. 76, 620-625. C.C. Paige (1985). "The General Limit Model and the Generalized Singular Value Decomposition," Lin. Alg. Applic. 70, 269-284. Generalized factorizations have an important bearing on generalized least squares problems, see: C.C. Paige (1990). "Some Aspects of Generalized QR Factorization," in Reliable Numerical Compu­ tations, M. Cox and S. Hammarling (eds.), Clarendon Press, Oxford. E. Anderson, z. Bai, and J. Dongarra (1992). "Generalized QR Factorization and Its Applications," Lin. Alg. Applic. 162/163/164, 243-271. The development of regularization techniques has a long history, see: L. Elden (1977). "Algorithms for the Regularization of Ill-Conditioned Least Squares Problems," BIT 1 7, 134-45. D.P. O'Leary and J.A. Simmons (1981). "A Bidiagonalization-Regularization Procedure for Large Scale Discretizations of Ill-Posed Problems," SIAM J. Sci. Stat. Comput. 2, 474 -489. L. Elden (1984). "An Algorithm for the Regularization of Ill-Conditioned, Banded Least Squares Problems," SIAM J. Sci. Stat. Comput. 5, 237-254. P.C. Hansen (1990). "Relations Between SYD and GSVD of Discrete Regularization Problems in Standard and General Form," Lin.Alg. Applic. 141, 165-176. P.C. Hansen (1995). "Test Matrices for Regularization Methods," SIAM J. Sci. Comput. 16, 506-512. A. Neumaier (1998). "Solving Ill-Conditioned and Singular Linear Systems: A Tutorial on Regular­ ization," SIAM Review 40, 636··666. P.C. Hansen (1998). Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, SIAM Publications, Philadelphia, PA. M.E. Gulliksson and P.-A. Wedin (2000). "The Use and Properties of Tikhonov Filter Matrices," SIAM J. Matrix Anal. Applic. 22, 276-281. M.E. Gulliksson, P.-A. Wedin, and Y. Wei (2000). "Perturbation Identities for Regularized Tikhonov Inverses and Weighted Pseudoinverses," BIT 40, 513-523. T. Kitagawa, S. Nakata, and Y. Hosoda (2001). "Regularization Using QR Factorization and the Estimation of the Optimal Parameter," BIT 41, 1049-1058. M.E. Kilmer and D.P. O'Leary. (2001). "Choosing Regularization Parameters in Iterative Methods for Ill-Posed Problems," SIAM J. Matrix Anal. Applic. 22, 1204-1221. A. N. Malyshev (2003). "A Unified Theory of Conditioning for Linear Least Squares and Tikhonov Regularization Solutions," SIAM J. Matrix Anal. Applic. 24, 1186-1196. M. Hanke (2006). "A Note on Tikhonov Regularization of Large Linear Problems," BIT 43, 449-451. P.C. Hansen, J.C. Nagy, and D.P. OLeary (2006). Deblurring Images: Matrices, Spectra, and Filter­ ing, SIAM Publications, Philadelphia, PA. M.E. Kilmer, P.C. Hansen, and M.I. Espanol (2007). "A Projection-Based Approach to General-Form Tikhonov Regularization," SIAM J. Sci. Comput. 29, 315-330. T. Elfving and I. Skoglund (2009). "A Direct Method for a Regularized Least-Squares Problem," Num. Lin. Alg. Applic. 
16, 649-675. I. Hnetynkova and M. Plesinger (2009). "The Regularizing Effect of the Golub-Kahan Iterative Bidiagonalization and Revealing the Noise Level in Data," BIT 49, 669-696. P.C. Hansen (2010). Discrete Inverse Problems: Insight and Algorithms, SIAM Publications, Philadelphia, PA.
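As a computational footnote to §6.1.5: when the GSVD machinery is not required, the Tikhonov problem (6.1.20) can be solved with ordinary least squares software by stacking, since minimizing || Ax - b ||_2^2 + lambda || Bx ||_2^2 is the unconstrained problem for the augmented matrix [ A ; sqrt(lambda) B ]. The NumPy sketch below is illustrative only; the function names are ours, it assumes null(A) and null(B) intersect trivially, and the second-difference matrix shown is a simple stand-in for the T_DD of Equation 4.8.7 (whose boundary rows may differ).

import numpy as np

def tikhonov_stacked(A, B, b, lam):
    # Solve min || Ax - b ||_2^2 + lam || Bx ||_2^2 via the stacked
    # least squares problem  [A ; sqrt(lam) B] x  ~  [b ; 0].
    m2 = B.shape[0]
    K = np.vstack([A, np.sqrt(lam) * B])
    rhs = np.concatenate([b, np.zeros(m2)])
    x, *_ = np.linalg.lstsq(K, rhs, rcond=None)
    return x

def second_difference(n):
    # A simple second-difference regularization matrix (one choice of B).
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)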
  • 337. 6.2. Constrained Least Squares 313 6.2 Constrained Least Squares In the least squares setting it is sometimes natural to minimize II Ax - b 112 over a proper subset of IRn. For example, we may wish to predict b as best we can with Ax subject to the constraint that x is a unit vector. Or perhaps the solution defines a fitting function f(t) which is to have prescribed values at certain points. This can lead to an equality-constrained least squares problem. In this section we show how these problems can be solved using the QR factorization, the SVD, and the GSVD. 6.2.1 Least Squares Minimization Over a Sphere Given A E IRmxn, b E IRm , and a positive a: E IR, we consider the problem min 11 Ax - b 112 • llx ll 2 :$ <> (6.2.1) This is an example of the LSQI (least squares with quadratic inequality constraint) problem. This problem arises in nonlinear optimization and other application areas. As we are soon to observe, the LSQI problem is related to the ridge regression problem discussed in §6.1.4. Suppose r A = UEVT = LaiuivT (6.2.2) i=l is the SVD of A which we assume to have rank r . If the unconstrained minimum norm solution satisfies II xLs 112 :::; a, then it obviously solves (6.2.1). Otherwise, r ( T )2 2 ui b 2 II XLs 112 = L � > 0: ' i=l i (6.2.3) and it follows that the solution to (6.2.1) is on the boundary of the constraint sphere. Thus, we can approach this constrained optimization problem using the method of Lagrange multipliers. Define the parameterized objective function ¢ by and equate its gradient to zero. This gives a shifted normal equation system: The goal is to choose A so that II x(A) 112 = a:. Using the SVD (6.2.2), this leads to the problem of finding a zero of the function n ( T )2 f(A) = II x(A) II; - 0:2 = L ;;uk � - 0:2. k=l k +
  • 338. 314 Chapter 6. Modified Least Squares Problems and Methods This is an example of a secular equation problem. From (6.2.3), f(O) > 0. Since !'(>.) < 0 for >. � 0, it follows that f has a unique positive root >.+. It can be shown that p(>.) = II Ax(>.) - b II� = IIAxLS - b II� + t(�UI�r It follows that x(>.+) solves (6.2.1). (6.2.4) Algorithm 6.2.1 Given A E IRmxn with m � n, b E IRm, and a > 0, the following algorithm computes a vector x E IRn such that II Ax - b 112 is minimum subject to the constraint that II x 112 ::::; a. Compute the SVD A = UEVT, save v = [ V1 I·.·I Vn J , form b = urb, and determine r = rank(A). The SVD is the dominant computation in this algorithm. 6.2.2 More General Quadratic Constraints A more general version of (6.2.1) results if we minimize II Ax - b 112 over an arbitrary hyperellipsoid: minimize II Ax - b 1'2 subject to II Bx - d 112 ::::; a. (6.2.5) Here we are assuming that A E IRm1 xni, b E IRm1 , B E IRm2 xni, d E IRm2 , and a � 0. Just as the SVD turns (6.2.1) into an equivalent diagonal problem, we can use the GSVD to transform (6.2.5) into a diagonal problem. In particular, if the GSVD of A and B is given by (6.1.22) and (6.2.23), then (6.2.5) is equivalent to where minimize II DAY - b 112 subject to 11 DBy - d 112 ::::; a b = U[b, d = u'[d, (6.2.6)
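A compact prototype of the sphere-constrained solver just outlined (Algorithm 6.2.1) combines one SVD with a bracketed root-finder for the secular equation f(lambda) = 0. The sketch below is illustrative only: the rank tolerance, the root bracket, and the function name lsqi_sphere are our choices, and scipy.optimize.brentq simply stands in for whatever zero-finder one prefers.

import numpy as np
from scipy.optimize import brentq

def lsqi_sphere(A, b, alpha):
    # Minimize || Ax - b ||_2 subject to || x ||_2 <= alpha.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    tol = s[0] * max(A.shape) * np.finfo(float).eps
    r = int(np.sum(s > tol))                 # numerical rank
    s, U, Vt = s[:r], U[:, :r], Vt[:r, :]
    bt = U.T @ b
    x_ls = Vt.T @ (bt / s)                   # minimum-norm LS solution
    if np.linalg.norm(x_ls) <= alpha:
        return x_ls
    # Otherwise || x(lam) ||_2 = alpha has a unique root lam > 0.
    f = lambda lam: np.sum((s * bt / (s**2 + lam))**2) - alpha**2
    hi = s[0] * np.linalg.norm(bt) / alpha + 1.0   # guarantees f(hi) < 0
    lam = brentq(f, 0.0, hi)
    return Vt.T @ (s * bt / (s**2 + lam))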
  • 339. 6.2. Constrained Least Squares 315 The simple form of the objective function and the constraint equation facilitate the analysis. For example, if rank(B) = m2 < ni, then n1 II DAy - b II� = L(aiYi - bi)2 + (6.2.7) i=l and m2 II DsY - d II� = L(.BiYi - di) 2 + (6.2.8) i=l A Lagrange multiplier argument can be used to determine the solution to this trans­ formed problem (if it exists). 6.2.3 Least Squares With Equality Constraints We consider next the constrained least squares problem min II Ax - b 112 Bx=d (6.2.9) where A E Rm1 xni with m1 � ni, B E Rm2 xni with m2 < ni, b E Rm1 , and d E Rm2 • We refer to this as the LSE problem (least squares with equality constraints). By setting a = 0 in (6.2.5) we see that the LSE problem is a special case of the LSQI problem. However, it is simpler to approach the LSE problem directly rather than through Lagrange multipliers. For clarity, we assume that both A and B have full rank. Let be the QR factorization of BT and set AQ = [ Ai I A2 ] It is clear that with these transformations (6.2.9) becomes min II Aiy + A2z - b 112• RTy=d Thus, y is determined from the constraint equation RTy = d and the vector z is obtained by solving the unconstrained LS problem min II A2z - (b - Aiy) 112• zERn1 -m2 Combining the above, we see that the following vector solves the LSE problem:
  • 340. 316 Chapter 6. Modified Least Squares Problems and Methods Algorithm 6.2.2 Suppose A E IRm1 xn, , B E IRm2 xni , b E IRm' , and d E 1Rm2• If rank(A) = n1 and rank(B) = m2 < n1, then the following algorithm minimizes II Ax - b 112 subject to the constraint Bx = d . Compute the QR factorization BT = QR. Solve R(l:m2, l:m2)T·y = d for y. A = AQ Find z so II A(:, m2 + l:n1)z - (b - A(:, l:m2)·y) 112 is minimized. x = Q(:, l:m2)·y +Q(:,m2 + l:n1)·z . Note that this approach to the LSE problem involves two QR factorizations and a matrix multiplication. If A and/or B are rank deficient, then it is possible to devise a similar solution procedure using the SVD instead of QR. Note that there may not be a solution if rank(B) < m2. Also, if null(A) n null(B) =f. {O} and d E ran(B), then the LSE solution is not unique. 6.2.4 LSE Solution Using the Augmented System The LSE problem can also be approached through the method of Lagrange multipliers. Define the augmented objective function 1 2 T f(x, >.) = 211Ax - b 112 + >. (d - Bx), and set to zero its gradient with respect to x: ATAx - Arb - BT>. = 0. Combining this with the equations r = b - Ax and Bx = d we obtain the symmetric indefinite linear system (6.2.10) This system is nonsingular if both A and B have full rank. The augmented system presents a solution framework for the sparse LSE problem. 6.2.5 LSE Solution Using the GSVD Using the GSVD given by (6.1.22) and (6.1.23), we see that the LSE problem transforms to min _ II DAY - b 112 Du y=d (6.2.11) where b = U[b, d= U!d, and y = x-1x. It follows that if null(A) n null(B) = {O} and X = [ X1 I · · · I Xn ] , then m2 (di) ni (bi ) x = L � X; + L � X; . !3i . ai i=l i=m2 +l (6.2.12)
  • 341. 6.2. Constrained Least Squares 317 solves the LSE problem. 6.2.6 LSE Solution Using Weights An interesting way to obtain an approximate LSE solution is to solve the unconstrained LS problem (6.2.13) for large )... (Compare with the Tychanov regularization problem (6.1.21).) Since II[�B ]x - [Jxd JI[ � ll Ax - b ll� + �ll Bx - d ll', we see that there is a penalty for discrepancies among the constraint equations. To quantify this, assume that both A and B have full rank and substitute the GSVD defined by (6.1.22) and (6.1.23) into the normal equation system (ATA + >-.BTB)x = ATb + )..BTd. This shows that the solution x(>-.) is given by x(>-.) = Xy()..) where y()..) solves T T T- T - (DADA + >-.D8D8)y = DAb+ )..D8d with b = U[b and d= U{d. It follows that and so from (6.2.13) we have (6.2.14) This shows that x(>-.) -+ x as ).. -+ oo . The appeal of this approach to the LSE problem is that it can be implemented with unconstrained LS problem software. However, for large values of ).. numerical problems can arise and it is necessary to take precautions. See Powell and Reid (1968) and Van Loan (1982). Problems P6.2.1 Is the solution to (6.2.1} always unique? P6.2.2 Let vo(x}, . . . , pn (x) be given polynomials and (xo, yo), . . . , (xm , Ym) be a given set of coordi­ nate pairs with Xi E [a, b). It is desired to find a polynomial p(x) = L::=o akPk (x) such that m tf>(a) = L(p(xi) - yi)2 i=O
  • 342. 318 Chapter 6. Modified Least Squares Problems and Methods is minimized subject to the constraint that where Zi = a + ih and b = a + Nh. Show that this leads to an LSQI problem of the form (6.2.5) with d = O. P6.2.3 Suppose Y = [ Yl I · · · I Yk ] E Rmxk has the property that yTy = diag(d�, . . . , d%), Show that if Y = QR is the QR factorization of Y, then R is diagonal with lrii l = �- P6.2.4 (a) Show that if (ATA + >.I)x = ATb, .X > 0, and II x 1'2 = a, then z = (Ax - b)/.X solves the dual equations (AAT + >.I)z = -b with II ATz 112 = a. (b) Show that if (AAT + >.I)z = -b, II ATz 1'2 = a, then x = -ATz satisfies (ATA + >.I)x = ATb, II x 1'2 = a. P6.2.5 Show how to compute y (if it exists) so that both (6.2.7) and (6.2.8) are satisfied. P6.2.6 Develop an SVD version of Algorithm 6.2.2 that can handle the situation when A and/or B are rank deficient. P6.2.7 Suppose A = [ �� ] where Ai E Rnxn is nonsingular and A2 E R(m-n)xn. Show that O'min(A) 2:: J1 + O'min(A2A}1)2 Um1n(A1) · P6.2.8 Suppose p ;::: m ;::: n and that A E Rmxn and B E Rmxp Show how to compute orthogonal Q E Ir" xm and orthogonal V E Rnxn so that where R E Fxn and S E Rmxm are upper triangular. P6.2.9 Suppose r E Rm, y E Rn, and 6 > 0. Show how to solve the problem min llEy - rll2 EeRm X n , ll Ell F $6 Repeat with "min" replaced by "max." P6.2.10 Show how the constrained least squares problem min II Ax - b ll2 B:x=d A E Rmxn, B E wxn, rank(B) = p can be reduced to an unconstrained least square problem by performing p steps of Gaussian elimination on the matrix [ � ] = [ �� B2 ] A2 ' Explain. Hint: The Schur complement is of interest. Notes and References for §6.2 The LSQI problem is discussed in: G.E. Forsythe and G.H. Golub (1965). "On the Stationary Values of a Second-Degree Polynomial on the Unit Sphere," SIAM J. App. Math. 14, 1050-1068. L. Elden (1980). "Perturbation Theory for the Least Squares Problem with Linear Equality Con­ straints," SIAM J. Numer. Anal. 1 7, 338-350. W. Gander (1981). "Least Squares with a Quadratic Constraint," Numer. Math. 36, 291-307. L. Elden (1983). "A Weighted Pseudoinverse, Generalized Singular Values, and Constrained Least Squares Problems,'' BIT 22 , 487-502.
  • 343. 6.2. Constrained Least Squares 319 G.W. Stewart (1984). "On the Asymptotic Behavior of Scaled Singular Value and QR Decomposi­ tions," Math. Comput. 43, 483-490. G.H. Golub and U. von Matt (1991). "Quadratically Constrained Least Squares and Quadratic Prob­ lems," Nu.mer. Math. 59, 561-580. T.F. Chan, J.A. Olkin, and D. Cooley (1992). "Solving Quadratically Constrained Least Squares Using Black Box Solvers," BIT 32, 481-495. Secular equation root-finding comes up in many numerical linear algebra settings. For an algorithmic overview, see: O.E. Livne and A. Brandt (2002). "N Roots of the Secular Equation in O(N) Operations," SIAM J. Matrix Anal. Applic. 24, 439-453. For a discussion of the augmented systems approach to least squares problems, see: A. Bjorck (1992). "Pivoting and Stability in the Augmented System Method," Proceedings of the 14th Dundee Conference, D.F. Griffiths and G.A. Watson (eds.), Longman Scientific and Technical, Essex, U.K. A. Bjorck and C.C. Paige (1994). "Solution of Augmented Linear Systems Using Orthogonal Factor­ izations," BIT 34, 1-24. References that are concerned with the method of weighting for the LSE problem include: M.J.D. Powell and J.K. Reid (1968). "On Applying Householder's Method to Linear Least Squares Problems," Proc. IFIP Congress, pp. 122-26. C. Van Loan (1985). "On the Method of Weighting for Equality Constrained Least Squares Problems," SIAM J. Nu.mer. Anal. 22, 851-864. J.L. Barlow and S.L. Handy (1988). "The Direct Solution of Weighted and Equality Constrained Least-Squares Problems," SIAM J. Sci. Stat. Comput. 9, 704-716. J.L. Barlow, N.K. Nichols, and R.J. Plemmons (1988). "Iterative Methods for Equality Constrained Least Squares Problems," SIAM J. Sci. Stat. Comput. 9, 892-906. J.L. Barlow (1988). "Error Analysis and Implementation Aspects of Deferred Correction for Equality Constrained Least-Squares Problems," SIAM J. Nu.mer. Anal. 25, 1340-1358. J.L. Barlow and U.B. Vemulapati (1992). "A Note on Deferred Correction for Equality Constrained Least Squares Problems," SIAM J. Nu.mer. Anal. 29, 249-256. M. Gulliksson and P.-A. Wedin (1992). "Modifying the QR-Decomposition to Constrained and Weighted Linear Least Squares," SIAM J. Matrix Anal. Applic. 13, 1298-1313. M. Gulliksson (1994). "Iterative Refinement for Constrained and Weighted Linear Least Squares," BIT 34, 239-253. G. W. Stewart (1997). "On the Weighting Method for Least Squares Problems with Linear Equality Constraints," BIT 37, 961-967. For the analysis of the LSE problem and related methods, see: M. Wei (1992). "Perturbation Theory for the Rank-Deficient Equality Constrained Least Squares Problem," SIAM J. Nu.mer. Anal. 29, 1462-1481. M. Wei (1992). "Algebraic Properties ofthe Rank-Deficient Equality-Constrained and Weighted Least Squares Problems," Lin. Alg. Applic. 161, 27-44. M. Gulliksson (1995). "Backward Error Analysis for the Constrained and Weighted Linear Least Squares Problem When Using the Weighted QR Factorization," SIAM J. Matrix. Anal. Applic. 13, 675-687. M. Gulliksson (1995). "Backward Error Analysis for the Constrained and Weighted Linear Least Squares Problem When Using the Weighted QR Factorization," SIAM J. Matrix Anal. Applic. 16, 675-687. J. Ding and W. Hang (1998). "New Perturbation Results for Equality-Constrained Least Squares Problems," Lin. Alg. Applic. 272, 181-192. A.J. Cox and N.J. Higham (1999). "Accuracy and Stability of the Null Space Method for Solving the Equality Constrained Least Squares Problem," BIT 39, 34-50. A.J. Cox and N.J. 
Higham (1999). "Row-Wise Backward Stable Elimination Methods for the Equality Constrained Least Squares Problem," SIAM J. Matrix Anal. Applic. 21, 313-326. A.J. Cox and N.J. Higham (1999). "Backward Error Bounds for Constrained Least Squares Problems," BIT 39, 210-227.
  • 344. 320 Chapter 6. Modified Least Squares Problems and Methods M. Gulliksson and P-A. Wedin (2000). "Perturbation Theory for Generalized and Constrained Linear Least Squares," Num. Lin. Alg. 7, 181·-195. M. Wei and A.R. De Pierro (2000). "Upper Perturbation Bounds of Weighted Projections, Weighted and Constrained Least Squares Problems," SIAM J. Matrix Anal. Applic. 21, 931-951. E.Y. Bobrovnikova and S.A. Vavasis (2001). "Accurate Solution ofWeighted Least Squares by Iterative Methods SIAM. J. Matrix Anal. Applic. 22, 1153-1174. M. Gulliksson, X-Q.Jin, and Y-M. Wei (2002). "Perturbation Bounds for Constrained and Weighted Least Squares Problems," Lin. Alg. Applic. 349, 221-232. 6.3 Total Least Squares The problem of minimizing II Ax - b IJ2 where A E R.mxn and b E R.m can be recast as follows: min II r 112 • b+r E ran(A) (6.3.1) In this problem, there is a tacit assumption that the errors are confined to the vector of observations b. If error is also present in the data matrix A, then it may be more natural to consider the problem min ll [ E l r ] IJF · b+r E ran(A+E) (6.3.2) This problem, discussed by Golub and Van Loan (1980), is referred to as the total least squares (TLS) problem. If a minimizing [ Eo I ro ] can be found for (6.3.2), then any x satisfying (A + Eo)x = b + ro is called a TLS solution. However, it should be realized that (6.3.2) may fail to have a solution altogether. For example, if then for all € > 0, b E ran(A + E15). However, there is no smallest value of II [ E , r J llF for which b + r E ran (A + E). A generalization of (6.3.2) results if we allow multiple right-hand sides and use a weighted Frobenius norm. In particular, if B E R.mxk and the matrices D = diag(d1, . . . , dm), T = diag(t1, . . . , tn+k) are nonsingular, then we are led to an optimization problem of the form min IJ D [ E I R] T IJF B+R E ran(A+E) (6.3.3) where E E R.mxn and R E R.mxk. If [ Eo I Ro ] solves (6.3.3), then any X E :nrxk that satisfies (A + E0)X = (B + Ro) is said to be a TLS solution to (6.3.3). In this section we discuss some of the mathematical properties of the total least squares problem and show how it can be solved using the SVD. For a more detailed introduction, see Van Ruffel and Vanderwalle (1991).
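Anticipating the SVD-based procedure developed below, the generic single right-hand side case of (6.3.2) with D = I and T = I requires just one SVD of [ A | b ]. The NumPy sketch is illustrative only: it assumes m > n, that sigma_n([A | b]) is simple, and that the trailing component of the corresponding right singular vector is nonzero; nongeneric problems (see §6.3.4) are simply rejected, and the function name is ours.

import numpy as np

def tls_single_rhs(A, b):
    # Generic TLS solution of (A+E)x = b+r with || [E | r] ||_F minimal.
    m, n = A.shape
    C = np.column_stack([A, b])
    _, _, Vt = np.linalg.svd(C)
    v = Vt[-1, :]                       # right singular vector for sigma_{n+1}
    if abs(v[n]) <= np.finfo(float).eps * np.linalg.norm(v):
        raise ValueError("nongeneric TLS problem")
    return -v[:n] / v[n]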
  • 345. 6.3. Total Least Squares 321 6.3.1 Mathematical Background The following theorem gives conditions for the uniqueness and existence of a TLS solution to the multiple-right-hand-side problem. Theorem 6.3.1. Suppose A E IRmxn and B E IRmxk and that D = diag(d1, . . . , dm) and T = diag(t1, . . . ,tn+k) are nonsingular. Assume m ;::: n + k and let the SVD of c = D[ A I B ]T = [ C1 I C2 l n k be specified by UTCV = diag(0'1, . . . , O'n+k) = E where U, V, and E are partitioned as follows: u = [ U1 I U2 ] n k V = E = n k IfO'n(C1) > O'n+l(C), then the matrix [ Eo I Ro ] defined by D[ Eo I Ro ]T = -U2E2[Yi� I V2� ] (6.3.4) solves {6.3.3). If T1 = diag(ti, . . . , tn) and T2 = diag(tn+l• . . . , tn+k), then the matrix exists and is the unique TLS solution to (A + Eo)X = B + Ro. Proof. We first establish two results that follow from the assumption O'n(C1) > O'n+l(C). From the equation CV = UE we have We wish to show that V22 is nonsingular. Suppose V22x = 0 for some unit 2-norm x. It follows from that 11 Vi2x lb = 1. But then O'n+i(C) ;::: ll U2E2x ll2 = ll C1Vi2x lb ;::: O'n(C1) , a contradiction. Thus, the submatrix V22 is nonsingular. The second fact concerns the strict separation ofO'n(C) and O'n+l(C). From Corollary 2.4.5, we have O'n(C) ;::: O'n(C1) and so O'n(C) ;::: O'n(C1) > O'n+i(C). We are now set to prove the theorem. If ran(B + R) C ran(A + E), then there is an X (n-by-k) so (A + E)X = B + R, i.e., { D[ A I B ]T + D[ E I R ]T } r-1 [ - �k ] = o . (6.3.5)
  • 346. 322 Chapter 6. Modified Least Squares Problems and Methods Thus, the rank of the matrix in curly brackets is at most equal to n. By following the argument in the proof of Theorem 2.4.8, it can be shown that n+k ll D [ E I R] T ll! 2:: L ai(C)2• i=n+I Moreover, the lower bound is realized by setting [ E I R ] = [ Eo I Ro ]. Using the inequality an(C) > an+l(C), we may infer that [ Eo I Ro ] is the unique minimizer. To identify the TLS solution XTL5, we observe that the nullspace of is the range of [ ��� ] . Thus, from (6.3.5) r-1 [-�k ] = [��: ]s for some k- by- k matrix S. From the equations r1-1X = Vi2S and -T2-1 = Vi2S we see that S = -V221r2-1 and so X = T1Vi2S = -TiVi2V2;1r2-1 = XTLs· D Note from the thin CS decomposition (Theorem 2.5. 2) that II x 112 = II v; v;-1 112 = i -ak(Vi2)2 T 12 22 2 ak(Vi2)2 where we define the "r-norm" on JRnxk by II z llr = II rl-lZT2 112· If an(C1) = an+i(C), then the solution procedure implicit in the above proof is problematic. The TLS problem may have no solution or an infinite number ofsolutions. See §6.3.4 for suggestions as to how one might proceed. 6.3.2 Solving the Single Right Hand Side Case We show how to maximize ak(Vi2) in the important k = 1 case. Suppose the singular values of C satisfy an-p > an-p+I = · · · = an+l and let V = [ V1 I · · · I Vn+i ] be a column partitioning of V. If Q is a Householder matrix such that [ W O : ]n 1 ' V(:, n + l - p:n + l)Q = '"' p then the last column of this matrix has the largest (n + l)st component of all the vectors in span{v n+l-p, . . . ,Vn+i}· If a = 0, then the TLS problem has no solution. Otherwise
  • 347. 6.3. Total Least Squares Moreover, and so [I�i � lUT(D[ A l b ]T)V [ I�p � l = E D [ Eo l ro ] T = -D [ A l b ] T [: l[ zT i a ]. Overall, we have the following algorithm: 323 Algorithm 6.3.1 Given A E Rmxn (m > n), b E Rm, nonsingular D = diag(di. . . . , dm), and nonsingular T = diag(t1, . . . , tn+i ), the following algorithm computes (if possible) a vector xTLs E Rn such that (A+Eo)xTLs = (b+ro) and II D[ Eo I ro ]T IIF is minimal. Compute the SVD uT(D[ A I b]T)V = diag(a1 , . . . ' Un+i ) and save v. Determine p such that a1 ;::: · · · ;::: Un-p > Un-p+l = · · · = Un+l · Compute a Householder P such that if V = VP, then V(n + 1, n - p + l:n) = 0. if Vn+l,n+l =f 0 for i = l:n Xi = -tiVi,n+i /(tn+iVn+i,n+i ) end XTLS = X end This algorithm requires about 2mn2 + 12n3 fl.ops and most of these are associated with the SVD computation. 6.3.3 A Geometric Interpretation It can be shown that the TLS solution xTLs minimizes (6.3.6) where af is the ith row of A and bi is the ith component of b. A geometrical interpre­ tation of the TLS problem is made possible by this observation. Indeed, _ lafx - bil2 6i - TT-2 + c2 is the square of the distance from to the nearest point in the subspace X 1 X n+l
  • 348. 324 Chapter 6. Modified Least Squares Problems and Methods where the distance in Rn+l is measured by the norm II z II = II Tz 112• The TLS problem is essentially the problem of orthogonal regression, a topic with a long history. See Pearson (1901) and Madansky (1959). 6.3.4 Variations of the Basic TLS Problem We briefly mention some modified TLS problems that address situations when addi­ tional constraints are imposed on the optimizing E and R and the associated TLS solution. In the restricted TLS problem, we are given A E Rmxn, B E Rmx k, P1 E Rmxq, and P2 E Rn+kxr, and solve min II P[[ E I R]P2 llF · B+R C ran(A+E) (6.3.7) We assume that q � m and r � n + k. An important application arises if some of the columns of A are error-free. For example, if the first s columns of A are error-free, then it makes sense to force the optimizing E to satisfy E(:, l:s) = 0. This goal is achieved by setting P1 = Im and P2 = Im+k (:, s + l:n + k) in the restricted TLS problem. If a particular TLS problem has no solution, then it is referred to as a nongeneric TLS problem. By adding a constraint it is possible to produce a meaningful solution. For example, let UT[ A I b JV = E be the SVD and let p be the largest index so V(n + 1,p) ¥- 0. It can be shown that the problem min ll ( E l r J llF (A+E)x=b+r [ E I r ]V(:,p+l:n+l)=O (6.3.8) has a solution [ Eo I ro ] and the nongeneric TLS solution satisfies (A + Eo)x + b + ro. See Van Huffel (1992). In the regularized TLS problem additional constraints are imposed to ensure that the solution x is properly constrained/smoothed: min II [ E I r J llF (A+E)x=b+r {6.3.9) llLxll2 $<5 The matrix L E Rnxn could be the identity or a discretized second-derivative operator. The regularized TLS problem leads to a Lagrange multiplier system of the form See Golub, Hansen, and O'Leary (1999) for more details. Another regularization ap­ proach involves setting the small singular values of [A I b] to zero. This is the truncated TLS problem discussed in Fierro, Golub, Hansen, and O'Leary (1997). Problems P6.3.1 Consider the TLS problem (6.3.2) with nonsingular D and T. (a) Show that if rank(A) < n, then (6.3.2) has a solution if and only if b E ran(A). (b) Show that if rank(A) = n, then (6.3.2) has no
  • 349. 6.3. Total Least Squares solution if ATD2b = 0 and ltn+1 l ll Db 1'2 ?: un (DAT1 ) where T1 = diag(ti . . . . , tn)· P6.3.2 Show that if C = D[ A I b ]T = [ A1 I d ] and un (C) > un+1 (C), then XTL s satisfies (AfA1 - O"n+i (C)2/)XTLs = Afd. Appreciate this as a "negatively shifted" system of normal equations. 325 P6.3.3 Show how to solve (6.3.2) with the added constraint that the first p columns of the minimizing E are zero. Hint: Compute the QR factorization of A(:, l:p). P6.3.4 Show how to solve (6.3.3) given that D and T are general nonsingular matrices. P6.3.5 Verify Equation (6.3.6). P6.3.6 If A E Rmxn has full column rank and B E wxn has full row rank, show how to minimize subject to the constraint that Bx = 0. 11 Ax - b 11� f(x) = 1 + xTx P6.3.7 In the data least squares problem, we are given A E Rmxn and b E Rm and minimize II E !IF subject to the constraint that b E ran(A + E). Show how to solve this problem. See Paige and Strakos (2002b). Notes and References for §6.3 Much of this section is based on: G.H. Golub and C.F. Van Loan (1980). "An Analysis of the Total Least Squares Problem," SIAM J. Numer. Anal. 1 7, 883-93. The idea of using the SYD to solve the TLS problem is set forth in: G.H. Golub and C. Reinsch (1970). "Singular Value Decomposition and Least Squares Solutions," Numer. Math. 14, 403-420. G.H. Golub (1973). "Some Modified Matrix Eigenvalue Problems," SIAM Review 15, 318--334. The most comprehensive treatment of the TLS problem is: S. Van Huffel and J. Vandewalle (1991). The Total Least Squares Problem: Computational Aspects and Analysis, SIAM Publications, Philadelphia, PA. There are two excellent conference proceedings that cover just about everything you would like to know about TLS algorithms, generalizations, applications, and the associated statistical foundations: S. Van Huffel (ed.) (1996). Recent Advances in Total Least Squares Techniques and Errors in Variables Modeling, SIAM Publications, Philadelphia, PA. S. Van Huffel and P. Lemmerling (eds.) (2002) Total Least Squares and Errors-in-Variables Modeling: Analysis, Algorithms, and Applications, Kluwer Academic, Dordrecht, The Netherlands. TLS is but one approach to the errors-in-variables problem, a subject that has a long and important history in statistics: K. Pearson (1901). "On Lines and Planes of Closest Fit to Points in Space," Phil. Mag. 2, 559-72. A. Wald (1940). "The Fitting of Straight Lines if Both Variables are Subject to Error,'' Annals of Mathematical Statistics 11, 284-300. G.W. Stewart (2002). "Errors in Variables for Numerical Analysts,'' in Recent Advances in Total Least Squares Techniques and Errors-in- Variables Modelling, S. Van Huffel (ed.), SIAM Publications, Philadelphia PA, pp. 3-10, In certain settings there are more economical ways to solve the TLS problem than the Golub-Kahan­ Reinsch SYD algorithm: S. Van Huffel and H. Zha (1993). "An Efficient Total Least Squares Algorithm Based On a Rank­ Revealing Two-Sided Orthogonal Decomposition," Numer. Al,q. 4, 101-133. A. Bjorck, P. Heggerncs, and P. Matstoms (2000). "Methods for Large Scale Total Least Squares Problems,'' SIAM J. Matrix Anal. Applic. 22, 413-429.
  • 350. 326 Chapter 6. Modified Least Squares Problems and Methods R. Guo and R.A. Renaut (2005). "Parallel Variable Distribution for Total Least Squares," Num. Lin. Alg. 12, 859-876. The condition of the TLS problem is analyzed in: M. Baboulin and S. Gratton (2011). "A Contribution to the Conditioning of the Total Least-Squares Problem," SIAM J. Matrix Anal. Applic. 32, 685-699. Efforts to connect the LS and TLS paradigms have lead to nice treatments that unify the presentation of both approaches: B.D. Rao (1997). "Unified Treatment of LS, TLS, and Truncated SVD Methods Using a Weighted TLS Framework," in Recent Advances in Total Least Squares Techniques and Errors-in- Variables Modelling, S. Van Ruffel (ed.), SIAM Publications, Philadelphia, PA., pp. 11-20. C.C. Paige and Z. Strakos (2002a). "Bounds for the Least Squares Distance Using Scaled Total Least Squares," Numer. Math. 91, 93-115. C.C. Paige and Z. Strakos (2002b). "Scaled Total Least Squares Fundamentals," Numer. Math. 91, 117-146. X.-W. Chang, G.R. Golub, and C.C. Paige (2008). "Towards a Backward Perturbation Analysis for Data Least Squares Problems," SIAM J. Matrix Anal. Applic. 30, 1281-1301. X.-W. Chang and D. Titley-Peloquin (2009). "Backward Perturbation Analysis for Scaled Total Least-Squares," Num. Lin. Alg. Applic. 16, 627-648. For a discussion of the situation when there is no TLS solution or when there are multiple solutions, see: S. Van Ruffel and J. Vandewalle (1988). "Analysis and Solution of the Nongeneric Total Least Squares Problem," SIAM J. Matrix Anal. Appl. 9, 360--372. S. Van Ruffel (1992). "On the Significance of Nongeneric Total Least Squares Problems," SIAM J. Matrix Anal. Appl. 13, 20-35. M. Wei (1992). "The Analysis for the Total Least Squares Problem with More than One Solution," SIAM J. Matrix Anal. Appl. 13, 746-763. For a treatment of the multiple right hand side TLS problem, see: I. Rnetynkovii., M. Plesinger, D.M. Sima, Z. Strakos, and S. Van Ruffel (2011). "The Total Least Squares Problem in AX � B: A New Classification with the Relationship to the Classical Works," SIAM J. Matrix Anal. Applic. 32, 748-770. If some of the columns of A are known exactly then it is sensible to force the TLS perturbation matrix E to be zero in the same columns. Aspects of this constrained TLS problem are discussed in: J.W. Demmel (1987). "The Smallest Perturbation of a Submatrix which Lowers the Rank and Con­ strained Total Least Squares Problems," SIAM J. Numer. Anal. 24, 199-206. S. Van Ruffel and J. Vandewalle (1988). "The Partial Total Least Squares Algorithm," J. Comput. App. Math. 21, 333-342. S. Van Ruffel and J. Vandewalle (1989). "Analysis and Properties of the Generalized Total Least Squares Problem AX � B When Some or All Columns in A are Subject to Error," SIAM J. Matrix Anal. Applic. 10, 294--315. S. Van Ruffel and R. Zha (1991). "The Restricted Total Least Squares Problem: Formulation, Algo­ rithm, and Properties," SIAM J. Matrix Anal. Applic. 12, 292--309. C.C. Paige and M. Wei (1993). "Analysis of the Generalized Total Least Squares Problem AX = B when Some of the Columns are Free of Error," Numer. Math. 65, 177-202. Another type of constraint that can be imposed in the TLS setting is to insist that the optimum perturbation of A have the same structure as A. For examples and related strategies, see: J. Kamm and J.G. Nagy (1998). "A Total Least Squares Method for Toeplitz Systems of Equations," BIT 38, 560-582. P. Lemmerling, S. Van Ruffel, and B. De Moor (2002). 
"The Structured Total Least Squares Approach for Nonlinearly Structured Matrices," Num. Lin. Alg. 9, 321-332. P. Lemmerling, N. Mastronardi, and S. Van Ruffel (2003). "Efficient Implementation of a Structured Total Least Squares Based Speech Compression Method," Lin. Alg. Applic. 366, 295-315. N. Mastronardi, P. Lemmerling, and S. Van Ruffel (2004). "Fast Regularized Structured Total Least Squares Algorithm for Solving the Basic Deconvolution Problem," Num. Lin. Alg. 12, 201-209.
  • 351. 6.4. Subspace Computations with the SVD 327 I. Markovsky, S. Van Ruffel, and R. Pintelon (2005). "Block-Toeplitz/Hankel Structured Total Least Squares," SIAM J. Matrix Anal. Applic. 26, 1083-1099. A. Beck and A. Ben-Tai (2005). "A Global Solution for the Structured Total Least Squares Problem with Block Circulant Matrices," SIAM J. Matrix Anal. Applic. 27, 238-255. H. Fu, M.K. Ng, and J.L. Barlow (2006). "Structured Total Least Squares for Color Image Restora­ tion," SIAM J. Sci. Comput. 28, 1100-1119. As in the least squares problem, there are techniques that can be used to regularlize an otherwise "wild" TLS solution: R.D. Fierro and J.R. Bunch (1994). "Collinearity and Total Least Squares," SIAM J. Matrix Anal. Applic. 15, 1167-1181. R.D. Fierro, G.H. Golub, P.C. Hansen and D.P. O'Leary (1997). "Regularization by Truncated Total Least Squares," SIAM J. Sci. Comput. 18, 1223-1241. G.H. Golub, P.C. Hansen, and D.P. O'Leary (1999). "Tikhonov Regularization and Total Least Squares," SIAM J. Matrix Anal. Applic. 21, 185-194. R.A. Renaut and H. Guo (2004). "Efficient Algorithms for Solution of Regularized Total Least Squares," SIAM J. Matrix Anal. Applic. 26, 457-476. D.M. Sima, S. Van Ruffel, and G.H. Golub (2004). "Regularized Total Least Squares Based on Quadratic Eigenvalue Problem Solvers," BIT 44, 793-812. N. Mastronardi, P. Lemmerling, and S. Van Ruffel (2005). "Fast Regularized Structured Total Least Squares Algorithm for Solving the Basic Deconvolution Problem," Num. Lin. Alg. Applic. 12, 201-209. S. Lu, S.V. Pereverzev, and U. Tautenhahn (2009). "Regularized Total Least Squares: Computational Aspects and Error Bounds," SIAM J. Matrix Anal. Applic. 31, 918-941. Finally, we mention an interesting TLS problem where the solution is subject to a unitary constraint: K.S. Arun (1992). "A Unitarily Constrained Total Least Squares Problem in Signal Processing," SIAM J. Matrix Anal. Applic. 13, 729-745. 6.4 Subspace Computations with the SVD It is sometimes necessary to investigate the relationship between two given subspaces. How close are they? Do they intersect? Can one be "rotated" into the other? And so on. In this section we show how questions like these can be answered using the singular value decomposition. 6.4. 1 Rotation of Subspaces Suppose A E IRmxp is a data matrix obtained by performing a certain set of experi­ ments. If the same set of experiments is performed again, then a different data matrix, B E IRmxp, is obtained. In the orthogonal Procrustes problem the possibility that B can be rotated into A is explored by solving the following problem: minimize IIA - BQ llF , subject to QTQ = Ip . (6.4.1) We show that optimizing Q can be specified in terms of the SVD of BTA. The matrix trace is critical to the derivation. The trace of a matrix is the sum of its diagonal entries: n tr(C) = L Cii , i= l It is easy to show that if C1 and C2 have the same row and column dimension, then (6.4.2)
  • 352. 328 Chapter 6. Modified Least Squares Problems and Methods Returning to the Procrustes problem (6.4.1), if Q E wxp is orthogonal, then p II A - BQ II! = L II A(:, k) - B·Q(:, k) II� k=l p = L II A(:, k) II� + II BQ(:, k) II� - 2Q(:, k)TBTA(:, k) k=l p = II A II! + II BQ II! - 2 L [QT(BTA)] kk k=l = II A II! + II B II! - 2tr(QT(BTA)). Thus, (6.4.1) is equivalent to the problem max tr(QTBTA) . QTQ=fp If UT(BTA)V = E = diag(a1, . . . , ap) is the SVD of BTA and we define the orthogonal matrix Z by Z = VTQTU, then by using (6.4.2) we have p p tr(QTBTA) = tr(QTUEVT) = tr(ZE) = L ZiiO'i :::; LO'i . i=l i=l The upper bound is clearly attained by setting Z = Ip, i.e., Q = UVT. Algorithm 6.4.1 Given A and B in nrxp, the following algorithm finds an orthogonal Q E wxp such that II A - BQ IIF is minimum. C = BTA Compute the SVD UTCV = E and save U and V. Q = UVT We mention that if B = Ip, then the problem (6.4.1) is related to the polar decom­ position. This decomposition states that any square matrix A has a factorization of the form A = QP where Q is orthogonal and P is symmetric and positive semidefi­ nite. Note that if A = UEVT is the SVD of A, then A = (UVT)(VEVT) is its polar decomposition. For further discussion, see §9.4.3. 6.4.2 Intersection of Nullspaces Let A E nrxn and B E wxn be given, and consider the problem of finding an or­ thonormal basis for null(A) n null(B). One approach is to compute the nullspace of the matrix c = [ � ] since this is just what we want: Cx = 0 {:::} x E null(A) n null(B). However, a more economical procedure results if we exploit the following theorem.
  • 353. 6.4. Subspace Computations with the SVD 329 Theorem 6.4.1. Suppose A E IRmxn and let {z1 , . . . , Zt } be an orthonormal basis for null(A). Define Z = [ z1 I · · · I Zt ] and let {w1, . . . , wq } be an orthonormal basis for null(BZ) where B E wxn. If w = [ W1 I · . · I Wq ] , then the columns of zw form an orthonormal basis for null(A) n null(B). Proof. Since AZ = 0 and (BZ)W = 0, we clearly have ran(ZW) C null(A) n null(B). Now suppose x is in both null(A) and null(B). It follows that x = Za for some O =f a E IRt. But since 0 = Bx = BZa, we must have a = Wb for some b E IRq. Thus, x = ZWb E ran(ZW). D If the SVD is used to compute the orthonormal bases in this theorem, then we obtain the following procedure: Algorithm 6.4.2 Given A E IRmxn and B E wxn, the following algorithm computes and integer s and a matrix Y = [ Y1 I · · · I Ys ] having orthonormal columns which span null(A) n null(B). If the intersection is trivial, then s = 0. Compute the SVD U'[AVA = diag(ai), save VA, and set r = rank(A). if r < n else end C = BVA(:, r + l:n) Compute the SVD U'{CVc = diag('Yi), save Ve, and set q = rank(C). if q < n - r s = n - r - q Y = VA(:, r + l:n)Vc(:, q + l:n - r) else s = O end s = O The practical implementation of this algorithm requires an ability to reason about numerical rank. See §5.4.1. 6.4.3 Angles Between Subspaces Let F and G be subspaces in IRm whose dimensions satisfy p = dim(F) 2: dim(G) = q 2: 1. The principal angles {Oi}{=1 between these two subspaces and the associated principal vectors {fi,gi}i=1 are defined recursively by (6.4.3) fT[f1,...,fk-1)=0 gT(g1,...,gk-iJ=O
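Before developing the computation of principal angles, we note that the Procrustes solution of §6.4.1 (Algorithm 6.4.1) amounts to one SVD. The NumPy sketch below is illustrative only; the function name is ours.

import numpy as np

def procrustes(A, B):
    # Algorithm 6.4.1: the orthogonal Q minimizing || A - B Q ||_F is
    # Q = U V^T, where U * Sigma * V^T is the SVD of B^T A.
    U, _, Vt = np.linalg.svd(B.T @ A)
    return U @ Vt

# Usage: Q = procrustes(A, B) makes || A - B @ Q ||_F as small as possible
# over all orthogonal Q of order p.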
  • 354. 330 Chapter 6. Modified Least Squares Problems and Methods Note that the principal angles satisfy 0 $ f}i $ · · · $ Bq $ 7f/2.. The problem of com­ puting principal angles and vectors is oftentimes referred to as the canonical correlation problem. Typically, the subspaces F and G are matrix ranges, e.g., F = ran(A), G = ran(B), The principal vectors and angles can be computed using the QR factorization and the SVD. Let A = QARA and B = Q8R8 be thin QR factorizations and assume that q QIQs = YEZT = L,aiyizT i=l is the SVD of QIQB E wxq. Since II QIQB 112 $ 1, all the singular values are between 0 and 1 and we may write O'i = cos(Bi), i = l:q. Let QAY = [ fi l · · · l fp ] , QBz = [ gl I . . . I gq l (6.4.4) (6.4.5) be column partitionings ofthe matrices QAY E IRnxp and Q8Z E IRnxq. These matrices have orthonormal columns. If f E F and g E G are unit vectors, then there exist unit vectors u E JR.P and v E IRq so that f = QAu and g = Q8v. Thus, fTg = (QAuf(Qnv) = uT(QIQs)V = uT(YEZT)v q = (YTufE(ZTv) = L,ai(yfu)(zfv) . i=l (6.4.6) This expression attains its maximal value of a1 = cos(81) by setting u = y1 and v = z1. It follows that f = QAYl = Ji and v = Qnz1 = gl. Now assume that k > 1 and that the first k - 1 columns of the matrices in (6.4.4) and (6.4.5) are known, i.e., fi, . . . , fk-l and gi, . . . ,gk-l· Consider the problem of maximizing JTg given that f = QAu and g = Qnv are unit vectors that satisfy It follows from (6.4.6) that !T [ Ji I · . · I fk-i l = 0, gT [ gl I · . · I 9k-i l = o. q q JTg = L,ai(Yfu)(zfv) < ak L IYTul · lzTvi. i=k i=k This expression attains its maximal value of O'k = cos(Bk) by setting u = Yk and v = Zk· It follows from (6.4.4) and (6.4.5) that f = QAyk = fk and g = Q8zk = gk. Combining these observations we obtain
  • 355. 6.4. Subspace Computations with the SVD 331 Algorithm 6.4.3 (Principal Angles and Vectors) Given A E 1Rmxp and B E 1Rmxq (p :;:: q) each with linearly independent columns, the following algorithm computes the cosines of the principal angles (}i ;:::: · · · ;:::: Oq between ran(A) and ran(B). The vectors Ji, . . . , fq and 91, . . . , gq are the associated principal vectors. Compute the thin QR factorizations A = QARA and B = QaRa. C = QIQa Compute the SVD yrcz = diag(cos(Bk)) . QAY( : , l:q) = [ fi l · · · l fq ] QaZ( : , l:q) = [ g1 l · · · l gq ] The idea of using the SVD to compute the principal angles and vectors is due to Bjorck and Golub (1973). The problem of rank deficiency in A and B is also treated in this paper. Principal angles and vectors arise in many important statistical applications. The largest principal angle is related to the notion of distance between equidimensional subspaces that we discussed in §2.5.3. If p = q, then dist(F, G) = J1 - cos(Bp)2 6.4.4 Intersection of Subspaces In light of the following theorem, Algorithm 6.4.3 can also be used to compute an orthonormal basis for ran(A) n ran(B) where A E 1Rmxp and B E 1Rmxq Theorem 6.4.2. Let {cos(Bi)}{=1 and {fi, gi}{=1 be defined by Algorithm 6.4.3. If the index s is defined by 1 = cos(B1) = · · · = cos(lls) > cos(lls+1), then ran(A) n ran(B) = span{fi, . . . , fs} = span{g1, . . . , 9s}· Proof. The proof follows from the observation that if cos(lli) = 1, then fi = 9i· D The practical determination of the intersection dimension s requires a definition of what it means for a computed singular value to equal 1. For example, a computed singular value ai = cos(Bi) could be regarded as a unit singular value if ai ;:::: 1 - 8 for some intelligently chosen small parameter 8. Problems P6.4.1 Show that if A and B are m-by-p matrices, with p � m, then p min II A - BQ II� = L(ui(A)2 - 2ui(BTA) + u; (B)2). QTQ=lp , i=l P6.4.2 Extend Algorithm 6.4.2 so that it computes an orthonormal basis for null(A1) n · · · n null(As) where each matrix Ai has n columns. P6.4.3 Extend Algorithm 6.4.3 so that it can handle the case when A and B are rank deficient. P6.4.4 Verify Equation (6.4.2).
  • 356. 332 Chapter 6. Modified Least Squares Problems and Methods P6.4.5 Suppose A, B E Rmxn and that A has full column rank. Show how to compute a symmetric matrix X E Rnxn that minimizes II AX - B llF" Hint: Compute the SVD of A. P6.4.6 This problem is an exercise in F-norm optimization. (a) Show that if C E Rmxn and e E Rm is a vector of ones, then v = CTe/m minimizes II C - evT llF· (b) Suppose A E Rmxn and B E Rmxn and that we wish to solve min II A - (B + evT)Q llF QTQ=ln , vERn Show that Vopt = (A- B)Te/m and Qopt = UEVT solve this problem where BT(I-eeT/m)A = uvT is the SVD. P6.4.7 A 3-by-3 matrix H is ROPR matrix if H = Q + xyT where Q E wx 3 rotation and x, y E R3. (A rotation matrix is an orthogonal matrix with unit determinant. "ROPR" stands for "rank-1 perturbation of a rotation.") ROPR matrices arise in computational photography and this problem highlights some oftheir properties. (a) If H is a ROPR matrix, then there exist rotations U, V E wx3, such that uTHV = diag(u1 , u2, CT3) satisfies u1 2'. u2 2'. lu3I· (b) Show that if Q E R'x3 is a rotation, then there exist cosine-sine pairs (ci, Si) = (cos(9i), sin(9;)), i = 1:3 such that Q = Q(81 , 82, 83) where [: 0 0 ][-: s2 :][ 1 0 ,:l Q(8i , 82, 83) = c1 s1 c2 0 c3 -81 CJ 0 0 -s3 C3 [ � s2c3 s2s3 -c1s2 c1c2c3 - s1s3 '"'"' + ' ' "' l s1 s2 -s1c2c3 - cis3 -s1c2s3 + cic3 Hint: The Givens QR factorization involves three rotations. (c) Show that if x,y E R3 then xyT must have the form for some µ :;::: 0 and [ c2 - µ 1 ] [ 1 c2 - µ CJ83 ] [ � ]. (d) Show that the second singular value of a ROPR matrix is 1. P6.4.8 Let u. E R"xd be a matrix with orthonormal columns whose span is a subspace S that we wish to estimate. Assume that Uc E Rn xd is a given matrix with orthonormal columns and regard ran(Uc) as the "current" estimate of S. This problem examines what is required to get an improved estimate of S given the availability of a vector v E S. (a) Define the vectors w = u'[v, and assume that each is nonzero. (a) Show that if Z9 = and (cos(9) - 1 ) ( sin(9) ) v1 + v2 II v1 1111 w II II v2 1111 w II Uo = (In + z9vT)Uc, then UlUo = Id. Thus, UoUl is an orthogonal projection. (b) Define the distance function distp(ran(V), ran(W)) = II vvT - wwT llF
  • 357. 6.4. Subspace Computations with the SVD where V, W E E' xd have orthonormal columns and show d distp(ran(V), ran(W))2 = 2(d - II wTv II�) = 2 L(l - ui(WTV)2). i=l Note that dist(ran(V), ran(W))2 = 1 - u1(WTV)2. (c) Show that d� = d� - 2 · tr(u.u'[(U9Uf - UcUJ")) where <k = distp(ran(U.), ran(U9)) and de = distp(ran(U.), ran(Uc)). (d) Show that if then and v1 . v2 y9 = cos(9)� + sm(9) II v2 II ' d2 = d� + 2 (11 U'!viJI� - II U'fy1:1 II�). 8 II v1 lb (e) Show that if 9 minimizes this quantity, then Sl.n(2B) (II Psv2f _ 11 Psv1 f) + (2B) v'[Psva II V2 lb II v1 lb cos II v1 11211 v2 112 Notes and References for §6.4 References for the Procrustes problem include: 0, Ps = u.u'[. 333 B. Green (1952). "The Orthogonal Approximation of an Oblique Structure in Factor Analysis," Psychometrika 1 7, 429-40. P. Schonemann (1966). "A Generalized Solution of the Orthogonal Procrustes Problem," Psychome­ trika 31, 1-10. R.J. Hanson and M.J. Norris (1981). "Analysis of Measurements Based on the Singular Value Decom­ position," SIAM J. Sci. Stat. Comput. 2, 363-374. N.J. Higham (1988). "The Symmetric Procrustes Problem," BIT 28, 133-43. H. Park (1991). "A Parallel Algorithm for the Unbalanced Orthogonal Procrustes Problem," Parallel Comput. 1 7, 913-923. L.E. Andersson and T. Elfving (1997). "A Constrained Procrustes Problem," SIAM J. Matrix Anal. Applic. 18, 124-139. L. Elden and H. Park (1999). "A Procrustes Problem on the Stiefel Manifold," Numer. Math. 82, 599-619. A.W. Bojanczyk and A. Lutoborski (1999). "The Procrustes Problem for Orthogonal Stiefel Matrices," SIAM J. Sci. Comput. 21, 1291-1304. If B = I, then the Procrustes problem amounts to finding the closest orthogonal matrix. This computation is related to the polar decomposition problem that we consider in §9.4.3. Here are some basic references: A. Bjorck and C. Bowie (1971). "An Iterative Algorithm for Computing the Best Estimate of an Orthogonal Matrix," SIAM J. Numer. Anal. 8, 358-64. N.J. Higham (1986). "Computing the Polar Decomposition with Applications,'' SIAM J. Sci. Stat. Comput. 7, 1160-1174. Using the SYD to solve the angles-between-subspaces problem is discussed in: A. Bjorck and G.H. Golub (1973). "Numerical Methods for Computing Angles Between Linear Sub­ spaces,'' Math. Comput. 27, 579-94. L.M. Ewerbring and F.T. Luk (1989). "Canonical Correlations and Generalized SYD: Applications and New Algorithms," J. Comput. Appl. Math. 27, 37-52. G.H. Golub and H. Zha (1994). "Perturbation Analysis ofthe Canonical Correlations of Matrix Pairs," Lin. Alg. Applic. 210, 3-28.
Z. Drmac (2000). "On Principal Angles between Subspaces of Euclidean Space," SIAM J. Matrix Anal. Applic. 22, 173-194.
A.V. Knyazev and M.E. Argentati (2002). "Principal Angles between Subspaces in an A-Based Scalar Product: Algorithms and Perturbation Estimates," SIAM J. Sci. Comput. 23, 2008-2040.
P. Strobach (2008). "Updating the Principal Angle Decomposition," Numer. Math. 110, 83-112.
In reduced-rank regression the object is to connect a matrix of signals to a matrix of noisy observations through a matrix that has specified low rank. An SVD-based computational procedure that involves principal angles is discussed in:
L. Elden and B. Savas (2005). "The Maximum Likelihood Estimate in Reduced-Rank Regression," Num. Lin. Alg. Applic. 12, 731-741.
The SVD has many roles to play in statistical computation, see:
S.J. Hammarling (1985). "The Singular Value Decomposition in Multivariate Statistics," ACM SIGNUM Newsletter 20, 2-25.
An algorithm for computing the rotation and rank-one matrix in P6.4.7 that define a given ROPR matrix is discussed in:
R. Schreiber, Z. Li, and H. Baker (2009). "Robust Software for Computing Camera Motion Parameters," J. Math. Imaging Vision 33, 1-9.
For more details about the estimation problem associated with P6.4.8, see:
L. Balzano, R. Nowak, and B. Recht (2010). "Online Identification and Tracking of Subspaces from Highly Incomplete Information," Proceedings of the Allerton Conference on Communication, Control, and Computing 2010.

6.5 Updating Matrix Factorizations

In many applications it is necessary to refactor a given matrix A ∈ R^{m×n} after it has undergone a small modification. For example, given that we have the QR factorization of a matrix A, we may require the QR factorization of the matrix Ã obtained from A by appending a row or column or deleting a row or column. In this section we show that in situations like these, it is much more efficient to "update" A's QR factorization than to generate the required QR factorization of Ã from scratch. Givens rotations have a prominent role to play. In addition to discussing various update-QR strategies, we show how to downdate a Cholesky factorization using hyperbolic rotations and how to update a rank-revealing ULV decomposition.

6.5.1 Rank-1 Changes

Suppose we have the QR factorization QR = A ∈ R^{n×n} and that we need to compute the QR factorization Ã = A + uv^T = Q_1 R_1 where u, v ∈ R^n are given. Observe that

    A + uv^T = Q(R + wv^T)                                              (6.5.1)

where w = Q^T u. Suppose rotations J_{n-1}, ..., J_2, J_1 are computed such that

    J_1^T · · · J_{n-1}^T w = ±|| w ||_2 e_1

where each J_k is a Givens rotation in planes k and k+1. If these same rotations are applied to R, then

    H = J_1^T · · · J_{n-1}^T R                                         (6.5.2)
  • 359. 6.5. Updating Matrix Factorizations is upper Hessenberg. For example, in the n = 4 case we start with and then update as follows: w +- J[w w +- J[w Consequently, R +- J[R H +- J[R x x 0 0 x x x 0 [� [� [� x x 0 0 x x x 0 x x x 0 x x x x x x x x x x x x 335 (6.5.3) is also upper Hessenberg. Following Algorithm 5.2.4, we compute Givens rotations Gk, k = l:n - 1 such that G';;__1 · · · GfH1 = R1 is upper triangular. Combining everything we obtain the QR factorization A = A + uvT = Q1R1 where Qi = QJn-1 · · · J1G1 · · · Gn-1· A careful assessment of the work reveals that about 26n2 flops are required. The technique readily extends to the case when A is rectangular. It can also be generalized to compute the QR factorization of A + uvr where u E R.mxp and V E R.nxp. 6.5.2 Appending or Deleting a Column Assume that we have the QR factorization (6.5.4) and for some k, 1 :::;; k :::;; n, partition the upper triangular matrix R E R.mxn as follows: k-1 R = m-k k-1 n-k
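Before working through the column updates, it may help to see the rank-1 update of §6.5.1 in code. The following is a minimal NumPy sketch, not the book's implementation: the names givens and qr_rank1_update are illustrative, the helper only follows the spirit of Algorithm 5.1.3, and square A is assumed.

    import numpy as np

    def givens(a, b):
        # Return (c, s) so that [[c, s], [-s, c]].T @ [a, b] = [r, 0] (cf. Algorithm 5.1.3).
        if b == 0.0:
            return 1.0, 0.0
        if abs(b) > abs(a):
            t = -a / b
            s = 1.0 / np.sqrt(1.0 + t * t)
            return s * t, s
        t = -b / a
        c = 1.0 / np.sqrt(1.0 + t * t)
        return c, c * t

    def qr_rank1_update(Q, R, u, v):
        """Given A = Q @ R with A square, return Q1, R1 with Q1 @ R1 = A + outer(u, v)."""
        n = R.shape[0]
        Q1, R1, w = Q.copy(), R.copy(), Q.T @ u
        # Rotations J_{n-1}, ..., J_1 zero w(n), ..., w(2); R1 becomes upper Hessenberg.
        for k in range(n - 2, -1, -1):
            c, s = givens(w[k], w[k + 1])
            G = np.array([[c, s], [-s, c]])
            w[k:k+2] = G.T @ w[k:k+2]
            R1[k:k+2, :] = G.T @ R1[k:k+2, :]
            Q1[:, k:k+2] = Q1[:, k:k+2] @ G
        R1[0, :] += w[0] * v      # H1 = H + (J^T w) v^T, where J^T w = +/- ||w|| e_1
        # Return the Hessenberg matrix to triangular form (Algorithm 5.2.4 style).
        for k in range(n - 1):
            c, s = givens(R1[k, k], R1[k + 1, k])
            G = np.array([[c, s], [-s, c]])
            R1[k:k+2, k:] = G.T @ R1[k:k+2, k:]
            Q1[:, k:k+2] = Q1[:, k:k+2] @ G
        return Q1, R1

As a quick check, if Q, R = np.linalg.qr(A), then Q1 @ R1 should reproduce A + np.outer(u, v) to working precision with R1 upper triangular.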
  • 360. 336 Chapter 6. Modified Least Squares Problems and Methods Now suppose that we want to compute the QR factorization of A = [ a1 I . . · I ak-1 I ak+l I . . · I an ] E Rmx(n-l) . Note that A is just A with its kth column deleted and that is upper Hessenberg, e.g., x x x x x 0 x x x x 0 0 x x x H = 0 0 x x x m = 7, n = 6, k = 3. 0 0 0 x x 0 0 0 0 x 0 0 0 0 0 Clearly, the unwanted subdiagonal elements hk+l,k• . . . , hn,n-1 can be zeroed by a sequence of Givens rotations: G�_1 · · · GlH = Ri. Here, Gi is a rotation in planes i and i + 1 for i = k:n - 1. Thus, if Qi = QGk · · · Gn-1 then A = QiR1 is the QR factorization of A. The above update procedure can be executed in O(n2) flops and is very useful in certain least squares problems. For example, one may wish to examine the signif­ icance of the kth factor in the underlying model by deleting the kth column of the corresponding data matrix and solving the resulting LS problem. Analogously, it is possible to update efficiently the QR factorization of a matrix after a column has been added. Assume that we have (6.5.4) but now want the QR factorization of A = [ ai I . . . I ak I z I ak+l I . . . I an l where z E Rm is given. Note that if w = QTz then is upper triangular except for the presence of a "spike" in its (k + l)st column, e.g., x x x x x x 0 x x x x x 0 0 x x x x .A +--- QTA = 0 0 0 x x x m = 7, n = 5, k = 3. 0 0 0 x 0 x 0 0 0 x 0 0 0 0 0 x 0 0 It is possible to determine a sequence of Givens rotations that restores the triangular form:
  • 361. 6.5. Updating Matrix Factorizations - T- A t- 16 A = x 0 0 0 0 0 0 x x x x 0 x 0 0 0 0 0 0 0 0 A +-- x x x x x x x x x 0 x 0 0 0 T- J4 A = This update requires O(mn) flops. x x x x x 0 0 x x 0 x 0 0 0 0 0 0 0 0 0 0 - T- A +-- J5 A = x x x x x x x x x 0 x x 0 0 x 0 0 0 0 0 0 6.5.3 Appending or Deleting a Row 337 x x x x x x 0 x x x x x 0 0 x x x x 0 0 0 x x x 0 0 0 x 0 x 0 0 0 0 0 x 0 0 0 0 0 0 x x x x x x 0 Suppose we have the QR factorization QR = A E R.m xn and now wish to obtain the QR factorization of A = [ U:: ] where w E R.n. Note that - [ WT ] diag(l, QT)A = R = H is upper Hessenberg. Thus, rotations J1, . . . , Jn can be determined so J'f: · · · J[H = R1 is upper triangular. It follows that A = Q1R1 is the desired QR factorization, where Q1 = diag(l, Q)J1 · · · J n. See Algorithm 5.2.5. No essential complications result if the new row is added between rows k and k + 1 of A. Indeed, if [ �� ] = QR, and [ 0 I P = Ik 0 0 0 then dffig(J, QT)P [::l 0 l 0 ' Im-k = [u;l = H
is upper Hessenberg and we proceed as before.

Lastly, we consider how to update the QR factorization QR = A ∈ R^{m×n} when the first row of A is deleted. In particular, we wish to compute the QR factorization of the submatrix A_1 in

    A  =  [ z^T ]    1
          [ A_1 ]    m-1

(The procedure is similar when an arbitrary row is deleted.) Let q^T be the first row of Q and compute Givens rotations G_1, ..., G_{m-1} such that G_1^T · · · G_{m-1}^T q = αe_1 where α = ±1. Note that

    H  =  G_1^T · · · G_{m-1}^T R  =  [ h^T ]    1
                                      [ R_1 ]    m-1

is upper Hessenberg, so that R_1 is upper triangular, and that

    Q G_{m-1} · · · G_1  =  [ α    0  ]
                            [ 0   Q_1 ]

where Q_1 ∈ R^{(m-1)×(m-1)} is orthogonal. Thus,

    [ z^T ]  =  A  =  (Q G_{m-1} · · · G_1)(G_1^T · · · G_{m-1}^T R)  =  [ α    0  ] [ h^T ]
    [ A_1 ]                                                              [ 0   Q_1 ] [ R_1 ]

from which we conclude that A_1 = Q_1 R_1 is the desired QR factorization.

6.5.4 Cholesky Updating and Downdating

Suppose we are given a symmetric positive definite matrix A ∈ R^{n×n} and its Cholesky factor G. In the Cholesky updating problem, the challenge is to compute, for a given z ∈ R^n, the Cholesky factorization Ã = G̃G̃^T where

    Ã = A + zz^T.                                                       (6.5.5)

Noting that

    Ã  =  GG^T + zz^T  =  [ G | z ] [ G | z ]^T,                        (6.5.6)

we can solve this problem by computing a product of Givens rotations Q = Q_1 · · · Q_n so that

    Q^T [ G^T ]  =  [ R ]                                               (6.5.7)
        [ z^T ]     [ 0 ]

is upper triangular. It follows that Ã = R^T R and so the updated Cholesky factor is given by G̃ = R^T. The zeroing sequence that produces R is straightforward: Q_k is chosen to zero the kth entry of the appended row z^T against the kth diagonal entry, as sketched in code below.
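Here is the promised sketch of the updating sweep (6.5.5)-(6.5.7). It is an illustration only: the function name cholesky_update is chosen for this sketch, the rotations are built directly from np.hypot rather than via Algorithm 5.1.3, and nothing beyond a plain O(n^2) row sweep is attempted.

    import numpy as np

    def cholesky_update(G, z):
        """Given lower triangular G with A = G @ G.T, return a lower triangular
        Gtilde with Gtilde @ Gtilde.T = A + outer(z, z)."""
        n = G.shape[0]
        M = np.vstack([G.T, z.reshape(1, -1)]).astype(float)   # the (n+1)-by-n matrix [G^T; z^T]
        for k in range(n):
            a, b = M[k, k], M[n, k]
            r = np.hypot(a, b)
            if r == 0.0:
                continue
            c, s = a / r, b / r
            # The rotation Q_k touches rows k and n+1 only and zeros the (n+1, k) entry.
            top = c * M[k, :] + s * M[n, :]
            bot = -s * M[k, :] + c * M[n, :]
            M[k, :], M[n, :] = top, bot
        return M[:n, :].T          # R^T, the updated Cholesky factor Gtilde

For a positive definite A, cholesky_update(np.linalg.cholesky(A), z) should agree with np.linalg.cholesky(A + np.outer(z, z)) up to roundoff.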
The Q_k update involves only rows k and n+1. The overall process is essentially the same as the strategy we outlined in the previous subsection for updating the QR factorization of a matrix when a row is appended.

The Cholesky downdating problem involves a different set of tools and a new set of numerical concerns. We are again given a Cholesky factorization A = GG^T and a vector z ∈ R^n. However, now the challenge is to compute the Cholesky factorization Ã = G̃G̃^T where

    Ã = A - zz^T                                                        (6.5.8)

is presumed to be positive definite. By introducing the notion of a hyperbolic rotation we can develop a downdating framework that corresponds to the Givens-based updating framework. Define the matrix S as follows

    S  =  [ I_n    0 ]                                                  (6.5.9)
          [  0    -1 ]

and note that

    [ G | z ] S [ G | z ]^T  =  GG^T - zz^T  =  Ã.                      (6.5.10)

This corresponds to (6.5.6), but instead of computing the QR factorization (6.5.7), we seek a matrix H ∈ R^{(n+1)×(n+1)} that satisfies two properties:

    H S H^T  =  S,                                                      (6.5.11)

    H^T [ G^T ]  =  [ R ],       R ∈ R^{n×n} (upper triangular).        (6.5.12)
        [ z^T ]     [ 0 ]

If this can be accomplished, then it follows from

    Ã  =  [ G | z ] S [ G | z ]^T  =  [ G | z ] H S H^T [ G | z ]^T  =  [ R^T | 0 ] S [ R^T | 0 ]^T  =  R^T R

that the Cholesky factor of Ã = A - zz^T is given by G̃ = R^T.

A matrix H that satisfies (6.5.11) is said to be S-orthogonal. Note that the product of S-orthogonal matrices is also S-orthogonal. An important subset of the S-orthogonal matrices are the hyperbolic rotations and here is a 4-by-4 example:

    [ 1    0    0    0 ]
    [ 0    c    0   -s ]          c = cosh(θ),   s = sinh(θ).
    [ 0    0    1    0 ]
    [ 0   -s    0    c ]

The S-orthogonality of this matrix follows from cosh(θ)^2 - sinh(θ)^2 = 1. In general, H_k ∈ R^{(n+1)×(n+1)} is a hyperbolic rotation if it agrees with I_{n+1} except in four locations:

    [ [H_k]_{k,k}      [H_k]_{k,n+1}   ]      [  cosh(θ)   -sinh(θ) ]
    [ [H_k]_{n+1,k}    [H_k]_{n+1,n+1} ]  =   [ -sinh(θ)    cosh(θ) ]
  • 364. 340 Chapter 6. Modified Least Squares Problems and Methods Hyperbolic rotations look like Givens rotations and, not surprisingly, can be used to introduce zeros into a vector or matrix. However, upon consideration of the equation [_: - : l [:: l = [� l· c2 - s2 = 1 we see that the required cosh-sinh pair may not exist. Since we always have I cosh(O)I > I sinh(O)I, there is no real solution to -sxi + cx2 = 0 if lx2I > lxil· On the other hand, if lxil > lx2I, then {c, s} = {cosh(0), sinh(O)} can be computed as follows: X2 1 T = Xi ' C = Vl - T2 ' S = C·T. (6.5.13) There are clearly numerical issues if lxi I is just slightly greater than lx2I· However, it is possible to organize hyperbolic rotation computations successfully, see Alexander, Pan, and Plemmons (1988). Putting these concerns aside, we show how the matrix H in (6.5.12) can be computed as a product of hyperbolic rotations H = Hi · · · Hn just as the transforming Q in the updating problem is a product of Givens rotations. Consider the role of Hi in the n = 3 case: [_� � � - ; l T [ 9 � i ::: :::l= 0 1 0 0 0 933 0 0 C Zi Z2 Z3 Since A = GGT - zzT is positive definite, [A]11 = 9�1 - z� > 0. It follows that 1911I > lzil which guarantees that the cosh-sinh computations (6.5.13) go through. For the overall process to be defined, we have to guarantee that hyperbolic rotations H2 , . . . , Hn can be found to zero out the bottom row in the matrix [ GT z jT. The following theorem ensures that this is the case. Theorem 6.5.1. If and A = [: �l = [�i 1 �1 l [.9�1 �� l A � A - zzT � A - [: l [: r are positive definite, then it is possible to determine c = cosh(O) and s = sinh(O) so Moreover, the matrix A1 = G1Gf - w1wf is positive definite. -Tl 91 Gf · wf
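To make the downdating sweep concrete, here is a minimal sketch that applies hyperbolic rotations H_1, ..., H_n to [G^T; z^T] as described above. It assumes A - zz^T is positive definite so that each cosh-sinh pair in (6.5.13) exists (this is what Theorem 6.5.1 guarantees step by step); it deliberately ignores the numerical safeguards of Alexander, Pan, and Plemmons (1988), and the function name is illustrative.

    import numpy as np

    def cholesky_downdate(G, z):
        """Given lower triangular G with A = G @ G.T and A - z z^T positive definite,
        return Gtilde with Gtilde @ Gtilde.T = A - outer(z, z)."""
        n = G.shape[0]
        M = np.vstack([G.T, z.reshape(1, -1)]).astype(float)    # [G^T; z^T]
        for k in range(n):
            a, b = M[k, k], M[n, k]
            if abs(a) <= abs(b):
                raise ValueError("downdated matrix is not positive definite")
            t = b / a                        # (6.5.13): tau = x2/x1
            c = 1.0 / np.sqrt(1.0 - t * t)   # cosh(theta)
            s = c * t                        # sinh(theta)
            # The kth hyperbolic rotation touches rows k and n+1 and zeros the (n+1, k) entry.
            top = c * M[k, :] - s * M[n, :]
            bot = -s * M[k, :] + c * M[n, :]
            M[k, :], M[n, :] = top, bot
        return M[:n, :].T                    # R^T, the downdated Cholesky factor

A quick sanity check: when A - np.outer(z, z) is positive definite, the result should agree with np.linalg.cholesky(A - np.outer(z, z)) up to roundoff.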
  • 365. 6.5. Updating Matrix Factorizations 341 Proof. The blocks in A's Cholesky factor are given by 911 = ya, 91 = v/911, T 1 T G1G1 = B - -vv . a (6.5.14) Since A - zzT is positive definite, al l - z? = 9Ii - µ2 > 0 and so from (6.5.13) with r = µ/911 we see that c = s = µ Ja - µ2 · Since w1 = -sg1 + cw it follows from (6.5.14) and (6.5.15) that A1 = G1Gf - w1w[ = B - .!..vvT - (-sg1 + cw)(-sg1 + cw)T Q: c2 SC = B - -vvT - c2wwT + -(vwT + wvT) Q: ya (6.5.15) 1 T Q: T µ T T = B - --vv - -- ww + -- (vw + wv ). a _ µ2 a _ µ2 o: _ µ2 It is easy to verify that this matrix is precisely the Schur complement of o: in - [Q: - µ2 VT - µwT l A = A - zzT = v - µw B - wwT and is therefore positive definite. D The theorem provides the key step in an induction proof that the factorization (6.5.12) exists. 6.5.5 Updating a Rank-Revealing U LV Decomposition We close with a discussion about updating a nullspace basis after one or more rows have been appended to the underlying matrix. We work with the ULV decomposition which is much more tractable than the SVD from the updating point of view. We pattern our remarks after Stewart(1993). A rank -revealing ULV decomposition of a matrix A E 1Rmxn has the form (6.5.16) where L11 E JI(xr and L22 E JR,(n-r)x(n-r) are lower triangular and JI L21 Jl2 and II L22 112 are small compared to O"min(L11 ). Such a decomposition can be obtained by applying QR with column pivoting
  • 366. 342 Chapter 6. Modified Least Squares Problems and Methods followed by a QR factorization V{RT = LT. In this case the matrix V in (6.5.16) is given by V = II Vi . The parameter r is the estimated rank. Note that if r n-r r m-r then the columns of V 2 define an approximate nullspace: Our goal is to produce cheaply a rank-revealing ULV decomposition for the row­ appended matrix In particular, we show how to revise L, V, and possibly r in O(n2) flops. Note that We illustrate the key ideas through an example. Suppose n = 7 and r = 4. By permuting the rows so that the bottom row is just underneath L, we obtain i 0 0 0 0 0 0 i i 0 0 0 0 0 i i i 0 0 0 0 i i i i 0 0 0 f f f f f 0 0 f f f f f f 0 f f f f f f f w w w w y y y The f entries are small while the i, w, and y entries are not. Next, a sequence of Givens rotations G1, . . . , G1 are applied from the left to zero out the bottom row: x 0 0 0 0 0 0 x x 0 0 0 0 0 x x x 0 0 0 0 [Lu L:, ] [�] x x x x 0 0 0 G11 · · · G51G61 L21 = 0 0 x x x x x x x x x x x 0 WT YT x x x x x x x 0 0 0 0 0 0 0 Because this zeroing process intermingles the (presumably large) entries of the bottom row with the entries from each of the other rows, the lower triangular form is typi­ cally not rank revealing. However, and this is l.<ey, we can restore the rank-revealing structure with a combination of condition estimation and Givens zero chasing.
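As a side note, the construction mentioned at the start of this subsection (QR with column pivoting followed by a QR factorization of R^T) is easy to sketch with SciPy. The function name is illustrative and no rank decision is made here, so whatever rank-revealing quality the result has rests entirely on the column pivoting.

    import numpy as np
    from scipy.linalg import qr

    def ulv_via_pivoted_qr(A):
        """Sketch: return U, L, V with A = U @ L @ V.T and L lower trapezoidal."""
        m, n = A.shape
        Q, R, piv = qr(A, pivoting=True)        # A[:, piv] = Q @ R
        P = np.eye(n)[:, piv]                   # permutation matrix with A @ P = Q @ R
        V1, LT = qr(R.T, mode='economic')       # R.T = V1 @ LT, so R = LT.T @ V1.T
        return Q, LT.T, P @ V1                  # A = Q @ (LT.T) @ (P @ V1).T

One can check that U @ L @ V.T reproduces A; per the discussion above, it is the column pivoting that tends to push the "small part" of a nearly rank-deficient A into the trailing blocks of L.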
  • 367. 6.5. Updating Matrix Factorizations 343 Let us assume that with the added row, the new nullspace has dimension 2. With a. reliable condition estimator we produce a unit 2-norm vector p such that (See §3.5.4). Rotations {Ui,iH}�=l can be found such that U[., Uft, Ulr, U� U� U�p = e1 = lr(: , 7). Applying these rotations to L produces a lower Hessenberg matrix Applying more rotations from the right restores H to a lower triangular form: It follows that �4 = W� %����� = V� %����� has approximate norm CTmin(L). Thus, we obtain a lower triangular matrix of the form x 0 0 0 0 0 0 x x 0 0 0 0 0 x x x 0 0 0 0 L+ = x x x x 0 0 0 x x x x x 0 0 x x x x x x 0 f f f f f f f We can repeat the condition estimation and zero chasing on the leading 6-by-6 portion. Assuming that the nullspace of the augmented matrix has dimension two, this produces another row of small numbers: x 0 0 0 0 0 0 x x 0 0 0 0 0 x x x 0 0 0 0 x x x x 0 0 0 x x x x x 0 0 f f f f f f 0 f f f f f f f This illustrates how we can restore any lower triangular matrix to rank-revealing form. Problems P6.5.1 Suppose we have the QR factorization for A E wxn and now wish to solve min II (A + uvT)x - b ll2 zeRn
  • 368. 344 Chapter 6. Modified Least Squares Problems and Methods where u, b E Rm and v E Rn are given. Give an algorithm for solving this problem that requires O(mn) flops. Assume that Q must be updated. P6.5.2 Suppose A = [ c; ] , c E Rn, B E R(m-l)xn has full column rank and m > n. Using the Sherman-Morrison-Woodbury formula show that 1 1 II (ATA)- 1 c 11� - -(- :5 --- + T( T -1 . Umin B) Um in (A) 1 - C A A) C P6.5.3 As a function of x1 and x2, what is the 2-norm of the hyperbolic rotation produced by (6.5.13)? P6.5.4 Assume that A = [ : : J. p = � < l Umin(R) ' where R and E are square. Show that if Q = [ Qu Q21 is orthogonal and then II H1 112 :5 Pll H 1'2· P6.5.5 Suppose A E wxn and b E Rm with m ;::: n. In the indefinite least squares (ILS) problem, the goal is to minimize <P(x) = (b - Ax)T J(b - Ax), where p + q = m. It is assumed that p ;::: 1 and q ;::: 1. (a) By taking the gradient of q,, show that the ILS problem has a unique solution if and only if ATSA is positive definite. (b) Assume that the ILS problem has a unique solution. Show how it can be found by computing the Cholesky factorization of QfQ1 - QfQ2 where A = [ �� ] . is the thin QR factorization. (c) A matrix Q E Rmxm is S-orthogonal if QSQT = S If Q = p q is S-orthogonal, then by comparing blocks in the equation QTSQ = S we have Qf1Qu = Ip + Qf1Q21 , QftQ12 = Qf1Q22, Qf2Q22 = lq + Qf2Q12. Thus, the singular values of Qu and Q22 are never smaller than 1. Assume that p ;::: q. By analogy with how the CS decomposition is established in §2.5.4, show that there exist orthogonal matrices U1, U2, Vi and V2 such that 0 lp- q 0 where D = diag(d1 , . . . , dp) with d; ;::: 1, i = l:p. This is the hyperbolic CS decomposition and details can be found in Stewart and Van Dooren (2006).
  • 369. 6.5. Updating Matrix Factorizations 345 Notes and References for §6.5 The seminal matrix factorization update paper is: P.E. Gill, G.H. Golub, W. Murray, and M.A. Saunders (1974). "Methods for Modifying Matrix Factorizations," Math. Comput. 28, 505-535. Initial research into the factorization update problem was prompted by the development of quasi­ Newton methods and the simplex method for linear programming. In these venues, a linear system must be solved in step k that is a low-rank perturbation of the linear system solved in step k - 1, see: R.H. Bartels (1971). "A Stabilization of the Simplex Method,'' Numer. Math. 16, 414 ·434. P.E. Gill, W. Murray, and M.A. Saunders (1975). "Methods for Computing and Modifying the LDV Factors of a Matrix,'' Math. Comput. 29, 1051-1077. D. Goldfarb (1976). "Factored Variable Metric Methods for Unconstrained Optimization," Math. Comput. 30, 796-811. J.E. Dennis and R.B. Schnabel (1983). Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, NJ. W.W. Hager (1989). "Updating the Inverse of a Matrix," SIAM Review 31, 221-239. S.K. Eldersveld and M.A. Saunders (1992). "A Block-LU Update for Large-Scale Linear Program­ ming," SIAM J. Matrix Anal. Applic. 13, 191-201. Updating issues in the least squares setting are discussed in: J. Daniel, W.B. Gragg, L. Kaufman, and G.W. Stewart (1976). "Reorthogonaization and Stable Algorithms for Updating the Gram-Schmidt QR Factorization," Math. Comput. 30, 772-795. S. Qiao (1988). "Recursive Least Squares Algorithm for Linear Prediction Problems," SIAM J. Matrix Anal. Applic. 9, 323-328. A. Bjorck, H. Park, and L. Elden (1994). "Accurate Downdating of Least Squares Solutions," SIAM J. Matrix Anal. Applic. 15, 549-568. S.J. Olszanskyj, J.M. Lebak, and A.W. Bojanczyk (1994). "Rank-k Modification lilethods for Recur­ sive Least Squares Problems," Numer. Al9. 7, 325-354. L. Elden and H. Park (1994). "Block Downdating of Least Squares Solutions," SIAM J. Matrix Anal. Applic. 15, 1018-1034. Kalman filtering is a very important tool for estimating the state of a linear dynamic system in the presence of noise. An illuminating, stable implementation that involves updating the QR factorization of an evolving block banded matrix is given in: C.C. Paige and M.A. Saunders (1977). "Least Squares Estimation of Discrete Linear Dynamic Systems Using Orthogonal Transformations,'' SIAM J. Numer. Anal. 14, 180 193. The Cholesky downdating literature includes: G.W. Stewart (1979). "The Effects of Rounding Error on an Algorithm for Downdating a Cholesky Factorization," J. Inst. Math. Applic. 23, 203-213. A.W. Bojanczyk, R.P. Brent, P. Van Dooren, and F.R. de Hoog (1987). "A Note on Downdating the Cholesky Factorization," SIAM J. Sci. Stat. Comput. 8, 210-221. C.-T. Pan (1993). "A Perturbation Analysis ofthe Problem ofDowndating a Cholesky Factorization," Lin. Alg. Applic. 183, 103-115. L. Elden and H. Park (1994). "Perturbation Analysis for Block Downdating of a Cholesky Decompo­ sition,'' Numer. Math. 68, 457-468. M.R. Osborne and L. Sun (1999). "A New Approach to Symmetric Rank-One Updating,'' IMA J. Numer. Anal. 19, 497-507. E.S. Quintana-Orti and R.A. Van Geijn (2008). "Updating an LU Factorization with Pivoting," ACM Trans. Math. Softw. 35(2), Article 11. Hyperbolic tranformations have been successfully used in a number of settings: G.H. Golub (1969). "Matrix Decompositions and Statistical Computation," in Statistical Computa­ tion, ed., R.C. Milton and J.A. 
Nelder, Academic Press, New York, pp. 365-397. C.M. Rader and A.O. Steinhardt (1988). "Hyperbolic Householder Transforms,'' SIAM J. Matrix Anal. Applic. 9, 269-290.
  • 370. 346 Chapter 6. Modified Least Squares Problems and Methods S.T. Alexander, C.T. Pan, and R.J. Plemmons (1988). "Analysis of a Recursive Least Squares Hy­ perbolic Rotation Algorithm for Signal Processing," Lin. Alg. and Its Applic. 98, 3-40. G. Cybenko and M. Berry (1990). "Hyperbolic Householder Algorithms for Factoring Structured Matrices," SIAM J. Matrix Anal. Applic. 11, 499-520. A.W. Bojanczyk, R. Onn, and A.O. Steinhardt (1993). "Existence of the Hyperbolic Singular Value Decomposition," Lin. Alg. Applic. 185, 21-30. S. Chandrasekaran, M. Gu, and A.H. Sayad (1998). "A Stable and Efficient Algorithm for the Indefinite Linear Least Squares Problem," SIAM J. Matrix Anal. Applic. 20, 354-362. A.J. Bojanczyk, N.J. Higham, and H. Patel (2003a). "Solving the Indefinite Least Squares Problem by Hyperbolic QR Factorization," SIAM J. Matrix Anal. Applic. 24, 914-931. A. Bojanczyk, N.J. Higham, and H. Patel (2003b). "The Equality Constrained Indefinite Least Squares Problem: Theory and Algorithms," BIT 43, 505-517. M. Stewart and P. Van Dooren (2006). "On the Factorization of Hyperbolic and Unitary Transforma- tions into Rotations," SIAM J. Matrix Anal. Applic. 27, 876-890. N.J. Higham (2003). "J-Orthogonal Matrices: Properties and Generation," SIAM Review 45, 504-519. High-performance issues associated with QR updating are discussed in: B.C. Gunter and R.A. Van De Geijn (2005). "Parallel Out-of-Core Computation and Updating of the QR Factorization," ACM Trans. Math. Softw. 31, 60-78. Updating and downdating the ULV and URV decompositions and related topics are covered in: C.H. Bischof and G.M. Shroff (1992). "On Updating Signal Subspaces," IEEE Trans. Signal Proc. 40, 96-105. G.W. Stewart (1992). "An Updating Algorithm for Subspace Tracking," IEEE Trans. Signal Proc. 40, 1535-1541. G.W. Stewart (1993). "Updating a Rank-Revealing ULV Decomposition," SIAM J. Matrix Anal. Applic. 14, 494-499. G.W. Stewart (1994). "Updating URV Decompositions in Parallel," Parallel Comp. 20, 151-172. H. Park and L. Elden (1995). "Downdating the Rank-Revealing URV Decomposition," SIAM J. Matrix Anal. Applic. 16, 138-155. J.L. Barlow and H. Erbay (2009). "Modifiable Low-Rank Approximation of a Matrix," Num. Lin. Alg. Applic. 16, 833--860. Other interesting update-related topics include the updating of condition estimates, see: W.R. Ferng, G.H. Golub, and R.J. Plemmons (1991). "Adaptive Lanczos Methods for Recursive Condition Estimation," Numerical Algorithms 1, 1-20. G. Shroff and C.H. Bischof (1992). "Adaptive Condition Estimation for Rank-One Updates of QR Factorizations," SIAM J. Matrix Anal. Applic. 13, 1264-1278. D.J. Pierce and R.J. Plemmons (1992). "Fast Adaptive Condition Estimation," SIAM J. Matrix Anal. Applic. 13, 274--291. and the updating of solutions to constrained least squares problems: K. Schittkowski and J. Stoer (1979). "A Factorization Method for the Solution of Constrained Linear Least Squares Problems Allowing for Subsequent Data changes," Numer. Math. 31, 431-463. A. Bjorck (1984). "A General Updating Algorithm for Constrained Linear Least Squares Problems," SIAM J. Sci. Stat. Comput. 5, 394-402. Finally, we mention the following paper concerned with SVD updating: M. Moonen, P. Van Dooren, and J. Vandewalle (1992). "A Singular Value Decomposition Updating Algorithm," SIAM J. Matrix Anal. Applic. 13, 1015-1038.
  • 371. Chapter 7 U nsymmetric Eigenvalue Problems 7. 1 Properties and Decompositions 7.2 Perturbation Theory 7.3 Power Iterations 7. 4 The Hessenberg and Real Schur Forms 7 .5 The Practical QR Algorithm 7.6 Invariant Subspace Computations 7.7 The Generalized Eigenvalue Problem 7 .8 Hamiltonian and Product Eigenvalue Problems 7.9 Pseudospectra Having discussed linear equations and least squares, we now direct our attention to the third major problem area in matrix computations, the algebraic eigenvalue prob­ lem. The unsymmetric problem is considered in this chapter and the more agreeable symmetric case in the next. Our first task is to present the decompositions of Schur and Jordan along with the basic properties of eigenvalues and invariant subspaces. The contrasting behavior of these two decompositions sets the stage for §7.2 in which we investigate how the eigenvalues and invariant subspaces of a matrix are affected by perturbation. Condition numbers are developed that permit estimation of the errors induced by roundoff. The key algorithm of the chapter is the justly famous QR algorithm. This proce­ dure is one of the most complex algorithms presented in the book and its development is spread over three sections. We derive the basic QR iteration in §7.3 as a natural generalization of the simple power method. The next two sections are devoted to mak­ ing this basic iteration computationally feasible. This involves the introduction of the Hessenberg decomposition in §7.4 and the notion of origin shifts in §7.5. The QR algorithm computes the real Schur form of a matrix, a canonical form that displays eigenvalues but not eigenvectors. Consequently, additional computations 347
  • 372. 348 Chapter 7. Unsymmetric Eigenvalue Problems usually must be performed if information regarding invariant subspaces is desired. In §7.6, which could be subtitled, "What to Do after the Real Schur Form is Calculated,'' we discuss various invariant subspace calculations that can be performed after the QR algorithm has done its job. The next two sections are about Schur decomposition challenges. The generalized eigenvalue problem Ax = >.Bx is the subject of §7.7. The challenge is to compute the Schur decomposition of B-1A without actually forming the indicated inverse or the product. The product eigenvalue problem is similar, only arbitrarily long sequences of products are considered. This is treated in §7.8 along with the Hamiltonian eigenprob­ lem where the challenge is to compute a Schur form that has a special 2-by-2 block structure. In the last section the important notion of pseudospectra is introduced. It is sometimes the case in unsymmetric matrix problems that traditional eigenvalue analysis fails to tell the "whole story" because the eigenvector basis is ill-conditioned. The pseudospectra framework effectively deals with this issue. We mention that it is handy to work with complex matrices and vectors in the more theoretical passages that follow. Complex versions of the QR factorization, the singular value decomposition, and the CS decomposition surface in the discussion. Reading Notes Knowledge of Chapters 1-3 and §§5.1-§5.2 are assumed. Within this chapter there are the following dependencies: §7.1 -+ §7.2 -+ §7.3 -+ §7.4 -+ §7.5 -+ §7.6 -+ §7.9 .!. � §7.7 §7.8 Excellent texts for the dense eigenproblem include Chatelin (EOM), Kressner (NMSE), Stewart (MAE), Stewart and Sun (MPA), Watkins (MEP), and Wilkinson (AEP). 7.1 Properties and Decompositions In this section the background necessary to develop and analyze the eigenvalue algo­ rithms that follow are surveyed. For further details, see Horn and Johnson (MA). 7.1.1 Eigenvalues and Invariant Subspaces The eigenvalues of a matrix A E <Cnxn are the n roots of its characteristic polynomial p(z) = det(z/ - A). The set of these roots is called the spectrum of A and is denoted by >.(A) = { z : det(z/ - A) = 0 }. If >.(A) = {>.i, . . . , >..n}, then det(A) and tr(A) = >.1 + · · · + An
  • 373. 7.1. Properties and Decompositions 349 where the trace function, introduced in §6.4.1, is the sum of the diagonal entries, i.e., n tr(A) = La;;. i= l These characterizations of the determinant and the trace follow by looking at the constant term and the coefficient of zn-I in the characteristic polynomial. Four other attributes associated with the spectrum of A E <Cnxn include the Spectral Radius : p(A) = max IAI, (7.1.1) .>.E .>.(A) Spectral Abscissa : o:(A) = max Re(A), (7.1.2) .>.E.>.(A) Numerical Radius : r(A) max {lxHAxl : II x lb = 1 }, (7.1.3) .>.E .>.(A) Numerical Range : W(A) = {xHAx : II x 112 = 1 }. (7.1.4) The numerical range, which is sometimes referred to as the field of valnes, obviously includes A(A). It can be shown that W(A) is convex. If A E A(A), then the nonzero vectors x E <Cn that satisfy Ax = AX are eigenvec­ tors. More precisely, x is a right eigenvector for A if Ax = AX and a left eigenvector if xHA = AXH. Unless otherwise stated, "eigenvector" means "right eigenvector." An eigenvector defines a 1-dimensional subspace that is invariant with respect to premultiplication by A. A subspace S � <Cn with the property that x E S =} Ax E S is said to be invariant (for A). Note that if AX = XB, then ran(X) is invariant and By = AY =:} A(Xy) = A(Xy). Thus, if X has full column rank, then AX = XB implies that A(B) � A(A). If X is square and nonsingular, then A and B = x-1AX are similar, X is a similarity transformation, and A(A) = A(B). 7.1.2 Decoupling Many eigenvalue computations involve breaking the given problem down into a collec­ tion of smaller eigenproblems. The following result is the basis for these reductions. Lemma 7.1.1. If T E <Cnxn is partitioned as follows, T = then A(T) = A(T11) U A(T22). p <J ] :
  • 374. 350 Chapter 7. Unsymmetric Eigenvalue Problems Proof. Suppose Tx = [Tn T12 l [X1 l A [X1 l 0 T22 X2 X2 where X1 E <CP and x2 E <Cq. If x2 =f. 0, then T22X2 = AX2 and so A E A(T22). If X2 = 0, then Tnx1 = AX1 and so A E A(Tn). It follows that A(T) c A(Tn) U A(T22). But since both A(T) and A(Tn) U A(T22) have the same cardinality, the two sets are equal. 0 7.1.3 Basic Unitary Decompositions By using similarity transformations, it is possible to reduce a given matrix to any one of several canonical forms. The canonical forms differ in how they display the eigenvalues and in the kind of invariant subspace information that they provide. Because of their numerical stability we begin by discussing the reductions that can be achieved with unitary similarity. Lemma 7.1.2. If A E <Cnxn, B E <Cpxp, and X E <Cnxp satisfy AX = XB, rank(X) = p, then there exists a unitary Q E <Cnxn such that QHAQ = T = and A(Tn) = A(A) n A(B). Proof. Let x = Q [�1 l· p n-p (7.1.5) (7.1.6) be a QR factorization of X. By substituting this into (7.1.5) and rearranging we have [ T11 T12 ] [� 1 l [�1 lB T21 T22 where QHAQ [ Tn T12 ] n:p T21 T22 p n-p By using the nonsingularity of R1 and the equations T21R1 = 0 and T11R1 = Rl B, we can conclude that T21 = 0 and A(T11) = A(B). The lemma follows because from Lemma 7.1.1 we have A(A) = A(T) = A(T11) u A(T22). D Lemma 7.1.2 says that a matrix can be reduced to block triangular form us­ ing unitary similarity transformations if we know one of its invariant subspaces. By induction we can readily establish the decomposition of Schur (1909).
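Lemma 7.1.2 is easy to experiment with numerically. The sketch below (illustrative code, assuming SciPy) obtains an orthonormal basis X for an invariant subspace from a Schur factorization - admittedly circular, but it makes the point - and then checks that a full QR factorization of X produces a unitary Q for which Q^H A Q has a vanishing (2,1) block.

    import numpy as np
    from scipy.linalg import schur, qr

    rng = np.random.default_rng(1)
    A = rng.standard_normal((6, 6))
    T, Z = schur(A, output='complex')
    X = Z[:, :2]                          # A X = X B with B = X^H A X
    B = X.conj().T @ A @ X
    print(np.allclose(A @ X, X @ B))      # True: ran(X) is invariant
    Q, R = qr(X)                          # full QR: Q is 6-by-6 unitary, Q[:, :2] spans ran(X)
    W = Q.conj().T @ A @ Q
    print(np.max(np.abs(W[2:, :2])))      # ~1e-15: the (2,1) block vanishes, as in Lemma 7.1.2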
  • 375. 7.1. Properties and Decompositions 351 Theorem 7.1.3 (Schur Decomposition). If A E a::nxn, then there exists a unitary Q E <Cnxn such that (7.1.7) where D = diag(A1 , . . . , An) and N E a::nxn is strictly upper triangular. Furthermore, Q can be chosen so that the eigenvalues Ai appear in any order along the diagonal. Proof. The theorem obviously holds if n = 1. Suppose it holds for all matrices of order n - 1 or less. If Ax = AX and x f:. 0, then by Lemma 7.1.2 (with B = (A)) there exists a unitary U such that U"AU = [ A w" ] 1 0 C n-1 1 n-1 By induction there is a unitary fJ such that fJHCU is upper triangular. Thus, if Q = U·diag{l, U), then Q"AQ is upper triangular. D If Q = [ Q1 I · · · I Qn ] is a column partitioning of the unitary matrix Q in (7.1.7), then the Qi are referred to as Schur vectors. By equating columns in the equations AQ = QT, we see that the Schur vectors satisfy k-1 Aqk = AkQk + L nikQi, k = l:n. (7.1.8) i=l From this we conclude that the subspaces sk = span{qi, . . . ' Qk}, k = l:n, are invariant. Moreover, it is not hard to show that if Qk = [ q1 I · · · I Qk ] , then A(QffAQk) = {A1 , . . . , Ak}· Since the eigenvalues in (7.1.7) can be arbitrarily ordered, it follows that there is at least one k-dimensional invariant subspace associated with each subset of k eigenvalues. Another conclusion to be drawn from (7.1.8) is that the Schur vector Qk is an eigenvector if and only if the kth column of N is zero. This turns out to be the case for k = 1:n whenever AHA = AAH. Matrices that satisfy this property are called normal. Corollary 7.1.4. A E a::nxn is normal if and only if there exists a unitary Q E a::nxn such that Q"AQ = diag(A1, . . . , An)· Proof. See P7.1.1. D Note that if Q"AQ = T = diag(Ai) + N is a Scliur decomposition of a general n-by-n matrix A, then II N llF is independent of the choice of Q: n II N II� = II A II� - L IAil2 = Ll2(A). i=l This quantity is referred to as A's departure from normality. Thus, to make T "more diagonal," it is necessary to rely on nonunitary similarity transformations.
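As a quick numerical illustration of the Schur decomposition and of the departure-from-normality identity just given, consider the following sketch (illustrative code; it assumes SciPy's schur is available):

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    T, Q = schur(A, output='complex')        # Q^H A Q = T, upper triangular
    lam = np.diag(T)
    N = np.triu(T, 1)                        # strictly upper triangular part of T
    # ||N||_F^2 = ||A||_F^2 - sum_i |lambda_i|^2, independent of the choice of Q.
    print(np.isclose(np.linalg.norm(N, 'fro')**2,
                     np.linalg.norm(A, 'fro')**2 - np.sum(np.abs(lam)**2)))
    # The leading Schur vectors span invariant subspaces: A Q(:,1:k) = Q(:,1:k) T(1:k,1:k).
    k = 2
    print(np.allclose(A @ Q[:, :k], Q[:, :k] @ T[:k, :k]))

Both checks print True.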
  • 376. 352 Chapter 7. Unsymmetric Eigenvalue Problems 7.1.4 Nonunitary Reductions To see what is involved in nonunitary similarity reduction, we consider the block diag­ onalization of a 2-by-2 block triangular matrix. Lemma 7.1.5. Let T E <Cnxn be partitioned as follows: T = p q ] : . Define the linear transformation <f>:<Cpxq -+ <Cpxq by where X E <Cpxq. Then </> is nonsingular if and only if ..(T11 ) n ..(T22) = 0. If </> is nonsingular and Y is defined by y = [Ip Z l 0 Iq where </>(Z) = -T12, then y - 1TY = diag(T11, T22). Proof. Suppose </>(X) = 0 for X -f. 0 and that UHXV = [ Er 0 ] r 0 0 p-r r q-r is the SYD of X with Er = diag(ai), r = rank(X). Substituting this into the equation T11X = XT22 gives where U8Tn U = (Aij) and V8T22V = (Bij)· By comparing blocks in this equation it is clear that A21 = 0, B12 = 0, and ..(An) = ..(Bu). Consequently, Au and Bu have an eigenvalue in common and that eigenvalue is in ..(Tu) n ..(T22). Thus, if </> is singular, then T11 and T22 have an eigenvalue in common. On the other hand, if A E ..(Tu) n ..(T22), then WC have eigenvector equations Tux = AX and y8T22 = ..y8. A calculation shows that <f>(xyH) = 0 confirming that </> is singular. Finally, if </> is nonsingular, then </>(Z) = -T12 has a solution and y-iTY = [Ip -Z l [Tu Ti2 ] [Ip Z l [Tu 0 Iq 0 T22 0 Iq 0 has the required block diagonal form. 0 TuZ - ZT22 + Ti2 ] T22 By repeatedly applying this lemma, we can establish the following more general result.
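First, a small numerical illustration of Lemma 7.1.5 itself. The sketch below (the numbers are chosen for illustration; it assumes SciPy) solves φ(Z) = T11 Z - Z T22 = -T12 with a Sylvester solver, which is possible here because λ(T11) = {1, 3} and λ(T22) = {7} are disjoint, and then confirms that Y^{-1} T Y is block diagonal.

    import numpy as np
    from scipy.linalg import solve_sylvester

    T11 = np.array([[1.0, 2.0],
                    [0.0, 3.0]])
    T22 = np.array([[7.0]])
    T12 = np.array([[4.0],
                    [5.0]])
    # solve_sylvester handles A X + X B = Q, so pass (T11, -T22, -T12).
    Z = solve_sylvester(T11, -T22, -T12)
    Y = np.block([[np.eye(2), Z], [np.zeros((1, 2)), np.eye(1)]])
    T = np.block([[T11, T12], [np.zeros((1, 2)), T22]])
    print(np.linalg.solve(Y, T @ Y))     # approximately diag(T11, T22)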
  • 377. 7.1. Properties and Decompositions 353 Theorem 7.1.6 (Block Diagonal Decomposition). Suppose (7.1.9) 0 is a Schur decomposition ofA E <Cnxn and that the Tii are square. IfA(Tii)nA(Tjj) = 0 whenever i -:/:- j, then there e:tists a nonsingular matrix Y E <Cnxn such that (QY)-1A(QY) = diag(Tu, . . . , Tqq}· (7.1.10} Proof. See P7.l.2. D If each diagonal block Tii is associated with a distinct eigenvalue, then we obtain Corollary 7.1.7. If A E <C"xn, then there exists a nonsingular X such that (7.1.11) where A1, . . . , Aq are distinct, the integers ni, . . . , nq satisfy nl + · · · +nq = n, and each Ni is strictly upper triangular. A number of important terms are connected with decomposition (7.1.11). The integer ni is referred to as the algebraic multiplicity of Ai· If ni = 1, then Ai is said to be simple. The geometric multiplicity of Ai equals the dimensions of null(Ni), i.e., the number of linearly independent eigenvectors associated with Ai· If the algebraic multiplicity of Ai exceeds its geometric multiplicity, then Ai is said to be a defective eigenvalue. A matrix with a defective eigenvalue is referred to as a defective matrix. Nondefectivc matrices are also said to be diagonalizable. Corollary 7.1.8 (Diagonal Form). A E <Cnxn is nondefective if and only if there exi,sts a nonsingular X E <Cnxn such that x-1AX = diag(A1 , . . . , An)· (7.1.12) Proof. A is nondefective if and only if there exist independent vectors x1 . . . Xn E <Cn and scalars A1 , . . . , An such that Axi = AiXi for i = 1:n. This is equivalent to the existence of a nonsingular X = [ Xt I · · · I Xn ] E <Cnxn such that AX = XD where D = diag(Ai , . . . , An)· D Note that if yfl is the ith row of x-1, then yfA = AiYf. Thus, the columns of x-H are left eigenvectors and the columns of X are right eigenvectors. If we partition the matrix X in (7.1.11), x = [ X1 I . . . I Xq ] nl nq
then C^n = ran(X_1) ⊕ · · · ⊕ ran(X_q), a direct sum of invariant subspaces. If the bases for these subspaces are chosen in a special way, then it is possible to introduce even more zeroes into the upper triangular portion of X^{-1}AX.

Theorem 7.1.9 (Jordan Decomposition). If A ∈ C^{n×n}, then there exists a nonsingular X ∈ C^{n×n} such that X^{-1}AX = diag(J_1, ..., J_q) where

    J_i  =  [ λ_i   1            0  ]
            [      λ_i    ·.        ]
            [              ·.    1  ]    ∈ C^{n_i × n_i}
            [  0                λ_i ]

and n_1 + · · · + n_q = n.

Proof. See Horn and Johnson (MA, p. 330). □

The J_i are referred to as Jordan blocks. The number and dimensions of the Jordan blocks associated with each distinct eigenvalue are unique, although their ordering along the diagonal is not.

7.1.5 Some Comments on Nonunitary Similarity

The Jordan block structure of a defective matrix is difficult to determine numerically. The set of n-by-n diagonalizable matrices is dense in C^{n×n}, and thus, small changes in a defective matrix can radically alter its Jordan form. We have more to say about this in §7.6.5.

A related difficulty that arises in the eigenvalue problem is that a nearly defective matrix can have a poorly conditioned matrix of eigenvectors. For example, any matrix X that diagonalizes

    A  =  [ 1+ε    1  ]
          [  0    1-ε ] ,        0 < ε ≪ 1,                             (7.1.13)

has a 2-norm condition of order 1/ε.

These observations serve to highlight the difficulties associated with ill-conditioned similarity transformations. Since

    fl(X^{-1}AX)  =  X^{-1}AX + E,                                      (7.1.14)

where

    || E ||_2  ≈  u κ_2(X) || A ||_2,                                   (7.1.15)

it is clear that large errors can be introduced into an eigenvalue calculation when we depart from unitary similarity.
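The 2-by-2 example (7.1.13) is easy to verify numerically (a sketch; the particular value of ε is arbitrary):

    import numpy as np

    eps = 1e-8
    A = np.array([[1.0 + eps, 1.0],
                  [0.0,       1.0 - eps]])
    lam, X = np.linalg.eig(A)
    print(lam)                    # approximately 1 + eps and 1 - eps
    print(np.linalg.cond(X))      # roughly 1/eps: the eigenvector basis is ill conditioned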
  • 379. 7.1. Properties and Decompositions 355 7. 1. 6 Singular Values and Eigenvalues Since the singular values of A and its Schur decomposition QHAQ = diag(>.i) + N are the same, it follows that From what we know about the condition of triangular matrices, it may be the case that max l �i,j�n See §5.4.3. This is a reminder that for nonnormal matrices, eigenvalues do not have the "predictive power" of singular values when it comes to Ax = b sensitivity matters. Eigenvalues of nonnormal matrices have other shortcomings, a topic that is the focus of §7.9. Problems P7.1.l (a) Show that if T E ccnxn is upper triangular and normal, then T is diagonal. (b) Show that if A is normal and QHAQ = T is a Schur decomposition, then T is diagonal. (c) Use (a) and (b) to complete the proof of Corollary 7.1.4. P7.1.2 Prove Theorem 7.1.6 by using induction and Lemma 7.1.5. P7.1.3 Suppose A E ccnxn has distinct eigenvalues. Show that if QHAQ = T is its Schur decomposi­ tion and AB = BA, then QHBQ is upper triangular. P7.1.4 Show that if A and BH are in (Cmxn with m 2 n, then >.(AB) = >.(BA) u { 0, . . . , 0 }. "'-.--' m-n P7.l.5 Given A E (Cnxn, use the Schur decomposition to show that for every E > 0, there exists a diagonalizable matrix B such that II A - B 112 � f. This shows that the set of diagonalizable matrices is dense in ccnxn and that the Jordan decomposition is not a continuous matrix decomposition. P7.1.6 Suppose Ak -+ A and that Q{!AkQk = Tk is a Schur decomposition of Ak· Show that {Qk} has a converging subsequence {Qk; } with the property that Jim Qk. = Q i-HX> 1. where QHAQ = T is upper triangular. This shows that the eigenvalues of a matrix are continuous functions of its entries. P7.1.7 Justify (7.1.14) and (7.1.15). P7.1.8 Show how to compute the eigenvalues of M = [ � g ]; k j where A, B, C, and D are given real diagonal matrices. P7.l.9 Use the Jordan decomposition to show that if all the eigenvalues of a matrix A are strictly less than unity, then limk-+oo Ak = 0. P7.l.10 The initial value problem x(t) iJ(t) y(t), -x(t), x(O) = 1, y(O) = 0, has solution x(t) = cos(t) and y(t) = sin(t). Let h > 0. Here are three reasonable iterations that can be used to compute approximations Xk :::::: x(kh) and Yk :::::: y(kh) assuming that xo = 1 and Yk = 0:
  • 380. 356 Chapter 7. Unsymmetric Eigenvalue Problems Method 1: Method 2: Method 3: Express each method in the form Xk + hyk, Yk - hxk+l • Xk + hYk+l • Yk - hxk+l · [ :::� ] Ah [ :: ] where Ah is a 2-by-2 matrix. For each case, compute A(Ah) and use the previous problem to discuss Iim xk and lim yk as k -+ oo . P7.1.ll If J E Jldxd is a Jordan block, what is K.oo(J)? P7.1.12 Suppose A, B E <Cnxn. Show that the 2n-by-2n matrices [ AB M1 = B � ] are similar thereby showing that A(AB) = A(BA). 0 BA P7.1.13 Suppose A E E'xn. We say that B E E'xn is the Drazin inverse of A if (i) AB = BA, (ii) BAB = B, and (iii) the spectral radius of A-ABA is zero. Give a formula for B in terms of the Jordan decomposition of A paying particular attention to the blocks associated with A's zero eigenvalues. P7.1.14 Show that if A E Rnxn, then p(A) � (u1 · · · <Tn)l/n where <Ti , . . . , <Tn are the singular values of A. P7.1.15 Consider the polynomial q(x) = det(In + xA) where A E Rnxn. We wish to compute the coefficient of x2. (a) Specify the coefficient in terms of the eigenvalues A1 , . • . , An of A. (b) Give a simple formula for the coefficient in terms of tr(A) and tr(A2). P7.1.16 Given A E R2x2, show that there exists a nonsingular X E R2x2 so x-1AX = AT. See Dubrulle and Parlett (2007). Notes and References for §7.1 For additional discussion about the linear algebra behind theeigenvalue problem, see Horn and Johnson (MA) and: L. Mirsky (1963). An Introduction to Linear Algebra, Oxford University Press, Oxford, U.K. M. Marcus and H. Mine (1964). A Survey of Matrix Theory and Matrix Inequalities, Allyn and Bacon, Boston. R. Bellman (1970). Introduction to Matrix Analysis, second edition, McGraw-Hill, New York. I. Gohberg, P. Lancaster, and L. Rodman (2006). Invariant Subspaces of Matrices with Applications, SIAM Publications, Philadelphia, PA. For a general discussion about the similarity connection between a matrix and its transpose, see: A.A. Dubrulle and B.N. Parlett (2010). "Revelations of a Transposition Matrix," J. Comp. and Appl. Math. 233, 1217-1219. The Schur decomposition originally appeared in: I. Schur (1909). "On the Characteristic Roots of a Linear Substitution with an Application to the Theory of Integral Equations." Math. Ann. 66, 488-510 (German). A proof very similar to ours is given in: H.W. Turnbull and A.C. Aitken (1961). An Introduction to the Theory of Canonical Forms, Dover, New York, 105.
  • 381. 7.2. Perturbation Theory 357 7.2 Perturbation Theory The act of computing eigenvalues is the act of computing zeros of the characteristic polynomial. Galois theory tells us that such a process has to be iterative if n > 4 and so errors arise because of finite termination. In order to develop intelligent stopping criteria we need an informative perturbation theory that tells us how to think about approximate eigenvalues and invariant subspaces. 7.2.1 Eigenvalue Sensitivity An important framework for eigenvalue computation is to produce a sequence of sim­ ilarity transformations {Xk} with the property that the matrices x;;1AXk are pro­ gressively "more diagonal." The question naturally arises, how well do the diagonal elements of a matrix approximate its eigenvalues? Theorem 7.2.1 (Gershgorin Circle Theorem). If x-1AX = D + F where D = diag(d1, . . . , dn) and F has zero diagonal entries, then n .X(A) c LJ Di where Di n {z E <D : iz - dil < L lfijl}. j=l i=l Proof. Suppose .X E .X(A) and assume without loss of generality that .X =f. di for i = l:n. Since (D - .XI) + F is singular, it follows from Lemma 2.3.3 that for some k, 1 � k � n. But this implies that .X E Dk. D It can also be shown that if the Gcrshgorin disk Di is isolated from the other disks, then it contains precisely one eigenvalue of A. Sec Wilkinson (AEP, pp. 71ff.). For some methods it is possible to show that the computed eigenvalues are the exact eigenvalues of a matrix A+E where E is small in norm. Consequently, we should understand how the eigenvalues of a matrix can be affected by small perturbations. Theorem 7.2.2 (Bauer-Fike). Ifµ is an eigenvalue ofA + E E <Dnxn and x-1AX = D = diag(.X1, . . . , An), then min l.X - µj < Kp(X) ll E lip AEA(A) where II llP denotes any of the p-norms. Proof. If µ E .X(A), then the theorem is obviously true. Otherwise if the matrix x-1 (A + E - µI)X is singular, then so is I + (D - µJ)-1(X-1EX). Thus, from
  • 382. 358 Chapter 7. Unsymmetric Eigenvalue Problems Lemma 2.3.3 we obtain Since (D - µJ)-1 is diagonal and the p-norm of a diagonal matrix is the absolute value of the largest diagonal entry, it follows that II (D - µJ)-1 llP = max - IA 1 I ' AEA(A) - µ completing the proof. D An analogous result can be obtained via the Schur decomposition: Theorem 7.2.3. Let QHAQ = D + N be a Schur decomposition of A E <Cnxn as in (7.1. 7). If µ E A(A + E) and p is the smallest positive integer such that INIP = 0, then where Proof. Define min IA - µI :::::; max{9, 91/P} AEA(A) p-1 9 = ll E ll2 L ll N ll; · k=O The theorem is clearly true if o = 0. If o > 0, then I - (µI - A)-1E is singular and by Lemma 2.3.3 we have 1 :::::; II (µI -A)- 1E 112 :::::; II (µI - A)-1 11211 E 112 = II ((µI - D) - N)-1 11211 E 112 . (7.2.1) Since (µI - D)-1 is diagonal and INIP = 0, it follows that ((µI -D)-1N)P = 0. Thus, and so If o > 1, then p-1 ((µI - D) - N)-1 = L ((µI - D)-1N) k (µI - D)-1 k=O t p-l (II N ll2)k II ((µI - D) - N)-1 112 < - L -- 0 k=O 0 p-1 II (µI - D) - N)-1 112 < � L II N 11; k=O
  • 383. 7.2. Perturbation Theory and so from (7.2.1), 8 ::; 0. If 8 :S 1, then p-1 II (µI - D) - N)- 1 112 ::; :p L II N 11;. k=O By using (7.2.1) again we have 8P ::; () and so 8 ::; ma.x{0, ()1/P}. 0 359 Theorems 7.2.2 and 7.2.3 suggest that the eigenvalues of a nonnormal matrix may be sensitive to perturbations. In particular, if 11:2(X) or II N 11�-l is large, then small changes in A can induce large changes in the eigenvalues. 7.2.2 The Condition of a Simple Eigenvalue Extreme eigenvalue sensitivity for a matrix A cannot occur if A is normal. On the other hand, nonnormality does not necessarily imply eigenvalue sensitivity. Indeed, a nonnormal matrix can have a mixture of well-conditioned and ill-conditioned eigen­ values. For this reason, it is beneficial to refine our perturbation theory so that it is applicable to individual eigenvalues and not the spectrum as a whole. To this end, suppose that A is a simple eigenvalue of A E <Cnxn and that x and y satisfy Ax = AX and yHA = AYH with II x 112 = II y 112 = 1. If yHAX = J is the Jordan decomposition with YH = x-1 , then y and x are nonzero multiples of X(:, i) and Y(:, i) for some i. It follows from 1 = Y(:, i)HX(:, i) that yHx # 0, a fact that we shall use shortly. Using classical results from function theory, it can be shown that in a neighbor­ hood of the origin there exist differentiable x(i::) and A(t:) such that where A(O) = A and x(O) = x. By differentiating this equation with respect to i:: and setting i:: = 0 in the result, we obtain Ax(O) + Fx �(O)x + Ax(O). Applying yH to both sides ofthis equation, dividing by yHx, and taking absolute values gives The upper bound is attained if F = yxH. For this reason we refer to the reciprocal of (7.2.2) as the condition of the eigenvalue A. Roughly speaking, the above analysis shows that O(i::) perturbations in A can induce i::/s(A) changes in an eigenvalue. Thus, if s(A) is small, then A is appropriately regarded as ill-conditioned. Note that s(A) is the cosine of the angle between the left and right eigenvectors associated with A and is unique only if A is simple.
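Since s(λ) is just |y^H x| for unit 2-norm left and right eigenvectors, it is easy to compute once both sets of eigenvectors are available. The sketch below uses SciPy's eig with left eigenvectors requested; the test matrix is chosen for illustration so that both of its eigenvalues have condition 1/s(λ) of roughly 10^4.

    import numpy as np
    from scipy.linalg import eig

    A = np.array([[1.0, 1.0e4],
                  [0.0, 2.0]])
    lam, VL, VR = eig(A, left=True, right=True)    # columns of VL/VR: left/right eigenvectors
    for i in range(len(lam)):
        x = VR[:, i] / np.linalg.norm(VR[:, i])
        y = VL[:, i] / np.linalg.norm(VL[:, i])
        s = abs(np.vdot(y, x))                     # s(lambda) = |y^H x|
        print(lam[i].real, s, 1.0 / s)             # eigenvalue, s(lambda), condition 1/s(lambda)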
  • 384. 360 Chapter 7. Unsymmetric Eigenvalue Problems A small s(-) implies that A is near a matrix having a multiple eigenvalue. In particular, if , is distinct and s(-) < 1, then there exists an E such that , is a repeated eigenvalue of A + E and II E 112 < s(-) II A 112 - y'l - s(-)2 This result is proved by Wilkinson (1972). 7.2.3 Sensitivity of Repeated Eigenvalues If , is a repeated eigenvalue, then the eigenvalue sensitivity question is more compli­ cated. For example, if and then -(A + EF) = {l ± y'ffl}. Note that if a =F 0, then it follows that the eigenvalues of A + €F are not differentiable at zero; their rate of change at the origin is infinite. In general, if , is a defective eigenvalue of A, then O(t-:) perturbations in A can result in 0(€1/P) perturbations in , if , is associated with a p-dimensional Jordan block. See Wilkinson (AEP, pp. 77ff.) for a more detailed discussion. 7.2.4 Invariant Subspace Sensitivity A collection of sensitive eigenvectors can define an insensitive invariant subspace pro­ vided the corresponding cluster of eigenvalues is isolated. To be precise, suppose is a Schur decomposition of A with Q r 11 - T r n - r (7.2.3) (7.2.4) It is clear from our discussion of eigenvector perturbation that the sensitivity of the invariant subspace ran(Q1) depends on the distance between -(T11 ) and -(T22). The proper measure of this distance turns out to be the smallest singular value of the linear transformation X -+ T1 1X - XT22· (Recall that this transformation figures in Lemma 7.1.5.) In particular, if we define the separation between the matrices T11 and T22 by min X�O then we have the following general result: II TuX - XT22 llF 11 x 11F (7.2.5)
  • 385. 7.2. Perturbation Theory 361 Theorem 7.2.4. Suppose that (7.2.3} and (7.2.4) hold and that .for any matrix E E ccnxn we partition QHEQ as follows: r Ifsep(T11, T22) > 0 and II E llF (i + 511 T12 llF ) sep(Tn, T22) then there exists a P E <C(n-r)xr with n-r II II 4 II E21 llF p F ::; sep(Tn, T22) such that the columns of Q1 = (Q1 + Q2P)(I + pHP)-1!2 are an orthonorwal basis for a subspace invariant for A + E. Proof. This result is a slight recasting of Theorem 4.11 in Stewart (1973) which should be consulted for proof detail::;. See also Stewart and Sun (MPA, p. 230). The matrix (I + pHP)-112 is the inverse of the square root of the symmetric positive definite matrix I + pHP. Sec §4.2.4. 0 Corollary 7.2.5. If the assumptions in Theorem 7.2.4 hold, then dist(ran(Q1), ran(Q1)) ::; 4 11 ( �21 1;. ) . sep 11 , 22 Proof. Using the SVD of P, it can be shown that II P(I + pHP)-112 112 ::; II p 112 ::; II p llp· (7.2.6) Since the required distance is the 2-norm of Q!jQ1 = P(I + pHP)-1!2, the proof is complete. 0 Thus, the reciprocal of sep(T11, T22) can be thought of as a condition number that measures the sensitivity of ran(Qi ) as an invariant subspace. 7.2.5 Eigenvector Sensitivity If we set r = 1 in the preceding subsection, then the analysis addresses the issue of eigenvector sensitivity. Corollary 7.2.6. Suppose A, E E ccnxn and that Q = [ Ql I Q2 ] E ccnxn is unitary with QI E ccn. Assume 1 n-1 1 n-1
  • 386. 362 Chapter 7. Unsymmetric Eigenvalue Problems {Thus, qi is an eigenvector.) If a = Umin(T22 - >..!) > 0 and then there exists p E <Cn-t with ll P ll2 $ 4� a such that Qi = (qi +Q2p)/JI + pHp is a unit 2-norm eigenvectorfor A+ E. Moreover, dist(span{qt}, span{qi}) $ 4�. a Proof. The result follows from Theorem 7.2.4, Corollary 7.2.5, and the observation that if Tu = A, then sep(T11, T22) = Umin(T22 - >..!). D Note that Umin(T22 - .M) roughly measures the separation of A from the eigenvalues of T22. We have to say "roughly" because and the upper bound can be a gross overestimate. That the separation of the eigenvalues should have a bearing upon eigenvector sensitivity should come as no surprise. Indeed, if A is a nondefective, repeated eigen­ value, then there are an infinite number of possible eigenvector bases for the associated invariant subspace. The preceding analysis merely indicates that this indeterminancy begins to be felt as the eigenvalues coalesce. In other words, the eigenvectors associated with nearby eigenvalues are "wobbly." Problems P7.2.1 Suppose QHAQ = 'diag(A1) + N is a Schur decomposition of A E <Cnxn and define 11(A) = II AHA - AAH IIF" The upper and lower bounds in 11(A)2 2 Jn3 - n 611 A II� � II N llF � �11(A) are established by Henrici (1962) and Eberlein (1965), respectively. Verify these results for the case n = 2. P7.2.2 Suppose A E <Cnxn and x-1AX = diag(A1, . . . , An) with distinct Ai· Show that ifthe columns of X have unit 2-norm, then 11:p(X)2 = n(l/s(A1)2 + · · · + 1/s(An)2). P7.2.3 Suppose QHAQ = diag(Ai) + N is a Schur decomposition of A and that x-1AX = diag (Ai)· Show 11:2(X)2 � 1 + (II N ll F/11 A llF)2. See Loizou (1969). P7.2.4 If x-1AX = diag (Ai) and IA1 I � . . . � IAnl, then ui(A) 11:2(X) � !Ail � 1t2(X)ui(A) . Prove this result for the n = 2 case. See Ruhe (1975). P7.2.5 Show that if A = [ � � ] and a =F b, then s(a) = s(b) = (1 + lc/(a - b)l2)-112.
  • 387. 7.2. Perturbation Theory P7.2.6 Suppose A = [ � and that .>. r/. >.(T22). Show that if a = sep(.>., T22), then 1 s(.>.) = J1 + 11 (T22 - >.I) - 1v II� where s(.>.) is defined in (7.2.2). 363 ::; Ja2 + 11 v 11� a P7.2.7 Show that the condition of a simple eigenvalue is preserved under unitary similarity transfor­ mations. P7.2.8 With the same hypothesis as in the Bauer-Fike theorem (Theorem 7.2.2), show that P7.2.9 Verify (7.2.6). min l.X - µI ::; 11 IX- 1 1 IEI IXI llv· .>.E.>.(A) P7.2.10 Show that if B E ccmxm and c E ccnxn, then sep(B, C) is less than or equal to I>- - µI for all .>. E .>.(B) and µ E .>.(C). Notes and References for §7.2 Many of the results presented in this section may be found in Wilkinson (AEP), Stewart and Sun (MPA) as well as: F.L. Bauer and C.T. Fike (1960). "Norms and Exclusion Theorems," Numer. Math. 2, 123-44. A.S. Householder (1964). The Theory of Matrices in Numerical Analysis. Blaisdell, New York. R. Bhatia (2007). Perturbation Bounds for Matrix Eigenvalues, SIAM Publications, Philadelphia, PA. Early papers concerned with the effect of perturbations on the eigenvalues of a general matrix include: A. Ruhe (1970). "Perturbation Bounds for Means of Eigenvalues and Invariant Subspaces," BIT 10, 343-54. A. Ruhe (1970). "Properties of a Matrix with a Very Ill-Conditioned Eigenproblem," Numer. Math. 15, 57-60. J.H. Wilkinson (1972). "Note on Matrices with a Very Ill-Conditioned Eigenproblem," Numer. Math. 19, 176-78. W. Kahan, B.N. Parlett, and E. Jiang (1982). "Residual Bounds on Approximate Eigensystems of Nonnormal Matrices," SIAM J. Numer. Anal. 19, 470-484. J.H. Wilkinson (1984). "On Neighboring Matrices with Quadratic Elementary Divisors," Numer. Math. 44, 1-21. Wilkinson's work on nearest defective matrices is typical of a growing body of literature that is concerned with "nearness" problems, see: A. Ruhe (1987). "Closest Normal Matrix Found!," BIT 27, 585-598. J.W. Demmel (1987). "On the Distance to the Nearest Ill-Posed Problem," Numer. Math. 51, 251-289. J.W. Demmel (1988). "The Probability that a Numerical Analysis Problem is Difficult," Math. Com­ put. 50, 449-480. N.J. Higham (1989). "Matrix Nearness Problems and Applications," in Applications ofMatrix Theory, M.J.C. Gover and S. Barnett (eds.), Oxford University Press, Oxford, 1-27. A.N. Malyshev (1999). "A Formula for the 2-norm Distance from a Matrix to the Set of Matrices with Multiple Eigenvalues," Numer. Math. 83, 443-454. J.-M. Gracia (2005). "Nearest Matrix with Two Prescribed Eigenvalues," Lin. Alg. Applic. 401, 277-294. An important subset of this literature is concerned with nearness to the set of unstable matrices. A matrix is unstable if it has an eigenvalue with nonnegative real part. Controllability is a related notion, see:
  • 388. 364 Chapter 7. Unsymmetric Eigenvalue Problems C. Van Loan (1985). "How Near is a Stable Matrix to an Unstable Matrix?," Contemp. Math. 47, 465-477. J.W. Demmel (1987). "A Counterexample for two Conjectures About Stability," IEEE Trans. Autom. Contr. AC-32, 340-342. R. Byers (1988). "A Bisection Method for Measuring the distance of a Stable Matrix to the Unstable Matrices," J. Sci. Stat. Comput. 9, 875-881. J.V. Burke and M.L. Overton (1992). "Stable Perturbations of Nonsymmetric Matrices," Lin. Alg. Applic. 1 71, 249-273. C. He and G.A. Watson (1998). "An Algorithm for Computing the Distance to Instability," SIAM J. Matrix Anal. Applic. 20, 101--116. M. Gu, E. Mengi, M.L. Overton, J. Xia, and J. Zhu (2006). "Fast Methods for Estimating the Distance to Uncontrollability," SIAM J. Matrix Anal. Applic. 28, 477-502. Aspects of eigenvalue condition are discussed in: C. Van Loan (1987). "On Estimating the Condition of Eigenvalues and Eigenvectors," Lin. Alg. Applic. 88/89, 715-732. C.D. Meyer and G.W. Stewart (1988). "Derivatives and Perturbations of Eigenvectors,'' SIAM J. Numer. Anal. 25, 679-691. G.W. Stewart and G. Zhang (1991). "Eigenvalues of Graded Matrices and the Condition Numbers of Multiple Eigenvalues," Numer. Math. 58, 703-712. J.-G. Sun (1992). "On Condition Numbers of a Nondefcctive Multiple Eigenvalue," Numer. Math. 61, 265-276. S.M. Rump (2001). "Computational Error Bounds for Multiple or Nearly Multiple Eigenvalues,'' Lin. Alg. Applic. 324, 209-226. The relationship between the eigenvalue condition number, the departure from normality, and the condition of the eigenvector matrix is discussed in: P. Henrici (1962). "Bounds for Iterates, Inverses, Spectral Variation and Fields of Values of Non­ normal Matrices," Numer. Math. 4, 24 40. P. Eberlein (1965). "On Measures of Non-Normality for Matrices," AMS Monthly 72, 995-996. R.A. Smith (1967). "The Condition Numbers of the Matrix Eigenvalue Problem," Numer. Math. 10 232-240. G. Loizou (1969). "Nonnormality and Jordan Condition Numbers of Matrices,'' J. ACM 16, 580-640. A. van der Sluis (1975). "Perturbations of Eigenvalues of Non-normal Matrices," Commun. ACM 18, 30-36. S.L. Lee (1995). "A Practical Upper Bound for Departure from Normality," SIAM J. Matrix Anal. Applic. 16, 462 468. Gershgorin's theorem can be used to derive a comprehensive perturbation theory. The theorem itself can be generalized and extended in various ways, see: R.S. Varga (1970). "Minimal Gershgorin Sets for Partitioned Matrices," SIAM .J. Numer. Anal. 7, 493-507. R.J. Johnston (1971). "Gershgorin Theorems for Partitioned Matrices," Lin. Alg. Applic. 4, 205-20. R.S. Varga and A. Krautstengl (1999). "On Gergorin-type Problems and Ovals of Cassini," ETNA 8, 15-20. R.S. Varga (2001). "Gergorin-type Eigenvalue Inclusion Theorems and Their Sharpness," ETNA 12, 113-133. C. Beattie and l.C.F. Ipsen (2003). "Inclusion Regions for Matrix Eigenvalues," Lin. Alg. Applic. 358, 281-291. In our discussion, the perturbations to the A-matrix are general. More can be said when the pertur­ bations are structured, see: G.W. Stewart (2001). "On the Eigensystems of Graded Matrices," Numer. Math. 90, 349-370. J. Moro and F.M. Dopico (2003). "Low Rank Perturbation of Jordan Structure," SIAM J. Matrix Anal. Applic. 25, 495-506. R. Byers and D. Kressner (2004). "On the Condition of a Complex Eigenvalue under Real Perturba­ tions," BIT 44, 209-214. R. Byers and D. Kressner (2006). "Structured Condition Numbers for Invariant Subspaces," SIAM J. Matrix Anal. Applic. 28, 326-347.
  • 389. 7.3. Power Iterations 365 An absolute perturbation bound comments on the difference between an eigenvalue .>. and its pertur­ bation 5.. A relative perturbation bound examines the quotient l.X - 5.1/l.XI, something that can be very important when there is a concern about a small eigenvalue. For results in this direction consult: R.-C. Li (1997). "Relative Perturbation Theory. III. More Bounds on Eigenvalue Variation," Lin. Alg. Applic. 266, 337-345. S.C. Eisenstat and l.C.F. Ipsen (1998). "Three Absolute Perturbation Bounds for Matrix Eigenvalues Imply Relative Bounds," SIAM J. Matrix Anal. Applic. 20, 149-158. S.C. Eisenstat and l.C.F. Ipsen (1998). "Relative Perturbation Results for Eigenvalues and Eigenvec­ tors of Diagonalisable Matrices," BIT 38, 502-509. I.C.F. Ipsen (1998) . "Relative Perturbation Results for Matrix Eigenvalues and Singular Values," Acta Numerica, 7, 151-201. l.C.F. Ipsen (2000). "Absolute and Relative Perturbation Bounds for Invariant Subspaces ofMatrices," Lin. Alg. Applic. 309, 45-56. I.C.F. Ipsen (2003). "A Note on Unifying Absolute and Relative Perturbation Bounds," Lin. Alg. Applic. 358, 239-253. Y. Wei, X. Li, F. Bu, and F. Zhang (2006). "Relative Perturbation Bounds for the Eigenvalues of Diagonalizable and Singular Matrices-Application to Perturbation Theory for Simple Invariant Subspaces," Lin. Alg. Applic. 419, 765-771. The eigenvectors and invariant subspaces of a matrix also "move" when there are perturbations. Tracking these changes is typically more challenging than tracking changes in the eigenvalues, see: T. Kato (1966). Perturbation Theory for Linear Operators, Springer-Verlag, New York. C. Davis and W.M. Kahan (1970). "The Rotation of Eigenvectors by a Perturbation, III," SIAM J. Numer. Anal. 7, 1-46. G.W. Stewart (1971). "Error Bounds for Approximate Invariant Subspaces of Closed Linear Opera­ tors," SIAM. J. Numer. Anal. 8, 796-808. G.W. Stewart (1973). "Error and Perturbation Bounds for Subspaces Associated with Certain Eigen- value Problems," SIAM Review 15, 727-764. J. Xie (1997). "A Note on the Davis-Kahan sin(28) Theorem," Lin. Alg. Applic. 258, 129-135. S.M. Rump and J.-P.M. Zemke (2003). "On Eigenvector Bounds," BIT 43, 823-837. Detailed analyses of the function sep(.,.) and the map X -t AX + XAT are given in: J. Varah (1979). "On the Separation of Two Matrices," SIAM ./. Numer. Anal. 16, 216·-22. R. Byers and S.G. Nash (1987). "On the Singular Vectors of the Lyapunov Operator," SIAM J. Alg. Disc. Methods 8, 59-66. 7.3 Power Iterations Suppose that we are given A E <Cnxn and a unitary U0 E <Cnxn. Recall from §5.2.10 that the Householder QR factorization can be extended to complex matrices and consider the following iteration: To = U/!AUo for k = 1, 2, . . . end Tk-1 = UkRk Tk = RkUk (QR factorization) Since Tk = RkUk = Uf!(UkRk)Uk = Uf!Tk-1Uk it follows by induction that Tk = (UoU1 · · · Uk)HA(UoU1 · · · Uk)· (7.3.1) (7.3.2) Thus, each Tk is unitarily similar to A. Not so obvious, and what is a central theme of this section, is that the Tk almost always converge to upper triangular form, i.e., (7.3.2) almost always "converges" to a Schur decomposition of A.
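As a concrete illustration, the following NumPy sketch carries out iteration (7.3.1) with U0 = I for a real test matrix whose eigenvalues have distinct absolute values; the function name and test matrix are illustrative, not part of the text.

```python
import numpy as np

def qr_iteration(A, steps=200):
    """Unshifted QR iteration (7.3.1) with U0 = I: each T_k is orthogonally
    similar to A; when the eigenvalues have distinct absolute values the
    subdiagonal part of T_k tends to zero."""
    T = np.array(A, dtype=float)
    for _ in range(steps):
        Q, R = np.linalg.qr(T)   # T_{k-1} = Q_k R_k   (QR factorization)
        T = R @ Q                # T_k = R_k Q_k = Q_k^T T_{k-1} Q_k
    return T

# Demonstration on a matrix with eigenvalues 1,...,5 (distinct moduli):
X = np.random.randn(5, 5)
A = X @ np.diag([5.0, 4.0, 3.0, 2.0, 1.0]) @ np.linalg.inv(X)
T = qr_iteration(A)
print(np.sort(np.diag(T)))   # approximately [1, 2, 3, 4, 5]
```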
  • 390. 366 Chapter 7. Unsymmetric Eigenvalue Problems Iteration (7.3.1) is called the QR iteration, and it forms the backbone of the most effective algorithm for computing a complete Schur decomposition of a dense general matrix. In order to motivate the method and to derive its convergence properties, two other eigenvalue iterations that are important in their own right are presented first: the power method and the method of orthogonal iteration. 7.3. 1 The Power Method Suppose A E <Cnxn and x-1AX = diag(A1 , . . . ' An) with x = [ X1 I · . · I Xn ] . Assume that Given a unit 2-norm q<0> E <Cn, the power method produces a sequence of vectors q(k) as follows: for k = 1, 2, . . . z(k) = Aq(k-1) end q(k) = z(k)/II z(k) 112 A(k) = [qCk)]HAq(k) (7.3.3) There is nothing special about usingthe 2-norm for normalization except that it imparts a greater unity on the overall discussion in this section. Let us examine the convergence properties of the power iteration. If and a1 =I- 0, then Since q(k) E span{AkqC0>} we conclude that It is also easy to verify that (7.3.4) (7.3.5) Since A1 is larger than all the other eigenvalues in modulus, it is referred to as a dominant eigenvalue. Thus, the power method converges if A1 is dominant and if q<0> has a component in the direction of the corresponding dominant eigenvector X1. The behavior of the iteration without these assumptions is discussed in Wilkinson (AEP, p. 570) and Parlett and Poole (1973).
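A minimal NumPy sketch of iteration (7.3.3), assuming dense storage and the 2-norm normalization used above; names and the test matrix are illustrative.

```python
import numpy as np

def power_method(A, q0, steps=100):
    """Power method (7.3.3): returns the final eigenvalue estimate
    lam = q^H A q and the unit 2-norm iterate q."""
    q = q0 / np.linalg.norm(q0)
    for _ in range(steps):
        z = A @ q                        # z^(k) = A q^(k-1)
        q = z / np.linalg.norm(z)        # q^(k) = z^(k) / ||z^(k)||_2
        lam = np.vdot(q, A @ q)          # lambda^(k) = [q^(k)]^H A q^(k)
    return lam, q

# Example: dominant eigenvalue of a matrix with eigenvalues 1,...,4.
X = np.random.randn(4, 4)
A = X @ np.diag([4.0, 3.0, 2.0, 1.0]) @ np.linalg.inv(X)
lam, q = power_method(A, np.random.randn(4))
print(lam)    # approximately 4
```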
  • 391. 7.3. Power Iterations 367 In practice, the usefulness of the power method depends upon the ratio l-X2l/l-Xil, since it dictates the rate of convergence. The danger that q<0> is deficient in x1 is less worrisome because rounding errors sustained during the iteration typically ensure that subsequent iterates have a component in this direction. Moreover, it is typically the case in applications that one has a reasonably good guess as to the direction of x1. This guards against having a pathologically small coefficient a1 in (7.3.4). Note that the only thing required to implement the power method is a procedure for matrix-vector products. It is not necessary to store A in an n-by-n array. For this reason, the algorithm is of interest when the dominant eigenpair for a large sparse matrix is required. We have much more to say about large sparse eigenvalue problems in Chapter 10. Estimates for the error 1-X(k) - -Xi i can be obtained by applying the perturbation theory developed in §7.2.2. Define the vector rCk) = Aq(k) _ ,xCk)q(k) and observe that (A + E(k))q(k) = ,X(k)q(k) where E(k) = -r(k)[q(k)]H. Thus ,X(k) is an eigenvalue of A + E(k) and I ,x(k) - A1 I = If we use the power method to generate approximate right and left dominant eigen­ vectors, then it is possible to obtain an estimate of s(.Xi). In particular, if wCk) is a unit 2-norm vector in the direction of (AH)kwC0>, then we can use the approximation s(.X1) � I wCk)Hq(k) 1. 7.3.2 Orthogonal Iteration A straightforward generalization of the power method can be used to compute higher­ dimensional invariant subspaces. Let r be a chosen integer satisfying 1 :::::; r :::::; n. Given A E <Cnxn and an n-by-r matrix Q0 with orthonormal columns, the method of orthogonal iteration generates a sequence of matrices {Qk} � <Cnxr and a sequence of eigenvalue estimates {A�k), . . . , A�k)} as follows: for k = 1, 2, . . . end Zk = AQk-1 QkRk = Zk (QR factorization) ( HAQ ) { (k) (k)} .X Qk k = -X1 , . • . , Ar (7.3.6) Note that if r = 1, then this is just the power method (7.3.3). Moreover, the se­ quence {Qkei} is precisely the sequence of vectors produced by the power iteration with starting vector q<0> = Qoe1. In order to analyze the behavior of this iteration, suppose that (7.3.7)
  • 392. 368 Chapter 7. Unsymmetric Eigenvalue Problems is a Schur decomposition of A E <Cnxn. Assume that 1 :::; r < n and partition Q and T as follows: Q = [ Qo; I Q{3 l r n-r T = [ T11 T12 ] r 0 T22 n-r r n-r (7.3.8) If IAr I > IAr+1 1, then the subspace Dr(A) ran(Qo: ) is referred to as a dominant invariant subspace. It is the unique invariant subspace associated with the eigenval­ ues ..1 , . . . , Ar· The following theorem shows that with reasonable assumptions, the subspaces ran(Qk) generated by (7.3.6) converge to Dr(A) at a rate proportional to l.Ar+i/Arlk· Theorem 7.3.1. Let the Schur decomposition of A E <Cnxn be given by {7.3. 7) and {7.3.8) with n � 2. Assume that I.Ari > l.Ar+1 I and that µ � 0 satisfies (1 + µ) I.Ar i > II N llF· Suppose Qo E <Cnxr has orthonormal columns and that dk is defined by dk = dist(Dr(A), ran(Qk)), k � 0. If do < 1, then the matrices Qk generated by (7.3.6) satisfy [I.A I + �] dk < (l + µ)n-2 · (l + ll T12 llF )· r+ l l + µ - sep(Tu , T22) l.A,. I _� l + µ k do J1 - d6 . Proof. The proof is given in an appendix at the end of this section. D (7.3.9) (7.3.10) The condition (7.3.9) ensures that the initial matrix Qo is not deficient in certain eigendirections. In particular, no vector in the span of Qo's columns is orthogonal to Dr(AH). The theorem essentially says that if this condition holds and if µ is chosen large enough, then dist(Dr(A), ran(Qk)) � c I,�:1 lk where c depends on sep(T11 , T22) and A's departure from normality. It is possible to accelerate the convergence in orthogonal iteration using a tech­ nique described in Stewart (1976). In the accelerated scheme, the approximate eigen­ value ,(k) satisfies • i = l:r. (Without the acceleration, the right-hand side is l.Ai+i /.Ailk.) Stewart's algorithm in­ volves computing the Schur decomposition of the matrices QIAQk every so often. The method can be very useful in situations where A is large and sparse and a few of its largest eigenvalues are required.
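A NumPy sketch of iteration (7.3.6), assuming a dense A and an n-by-r starting matrix Q0 with orthonormal columns; names are illustrative.

```python
import numpy as np

def orthogonal_iteration(A, Q0, steps=100):
    """Orthogonal iteration (7.3.6): ran(Q) approaches the dominant invariant
    subspace D_r(A); the eigenvalues of Q^H A Q estimate lambda_1,...,lambda_r."""
    Q = np.array(Q0)
    for _ in range(steps):
        Z = A @ Q
        Q, R = np.linalg.qr(Z)           # thin QR factorization Q_k R_k = Z_k
    return Q, np.linalg.eigvals(Q.conj().T @ A @ Q)
```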
  • 393. 7.3. Power Iterations 369 7.3.3 The QR Iteration We now derive the QR iteration (7.3.1) and examine its convergence. Suppose r = n in (7.3.6) and the eigenvalues of A satisfy Partition the matrix Q in (7.3.7) and Qk in (7.3.6) as follows: If dist(Di(AH), span{qi0>, . . . , q�0>}) < 1, then it follows from Theorem 7.3.1 that i = l:n, d. ( { (k) (k)} { } ) 0 1st span q1 , . . . , qi , span Q1, • • . , Qi --+ for i = l:n. This implies that the matrices Tk defined by Tk = Q{!AQk {7.3.11) are converging to upper triangular form. Thus, it can be said that the method oforthog­ onal iteration computes a Schur decomposition provided the original iterate Q0 E a:::nxn is not deficient in the sense of (7.3.11). The QR iteration arises naturally by considering how to compute the matrix Tk directly from its predecessor Tk-l · On the one hand, we have from (7.3.6) and the definition of Tk-1 that Tk-1 = Qf:_1AQk-1 = Qf:_1(AQk-i) = (Qf:_1 Qk)Rk. On the other hand, Tk = Q{!AQk = (Q{!AQk-1)(Qf:_1Qk) = Rk(Qf!-1Qk)· Thus, Tk is determined by computing the QR factorization ofTk-l and then multiplying the factors together in reverse order, precisely what is done in (7.3.1). Note that a single QR iteration is an O(n3) calculation. Moreover, since con­ vergence is only linear {when it exists), it is clear that the method is a prohibitively expensive way to compute Schur decompositions. Fortunately these practical difficul­ ties can be overcome as we show in §7.4 and §7.5. 7.3.4 LR Iterations We conclude with some remarks about power iterations that rely on the LU factoriza­ tion rather than the QR factorizaton. Let Go E a:::nxr have rank r. Corresponding to {7.3.1) we have the following iteration: for k = 1, 2, . . . Zk = AGk-1 (7.3.12) (LU factorization)
  • 394. 370 Chapter 7. Unsymmetric Eigenvalue Problems Suppose r = n and that we define the matrices Tk by (7.3.13) It can be shown that if we set Lo = Go, then the Tk can be generated as follows: To = L01AL0 for k = 1, 2, . . . end Tk-1 = LkRk Tk = RkLk (LU factorization) (7.3.14) Iterations (7.3.12) and (7.3.14) are known as treppeniteration a.nd the LR iteration, respectively. Under reasonable assumptions, the Tk converge to upper triangular form. To successfully implement either method, it is necessary to pivot. See Wilkinson (AEP, p. 602). Appendix In order to establish Theorem 7.3.1 we need the following lemma that bounds powers of a matrix and powers of its inverse. Lemma 7.3.2. Let QHAQ = T = D + N be a Schur decomposition of A E ccnxn where D is diagonal and N strictly upper triangular. Let Amax and Amin denote the largest and smallest eigenvalues ofA in absolute value. Ifµ 2'.: 0, then for all k 2'.: 0 we have (7.3.15) If A is nonsingular and µ 2'.: 0 satisfies (1 + µ)!Amini > II N ll F' then for all k 2'.: 0 we also have (7.3.16) Proof. For µ 2'.: 0, define the diagonal matrix A by A = diag (1, (1 + µ), (1 + µ)2, . . . , (1 + µ)n-1) and note that 11:2(A) = (1 + µ)n-1. Since N is strictly upper triangular, it is easy to verify that and thus II Ak 112 = II Tk 112 = II A-1(D + ANA-l)kA 112 � 11:2(A) {II D 112 + II ANA-l ll2) k � (1 + µ)n-l (iAmaxl + 11 1 :1;)k
  • 395. 7.3. Power Iterations On the other hand, if A is nonsingular and (1 + µ)!Amini > II N ll F, then Using Lemma 2.3.3 we obtain completing the proof of the lemma. 0 371 Proof of Theorem 7. 9. 1 . By induction it is easy to show that the matrix Qk in (7.3.6) satisfies a QR factorization of AkQ0. By substituting the Schur decomposition (7.3.7)-(7.3.8) into this equation we obtain (7.3.17) where Our goal is to bound II wk 112 since by the definition ofsubspace distance given in §2.5.3 we have II wk 112 = dist(Dr(A), ran(Qk)). (7.3.18) Note from the thin CS decomposition (Theorem 2.5.2) that 1 = d% + O"min(Vk) 2 • (7.3.19) Since Tu and T22 have no eigenvalues in common, Lemma 7.1.5 tells us that the Sylvester equation Tu X - XT22 = -T12 has a solution X E <Crx(n-r) and that It follows that [Ir X i-i[Tu Ti2 ] [Ir X l = [Tn 0 l 0 In-r 0 T22 0 In-r 0 T22 . By substituting this into (7.3.17) we obtain [Tf1 O k l [Vo - XWo l _ _ [Vk - xwk l UTk (Rk . . . R1 ), 0 T22 Wo n . (7.3.20)
  • 396. 372 Chapter 7. Unsymmetric Eigenvalue Problems i.e., Tf1(Vo - XWo) = (Vk - XWk)(Rk · · · R1), T�Wo = Wk(Rk · · · R1). (7.3.21) (7.3.22) The matrix I+ XXH is Hermitian positive definite and so it has a Cholesky factoriza­ tion It is clear that O"min(G) � 1. If the matrix Z E <Cn x (n-r) is defined by then it follows from the equation AHQ = QTH that AH(Qa - QpXH) = (Qa - QpXH)T{f. (7.3.23) (7.3.24) (7.3.25) Since zHZ = Ir and ran(Z) = ran(Qa - QpXH), it follows that the columns of Z are an orthonormal basis for Dr(AH). Using the CS decomposition, (7.3.19), and the fact that ran(Qp) = Dr(AH).L, we have O"min(ZTQo)2 = 1 - dist(Dr(AH), Qo)2 = 1 - II QfJQo II = O"min(Q�Qo)2 = O"min(Vo)2 = 1 - d� > 0. This shows that is nonsingular and together with (7.3.24) we obtain (7.3.26) Manipulation of (7.3.19) and (7.3.20) yields Wk = T4°2Wo(Rk · · · R1)-1 = T4°2Wo(Vo - XWo)-1Ti"/(Vk - XWk)· The verification of (7.3.10) is completed by taking norms in this equation and using (7.3.18), (7.3.19), (7.3.20), (7.3.26), and the following facts: II T;2 ll2 $ (1 + µ)n- r-l (IAr+i l + II N llF/(1 + µ))k , II T1J.k 112 5 (1 + µr-1I (IAr l - II N llF/(1 + µ))k , II vk - xwk 112 5 II Vk 112 + II x 11211 wk 112 5 1 + II T12 llF/sep(T11, T22).
  • 397. 7.3. Power Iterations The bounds for 11 T�2 112 and 11 T1}k 112 follow from Lemma 7.3.2. Problems P7.3.l Verify Equation (7.3.5). 373 P7.3.2 Suppose the eigenvalues of A E R"'xn satisfy l>-11 = l>-21 > l>-31 � · · · � l>-nl and that >.1 and >.2 are complex conjugates of one another. Let S = span{y, z} where y, z E Rn satisfy A(y + iz) = >.1(y + iz). Show how the power method with a real starting vector can be used to compute an approximate basis for S. P7.3.3 Assume A E Rnxn has eigenvalues >.1 , ...,An that satisfy >. = >.1 = >.2 = >.3 = >.4 > l>-sl � · · · � l>-nl where >. is positive. Assume that A has two Jordan blocks of the form. [ � � ] . Discuss the convergence properties of the power method when applied to this matrix and how the convergence might be accelerated. P7.3.4 A matrix A is a positive matrix if aii > 0 for all i and j. A vector v E Rn is a positive vector if Vi > 0 for all i. Perron's theorem states that if A is a positive square matrix, then it has a unique dominant eigenvalue equal to its spectral radius p(A) and there is a positive vector x so that Ax = p(A) - x. In this context, x is called the Perron vector and p(A) is called the Perron root. Assume that A E R"'xn is positive and q E Rn is positive with unit 2-norm. Consider the following implementation of the power method (7.3.3): z = Aq, >. = qTz while II z - >.q 112 > o q= z, q= q/ll q112, z = Aq, >. = qTz end (a) Adjust the termination criteria to guarantee (in principle) that the final >. and q satisfy Aq = >.q, where II A- A 112 :5 o and A is positive. (b) Applied to a positive matrix A E Rnxn, the Collatz­ Wielandt formula states that p(A) is the maximum value of the function f defined by f(x) = min Yi 1:5i:5n Xi where x E Rn is positive and y = Ax. Does it follow that f(Aq) � f(q)? In other words, do the iterates {q(kl} in the power method have the property that /(qCkl) increases monotonically to the Perron root, assuming that q<0J is positive? P7.3.5 (Read the previous problem for background.) A matrix A is a nonnegative matrix if a;j � 0 for all i and j. A matrix A E Rnxn is reducible if there is a permutation P so that pTAP is block triangular with two or more square diagonal blocks. A matrix that is not reducible is irreducible. The Perron-F'robenius theorem states that if A is a square, nonnegative, and irreducible, then p(A), the Perron root, is an eigenvalue for A and there is a positive vector x, the Perron vector, so that Ax = p(A)·x. Assume that A1, A2, A3 E Rnxn are each positive and let the nonnegative matrix A be defined by A � [1, �' �'l (a) Show that A is irreducible. (b) Let B = A1A2A3. Show how to compute the Perron root and vector for A from the Perron root and vector for B. (c) Show that A has other eigenvalues with absolute value equal to the Perron root. How could those eigenvalues and the associated eigenvectors be computed? P7.3.6 (Read the previous two problems for background.) A nonnegative matrix P E Rnxnis stochas­ tic if the entries in each column sum to 1. A vector v E Rn is a probability vector if its entries are nonnegative and sum to 1. (a) Show that if P E E'xn is stochastic and v E Rn is a probability vec­ tor, then w = Pv is also a probability vector. (b) The entries in a stochastic matrix P E Rnxn can
  • 398. 374 Chapter 7. Unsymmetric Eigenvalue Problems be regarded as the transition probabilities associated with an n-state Markov Chain. Let Vj be the probability of being in state j at time t = tcurrent · In the Markov model, the probability of being in state i at time t = tnext is given by Wi i = l:n, i.e., w = Pv. With the help of a biased coin, a surfer on the World Wide Web randomly jumps from page to page. Assume that the surfer is currently viewing web page j and that the coin comes up heads with probability a. Here is how the surfer determines the next page to visit: Step 1. A coin is tossed. Step 2. If it comes up heads and web page j has at least one outlink, then the next page to visit is randomly selected from the list of outlink pages. Step 3. Otherwise, the next page to visit is randomly selected from the list of all possible pages. Let P E Rnxn be the matrix of transition probabilities that define this random process. Specify P in terms of a, the vector of ones e, and the link matrix H E Rnxn defined by h;j = { 1 0 if there is a link on web page j to web page i otherwise Hints: The number of nonzero components in H(:, j) is the number of outlinks on web page j. P is a convex combination of a very sparse sparse matrix and a very dense rank-1 matrix. (c) Detail how the power method can be used to determine a probability vector x so that Px = x. Strive to get as much computation "outside the loop" as possible. Note that in the limit we can expect to find the random surfer viewing web page i with probability x; . Thus, a case can be made that more important pages are associated with the larger components of x. This is the basis of Google PageRank. If then web page ik has page rank k. P7.3.7 (a) Show that if X E <Cnxn is nonsingular, then II A llx = II x-1Ax 112 defines a matrix norm with the property that ll AB llx ::; ll A llxll B llx· (b) Show that for any f > 0 there exists a nonsingular X E <Cnxn such that II A llx = II x-1AX 112 ::; p(A) + f where p(A) is A's spectral radius. Conclude that there is a constant M such that II Ak 112 ::; M(p(A) + f)k for all non-negative integers k. (Hint: Set X = Q diag(l, a, . . . , an-l) where QHAQ = D + N is A's Schur decomposition.) P7.3.8 Verify that (7.3.14) calculates the matrices Tk defined by (7.3.13). P7.3.9 Suppose A E <Cnxn is nonsingular and that Qo E <Cnxp has orthonormal columns. The fol­ lowing iteration is referred to as inverse orthogonal iteration. for k = 1, 2, . . . end Solve AZk = Qk- 1 for Zk E <Cnxp Zk = QkRk (QR factorization) Explain why this iteration can usually be used to compute the p smallest eigenvalues of A in absolute value. Note that to implement this iteration it is necessary to be able to solve linear systems that involve A. If p = 1, the method is referred to as the inverse power method.
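For orientation, a NumPy sketch of the inverse orthogonal iteration in P7.3.9; a dense solve is used for clarity, whereas in practice A would be factored once and the factors reused at every step.

```python
import numpy as np

def inverse_orthogonal_iteration(A, Q0, steps=50):
    """Inverse orthogonal iteration (P7.3.9): ran(Q) approaches the invariant
    subspace for the p eigenvalues of A that are smallest in absolute value."""
    Q = np.array(Q0)
    for _ in range(steps):
        Z = np.linalg.solve(A, Q)        # solve A Z_k = Q_{k-1}
        Q, R = np.linalg.qr(Z)           # QR factorization Q_k R_k = Z_k
    return Q, np.linalg.eigvals(Q.conj().T @ A @ Q)
```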
  • 399. 7.3. Power Iterations 375 Notes and References for §7.3 For an excellent overview of the QR iteration and related procedures, see Watkins (MEP), Stewart (MAE), and Kressner (NMSE). A detailed, practical discussion of the power method is given in Wilkinson (AEP, Chap. 10). Methods are discussed for accelerating the basic iteration, for calculating nondominant eigenvalues, and for handling complex conjugate eigenvalue pairs. The connections among the various power iterations are discussed in: B.N. Parlett and W.G. Poole (1973). "A Geometric Theory for the QR, LU, and Power Iterations," SIAM J. Numer. Anal. 10, 389-412. The QR iteration was concurrently developed in: J.G.F. Francis (1961). "The QR Transformation: A Unitary Analogue to the LR Transformation," Comput. J. 4, 265-71, 332-334. V.N. Kublanovskaya (1961). "On Some Algorithms for the Solution of the Complete Eigenvalue Problem," USSR Comput. Math. Phys. 3, 637-657. As can be deduced from the title of the first paper by Francis, the LR iteration predates the QR iteration. The former very fundamental algorithm was proposed by: H. Rutishauser (1958). "Solution of Eigenvalue Problems with the LR Transformation," Nat. Bur. Stand. Appl. Math. Ser. 49, 47-81. More recent, related work includes: B.N. Parlett (1995). "The New qd Algorithms,'' Acta Numerica 5, 459-491. C . Ferreira and B.N. Parlett (2009). "Convergence of the LR Algorithm for a One-Point Spectrum Tridiagonal Matrix,'' Numer. Math. 113, 417-431. Numerous papers on the convergence and behavior of the QR iteration have appeared, see: J.H. Wilkinson (1965). "Convergence of the LR, QR, and Related Algorithms,'' Comput. J. 8, 77-84. B.N. Parlett (1965). "Convergence of the Q-R Algorithm," Numer. Math. 7, 187-93. (Correction in Numer. Math. 10, 163-164.) B.N. Parlett (1966). "Singular and Invariant Matrices Under the QR Algorithm,'' Math. Comput. 20, 611-615. B.N. Parlett (1968). "Global Convergence ofthe Basic QR Algorithm on Hessenberg Matrices,'' Math. Comput. 22, 803-817. D.S. Watkins (1982). "Understanding the QR Algorithm,'' SIAM Review 24, 427-440. T. Nanda (1985). "Differential Equations and the QR Algorithm," SIAM J. Numer. Anal. 22, 310-321. D.S. Watkins (1993). "Some Perspectives on the Eigenvalue Problem," SIAM Review 35, 430-471. D.S. Watkins (2008). "The QR Algorithm Revisited," SIAM Review 50, 133-145. D.S. Watkins (2011). "Francis's Algorithm,'' AMS Monthly 118, 387-403. A block analog of the QR iteration is discussed in: M. Robbe and M. Sadkane (2005). "Convergence Analysis of the Block Householder Block Diagonal­ ization Algorithm,'' BIT 45, 181-195. The following references are concerned with various practical and theoretical aspects of simultaneous iteration: H. Rutishauser (1970). "Simultaneous Iteration Method for Symmetric Matrices,'' Numer. Math. 1 6, 205-223. M. Clint and A. Jennings (1971). "A Simultaneous Iteration Method for the Unsymmetric Eigenvalue Problem,'' J. Inst. Math. Applic. 8, 111-121. G.W. Stewart (1976). "Simultaneous Iteration for Computing Invariant Subspaces of Non-Hermitian Matrices,'' Numer. Math. 25, 123-136. A. Jennings (1977). Matrix Computation for Engineers and Scientists, John Wiley and Sons, New York. Z. Bai and G.W. Stewart (1997). "Algorithm 776: SRRIT: a Fortran Subroutine to Calculate the Dominant Invariant Subspace of a Nonsymmetric Matrix," ACM TI-ans. Math. Softw. 23, 494- 513.
  • 400. 376 Chapter 7. Unsymmetric Eigenvalue Problems Problems P7.3.4-P7.3.6 explore the relevance of the power method to the problem of computing the Perron root and vector of a nonnegative matrix. For further background and insight, see: A. Berman and R.J. Plemmons (1994). Nonnegative Matrices in the Mathematical Sciences, SIAM Publications,Philadelphia, PA. A.N. Langville and C.D. Meyer (2006). Google 's PageRank and Beyond, Princeton University Press, Princeton and Oxford. . The latter volume is outstanding in how it connects the tools of numerical linear algebra to the design and analysis of Web browsers. See also: W.J. Stewart (1994). Introduction to the Numerical Solution of Markov Chains, Princeton University Press, Princeton, NJ. M.W. Berry, Z. Drma.C, and E.R. Jessup (1999). "Matrices, Vector Spaces, and Information Retrieval," SIAM Review 41, 335-362. A.N. Langville and C.D. Meyer (2005). "A Survey of Eigenvector Methods for Web Information Retrieval," SIAM Review 47, 135-161. A.N. Langville and C.D. Meyer (2006). "A Reordering for the PageRank Problem" , SIAM J. Sci. Comput. 27, 2112-2120. A.N. Langville and C.D. Meyer (2006). "Updating Markov Chains with an Eye on Google's PageR­ ank," SIAM J. Matrix Anal. Applic. 27, 968-987. 7.4 The Hessenberg and Real Schur Forms In this and the next section we show how to make the QR iteration (7.3.1) a fast, effective method for computing Schur decompositions. Because the majority of eigen­ value/invariant subspace problems involve real data, we concentrate on developing the real analogue of (7.3.1) which we write as follows: Ho = UJ'AUo for k = 1, 2, . . . end Hk-1 = UkRk Hk = RkUk (QR factorization) (7.4.1) Here, A E IRnxn, each Uk E IRnxn is orthogonal, and each Rk E IRnxn is upper trian­ gular. A difficulty associated with this real iteration is that the Hk can never converge to triangular form in the event that A has complex eigenvalues. For this reason, we must lower our expectations and be content with the calculation of an alternative decomposition known as the real Schur decomposition. In order to compute the real Schur decomposition efficiently we must carefully choose the initial orthogonal similarity transformation Uo in (7.4.1). In particular, if we choose U0 so that Ho is upper Hessenberg, then the amount of work per iteration is reduced from O(n3) to O(n2). The initial reduction to Hessenberg form (the Uo computation) is a very important computation in its own right and can be realized by a sequence of Householder matrix operations. 7.4.1 The Real Schur Decomposition A block upper triangular matrix with either 1-by-1 or 2-by-2 diagonal blocks is upper quasi-triangular. The real Schur decomposition amounts to a real reduction to upper quasi-triangular form.
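As a preview of where we are headed, the following sketch computes a real Schur form with SciPy's schur routine (which is built on the Hessenberg reduction and QR iteration developed in this and the next section) and reads the eigenvalues off the 1-by-1 and 2-by-2 diagonal blocks; the tolerance used to recognize a 1-by-1 block is illustrative.

```python
import numpy as np
from scipy.linalg import schur

A = np.random.randn(6, 6)
T, Q = schur(A, output='real')            # Q^T A Q = T, T upper quasi-triangular
print(np.allclose(Q.T @ A @ Q, T))        # True up to roundoff

eigs, i = [], 0
while i < T.shape[0]:
    if i == T.shape[0] - 1 or abs(T[i+1, i]) < 1e-12:
        eigs.append(T[i, i])                              # 1-by-1 block: real eigenvalue
        i += 1
    else:
        eigs.extend(np.linalg.eigvals(T[i:i+2, i:i+2]))   # 2-by-2 block: complex pair
        i += 2
print(sorted(eigs, key=abs))
```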
  • 401. 7.4. The Hessenberg and Real Schur Forms 377 Theorem 7.4.1 (Real Schur Decomposition). If A E JR"'x", then there exists an orthogonal Q E JR"'xn such that (7.4.2) 0 where each Rii is either a 1-by-1 matrix or a 2-by-2 matrix having complex conjugate eigenvalues. Proof. The complex eigenvalues of A occur in conjugate pairs since the characteristic polynomial det(zl - A) has real coefficients. Let k be the number of complex conjugate pairs in ,(A). We prove the theorem by induction on k. Observe first that Lemma 7.1.2 and Theorem 7.1.3 have obvious real analogs. Thus, the theorem holds if k = 0. Now suppose that k 2'.: 1. If , = r + iµ E ,(A) and µ -/:- 0, then there exist vectors y and z in IR"(z -/:- 0) such that A(y + iz) = ('y + ip.)(y + iz), i.e., A [ y z ] = [ y z ] [ "/ µ ]· -µ "/ The assumption that µ -/:- 0 implies that y and z span a 2-dimensional, real invariant subspace for A. It then follows from Lemma 7.1.2 that an orthogonal U E IR.nxn exists such that urAU = 2 n-2 where ,(T11) = {,, 5.}. By induction, there exists an orthogonal U so [JTT22U has the required structure. The theorem follows by setting Q = U. diag(h U). D The theorem shows that any real matrix is orthogonally similar to an upper quasi­ triangular matrix. It is clear that the real and imaginary parts of the complex eigen­ values can be easily obtained from the 2-by-2 diagonal blocks. Thus, it can be said that the real Schur decomposition is an eigenvalue-revealing decomposition. 7.4.2 A Hessenberg QR Step We now turn our attention to the efficient execution of a single QR step in (7.4.1). In this regard, the most glaring shortcoming associated with (7.4.1) is that each step requires a full QR factorh::ation costing O(n3) flops. Fortunately, the amount of work per iteration can be reduced by an order of magnitude if the orthogonal matrix U0 is judiciously chosen. In particular, if U[AUo = Ho = (hij) is upper Hessenberg (hij = 0, i > j + 1), then each subsequent Hk requires only O(n2) flops to calculate. To sec this we look at the computations H = QR and H+ = RQ when H is upper Hessenberg. As described in §5.2.5, we can upper triangularize H with a sequence of n - 1 Givens rotations: QTH ::: G'f,.'_1 . . · GfH = R. Here, Ci = G(i, i + 1, Bi)· For the n = 4 case there are three Givens premultiplications:
  • 402. 378 Chapter 7. Unsymmetric Eigenvalue Problems [� x x �l [� x x �l [� x x x x -t -t x x x x 0 x 0 x x x �l [� x x 0 -t x 0 x x x x x 0 x 0 0 :l x . x See Algorithm 5.2.5. The computation RQ = R(G1 · · · Gn-l) is equally easy to imple- ment. In the n = 4 case there are three Givens post-multiplications: [� x x 0 0 x x x 0 x x 0 0 x x x 0 Overall we obtain the following algorithm: x x x 0 x x x 0 x x x 0 x x x x Algorithm 7.4.1 If H is an n-by-n upper Hessenberg matrix, then this algorithm overwrites H with H+ = RQ where H = QR is the QR factorization of H. for k = l:n - 1 end [ Ck , Bk J = givens(H(k, k), H(k + 1, k)) H(k:k + 1, k:n) = [ Ck Bk ]TH(k:k + 1, k:n) -Sk Ck for k = l:n - 1 end H(l:k + 1, k:k + 1) = H(l:k + 1, k:k + 1) [ Ck Sk l -Sk Ck Let Gk = G(k, k+1, fh) be the kth Givens rotation. It is easy to confirm that the matrix Q = G1 · · · Gn-1 is upper Hessenberg. Thus, RQ = H+ is also upper Hessenberg. The algorithm requires about 6n2 flops, an order of magnitude more efficient than a full matrix QR step (7.3.1). 7.4.3 The Hessenberg Reduction It remains for us to show how the Hessenberg decomposition u;rAUo = H, UJ'Uo = I (7.4.3) can be computed. The transformation Uo can be computed as a product ofHouseholder matrices P1, . . . , P.. -2 . The role of Pk is to zero the kth column below the subdiagonal. In the n = 6 case, we have x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x � 0 x x x x x :'; x x x x x x 0 x x x x x x x x x x x 0 x x x x x x x x x x x 0 x x x x x
  • 403. 7.4. The Hessenberg and Real Schur Forms 379 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x 0 x x x x x � 0 x x x x x � 0 x x x x x 0 0 x x x x 0 0 x x x x 0 0 x x x x 0 0 x x x x 0 0 0 x x x 0 0 0 x x x 0 0 x x x x 0 0 0 x x x 0 0 0 0 x x In general, after k - 1 steps we have computed k - 1 Householder matrices P1 , • . . , Pk-l such that [ Bu B21 0 k-1 k-1 n-k n-k is upper Hessenberg through its first k - 1 columns. Suppose Pk is an order-(n - k) Householder matrix such that PkB32 is a multiple of e�n-k). If Pk = diag(Jk, Pk), then is upper Hessenberg through its first k columns. Repeating this for k = l:n - 2 we obtain Algorithm 7.4.2 (Householder Reduction to Hessenberg Form) Given A E Rnxn, the following algorithm overwrites A with H = UJ'AU0 where H is upper Hessenberg and Uo is a product of Householder matrices. for k = l:n - 2 end [v, .BJ = house(A(k + l:n, k)) A(k + l:n, k:n) = (I - ,BvvT)A(k + l:n, k:n) A(l:n, k + l:n) = A(l:n, k + l:n)(I - ,BvvT) This algorithm requires 10n3/3 flops. If U0 is explicitly formed, an additional 4n3/3 flops are required. The kth Householder matrix can be represented in A(k + 2:n, k). See Martin and Wilkinson (1968) for a detailed description. The roundoff properties of this method for reducing A to Hessenberg form are v.!'lry desirable. Wilkinson (AEP, p. 351) states that the computed Hessenberg matrix H satisfies fl = QT(A + E)Q, where Q is orthogonal and II E llF :5 cn2ull A llF with c a small constant.
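Both computations translate readily into NumPy. In the sketch below the reflector helper is a simplified stand-in for the house procedure of Chapter 5; hessenberg_reduce follows Algorithm 7.4.2 and hessenberg_qr_step follows Algorithm 7.4.1, and the final checks confirm the similarity and the preservation of Hessenberg structure.

```python
import numpy as np

def reflector(x):
    """Unit vector w (or w = 0) such that (I - 2 w w^T) x is a multiple of e_1."""
    v = np.array(x, dtype=float)
    v[0] += np.copysign(np.linalg.norm(x), x[0] if x[0] != 0 else 1.0)
    nrm = np.linalg.norm(v)
    return v / nrm if nrm > 0 else v

def hessenberg_reduce(A):
    """Householder reduction to Hessenberg form (Algorithm 7.4.2):
    returns H and U0 with U0^T A U0 = H."""
    H = np.array(A, dtype=float)
    n = H.shape[0]
    U0 = np.eye(n)
    for k in range(n - 2):
        w = reflector(H[k+1:, k])
        P = np.eye(n - k - 1) - 2.0 * np.outer(w, w)
        H[k+1:, k:] = P @ H[k+1:, k:]     # zero column k below the subdiagonal
        H[:, k+1:] = H[:, k+1:] @ P       # complete the similarity transformation
        U0[:, k+1:] = U0[:, k+1:] @ P     # accumulate U0 (optional)
    return H, U0

def hessenberg_qr_step(H):
    """One unshifted QR step H <- RQ for upper Hessenberg H (Algorithm 7.4.1)."""
    H = np.array(H, dtype=float)
    n = H.shape[0]
    rot = []
    for k in range(n - 1):                # upper triangularize with n-1 rotations
        a, b = H[k, k], H[k+1, k]
        r = np.hypot(a, b)
        c, s = (1.0, 0.0) if r == 0 else (a / r, -b / r)
        H[k:k+2, k:] = np.array([[c, -s], [s, c]]) @ H[k:k+2, k:]   # G_k^T applied
        rot.append((c, s))
    for k, (c, s) in enumerate(rot):      # form RQ = R G_1 ... G_{n-1}
        H[:k+2, k:k+2] = H[:k+2, k:k+2] @ np.array([[c, s], [-s, c]])
    return H

A = np.random.randn(6, 6)
H, U0 = hessenberg_reduce(A)
print(np.allclose(U0.T @ A @ U0, H))                        # True
print(np.allclose(np.tril(hessenberg_qr_step(H), -2), 0))   # H+ is again Hessenberg
```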
  • 404. 380 Chapter 7. Unsymmetric Eigenvalue Problems 7 .4.4 Level-3 Aspects The Hessenberg reduction (Algorithm 7.4.2) is rich in level-2 operations: half gaxpys and half outer product updates. We briefly mention two ideas for introducing level-3 computations into the process. The first involves a block reduction to block Hessenberg form and is quite straight­ forward. Suppose (for clarity) that n = rN and write r n-r Suppose that we have computed the QR factorization A21 = Q1R1 and that Q1 is in WY form. That is, we have Wi, Y1 E JR(n-r) xr such that Q1 = I + W1Y{. (See §5.2.2 for details.) If Q1 = diag(Ir, Q1) then Notice that the updates of the (1,2) and (2,2) blocks arc rich in levcl-3 operations given that Q1 is in WY form. This fully illustrates the overall process as QfAQ1 is block upper Hessenberg through its first block column. We next repeat the computations on the first r columns of Q[A22Q1. After N - 1 such steps we obtain where each Hij is r-by-r and U0 = Q1 · · • QN-2 with each Qi in WY form. The overall algorithm has a level-3 fraction of the form 1 - 0(1/N). Note that the subdiagonal blocks in H are upper triangular and so the matrix has lower bandwidth r. It is possible to reduce H to actual Hessenberg form by using Givens rotations to zero all but the first subdiagonal. Dongarra, Hammarling, and Sorensen (1987) have shown how to proceed directly to Hessenberg form using a mixture of gaxpys and level-3 updates. Their idea involves minimal updating after each Householder transformation is generated. For example, suppose the first Householder P1 has been computed. To generate P2 we need just the second column of P1AP1, not the full outer product update. To generate P3 we need just the thirrd column of P2P1AP1P2, etc. In this way, the Householder matrices can be determined using only gaxpy operations. No outer product updates are involved. Once a suitable number of Householder matrices are known they can be aggregated and applied in level-3 fashion. For more about the challenges of organizing a high-performance Hessenberg re­ duction, see Karlsson (2011).
  • 405. 7.4. The Hessenberg and Real Schur Forms 381 7.4.5 Important Hessenberg Matrix Properties The Hessenberg decomposition is not unique. If Z is any n-by-n orthogonal matrix and we apply Algorithm 7.4.2 to zTAZ, then QTAQ = H is upper Hessenberg where Q = ZUo. However, Qe1 = Z(Uoe1) = Ze1 suggesting that H is unique once the first column of Q is specified. This is essentially the case provided H has no zero subdiagonal entries. Hessenberg matrices with this property arc said to be unreduced. Here is important theorem that clarifies these issues. Theorem 7.4.2 { Implicit Q Theorem ). Suppose Q = [ Q1 I · · · I Qn ] and V = [ v1 I · · · I Vn ] are orthogonal matrices with the property that the matrices QTAQ = H and VTAV = G are each upper Hessenberg where A E 1Rnxn. Let k denote the smallest positive integerfor which hk+l,k = 0, with the convention that k = n ifH is unreduced. If Q1 = v1, then Qi = ±vi and lhi,i-1I = lgi,i-1 I for i = 2:k. Moreover, if k < n, then gk+l,k = 0. Proof. Define the orthogonal matrix W = [ w1 I · · · I Wn ] = VTQ and observe that GW = WH. By comparing column i - 1 in this equation for i = 2:k we see that i-1 hi,i-lwi = Gwi-1 - L hj,i-1Wj· j=l Since w1 = e1, it follows that [ w1 I · · · I wk ] is upper triangular and so for i = 2:k we have Wi = ±In(:, i) = ±e;. Since Wi = vrQi and hi,i-1 = wTGwi-l it follows that Vi = ±qi and lhi,i-11 = lq[AQi-11 = Iv[Avi-1 I = lgi,i-1 1 for i = 2:k. If k < n, then 9k+1,k = ef+lGek = ±ef+lGWek = ±ef+1WHek k = ±ef+l L hikWei i=l completing the proof of the theorem. D k = ± Lhikek+1ei = 0, i=l The gist of the implicit Q theorem is that if QTAQ = H and zTAZ = G are each unre­ duced upper Hessenberg matrices and Q and Z have the same first column, then G and H are "essentially equal" in the sense that G = v-1HD where D = diag(±l, . . . , ±1). Our next theorem involves a new type of matrix called a Krylov matrix. If A E 1Rnxn and v E 1Rn, then the Krylov matrix K(A,v,j) E 1Rnxj is defined by K(A, v,j) = [ v I Av I · . · I Aj-lv ] . It turns out that there is a connection between the Hessenberg reduction QTAQ = H and the QR factorization of the Krylov matrix K(A, Q(:, 1), n). Theorem 7.4.3. Suppose Q E 1Rnxn is an orthogonal matrix and A E 1Rnxn. Then QTAQ = H is an unreduced upper Hessenberg matrix ifand only ifQTK(A, Q(:, 1), n) = R is nonsingular and upper triangular.
  • 406. 382 Chapter 7. Unsymmetric Eigenvalue Problems Proof. Suppose Q E JRnxn is orthogonal and set H = QTAQ. Consider the identity IfH is an unreduced upper Hessenberg matrix, then it is clear that R is upper triangular with rii = h21h32 · · · hi,i-l for i = 2:n. Since rn = 1 it follows that R is nonsingular. To prove the converse, suppose R is upper triangular and nonsingular. Since R(:, k + 1) = HR(:, k) it follows that H(:, k) E span{ ei, . . . , ek+l }. This implies that H is upper Hessenberg. Since rnn = h21h32 · · · hn,n-1 -::/:- 0 it follows that H is also unreduced. D Thus, there is more or less a correspondence between nonsingular Krylov matrices and orthogonal similarity reductions to unreduced Hessenberg form. Our last result is about the geometric multiplicity ofan eigenvalue ofan unreduced upper Hessenberg matrix. Theorem 7.4.4. If >. is an eigenvalue of an unreduced upper Hessenberg matrix H E JRnxn, then its geometric multiplicity is 1. Proof. For any >. E <C we have rank(A - >.I) � n - 1 because the first n - 1 columns of H - >.I are independent. D 7.4.6 Companion Matrix Form Just as the Schur decomposition has a nonunitary analogue in the Jordan decomposi­ tion, so does the Hessenberg decomposition have a nonunitary analog in the companion matrix decomposition. Let x E JRn and suppose that the Krylov matrix K = K(A, x, n) is nonsingular. If c = c(O:n - 1) solves the linear system Kc = -Anx, then it follows that AK = KC where C has the form 0 0 0 -eo 1 0 0 -C1 c = 0 1 0 -C2 (7.4.4) 0 0 1 -Cn-1 The matrix C is said to be a companion matrix. Since it follows that if K is nonsingular, then the decomposition K-1AK = C displays A's characteristic polynomial. This, coupled with the sparseness of C, leads to "companion matrix methods" in various application areas. These techniques typically involve: Step 1. Compute the Hessenberg decomposition UJ'AUo = H. Step 2. Hope H is unreduced and set Y = [e1 I He1 1 - . - I Hn-1e1 ] . Step 3. Solve YC = HY for C.
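The correspondence between the last column of C and the coefficients of the characteristic polynomial is easy to check numerically; here is a small sketch with an illustrative cubic.

```python
import numpy as np

def companion(c):
    """Companion matrix (7.4.4) for the monic polynomial
    p(z) = z^n + c_{n-1} z^{n-1} + ... + c_1 z + c_0, with c = [c_0,...,c_{n-1}]."""
    c = np.asarray(c, dtype=float)
    n = c.size
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)     # ones on the subdiagonal
    C[:, -1] = -c                  # last column carries the coefficients
    return C

# p(z) = z^3 - 6z^2 + 11z - 6 = (z-1)(z-2)(z-3):
C = companion([-6.0, 11.0, -6.0])
print(np.sort(np.linalg.eigvals(C).real))    # approximately [1, 2, 3]
```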
  • 407. 7.4. The Hessenberg and Real Schur Forms 383 Unfortunately, this calculation can be highly unstable. A is similar to an unreduced Hessenberg matrix only if each eigenvalue has unit geometric multiplicity. Matrices that have this property are called nonderogatory. It follows that the matrix Y above can be very poorly conditioned if A is close to a derogatory matrix. A full discussion of the dangers associated with companion matrix computation can be found in Wilkinson (AEP, pp. 405ff.). Problems P7.4.1 Suppose A E Fxn and z E Rn. Give a detailed algorithm for computing an orthogonal Q such that QTAQ is upper Hessenberg and QTz is a multiple of e1. Hint: Reduce z first and then apply Algorithm 7.4.2. P7.4.2 Develop a similarity reduction to Hessenberg form using Gauss transforms with pivoting. How many flops are required. See Businger (1969). P7.4.3 In some situations, it is necessary to solve the linear system (A+ zl)x = b for many different values of z E R and b E Rn. Show how this problem can be efficiently and stably solved using the Hessenberg decomposition. P7.4.4 Suppose H E Rnxn is an unreduced upper Hessenberg matrix. Show that there exists a diagonal matrix D such that each subdiagonal element of v-1HD is equal to 1. What is 11:2(D)? P7.4.5 Suppose W, Y E Rnxn and define the matrices C and B by C = W + iY, B = [ � - ; ]. Show that if .>. E .>.(C) is real, then .>. E .>.(B). Relate the corresponding eigenvectors. P7.4.6 Suppose A = [ � � ] is a real matrix having eigenvalues .>.±iµ, where µ is nonzero. Give an algorithm that stably determines c = cos(8) and s = sin(8) such that where 0t/3 = -µ2• P7.4.7 Suppose (.>., x) is a known eigenvalue-eigenvector pair for the upper Hessenberg matrix H E Fx n. Give an algorithm for computing an orthogonal matrix P such that pTHP - [ .>. wT ] - 0 Hi where H1 E R(n-l)x(n-l) is upper Hessenberg. Compute P as a product of Givens rotations. P7.4.8 Suppose H E Rnxn has lower bandwidth p. Show how to compute Q E R"xn, a product of Givens rotations, such that QTHQ is upper Hessenberg. How many flops are required? P7.4.9 Show that if C is a companion matrix with distinct eigenvalues .>.i . . . . ,.>.n, then vcv-1 = diag(.>.1, . . . , .>.n) where .>.�-1l .>.n-1 2 .>.::-1 Notes and References for §7.4 The real Schur decomposition was originally presented in: F.D. Murnaghan and A. Wintner (1931). "A Canonical Form for Real Matrices Under Orthogonal Transformations," Proc. Nat. Acad. Sci. 1 7, 417-420.
  • 408. 384 Chapter 7. Unsymmetric Eigenvalue Problems A thorough treatment of the reduction to Hessenberg form is given in Wilkinson (AEP, Chap. 6), and Algol procedures appear in: R.S. Martin and J.H. Wilkinson (1968). "Similarity Reduction of a General Matrix to Hcssenberg Form," Nu.mer. Math. 12, 349- 368. Givens rotations can also be used to compute the Hessenberg decomposition, see: W. Rath (1982). "Fast Givens Rotations for Orthogonal Similarity," Nu.mer. Math. 40, 47-56. The high-performance computation of the Hessenberg reduction is a major challenge because it is a two-sided factorization, see: J.J. Dongarra, L. Kaufman, and S. Hammarling (1986). "Squeezing the Most Out of Eigenvalue Solvers on High Performance Computers,'' Lin. Alg. Applic. 77, 113-136. J.J. Dongarra, S. Hammarling, and D.C. Sorensen (1989). "Block Reduction of Matrices to Condensed Forms for Eigenvalue Computations,'' .!. ACM 27, 215--227. M.W. Berry, J.J. Dongarra, and Y. Kim (1995). "A Parallel Algorithm for the Reduction of a Non­ symmetric Matrix to Block Upper Hessenberg Form," Parallel Comput. 21, 1189- 1211. G. Quintana-Orti and R. Van De Geijn (2006). "Improving the Performance of Reduction to Hessen­ berg Form,'' ACM nuns. Math. Sojtw. 32, 180-194. S. Tomov, R. Nath, and J. Dongarra (2010). "Accelerating the Reduction to Upper Hessenberg, Tridiagonal, and Bidiagonal Forms Through Hybrid GPU-Based Computing," Parallel Compv.t. 36, 645-654. L. Karlsson (2011). "Scheduling of Parallel Matrix Computations and Data Layout Conversion for HPC and Multicore Architectures," PhD Thesis, University of Umea. Reaching the Hessenberg form via Gauss transforms is discussed in: P. Businger (1969). "Reducing a Matrix to Hessenberg Form,'' Math- Comput. 23, 819-821. G.W. Howell and N. Diaa (2005). "Algorithm 841: BHESS: Gaussian Reduction to a Similar Banded Hessenberg Form,'' ACM nuns. Math. Softw. 31, 166-185. Some interesting mathematical properties of the Hessenberg form may be found in: B.N. Parlett (1967). "Canonical Decomposition of Hessenberg Matrices,'' Math. Comput. 21, 223- 227. Although the Hessenberg decomposition is largely appreciated as a "front end" decomposition for the QR iteration, it is increasingly popular as a cheap alternative to the more expensive Schur decom­ position in certain problems. For a sampling of applications where it has proven to be very useful, consult: W. Enright (1979). "On the Efficient and Reliable Numerical Solution of Large Linear Systems of O.D.E.'s,'' IEEE nuns. Av.tom. Contr. AC-24, 905-908. G.H. Golub, S. Nash and C. Van Loan (1979). "A Hcssenberg-Schur Method for the Problem AX + XB = C,'' IEEE nuns. Av.tom. Contr. AC-24, 909-913. A. Laub (1981). "Efficient Multivariable Frequency Response Computations,'' IEEE nuns. Av.tom. Contr. AC-26, 407-408. C.C. Paige (1981). "Properties of Numerical Algorithms Related to Computing Controllability,'' IEEE nuns. Auto. Contr. A C-26, 130-138. G. Miminis and C.C. Paige (1982). "An Algorithm for Pole Assignment of Time Invariant Linear Systems,'' Int. J. Contr. 35, 341-354. C. Van Loan (1982). "Using the Hessenberg Decomposition in Control Theory," in Algorithms and Theory in Filtering and Control , D.C. Sorensen and R.J. Wets (eds.), Mathematical Programming Study No. 18, North Holland, Amsterdam, 102-111. C.D. Martin and C.F. Van Loan (2006). "Solving Real Linear Systems with the Complex Schur Decomposition,'' SIAM J. Matrix Anal. Applic. 29, 177-183. 
The advisability of posing polynomial root problems as companion matrix eigenvalue problems is discussed in: A. Edelman and H. Murakami (1995). "Polynomial Roots from Companion Matrix Eigenvalues," Math. Comput. 64, 763-776.
  • 409. 7.5. The Practical QR Algorithm 7.5 The Practical QR Algorithm We return to the Hessenberg QR iteration, which we write as follows: H = UJ'AUo for k = 1, 2, . . . end H = UR H = RU (Hessenberg reduction) (QR factorization) 385 (7.5.1) Our aim in this section is to describe how the H's converge to upper quasi-triangular form and to show how the convergence rate can be accelerated by incorporating shifts. 7.5.1 Deflation Without loss of generality we may assume that each Hessenberg matrix H in (7.5.1) is unreduced. If not, then at some stage we have H = [ H o n H12 ] p H22 n-p P n-p where 1 :::; p < n and the problem decouples into two smaller problems involving H11 and H22. The term deflation is also used in this context, usually when p = n - 1 or n - 2. In practice, decoupling occurs whenever a subdiagonal entry in H is suitably small. For example, if (7.5.2) for a small constant c, then hp+ I ,p can justifiably be set to zero because rounding errors of order ull H II arc typically present throughout the matrix anyway. 7.5.2 The Shifted QR Iteration Let µ E lR and consider the iteration: H = UJ'AUo for k = 1, 2, . . . (Hessenberg reduction) Determine a scalar µ. H - µl = UR H = RU + µl end (QR factorization) (7.5.3) The scalar µ is referred to as a shift . Each matrix H generated in (7.5.3) is similar to A, since RU + µ/ UTHU.
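A NumPy sketch of iteration (7.5.3), using the single-shift choice mu = H(n, n) discussed in §7.5.3 and the decoupling test (7.5.2); a dense QR factorization is used for clarity, and the simple shift strategy assumed here is appropriate only when the eigenvalues are real.

```python
import numpy as np

def shifted_qr(H, tol=1e-12, max_steps=500):
    """Shifted Hessenberg QR iteration (7.5.3) with mu = H(n,n); a trailing
    1-by-1 block is deflated whenever the test (7.5.2) is satisfied."""
    H = np.array(H, dtype=float)
    eigs = []
    n = H.shape[0]
    while n > 1:
        for _ in range(max_steps):
            mu = H[n-1, n-1]
            Q, R = np.linalg.qr(H[:n, :n] - mu*np.eye(n))   # H - mu*I = U R
            H[:n, :n] = R @ Q + mu*np.eye(n)                # H <- R U + mu*I
            if abs(H[n-1, n-2]) <= tol*(abs(H[n-2, n-2]) + abs(H[n-1, n-1])):
                break
        eigs.append(H[n-1, n-1])     # the (n,n) entry has converged; deflate
        n -= 1
    eigs.append(H[0, 0])
    return np.array(eigs)
```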
  • 410. 386 Chapter 7. Unsymmetric Eigenvalue Problems If we order the eigenvalues Ai of A so that IA1 - µI � · · · � IAn - µI, and µ is fixed from iteration to iteration, then the theory of §7.3 says that the pth subdiagonal entry in H converges to zero with rate I Ap+i - µ lk Ap - µ Of course, if Ap = Ap+l • then there is no convergence at all. But if, for example, µ is much closer to An than to the other eigenvalues, then the zeroing of the (n, n - 1) entry is rapid. In the extreme case we have the following: Theorem 7.5.1. Let µ be an eigenvalue of an n- by-n unreduced Hessenberg matrix H. If H = RU + µI, where H - µI = UR is the QR factorization of H - µI, then hn,n-1 = 0 and hnn = µ. Proof. Since H is an unreduced Hessenberg matrix the first n - 1 columns of H - µI are independent, regardless of µ. Thus, if UR = (H - µI) is the QR factorization then rii =i 0 for i = l:n - l. But if H - µI is singular, then ru · · · Tnn = 0 . Thus, Tnn = 0 and H(n, :) = [ O, . . . , 0, µ ]. D The theorem says that if we shift by an exact eigenvalue, then in exact arithmetic deflation occurs in one step. 7.5.3 The Single-Shift Strategy Now let us consider varying µ from iteration to iteration incorporating new information about A(A) as the subdiagonal entries converge to zero. A good heuristic is to regard hnn as the best approximate eigenvalue along the diagonal. If we shift by this quantity during each iteration, we obtain the single-shift QR iteration: for k = 1, 2, . . . µ = H(n, n) H - µl = UR H = RU + µl end (QR factorization) (7.5.4) If the (n, n - 1) entry converges to zero, it is likely to do so at a quadratic rate. To see this, we borrow an example from Stewart (IMC, p. 366). Suppose H is an unreduced upper Hessenberg matrix of the form H = x x 0 0 0 x x x 0 0 x x x x 0 x x x x f
  • 411. 7.5. The Practical QR Algorithm and that we perform one step of the single-shift QR algorithm, i.e., UR = H - hnn fl = RU + hnnl. 387 After n - 2 steps in the orthogonal reduction of H - hnnl to upper triangular form we obtain a matrix with the following structure: It is not hard to show that hn,n-l x x 0 0 0 x x x x x x 0 a 0 € -a2 + €2 · If we assume that € « a, then it is clear that the new (n, n - 1) entry has order €2, precisely what we would expect of a quadratically converging algorithm. 7.5.4 The Double-Shift Strategy Unfortunately, difficulties with (7.5.4) can be expected if at some stage the eigenvalues a1 and a2 of m = n-l, (7.5.5) are complex for then hnn would tend to be a poor approximate eigenvalue. A way around this difficulty is to perform two single-shift QR steps in succession using a1 and a2 as shifts: H - a1I = U1R1 H1 = RiU1 + a1I H1 - a2I = U2R2 H2 = R2U2 + a2I These equations can be manipulated to show that where M is defined by M = (H - a1I)(H :- a2I). Note that M is a real matrix even if G's eigenvalues are complex since M = H2 - sH + tI where s hmm + hnn = tr(G) E JR (7.5.6) (7.5.7) (7.5.8)
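Although impractical, the explicit variant just described is easy to code and is useful for checking an implicit implementation; a sketch with illustrative names:

```python
import numpy as np

def explicit_double_shift(H):
    """One explicit double-shift QR step: form M = H^2 - sH + tI with s and t
    taken from the trailing 2-by-2 block G in (7.5.5), compute the real QR
    factorization M = ZR, and return H2 = Z^T H Z.  O(n^3); checking only."""
    n = H.shape[0]
    G = H[n-2:, n-2:]
    s = np.trace(G)                      # s = a1 + a2
    t = np.linalg.det(G)                 # t = a1*a2
    M = H @ H - s*H + t*np.eye(n)        # real even when a1, a2 are complex
    Z, R = np.linalg.qr(M)
    return Z.T @ H @ Z
```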
  • 412. 388 Chapter 7. Unsymmetric Eigenvalue Problems and t = aia2 = hmmhnn - hmnhnm = det(G) E R.. Thus, (7.5.7) is the QR factorization of a real matrix and we may choose U1 and U2 so that Z = U1U2 is real orthogonal. It then follows that is real. Unfortunately, roundoff error almost always prevents an exact return to the real field. A real H2 could be guaranteed if we • explicitly form the real matrix M = H2 - sH + tI, • compute the real QR factorization M = ZR, and • set H2 = zrHz. But since the first of these steps requires O(n3) flops, this is not a practical course of action. 7.5.5 The Double-Implicit-Shift Strategy Fortunately, it turns out that we can implement the double-shift step with O(n2) flops by appealing to the implicit Q theorem of §7.4.5. In particular we can effect the transition from H to H2 in O(n2) flops if we • compute Me1, the first column of M; • determine a Householder matrix Po such that P0(Mei) is a multiple of e1 ; • compute Householder matrices P1, . . • , Pn-2 such that if then z'[HZ1 is upper Hessenberg and the first columns of Z and Z1 arc the same. Under these circumstances, the implicit Q theorem permits us to conclude that, if zrHz and Z'[HZ1 are both unreduced upper Hessenberg matrices, then they are essentially equal. Note that if these Hessenberg matrices are not unreduced, then we can effect a decoupling and proceed with smaller unreduced subproblems. Let us work out the details. Observe first that Po can be determined in 0(1) flops since Me1 = [x, y, z, 0, . . . ' o]T where x = h�1 + h12h21 - sh11 + t, y = h21 (h11 + h22 - s), z = h21h32. Since a similarity transformation with Po only changes rows and columns 1, 2, and 3, we see that
  • 413. 7.5. The Practical QR Algorithm PoHPo x x x x x x x x x x x x x x x x 0 0 0 0 x x x x 0 x 0 0 x x x x x x x x 389 Now the mission of the Householder matrices P1, . . . , Pn-2 is to restore this matrix to upper Hessenberg form. The calculation proceeds as follows: x x x x x x x x x x x x x x x x x x x x x x x x 0 0 0 x x x 0 0 0 0 x x x x x x x x x x x x x x 0 x x x x x 0 x x x x x 0 x x x x x 0 0 0 0 x x x x x x x x x x x x x x 0 x x x x x 0 0 x x x x 0 0 x x x x 0 0 x x x x x x 0 0 0 0 x x x x x x x x x x x x x x x 0 x x x x 0 0 x x x 0 0 x x x x x x x 0 x 0 0 0 0 0 0 x x x x x x x x 0 x 0 0 x x x x x x x x x x x x Each Pk is the identity with a 3-by-3 or 2-by-2 Householder somewhere along its diag­ onal, e.g., 1 0 0 x 0 x 0 x 0 0 0 0 0 0 0 0 x x 0 0 x x 0 0 x x 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 x 0 0 0 x 0 0 0 x 0 0 0 0 0 0 x x x x x x 1 0 0 0 1 0 0 0 x 0 0 x 0 0 x 0 0 0 0 0 0 0 0 0 x x 0 x x 0 x x 0 0 0 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 x 0 0 0 x 0 0 0 0 0 0 0 0 x x 0 0 0 0 x x The applicability of Theorem 7.4.3 (the implicit Q theorem) follows from the observation that Pke1 = e1 for k = l:n - 2 and that Po and Z have the same first column. Hence, Z1e1 = Zei, and we can assert that Z1 essentially equals Z provided that the upper Hessenberg matrices zrHZ and Z[HZ1 are each unreduced.
  • 414. 390 Chapter 7. Unsymmetric Eigenvalue Problems The implicit determination of H2 from H outlined above was first described by Francis (1961) and we refer to it as a Francis QR step. The complete Francis step is summarized as follows: Algorithm 7.5.1 (Francis QR step) Given the unreduced upper Hessenberg matrix H E Rnxn whose trailing 2-by-2 principal submatrix has eigenvalues ai and a2, this algorithm overwrites H with zTHZ, where Z is a product of Householder matrices and zT(H - a1I)(H - a2I) is upper triangular. m = n - 1 {Compute first column of (H - ail)(H - a2I)} s = H(m, m) + H(n, n) t = H(m, m) ·H(n, n) - H(m, n) ·H(n, m) x = H(l, l) ·H(l, 1) + H(l, 2) ·H(2, 1) - s·H(l, 1) + t y = H(2, l) · (H(l, 1) + H(2, 2) - s) z = H(2, l)·H(3, 2) for k = O:n - 3 end [v, .BJ = house([x y zjT) q = max{l, k}. H(k + l:k + 3, q:n) = (I - ,BvvT) ·H(k + l:k + 3, q:n) r = min{k + 4, n} H(l:r, k + l:k + 3) = H(l:r, k + l:k + 3) · (! - ,BvvT) x = H(k + 2, k + 1) y = H(k + 3, k + 1) if k < n - 3 z = H(k + 4, k + l) end [v, .BJ = house([ x y JT) H(n - l:n, n - 2:n) = (I - .BvvT) ·H(n - l:n, n - 2:n) H(l:n, n - l:n) = H(l:n, n - l:n) · (I - .BvvT) This algorithm requires 10n2 flops. If Z is accumulated into a given orthogonal matrix, an additional 10n2 flops are necessary. 7.5.6 The Overall Process Reduction of A to Hessenberg form using Algorithm 7.4.2 and then iteration with Algorithm 7.5.1 to produce the real Schur form is the standard means by which the dense unsymmetric eigenproblem is solved. During the iteration it is necessary to monitor the subdiagonal elements in H in order to spot any possible decoupling. How this is done is illustrated in the following algorithm:
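A 0-based NumPy transcription of Algorithm 7.5.1 is given below; the reflector helper is a simplified stand-in for house, and no attempt is made to exploit the limited scope of each update. Up to the sign ambiguities permitted by the implicit Q theorem, one such step produces the same H2 as the explicit double-shift computation of §7.5.4.

```python
import numpy as np

def reflector(x):
    """Unit vector w (or w = 0) such that (I - 2 w w^T) x is a multiple of e_1."""
    v = np.array(x, dtype=float)
    v[0] += np.copysign(np.linalg.norm(x), x[0] if x[0] != 0 else 1.0)
    nrm = np.linalg.norm(v)
    return v / nrm if nrm > 0 else v

def francis_step(H):
    """One implicit double-shift (Francis) QR step on an unreduced upper
    Hessenberg matrix (0-based transcription of Algorithm 7.5.1, n >= 3)."""
    H = np.array(H, dtype=float)
    n = H.shape[0]
    s = H[n-2, n-2] + H[n-1, n-1]
    t = H[n-2, n-2]*H[n-1, n-1] - H[n-2, n-1]*H[n-1, n-2]
    # First column of M = (H - a1*I)(H - a2*I) = H^2 - sH + tI:
    x = H[0, 0]*H[0, 0] + H[0, 1]*H[1, 0] - s*H[0, 0] + t
    y = H[1, 0]*(H[0, 0] + H[1, 1] - s)
    z = H[1, 0]*H[2, 1]
    for k in range(n - 2):                       # chase the bulge down the matrix
        w = reflector(np.array([x, y, z]))
        P = np.eye(3) - 2.0*np.outer(w, w)
        q = max(0, k - 1)
        H[k:k+3, q:] = P @ H[k:k+3, q:]
        r = min(k + 4, n)
        H[:r, k:k+3] = H[:r, k:k+3] @ P
        x, y = H[k+1, k], H[k+2, k]
        if k < n - 3:
            z = H[k+3, k]
    w = reflector(np.array([x, y]))              # final 2-by-2 Householder
    P = np.eye(2) - 2.0*np.outer(w, w)
    H[n-2:, n-3:] = P @ H[n-2:, n-3:]
    H[:, n-2:] = H[:, n-2:] @ P
    return H
```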
  • 415. 7.5. The Practical QR Algorithm 391 Algorithm 7.5.2 (QR Algorithm) Given A E IRnxn and a tolerance tol greater than the unit roundoff, this algorithm computes the real Schur canonical form QTAQ = T. If Q and T are desired, then T is stored in H. If only the eigenvalues are desired, then diagonal blocks in T are stored in the corresponding positions in H. Use Algorithm 7.4.2 to compute the Hessenberg reduction H = UJ'AUo where Uo=P1 · · · Pn-2· If Q is desired form Q = P1 · · · Pn-2· (See §5.1.6.) until q = n Set to zero all subdiagonal elements that satisfy: Jhi,i-1! � tol·(Jhiil + Jhi-l,i-11). Find the largest nonnegative q and the smallest non-negative p such that end H12 H13 p H [ T H22 H23 l n-p-q 0 H33 q p n- p-q q where H33 is upper quasi-triangular and H22 is unreduced. if q < n Perform a Francis QR step on H22: H22 = zTH22Z. if Q is required Q = Q · diag(Jp, Z, Iq) H12 = H12Z end . end H23 = zTH23 Upper triangularize all 2-by-2 diagonal blocks in H that have real eigenvalues and accumulate the transformations (if necessary). This algorithm requires 25n3 flops if Q and T are computed. If only the eigenvalues are desired, then 10n3 flops are necessary. These flops counts are very approximate and are based on the empirical observation that on average only two Francis iterations are required before the lower 1-by-1 or 2-by-2 decouples. The roundoff properties of the QR algorithm are what one would expect of any orthogonal matrix technique. The computed real Schur form T is orthogonally similar to a matrix near to A, i.e., QT(A + E)Q = T where QTQ = I and II E 112 � ull A 112. The computed Q is almost orthogonal in the sense that QTQ = I + F where II F 112 � u. The order of the eigenvalues along T is somewhat arbitrary. But as we discuss in §7.6, any ordering can be achieved by using a simple procedure for swapping two adjacent diagonal entries.
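The decoupling test used in Algorithm 7.5.2 is simple to express in code; a sketch with an illustrative name:

```python
import numpy as np

def zero_small_subdiagonals(H, tol):
    """Set h_{i,i-1} to zero whenever |h_{i,i-1}| <= tol*(|h_{i,i}| + |h_{i-1,i-1}|),
    the decoupling test of Algorithm 7.5.2."""
    for i in range(1, H.shape[0]):
        if abs(H[i, i-1]) <= tol*(abs(H[i, i]) + abs(H[i-1, i-1])):
            H[i, i-1] = 0.0
    return H
```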
  • 416. 392 7.5.7 Balancing Chapter 7. Unsymmetric Eigenvalue Problems Finally, we mention that if the elements of A have widely v-ctrying magnitudes, then A should be balanced before applying the QR algorithm. This is an O(n2) calculation in which a diagonal matrix D is computed so that if v-1AD � ( c, l · · · ( c,. ] � [:rl then II ri lloo � 11 Ci lloo for i = l:n. The diagonal matrix D is chosen to have the form D = diag(,Bi1 ' . . . ' ,Bin) where .B is the floating point base. Note that D-1AD can be calculated without roundoff. When A is balanced, the computed eigenvalues are usually more accurate although there are exceptions. See Parlett and Reinsch (1969) and Watkins(2006). Problems P7.5.1 Show that if fl = QTHQ is obtained by performing a single-shift QR step with then lh2d :5 jy 2xl/((w - z) 2 + y2]. H = [ w x ] y z ' P7.5.2 Given A E R2X2, show how to compute a diagonal D E R2X2 so that II v-1AD ll F is minimized. P7.5.3 Explain how the single-shift QR step H -µI = UR, fl = RU+µI can be carried out implicitly. That is, show how the transition from H to fl can be carried out without subtracting the shift µ from the diagonal of H. P7.5.4 Suppose H is upper Hessenberg and that we compute the factorization PH = LU via Gaussian elimination with partial pivoting. (See Algorithm 4.3.4.) Show that Hi = U(PTL) is upper Hessenberg and similar to H. (This is the basis of the modified LR algorithm.) P7.5.5 Show that if H = Ho is given and we generate the matrices Hk via Hk - µkl = UkRk , Hk+l = RkUk + µkl, then (U1 · · · Uj)(Ri · · · R1 ) = (H - µi i) · · · (H - µjl). Notes and References for §7.5 Historically important papers associated with the QR iteration include: H. Rutishauser (1958). "Solution of Eigenvalue Problems with the LR Transformation," Nat. Bur. Stand. App. Math. Ser. 49, 47·-81. J.G.F. Francis (1961). "The QR Transformation: A Unitary Analogue to the LR Transformation, Parts I and II" Comput. J. 4, 265-72, 332-345. V.N. Kublanovskaya (1961). "On Some Algorithms for the Solution of the Complete Eigenvalue Problem," Vychisl. Mat. Mat. Fiz 1(4), 555-570. R.S. Martin and J.H. Wilkinson (1968). "The Modified LR Algorithm for Complex Hessenberg Ma­ trices," Numer. Math. 12, 369-376. R.S. Martin, G. Peters, and J.H. Wilkinson (1970). "The QRAlgorithm for Real Hessenberg Matrices," Numer. Math. 14, 219-231. For a general insight, we r_ecommend: D.S. Watkins (1982). "Understanding the QR Algorithm," SIAM Review 24, 427-440. D.S. Watkins (1993). "Some Perspectives on the Eigenvalue Problem," SIAM Review 35, 430-471.
  • 417. 7.5. The Practical QR Algorithm D.S. Watkins (2008}. ''The QR Algorithm Revisited," SIAM Review 50, 133-145. D.S. Watkins (2011}. "Francis's Algorithm," Amer. Math. Monthly 118, 387-403. 393 Papers concerned with the convergence of the method, shifting, deflation, and related matters include: P.A. Businger (1971}. "Numerically Stable Deflation of Hessenberg and Symmetric Tridiagonal Ma­ trices, BIT 11, 262-270. D.S. Watkins and L. Elsner (1991). "Chasing Algorithms for the Eigenvalue Problem," SIAM J. Matrix Anal. Applic. 12, 374-384. D.S. Watkins and L. Elsner (1991}. "Convergence of Algorithms of Decomposition Type for the Eigenvalue Problem," Lin. Alg. Applic. 149, 19-47. J. Erxiong (1992). "A Note on the Double-Shift QL Algorithm," Lin. Alg. Applic. 1 71, 121-132. A.A. Dubrullc and G.H. Golub (1994). "A Multishift QR Iteration Without Computation of the Shifts," Nu.mer. Algorithms 1, 173--181. D.S. Watkins (1996}. "Forward Stability and Transmission of Shifts in the QR Algorithm," SIAM J. Matrix Anal. Applic. 16, 469-487. D.S. Watkins (1996}. "The Transmission of Shifts and Shift Blurring in the QR algorithm," Lin. Alg. Applic. 241-9, 877-896. D.S. Watkins (1998}. "Bulge Exchanges in Algorithms of QR Type," SIAM J. Matrix Anal. Applic. 19, 1074-1096. R. Vandebril (2011}. "Chasing Bulges or Rotations? A Metamorphosis of the QR-Algorithm" SIAM. J. Matrix Anal. Applic. 92, 217-247. Aspects of the balancing problem are discussed in: E.E. Osborne (1960}. "On Preconditioning of Matrices," J. ACM 7, 338-345. B.N. Parlett and C. Reinsch (1969). "Balancing a Matrix for Calculation of Eigenvalues and Eigen­ vectors," Nu.mer. Math. 1.<J, 292-304. D.S. Watkins (2006}. "A Case Where Balancing is Harmful," ETNA 29, 1-4. Versions of the algorithm that are suitable for companion matrices are discussed in: D.A. Bini, F. Daddi, and L. Gemignani (2004). "On the Shifted QR iteration Applied to Companion Matrices," ETNA 18, 137-152. M. Van Barcl, R. Vandebril, P. Van Dooren, and K. Frederix (2010). "Implicit Double Shift QR­ Algorithm for Companion Matrices," Nu.mer. Math. 116, 177-212. Papers that arc concerned with the high-performance implementation of the QR iteration include: Z. Bai and J.W. Demmel (1989). "On a Block Implementation of Hessenberg Multishift QR Iteration," Int. J. High Speed Comput. 1, 97-112. R.A. Van De Geijn (1993). "Deferred Shifting Schemes for Parallel QR Methods," SIAM J. Matrix Anal. Applic. 14, 180-194. D.S. Watkins (1994). "Shifting Strategies for the Parallel QR Algorithm," SIAM J. Sci. Comput. 15, 953-958. G. Henry and R. van de Geijn (1996). "Parallelizing the QR Algorithm for the Unsymmetric Algebraic Eigenvalue Problem: Myths and Reality," SIAM J. Sci. Comput. 11, 870-883. Z. Bai, J. Demmel, .J. Dongarra, A. Petitet, H. Robinson, and K. Stanley (1997). "The Spectral Decomposition of Nonsymmetric Matrices on Distributed Memory Parallel Computers," SIAM J. Sci. Comput. 18, 1446--1461. G. Henry, D.S. Watkins, and J. Dongarra (2002). "A Parallel Implementation of the Nonsymmetric QR Algorithm for Distributed Memory Architectures," SIAM J. Sci. Comput. 24, 284-311. K. Braman, R. Byers, and R. Mathias (2002). "The Multishift QR Algorithm. Part I: Maintaining Well-Focused Shifts and Level 3 Performance," SIAM J. Matrix Anal. Applic. 29, 929-947. K. Braman, R. Byers, and R. Mathias (2002). "The Multishift QR Algorithm. Part II: Aggressive Early Deflation," SIAM J. Matrix Anal. Applic. 29, 948-973. M.R. Fahey (2003). 
"Algorithm 826: A Parallel Eigenvalue Routine for Complex Hessenberg Matri­ ces," ACM TI-ans. Math. Softw. 29, 326- 336. D. Kressner (2005}. "On the Use of Larger Bulges in the QR Algorithm," ETNA 20, 50-63. D. Kressner (2008}. "The Effect of Aggressive Early Deflation on the Convergence of the QR Algo­ rithm," SIAM J. Matri'C Anal. Applic. 90, 805-821.
394 Chapter 7. Unsymmetric Eigenvalue Problems

7.6 Invariant Subspace Computations

Several important invariant subspace problems can be solved once the real Schur decomposition Q^T A Q = T has been computed. In this section we discuss how to

• compute the eigenvectors associated with some subset of λ(A),
• compute an orthonormal basis for a given invariant subspace,
• block-diagonalize A using well-conditioned similarity transformations,
• compute a basis of eigenvectors regardless of their condition, and
• compute an approximate Jordan canonical form of A.

Eigenvector/invariant subspace computation for sparse matrices is discussed in §7.3.1 and §7.3.2 as well as portions of Chapters 8 and 10.

7.6.1 Selected Eigenvectors via Inverse Iteration

Let q^(0) ∈ ℝ^n be a given unit 2-norm vector and assume that A − μI ∈ ℝ^{n×n} is nonsingular. The following is referred to as inverse iteration:

    for k = 1, 2, ...
        Solve (A − μI) z^(k) = q^(k-1)
        q^(k) = z^(k) / ||z^(k)||_2                                  (7.6.1)
        λ^(k) = [q^(k)]^T A q^(k)
    end

Inverse iteration is just the power method applied to (A − μI)^{-1}.

To analyze the behavior of (7.6.1), assume that A has a basis of eigenvectors {x_1, ..., x_n} and that A x_i = λ_i x_i for i = 1:n. If

    q^(0) = Σ_{i=1}^{n} β_i x_i,

then q^(k) is a unit vector in the direction of

    (A − μI)^{-k} q^(0) = Σ_{i=1}^{n} β_i / (λ_i − μ)^k · x_i.

Clearly, if μ is much closer to an eigenvalue λ_j than to the other eigenvalues, then q^(k) is rich in the direction of x_j provided β_j ≠ 0. A sample stopping criterion for (7.6.1) might be to quit as soon as the residual r^(k) = (A − μI) q^(k) satisfies

    ||r^(k)||_∞ ≤ c·u·||A||_∞                                        (7.6.2)
7.6. Invariant Subspace Computations 395

where c is a constant of order unity. Since

    (A + E_k) q^(k) = μ q^(k)    with    E_k = −r^(k) [q^(k)]^T,

it follows that (7.6.2) forces μ and q^(k) to be an exact eigenpair for a nearby matrix.

Inverse iteration can be used in conjunction with Hessenberg reduction and the QR algorithm as follows:

Step 1. Compute the Hessenberg decomposition U0^T A U0 = H.
Step 2. Apply the double-implicit-shift Francis iteration to H without accumulating transformations.
Step 3. For each computed eigenvalue λ̂ whose corresponding eigenvector x is sought, apply (7.6.1) with A = H and μ = λ̂ to produce a vector z such that Hz ≈ λ̂z.
Step 4. Set x = U0 z.

Inverse iteration with H is very economical because (1) we do not have to accumulate transformations during the double Francis iteration, (2) we can factor matrices of the form H − λ̂I in O(n^2) flops, and (3) only one iteration is typically required to produce an adequate approximate eigenvector.

This last point is perhaps the most interesting aspect of inverse iteration and requires some justification since λ̂ can be comparatively inaccurate if it is ill-conditioned. Assume for simplicity that λ̂ is real and let

    H − λ̂I = Σ_{i=1}^{n} σ_i u_i v_i^T = UΣV^T

be the SVD of H − λ̂I. From what we said about the roundoff properties of the QR algorithm in §7.5.6, there exists a matrix E ∈ ℝ^{n×n} such that H + E − λ̂I is singular and ||E||_2 ≈ u·||H||_2. It follows that σ_n ≈ u·σ_1 and

    ||(H − λ̂I) v_n||_2 ≈ u·σ_1,

i.e., v_n is a good approximate eigenvector. Clearly, if the starting vector q^(0) has the expansion

    q^(0) = Σ_{i=1}^{n} γ_i u_i,

then

    z^(1) = (H − λ̂I)^{-1} q^(0) = Σ_{i=1}^{n} (γ_i/σ_i) v_i

is "rich" in the direction v_n. Note that if s(λ̂) ≈ |u_n^T v_n| is small, then z^(1) is rather deficient in the direction u_n. This explains (heuristically) why another step of inverse iteration is not likely to produce an improved eigenvector approximation, especially if λ̂ is ill-conditioned. For more details, see Peters and Wilkinson (1979).
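For concreteness, here is a minimal sketch (not from the text) of Steps 1-4 for a single real eigenvalue, assuming SciPy's Hessenberg reduction; the constructed test matrix and the slightly perturbed eigenvalue estimate mu stand in for the output of the Francis iteration.

    import numpy as np
    from scipy.linalg import hessenberg, lu_factor, lu_solve

    rng = np.random.default_rng(1)
    n = 8
    X = rng.standard_normal((n, n))
    A = X @ np.diag(np.arange(1.0, n + 1)) @ np.linalg.inv(X)   # eigenvalues 1, ..., n

    H, U0 = hessenberg(A, calc_q=True)        # Step 1: U0.T @ A @ U0 = H
    mu = 5.0 + 1e-8                           # Step 2 stand-in: a computed eigenvalue estimate

    # Step 3: one step of inverse iteration with H - mu*I.  A structure-exploiting
    # factorization would cost O(n^2); a dense LU is used here for brevity.
    q = rng.standard_normal(n)
    z = lu_solve(lu_factor(H - mu * np.eye(n)), q / np.linalg.norm(q))
    z /= np.linalg.norm(z)

    x = U0 @ z                                # Step 4: approximate eigenvector of A
    print(np.linalg.norm(A @ x - mu * x, np.inf) / np.linalg.norm(A, np.inf))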
  • 420. 396 7.6.2 Chapter 7. Unsymmetric Eigenvalue Problems Ordering Eigenvalues in the Real Schur Form Recall that the real Schur decomposition provides information about invariant sub­ spaces. If and [T 0 11 �2 12 2 ]P q QTAQ = T = .L ' p q then the first p columns of Q span the unique invariant subspace associated with A(Tn). (See §7.1.4.) Unfortunately, the Francis iteration supplies us with a real Schur decomposition Q'{:AQF = TF in which the eigenvalues appear somewhat randomly along the diagonal of TF. This poses a problem if we want an orthonormal basis for an invariant subspace whose associated eigenvalues are not at the top of TF's diagonal. Clearly, we need a method for computing an orthogonal matrix Qv such that Q�TFQv is upper quasi-triangular with appropriate eigenvalue ordering. A look at the 2-by-2 case suggests how this can be accomplished. Suppose and that we wish to reverse the order of the eigenvalues. Note that TFx = A2X where x = [A2t�1 l· Let Qv be a Givens rotation such that the second component of Q�xis zero. If then (QTAQ)e1 = Q�TF(Qve1) = A2Q�(Qve1) = A2e1. The matrices A and QTAQ have the same Frobenius norm and so it follows that the latter must have the following form: Q AQ = . T [A2 ±t12 l 0 A1 The swapping gets a little more complicated if T has 2-by-2 blocks along its diagonal. See Ruhe (1970) and Stewart (1976) for details. By systematically interchanging adjacent pairs of eigenvalues (or 2-by-2 blocks), we can move any subset of A(A)to the top ofT's diagonal. Here is the overall procedure for the case when there are no 2-by-2 bumps:
  • 421. 7.6. Invariant Subspace Computations 397 Algorithm 7.6.1 Given an orthogonal matrix Q E 1Rnxn, an upper triangular matrix T = QTAQ, and a subset tl = {A1, . . . , Ap} of A(A), the following algorithm computes an orthogonal matrix QD such that Q�TQv = S is upper triangular and {s11 , . . . , Spp} = /:l. The matrices Q and T are overwritten by QQv and S, respectively. while {tn , . . . , tpp} -f. tl for k = l:n - 1 end end if tkk ¢ tl and tk+l,k+l E tl end [ c, s ] = givens(T(k, k + 1), T(k + 1, k + 1) - T(k, k)) T(k:k + 1, k:n) = [ c 8 ]T T(k:k + 1, k:n) -s c T(l:k + 1, k:k + 1) = T(l:k + 1, k:k + 1) [ - � � ] Q(l:n, k:k + 1) = Q(l:n, k:k + 1) [ - � � ] This algorithm requires k(12n) flops, where k is the total number of required swaps. The integer k is never greater than (n - p)p. Computation of invariant subspaces by manipulating the real Schur decomposi­ tion is extremely stable. If Q = [ q1 I · · · I Qn ] denotes the computed orthogonal matrix Q, then II QTQ - I 112 � u and there exists a matrix E satisfying II E 112 � ull A 112 such that (A + E)qi E span{q1, • . . , tJv} for i = l:p. 7.6.3 Block Diagonalization Let [Tf' T12 . . . T,, l n1 T22 . . . T2q n2 T (7.6.3) 0 Tqq nq n1 n2 nq be a partitioning of some real Schur canonical form QTAQ = T E 1Rn xn such that A(Tn), . . . , A(Tqq) are disjoint. By Theorem 7.1.6 there exists a matrix Y such that y-iyy = diag(Tn , . . . , Tqq)· A practical procedure for determining Y is now given together with an analysis of Y's sensitivity as a function of the above partitioning. Partition In = [E1 I · · · IEq] conformably with T and define the matrix Yij E 1Rnxn as follows:
398 Chapter 7. Unsymmetric Eigenvalue Problems

In other words, Y_ij looks just like the identity except that Z_ij occupies the (i,j) block position. It follows that if Y_ij^{-1} T Y_ij = T̃ = (T̃_ij), then T and T̃ are identical except that

    T̃_ij = T_ii Z_ij − Z_ij T_jj + T_ij,
    T̃_ik = T_ik − Z_ij T_jk,        k = j+1:q,
    T̃_kj = T_ki Z_ij + T_kj,        k = 1:i−1.

Thus, T_ij can be zeroed provided we have an algorithm for solving the Sylvester equation

    FZ − ZG = C                                                    (7.6.4)

where F ∈ ℝ^{p×p} and G ∈ ℝ^{r×r} are given upper quasi-triangular matrices and C ∈ ℝ^{p×r}. Bartels and Stewart (1972) have devised a method for doing this. Let C = [c_1 | ··· | c_r] and Z = [z_1 | ··· | z_r] be column partitionings. If g_{k+1,k} = 0, then by comparing columns in (7.6.4) we find

    F z_k − Σ_{i=1}^{k} g_{ik} z_i = c_k.

Thus, once we know z_1, ..., z_{k−1}, then we can solve the quasi-triangular system

    (F − g_{kk} I) z_k = c_k + Σ_{i=1}^{k−1} g_{ik} z_i

for z_k. If g_{k+1,k} ≠ 0, then z_k and z_{k+1} can be simultaneously found by solving the 2p-by-2p system

    [ F − g_{kk} I     −g_{mk} I   ] [ z_k ]   =   [ c_k + Σ_{i=1}^{k−1} g_{ik} z_i ]
    [   −g_{km} I    F − g_{mm} I  ] [ z_m ]       [ c_m + Σ_{i=1}^{k−1} g_{im} z_i ]        (7.6.5)

where m = k + 1. By reordering the equations according to the perfect shuffle permutation (1, p+1, 2, p+2, ..., p, 2p), a banded system is obtained that can be solved in O(p^2) flops. The details may be found in Bartels and Stewart (1972). Here is the overall process for the case when F and G are each triangular.

Algorithm 7.6.2 (Bartels-Stewart Algorithm) Given C ∈ ℝ^{p×r} and upper triangular matrices F ∈ ℝ^{p×p} and G ∈ ℝ^{r×r} that satisfy λ(F) ∩ λ(G) = ∅, the following algorithm overwrites C with the solution to the equation FZ − ZG = C.

    for k = 1:r
        C(1:p, k) = C(1:p, k) + C(1:p, 1:k−1)·G(1:k−1, k)
        Solve (F − G(k,k)I) z = C(1:p, k) for z.
        C(1:p, k) = z
    end

This algorithm requires pr(p + r) flops. By zeroing the superdiagonal blocks in T in the appropriate order, the entire matrix can be reduced to block diagonal form.
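As a quick numerical check (not from the text), SciPy's solve_sylvester, which is built on the same Schur-decomposition approach as the Bartels-Stewart algorithm, can be applied to the equation FZ − ZG = C by passing −G as its second argument; the triangular test matrices below are arbitrary, with their diagonals shifted apart so that λ(F) ∩ λ(G) = ∅.

    import numpy as np
    from scipy.linalg import solve_sylvester

    rng = np.random.default_rng(2)
    p, r = 5, 4
    F = np.triu(rng.standard_normal((p, p))) + 2.0 * np.eye(p)   # upper triangular
    G = np.triu(rng.standard_normal((r, r))) - 2.0 * np.eye(r)   # spectrum shifted away from F's
    C = rng.standard_normal((p, r))

    # solve_sylvester(A, B, Q) solves AX + XB = Q, so F Z + Z(-G) = C gives FZ - ZG = C.
    Z = solve_sylvester(F, -G, C)
    print(np.linalg.norm(F @ Z - Z @ G - C))    # residual near machine precision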
7.6. Invariant Subspace Computations 399

Algorithm 7.6.3 Given an orthogonal matrix Q ∈ ℝ^{n×n}, an upper quasi-triangular matrix T = Q^T A Q, and the partitioning (7.6.3), the following algorithm overwrites Q with QY where Y^{-1} T Y = diag(T_11, ..., T_qq).

    for j = 2:q
        for i = 1:j−1
            Solve T_ii Z − Z T_jj = −T_ij for Z using the Bartels-Stewart algorithm.
            for k = j+1:q
                T_ik = T_ik − Z T_jk
            end
            for k = 1:q
                Q_kj = Q_ki Z + Q_kj
            end
        end
    end

The number of flops required by this algorithm is a complicated function of the block sizes in (7.6.3).

The choice of the real Schur form T and its partitioning in (7.6.3) determines the sensitivity of the Sylvester equations that must be solved in Algorithm 7.6.3. This in turn affects the condition of the matrix Y and the overall usefulness of the block diagonalization. The reason for these dependencies is that the relative error of the computed solution Ẑ to

    T_ii Z − Z T_jj = −T_ij                                        (7.6.6)

satisfies

    ||Ẑ − Z||_F / ||Z||_F ≈ u·||T||_F / sep(T_ii, T_jj).

For details, see Golub, Nash, and Van Loan (1979). Since

    sep(T_ii, T_jj) = min_{X ≠ 0} ||T_ii X − X T_jj||_F / ||X||_F ≤ min_{λ ∈ λ(T_ii), μ ∈ λ(T_jj)} |λ − μ|,

there can be a substantial loss of accuracy whenever the subsets λ(T_ii) are insufficiently separated. Moreover, if Z satisfies (7.6.6), then

    ||Z||_F ≤ ||T_ij||_F / sep(T_ii, T_jj).

Thus, large-norm solutions can be expected if sep(T_ii, T_jj) is small. This tends to make the matrix Y in Algorithm 7.6.3 ill-conditioned since it is the product of the matrices

    Y_ij = [ I_{n_i}   Z    ]
           [   0    I_{n_j} ].
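The role of sep(·,·) can be made concrete with a small computation (not from the text): column-stacking the map X ↦ T_ii X − X T_jj gives the matrix kron(I, T_ii) − kron(T_jj^T, I), whose smallest singular value equals sep(T_ii, T_jj). In the 1-by-1 example below the blocks are hypothetical, and the growth of the Z block as the eigenvalue gap shrinks is exactly the ill-conditioning described above.

    import numpy as np

    def sep(T11, T22):
        # Smallest singular value of the Sylvester operator X -> T11 X - X T22.
        p, r = T11.shape[0], T22.shape[0]
        K = np.kron(np.eye(r), T11) - np.kron(T22.T, np.eye(p))
        return np.linalg.svd(K, compute_uv=False)[-1]

    for gap in (1.0, 1e-4, 1e-8):
        T11 = np.array([[1.0]])
        T22 = np.array([[1.0 + gap]])
        T12 = np.array([[1.0]])
        # Scalar Sylvester equation T11*Z - Z*T22 = -T12.
        Z = -T12 / (T11[0, 0] - T22[0, 0])
        print(gap, sep(T11, T22), abs(Z[0, 0]))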
  • 424. 400 Chapter 7. Unsymmetric Eigenvalue Problems Confronted with these difficulties, Bavely and Stewart (1979) develop an algo­ rithm for block diagonalizing that dynamically determines the eigenvalue ordering and partitioning in (7.6.3) so that all the Z matrices in Algorithm 7.6.3 are bounded in norm by some user-supplied tolerance. Their research suggests that the condition of Y can be controlled by controlling the condition of the Yij. 7.6.4 Eigenvector Bases If the blocks in the partitioning (7.6.3) are all l-by-1, then Algorithm 7.6.3 produces a basis ofeigenvectors. As with the method of inverse iteration, the computed eigenvalue­ eigenvector pairs are exact for some "nearby" matrix. A widely followed rule of thumb for deciding upon a suitable eigenvector method is to use inverse iteration whenever fewer than 25% of the eigenvectors are desired. We point out, however, that the real Schur form can be used to determine selected eigenvectors. Suppose k- 1 u >. 0 k- 1 n-k n- k is upper quasi-triangular and that >. (j. >.(T11) U >.(T33). It follows that if we solve the linear systems (T11 - >.I)w = -u and (T33 - >.J)Tz = -v then are the associated right and left eigenvectors, respectively. Note that the condition of >. is prescribed by l/s(>.) = .j(l + wTw)(l + zTz). 7.6.5 Ascertaining Jordan Block Structures Suppose that we have computed the real Schur decomposition A = QTQT, identified clusters of "equal" eigenvalues, and calculated the corresponding block diagonalization T = Y·diag(T11, . . . , Tqq)Y-1 . As we have seen, this can be a formidable task. However, even greater numerical problems confront us ifwe attempt to ascertain the Jordan block structure of each Tii· A brief examination of these difficulties will serve to highlight the limitations of the Jordan decomposition. Assume for clarity that >.(Tii) is real. The reduction of Tii to Jordan form begins by replacing it with a matrix of the form C = >.I + N, where N is the strictly upper triangular portion of Tii and where >., say, is the mean of its eigenvalues. Recall that the dimension of a Jordan block J(>.) is the smallest nonnegative integer k for which [J(>.) - >.J]k = 0. Thus, if Pi = dim[null(Ni)J, for i = O:n, then Pi - Pi-l equals the number of blocks in C's Jordan form that have dimension i or
  • 425. 7.6. Invariant Subspace Computations 401 greater. A concrete example helps to make this assertion clear and to illustrate the role of the SVD in Jordan form computations. Assume that c is 7-by-7. Suppose WC compute the SVD urNVi = E1 and "discover" that N has rank 3. If we order the singular values from small to large then it follows that the matrix Ni = VtNVi has the form At this point, we know that the geometric multiplicity of .A is 4-i.e, C's Jordan form has four blocks (P1 - Po = 4 - 0 = 4). Now suppose UiLV2 = E2 is the SVD of L and that we find that L has unit rank. Ifwe again order the singular values from small to large, then L2 = V{LV2 clearly has the following structure: L, � [H �] However, .X(L2) = .X(L) = {O, 0, O} and so c = 0. Thus, if V2 = diag(h V2) then N2 = V{N1Vi has the following form: 0 0 0 0 x x x 0 0 0 0 x x x 0 0 0 0 x x x N2 0 0 0 0 x x x 0 0 0 0 0 0 a 0 0 0 0 0 0 b 0 0 0 0 0 0 0 Besides allowing us to introduce more zeros into the upper triangle, the SVD of L also enables us to deduce the dimension of the nullspace of N2• Since N2 = [0 KL l [0 K l [0 K l 1 O L2 0 L 0 L and [ � ] has full column rank, p2 = dim(null(N2)) = dim(null(Nf)) = 4 + dim(null(L)) = P1 + 2. Hence, we can conclude at this stage that the Jordan form of C has at least two blocks of dimension 2 or greater. Finally, it is easy to see that Nf = 0, from which we conclude that there is p3 -p2 == 7 - 6 = 1 block of dimension 3 or larger. If we define V = ViV2 then it follows that
  • 426. 402 Chapter 7. Unsymmetric Eigenvalue Problems the decomposition .X 0 0 0 x x x }four blocks of o.-d& 1 o• ia.g., 0 .X 0 0 x x x 0 0 .X 0 x x x vrcv = 0 0 0 .X x x x 0 0 0 0 .X x a } two blocks of order 2 or larger 0 0 0 0 0 .X 0 0 0 0 0 0 0 .X } one block of order 3 or larger displays C's Jordan block structure: two blocks of order 1, one block of order 2, and one block of order 3. To compute the Jordan decomposition it is necessary to resort to nonorthogonal transformations. We refer the reader to Golub and Wilkinson (1976), Kagstrom and Ruhe (1980a, 1980b), and Demmel (1983) for more details. The above calculations with the SYD amply illustrate that difficult rank decisions must be made at each stage and that the final computed block structure depends critically on those decisions. Problems P7.6.1 Give a complete algorithm for solving a real, n-by-n, upper quasi-triangular system Tx = b. P7.6.2 Suppose u-1AU = diag(a1, . . . , am ) and v-1BV = diag(t3i , . . . , ,Bn)· Show that if l/>(X) = AX - XB, then >..(</>) = { ai - .B; : i = l:m, j = l:n }. What are the corresponding eigenvectors? How can these facts be used to solve AX - XB = C? P7.6.3 Show that if Z E �pxq and y = [ I� � ] ' then 1t2(Y) = (2 + u2 + v'4u2 + u4 ]/2 where u = II Z 112. P7.6.4 Derive the system (7.6.5). P7.6.5 Assume that T E Rnxn is block upper triangular and partitioned as follows: T E Rnxn . Suppose that the diagonal block T22 is 2-by-2 with complex eigenvalues that are disjoint from >..(Tu) and >..(Taa). Give an algorithm for computing the 2-dimensional real invariant subspace associated with T22's eigenvalues. P7.6.6 Suppose H E Rnxn is upper Hessenberg with a complex eigenvalue >..+i · µ. How could inverse iteration be used to compute x, y E Rn so that H(x + iy) = (>.. + iµ)(x + iy)? Hint: Compare real and imaginary parts in this equation and obtain a 2n-by-2n real system. Notes and References for §7.6 Much of the material discussed in this section may be found in the following survey paper: G.H. Golub and J.H. Wilkinson (1976). "Ill-Conditioned Eigensystems and the Computation of the Jordan Canonical Form," SIAM Review 18, 578-619. The problem of ordering the eigenvalues in the real Schur form is the subject of:
  • 427. 7.6. Invariant Subspace Computations 403 A. Rube (1970). "An Algorithm for Numerical Determination of the Structure of a General Matrix," BIT 10, 196-216. G.W. Stewart (1976). "Algorithm 406: HQR3 and EXCHNG: Fortran Subroutines for Calculating and Ordering the Eigenvalues of a Real Upper Hessenberg Matrix,'' ACM Trans. Math. Softw. 2, 275-280. J.J. Dongarra, S. Hammarling, and J.H. Wilkinson (1992). "Numerical Considerations in Computing Invariant Subspaces," SIAM J. Matrix Anal. Applic. 13, 145-161. z. Bai and J.W. Demmel (1993). "On Swapping Diagonal Blocks in Real Schur Form," Lin. Alg. Applic. 186, 73-95 Procedures for block diagonalization including the Jordan form are described in: C. Bavely and G.W. Stewart (1979). "An Algorithm for Computing Reducing Subspaces by Block Diagonalization,'' SIAM J. Numer. Anal. 1 6, 359-367. B. Kagstrom and A. Rube (1980a). "An Algorithm for Numerical Computation of the Jordan Normal Form of a Complex Matrix,'' ACM Trans. Math. Softw. 6, 398-419. B. Kagstrom and A. Rube (1980b). "Algorithm 560 JNF: An Algorithm for Numerical Computation of the Jordan Normal Form of a Complex Matrix,'' ACM Trans. Math. Softw. 6, 437-443. J.W. Demmel (1983). "A Numerical Analyst's Jordan Canonical Form," PhD Thesis, Berkeley. N. Ghosh, W.W. Hager, and P. Sarmah (1997). "The Application of Eigenpair Stability to Block Diagonalization," SIAM J. Numer. Anal. 34, 1255-1268. S. Serra-Capizzano, D. Bertaccini, and G.H. Golub (2005). "How to Deduce a Proper Eigenvalue Cluster from a Proper Singular Value Cluster in the Nonnormal Case," SIAM J. Matrix Anal. Applic. 27, 82-86. Before we offer pointers to the literature associated with invariant subspace computation, we remind the reader that in §7.3 we discussed the power method for computing the dominant eigenpair and the method of orthogonal iteration that can be used to compute dominant invariant subspaces. Inverse iteration is a related idea and is the concern of the following papers: J. Varah (1968). "The Calculation of the Eigenvectors of a General Complex Matrix by Inverse Iteration," Math. Comput. 22, 785-791. J. Varah (1970). "Computing Invariant Subspaces of a General Matrix When the Eigensystem is Poorly Determined," Math. Comput. 24, 137-149. G. Peters and J.H. Wilkinson (1979). "Inverse Iteration, Ill-Conditioned Equations, and Newton's Method," SIAM Review 21, 339-360. I.C.F. Ipsen (1997). "Computing an Eigenvector with Inverse Iteration," SIAM Review 39, 254-291. In certain applications it is necessary to track an invariant subspace as the matrix changes, see: L. Dieci and M.J. Friedman (2001). "Continuation of Invariant Subspaces," Num. Lin. Alg. 8, 317-327. D. Bindel, J.W. Demmel, and M. Friedman (2008). "Continuation of Invariant Subsapces in Large Bifurcation Problems," SIAM J. Sci. Comput. 30, 637-656. Papers concerned with estimating the error in a computed eigenvalue and/or eigenvector include: S.P. Chan and B.N. Parlett (1977). "Algorithm 517: A Program for Computing the Condition Num­ bers of Matrix Eigenvalues Without Computing Eigenvectors," A CM Trans. Math. Softw. 3, 186-203. H.J. Symm and J.H. Wilkinson (1980). "Realistic Error Bounds for a Simple Eigenvalue and Its Associated Eigenvector," Numer. Math. 35, 113-126. C. Van Loan (1987). "On Estimating the Condition of Eigenvalues and Eigenvectors," Lin. Alg. Applic. 88/89, 715-732. Z. Bai, J. Demmel, and A. McKenney (1993). "On Computing Condition Numbers for the Nonsym- metric Eigenproblem," ACM Trans. Math. Softw. 19, 202-223. 
Some ideas about improving computed eigenvalues, eigenvectors, and invariant subspaces may be found in: J. Varah (1968). "Rigorous Machine Bounds for the Eigensystem of a General Complex Matrix," Math. Comp. 22, 793-801. J.J. Dongarra, C.B. Moler, and J.H. Wilkinson (1983). "Improving the Accuracy of Computed Eigen­ values and Eigenvectors,'' SIAM J. Numer. Anal. 20, 23-46.
  • 428. 404 Chapter 7. Unsymmetric Eigenvalue Problems J.W. Demmel (1987). "Three Methods for Refining Estimates of Invariant Subspaces,'' Comput. 38, 43-57. As we have seen, the sep(.,.) function is of great importance in the assessment of a computed invariant subspace. Aspects of this quantity and the associated Sylvester equation are discussed in: J. Varah (1979). "On the Separation of Two Matrices," SIAM J. Numer. Anal. 16, 212-222. R. Byers (1984). "A Linpack-Style Condition Estimator for the Equation AX - XBT = C," IEEE Trans. Autom. Contr. A C-29, 926-928. M. Gu and M.L. Overton (2006). "An Algorithm to Compute Sep.>.," SIAM J. Matrix Anal. Applic. 28, 348--359. N.J. Higham (1993). "Perturbation Theory and Backward Error for AX - XB = C," BIT 33, 124-136. Sylvester equations arise in many settings, and there are many solution frameworks, see: R.H. Bartels and G.W. Stewart (1972). "Solution of the Equation AX + XB = C,'' Commun. ACM 15, 820-826. G.H. Golub, S. Nash, and C. Van Loan (1979). "A Hessenberg-Schur Method for the Matrix Problem AX + XB = C,'' IEEE Trans. Autom. Contr. AC-24, 909-913. K. Datta (1988). "The Matrix Equation XA - BX = R and Its Applications," Lin. Alg. Applic. 109, 91-105. B. Kagstrom and P. Poromaa (1992). "Distributed and Shared Memory Block Algorithms for the Triangular Sylvester Equation with sep-1 Estimators,'' SIAM J. Matrix Anal. Applic. 13, 90- 101. J. Gardiner, M.R. Wette, A.J. Laub, J.J. Amato, and C.B. Moler (1992). "Algorithm 705: A FORTRAN-77 Software Package for Solving the Sylvester Matrix Equation AXBT +CXDT = E," ACM Trans. Math. Softw. 18, 232-238. V. Simoncini {1996). "On the Numerical Solution of AX -XB =C," BIT 36, 814-830. C.H. Bischof, B.N Datta, and A. Purkayastha (1996). "A Parallel Algorithm for the Sylvester Observer Equation," SIAM J. Sci. Comput. 1 7, 686-698. D. Calvetti, B. Lewis, L. Reichel (2001). "On the Solution of Large Sylvester-Observer Equations," Num. Lin. Alg. 8, 435-451. The constrained Sylvester equation problem is considered in: J.B. Barlow, M.M. Monahemi, and D.P. O'Leary (1992). "Constrained Matrix Sylvester Equations," SIAM J. Matrix Anal. Applic. 13, 1-9. A.R. Ghavimi and A.J. Laub (1996). "Numerical Methods for Nearly Singular Constrained Matrix Sylvester Equations." SIAM J. Matrix Anal. Applic. 1 7, 212-221. The Lyapunov problem FX + XFT = -C where C is non-negative definite has a very important role to play in control theory, see: G. Hewer and C. Kenney (1988). "The Sensitivity ofthe Stable Lyapunov Equation," SIAM J. Control Optim 26, 321-344. A.R. Ghavimi and A.J. Laub (1995). "Residual Bounds for Discrete-Time Lyapunov Equations," IEEE Trans. Autom. Contr. 40, 1244--1249. .J.-R. Li and J. White (2004). "Low-Rank Solution of Lyapunov Equations,'' SIAM Review 46, 693- 713. Several authors have considered generalizations of the Sylvester equation, i.e., EFiXGi = C. These include: P. Lancaster (1970). "Explicit Solution of Linear Matrix Equations,'' SIAM Review 12, 544-566. H. Wimmer and A.D. Ziebur (1972). "Solving the Matrix Equations Efp(A)gp(A) = C," SIAM Review 14, 318-323. W.J. Vetter (1975). "Vector Structures and Solutions of Linear Matrix Equations," Lin. Alg. Applic. 10, 181-188.
  • 429. 7.7. The Generalized Eigenvalue Problem 405 7.7 The Generalized Eigenvalue Problem If A, B E <Cnxn, then the set of all matrices of the form A - >..B with >.. E <C is a pencil. The generalized eigenvalues of A - >..B are elements of the set >..(A, B) defined by >..(A, B) = {z E <C : det(A - zB) = O }. If >.. E >..(A, B) and 0 =/:- x E <Cn satisfies Ax = >..Bx, (7.7.1) then x is an eigenvector of A - >..B. The problem of finding nontrivial solutions to (7.7.1) is the generalized eigenvalue problem and in this section we survey some of its mathematical properties and derive a stable method for its solution. We briefly discuss how a polynomial eigenvalue problem can be converted into an equivalent generalized eigenvalue problem through a linearization process. 7.7.1 Background The first thing to observe about the generalized eigenvalue problem is that there are n eigenvalues if and only if rank(B) = n. If B is rank deficient then >..(A, B) may be finite, empty, or infinite: A = [� � ], B A [� �], B [0 1 o 0 ] [o o o 1 ] :::? >..(A, B) = {1}, :::? >..(A, B) = 0, Note that if 0 =/:- >.. E >..(A, B), then (1/>..) E >..(B, A). Moreover, if B is nonsingular, then >..(A, B) = >..(B-1 A, I) = >..(B-1A). This last observation suggests one method for solving the A - >..B problem if B is nonsingular: Step 1. Solve BC = A for C using (say) Gaussian elimination with pivoting. Step 2. Use the QR algorithm to compute the eigenvalues of C. In this framework, C is affected by roundoff errors of order ull A 11211 B-1 112- If B is ill­ conditioned, then this precludes the possibility ofcomputing anygeneralized eigenvalue accurately-even those eigenvalues that may be regarded as well-conditioned. For example, if A [1.746 .940 l 1.246 1.898 and B [.780 .563 ], .913 .659
  • 430. 406 Chapter 7. Unsymmetric Eigenvalue Problems then A(A, B) = {2, 1.07 x 106}. With 7-digit floating point arithmetic, we find A(fl(AB-1)) = {1.562539, 1.01 x 106}. The poor quality of the small eigenvalue is because K2(B) � 2 x 106. On the other hand, we find that A(l, fl(A-1B)) � {2.000001, 1.06 x 106}. The accuracy of the small eigenvalue is improved because K2(A) � 4. The example suggests that we seek an alternative approach to the generalized eigenvalue problem. One idea is to compute well-conditioned Q and Z such that the matrices (7.7.2) are each in canonical form. Note that A(A, B)= A(A1, Bi) since We say that the pencils A - AB and A1 - AB1 are equivalent if (7.7.2) holds with nonsingular Q and Z. As in the standard eigenproblem A - Al there is a choice between canonical forms. Corresponding to the Jordan form is a decomposition of Kronecker in which both A1 and B1 are block diagonal with blocks that are similar in structure to Jordan blocks. The Kronecker canonical form poses the same numerical challenges as the Jordan form, but it provides insight into the mathematical properties of the pencil A - AB. See Wilkinson (1978) and Demmel and Kagstrom (1987) for details. 7.7.2 The Generalized Schur Decomposition From the numerical point of view, it makes to insist that the transformation matrices Q and Z be unitary. This leads to the following decomposition described in Moler and Stewart (1973). Theorem 7.7.1 {Generalized Schur Decomposition). If A and B are in <Cnxn, then there exist unitary Q and Z such that QHAZ = T and QHBZ = S are upper triangular. Iffor some k, tkk and Skk are both zero, then A(A, B) = <C. Otherwise A(A, B) = {tidsii : Sii #- O}. Proof. Let {Bk} be a sequence of nonsingular matrices that converge to B. For each k, let Q{!(AB"k1)Qk = Rk be a Schur decomposition of AB;1. Let Zk be unitary such that z{!(B;1Qk) = s;1 is upper triangular. It follows that Q{!AZk = RkSk and Q{!BkZk = Sk are also upper triangular. Using the Bolzano-Weierstrass theorem, we know that the bounded sequence {(Qk, Zk)} has a converging subsequence,
7.7. The Generalized Eigenvalue Problem 407

It is easy to show that Q and Z are unitary and that Q^H AZ and Q^H BZ are upper triangular. The assertions about λ(A, B) follow from the identity

    det(A − λB) = det(QZ^H) Π_{i=1}^{n} (t_ii − λ s_ii)

and that completes the proof of the theorem. □

If A and B are real, then the following decomposition, which corresponds to the real Schur decomposition (Theorem 7.4.1), is of interest.

Theorem 7.7.2 (Generalized Real Schur Decomposition). If A and B are in ℝ^{n×n}, then there exist orthogonal matrices Q and Z such that Q^T AZ is upper quasi-triangular and Q^T BZ is upper triangular.

Proof. See Stewart (1972). □

In the remainder of this section we are concerned with the computation of this decomposition and the mathematical insight that it provides.

7.7.3 Sensitivity Issues

The generalized Schur decomposition sheds light on the issue of eigenvalue sensitivity for the A − λB problem. Clearly, small changes in A and B can induce large changes in the eigenvalue λ_i = t_ii/s_ii if s_ii is small. However, as Stewart (1978) argues, it may not be appropriate to regard such an eigenvalue as "ill-conditioned." The reason is that the reciprocal μ_i = s_ii/t_ii might be a very well-behaved eigenvalue for the pencil μA − B. In the Stewart analysis, A and B are treated symmetrically and the eigenvalues are regarded more as ordered pairs (t_ii, s_ii) than as quotients. With this point of view it becomes appropriate to measure eigenvalue perturbations in the chordal metric chord(a, b) defined by

    chord(a, b) = |a − b| / ( sqrt(1 + |a|^2) · sqrt(1 + |b|^2) ).

Stewart shows that if λ is a distinct eigenvalue of A − λB and λ̃ is the corresponding eigenvalue of the perturbed pencil Ã − λB̃ with ||A − Ã||_2 ≈ ||B − B̃||_2 ≈ ε, then

    chord(λ, λ̃) ≤ ε / sqrt( (y^H Ax)^2 + (y^H Bx)^2 ) + O(ε^2)

where x and y have unit 2-norm and satisfy Ax = λBx and y^H A = λ y^H B. Note that the denominator in the upper bound is symmetric in A and B. The "truly" ill-conditioned eigenvalues are those for which this denominator is small.

The extreme case when both t_kk and s_kk are zero for some k has been studied by Wilkinson (1979). In this case, the remaining quotients t_ii/s_ii can take on arbitrary values.
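To connect this with the 2-by-2 example at the start of the section, here is a minimal sketch (not from the text); scipy.linalg.eig with two arguments computes generalized eigenvalues via the QZ algorithm, forming B^{-1}A mimics the naive approach (in double precision the loss is milder than in the 7-digit illustration), and chord implements the chordal metric above.

    import numpy as np
    from scipy.linalg import eig

    A = np.array([[1.746, 0.940],
                  [1.246, 1.898]])
    B = np.array([[0.780, 0.563],
                  [0.913, 0.659]])

    def chord(a, b):
        return abs(a - b) / (np.sqrt(1 + abs(a)**2) * np.sqrt(1 + abs(b)**2))

    print(np.sort(eig(A, B, right=False)))                     # QZ-based generalized eigenvalues
    print(np.sort(np.linalg.eigvals(np.linalg.solve(B, A))))   # eigenvalues of B^{-1}A formed explicitly

    # In the chordal metric the large eigenvalue near 1.07e6 and a sizable
    # perturbation of it are nearly indistinguishable, while an error in the
    # eigenvalue near 2 remains clearly visible.
    print(chord(1.07e6, 1.01e6), chord(2.0, 1.562539))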
  • 432. 408 Chapter 7. Unsymmetric Eigenvalue Problems 7.7.4 Hessenberg-Triangular Form The first step in computing the generalized real Schur decomposition of the pair (A, B) is to reduce A to upper Hessenberg form and B to upper triangular form via orthog­ onal transformations. We first determine an orthogonal U such that UTB is upper triangular. Of course, to preserve eigenvalues, we must also update A in exactly the same way. Let us trace what happens in the n = 5 case. x x x x x x x x x x x x x x x x x 0 0 0 x x x 0 0 x x x x 0 Next, we reduce A to upper Hessenberg form while preserving B's upper triangular form. First, a Givens rotation Q45 is determined to zero as1: x x x x x x x x x x x x x x x x x 0 0 0 x x x 0 0 x x x x x The nonzero entry arising in the {5,4) position in B can be zeroed by postmultiplying with an appropriate Givens rotation Z45: x x x x x x x x x x x x x x x x x 0 0 0 x x x 0 0 x x x x 0 Zeros are similarly introduced into the (4, 1) and (3, 1) positions in A: A +-- AZa4 [x x x � x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x : : x l· B +-- QfaB x x x x x x 0 x x 0 x x 0 0 0 x x x x x x 0 x x 0 0 x 0 0 0 x x x 0 0 x x x 0 0 x x x x 0
  • 433. 7.7. The Generalized Eigenvalue Problem x x x x x x x x x x x x x x x x x 0 0 0 x x x 0 0 x x x x 0 409 A is now upper Hessenberg through its first column. The reduction is completed by zeroingas2, a42, and as3. Note that two orthogonal transformations arc required for eachaii thatiszeroed-onetodothezeroingand theothertorestoreB'striangularity. Either Givens rotations or 2-by-2 modified Householder transformations can be used. Overallwehave: Algorithm 7.7.1 (Hessenberg-Triangular Reduction) Given A and B in IRnxn, the followingalgorithmoverwritesA with anupperHessenberg matrix QTAZ andBwith an uppertriangularmatrixQTBZwherebothQand Z areorthogonal. Computethefactorization B= QRusingAlgorithm5.2.1 andoverwrite A with QTA and B with QTB. for j = l:n - 2 end for i = n: - l:j + 2 end [c, s] = givens(A(i - 1,j), A(i,j)) A(i - l:i,j:n) = [ c s ]TA(i - l:i,j:n) - s c B(i - l:i, i - l:n) = [ c s ]TB(i - l:i, i - l:n) -s c [c, s] = givens(- B(i, i), B(i, i - 1)) B(l:i, i - l:i) = B(l:i, i - l:i) [ -� � ] A(l:n, i - l:i) = A(l:n, i - l:i) [ -S c s c ] Thisalgorithm requires about 8n3 flops. TheaccumulationofQand Z requiresabout 4n3 and 3n3 flops, respectively. The reductionofA - >..B to Hessenberg-triangular form serves as a "front end" decomposition for a generalized QR iteration known as the QZ iteration which we describenext. 7.7.5 Deflation In describing the QZ iteration we may assume without loss of generality that A is an unreduced upper Hessenberg matrixandthat B is anonsingular upper triangular
  • 434. 410 Chapter 7. Unsymmetric Eigenvalue Problems matrix. The first of these assertions is obvious, for if ak+l,k = 0 then [ Au - >.Bu Ai2 - >.B12 ] k A - >.B = 0 A22 - >.B22 n-k k n-k and we may proceed to solve the two smaller problems Au - >.Bu and A22 - >.B22. On the other hand, if bkk = 0 for some k, then it is possible to introduce a zero in A's (n, n - 1) position and thereby deflate. Illustrating by example, suppose n = 5 and k = 3: x x x 0 0 x x x x 0 x x x x x x x 0 0 0 x x 0 0 0 x x x x 0 The zero on B's diagonal can be "pushed down" to the (5,5) position as follows using Givens rotations: x x x x 0 x x x 0 0 x x x 0 0 x x x 0 0 x x x 0 0 x x x x 0 x x x x 0 x x x x x x x x x 0 x x x x 0 x x x x x x x x x x x x x x x x x x x x x x x x 0 [� [� [� x x x x x x 0 0 x 0 0 0 0 0 0 x x x x x x 0 0 x 0 0 0 0 0 0 x x x x x x 0 0 x 0 0 0 0 0 0 x x x x x x 0 x x 0 0 0 0 0 0 x x x x x x 0 x x 0 0 x 0 0 0 This zero-chasing technique is perfectly general and can be used to zero an,n-l regard­ less of where the zero appears along B's diagonal.
  • 435. 7.1. The Generalized Eigenvalue Problem 411 7.7.6 The QZ Step We are now in a position to describe a QZ step. The basic idea is to update A and B as follows (A - >.B) = QT(A - >.B)Z, where A is upper Hessenberg, fJ is upper triangular, Q and Z are each orthogonal, and AfJ-1 is essentially the same matrix that would result if a Francis QR step (Algorithm 7.5.1) were explicitly applied to AB-1 • This can be done with some clever zero-chasing and an appeal to the implicit Q theorem. Let M = AB-1 (upper Hessenberg) and let v be the first column of the matrix (M - al)(M - bl), where a and b are the eigenvalues of M's lower 2-by-2 submatrix. Note that v can be calculated in 0(1) flops. If Po is a Householder matrix such that Pov is a multiple of e1, then x x x x x x x x x x x x x x x x x x x x x x x x A +-- PoA = x x x x x x B +-- PoB = x x x x x x 0 0 x x x x 0 0 0 x x x 0 0 0 x x x 0 0 0 0 x x 0 0 0 0 x x 0 0 0 0 0 x The idea now is to restore these matrices to Hessenberg-triangular form by chasing the unwanted nonzero elements down the diagonal. To this end, we first determine a pair of Householder matrices Z1 and Z2 to zero "31, ba2, and �1: x x x x x x x x x x x x x x x x x x 0 x x x x x A+- AZ1Z2 = x x x x x x B +-- BZ1Z2 = 0 0 x x x x x x x x x x 0 0 0 x x x 0 0 0 x x x 0 0 0 0 x x 0 0 0 0 x x 0 0 0 0 0 x Then a Householder matrix P1 is used to zero aa1 and a41 : x x x x x x x x x x x x x x x x x x 0 x x x x x A +-- P1A = 0 x x x x x B +-- PiB = 0 x x x x x 0 x x x x x 0 x x x x x 0 0 0 x x x 0 0 0 0 x x 0 0 0 0 x x 0 0 0 0 0 x Notice that with this step the unwanted nonzero elements have been shifted down and to the right from their original position. This illustrates a typical step in the QZ iteration. Notice that Q = QoQ1 · · · Qn-2 has the same first column as Qo. By the way the initial Householder matrix was determined, we can apply the implicit Q theorem and assert that AB-1 = QT(AB-1)Q is indeed essentially the same matrix that we would obtain by applying the Francis iteration to M = AB-1 directly. Overall we have the following algorithm.
  • 436. 412 Chapter 7. Unsymmetric Eigenvalue Problems Algorithm 7.7.2 (The QZ Step) Given an unreduced upper Hessenberg matrix A E JRnxn and a nonsingular upper triangular matrix B E JRnxn, the following algo­ rithm overwrites A with the upper Hessenberg matrix QTAZ and B with the upper triangular matrix QTBZ where Q and Z are orthogonal and Q has the same first col­ umn as the orthogonal similarity transformation in Algorithm 7.5.l when it is applied to AB-1• Let M = AB-1 and compute (M - al)(lvl - bl)e1 = [x, y, z, 0, . . . , O]T where a and b are the eigenvalues of !v/'s lower 2-by-2. for k = l:n - 2 end Find Ho=ho!de< Q, so Q, [� ] [� ]· A = diag(h-1 , QkJn-k-2) · A B = diag(h-1, Qk, ln-k-2) · B Find Householder Zk1 so [ bk+2,k I bk+2,k+l I bk+2,k+2 ] Zk1 = [ 0 I 0 I * ] . A = A-diag(h-1 , Zk1, ln-k-2) B = B·diag(h-1, Zk1 , ln-k-2) Find Householder Zk2 so [ bk+l,k I bk+l,k+l ] Zk2 = [ 0 I * ] . A = A-diag(Jk-1, Zk2, ln-k-i) B = B·diag(Jk-1, Zk2, ln-k-1) x = ak+1,k; Y = ak+2,k if k < n - 2 z = ak+3,k end Find Householder Qn-1 so Qn-1 [ � ] = [ � ] . A = diag(In-2, Qn-1 ) · A B = diag(In-2, Qn-1) · B. Find Householder Zn-l so [ bn,n-l j bnn ] Zn-l = [ 0 j * ] . A = A-diag(ln-21 Zn-1) B = B·diag(Jn-21 Zn-1) This algorithm requires 22n2 flops. Q and Z can be accumulated for an additional 8n2 flops and 13n2 flops, respectively. 7.7.7 The Overall QZ Process By applying a sequence of QZ steps to the Hessenberg-triangular pencil A - >..B, it is possible to reduce A to quasi-triangular form. In doing this it is necessary to monitor A's subdiagonal and B's diagonal in order to bring about decoupling whenever possible. The complete process, due to Moler and Stewart (1973), is as follows:
  • 437. 7.7. The Generalized Eigenvalue Problem 413 Algorithm 7.7.3 Given A E R.nxn and B E R.nxn, the following algorithm computes orthogonal Q and Z such that QTAZ = T is upper quasi-triangular and QTBZ = S is upper triangular. A is overwritten by T and B by S. Using Algorithm 7.7.1, overwrite A with QTAZ (upper Hessenberg) and B with QTBZ (upper triangular). until q = n end Set to zero subdiagonal entries that satisfy lai,i-1I ::::; E(lai-l,i-1I + laiiI). Find the largest nonnegative q and the smallest nonnegative p such that if A12 Ai3 l p A22 A23 n-p-q 0 A33 q p n-p-q q then A33 is upper quasi-triangular and A22 is upper Hessenberg and unreduced. Partition B conformably: B if q < n if B22 is singular Zero an-q,n-q-1 else [ Bu 0 0 p B1 2 B13 l p B22 B23 n-p-q 0 833 q n-p-q q Apply Algorithm 7.7.2 to A22 and B22 and update: end end A = diag(/p, Q, Iq)TA·diag(/p, Z, lq) B = diag(lv, Q, lq)TB·diag(lv, Z, Iq) This algorithm requires 30n3 flops. If Q is desired, an additional 16n3 are necessary. ff Z is required, an additional 20n3 are needed. These estimates of work are based on the experience that about two QZ iterations per eigenvalue arc necessary. Thus, the convergence properties of QZ are the same as for QR. The speed of the QZ algorithm is not affected by rank deficiency in B. The computed S and T can be shown to satisfy Q5'(A + E)Zo = T, Q5(B + F)Zo = S, where Qo and Zo are exactly orthogonal and II E 112 � ull A 112 and 11 F 112 � ull B 112-
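As with Algorithm 7.5.2, these claims are easy to probe numerically. The following minimal sketch (not from the text) applies SciPy's LAPACK-based QZ driver to an arbitrary random pencil and checks the residuals behind the backward-stability statement just quoted.

    import numpy as np
    from scipy.linalg import qz, eigvals

    rng = np.random.default_rng(3)
    n = 6
    A = rng.standard_normal((n, n))
    B = rng.standard_normal((n, n))

    # Real QZ: A = Q @ T @ Z.T with T quasi-triangular, B = Q @ S @ Z.T with S triangular.
    T, S, Q, Z = qz(A, B, output='real')

    u = np.finfo(float).eps
    print(np.linalg.norm(Q @ T @ Z.T - A) / (u * np.linalg.norm(A)))
    print(np.linalg.norm(Q @ S @ Z.T - B) / (u * np.linalg.norm(B)))
    print(np.linalg.norm(Q.T @ Q - np.eye(n)), np.linalg.norm(Z.T @ Z - np.eye(n)))

    # For a real pencil, T may contain 2-by-2 bumps (complex conjugate pairs), so use
    # the generalized eigenvalue routine rather than reading t_ii/s_ii off the diagonals.
    print(np.sort(eigvals(A, B)))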
  • 438. 414 Chapter 7. Unsymmetric Eigenvalue Problems 7.7.8 Generalized Invariant Subspace Computations Many of the invariant subspace computations discussed in §7.6 carry over to the gen­ eralized eigenvalue problem. For example, approximate eigenvectors can be found via inverse iteration: q(O) E <Vnxn given. for k = 1, 2, . . . end Solve (A - µB)z(k) = Bq(k-l). Normalize: q(k) = z(k)/II z(k) 112· A(k) = [q(k)]HAq(k) I [q(k)]HAq(k) If B is nonsingular, then this is equivalent to applying (7.6.1) with the matrix B- 1A. Typically, only a single iteration is required ifµ is an approximate eigenvalue computed by the QZ algorithm. By inverse iterating with the Hessenberg-triangular pencil, costly accumulation of the Z-transformations during the QZ iteration can be avoided. Corresponding to the notion of an invariant subspace for a single matrix, we have the notion of a deflating subspace for the pencil A - AB. In particular, we say that a k-dimensional subspace S � <Vn is deflating for the pencil A - AB if the subspace { Ax + By : x, y E S } has dimension k or less. Note that if is a generalized Schur decomposition ofA- AB, then the columns of Z in the generalized Schur decomposition define a family of deflating subspaces. Indeed, if are column partitionings, then span{Az1, . . . , Azk} � span{q1, . . . , qk}, span{Bz1, . . . , Bzk} � span{q1, . . . , qk}, for k = l:n. Properties of deflating subspaces and their behavior under perturbation are described in Stewart (1972). 7.7.9 A Note on the Polynomial Eigenvalue Problem More general than the generalized eigenvalue problem is the polynomial eigenvalue problem. Here we are given matrices Ao, . . . , Ad E <Vnxn and determine A E <V and 0 � x E <Vn so that P(A)x = 0 where the A-matrix P(A) is defined by P(A) = Ao + AA1 + · · · + AdAd. (7.7.3) (7.7.4) We assume Ad � 0 and regard d as the degree of P(A). The theory behind the polyno­ mial eigenvalue problem is nicely developed in Lancaster (1966).
  • 439. 7.7. The Generalized Eigenvalue Problem 415 It is possible to convert (7.7.3) into an equivalent linear eigenvalue problem with larger dimension. For example, suppose d = 3 and L(A) = If then [ 0 0 � l -I 0 Ai 0 -I A2 L(�) [� l = H[� 0 �l I (7.7.5) 0 A3 Ul· In general, we say that L(A) is a linearization of P(A) if there are dn-by-dn A-matrices S(A) and T(A), each with constant nonzero determinants, so that (7.7.6) has unit degree. With this conversion, the A - AB methods just discussed can be applied to find the required eigenvalues and eigenvectors. Recent work has focused on how to choose the A-transformations S(A) and T(A) so that special structure in P(A) is reflected in L(A). See Mackey, Mackey, Mehl, and Mehrmann (2006). The idea is to think of (7.7.6) as a factorization and to identify the transformations that produce a properly structured L(A). To appreciate this solution framework it is necessary to have a facility with A-matrix manipulation and to that end we briefly examine the A-matrix transformations behind the above linearization. If then and it is easy to verify that Notice that the transformation matrices have unit determinant and that the A-matrix on the right-hand side has degree d - 1. The process can be repeated. If then
  • 440. 416 and n 0 In 0 0 ][M. ->..In -In In 0 Ao Pi(>.) 0 Chapter 7. Unsymmetric Eigenvalue Problems �][; In 0 0 0 -In [M. -In 0 0 ] In = P2(>.) 0 �] >..In Ai . -In P2(>.) Note that the matrix on the right has degree d - 2. A straightforward induction argument can be assembled to establish that if the dn-by-dn matrices S(>.) and T(>.) are defined by In ->.In 0 0 0 0 0 I 0 I,. ->.In -In 0 Pi (>.) S(>.) = 0 ' T(>.) = 0 -In In - >.In Pd-2(>.) 0 0 0 In 0 0 -In Pd-1 (>.) where then >..In 0 0 Ao S(>.) [P�>.) -In >..In Ai 0 lT(>.) = 0 -In I(d-l}n >..In Ad-2 0 0 -In Ad-1 + >.Ad Note that, if we solve the linearized problem using the QZ algorithm, then O((dn)3) flops are required. Problems P7.7.1 Suppose A and B are in Rnxn and that UTBV = [ D 0 ] r 0 0 n-r ' u = [ U1 I U2 ] , r n-r r n-r V = [ Vi l V2 ] , r n-r is the SVD of B, where D is r-by-r and r = rank(B). Show that if >.(A, B) = <C then U[AV2 is singular. P7.7.2 Suppose A and B are in Rnxn_ Give an algorithm for computing orthogonal Q and Z such that QTAZ is upper Hessenberg and zTBQ is upper triangular.
  • 441. 7.7. The Generalized Eigenvalue Problem 417 P7.7.3 Suppose [ Bu and B = 0 with A11, B11 E Rkxk and A22, B22 E R!Xi. Under what circumstances do there exist X = [ ; 80 that y-iAx and y-1 BX are both block diagonal? This is the generalized Sylvester equation problem. Specify an algorithm for the case when Au, A22, Bu, and B22 are upper triangular. See Kiigstrom (1994). P7.7.4 Suppose µ r/. >.(A, B). Relate the eigenvalues and eigenvectors of Ai = (A - µB)-iA and Bi = (A - µB)-iB to the generalized eigenvalues and eigenvectors of A - >.B. P7.7.5 What does the generalized Schur decomposition say about the pencil A - >.AT? Hint: If T E Rnxn is upper triangular, then EnTEn is lower triangular where En is the exchange permutation defined in § 1.2.11. P7.7.6 Prove that [A, + AA, Li (>.) = - f are linearizations of A2 Ai 0 0 - In 0 0 - In �·i 0 ' 0 [A, + AA, A2 L2(>.) = A1 Ao P(>.) = Ao + >.A1 + >.2A2 + >.3A3 + >.4A4. -In 0 0 - In 0 0 0 0 Specify the >.-matrix transformations that relate diag(P(>.), hn) to both Li (>.) and L2(>.). Notes and References for §7.7 0 l 0 -In 0 For background to the generalized eigenvalue problem we recommend Stewart(IMC), Stewart and Sun (MPT), and Watkins (MEP) and: B. Kagstrom and A. Ruhe (1983). Matrix Pencils, Proceedings Pite Havsbad, 1982, Lecture Notes in Mathematics Vol. 973, Springer-Verlag, New York. QZ-related papers include: C.B. Moler and G.W. Stewart (1973). "An Algorithm for Generalized Matrix Eigenvalue Problems," SIAM J. Numer. Anal. 10, 241-256. L. Kaufman (1974). "The LZ Algorithm to Solve the Generalized Eigenvalue Problem," SIAM J. Numer. Anal. 11, 997-1024. R.C. Ward (1975). "The Combination Shift QZ Algorithm," SIAM J. Numer. Anal. 12, 835-853. C.F. Van Loan (1975). "A General Matrix Eigenvalue Algorithm," SIAM J. Numer. Anal. 12, 819-834. L. Kaufman (1977). "Some Thoughts on the QZ Algorithm for Solving the Generalized Eigenvalue Problem," A CM Trans. Math. Softw. 3, 65 -75. R.C. Ward (1981). "Balancing the Generalized Eigenvalue Problem," SIAM J. Sci. Stat. Comput. 2, 141-152. P. Van Dooren (1982). "Algorithm 590: DSUBSP and EXCHQZ: Fortran Routines for Computing Deflating Subspaces with Specified Spectrum," ACM Trans. Math. Softw. 8, 376-382. K. Dackland and B. Kagstrom (1999). "Blocked Algorithms and Software for Reduction of a Regular Matrix Pair to Generalized Schur Form," A CM Trans. Math. Softw. 25, 425-454. D.S. Watkins (2000). "Performance of the QZ Algorithm in the Presence of Infinite Eigenvalues," SIAM J. Matrix Anal. Applic. 22, 364-375. B. Kiigstrom, D. Kressner, E.S. Quintana-Orti, and G. Quintana-Orti (2008). "Blocked Algorithms for the Reduction to Hessenberg-Triangular Form Revisited," BIT 48, 563-584. Many algorithmic ideas associated with the A - >.I problem extend to the A - >.B problem: A. Jennings and M.R. Osborne (1977). "Generalized Eigenvalue Problems for Certain Unsymmetric Band Matrices," Lin. Alg. Applic. 29, 139-150.
  • 442. 418 Chapter 7. Unsymmetric Eigenvalue Problems V.N. Kublanovskaya (1984). "AB Algorithm and Its Modifications for the Spectral Problem of Linear Pencils of Matrices," Nu.mer. Math. 43, 329-342. Z. Bai, J. Demmel, and M. Gu (1997). "An Inverse Free Parallel Spectral Divide and Conquer Algorithm for Nonsymmetric Eigenproblems," Numer. Math. 76, 279-308. G.H. Golub and Q. Ye (2000). "Inexact Inverse Iteration for Generalized Eigenvalue Problems," BIT 40, 671-684. F. Tisseur (2001). "Newton's Method in Floating Point Arithmetic and Iterative Refinement of Gen­ eralized Eigenvalue Problems," SIAM J. Matrix Anal. Applic. 22, 1038--1057. D. Lemonnier and P. Van Dooren (2006). "Balancing Regular Matrix Pencils," SIAM J. Matrix Anal. Applic. 28, 253-263. R. Granat, B. Kagstrom, and D. Kressner (2007). "Computing Periodic Deflating Subspaces Associ­ ated with a Specified Set of Eigenvalues," BIT 4 7, 763-791. The perturbation theory for the generalized eigenvalue problem is treated in: G.W. Stewart (1972). "On the Sensitivity of the Eigenvalue Problem Ax = >.Bx," SIAM J. Numer. Anal. 9, 669-686. G.W. Stewart (1973). "Error and Perturbation Bounds for Subspaces Associated with Certain Eigen­ value Problems," SIAM Review 15, 727-764. G.W. Stewart (1975). "Gershgorin Theory for the Generalized Eigenvalue Problem Ax = >.Bx," Math. Comput. 29, 600-606. A. Pokrzywa (1986). "On Perturbations and the Equivalence Orbit of a Matrix Pencil," Lin. Alg. Applic. 82, 99-121. J. Sun (1995). "Perturbation Bounds for the Generalized Schur Decomposition," SIAM J. Matrix Anal. Applic. 16, 1328-1340. R. Bhatia and R.-C. Li (1996). "On Perturbations of Matrix Pencils with Real Spectra. II," Math. Comput. 65, 637-645. J.-P. Dedieu (1997). "Condition Operators, Condition Numbers, and Condition Number Theorem for the Generalized Eigenvalue Problem," Lin. Alg. Applic. 263, 1-24. D.J. Higham and N.J. Higham (1998). "Structured Backward Error and Condition of Generalized Eigenvalue Problems," SIAM J. Matrix Anal. Applic. 20, 493-512. R. Byers, C. He, and V. Mehrmann (1998). "Where is the Nearest Non-Regular Pencil?," Lin. Alg. Applic. 285, 81-105. V. Frayss and V. Toumazou (1998). "A Note on the Normwise Perturbation Theory for the Regular Generalized Eigenproblem," Numer. Lin. Alg. 5, 1-10. R.-C. Li (2003). "On Perturbations of Matrix Pencils with Real Spectra, A Revisit," Math. Comput. 72, 715-728. S. Bora and V. Mehrmann (2006). "Linear Perturbation Theory for Structured Matrix Pencils Arising in Control Theory," SIAM J. Matrix Anal. Applic. 28, 148-169. X.S. Chen (2007). "On Perturbation Bounds of Generalized Eigenvalues for Diagonalizable Pairs," Numer. Math. 107, 79-86. The Kronecker structure of the pencil A - >.B is analogous to Jordan structure of A - >.I and it can provide useful information about the underlying application. Papers concerned with this important decomposition include: J.H. Wilkinson (1978). "Linear Differential Equations and Kronecker's Canonical Form," in Recent Advances in Numerical Analysis, C. de Boor and G.H. Golub (eds.), Academic Press, New York, 231--265. J.H. Wilkinson (1979). "Kronecker's Canonical Form and the QZ Algorithm," Lin. Alg. Applic. 28, 285-303. P. Van Dooren (1979). "The Computation of Kronecker's Canonical Form of a Singular Pencil," Lin. Alg. Applic. 27, 103-140. J.W. Demmel (1983). ''The Condition Number of Equivalence Transformations that Block Diagonalize Matrix Pencils," SIAM J. Numer. Anal. 20, 599-610. J.W. Demmel and B. Kagstrom (1987). 
"Computing Stable Eigendecompositions of Matrix Pencils," Linear Alg. Applic. 88/89, 139-186. B. Kagstrom (1985). "The Generalized Singular Value Decomposition and the General A - >.B Pro'l>­ lem," BIT 24, 568-583. B. Kagstrom (1986). "RGSVD: An Algorithm for Computing the Kronecker Structure and Reducing Subspaces of Singular A - >.B Pencils," SIAM J. Sci. Stat. Comput. 7, 185-211.
  • 443. 7.1. The Generalized Eigenvalue Problem 419 J. Demmel and B. Kiigstrom (1986). "Stably Computing the Kronecker Structure and Reducing Subspaces of Singular Pencils A - >.B for Uncertain Data," in Large Scale Eigenvalue Problems, J. Cullum and R.A. Willoughby (eds.), North-Holland, Amsterdam. T. Beelen and P. Van Dooren (1988). "An Improved Algorithm for the Computation of Kronecker's Canonical Form of a Singular Pencil," Lin. Alg. Applic. 105, 9-65. E. Elmroth and B. Kiigstrom(1996). "The Set of 2-by-3 Matrix Pencils - Kronecker Structures and Their Transitions under Perturbations," SIAM J. Matri:i; Anal. Applic. 1 7, 1-34. A. Edelman, E. Elmroth, and B. Kiigstrom (1997). "A Geometric Approach to Perturbation Theory of Matrices and Matrix Pencils Part I: Versa! Defformations," SIAM J. Matri:i; Anal. Applic. 18, 653-692. E. Elmroth, P. Johansson, and B. Kiigstrom (2001). "Computation and Presentation of Graphs Displaying Closure Hierarchies of Jordan and Kronecker Structures," Nv.m. Lin. Alg. 8, 381-399. Just as the Schur decomposition can be used to solve the Sylvester equation problem A 1 X - XA2 = B, the generalized Schur decomposition can be used to solve the generalized Sylvester equation problem where matrices X and Y are sought so that A 1 X - YA2 = Bi and AaX - YA4 = B2, see: W. Enright and S. Serbin (1978). "A Note on the Efficient Solution of Matrix Pencil Systems," BIT 18, 276-81. B. Kagstrom and L. Westin (1989). "Generalized Schur Methods with Condition Estimators for Solving the Generalized Sylvester Equation," IEEE '.Ihlns. Autom. Contr. AC-34, 745-751. B . Kagstrom (1994). "A Perturbation Analysis ofthe Generalized Sylvester Equation (AR-LB, DR­ LE} = (C, F}," SIAM J. Matri:i; Anal. Applic. 15, 1045-1060. J.-G. Sun (1996}. "Perturbation Analysis of System Hessenberg and Hessenberg-Triangular Forms," Lin. Alg. Applic. 241-3, 811-849. B. Kagstrom and P. Poromaa (1996}. "LAPACK-style Algorithms and Software for Solving the Gen­ eralized Sylvester Equation and Estimating the Separation Between Regular Matrix Pairs," ACM '.lhlns. Math. Softw. 22, 78-103. I. Jonsson and B. Kagstrom (2002). "Recursive Blocked Algorithms for Solving Triangular Systems­ Part II: Two-sided and Generalized Sylvester and Lyapunov Matrix Equations," A CM '.Ihlns. Math. Softw. 28, 416-435. R. Granat and B. Kagstrom (2010). "Parallel Solvers for Sylvester-Type Matrix Equations with Applications in Condition Estimation, Part I: Theory and Algorithms," ACM '.Ihlns. Math. Softw. 37, Article 32. Rectangular generalized eigenvalue problems also arise. In this setting the goal is to reduce the rank of A - >.B, see: G.W. Stewart (1994). "Perturbation Theory for Rectangular Matrix Pencils," Lin. Alg. Applic. 208/209, 297-301. G. Boutry, M. Elad, G.H. Golub, and P. Milanfar (2005). "The Generalized Eigenvalue Problem for Nonsquare Pencil'! Using a Minimal Perturbation Approach," SIAM J. Matri:i; Anal. Applic. 27, 582-601. D . Chu and G.H. Golub (2006). "On a Generalized Eigenvalue Problem for Nonsquare Pencils," SIAM J. Matri:i; Anal. Applic. 28, 770-787. References for the polynomial eigenvalue problem include: P. Lancaster (1966). Lambda-Matrices and Vibrating Systems, Pergamon Press, Oxford, U.K. I. Gohberg, P. Lancaster, and L. Rodman (1982). Matri:I; Polynomials, Academic Press, New York. F. Tisseur (2000}. "Backward Error and Condition of Polynomial Eigenvalue Problems," Lin. Alg. Applic. 309, 339-361. J.-P. Dedieu and F. Tisseur (2003). 
"Perturbation Theory for Homogeneous Polynomial Eigenvalue Problems," Lin. Alg. Applic. 358, 71-94. N.J. Higham, D.S. Mackey, and F. Tisseur (2006). "The Conditioning of Linearizations of Matrix Polynomials," SIAM J. Matri:i; Anal. Applic. 28, 1005-1028. D.S. Mackey, N. Mackey, C. Mehl, V. Mehrmann (2006). "Vector Spaces of Linearizations for Matrix Polynomials,'' SIAM J. Matri:i; Anal. Applic. 28, 971-1004. The structured quadratic eigenvalue problem is discussed briefly in §8.7.9.
7.8 Hamiltonian and Product Eigenvalue Problems

Two structured unsymmetric eigenvalue problems are considered. The Hamiltonian matrix eigenvalue problem comes with its own special Schur decomposition. Orthogonal symplectic similarity transformations are used to bring about the required reduction. The product eigenvalue problem involves computing the eigenvalues of a product like A1 A2^{-1} A3 without actually forming the product or the designated inverses. For detailed background to these problems, see Kressner (NMSE) and Watkins (MEP).

7.8.1 Hamiltonian Matrix Eigenproblems

Hamiltonian and symplectic matrices are introduced in §1.3.10. Their 2-by-2 block structure provides a nice framework for practicing block matrix manipulation, see P1.3.2 and P2.5.4. We now describe some interesting eigenvalue problems that involve these matrices. For a given n, we define the matrix J ∈ R^{2n×2n} by

    J = [  0    I_n ]
        [ -I_n   0  ]

and proceed to work with the families of 2-by-2 block structured matrices that are displayed in Figure 7.8.1. We mention four important facts concerning these matrices.

    Family          Definition       What They Look Like
    -------------------------------------------------------------------------------
    Hamiltonian     JM = (JM)^T      M = [ A   G   ]     G symmetric,
                                         [ F  -A^T ]     F symmetric

    Skew-           JN = -(JN)^T     N = [ A   G   ]     G skew-symmetric,
    Hamiltonian                          [ F   A^T ]     F skew-symmetric

    Symplectic      JS = S^{-T}J     S = [ S11 S12 ]     S11^T S21 symmetric,
                                         [ S21 S22 ]     S22^T S12 symmetric,
                                                          S11^T S22 = I + S21^T S12

    Orthogonal      JQ = QJ          Q = [  Q1  Q2 ]     Q1^T Q2 symmetric,
    Symplectic                           [ -Q2  Q1 ]     I = Q1^T Q1 + Q2^T Q2

    Figure 7.8.1. Hamiltonian and symplectic structures

(1) Symplectic similarity transformations preserve Hamiltonian structure:

    J(S^{-1}MS) = (JS^{-1})(MS) = S^T(JM)S = (J(S^{-1}MS))^T.
(2) The square of a Hamiltonian matrix is skew-Hamiltonian:

    JM^2 = (JMJ^T)(JM) = -M^T(JM) = -(M^2)^T J^T = -(JM^2)^T.

(3) If M is a Hamiltonian matrix and λ ∈ λ(M), then -λ ∈ λ(M): if M[u; v] = λ[u; v] with u, v ∈ R^n, then

    M^T [v; -u] = -λ [v; -u].

(4) If S is symplectic and λ ∈ λ(S), then 1/λ ∈ λ(S): if S[u; v] = λ[u; v], then

    S^T [v; -u] = (1/λ) [v; -u].

Symplectic versions of Householder and Givens transformations have a prominent role to play in Hamiltonian matrix computations. If P = I_n − 2vv^T is a Householder matrix, then diag(P, P) is a symplectic orthogonal matrix. Likewise, if G ∈ R^{2n×2n} is a Givens rotation that involves planes i and i+n, then G is a symplectic orthogonal matrix. Combinations of these transformations can be used to introduce zeros. For example, a Householder-Givens-Householder sequence can do this (shown for a vector in R^8):

    [x]                  [x]             [x]                  [x]
    [x]                  [x]             [x]                  [0]
    [x]                  [x]             [x]                  [0]
    [x]   diag(P1,P1)    [x]   G(1,5)    [x]   diag(P2,P2)    [0]
    [x]   ---------->    [x]   ------>   [0]   ---------->    [0]
    [x]                  [0]             [0]                  [0]
    [x]                  [0]             [0]                  [0]
    [x]                  [0]             [0]                  [0]

This kind of vector reduction can be sequenced to produce a constructive proof of a structured Schur decomposition for Hamiltonian matrices. Suppose λ is a real eigenvalue of a Hamiltonian matrix M and that x ∈ R^{2n} is a unit 2-norm vector with Mx = λx. If Q1 ∈ R^{2n×2n} is an orthogonal symplectic matrix and Q1^T x = e1, then it follows from (Q1^T M Q1)(Q1^T x) = λ(Q1^T x) that (for n = 4)

    Q1^T M Q1 = [ λ  x  x  x  x  x  x  x ]
                [ 0  x  x  x  x  x  x  x ]
                [ 0  x  x  x  x  x  x  x ]
                [ 0  x  x  x  x  x  x  x ]
                [ 0  0  0  0 -λ  0  0  0 ]
                [ 0  x  x  x  x  x  x  x ]
                [ 0  x  x  x  x  x  x  x ]
                [ 0  x  x  x  x  x  x  x ]

The "extra" zeros follow from the Hamiltonian structure of Q1^T M Q1. The process can be repeated on the 6-by-6 Hamiltonian submatrix defined by rows and columns 2-3-4-6-7-8. Together with the assumption that M has no purely imaginary eigenvalues, it is possible to show that an orthogonal symplectic matrix Q exists so that

    Q^T M Q = [ T   R   ]
              [ 0  -T^T ]
  • 446. 422 Chapter 7. Unsymmetric Eigenvalue Problems (7.8.1) where T E 1Rnxn is upper quasi-triangular. This is the real Hamiltonian-Schur de­ composition. See Paige and Van Loan (1981) and, for a more general version, Lin, Mehrmann, and Xu (1999). One reason that the Hamiltonian eigenvalue problem is so important is its con­ nection to the algebraic Ricatti equation (7.8.2) This quadratic matrix problem arises in optimal control and a symmetric solution is sought so that the eigenvalues of A - FX are in the open left half plane. Modest assumptions typically ensure that M has no eigenvalues on the imaginary axis and that the matrix Q1 in (7.8.1) is nonsingular. If we compare (2,1) blocks in (7.8.1), then QfAQ1 - QfFQ2 + QfGQ1 + QfATQ2 = 0. It follows from In = Q[Q1 + QfQ2 that X = Q2Q11 is symmetric and that it satisfies (7.8.2). From (7.8.1) it is easy to show that A - FX = Q1TQ11 and so the eigen­ values of A - FX are the eigenvalues of T. It follows that the desired solution to the algebraic Ricatti equation can be obtained by computing the real Hamiltonian-Schur decomposition and ordering the eigenvalues so that A(T) is in the left half plane. How might the real Hamiltonian-Schur form be computed? One idea is to reduce M to some condensed Hamiltonian form and then devise a structure-preserving QR­ iteration. Regarding the former task, it is easy to compute an orthogonal symplectic Uo so that LloTMUo = [H D R l -HT (7.8.3) where H E 1Rnxn is upper Hessenberg and D is diagonal. Unfortunately, a structure­ preserving QR iteration that maintains this condensed form has yet to be devised. This impasse prompts consideration of methods that involve the skew-Hamiltonian matrix N = M2. Because the (2,1) block of a skew-Hamiltonian matrix is skew-symmetric, it has a zero diagonal. Symplectic similarity transforms preserve skew-Hamiltonian structure, and it is straightforward to compute an orthogonal symplectic matrix Vo such that T 2 [H R l Vo M Vo = 0 HT , (7.8.4) where H is upper Hessenberg. If UTHU = T is the real Schur form of H and and Q = Vo · diag(U, U), then = [T UTRU ] 0 TT
  • 447. 7.8. Hamiltonian and Product Eigenvalue Problems 423 is the real skew-Hamiltonian Schurform. See Van Loan (1984). It does not follow that QTMQ is in Schur-Hamiltonian form. Moreover, the quality of the computed small eigenvalues is not good because of the explicit squaring of M. However, these shortfalls can be overcome in an efficient numerically sound way, see Chu, Lie, and Mehrmann (2007) and the references therein. Kressner (NMSE, p. 175-208) and Watkins (MEP, p. 319-341) have in-depth treatments of the Hamiltonian eigenvalue problem. 7. 8. 2 Product Eigenvalue Problems Using SVD and QZ, wecan compute the eigenvalues ofATA and B-1A without forming products or inverses. The intelligent computation of the Hamiltonian-Schur decompo­ sition involves a correspondingly careful handling of the product M-times-M. In this subsection we further develop this theme by discussing various product decompositions. Here is an example that suggests how we might compute the Hessenberg decomposition of where A1, A2, A3 E JR"'xn. Instead of forming this product explicitly, we compute or­ thogonal U1, U2, U3 E nexn such that It follows that UJA2U2 T2 UfA1U1 = T1 (upper Hessenberg), (upper triangular), (upper triangular). (7.8.5) is upper Hessenberg. A procedure for doing this would start by computing the QR factorizations If A3 = A3Q3, then A = A3R2R1 • The next phase involves reducing A3 to Hessenberg form with Givens transformations coupled with "bulge chasing" to preserve the trian­ gular structures already obtained. The process is similar to the reduction of A - >..B to Hessenbcrg-triangular form; sec §7.7.4. Now suppose we want to compute the real Schur form of A QfA3Q3 = T3 QIA2Q2 = T2 QfA1Q1 = T1 (upper quasi-triangular), (upper triangular), (upper triangular), (7.8.6) where Q1, Q2, Q3 E R,nxn are orthogonal. Without loss of generality we may assume that {A3, A2, Ai} is in Hessenberg-triangular-triangular form. Analogous to the QZ iteration, the next phase is to produce a sequence of converging triplets (7.8.7) with the property that all the iterates are in Hessenberg-triangular-triangular form.
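The eigenvalue information carried by such a product can be checked against a block-cyclic embedding, a framing that is developed next. The MATLAB sketch below is ours, not the text's; it uses one possible cyclic arrangement of three random factors and verifies that the eigenvalues of the block-cyclic matrix are cube roots of the eigenvalues of the explicitly formed product (which a production code would of course never form).

    % Eigenvalues of a block-cyclic matrix vs. those of the product A3*A2*A1.
    n  = 4;
    A1 = randn(n);  A2 = randn(n);  A3 = randn(n);
    C  = [ zeros(n)  zeros(n)  A1
           A2        zeros(n)  zeros(n)
           zeros(n)  A3        zeros(n) ];
    mu  = eig(A3*A2*A1);               % eigenvalues of the explicit product
    lam = eig(C);                      % eigenvalues of the block-cyclic matrix
    gap = norm(sort(abs(lam).^3) - sort(abs([mu; mu; mu])))   % small (roundoff level)

In exact arithmetic gap is zero; in floating point it reflects roundoff amplified by the conditioning of the eigenvalues.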
  • 448. 424 Chapter 7. Unsymmetric Eigenvalue Problems Product decompositions (7.8.5) and (7.8.6) can be framed as structured decom­ positions of block-cyclic 3-by-3 matrices. For example, if then we have the following restatement of (7.8.5): Consider the zero-nonzero structure of this matrix for the case n = 4: 0 0 0 0 0 () () () x x x x 0 0 0 0 0 0 0 0 x x x x 0 0 () 0 0 () () () 0 x x x 0 () 0 0 0 0 0 0 0 0 x x x x x x 0 0 0 0 0 () () () iI 0 x x x () () () () 0 0 0 0 0 0 x x 0 0 0 () () 0 () () 0 0 0 x () () 0 0 0 0 0 0 0 0 0 0 x x x x 0 0 0 0 0 0 0 0 0 x x x 0 0 0 0 0 0 0 0 0 0 x x 0 0 0 0 0 0 0 0 0 0 0 x 0 0 0 () Using the perfect shuffle P34 (see §1.2.11) we also have 0 0 x 0 0 x () 0 x () 0 x x 0 () x () 0 x 0 0 x 0 0 0 x 0 0 x 0 0 x () () x () 0 0 x 0 0 x () () x 0 () x () 0 () x () 0 x 0 0 x 0 0 0 0 0 0 x () 0 x () () x () 0 0 0 0 0 x () 0 x () 0 x 0 0 0 0 0 0 x 0 0 x 0 () 0 0 0 0 0 0 0 x 0 0 x 0 0 () 0 () 0 () () () x () 0 x () 0 0 0 0 0 0 0 0 x () () 0 () () () 0 () () () () 0 x 0 Note that this is a highly structured 12-by-12 upper Hessenberg matrix. This con­ nection makes it possible to regard the product-QR iteration as a structure-preserving
  • 449. 7.8. Hamiltonian and Product Eigenvalue Problems 425 QR iteration. For a detailed discussion about this connection and its implications for both analysis and computation, sec Kressner (NMSE, pp. 146-174) and Watkins(MEP, pp. 293-303). We mention that with the "technology" that has been developed, it is possible to solve product eigenvalue problems where the factor matrices that define A are rectangular. Square nonsingular factors can also participate through their inverses, e.g., A = A3A21Ai. Problems P7.8.l What can you say about the eigenvalues and eigenvectors of a symplectic matrix? P7.8.2 Suppose S1 , S2 E Rnxn arc both skew-symmetric and let A = S1 S2. Show that the nonzero eigenvalues of A are not simple. How would you compute these eigenvalues? P7.8.3 Relate the eigenvalues and eigenvectors of A - [ � - 0 A4 Ai 0 0 0 to the eigenvalues and eigenvectors of A = A1 A2A3A4 . Assume that the diagonal blocks are square. Notes and References for §7.8 The books by Kressner(NMSE) and Watkins (MEP) have chapters on product eigenvalue problems and Hamiltonian eigenvalue problems. The sometimes bewildering network of interconnections that exist among various structured classes of matrices is clarified in: A. Bunse-Gerstner, R. Byers, and V. Mehrmann (1992). "A Chart of Numerical Methods for Struc­ tured Eigenvalue Problems,'' SIAM J. Matrix Anal. Applic. 13, 419-453. Papers concerned with the Hamiltonian Schur decomposition include: A.J. Laub and K. Meyer (1974). "Canonical Forms for Symplectic and Hamiltonian Matrices,'' J. Celestial Mechanics 9, 213-238. C.C. Paige and C. Van Loan (1981). "A Schur Decomposition for Hamiltonian Matrices,'' Lin. Alg. Applic. 41, 11-32. • V. Mehrmann (1991). Autonomous Linear Quadratic Contml Pmblems, Theory and Numerical So­ lution, Lecture Notes in Control and Information Sciences No. 163, Springer-Verlag, Heidelberg. W.-W. Lin, V. Mehrmann, and H. Xu (1999). "Canonical Forms for Hamiltonian and Symplectic Matrices and Pencils,'' Lin. Alg. Applic. 302/303, 469-533. Various methods for Hamiltonian eigenvalue problems have been devised that exploit the rich under­ lying structure, see: C. Van Loan (1984). "A Symplectic Method for Approximating All the Eigenvalues of a Hamiltonian Matrix," Lin. Alg. Applic. 61, 233-252. R. Byers (1986) "A Hamiltonian QR Algorithm," SIAM J. Sci. Stat. Comput. 7, 212-229. P. Benner, R. Byers, and E. Barth (2000). "Algorithm 800: Fortran 77 Subroutines for Computing the Eigenvalues of Hamiltonian Matrices. I: the Square-Reduced Method," ACM Trans. Math. Softw. 26, 49-77. H. Fassbender, D.S. Mackey and N. Mackey (2001). "Hamilton and Jacobi Come Full Circle: Jacobi Algorithms for Structured Hamiltonian Eigenproblems," Lin. Alg. Applic. 332-4, 37-80. D.S. Watkins (2006). "On the Reduction of a Hamiltonian Matrix to Hamiltonian Schur Form,'' ETNA 23, 141-157. D.S. Watkins (2004). "On Hamiltonian and Symplectic Lanczos Processes," Lin. Alg. Applic. 385, 23-45. D. Chu, X. Liu, and V. Mehrmann (2007). "A Numerical Method for Computing the Hamiltonian Schur Form," Numer. Math. 105, 375-412. Generalized eigenvalue problems that involve Hamiltonian matrices also arise: P. Benner, V. Mehrmann, and H. Xu (1998). "A Numerically Stable, Structure Preserving Method for Computing the Eigenvalues of Real Hamiltonian or Symplectic Pencils," Numer. Math. 78, 329-358.
  • 450. 426 Chapter 7. Unsymmetric Eigenvalue Problems C. Mehl (2000). "Condensed Forms for Skew-Hamiltonian/Hamiltonian Pencils," SIAM J. Matrix Anal. Applic. 21, 454-476. V. Mehrmann and D.S. Watkins (2001). "Structure-Preserving Methods for Computing Eigenpairs of Large Sparse Skew-Hamiltonian/Hamiltonian Pencils,'' SIAM J. Sci. Comput. 22, 1905-1925. P. Benner and R. Byers, V. Mehrmann, and H. Xu (2002). "Numerical Computation of Deflating Subspaces of Skew-Hamiltonian/Hamiltonian Pencils," SIAM J. Matrix Anal. Applic. 24, 165- 190. Methods for symplectic eigenvalue problems are discussed in: P. Benner, H. Fassbender and D.S. Watkins (1999). "SR and SZ Algorithms for the Symplectic (Butterfly) Eigenproblem," Lin. Alg. Applic. 287, 41-76. The Golub-Kahan SYD algorithm that we discuss in the next chapter does not form ATA or AAT despite the rich connection to the Schurdecompositionsofthose matrices. From that point on there has been an appreciation for the numerical dangers associated with explicit products. Here is a sampling of the literature: C. Van Loan (1975). "A General Matrix Eigenvalue Algorithm,'' SIAM J. Numer. Anal. 12, 819-834. M.T. Heath, A.J. Laub, C.C. Paige, and R.C. Ward (1986). "Computing the SYD of a Product of Two Matrices," SIAM J. Sci. Stat. Comput. 7, 1147-1159. R. Mathias (1998). "Analysis of Algorithms for Orthogonalizing Products of Unitary Matrices,'' Num. Lin. Alg. 3, 125--145. G. Golub, K. Solna, and P. Van Dooren (2000). "Computing the SYD of a General Matrix Prod­ uct/Quotient," SIAM J. Matrix Anal. Applic. 22, 1-19. D.S. Watkins (2005). "Product Eigenvalue Problems," SIAM Review 4 7, 3-40. R. Granat and B. Kgstrom (2006). "Direct Eigenvalue Reordering in a Product of Matrices in Periodic Schur Form,'' SIAM J. Matrix Anal. Applic. 28, 285-300. Finally we mention that there is a substantial body of work concerned with structured error analysis and structured perturbation theory for structured matrix problems, see: F. Tisseur (2003). "A Chart of Backward Errors for Singly and Doubly Structured Eigenvalue Prob­ lems,'' SIAM J. Matrix Anal. Applic. 24, 877-897. R. Byers and D. Kressner (2006). "Structured Condition Numbers for Invariant Subspaces," SIAM J. Matrix Anal. Applic. 28, 326-347. M. Karow, D. Kressner, and F. Tisseur (2006). "Structured Eigenvalue Condition Numbers," SIAM J. Matrix Anal. Applic. 28, 1052-1068. 7.9 Pseudospectra If the purpose of computing is insight, then it is easy to see why the well-conditioned eigenvector basis is such a valued commodity, for in many matrix problems, replace­ ment of A with its diagonalization x-1AX leads to powerful, analytic simplifications. However, the insight-through-eigensystem paradigm has diminished impact in problems where the matrix of eigenvectors is ill-conditioned or nonexistent. Intelligent invariant subspace computation as discussed in §7.6 is one way to address the shortfall; pseu­ dospectra are another. In this brief section we discuss the essential ideas behind the theory and computation of pseudospectra. The central message is simple: if you are working with a nonnormal matrix, then a graphical pseudospectral analysis effectively tells you just how much to trust the eigenvalue/eigenvector "story." A slightly awkward feature of our presentation has to do with the positioning of this section in the text. As we will see, SVD calculations are an essential part of the pseudospectra scene and we do not detail dense matrix algorithms for that im­ portant decomposition until the next chapter. 
However, it makes sense to introduce the pseudospectra concept here at the end of Chapter 7 while the challenges of the
unsymmetric eigenvalue problem are fresh in mind. Moreover, with this "early" foundation we can subsequently present various pseudospectra insights that concern the behavior of the matrix exponential (§9.3), the Arnoldi method for sparse unsymmetric eigenvalue problems (§10.5), and the GMRES method for sparse unsymmetric linear systems (§11.4). For maximum generality, we investigate the pseudospectra of complex, nonnormal matrices. The definitive pseudospectra reference is Trefethen and Embree (SAP). Virtually everything we discuss is presented in greater detail in that excellent volume.

7.9.1 Motivation

In many settings, the eigenvalues of a matrix "say something" about an underlying phenomenon. For example, if

    A = [ λ1  M  ]
        [ 0   λ2 ],        M > 0,

then

    lim_{k→∞} || A^k ||_2 = 0

if and only if |λ1| < 1 and |λ2| < 1. This follows from Lemma 7.3.1, a result that we needed to establish the convergence of the QR iteration. Applied to our 2-by-2 example, the lemma can be used to show that

    || A^k ||_2 ≤ M_ε (ρ(A) + ε)^k

for any ε > 0, where ρ(A) = max{|λ1|, |λ2|} is the spectral radius and M_ε is a constant that depends on ε. By making ε small enough in this inequality, we can draw a conclusion about the asymptotic behavior of A^k:

    If ρ(A) < 1, then asymptotically A^k converges to zero as ρ(A)^k.                  (7.9.1)

However, while the eigenvalues adequately predict the limiting behavior of || A^k ||_2, they do not (by themselves) tell us much about what is happening if k is small. Indeed, if λ1 ≠ λ2, then using the diagonalization

    A = [ 1  M/(λ2 - λ1) ] [ λ1  0  ] [ 1  M/(λ2 - λ1) ]^{-1}                          (7.9.2)
        [ 0       1      ] [ 0   λ2 ] [ 0       1      ]

we can show that

    A^k = [ λ1^k    M Σ_{i=0}^{k-1} λ1^{k-1-i} λ2^i ]                                  (7.9.3)
          [  0                 λ2^k                 ].

Consideration of the (1,2) entry suggests that A^k may grow before decay sets in. This is affirmed in Figure 7.9.1 where the size of || A^k ||_2 is tracked for the example

    A = [ 0.999  1000  ]
        [ 0.0    0.998 ].
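The behavior tracked in Figure 7.9.1 is easy to reproduce. The following MATLAB lines are ours, not the book's; they simply power up the 2-by-2 example and record the 2-norms.

    % Transient growth of ||A^k||_2 for a matrix with spectral radius < 1.
    A   = [0.999 1000; 0.0 0.998];
    K   = 5000;
    nrm = zeros(K,1);
    Ak  = eye(2);
    for k = 1:K
        Ak     = A*Ak;                % A^k
        nrm(k) = norm(Ak);            % ||A^k||_2
    end
    plot(1:K, nrm), xlabel('k'), ylabel('||A^k||_2')
    [peak, kpeak] = max(nrm)          % transient peak and the k at which it occurs

The peak is on the order of 10^5 even though ρ(A) = 0.999; after the peak the ρ(A)^k decay promised by (7.9.1) takes over.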
[Figure 7.9.1. || A^k ||_2 can grow even if ρ(A) < 1. The plot tracks || A^k ||_2 for k up to 5000.]

Thus, it is perhaps better to augment (7.9.1) as follows:

    If ρ(A) < 1, then asymptotically A^k converges to zero like ρ(A)^k.
    However, A^k may grow substantially before exponential decay sets in.              (7.9.4)

This example, with its ill-conditioned eigenvector matrix displayed in (7.9.2), points to just why classical eigenvalue analysis is not so informative for nonnormal matrices. Ill-conditioned eigenvector bases create a discrepancy between how A behaves and how its diagonalization XΛX^{-1} behaves. Pseudospectra analysis and computation narrow this gap.

7.9.2 Definitions

The pseudospectra idea is a generalization of the eigenvalue idea. Whereas the spectrum Λ(A) is the set of all z ∈ C that make σ_min(zI − A) zero, the ε-pseudospectrum of a matrix A ∈ C^{n×n} is the subset of the complex plane defined by

    Λ_ε(A) = { z ∈ C : σ_min(zI − A) ≤ ε }.                                            (7.9.5)

If λ ∈ Λ_ε(A), then λ is an ε-pseudoeigenvalue of A. A unit 2-norm vector v that satisfies || (A − λI)v ||_2 ≤ ε is a corresponding ε-pseudoeigenvector. Note that if ε is zero, then Λ_ε(A) is just the set of A's eigenvalues, i.e., Λ_0(A) = Λ(A). We mention that because of their interest in what pseudospectra say about general linear operators, Trefethen and Embree (2005) use a strict inequality in the definition (7.9.5). The distinction has no impact in the matrix case.
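Definition (7.9.5) translates directly into a brute-force plotting procedure: evaluate σ_min(zI − A) on a grid of z-values and draw level curves. The MATLAB sketch below is ours (it is not Eigtool, and the grid limits and contour levels are ad hoc choices for the 2-by-2 example of §7.9.1); §7.9.5 indicates how serious implementations avoid a full SVD at every grid point.

    % Crude pseudospectrum plot: sigma_min(zI - A) on a grid, then log10 contours.
    A = [0.999 1000; 0 0.998];
    x = linspace(0.95, 1.05, 300);
    y = linspace(-0.05, 0.05, 300);
    S = zeros(length(y), length(x));
    for i = 1:length(y)
        for j = 1:length(x)
            z      = x(j) + 1i*y(i);
            S(i,j) = min(svd(z*eye(2) - A));   % sigma_min(zI - A)
        end
    end
    contour(x, y, log10(S), -8:-6)             % boundaries of Lambda_eps, eps = 1e-8, 1e-7, 1e-6
    hold on, plot(eig(A), zeros(2,1), 'x')     % the two (real) eigenvalues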
Equivalent definitions of Λ_ε(·) include

    Λ_ε(A) = { z ∈ C : || (zI − A)^{-1} ||_2 ≥ 1/ε },                                  (7.9.6)

which highlights the resolvent (zI − A)^{-1}, and

    Λ_ε(A) = { z ∈ C : z ∈ Λ(A + E) for some E ∈ C^{n×n} with || E ||_2 ≤ ε },         (7.9.7)

which characterizes pseudospectra as (traditional) eigenvalues of nearby matrices. The equivalence of these three definitions is a straightforward verification that makes use of Chapter 2 facts about singular values, 2-norms, and matrix inverses. We mention that greater generality can be achieved in (7.9.6) and (7.9.7) by replacing the 2-norm with an arbitrary matrix norm.

7.9.3 Display

The pseudospectrum of a matrix is a visible subset of the complex plane so graphical display has a critical role to play in pseudospectra analysis. The MATLAB-based Eigtool system developed by Wright (2002) can be used to produce pseudospectra plots that are as pleasing to the eye as they are informative. Eigtool's pseudospectra plots are contour plots where each contour displays the z-values associated with a specified value of ε. Since

    ε1 ≤ ε2   ==>   Λ_ε1(A) ⊆ Λ_ε2(A),

the typical pseudospectral plot is basically a topographical map that depicts the function f(z) = σ_min(zI − A) in the vicinity of the eigenvalues. We present three Eigtool-produced plots that serve as illuminating examples. The first involves the n-by-n Kahan matrix Kah_n(s), e.g.,

    Kah_5(s) = [ 1  -c    -c      -c      -c     ]
               [ 0   s    -sc     -sc     -sc    ]
               [ 0   0     s^2    -s^2 c  -s^2 c ]        c^2 + s^2 = 1.
               [ 0   0     0       s^3    -s^3 c ]
               [ 0   0     0       0       s^4   ]

Recall that we used these matrices in §5.4.3 to show that QR with column pivoting can fail to detect rank deficiency. The eigenvalues {1, s, s^2, ..., s^{n-1}} of Kah_n(s) are extremely sensitive to perturbation. This is revealed by considering the ε = 10^{-5} contour that is displayed in Figure 7.9.2 together with Λ(Kah_n(s)). The second example is the Demmel matrix Dem_n(β), e.g.,

    Dem_5(β) = [ 1  β  β^2  β^3  β^4 ]
               [ 0  1  β    β^2  β^3 ]
               [ 0  0  1    β    β^2 ]
               [ 0  0  0    1    β   ]
               [ 0  0  0    0    1   ]
[Figure 7.9.2. Λ_ε(Kah_30(s)) with s^29 = 0.1 and contours for ε = 10^{-2}, ..., 10^{-6}.]

The matrix Dem_n(β) is defective and has the property that very small perturbations can move an original eigenvalue to a position that is relatively far out on the imaginary axis. See Figure 7.9.3. The example is used to illuminate the nearness-to-instability problem presented in P7.9.12.

[Figure 7.9.3. Λ_ε(Dem_50(β)) with β^49 = 10^8 and contours for ε = 10^{-2}, ..., 10^{-6}.]
The last example concerns the pseudospectra of the MATLAB "Gallery(5)" matrix:

    G_5 = [   -9     11    -21     63   -252 ]
          [   70    -69    141   -421   1684 ]
          [ -575    575  -1149   3451 -13801 ]
          [ 3891  -3891   7782 -23345  93365 ]
          [ 1024  -1024   2048  -6144  24572 ]

Notice in Figure 7.9.4 that Λ_{10^{-13.5}}(G_5) has five components. In general, it can be shown that each connected component of Λ_ε(A) contains at least one eigenvalue of A.

[Figure 7.9.4. Λ_ε(G_5) with contours for ε = 10^{-11.5}, 10^{-12}, ..., 10^{-13.5}, 10^{-14}.]

7.9.4 Some Elementary Properties

Pseudospectra are subsets of the complex plane so we start with a quick summary of notation. If S1 and S2 are subsets of the complex plane, then their sum S1 + S2 is defined by

    S1 + S2 = { s : s = s1 + s2, s1 ∈ S1, s2 ∈ S2 }.

If S1 consists of a single complex number α, then we write α + S2. If S is a subset of the complex plane and β is a complex number, then β·S is defined by

    β·S = { βz : z ∈ S }.

The disk of radius ε centered at the origin is denoted by Δ_ε = { z : |z| ≤ ε }. Finally, the distance from a complex number z0 to a set of complex numbers S is defined by

    dist(z0, S) = min{ |z0 − z| : z ∈ S }.
Our first result is about the effect of translation and scaling. For eigenvalues we have Λ(αI + βA) = α + β·Λ(A). The following theorem establishes an analogous result for pseudospectra.

Theorem 7.9.1. If α, β ∈ C and A ∈ C^{n×n}, then Λ_{ε|β|}(αI + βA) = α + β·Λ_ε(A).

Proof. Note that

    Λ_ε(αI + A) = { z : || (zI − (αI + A))^{-1} || ≥ 1/ε }
                = { z : || ((z − α)I − A)^{-1} || ≥ 1/ε }
                = α + { z − α : || ((z − α)I − A)^{-1} || ≥ 1/ε }
                = α + { z : || (zI − A)^{-1} || ≥ 1/ε } = α + Λ_ε(A)

and

    Λ_{ε|β|}(β·A) = { z : || (zI − βA)^{-1} || ≥ 1/(|β|ε) }
                  = { z : || ((z/β)I − A)^{-1} || ≥ 1/ε }
                  = β·{ z/β : || ((z/β)I − A)^{-1} || ≥ 1/ε }
                  = β·{ z : || (zI − A)^{-1} || ≥ 1/ε } = β·Λ_ε(A).

The theorem readily follows by composing these two results. □

General similarity transforms preserve eigenvalues but not ε-pseudoeigenvalues. However, a simple inclusion property holds in the pseudospectra case.

Theorem 7.9.2. If B = X^{-1}AX, then Λ_ε(B) ⊆ Λ_{εκ2(X)}(A).

Proof. If z ∈ Λ_ε(B), then

    1/ε ≤ || (zI − B)^{-1} || = || X^{-1}(zI − A)^{-1}X || ≤ κ2(X) || (zI − A)^{-1} ||,

from which the theorem follows. □

Corollary 7.9.3. If X ∈ C^{n×n} is unitary and A ∈ C^{n×n}, then Λ_ε(X^{-1}AX) = Λ_ε(A).

Proof. The proof is left as an exercise. □

The ε-pseudospectrum of a diagonal matrix is the union of ε-disks.

Theorem 7.9.4. If D = diag(λ1, ..., λn), then Λ_ε(D) = {λ1, ..., λn} + Δ_ε.

Proof. The proof is left as an exercise. □
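Before continuing, we note that Theorem 7.9.1 is easy to check numerically through (7.9.5), since zI − (αI + βA) = β(((z − α)/β)I − A) and singular values scale with |β|. The lines below are an illustrative check of ours (random data, arbitrary α and β); they are not part of the text.

    % Check the scaling behind Theorem 7.9.1:
    % sigma_min(zI - (alpha*I + beta*A)) = |beta| * sigma_min(((z-alpha)/beta)*I - A).
    n     = 5;
    A     = randn(n) + 1i*randn(n);
    alpha = 2 - 1i;   beta = -0.3 + 0.7i;     % arbitrary shift and scale
    z     = randn + 1i*randn;                 % arbitrary test point
    lhs   = min(svd(z*eye(n) - (alpha*eye(n) + beta*A)));
    rhs   = abs(beta)*min(svd(((z - alpha)/beta)*eye(n) - A));
    rel_gap = abs(lhs - rhs)/rhs              % roundoff-level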
  • 457. 7.9. Pseudospectra Corollary 7.9.5. If A E <Cnxn is normal, then A,(A) = A(A) + D., . 433 Proof. Since A is normal, it has a diagonal Schur form QHAQ = diag(A1 , . . . , An) = D with unitary Q. The proof follows from Theorem 7.9.4. D If T = (Tij) is a 2-by-2 block triangular matrix, then A(T) = A(T11) U A(T22). Here is the pseudospectral analog: Theorem 7.9.6. If T = [T11 T12 l 0 T22 with square diagonal blocks, then A,(T11 ) U A,(T22) � A,(T). Proof. The proof is left as an exercise. D Corollary 7.9.7. If T = [T11 0 l 0 T22 with square diagonal blocks, then A,(T) = A,(T11) U A,(T22). Proof. The proof is left as an exercise. D The last property in our gallery of facts connects the resolvant (z0I - A)-1 to the distance that separates z0 from A,(A). Theorem 7.9.8. If Zo E <C and A E ccnxn, then 1 dist(zo, A,(A)) � II (zoI - A)-1 112 - f. Proof. For any z E A,(A) we have from Corollary 2.4.4 and (7.9.6) that E � O'rnin (zI - A) = O'min ((zoI - A) - (z - zo)I) � O'min(zoI - A) - lz - zol and thus 1 lz - zol � ll (zoI - A)-1 11 - f. The proof is completed by minimizing over all z E A,(A). D 7. 9. 5 Computing Pseudospectra The production of a pseudospectral contour plot such as those displayed above requires sufficiently accurate approximations of O'min (zI-A) on a grid that consists of (perhaps)
  • 458. 434 Chapter 7. Unsymmetric Eigenvalue Problems lOOO's of z-values. As we will see in §8.6, the computation of the complete SVD of an n-by-n dense matrix is an O(n3) endeavor. Fortunately, steps can be taken to reduce each grid point calculation to O(n2) or less by exploiting the following ideas: 1. Avoid SVD-type computations in regions where am;n(zl - A) is slowly varying. See Gallestey (1998). 2. Exploit Theorem 7.9.6 by ordering the eigenvalues so that the invariant subspace associated with A(T11) captures the essential behavior of (zl - A)-1 . See Reddy, Schmid, and Henningson (1993). 3. Precompute the Schur decomposition QHAQ = T and apply a am;,, algorithm that is efficient for triangular matrices. See Lui (1997). We offer a few comments on the last strategy since it has much in common with the condition estimation problem that we discussed in §3.5.4. The starting point is to recognize that since Q is unitary, The triangular structure of the transformed problem makes it possible to obtain a satisfactory estimate of amin(zl - A) in O(n2) flops. If d is a unit 2-norm vector and (zl - T)y = d, then it follows from the SVD of zl - T that 1 am;n(zl - T) � hl. Let Um;n be a left singular vector associated with andn(zl - T). If d is has a significant component in the direction of Um;n, then Recall that Algorithm 3.5.1 is a cheap heuristic procedure that dynamically determines the right hand side vector d so that the solution to a given triangular system is large in norm. This is tantamount to choosing d so that it is rich in the direction of Um;n. A complex arithmetic, 2-norm variant of Algorithm 3.5.1 is outlined in P7.9.13. It can be applied to zl - T. The resulting d-vector can be refined using inverse iteration ideas, see Toh and Trefethen (1996) and §8.2.2. Other approaches are discussed by Wright and Trefethen (2001). 7.9.6 Computing the E-Pseudospectral Abscissa and Radius The €-pseudospectral abscissa of a matrix A E <Cnxn is the rightmost point on the boundary of AE: aE(A) = max Re(z). (7.9.8) zEA. (A) Likewise, the €-pseudospectral radius is the point of largest magnitude on the boundary of AE:
  • 459. 7.9. Pseudospectra p,(A) = max lzl. zEA, (A) 435 (7.9.9) These quantities arise in the analysis of dynamical systems and effective iterative algo­ rithms for their estimation have been proposed by Burke, Lewis, and Overton (2003) and Mengi and Overton (2005). A complete presentation and analysis of their very clever optimization procedures, which build on the work of Byers (1988), is beyond the scope of the text. However, at their core they involve interesting intersection problems that can be reformulated as structured eigenvalue problems. For example, if i · r is an eigenvalue of the matrix -El l ie-i6A ' (7.9.10) then E is a singular value of A - re i6I. To see this, observe that if then (A - rei6I)H (A - re i6I)g = E2g. The complex version of the SVD (§2.4.4) says that E is a singular value of A - re16I. It can be shown that if irmax is the largest pure imaginary eigenvalue of M, then This result can be used to compute the intersection of the ray { re i6 : R ;::: 0 } and the boundary of A,(A). This computation is at the heart ofcomputing the E-pseudospectral radius. See Mengi and Overton (2005). 7.9.7 Matrix Powers and the E-Pseudospectral Radius At the start of this section we used the example [0.999 1000 l A = 0.000 0.998 to show that II Ak 112 can grow even though p(A) < 1. This kind of transient behavior can be anticipated by the pseudospectral radius. Indeed, it can be shown that for any f > 0, sup II Ak 112 2:: p,(A) - 1 · k�O f (7.9.11) See Trefethen and Embree (SAP, pp. 160-161). This says that transient growth will occur if there is a contour {z:ll ( llzl - A)-1 = l/E } that extends beyond the unit disk. For the above 2-by-2 example, if E = 10-8, then p,(A) � 1.0017 and the inequality (7.9.11) says that for some k, II Ak 112 2:: 1.7 x 105. This is consistent with what is displayed in Figure 7.9.1.
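A crude way to estimate ρ_ε(A) for the 2-by-2 example of §7.9.1 is to scan a grid near the eigenvalues, keep the points where σ_min(zI − A) ≤ ε, and take the largest modulus. The MATLAB sketch below is ours (the grid limits are ad hoc; the serious algorithms are those of Mengi and Overton cited above). It also checks the lower bound sup_k || A^k ||_2 ≥ (ρ_ε(A) − 1)/ε of (7.9.11).

    % Grid estimate of the eps-pseudospectral radius and a check of (7.9.11).
    A  = [0.999 1000; 0 0.998];
    ep = 1e-8;
    x  = linspace(0.995, 1.005, 400);
    y  = linspace(-0.005, 0.005, 400);
    rho = 0;
    for i = 1:length(x)
        for j = 1:length(y)
            z = x(i) + 1i*y(j);
            if min(svd(z*eye(2) - A)) <= ep
                rho = max(rho, abs(z));       % largest |z| found in Lambda_eps
            end
        end
    end
    lower_bound = (rho - 1)/ep                % roughly 1.7e5 for this example
    peak = 0;  Ak = eye(2);                   % compare with the transient peak of ||A^k||_2
    for k = 1:5000
        Ak = A*Ak;  peak = max(peak, norm(Ak));
    end
    peak                                      % comes out above lower_bound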
  • 460. 436 Problems Chapter 7. Unsymmetric Eigenvalue Problems P7.9.1 Show that the definitions (7.9.5), (7.9.6), and (7.9.7) are equivalent. P7.9.2 Prove Corollary 7.9.3. P7.9.3 Prove Theorem 7.9.4. P7.9.4 Prove Theorem 7.9.6. P7.9.5 Prove Corollary 7.9.7. P7.9.6 Show that if A, E E a:;nxn, then A,(A + E) � A•+llEll2 (A). P7.9.7 Suppose O'm;n(z1 I - A) = fl and O'm;0(z2/ - A) = €2· Prove that there exists a real number µ, so that if Z3 = (1 - µ,)z1 + µ,z2, then O'm;u (z3/ - A) = (€! + €2)/2? P7.9.B Suppose A E a:;nxn is normal and E E a:;nxn is nonnormal. State and prove a theorem about A,(A + E). P7.9.9 Explain the connection between Theorem 7.9.2 and the Bauer-Fike Theorem (Theorem 7.2.2). P7.9.10 Define the matrix J E R2nx2n by J = [ -�n In ] 0 . (a) The matrix H E R2nx2n is a Hamiltonian matrix if JTHJ = -HT. It is easy to show that if H is Hamiltonian and >. E A(H), then ->. E A(H). Does it follow that if >. E A,(H), then ->. E A,(H)? (b) The matrix S E R2nx2n is a symplectic matrix if JTSJ = s-T. It is easy to show that if S is symplectic and >. E A(S), then 1/>. E A(S). Does it follow that if >. E A,(S), then 1/>. E A,(S)? P7.9.ll Unsymmetric Toeplitz matrices tend to have very ill-conditioned eigensystems and thus have interesting pseudospectral properties. Suppose 1 A [: 0 0 O< 0 (a) Construct a diagonal matrix S so that s-1AS = B is symmetric and tridiagonal with l's on its subdiagonal and superdiagonal. (b) What can you say about the condition of A's eigenvector matrix? P7.9.12 A matrix A E a:;nxn is stable if all of its eigenvalues have negative real parts. Consider the problem of minimizing II E 1'2 subject to the constraint that A + E has an eigenvalue on the imaginary axis. Explain why this optimization problem is equivalent to minimizing Um;,, (irl - A) over all r E R. If E. is a minimizing E, then II E 1'2 can be regarded as measure of A's nearness to instability. What is the connection between A's nearness to instability and o,(A)? P7.9.13 This problem is about the cheap estimation of the minimum singular value of a matrix, a critical computation that is performed over an over again during the course of displaying the pseu­ dospectrum of a matrix. In light of the discussion in §7.9.5, the challenge is to estimate the smallest singular value of an upper triangular matrix U = T - zl where T is the Schur form of A E Rnxn. The condition estimation ideas of §3.5.4 are relevant. We want to determine a unit 2-norm vector d E q:n such that the solution to Uy = d has a large 2-norm for then O'n,;n (U) � 1/11 y 1'2· (a) Suppose U = [ U�l �: ] y = [ : ] d = where u 1 1 , T E <C, u, z,d1 E a:;n-l , U1 E <C(n-l) x (n-l)' II di 112 = 1, U1y1 = d1, and c2 + s2 = 1. Give an algorithm that determines c and s so that if Uy = d, then II y 112 is as large as possible. Hint: This is a 2-by-2 SVD problem. (b) Using part (a), develop a nonrecursive method for estimating O'rn;n(U(k:n, k:n)) for k = n: - 1:1. Notes and References for §7. 7 Besides Trefethen and Embree (SAP), the following papers provide a nice introduction to the pseu­ dospectra idea:
  • 461. 7.9. Pseudospectra 437 M. Embree and L.N. Trefethen (2001). "Generalizing Eigenvalue Theorems to Pseudospectra Theo- rems," SIAM J. Sci. Comput. 23, 583-590. L.N. Trefethen (1997). "Pseudospectra of Linear Operators," SIAM Review 39, 383-406. For more details concerning the computation and display of pseudoeigenvalues, see: s.C. Reddy, P.J. Schmid, and D.S. Henningson (1993). "Pseudospectra of the Orr-Sommerfeld Oper­ ator," SIAM J. Applic. Math. 53, 15-47. s.-H. Lui (1997). "Computation of Pseudospectra by Continuation,'' SIAM J. Sci. Comput. 18, 565-573. E. Gallestey (1998). "Computing Spectral Value Sets Using the Subharmonicity of the Norm of Rational Matrices,'' BIT, 38, 22-33. L.N. Trefethen (1999). "Computation of Pseudospectra," Acta Numerica 8, 247-295. T.G. Wright (2002). Eigtool, http://www.comlab.ox.ac.uk/pseudospectra/eigtool/. Interesting extensions/generalizations/applications of the pseudospectra idea include: L. Reichel and L.N. Trefethen (1992). "Eigenvalues and Pseudo-Eigenvalues of Toeplitz Matrices,'' Lin. Alg. Applic. 164-164, 153-185. K-C. Toh and L.N. Trefethen (1994). "Pseudozeros of Polynomials and Pseudospectra of Companion Matrices," Numer. Math. 68, 403-425. F. Kittaneh (1995). "Singular Values of Companion Matrices and Bounds on Zeros of Polynomials,'' SIAM J. Matrix Anal. Applic. 16, 333-340. N.J. Higham and F. Tisseur (2000). "A Block Algorithm for Matrix 1-Norm Estimation, with an Application to 1-Norm Pseudospectra,'' SIAM J. Matrix Anal. Applic. 21, 1185-1201. T.G. Wright and L.N. Trefethen (2002). "Pseudospectra of Rectangular matrices," IMA J. Numer. Anal. 22, 501-·519. R. Alam and S. Bora (2005). "On Stable Eigendecompositions of Matrices,'' SIAM J. Matrix Anal. Applic. 26, 830-848. Pseudospectra papers that relate to the notions ofcontrollability and stability oflinear systems include: J.V. Burke and A.S. Lewis. and M.L. Overton (2003). "Optimization and Pseudospectra, with Applications to Robust Stability," SIAM J. Matrix Anal. Applic. 25, 80-104. J.V. Burke, A.S. Lewis, and M.L. Overton (2003). "Robust Stability and a Criss-Cross Algorithm for Pseudospectra," IMA J. Numer. Anal. 23, 359-375. J.V. Burke, A.S. Lewis and M.L. Overton (2004). "Pseudospectral Components and the Distance to Uncontrollability," SIAM J. Matrix Anal. Applic. 26, 350-361. The following papers are concerned with the computation of the numerical radius, spectral radius, and field of values: C. He and G.A. Watson (1997). "An Algorithm for Computing the Numerical Radius," IMA J. Numer. Anal. 17, 329-342. G.A. Watson (1996). "Computing the Numerical Radius" Lin. Alg. Applic. 234, 163-172. T. Braconnier and N.J. Higham (1996). "Computing the Field of Values and Pseudospectra Using the Lanczos Method with Continuation," BIT 36, 422-440. E. Mengi and M.L. Overton (2005). "Algorithms for the Computation of the Pseudospectral Radius and the Numerical Radius of a Matrix," IMA J. Numer. Anal. 25, 648-669. N. Guglielmi and M. Overton (2011). "Fast Algorithms for the Approximation of the Pseudospectral Abscissa and Pseudospectral Radius of a Matrix," SIAM J. Matrix Anal. Applic. 32, 1166-1192. For more insight into the behavior of matrix powers, see: P. Henrici (1962). "Bounds for Iterates, Inverses, Spectral Variation, and Fields of Values of Non­ normal Matrices," Numer. Math.4, 24-40. J. Descloux (1963). "Bounds for the Spectral Norm of Functions of Matrices," Numer. Math. 5, 185-90. T. Ransford (2007). "On Pseudospectra and Power Growth,'' SIAM J. Matrix Anal. 
Applic. 29, 699-711. As an example of what pseudospectra can tell us about highly structured matrices, see: L. Reichel and L.N. Trefethen (1992). "Eigenvalues and Pseudo-eigenvalues of Toeplitz Matrices,'' Lin. Alg. Applic. 162/163/164, 153-186.
  • 463. Chapter 8 Symmetric Eigenvalue Problems 8 . 1 Properties and Decompositions 8.2 Power Iterations 8.3 The Symmetric QR Algorithm 8.4 More Methods for Tridiagonal Problems 8.5 Jacobi Methods 8.6 Computing the SVD 8.7 Generalized Eigenvalue Problems with Symmetry The symmetric eigenvalue problem with its rich mathematical structure is one of the most aesthetically pleasing problems in numerical linear algebra. We begin with a brief discussion of the mathematical properties that underlie the algorithms that follow. In §8.2 and §8.3 we develop various power iterations and eventually focus on the sym­ metric QR algorithm. Methods for the important case when the matrix is tridiagonal are covered in §8.4. These include the method of bisection and a divide and conquer technique. In §8.5 we discuss Jacobi's method, one of the earliest matrix algorithms to appear in the literature. This technique is of interest because it is amenable to parallel computation and because of its interesting high-accuracy properties. The computa­ tion of the singular value decomposition is detailed in §8.6. The central algorithm is a variant of the symmetric QR iteration that works on bidiagonal matrices. In §8.7 we discuss the generalized eigenvalue problem Ax = >..Bx for the impor­ tant case when A is symmetric and B is symmetric positive definite. The generalized singular value decomposition ATAx = µ2BTBx is also covered. The section concludes with a brief examination of the quadratic eigenvalue problem (>..2M + >..C + K)x = 0 in the presence of symmetry, skew-symmetry, and definiteness. Reading Notes Knowledge of Chapters 1-3 and §5.1-§5.2 are assumed. Within this chapter there are the following dependencies: 439
  • 464. 440 §8.4 t Chapter 8. Symmetric Eigenvalue Problems §8.1 "'""* §8.2 "'""* §8.3 "'""* §8.6 "'""* §8.7 .!. §8.5 Many of the algorithms and theorems in this chapter have unsymmetric counterparts in Chapter 7. However, except for a few concepts and definitions, our treatment of the symmetric eigenproblem can be studied before reading Chapter 7. Complementary references include Wilkinson (AEP), Stewart (MAE), Parlett (SEP), and Stewart and Sun (MPA). 8.1 Properties and Decompositions In this section we summarize the mathematics required to develop and analyze algo­ rithms for the symmetric eigenvalue problem. 8.1.1 Eigenvalues and Eigenvectors Symmetry guarantees that all of A's eigenvalues are real and that there is an orthonor­ mal basis of eigenvectors. Theorem 8.1.1 (Symmetric Schur Decomposition). If A E JR.nxn is symmetric, then there exists a real orthogonal Q such that QTAQ = A = diag(>.i , . . . , >.n)· Moreover, for k = l:n, AQ(:, k) = >.kQ(:, k). Compare with Theorem 7. 1.3. Proof. Suppose >.i E >.(A) and that x E ccn is a unit 2-norm eigenvector with Ax = >.ix. Since >.1 = xHAx = xHAHx = xHAx = >.1 it follows that >.i E JR.. Thus, we may assume that x E JR.n. Let Pi E JR.nxn be a Householder matrix such that P'{x = ei = In(:, 1). It follows from Ax = >.1x that (P'{APi)ei = >.ei. This says that the first column of P'{APi is a multiple of e1. But since P'{AP1 is symmetric, it must have the form T [A1 Q l Pi APi = 0 Ai where Ai E JR,(n-l)x{n-i) is symmetric. By induction we may assume that there is an orthogonal Qi E JR,(n-i)x(n-l) such that QfA1Qi = Ai is diagonal. The theorem follows by setting Q = P1 [1 0 l and A - [>.i 0 l 0 Q1 - 0 Ai and comparing columns in the matrix equation AQ = QA. 0 For a symmetric matrix A we shall use the notation >.k(A) to designate the kth largest eigenvalue, i.e.,
  • 465. 8.1. Properties and Decompositions 441 It follows from the orthogonal invariance of the 2-norm that A has singular values {J,1(A)J, . . . , J,n(A)J} and II A 112 = max{ IA1(A)I , l,n(A)I }. The eigenvalues of a symmetric matrix have a minimax characterization that revolves around the quadratic form xTAx/xTx. Theorem 8.1.2 (Courant-Fischer Minimax Theorem). If A E JR.nxn is symmet­ ric, then yTAy max min dim(S)=k O#yES yTY for k = l:n. Proof. Let QTAQ = diag(,i) be the Schur decomposition with ,k Q = [ qi J . . · J qn ] . Define sk = span{q1, . . . , qk}, the invariant subspace associated with Ai, . . . , ,k· It is easy to show that yTAy max min > dim(S)=k O#yES yTy To establish the reverse inequality, let S be any k-dimensional subspace and note that it must intersect span{qk, . . . , qn}, a subspace that has dimension n - k + 1. If y* = akqk + · · · + O:nqn is in this intersection, then Since this inequality holds for all k-dimensional subspaces, max dim(S)=k thereby completing the proof of the theorem. D Note that if A E JR.nxn is symmetric positive definite, then An(A) > 0. 8.1.2 Eigenvalue Sensitivity An important solution framework for the symmetric eigenproblem involves the pro­ duction of a sequence of orthogonal transformations {Qk} with the property that the matrices QfAQk are progressively "more diagonal." The question naturally arises, how well do the diagonal elements of a matrix approximate its eigenvalues?
  • 466. 442 Chapter 8. Symmetric Eigenvalue Problems Theorem 8.1.3 (Gershgorin). Suppose A E 1Rnxn is symmetric and that Q E 1Rnxn is orthogonal. If QTAQ = D + F where D = diag(d1, . • • , dn) and F has zero diagonal entries, then where ri n >.(A) � u [di - Ti, di + Ti] i=l n L l/ijl for i = l:n. Compare with Theorem 7.2. 1. j=l Proof. Suppose >. E >.(A) and assume without loss of generality that >. =I- di for i = l:n. Since (D - >.I) + F is singular, it follows from Lemma 2.3.3 that for some k, 1 :::; k :::; n. But this implies that >. E [dk - rk, dk + rk]· D The next results show that if A is perturbed by a symmetric matrix E, then its eigenvalues do not move by more than 11 E llF· Theorem 8.1.4 (Wielandt-Hoffman). If A and A + E are n-by-n symmetric ma­ trices, then n L (>.i(A + E) - Ai(A))2 :::; II E II! . i=l Proof. See Wilkinson (AEP, pp. 104-108), Stewart and Sun (MPT, pp. 189-191), or Lax (1997, pp. 134-136). D Theorem 8.1.5. If A and A + E are n-by-n symmetric matrices, then k = l:n. Proof. This follows from the minimax characterization. For details see Wilkinson (AEP, pp. 101-102) or Stewart and Sun (MPT, p. 203). D Corollary 8.1.6. If A and A + E are n-by-n symmetric matrices, then for k = l:n. Proof. Observe that for k = l:n. D
  • 467. 8.1. Properties and Decompositions 443 A pair of additional perturbation results that are important follow from the minimax property. Theorem 8.1.7 {Interlacing Property). If A E Rnxn is symmetric and Ar = A(l:r, l:r), then Ar+i(Ar+l) $ Ar(Ar) $ Ar(Ar+l) $ · · · $ A2(Ar+1) $ A1(Ar) $ Ai(Ar+l) for r = l:n - 1. Proof. Wilkinson (AEP, pp. 103-104). D Theorem 8.1.8. Suppose B = A + TCCT where A E Rnxn is symmetric, c E Rn has unit 2-norm, and T E R.. IfT � 0, then while if T $ 0 then i = 2:n, i = l:n- 1 . In either case, there exist nonnegative m1 , . . . ,mn such that i = l:n with mi + · · · + mn = 1. Proof. Wilkinson (AEP, pp. 94-97). See also P8.1.8. D 8.1.3 Invariant Subspaces If S � Rn and x E S ::::} Ax E S, then S is an invariant subspace for A E Rnxn. Note that if x E Ris an eigenvector for A, then S = span{x} is !-dimensional invariant subspace. Invariant subspaces serve to "take apart" the eigenvalue problem and figure heavily in many solution frameworks. The following theorem explains why. Theorem 8.1.9. Suppose A E Rnxn is symmetric and that r n-r is orthogonal. If ran(Qi) is an invariant subspace, then QTAQ = D = [ Di 0 ] r 0 D2 n-r r n-r and >.(A) = >.(Di) U .>.(D2). Compare with Lemma 7.1.2. (8.1.1)
  • 468. 444 Chapter 8. Symmetric Eigenvalue Problems Proof. If QTAQ = [ D1 Efi ], E21 D2 then from AQ = QD we have AQ1 - QiD1 = Q2E21. Since ran(Q1) is invariant, the columns of Q2E21 are also in ran(Qi) and therefore perpendicular to the columns of Q2. Thus, 0 = Qf(AQ1 - QiD1) = QfQ2E21 = E21· and so (8.1.1) holds. It is easy to show det(A - ).Jn) = det(QTAQ - ).Jn) det(D1 - )..Jr)·det(D2 - >.In-r) confirming that >.(A) = >.(D1) U >.(D2). D The sensitivity to perturbation of an invariant subspace depends upon the sep­ aration of the associated eigenvalues from the rest of the spectrum. The appropriate measure of separation between the eigenvalues of two symmetric matrices B and C is given by sep(B, C) min I>. - µI. .>.E.>.(B) µE.>.(C) With this definition we have the following result. (8.1.2) Theorem 8.1.10. Suppose A and A + E are n-by-n symmetric matrices and that r n-r is an orthogonal matrix such that ran(Q1) is an invariant subspace for A. Partition the matrices QTAQ and QTEQ as follows: r n-r II E llF � sep(D;, D2) ' then there exists a matrix P E JR(n-r)xr with 4 (D D ) II E21 llF sep 1, 2 r n- r such that the columns ofQ1 = (Q1 + Q2P)(I + pTP)-1/2 define an orthonormal basis for a subspace that is invariant for A + E. Compare with Theorem 7.2.4. Proof. This result is a slight adaptation of Theorem 4.11 in Stewart (1973). The matrix (I + pTP)-112 is the inverse of the square root of I + pTP. See §4.2.4. D
  • 469. 8.1. Properties and Decompositions Corollary 8.1.11. If the conditions of the theorem hold, then ' 4 dist( ran(Qi), ran(Qi)) ::;; (D D ) II E2i llF· sep i, 2 Compare with Corollary 7.2.5. Proof. It can be shown using the SVD that II P(I + pTP)-if2 112 ::;; II p 112 < II p llF· Since QfQi = P(I + pTP)-1!2 it follows that dist(ran(Qi), ran(Qi)) = 11 QfQi 112 = 11 P(I + pHP)-if2 112 ::;; II P 112 ::;; 411 E21 llF/sep(Di , D2) completing the proof. 0 445 (8.1.3) Thus, the reciprocal of sep(Di, D2) can be thought of as a condition number that measures the sensitivity of ran(Q1 ) as an invariant subspace. The effect of perturbations on a single eigenvector is sufficiently important that we specialize the above results to this case. Theorem 8.1.12. Suppose A and A + E are n-by-n symmetric matrices and that Q = [ q1 I Q2 l 1 n-i is an orthogonal matrix such that Qi is an eigenvector for A. Partition the matrices QTAQ and QTEQ as follows: If 1 n-i d = min IA - /ti > 0 µE >.(D2 ) and then there exists p E JRn-l satisfying ll E llF d < 5' i n-1 such that q1 = (qi +Q2p)/Jl + pTp is a unit 2-norm eigenvectorfor A+E. Moreover,
  • 470. 446 Chapter 8. Symmetric Eigenvalue Problems Compare with Corollary 7.2.6. Proof. Apply Theorem 8.1.10 and Corollary 8.1.11 with r = 1 and observe that if D1 = (A), then d = sep(Di, D2). D 8.1.4 Approximate Invariant Subspaces If the columns of Q1 E Rnxr are independent and the residual matrix R = AQ1 - Q1S is small for some S E R'"xr, then the columns of Q1 define an approximate invariant subspace. Let us discover what we can say about the eigensystem of A when in the possession of such a matrix. Theorem 8.1.13. Suppose A E Rnxn and S E R'"xr are symmetric and that where Qi E Rnxr satisfies QfQi = Ir· Then there exist µi, . . . , µr E A(A) such that for k = l:r. Proof. Let Q2 E Rnx(n-r) be any matrix such that Q = [ Q1 I Q2 J is orthogonal. It follows that B + E and so by using Corollary 8.1.6 we have IAk(A) - Ak(B)I :::; II E 112 for k = l:n. Since A(S) � A(B), there exist µi, . . . , µr E A(A) such that lµk - Ak(S)I :::; 11 E 112 for k = l:r. The theorem follows by noting that for any x E Rr and y E Rn-r we have from which we readily conclude that 11 E 112 :::; v'211 E1 112· D The eigenvalue bounds in Theorem 8.1.13 depend on II AQ1 - Q1S 112. Given A and Qi, the following theorem indicates how to choose S so that this quantity is minimized in the Frobenius norm. Theorem 8.1.14. IfA E Rnxn is symmetric andQi E Rnxr has orthonormal columns, then min and S = QfAQ1 is the minimizer.
  • 471. 8.1. Properties and Decompositions 447 proof. Let Q2 E Rnx(n-r) be such that Q = [ Qi. Q2 ] is orthogonal. For any S E m;xr we have 11 AQi - Qis 11! = 11 QTAQi - QTQis 11! = 11 QfAQi - s 11! + 11 QrAQi 11!. Clearly, the minimizing S is given by S = QfAQi. D This result enables us to associate any r-dimensional subspace ran(Qi), with a set of r "optimal" eigenvalue-eigenvector approximates. Theorem 8.1.15. QfQi = Ir. If Suppose A E Rnxn is symmetric and that Qi E Rnxr satisfies zT(QfAQi)Z = diag(t]i, . . . , Or) = D is the Schur decomposition of QfAQi and QiZ = [ Yi I · · · I Yr ] , then for k = l:r. Proof. It is easy to show that Ayk - OkYk = AQiZek - QiZDek = (AQi - Qi(QfAQi))Zek. The theorem follows by taking norms. D In Theorem 8.1.15, the Ok are called Ritz values, the Yk are called Ritz vectors, and the (Ok,Yk) are called Ritz pairs. The usefulness of Theorem 8.1.13 is enhanced if we weaken the assumption that the columns of Qi are orthonormal. As can be expected, the bounds deteriorate with the loss of orthogonality. Theorem 8.1.16. Suppose A E Rnxn is symmetric and that AXi - XiS = Fi, where Xi E Rnxr and S = X'{AXi. If II X[Xi - Ir 112 = r < 1, then there exist µi, . . . , µr E A(A) such that for k = l:r. Proof. For any Q E Rnxr with orthonormal columns, define Ei E Rnxr by Ei = AQ - QS. It follows that (8.1.4)
  • 472. 448 Chapter 8. Symmetric Eigenvalue Problems and so {8.1.5) Note that {8.1.6) Let UTX1V = E = diag(ui , . . . , ur) be the thin SVD of X1. It follows from {8.1.4) that and thus 1 - u� = T. This implies II Q- X1 lb = II U(Ir - E)VT 112 = II Ir - E 112 = 1 - Ur � 1 - U� = T. {8.1.7) The theorem is established by substituting {8.1.6) and {8.1.7) into (8.1.5) and using Theorem 8.1.13. 0 8.1.5 The Law of Inertia The inertia of a symmetric matrix A is a triplet of nonnegative integers (m, z, p) where m, z, and p are respectively the numbers of negative, zero, and positive eigenvalues. Theorem 8.1.17 (Sylvester Law of Inertia). If A E Rnxn is symmetric and X E Rnxn is nonsingular, then A and XTAX have the same inertia. Proof. Suppose for some r that Ar(A) > 0 and define the subspace So � Rn by qi =/. 0, where Aqi = .Xi(A)qi and i = l:r. From the minimax characterization of .Xr(XTAX) we have Since it follows that max dim(S)=r min yE S min yESo An analogous argument with the roles of A and xrAX reversed shows that
  • 473. 8.1. Properties and Decompositions 449 Thus, Ar(A) and Ar(XTAX) have the same sign and so we have shown that A and XTAX have the same number of positive eigenvalues. If we apply this result to -A, we conclude that A and xrAX have the same number of negative eigenvalues. Obviously, the number of zero eigenvalues possessed by each matrix is also the same. D A transformation of the form A � xrAX where X is nonsingular is called a conguence transformation. Thus, a congruence transformation of a symmetric matrix preserves inertia. Problems PB.1.1 Without using any of the results in this section, show that the eigenvalues ofa 2-by-2 symmetric matrix must be real. PB.1.2 Compute the Schur decomposition of A = [ ; � ]. PB.1.3 Show that the eigenvalues of a Hermitian matrix (AH = A) are real. For each theorem and corollary in this section, state and prove the corresponding result for Hermitian matrices. Which results have analogs when A is skew-symmetric? Hint: If AT = -A, then iA is Hermitian. PB.1.4 Show that if x E R" x r , r :5 n, and II xTx - 1 112 = T < 1, then O"min(X) � 1 - T. PB.1.5 Suppose A, E E R" x n are symmetric and consider the Schur decomposition A + tE = QDQT where we assume that Q = Q(t) and D = D(t) are continuously differentiable functions of t E R. Show that D(t) = diag(Q(t)TEQ(t)) where the matrix on the right is the diagonal part of Q(t)TEQ(t). Establish the Wielandt-Hoffman theorem by integrating both sides of this equation from 0 to 1 and taking Frobenius norms to show that 11 D(l) - D(O) llF :5 1111 diag(Q(t)TEQ(t) llF dt :5 II E llF" PB.1.6 Prove Theorem 8.1.5. PB.1.7 Prove Theorem 8.1.7. PB.1.8 Prove Theorem 8.1.8 using the fact that the trace of a square matrix is the sum of its eigen­ values. PB.1.9 Show that if B E R'n x m and C E Rn x n are symmetric, then sep(B, C) = min II BX - XC llF where the min is taken over all matrices X E � x n. PB.1.10 Prove the inequality (8.1.3). PB.1.11 Suppose A E nn x n is symmetric and C E Rn x r has full column rank and assume that r « n. By using Theorem 8.1.8 relate the eigenvalues of A + CCT to the eigenvalues of A. PB.1.12 Give an algorithm for computing the solution to min II A - S ll1-· . rank(S) = 1 S = sr Note that if S E Rn x n is a symmetric rank-1 matrix then either S = vvT or S = -vvT for some v e Rn . PB.1.13 Give an algorithm for computing the solution to min II A - S llF . rank(S) = 2 8 = -ST PB.1.14 Give an example of a real 3-by-3 normal matrix with integer entries that is neither orthogonal, symmetric, nor skew-symmetric.
  • 474. 450 Chapter 8. Symmetric Eigenvalue Problems Notes and References for §8.1 The perturbation theory for the symmetric eigenproblem is surveyed in Wilkinson (AEP, Chap. 2), Parlett (SEP, Chaps. 10 and 11), and Stewart and Sun (MPT, Chaps. 4 and 5). Some representative papers in this well-researched area include: G.W. Stewart (1973). "Error and Perturbation Bounds for Subspaces Associated with Certain Eigen- value Problems," SIAM Review 15, 727-764. C.C. Paige (1974). "Eigenvalues of Perturbed Hermitian Matrices," Lin. Alg. Applic. 8, 1-10. W. Kahan (1975). "Spectra of Nearly Hermitian Matrices," Proc. AMS 48, 11-17. A. Schonhage (1979). "Arbitrary Perturbations of Hermitian Matrices," Lin. Alg. Applic. 24, 143-49. D.S. Scott (1985). "On the Accuracy of the Gershgorin Circle Theorem for Bounding the Spread of a Real Symmetric Matrix," Lin. Alg. Applic. 65, 147-155 J.-G. Sun (1995). "A Note on Backward Error Perturbations for the Hermitian Eigenvalue Problem," BIT 35, 385-393. Z. Drma.C (1996). On Relative Residual Bounds for the Eigenvalues of a Hermitian Matrix," Lin. Alg. Applic. 244, 155-163. Z. Drma.C and V. Hari (1997). "Relative Residual Bounds For The Eigenvalues of a Hermitian Semidef­ inite Matrix," SIAM J. Matrix Anal. Applic. 18, 21-29. R.-C. Li (1998). "Relative Perturbation Theory: I. Eigenvalue and Singular Value Variations," SIAM J. Matrix Anal. Applic. 19, 956-982. R.-C. Li (1998). "Relative Perturbation Theory: II. Eigenspace and Singular Subspace Variations," SIAM J. Matrix Anal. Applic. 20, 471-492. F.M. Dopico, J. Moro and J.M. Molera (2000). "Weyl-Type Relative Perturbation Bounds for Eigen­ systems of Hermitian Matrices," Lin. Alg. Applic. 309, 3-18. J.L. Barlow and I. Slapnicar (2000). "Optimal Perturbation Bounds for the Hermitian Eigenvalue Problem," Lin. Alg. Applic. 309, 19-43. N. Truhar and R.-C. Li (2003). "A sin(29) Theorem for Graded Indefinite Hermitian Matrices," Lin. Alg. Applic. 359, 263-276. W. Li and W. Sun (2004). "The Perturbation Bounds for Eigenvalues of Normal Matrices," Num. Lin. Alg. 12, 89-94. C.-K. Li and R.-C. Li (2005). "A Note on Eigenvalues of Perturbed Hermitian Matrices," Lin. Alg. Applic. 395, 183-190. N. Truhar (2006). "Relative Residual Bounds for Eigenvalues of Hermitian Matrices," SIAM J. Matrix Anal. Applic. 28, 949-960. An elementary proof of the Wielandt-Hoffman theorem is given in: P. Lax (1997). Linear Algebra, Wiley-lnterscience, New York. For connections to optimization and differential equations, see: P. Deift, T. Nanda, and C. Tomei (1983). "Ordinary Differential Equations and the Symmetric Eigenvalue Problem," SIAM J. Nu.mer. Anal. 20, 1-22. M.L. Overton (1988). "Minimizing the Maximum Eigenvalue of a Symmetric Matrix," SIAM J. Matrix Anal. Applic. 9, 256-268. T. Kollo and H. Neudecker (1997). "The Derivative of an Orthogonal Matrix of Eigenvectors of a Symmetric Matrix," Lin. Alg. Applic. 264, 489-493. 8.2 Power Iterations Assume that A E 1Rnxn is symmetric and that U0 E 1Rnxn is orthogonal. Consider the following QR iteration: To = UJ'AUo for k = 1, 2, . . . end Tk-1 = UkRk Tk = RkUk (QR factorization) (8.2.1)
  • 475. 8.2. Power Iterations Since Tk = RkUk = U'{(UkRk)Uk = U'{Tk-1Uk it follows by induction that Tk = (UoU1 · · · Uk)TA(UoU1 · · · Uk)· 451 (8.2.2) Thus, each Tk is orthogonally similar to A. Moreover, the Tk almost always converge to diagonal form and so it can be said that (8.2.1) almost always converges to a Schur decomposition of A. In order to establish this remarkable result we first consider the power method and the method of orthogonal iteration. 8.2.1 The Power Method Given a unit 2-norm q<0> E Rn, the power method produces a sequence of vectors q(k) as follows: for k = 1, 2, . . . z(k) = Aq(k-1) end q(k) = z(k)/II z(k) 112 A(k) = [q(k)]TAq(k) (8.2.3) If q<0> is not "deficient" and A's eigenvalue of maximum modulus is unique, then the q(k) converge to an eigenvector. Theorem 8.2.1. Suppose A E Rnxn is symmetric and that QTAQ = diag(Ai . . . . , An) where Q = [qi I · · · I qn ] is orthogonal and IA1I > IA2I � · · · � !An1- Let the vectors q(k) be specified by {8.2.3) and define fh E [O,.rr /2] by cos(Ok) = lqfq(k) I· Ifcos(Oo) =F 0, then for k = 0, 1, ... we have lsin(Ok)I < tan(Oo) I�:lk , I A 12k IA{k) _ A1 I � max IA1 - Ail tan(Oo)2 A 2 2:5i:5n 1 (8.2.4) (8.2.5) Proof. From the definition of the iteration, it follows that q(k) is a multiple of Akq(O) and so
  • 476. 452 and Thus, lsin(lh )l2 1 - i=l Chapter 8. Symmetric Eigenvalue Problems 2 2 1 a1 + · · · + ar. = , = n """' a�_x2k L i i i=2 n """' a2_x2k L , i i=l < n """' a2_x2k L i i i=2 a2_x2k 1 1 � n 2 (.Xi )2k a2 Lai A1 1 i=2 < � (n 2)(_x2)2k a2 L:a,. A 1 i=2 1 = (.X )2k tan(Oo)2 .X� This proves (8.2.4). Likewise, and so _x(k) n La;_x;k (.Xi - .X1) i=2 n """' a2_x2k L i i i=l [q(o)fA2k+Iq(o) [q<olfA2kq(D) < (.X )2k < max I.Xi - Anl · tan(Oo)2 · / , 2�i�n 1 completing the proof of the theorem. D i=l n """' a�_x�k L i , i=l Computable error bounds for the power method can be obtained by using Theorem 8.1.13. If II Aq(k) - _x(k)q(k) 112 = 8, then there exists .X E .X(A) such that l.X(k) - .XI � v'2 8.
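To make the discussion concrete, here is a minimal NumPy sketch of iteration (8.2.3); the function name, the random test matrix, and the step count are ours and are not part of the text:

    import numpy as np

    def power_method(A, q0, num_steps=50):
        # Iteration (8.2.3): multiply by A, normalize, and form the Rayleigh
        # quotient lambda_k = q^T A q at every step.
        q = q0 / np.linalg.norm(q0)
        lam = q @ A @ q
        for _ in range(num_steps):
            z = A @ q
            q = z / np.linalg.norm(z)
            lam = q @ A @ q
        return lam, q

    # Hypothetical example: the error |lam - lambda_1| should decay roughly
    # like |lambda_2/lambda_1|^(2k), cf. (8.2.5).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 6));  A = (A + A.T) / 2
    lam, q = power_method(A, rng.standard_normal(6))
    residual = np.linalg.norm(A @ q - lam * q)   # small residual => lam is near lambda(A), cf. Theorem 8.1.13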
  • 477. 8.2. Power Iterations 453 8.2.2 Inverse Iteration If the power method (8.2.3) is applied with A replaced by (A - >.I)-1, then we obtain the method of inverse iteration. If ).. is very close to a distinct eigenvalue of A, then q(k) will be much richer in the corresponding eigenvector direction than its predecessor q(Hl, X � ta;q; } i=l => Aqi = )..iqi, i = l:n Thus, if ).. is reasonably close to a well-separated eigenvalue )..i , then inverse iteration will produce iterates that are increasingly in the direction of qi . Note that inverse iteration requires at each step the solution of a linear system with matrix of coefficients A - >.I. 8.2.3 Rayleigh Quotient Iteration Suppose A E JR.nxn is symmetric and that x is a given nonzero n-vector. A simple differentiation reveals that xTAx ).. = r(x) = - T- x x minimizes II (A - M)x 112· (See also Theorem 8.1.14.) The scalar r(x) is called the Rayleigh quotient of x. Clearly, if x is an approximate eigenvector, then r(x) is a reasonable choice for the corresponding eigenvalue. Combining this idea with inverse iteration gives rise to the Rayleigh quotient iteration where x0 -:/:- 0 is given. for k = 0, 1, . . . end µk = r(xk) Solve (A - µkl)zk+l = Xk for Zk+l Xk+l = Zk+i/11 Zk+l 112 (8.2.6) The Rayleigh quotient iteration almost always converges and when it does, the rate of convergence is cubic. We demonstrate this for the case n = 2. Without loss of generality, we may assume that A = diag(>.1, >.2), with )..1 > >.2. Denoting Xk by it follows that µk >.1c1 + >.2s� in (8.2.6) and A calculation shows that (8.2.7)
  • 478. 454 Chapter 8. Symmetric Eigenvalue Problems From these equations it is clear that the Xk converge cubically to either span{ei} or span{e2} provided lckl =/: !ski· Details associated with the practical implementation of the Rayleigh quotient iteration may be found in Parlett (1974). 8.2.4 Orthogonal Iteration A straightforward generalization of the power method can be used to compute higher­ dimensional invariant subspaces. Let r be a chosen integer that satisfies 1 S r :::; n. Given an n-by-r matrix Qo with orthonormal columns, the method of orthogonal iteration generates a sequence of matrices {Qk} � JR.nxr as follows: for k = 1, 2, . . . Zk = AQk-1 (8.2.8) (QR factorization) Note that, if r = 1, then this is just the power method. Moreover, the sequence {Qkei} is precisely the sequence of vectors produced by the power iteration with starting vector qC0l = Qoe1 . In order to analyze the behavior of (8.2.8), assume that QTAQ = D = diag(Ai), is a Schur decomposition of A E JR.nxn. Partition Q and D as follows: Q = [ QQ I Q13 J r n-r If !Ari > IAr+il, then r n-r (8.2.9) (8.2.10) is the dominant invariant subspace of dimension r. It is the unique invariant subspace associated with the eigenvalues Ai, . . . , Ar· The following theorem shows that with reasonable assumptions, the subspaces ran(Qk) generated by (8.2.8) converge to Dr(A) at a rate proportional to IAr+if>.rlk· Theorem 8.2.2. Let the Schur decomposition of A E JR.nxn be given by (8.2.9} and (8.2. 10} with n 2:: 2. Assume l.Xrl > l>.r+il and that dk is defined by dk = dist(Dr(A), ran(Qk)), If do < 1, then the matrices Qk generated by (8.2.8} satisfy dk s IAr+l lk do Ar Ji - 4 k ;::: o. (8.2.11) (8.2.12)
  • 479. 8.2. Power Iterations 455 Compare with Theorem 7.3.1. proof. We mention at the start that the condition (8.2.11) means that no vector in the span of Qo's columns is perpendicular to Dr(A). Using induction it can be shown that the matrix Qk in (8.2.8) satisfies This is a QR factorization of AkQ0 and upon substitution of the Schur decomposition (8.2.9)-(8.2.10) we obtain [Df 0 l [Q�Qo l 0 D� Q�Qo If the matrices Vi and wk are defined by then Since Vk = Q�Qo, Wk = Q�Qo, D�Vo = Vk (Rk · · · R1) , D�Wo = Wk (Rk · · · R1) . [ Vk l [Q�Qk l T T wk = Q�Qk = [Qa I Q13J Qk = Q Qk, it follows from the thin CS decomposition (Theorem 2.5.2) that A consequence of this is that O"min(Vo)2 = 1 - O"max(Wo)2 = 1 - d5 > 0. (8.2.13) (8.2.14) It follows from (8.2.13) that the matrices Vk and (Rk · · · R1) are nonsingular. Using both that equation and (8.2.14) we obtain and so Wk = D�Wo(Rk · · · R1)-1 = D�Wo(D�Vo)-1Vk = D�(WoV0-1)D!kVi dk II wk 112 < II D� 112 · II Wo 112 · II vo-1 112 · II D! k 112 · II vk 112 k 1 1 < IAr+i I . do . 1 - dfi . IArIk ' from which the theorem follows. D
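Before turning to the QR iteration itself, here is a small NumPy sketch of orthogonal iteration (8.2.8); the helper name and the test problem are ours, and no attempt is made at the Ritz acceleration discussed later in §8.3.7:

    import numpy as np

    def orthogonal_iteration(A, Q0, num_steps=100):
        # Iteration (8.2.8): ran(Q_k) converges to the dominant invariant
        # subspace D_r(A) at a rate proportional to |lambda_{r+1}/lambda_r|^k
        # (Theorem 8.2.2).
        Q = Q0
        for _ in range(num_steps):
            Z = A @ Q                    # Z_k = A Q_{k-1}
            Q, _ = np.linalg.qr(Z)       # Q_k R_k = Z_k  (thin QR factorization)
        return Q

    # Hypothetical example with r = 2: on convergence Q^T A Q is nearly
    # diagonal with the two eigenvalues of largest modulus on its diagonal.
    rng = np.random.default_rng(1)
    A = rng.standard_normal((8, 8));  A = (A + A.T) / 2
    Q0, _ = np.linalg.qr(rng.standard_normal((8, 2)))
    Q = orthogonal_iteration(A, Q0)
    S = Q.T @ A @ Q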
  • 480. 456 Chapter 8. Symmetric Eigenvalue Problems 8.2.5 The QR Iteration Consider what happens if we apply the method of orthogonal iteration (8.2.8) with r = n. Let QTAQ = diag(A1, . . . , An) be the Schur decomposition and assume IAi l > IA2I > · · · > IAnl· If Q = [ qi I . . · I qn ] , Qk = [ q�k) I . . · I q�k) ] , and d. (D (A) { (o) (O)}) 1 1st i , span q1 , . . . , qi < for i = l:n -1, then it follows from Theorem 8.2.2 that . (k) (k) _ AH1 (I ' k ) d1st(span{q1 , . . . , qi }, span{q1, . . . , qi}) - 0 T; for i = l:n -1. This implies that the matrices Tk defined by (8.2.15) are converging to diagonal form. Thus, it can be said that the method of orthogonal iteration computes a Schur decomposition if r = n and the original iterate Q0 E 1Rnxn is not deficient in the sense of (8.2.11). The QR iteration arises by considering how to compute the matrix Tk directly from its predecessor Tk-l · On the one hand, we have from (8.2.8) and the definition of Tk-l that On the other hand, Thus, Tk is determined by computing the QR factorization ofTk-1 and then multiplying the factors together in reverse order. This is precisely what is done in (8.2.1). Note that a single QR iteration involves O(n3) flops. Moreover, since convergence is only linear (when it exists), it is clear that the method is a prohibitively expensive way to compute Schur decompositions. Fortunately, these practical difficulties can be overcome, as we show in the next section. Problems PB.2.1 Suppose AoE Rnxn is symmetric and positive definite and consider the following iteration: for k =1, 2, . . . end Ak-1 = GkGf Ak=GfGk (Cholesky factorization) (a) Show that this iteration is defined. (b) Show that if Ao= [ � � ]
  • 481. 8.2. Power Iterations with a � c has eigenvalues Al � A2 > 0, then the Ak converge to diag(A1 , A2)· PB.2.2 Prove (8.2.7). PB.2.3 Suppose A E Rnxn is symmetric and define the function /:Rn+l --+ Rn+l by I ([ � ]) = [ (x�:= �);2 ] 457 where x E Rn and A E R. Suppose x+ and A+ are produced by applying Newton's method to f at the "current point" defined by Xe and Ac. Give expressions for x+ and A+ assuming that II Xe 112 = 1 and Ac = x'[Ax e. Notes and References for §8.2 The following references are concerned with the method of orthogonal iteration, which is also known as the method of simultaneous iteration: G.W. Stewart (1969). "Accelerating The Orthogonal Iteration for the Eigenvalues of a Hermitian Matrix," Numer. Math. 13, 362-376. M. Clint and A. Jennings (1970). "The Evaluation of Eigenvalues and Eigenvectors ofReal Symmetric Matrices by Simultaneous Iteration," Comput. J. 13, 76-80. H. Rutishauser (1970). "Simultaneous Iteration Method for Symmetric Matrices," Numer. Math. 16, 205-223. References for the Rayleigh quotient method include: J. Vandergraft (1971). "Generalized Rayleigh Methods with Applications to Finding Eigenvalues of Large Matrices," Lin. Alg. Applic. 4, 353-368. B.N. Parlett (1974). "The Rayleigh Quotient Iteration and Some Generalizations for Nonnormal Matrices," Math. Comput. 28, 679-693. S. Batterson and J. Smillie (1989). "The Dynamics of Rayleigh Quotient Iteration," SIAM J. Numer. Anal. 26, 624-636. C. Beattie and D.W. Fox (1989). "Localization Criteria and Containment for Rayleigh Quotient Iteration," SIAM J. Matrix Anal. Applic. 10, 80-93. P.T.P. Tang (1994). "Dynamic Condition Estimation and Rayleigh-Ritz Approximation," SIAM J. Matrix Anal. Applic. 15, 331-346. D. P. O'Leary and G. W. Stewart (1998). "On the Convergence of a New Rayleigh Quotient Method with Applications to Large Eigenproblems," ETNA 7, 182-189. J.-L. Fattebert (1998). "A Block Rayleigh Quotient Iteration with Local Quadratic Convergence," ETNA 7, 56-74. Z. Jia and G.W. Stewart (2001). "An Analysis of the Rayleigh-Ritz Method for Approximating Eigenspaces," Math. Comput. 70, 637--647. V. Simoncini and L. Elden (2002). "Inexact Rayleigh Quotient-Type Methods for Eigenvalue Compu­ tations," BIT 42, 159-182. P.A. Absil, R. Mahony, R. Sepulchre, and P. Van Dooren (2002). "A Grassmann-Rayleigh Quotient Iteration for Computing Invariant Subspaces," SIAM Review 44, 57-73. Y. Notay (2003). "Convergence Analysis of Inexact Rayleigh Quotient Iteration," SIAM J. Matrix Anal. Applic. 24, 627-644. A. Dax (2003). "The Orthogonal Rayleigh Quotient Iteration {ORQI) method," Lin. Alg. Applic. 358, 23-43. R.-C. Li (2004). "Accuracy of Computed Eigenvectors Via Optimizing a Rayleigh Quotient," BIT 44, 585-593. Various Newton-type methods have also been derived for the symmetric eigenvalue problem, see: R.A. Tapia and D.L. Whitley (1988). "The Projected Newton Method Has Order 1 + v'2 for the Symmetric Eigenvalue Problem," SIAM J. Numer. Anal. 25, 1376-1382. P.A. Absil, R. Sepulchre, P. Van Dooren, and R. Mahony {2004). "Cubically Convergent Iterations for Invariant Subspace Computation," SIAM J. Matrix Anal. Applic. 26, 70-96.
  • 482. 458 Chapter 8. Symmetric Eigenvalue Problems 8.3 The Symmetric QR Algorithm The symmetric QR iteration (8.2.1) can be made more efficient in two ways. First, we show how to compute an orthogonal Uo such that UJ'AUo = T is tridiagonal. With this reduction, the iterates produced by (8.2.1) are all tridiagonal and this reduces the work per step to O(n2). Second, the idea of shifts are introduced and with this change the convergence to diagonal form proceeds at a cubic rate. This is far better than having the off-diagonal entries going to to zero as IAi+i/Ailk as discussed in §8.2.5. 8.3.1 Reduction to Tridiagonal Form If A is symmetric, then it is possible to find an orthogonal Q such that (8.3.1) is tridiagonal. We call this the tridiagonal decomposition and as a compression ofdata, it represents a very big step toward diagonalization. We show how to compute (8.3.1) with Householder matrices. Suppose that House­ holder matrices Pi, . . . , Pk-I have been determined such that if then k-1 B : 3 ] k�1 833 n-k n-k is tridiagonal through its first k - 1 columns. If A is an order-(n - k) Householder matrix such that AB32 is a multiple of In-k(:, 1) and if Pk = diag(Jk, Pk), then the leading k-by-k principal submatrix of [ Bu B12 0 l k-1 Ak = PkAk-1Pk = B21 B22 B23A 1 0 AB32 AB33A n-k k-1 n-k is tridiagonal. Clearly, if Uo = P1 · · · Pn-2, then UJ'AUo = T is tridiagonal. In the calculation of Ak it is important to exploit symmetry during the formation of the matrix PkB33Fk. To be specific, suppose that A has the form - T Pk = I - /3vv , /3 21 T 0 -'- v E Rn-k. = v v, -;- Note that if p = f3B33V and w = p - (f3pTv/2)v, then Since only the upper triangular portion of this matrix needs to be calculated, we see that the transition from Ak-l to Ak can be accomplished in only 4(n - k)2 flops.
  • 483. 8.3. The Symmetric QR Algorithm 459 Algorithm 8.3.1 (Householder Tridiagonalization) Given a symmetric A E R.nxn, the following algorithm overwrites A with T = QTAQ, where T is tridiagonal and Q = H1 · · · Hn-2 is the product of Householder transformations. for k = l:n - 2 [v, .BJ = house(A(k + l:n, k)) p = ,BA(k + l:n, k + l:n)v w = p - (,BpTv/2)v A(k + 1, k) = II A(k + l:n, k) 112; A(k, k + 1) = A(k + 1, k) A(k + l:n, k + l:n) = A(k + l:n, k + l:n) - vwT - wvT end This algorithm requires 4n3/3 flops when symmetry is exploited in calculating the rank- 2 update. The matrix Q can be stored in factored form in the subdiagonal portion of A. If Q is explicitly required, then it can be formed with an additional 4n3/3 flops. Note that if T has a zero subdiagonal, then the eigenproblem splits into a pair of smaller eigenproblems. In particular, if tk+l,k = 0, then >.(T) = >.(T(l:k, l:k)) u >.(T(k + l:n, k + l:n)). If T has no zero subdiagonal entries, then it is said to be unreduced. Let T denote the computed version of T obtained by Algorithm 8.3.1. It can be shown that T= QT(A + E)Q where Q is exactly orthogonal and E is a symmetric matrix satisfying II E llF � cull A llF where c is a small constant. See Wilkinson (AEP, p. 297). 8.3.2 Properties of the Tridiagonal Decomposition We prove two theorems about the tridiagonal decomposition both of which have key roles to play in the following. The first connects (8.3.1) to the QR factorization of a certain Krylov matrix. These matrices have the form K(A, v, k) = [ v I Av I · · · I Ak-lv ] , Theorem 8.3.1. IfQTAQ = T is the tridiagonal decomposition of the symmetric ma­ trix A E R.nxn, then QTK(A, Q(:, 1), n) = R is upper triangular. If R is nonsingular, then T is unreduced. If R is singular and k is the smallest index so rkk = 0, then k is also the smallest index so tk,k-1 is zero. Compare with Theorem 1.4.3. Proof. It is clear that if q1 = Q(:, 1), then QTK(A, Q(:, 1), n) = [ QTq1 I (QTAQ)(QTq1) I · . · I (QTAQ)n-l (QTq1) ] = [ el I Te1 I · . · I rn-1ei ] = R is upper triangular with the property that ru = 1 and rii = t21ta2 · · · ti,i-l for i = 2:n. Clearly, if R is nonsingular, then T is unreduced. If R is singular and rkk is its first zero diagonal entry, then k � 2 and tk,k-1 is the first zero subdiagonal entry. 0
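Returning briefly to Algorithm 8.3.1, the following NumPy sketch mirrors its structure; the house helper follows the conventions of §5.1, the function names and final check are ours, and clarity is favored over the storage and flop economies described above:

    import numpy as np

    def house(x):
        # Householder vector: returns (v, beta) with v[0] = 1 and
        # (I - beta*v*v^T) x = ||x||_2 * e_1, in the style of §5.1.
        x = np.asarray(x, dtype=float)
        sigma = x[1:] @ x[1:]
        if sigma == 0.0:
            return np.concatenate(([1.0], 0.0 * x[1:])), (0.0 if x[0] >= 0 else 2.0)
        mu = np.sqrt(x[0]**2 + sigma)
        v0 = x[0] - mu if x[0] <= 0 else -sigma / (x[0] + mu)
        beta = 2.0 * v0**2 / (sigma + v0**2)
        v = x.copy();  v[0] = v0
        return v / v0, beta

    def householder_tridiag(A):
        # Sketch of Algorithm 8.3.1: overwrites a copy of A with T = Q^T A Q
        # using the symmetric rank-2 update  A <- A - v*w^T - w*v^T.
        A = np.array(A, dtype=float)
        n = A.shape[0]
        for k in range(n - 2):
            v, beta = house(A[k+1:, k])
            p = beta * (A[k+1:, k+1:] @ v)
            w = p - (beta * (p @ v) / 2.0) * v
            nrm = np.linalg.norm(A[k+1:, k])
            A[k+1, k] = nrm;  A[k, k+1] = nrm
            A[k+2:, k] = 0.0;  A[k, k+2:] = 0.0
            A[k+1:, k+1:] -= np.outer(v, w) + np.outer(w, v)
        return A

    # Hypothetical check: T is tridiagonal and has the same spectrum as A.
    rng = np.random.default_rng(2)
    A = rng.standard_normal((6, 6));  A = (A + A.T) / 2
    T = householder_tridiag(A)
    err = np.max(np.abs(np.linalg.eigvalsh(T) - np.linalg.eigvalsh(A)))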
  • 484. 460 Chapter 8. Symmetric Eigenvalue Problems The next result shows that Q is essentially unique once Q(:, 1) is specified. Theorem 8.3.2 (Implicit Q Theorem). Suppose Q = [ Q1 I · · · I Qn ] and V = [ v1 I · · · I Vn ] are orthogonal matrices with the property that both QTAQ = T and vrAV = S are tridiagonal where A E nrxn is symmetric. Let k denote the smallest positive integer for which tk+I,k = 0, with the convention that k = n ifT is unreduced. If v1 = Q1 , then Vi = ±qi and lti,i-1 I = lsi,i-1 I for i = 2:k. Moreover, if k < n, then Sk+I,k = 0. Compare with Theorem 7.4.2. Proof. Define the orthogonal matrix W = QTV and observe that W(:, 1) = In(:, 1) = e1 and wrrw = S. By Theorem 8.3.1, wr·K(T, e1, k) is upper triangular with full column rank. But K(T, ei , k) is upper triangular and so by the essential uniqueness of the thin QR factorization, W(:, l:k) = In(:, l:k) ·diag(±l, . . . , ±1). This says that Q(:, i) = ±V(:, i) for i = l:k. The comments about the subdiagonal entries follow since ti+I,i = Q(:, i + l)rAQ(:, i) and Si+i,i = V(:,i + l)TAV(:, i) for i = l:n - 1. 0 8.3.3 The QR Iteration and Tridiagonal Matrices We quickly state four facts that pertain to the QR iteration and tridiagonal matrices. Complete verifications are straightforward. • Preservation of Form. If T = QR is the QR factorization of a symmetric tridi­ agonal matrix T E 1Rnxn, then Q has lower bandwidth 1 and R has upper band­ width 2 and it follows that T+ = RQ = QT(QR)Q = qrTQ is also symmetric and tridiagonal. • Shifts. If s E JR and T - sl = QR is the QR factorization, then T+ = RQ + sl = QTTQ is also tridiagonal. This is called a shifted QR step. • Perfect Shifts. If T is unreduced, then the first n - 1 columns of T - sl are independent regardless of s. Thus, if s E .A(T) and QR = T - sl is a QR factorization, then rnn = 0 and the last column ofT+ = RQ+sl equals sin(:, n) = sen. • Cost. If T E 1Rnxn is tridiagonal, then its QR factorization can be computed by applying a sequence of n - 1 Givens rotations: for k = l:n - 1 [c, s] = givens(tkk, tk+I,k) m = min{k + 2, n} T(k:k + 1, k:m) = [ -� � ]T T(k:k + 1, k:m) end This requires O(n) flops. If the rotations are accumulated, then O(n2) flops are needed.
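The first, second, and fourth of these facts combine into a short NumPy sketch of one explicit shifted QR step. The givens helper follows §5.1.8; the other names are ours, and for clarity the code works on a dense array and updates full rows rather than exploiting the bandwidth bounds just described:

    import numpy as np

    def givens(a, b):
        # c, s with [c s; -s c]^T [a; b] = [r; 0], cf. §5.1.8.
        if b == 0.0:
            return 1.0, 0.0
        if abs(b) > abs(a):
            tau = -a / b;  s = 1.0 / np.sqrt(1.0 + tau**2);  c = s * tau
        else:
            tau = -b / a;  c = 1.0 / np.sqrt(1.0 + tau**2);  s = c * tau
        return c, s

    def shifted_qr_step(T, mu):
        # One step of the shifted iteration:  T - mu*I = Q R  (n-1 Givens
        # rotations), then  T_plus = R Q + mu*I = Q^T T Q, again tridiagonal.
        n = T.shape[0]
        R = T - mu * np.eye(n)
        Q = np.eye(n)
        for k in range(n - 1):
            c, s = givens(R[k, k], R[k+1, k])
            G = np.array([[c, s], [-s, c]])
            R[k:k+2, :] = G.T @ R[k:k+2, :]      # zero out R[k+1, k]
            Q[:, k:k+2] = Q[:, k:k+2] @ G        # accumulate Q = G_1 G_2 ... G_{n-1}
        return R @ Q + mu * np.eye(n)

    # Hypothetical example: here mu = t_nn is an exact eigenvalue, so by the
    # perfect-shift fact above the (n, n-1) entry of T1 is zero to roundoff.
    T = np.diag([2.0, 2.0, 2.0]) + np.diag([-1.0, -1.0], 1) + np.diag([-1.0, -1.0], -1)
    T1 = shifted_qr_step(T, T[-1, -1])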
  • 485. 8.3. The Symmetric QR Algorithm 461 8.3.4 Explicit Single-Shift QR Iteration If s is a good approximate eigenvalue, then we suspect that the (n, n - 1) will be small after a QR step with shift s. This is the philosophy behind the following iteration: If T = UJ'AUo (tridiagonal) for k = 0, 1, . . . end Determine real shift µ. T - µI = UR (QR factorization) T = RU + µI T 0 (8.3.2) 0 then one reasonable choice for the shift is µ = an- However, a more effective choice is to shift by the eigenvalue of T(n - l:n, n - l:n) = [an-l bn-1 that is closer to a11• This is known as the Wilkinson shift and it is given by µ = an + d - sign(d)Vd2 + b;_1 (8.3.3) where d = (an-l - an)/2. Wilkinson (1968) has shown that (8.3.2) is cubically convergent with either shift strategy, but gives heuristic reasons why (8.3.3) is preferred. 8.3.5 Implicit Shift Version It is possible to execute the transition from T to T+ = RU + µI = urru without explicitly forming the matrix T-µI. This has advantages when the shift is much larger than some of the ai. Let c = cos(B) and s = sin(B) be computed such that [ c :n" � µ i [ � l· -s If we set G1 = G(l, 2, B), then G1e1 = Ue1 and x x + 0 0 0 x x x 0 0 0 T +- GfTG1 + x x x 0 0 0 0 x x x 0 0 0 0 x x x 0 0 0 0 x x
  • 486. 462 Chapter 8. Symmetric Eigenvalue Problems We are thus in a position to apply the implicit Q theorem provided we can compute rotations G2, . . . , Gn-1 with the property that if Z = G1G2 · · · Gn-1, then Ze1 = G1e1 = Ue1 and zTTz is tridiagonal. Note that the first column of Z and U are identical provided we take each Gi to be of the form Gi = G{i, i + 1, 0i), i = 2:n- l. But Gi of this form can be used to chase the unwanted nonzero element "+" out of the matrix GfTG1 as follows: x x 0 x x x 0 x x 0 + x 0 0 0 0 0 0 x x 0 x x x 0 x x 0 0 x 0 0 0 0 0 0 0 0 0 + 0 0 x 0 0 x x 0 x x x 0 x x 0 0 0 0 0 0 x 0 0 x x + x x x + x x x x 0 0 0 0 x x 0 0 0 0 x 0 0 x x 0 x x x 0 x x 0 + x 0 0 0 x 0 0 x x 0 x x x 0 x x 0 0 x 0 0 0 0 0 0 0 + 0 x 0 x x x x 0 0 0 0 0 0 x 0 x x x x Thus, it follows from the implicit Q theorem that the tridiagonal matrix zTTz pro­ duced by this zero-chasing technique is essentially the same as the tridiagonal matrix T obtained by the explicit method. (We may assume that all tridiagonal matrices in question are unreduced for otherwise the problem decouples.) Note that at any stage ofthe zero-chasing, there is only one nonzero entry outside the tridiagonal band. How this nonzero entry moves down the matrix during the update T +- GfTGk is illustrated in the following: [1 0 0 0 lT[ak bk Zk 0 c s 0 bk a,, b,, 0 -S C 0 Zk bp aq 0 0 0 1 0 0 bq 0 0 0 l [ak bk c s 0 _ bk a,, -s c 0 - 0 b,, 0 0 1 0 Zp Here (p, q, r) = (k + 1, k + 2, k + 3). This update can be performed in about 26 flops once c and s have been determined from the equation bks+ ZkC = 0. Overall, we obtain Algorithm 8.3.2 {Implicit Symmetric QR Step with Wilkinson Shift) Given an unreduced symmetric tridiagonal matrix T E Rnxn, the following algorithm over­ writes T with zTTz, where Z = G1 · · • Gn-l is a product of Givens rotations with the property that zT(T - µI) is upper triangular and µ is that eigenvalue of T's trailing 2-by-2 principal submatrix closer to tnn· d = {tn-1,n-l - tnn)/2 µ = tnn - t�,n-1! (d + sign{dh/d2 + t�,n-1 ) x = tu - µ z = t21
    for k = 1:n-1
        [c, s] = givens(x, z)
        T = G_k^T T G_k,   where G_k = G(k, k+1, θ)
        if k < n-1
            x = t_{k+1,k}
            z = t_{k+2,k}
        end
    end

This algorithm requires about 30n flops and n square roots. If a given orthogonal matrix Q is overwritten with Q G_1 ··· G_{n-1}, then an additional 6n^2 flops are needed. Of course, in any practical implementation the tridiagonal matrix T would be stored in a pair of n-vectors and not in an n-by-n array.

Algorithm 8.3.2 is the basis of the symmetric QR algorithm, the standard means for computing the Schur decomposition of a dense symmetric matrix.

Algorithm 8.3.3 (Symmetric QR Algorithm) Given A ∈ ℝ^{n×n} (symmetric) and a tolerance tol greater than the unit roundoff, this algorithm computes an approximate symmetric Schur decomposition Q^T A Q = D. A is overwritten with the tridiagonal decomposition.

    Use Algorithm 8.3.1 to compute the tridiagonalization
        T = (P_1 ··· P_{n-2})^T A (P_1 ··· P_{n-2}).
    Set D = T and, if Q is desired, form Q = P_1 ··· P_{n-2}. (See §5.1.6.)
    until q = n
        For i = 1:n-1, set d_{i+1,i} and d_{i,i+1} to zero if
            |d_{i+1,i}| = |d_{i,i+1}| ≤ tol·(|d_{ii}| + |d_{i+1,i+1}|).
        Find the largest q and the smallest p such that if

                  [ D_11    0      0   ]   p
            D  =  [  0     D_22    0   ]   n-p-q
                  [  0      0     D_33 ]   q
                     p    n-p-q    q

        then D_33 is diagonal and D_22 is unreduced.
        if q < n
            Apply Algorithm 8.3.2 to D_22:
                D = diag(I_p, Z, I_q)^T · D · diag(I_p, Z, I_q)
            If Q is desired, then Q = Q · diag(I_p, Z, I_q).
        end
    end

This algorithm requires about 4n^3/3 flops if Q is not accumulated and about 9n^3 flops if Q is accumulated.
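As a complement to Algorithms 8.3.2 and 8.3.3, here is a deliberately simple NumPy sketch of the eigenvalue iteration. For brevity it performs each Wilkinson-shifted step with an explicit dense QR factorization of T - µI, which by the implicit Q theorem (Theorem 8.3.2) yields essentially the same tridiagonal as the bulge-chasing step, at O(n^3) rather than O(n) cost per step, and it deflates only at the trailing corner. The function names and tolerances are ours:

    import numpy as np

    def wilkinson_shift(T):
        # mu = eigenvalue of T's trailing 2-by-2 principal submatrix closest
        # to t_nn, computed as in (8.3.3).
        d = (T[-2, -2] - T[-1, -1]) / 2.0
        b = T[-1, -2]
        s = 1.0 if d >= 0 else -1.0
        return T[-1, -1] - b**2 / (d + s * np.sqrt(d**2 + b**2))

    def symmetric_qr_eigvals(T, tol=1e-14):
        # Eigenvalues of a symmetric tridiagonal T by shifted QR with deflation.
        T = np.array(T, dtype=float)
        m = T.shape[0]                      # active block is T[:m, :m]
        while m > 1:
            if abs(T[m-1, m-2]) <= tol * (abs(T[m-2, m-2]) + abs(T[m-1, m-1])):
                T[m-1, m-2] = T[m-2, m-1] = 0.0
                m -= 1                      # T[m, m] has converged to an eigenvalue
                continue
            mu = wilkinson_shift(T[:m, :m])
            Q, R = np.linalg.qr(T[:m, :m] - mu * np.eye(m))
            T[:m, :m] = R @ Q + mu * np.eye(m)
        return np.sort(np.diag(T))

    # Hypothetical check against the LAPACK-based solver in NumPy:
    T = np.diag([4.0, 3.0, 2.0, 1.0]) + np.diag([1.0, 1.0, 1.0], 1) + np.diag([1.0, 1.0, 1.0], -1)
    err = np.max(np.abs(symmetric_qr_eigvals(T) - np.linalg.eigvalsh(T)))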
  • 488. 464 Chapter 8. Symmetric Eigenvalue Problems The computed eigenvalues �i obtained via Algorithm 8.3.3 are the exact eigen­ values of a matrix that is near to A: T A Q0 (A + E)Qo = diag(Ai), Using Corollary 8.1.6 we know that the absolute error in each �i is small in the sense that l�i - Ail � ullA 112· If Q = [ iii I · · · I <in ] is the computed matrix of orthonormal eigenvectors, then the accuracy of iii depends on the separation of Ai from the remainder of the spectrum. See Theorem 8.1.12. If all ofthe eigenvalues and a few ofthe eigenvectors are desired, then it is cheaper not to accumulate Q in Algorithm 8.3.3. Instead, the desired eigenvectors can be found via inverse iteration with T. See §8.2.2. Usually just one step is sufficient to get a good eigenvector, even with a random initial vector. Ifjust a few eigenvalues and eigenvectors are required, then the special techniques in §8.4 are appropriate. 8.3.6 The Rayleigh Quotient Connection It is interesting to identify a relationship between the Rayleigh quotient iteration and the symmetric QR algorithm. Suppose we apply the latter to the tridiagonal matrix T E Rnxn with shift a = e'f:.Ten = tnn· If T- al= QR, then we obtain T+ = RQ+al. From the equation (T - al)Q = RT it follows that where qn is the last column of the orthogonal matrix Q. Thus, if we apply {8.2.6) with xo = en, then X1 = qn. 8.3.7 Orthogonal Iteration with Ritz Acceleration Recall from §8.2.4 that an orthogonal iteration step involves a matrix-matrix product and a QR factorization: Zk = AQk-1, QkRk = Zk (QR factorization) Theorem 8.1.14 says that we can minimize II AQk - QkS llF by setting S equal to -r - Sk = Qk AQk. If U'{SkUk = Dk is the Schur decomposition of Sk E R'"xr and Qk = QkUk, then showing that the columns of Qk are the best possible basis to take after k steps from the standpoint of minimizing the residual. This defines the Ritz acceleration idea:
  • 489. 8.3. The Symmetric QR Algorithm Q0 E 1Rnxr given with Q'f;Qo = I,. for k = 1, 2, . . . end Zk = AQk-1 QkRk = Zk (QR factorization) - r - Sk = Qk AQk u'[skuk = Dk Qk = QkUk (Schur decomposition) It can be shown that if then IB�k) - ,i(A)I = O (I,�:1 lk ), 465 (8.3.6) i = l:r. Recall that Theorem 8.2.2 says the eigenvalues ofQfAQk converge with rate IAr+ifArlk· Thus, the Ritz values converge at a more favorable rate. For details, see Stewart (1969). Problems PB.3.1 Suppose >. is an eigenvalue of a symmetric tridiagonal matrix T. Show that if >. has algebraic multiplicity k, then at least k - 1 of T's subdiagonal elements arc zero. PB.3.2 Suppose A is symmetric and has bandwidth p. Show that if we perform the shifted QR step A - µI = QR, A = RQ + µI, then A has bandwidth p. PB.3.3 Let A = [ : : ] be real and suppose we perform the following shifted QR step: A - zl = UR, A = RU + zl. Show that where w = w+ x2(w-z)/[(w- z)2 + x2], z = z-x2(w-z)/[(w-z)2 + x2], x = -x3/[(w - z)2 + x2]. PB.3.4 Suppose A E <Cnxn is Hermitian. Show how to construct unitary Q such that QHAQ = T is real, symmetric, and tridiagonal. PB.3.5 Show that if A = B + iC is Hermitian, then M = [ � -� ] is symmetric. Relate the eigenvalues and eigenvectors of A and !vi. PB.3.6 Rewrite Algorithm 8.3.2 for the case when A is stored in two n-vectors. Justify the given flop count. PB.3.7 Suppose A = S+ uuuT where S E Rnxn is skew-symmetric (ST = -S), u E Rn has unit
  • 490. 466 Chapter 8. Symmetric Eigenvalue Problems 2-norm, and <T E R. Show how to compute an orthogonal Q such that QTAQ is tridiagonal and QTu = e 1 . P8.3.8 Suppose C = [ � B: ] where BE Rnxn is upper bidiagonal. Determine a perfect shuffle permutation P E R2"x2n so that T = PCpT is tridiagonal with a zero diagonal. Notes and References for §8.3 Historically important Algol specifications related to the algorithms in this section include: R.S. Martin and J.H. Wilkinson (1967). "Solution of Symmetric and Unsymmetric Band Equations and the Calculation of Eigenvectors of Band Matrices," Numer. Math. 9, 279-301. H. Bowdler, R.S. Martin, C. Reinsch, and J.H. Wilkinson (1968). "The QR and QL Algorithms for Symmetric Matrices," Numer. Math. 11, 293-306. A. Dubrulle, R.S. Martin, and J.H. Wilkinson (1968). "The Implicit QL Algorithm," Numer. Math. 12, 377-383. R.S. Martin and J.H. Wilkinson (1968). "Householder's Tridiagonalization of a Symmetric Matrix," Numer. Math. 11, 181-195. C. Reinsch and F.L. Bauer (1968). "Rational QR Transformation with Newton's Shift for Symmetric Tridiagonal Matrices," Numer. Math. 11, 264-272. R.S. Martin, C. Reinsch, and J.H. Wilkinson (1970). "The QR Algorithm for Band Symmetric Ma­ trices," Numer. Math. 16, 85-92. The convergence properties of Algorithm 8.3.3 are detailed in Lawson and Hanson (SLE), see: J.H. Wilkinson (1968). "Global Convergence of Tridiagonal QR Algorithm With Origin Shifts," Lin. Alg. Applic. 1, 409-420. T.J. Dekker and J.F. Traub (1971). "The Shifted QR Algorithm for Hermitian Matrices," Lin. Alg. Applic. 4, 137-154. W. Hoffman and B.N. Parlett (1978). "A New Proof of Global Convergence for the Tridiagonal QL Algorithm," SIAM J. Numer. Anal. 15, 929-937. S. Batterson (1994). "Convergence of the Francis Shifted QR Algorithm on Normal Matrices," Lin. Alg. Applic. 207, 181-195. T.-L. Wang (2001). "Convergence of the Tridiagonal QR Algorithm," Lin. Alg. Applic. 322, 1-17. Shifting and deflation are critical to the effective implementation of the symmetric QR iteration, see: F.L. Bauer and C. Reinsch (1968). "Rational QR Transformations with Newton Shift for Symmetric Tridiagonal Matrices," Numer. Math. 11, 264-272. G.W. Stewart (1970). "Incorporating Origin Shifts into the QR Algorithm for Symmetric Tridiagonal Matrices," Commun. ACM 13, 365-367. I.S. Dhillon and A.N. Malyshev (2003). "Inner Deflation for Symmetric Tridiagonal Matrices," Lin. Alg. Applic. 358, 139-144. The efficient reduction of a general band symmetric matrix to tridiagonal form is a challenging com­ putation from several standpoints: H.R. Schwartz (1968). "Tridiagonalization of a Symmetric Band Matrix," Nv.mer. Math. 12, 231-241. C.H. Bischof and X. Sun (1996). "On Tridiagonalizing and Diagonalizing Symmetric Matrices with Repeated Eigenvalues," SIAM J. Matrix Anal. Applic. 17, 869-885. L. Kaufman (2000). "Band Reduction Algorithms Revisited," ACM TI-ans. Math. Softw. 26, 551-567. C.H. Bischof, B. Lang, and X. Sun (2000). "A Framework for Symmetric Band Reduction," ACM TI-ans. Math. Softw. 26, 581-601. Finally we mention that comparable techniques exist for skew-symmetric and general normal matrices, see: R.C. Ward and L.J. Gray (1978). "Eigensystem Computation for Skew-Symmetric and A Class of Symmetric Matrices," ACM TI-ans. Math. Softw. 4, 278-285.
C.P. Huang (1981). "On the Convergence of the QR Algorithm with Origin Shifts for Normal Matrices," IMA J. Numer. Anal. 1, 127-133.
S. Iwata (1998). "Block Triangularization of Skew-Symmetric Matrices," Lin. Alg. Applic. 273, 215-226.

8.4 More Methods for Tridiagonal Problems

In this section we develop special methods for the symmetric tridiagonal eigenproblem. The tridiagonal form

          [ α_1   β_1                          0    ]
          [ β_1   α_2    β_2                        ]
    T  =  [         ·       ·       ·              ]                                (8.4.1)
          [             β_{n-2}  α_{n-1}  β_{n-1}   ]
          [  0                   β_{n-1}   α_n      ]

can be obtained by Householder reduction (cf. §8.3.1). However, symmetric tridiagonal eigenproblems arise naturally in many settings.

We first discuss bisection methods that are of interest when selected portions of the eigensystem are required. This is followed by the presentation of a divide-and-conquer algorithm that can be used to acquire the full symmetric Schur decomposition in a way that is amenable to parallel processing.

8.4.1 Eigenvalues by Bisection

Let T_r denote the leading r-by-r principal submatrix of the matrix T in (8.4.1). Define the polynomial p_r(x) by

    p_r(x) = det(T_r - xI)

for r = 1:n. A simple determinantal expansion shows that

    p_r(x) = (α_r - x) p_{r-1}(x) - β_{r-1}^2 p_{r-2}(x)                             (8.4.2)

for r = 2:n if we set p_0(x) = 1. Because p_n(x) can be evaluated in O(n) flops, it is feasible to find its roots using the method of bisection. For example, if tol is a small positive constant, p_n(y)·p_n(z) < 0, and y < z, then the iteration

    while |y - z| > tol·(|y| + |z|)
        x = (y + z)/2
        if p_n(x)·p_n(y) < 0
            z = x
        else
            y = x
        end
    end

is guaranteed to terminate with (y + z)/2 an approximate zero of p_n(x), i.e., an approximate eigenvalue of T. The iteration converges linearly in that the error is approximately halved at each step.
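Here is a NumPy sketch of this root-finding scheme. det_tridiag evaluates p_n via (8.4.2); the names and the sample matrix are ours, the unscaled recurrence can over/underflow for large n, and a tiny absolute term is added to the text's stopping test so the loop cannot run forever if the bracket straddles zero:

    import numpy as np

    def det_tridiag(alpha, beta, x):
        # p_n(x) = det(T - xI) from the three-term recurrence (8.4.2).
        p_prev, p = 1.0, alpha[0] - x                  # p_0 and p_1
        for r in range(1, len(alpha)):
            p_prev, p = p, (alpha[r] - x) * p - beta[r-1]**2 * p_prev
        return p

    def bisect_eigenvalue(alpha, beta, y, z, tol=1e-14):
        # The bisection loop above; assumes y < z and p_n(y) * p_n(z) < 0.
        while abs(y - z) > tol * (abs(y) + abs(z)) + 1e-300:
            x = 0.5 * (y + z)
            if det_tridiag(alpha, beta, x) * det_tridiag(alpha, beta, y) < 0:
                z = x
            else:
                y = x
        return 0.5 * (y + z)

    # Hypothetical example: T = tridiag(-1, 2, -1) of order 4 has an eigenvalue
    # near 0.38 in the bracket [0, 1], since p_4(0) > 0 and p_4(1) < 0.
    alpha = np.array([2.0, 2.0, 2.0, 2.0])
    beta = np.array([-1.0, -1.0, -1.0])
    lam = bisect_eigenvalue(alpha, beta, 0.0, 1.0)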
8.4.2 Sturm Sequence Methods

Sometimes it is necessary to compute the kth largest eigenvalue of T for some prescribed value of k. This can be done efficiently by using the bisection idea and the following classical result:

Theorem 8.4.1 (Sturm Sequence Property). If the tridiagonal matrix in (8.4.1) has no zero subdiagonal entries, then the eigenvalues of T_{r-1} strictly separate the eigenvalues of T_r:

    λ_r(T_r) < λ_{r-1}(T_{r-1}) < λ_{r-1}(T_r) < ··· < λ_2(T_r) < λ_1(T_{r-1}) < λ_1(T_r).

Moreover, if a(λ) denotes the number of sign changes in the sequence

    { p_0(λ), p_1(λ), ..., p_n(λ) },

then a(λ) equals the number of T's eigenvalues that are less than λ. Here, the polynomials p_r(x) are defined by (8.4.2) and we have the convention that p_r(λ) has the opposite sign from p_{r-1}(λ) if p_r(λ) = 0.

Proof. It follows from Theorem 8.1.7 that the eigenvalues of T_{r-1} weakly separate those of T_r. To prove strict separation, suppose that p_r(µ) = p_{r-1}(µ) = 0 for some r and µ. It follows from (8.4.2) and the assumption that the matrix T is unreduced that p_0(µ) = p_1(µ) = ··· = p_r(µ) = 0, a contradiction. Thus, we must have strict separation. The assertion about a(λ) is established in Wilkinson (AEP, pp. 300-301). □

Suppose we wish to compute λ_k(T). From the Gershgorin theorem (Theorem 8.1.3) it follows that λ_k(T) ∈ [y, z] where

    y = min_{1≤i≤n} ( α_i - |β_i| - |β_{i-1}| ),    z = max_{1≤i≤n} ( α_i + |β_i| + |β_{i-1}| )

and we have set β_0 = β_n = 0. Using [y, z] as an initial bracketing interval, it is clear from the Sturm sequence property that the iteration

    while |z - y| > u(|y| + |z|)
        x = (y + z)/2
        if a(x) ≥ n - k
            z = x
        else
            y = x
        end
    end                                                                              (8.4.3)

produces a sequence of subintervals that are repeatedly halved in length but which always contain λ_k(T).
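Here is a NumPy sketch of the sign-change count a(λ) together with a bisection driver in the spirit of (8.4.3). Rather than counting sign changes in the raw sequence {p_r(λ)}, which can overflow, the count uses the ratios q_r = p_r(λ)/p_{r-1}(λ); these ratios are exactly the pivots of the LDL^T factorization of T - λI mentioned later in this section. The zero-pivot nudge, the kth-smallest convention of the driver (the text's (8.4.3) targets the kth largest, and the two are related by k ↔ n-k+1), and the names are all ours:

    import numpy as np

    def sturm_count(alpha, beta, x):
        # a(x): the number of eigenvalues of T that are less than x.  A sign
        # change between p_{r-1}(x) and p_r(x) corresponds to a negative ratio
        # q_r = p_r(x)/p_{r-1}(x), and q_r = (alpha_r - x) - beta_{r-1}^2/q_{r-1}.
        count = 0
        q = alpha[0] - x
        if q < 0:
            count += 1
        for r in range(1, len(alpha)):
            if q == 0.0:
                q = np.finfo(float).eps * (1.0 + abs(beta[r-1]))   # zero-pivot nudge
            q = (alpha[r] - x) - beta[r-1]**2 / q
            if q < 0:
                count += 1
        return count

    def kth_smallest_eigenvalue(alpha, beta, k, tol=1e-14):
        # Bisection on the Gershgorin bracket [y, z]: a(x) >= k exactly when x
        # exceeds the kth smallest eigenvalue, so that eigenvalue stays in [y, z].
        n = len(alpha)
        b = np.concatenate(([0.0], np.abs(beta), [0.0]))   # beta_0 = beta_n = 0
        y = float(np.min(alpha - b[:-1] - b[1:]))
        z = float(np.max(alpha + b[:-1] + b[1:]))
        while abs(z - y) > tol * (abs(y) + abs(z)) + 1e-300:
            x = 0.5 * (y + z)
            if sturm_count(alpha, beta, x) >= k:
                z = x
            else:
                y = x
        return 0.5 * (y + z)

    # Hypothetical example: for T = tridiag(-1, 2, -1) of order 4 the largest
    # eigenvalue (k = n = 4) is 2 + 2*cos(pi/5), roughly 3.618.
    alpha = np.array([2.0, 2.0, 2.0, 2.0]);  beta = np.array([-1.0, -1.0, -1.0])
    lam_max = kth_smallest_eigenvalue(alpha, beta, 4)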
  • 493. 8.4. More Methods for Tridiagonal Problems 469 During the execution of {8.4.3), information about the location of other eigen­ values is obtained. By systematically keeping track of this information it is pos­ sible to devise an efficient scheme for computing contiguous subsets of >.(T), e.g., {>.k(T), >.k+l (T), . . . , >.k+i(T)}. See Barth, Martin, and Wilkinson {1967). If selected eigenvalues of a general symmetric matrix A are desired, then it is necessary first to compute the tridiagonalization T = UJ'AUo before the above bisection schemes can be applied. This can be done using Algorithm 8.3.1 or by the Lanczos algorithm discussed in §10.2. In either case, the corresponding eigenvectors can be readily found via inverse iteration since tridiagonal systems can be solved in 0(n) flops. See §4.3.6 and §8.2.2. In those applications where the original matrix A already has tridiagonal form, bisection computes eigenvalues with small relative error, regardless of their magnitude. This is in contrast to the tridiagonal QR iteration, where the computed eigenvalues 5.i can be guaranteed only to have small absolute error: 15.i - >.i(T)I � ull T 112 Finally, it is possible to compute specific eigenvalues of a symmetric matrix by using the LDLT factorization (§4.3.6) and exploiting the Sylvester inertia theorem (Theorem 8.1.17). If A - µ/ = LDLT, is the LDLT factorization of A - µI with D = diag{di , . . . , dn), then the number of negative di equals the number of >.i(A) that are less than µ. See Parlett {SEP, p. 46) for details. 8.4.3 Eigensystems of Diagonal Plus Rank-1 Matrices Our next method for the symmetric tridiagonal eigenproblem requires that we be able to compute efficiently the eigenvalues and eigenvectors of a matrix of the form D+pzzT where D E Rnxn is diagonal, z E Rn, and p E R. This problem is important in its own right and the key computations rest upon the following pair of results. Lemma 8.4.2. Suppose D = diag(di . . . . , dn) E Rnxn with di > · · · > dn. Assume that p f. 0 and that z E Rn has no zero components. If v f. 0, then zTv f. 0 and D - >.I is nonsingular. Proof. If >. E >.(D), then >. = di for some i and thus 0 = ef[(D - >.I)v + p(zTv)z] = p(zTv)zi. Since p and Zi are nonzero, it follows that 0 = zTv and so Dv = >.v. However, D has distinct eigenvalues and therefore v E span{ei}. This implies 0 = zTv = Zi, a contradiction. Thus, D and D + pzzT have no common eigenvalues and zTv f. 0. 0
  • 494. 470 Chapter 8. Symmetric Eigenvalue Problems Theorem 8.4.3. Suppose D = diag(d1 , . . . ,dn) E R.nxn and that the diagonal entries satisfy di > · · · > dn. Assume that p =/. 0 and that z E R.n has no zero components. If V E R.nxn is orthogonal such that VT(D + pzzT)V = diag(A1 , .. . , An) with Ai � · · · � An and V = [ V1 I · · · I Vn ] , then (a) The Ai are the n zeros ofj(A) = 1 + pzT(D - M)-1z. (b) If p > 0, then Ai > di > A2 > · · · > An > dn. If p < 0, then di > Al > d2 > · · · > dn > An. (c) The eigenvector Vi is a multiple of (D - Ail)-1z. Proof. If (D + pzzT)v = AV, then (D - AI)v + p(zTv)z = 0. We know from Lemma 8.4.2 that D - Al is nonsingular. Thus, v E span{(D - AI)-1z}, (8.4.4) thereby establishing (c). Moreover, if we apply zT(D - M)-1 to both sides of equation (8.4.4) we obtain (zTv)· (1 + pzT(D - M)-1z) = 0. By Lemma 8.4.2, zTv =f. 0 and so this shows that if A E A(D+pzzT), then f(A) = 0. We must show that all the zeros of f are eigenvalues of D + pzzT and that the interlacing relations (b) hold. To do this we look more carefully at the equations f(A) 1 + p (---=L+ . . . + -=L) ' di - A dn - A J'(A) = p ((d1�A)2 + . . . + (dn:A)2 ). Note that f is monotone in between its poles. This allows us to conclude that, if p > 0, then f has precisely n roots, one in each of the intervals If p < 0, then f has exactly n roots, one in each of the intervals Thus, in either case the zeros of f are exactly the eigenvalues of D + pvvT. 0 The theorem suggests that in order to compute V we must find the roots Al, ..., An of f using a Newton-like procedure and then compute the columns of V by normalizing
  • 495. 8.4. More Methods for Tridiagonal Problems 471 the vectors (D - Ai/)-1z for i = l :n. The same plan of attack can be followed even if there are repeated di and zero Zi. Theorem 8.4.4. If D = diag(d1, . . . , dn) and z E IRn, then there exists an orthogonal matrix Vi such that if VtDVi = diag(µ1, . . . , µn) and w = Vtz then µ1 > µ2 > · · · > µr � µr+l � · · · � µn , Wi =f 0 for i = l:r, and Wi = 0 for i = r + l:n. Proof. We give a constructive proof based upon two elementary operations. The first deals with repeated diagonal entries while the second handles the situation when the z-vector has a zero component. Suppose di = di for some i < j . Let G{i,j, 0) be a Givens rotation in the {i,j) plane with the property that the jth component of G{i, j, O)Tz is zero. It is not hard to show that G{i, j, O)TD G(i,j, 0) = D. Thus, we can zero a component of z if there is a repeated di. If Zi = 0, Zj =f 0, and i < j, then let P be the identity with columns i and j interchanged. It follows that pTDP is diagonal, (PTz)i =f 0, and (PTz)i = 0. Thus, we can permute all the zero Zi to the "bottom." It is clear that the repetition of these two maneuvers will render the desired canonical structure. The orthogonal matrix Vi is the product of the rotations that are required by the process. D See Barlow {1993) and the references therein for a discussion of the solution procedures that we have outlined above. 8.4.4 A Divide-and-Conquer Framework We now present a divide-and-conquer method for computing the Schur decomposition {8.4.5) for tridiagonal T that involves (a) "tearing" T in half, {b) computing the Schur decom­ positions of the two parts, and (c) combining the two half-sized Schur decompositions into the required full-size Schur decomposition. The overall procedure, developed by Dongarra and Sorensen {1987), is suitable for parallel computation. We first show how T can be "torn" in half with a rank-1 modification. For simplicity, assume n = 2m and that T E IR.nxn is given by {8.4.1). Define v E IR.n as follows v = [0e�l ]• 0 E {-1, +1}. {8.4.6) Note that for all p E IR the matrix T = T - pvvT is identical to T except in its "middle four" entries: - = [ O!m - p T(m:m + 1, m:m + 1) f3m - pO
  • 496. 472 Chapter 8. Symmetric Eigenvalue Problems If we set p() = f3m, then where 0!1 !31 0 !31 0!2 Ti = fJm-1 0 fJm-1 Um and am = am -p and llm+i = am+I - p()2. Um+i f3m+I f3m+I O!m+2 0 0 f3n-l f3n-l O!n Now suppose that we have m-by-m orthogonal matrices Q1 and Q2 such that QfT1Qi = D1 and QfT2Q2 = D2 are each diagonal. If we set then where is diagonal and u = [�i ;2 ], urru = ur ([:i �2 l+ pvvr)u = D + pzzr D [�1 ;2 ] Z = UTv = [ Qf;m l· ()Q2 e1 Comparing these equations we see that the effective synthesis of the two half-sized Schur decompositions requires the quick and stable computation of an orthogonal V such that VT(D + pzzT)V = A = diag(.A1 , . . . , .An) which we discussed in §8.4.3. 8.4.5 A Parallel Implementation Having stepped through the tearing and synthesis operations, wecannow illustrate how the overall process can be implemented in parallel. For clarity, assume that n = 8N for some positive integer N and that three levels of tearing are performed. See Figure 8.4.1. The indices are specified in binary and at each node the Schur decomposition of a tridiagonal matrix T(b) is obtained from the eigensystems of the tridiagonals T(bO) and T(bl). For example, the eigensystems for the N-by-N matrices T(llO) and T(lll) are combined to produce the eigensystem for the 2N-by-2N tridiagonal matrix T(ll). What makes this framework amenable to parallel computation is the independence of the tearing/synthesis problems that are associated with each level in the tree.
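To see how the synthesis step of §8.4.3-§8.4.4 might look in code, here is a NumPy sketch for the rank-one modified diagonal eigenproblem. It assumes the deflated situation of Theorem 8.4.4 has already been reached (d strictly decreasing, ρ ≠ 0, no zero z_i), locates each zero of the secular equation by plain bisection on its interlacing interval, and forms eigenvectors from Theorem 8.4.3(c). In the divide-and-conquer setting, d and z come from the two half-sized Schur decompositions as described above, and the full eigenvector matrix is U·V. The names are ours; production codes, such as the Gu-Eisenstat approach cited in the notes below, use a faster rational solver and a more careful eigenvector formula to guarantee numerical orthogonality:

    import numpy as np

    def secular_eigensystem(d, z, rho):
        # Eigenpairs of D + rho*z*z^T with D = diag(d), d strictly decreasing,
        # rho != 0, no zero z_i.  Eigenvalues are the zeros of
        #     f(lam) = 1 + rho * sum_i z_i**2 / (d_i - lam)
        # (Theorem 8.4.3(a)), one per interlacing interval of part (b); the
        # eigenvector for lam_j is a multiple of (D - lam_j*I)^{-1} z (part (c)).
        d = np.asarray(d, dtype=float);  z = np.asarray(z, dtype=float)
        n = d.size
        f = lambda lam: 1.0 + rho * np.sum(z**2 / (d - lam))
        spread = abs(rho) * (z @ z)          # every eigenvalue lies within this shift
        lam = np.empty(n)
        for j in range(n):
            if rho > 0:                      # d_j < lam_j < d_{j-1}; lam_0 above d_0
                lo, hi = d[j], (d[j-1] if j > 0 else d[0] + spread)
            else:                            # d_{j+1} < lam_j < d_j; lam_{n-1} below d_{n-1}
                lo, hi = (d[j+1] if j < n - 1 else d[n-1] - spread), d[j]
            while hi - lo > 1e-14 * max(abs(lo), abs(hi), 1.0):
                mid = 0.5 * (lo + hi)
                if np.sign(rho) * f(mid) > 0:    # sign(rho)*f increases across the interval
                    hi = mid
                else:
                    lo = mid
            lam[j] = 0.5 * (lo + hi)
        V = z[:, None] / (d[:, None] - lam[None, :])   # column j ~ (D - lam_j I)^{-1} z
        V /= np.linalg.norm(V, axis=0)
        return lam, V

    # Hypothetical check against a dense eigensolver:
    rng = np.random.default_rng(3)
    d = -np.sort(-rng.standard_normal(6));  z = rng.standard_normal(6);  rho = 0.7
    lam, V = secular_eigensystem(d, z, rho)
    err = np.max(np.abs(np.sort(lam) - np.linalg.eigvalsh(np.diag(d) + rho * np.outer(z, z))))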
  • 497. 8.4. More Methods for Tridiagonal Problems T T(O) � T(l) � T(OO) T(Ol) T(lO) T(ll) 8.4.6 A A A A T(OOO) T(OOl) T(OlO) T(Oll) T(lOO) T(101) T(llO) Figure 8.4.1. The divide-and-conquer framework An Inverse Tridiagonal Eigenvalue Problem T(lll) 473 For additional perspective on symmetric trididagonal matrices and their rich eigen­ structure we consider an inverse eigenvalue problem. Assume that A1, . . . , An and 5.1, . . . , Xn-1 are given real numbers that satisfy Al > 5.1 > A2 > · · · > A�-1 > >-n-1 > An · The goal is to compute a symmetric tridiagonal matrix T E lR,nxn such that A(T) = {>.1, . . . , An, } , A(T(2:n, 2:n)) = {5.1 , . . . .5.n-d· (8.4.7) (8.4.8) (8.4.9) Inverse eigenvalue problems arise in many applications and generally involve computing a matrix that has specified spectral properties. For an overview, see Chu and Golub (2005). Our example is taken from Golub (1973). The problem we are considering can be framed as a Householder tridiagonalization problem with a constraint on the orthogonal transformation. Define A = diag(A1, . . . , An) and let Q be orthogonal so that QTAQ = T is tridiagonal. There are an infinite number ofpossible Q-matrices that do this and in each case the matrix T satisfies (8.4.8). The challenge is to choose Q so that (8.4.9) holds as well. Recall that a tridiagonalizing Q is essentially determined by its first column because of the implicit-Q- theorem (Theorem 8.3.2). Thus, the problem is solved if we can figure out a way to compute Q(:, 1) so that (8.4.9) holds. The starting point in the derivation ofthe method is to realize that the eigenvalues of T(2:n, 2:n) are the stationary values of xTTx subject to the constraints xTx = 1 and efx = 0. To characterize these stationary values we use the method of Lagrange multipliers and set to zero the gradient of </>(x, A, /L) = xTTx - A(xTx - 1) + 2µxTe1
  • 498. 474 Chapter 8. Symmetric Eigenvalue Problems which gives (T - >..I)x = -µe1 . Because A is an eigenvalue of T(2:n, 2:n) it is not an eigenvalue of T and so Since efx = 0, it follows that n d2 L A· .:_A i=l t (8.4.10) where Q(,, 1) � [:] (8.4.11) By multiplying both sides of equation (8.4.10) by (A1 - A) · · · (An-A), we can conclude that 5.1, ...,An-l are the zeros of the polynomial It follows that n n p(A) = LdT II(Aj - A). i=l j=l #i n- 1 p(A) = a · II (Aj -A) j=l for some scalar a. By comparing the coefficient of An-l in each of these expressions for p(A) and noting from (8.4.11) that d� + · · · + d; = 1, we see that a = 1. From the equation we immediately see that n n LdT II(Aj - A) i=l j=l #i n-1 II (Aj - A) j=l k = l:n. (8.4.12) It is easy to show using (8.4.7) that the quantity on the right is positive and thus (8.4.11) can be used to determine the components of d = Q(:, 1) up to with a factor of ±1. Once this vector is available, then we can determine the required tridiagonal matrix T as follows: Step 1. Let P be a Householder matrix so that Pd = ±1 and set A = pTAP. Step 2. Compute the tridiagonalization QfAQ1 = T via Algorithm 8.3.1 and ob­ serve from the implementation that Qi(:, 1) = ei. Step 3. Set Q = PQ1 .
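As a computational companion to Steps 1-3 (where P is a Householder matrix with Pd = ±e_1), here is a NumPy sketch. It computes d from (8.4.12) and then, instead of the Householder route, builds the tridiagonal matrix by a Lanczos-type three-term recurrence on Λ = diag(λ_1, ..., λ_n) with starting vector d; by the implicit Q theorem (Theorem 8.3.2) this yields essentially the same unreduced T. All names are ours:

    import numpy as np

    def lanczos_tridiag(A, q1):
        # Full-length Lanczos recurrence with full reorthogonalization:
        # returns (alpha, beta, Q) with Q^T A Q = tridiag(beta, alpha, beta)
        # and Q[:, 0] a unit multiple of q1.
        n = A.shape[0]
        Q = np.zeros((n, n))
        Q[:, 0] = q1 / np.linalg.norm(q1)
        alpha = np.zeros(n);  beta = np.zeros(n - 1)
        for k in range(n):
            w = A @ Q[:, k]
            alpha[k] = Q[:, k] @ w
            w -= Q[:, :k+1] @ (Q[:, :k+1].T @ w)    # orthogonalize against Q(:, 0:k)
            if k < n - 1:
                beta[k] = np.linalg.norm(w)
                Q[:, k+1] = w / beta[k]
        return alpha, beta, Q

    def inverse_tridiag(lam, lam_bar):
        # Sketch of §8.4.6: from strictly interlacing spectra {lam_i} and
        # {lam_bar_j}, recover d = Q(:,1) via (8.4.12) and then a tridiagonal T
        # with lambda(T) = {lam_i} and lambda(T(2:n, 2:n)) = {lam_bar_j}.
        lam = np.asarray(lam, dtype=float);  lb = np.asarray(lam_bar, dtype=float)
        n = lam.size
        d = np.empty(n)
        for k in range(n):
            num = np.prod(lb - lam[k])
            den = np.prod(np.delete(lam, k) - lam[k])
            d[k] = np.sqrt(num / den)        # positive because of the interlacing (8.4.7)
        return lanczos_tridiag(np.diag(lam), d)

    # Hypothetical example:
    alpha, beta, Q = inverse_tridiag([4.0, 2.0, 0.0], [3.0, 1.0])
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    # np.linalg.eigvalsh(T)         is approximately {0, 2, 4}
    # np.linalg.eigvalsh(T[1:, 1:]) is approximately {1, 3}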
  • 499. 8.4. More Methods for Tridiagonal Problems 475 It follows that Q(:, 1) = P(Q1e1) = Pei = ±d. The sign does not matter. Problems p&.4.1 Suppose >. is an eigenvalue of a symmetric tridiagonal matrix T. Show that if >. has algebraic multiplicity k, then T has at least k - 1 subdiagonal entries that are zero. p&.4.2 Give an algorithm for determining p and 9 in (8.4.6) with the property that 9 E {-1, 1} and min{ lam - pl, lam+l - Pl } is maximized. p&.4.3 Let Pr(>.) = det(T(l:r, l:r) - >.Ir) where T is given by (8.4.1). Derive a recursion for evaluating p�(>.) and use it to develop a Newton iteration that can compute eigenvalues of T. PS.4.4 If T is positive definite, does it follow that the matrices T1 and T2 in §8.4.4 are positive definite? PS.4.5 Suppose A = S+uuuT where S E Rnxn is skew-symmetric, u E Rn, and u E R. Show how to compute an orthogonal Q such that QTAQ = T + ue1ef where T is tridiagonal and skew-symmetric. PS.4.6 Suppose >. is a known eigenvalue of a unreduced symmetric tridiagonal matrix T E Rnxn. Show how to compute x(l:n - 1) from the equation Tx = >.x given that Xn = 1. PS.4.7 Verify that the quantity on the right-hand side of (8.4.12) is positive. PS.4.8 Suppose that A = [ ; d: J where D = diag(d1, . . . , dn-1) has distinct diagonal entries and v E Rn-l has no zero entries. (a) Show that if >. E >.(A), then D- >.In-1 is nonsingular. (b) Show that if >. E >.(A), then >. is a zero of Notes and References for §8.4 Bisection/Sturm sequence methods are discussed in: W. Barth, R.S. Martin, and J.H. Wilkinson (1967). "Calculation of the Eigenvalues of a Symmetric Tridiagonal Matrix by the Method of Bisection," Numer. Math. 9, 386-393. K.K. Gupta (1972). "Solution of Eigenvalue Problems by Sturm Sequence Method," Int. J. Numer. Meth. Eng. 4, 379--404. J.W. Demmel, I.S. Dhillon, and H. Ren (1994) "On the Correctness of Parallel Bisection in Floating Point," ETNA 3, 116-149. Early references concerned with the divide-and-conquer framework that we outlined include: J.R. Bunch, C.P. Nielsen, and D.C. Sorensen (1978). "Rank-One Modification of the Symmetric Eigenproblem," Numer. Math. 31, 31-48. J.J.M. Cuppen (1981). "A Divide and Conquer Method for the Symmetric Eigenproblem," Numer. Math. 36, 177-195. J.J. Dongarra and D.C. Sorensen (1987). "A Fully Parallel Algorithm for the Symmetric Eigenvalue Problem," SIAM J. Sci. Stat. Comput. 8, S139-S154. Great care must be taken to ensure orthogonality in the computed matrix of eigenvectors, something that is a major challenge when the eigenvalues are close and clustered. The development of reliable implementations is a classic tale that involves a mix of sophisticated theory and clever algorithmic insights, see: M. Gu and S.C. Eisenstat (1995). "A Divide-and-Conquer Algorithm for the Symmetric Tridiagonal Eigenproblem," SIAM J. Matrix Anal. Applic. 16, 172-191. B.N. Parlett (1996). "Invariant Subspaces for Tightly Clustered Eigenvalues of Tridiagonals," BIT 36, 542-562. B.N. Parlett and I.S. Dhillon (2000). "Relatively Robust Representations of Symmetric Tridiagonals,'' Lin. Alg. Applic. 309, 121-151.
  • 500. 476 Chapter 8. Symmetric Eigenvalue Problems l.S. Dhillon and B.N. Parlett (2003). "Orthogonal Eigenvectors and Relative Gaps," SIAM J. Matri,x Anal. Applic. 25, 858-899. l.S. Dhillon and B.N. Parlett (2004). "Multiple Representations to Compute Orthogonal Eigenvectors of Symmetric Tridiagonal Matrices," Lin. Alg. Applic. 387, 1-28. O.A. Marques, B.N. Parlett, and C. Vomel (2005). "Computations of Eigenpair Subsets with the MRRR Algorithm," Numer. Lin. Alg. Applic. 13, 643-653. P. Bientinesi, LS. Dhillon, and R.A. van de Geijn (2005). "A Parallel Eigensolver for Dense Symmetric Matrices Based on Multiple Relatively Robust Representations," SIAM J. Sci. Comput. 27, 43-66. Various extensions and generalizations of the basic idea have also been proposed: S. Huss-Lederman, A. Tsao, and T. Turnbull (1997). "A Parallelizable Eigensolver for Real Diago­ nalizable Matrices with Real Eigenvalues," SIAM .J. Sci. Comput. 18, 869-885. B. Hendrickson, E. Jessup, and C. Smith (1998). "Toward an Efficient Parallel Eigensolver for Dense Symmetric Matrices," SIAM J. Sci. Comput. 20, 1132-1154. W.N. Gansterer, J. Schneid, and C.W. Ueberhuber (2001). "A Low-Complexity Divide-and-Conquer Method for Computing Eigenvalues and Eigenvectors of Symmetric Band Matrices," BIT 41, 967- 976. W.N. Gansterer, R.C. Ward, and R.P. Muller (2002). "An Extension of the Divide-and-Conquer Method for a Class of Symmetric Block-Tridiagonal Eigenproblems,'' ACM Trans. Math. Softw. 28, 45-58. W.N. Gansterer, R.C. Ward, R.P. Muller, and W.A. Goddard and III (2003). "Computing Approxi­ mate Eigenpairs of Symmetric Block Tridiagonal Matrices," SIAM J. Sci. Comput. 24, 65-85. Y. Bai and R.C. Ward (2007). "A Parallel Symmetric Block-Tridiagonal Divide-and-Conquer Algo­ rithm," A CM Trans. Math. Softw. 33, Article 35. For a detailed treatment of various inverse eigenvalue problems, see: M.T. Chu and G.H. Golub (2005). Inverse Eigenvalue Problems, Oxford University Press, Oxford, U.K. Selected papers that discuss a range of inverse eigenvalue problems include: D. Boley and G.H. Golub (1987). "A Survey of Matrix Inverse Eigenvalue Problems,'' Inverse Problems 3, 595- 622. M.T. Chu (1998). "Inverse Eigenvalue Problems," SIAM Review 40, 1-39. C.-K. Li and R. Mathias (2001). "Construction of Matrices with Prescribed Singular Values and Eigenvalues," BIT 41, 115-126. The derivation in §8.4.6 involved the constrained optimization of a quadratic form, an important problem in its own right, see: G.H. Golub and R. Underwood (1970). "Stationary Values of the Ratio of Quadratic Forms Subject to Linear Constraints," Z. Angew. Math. Phys. 21, 318-326. G.H. Golub (1973). "Some Modified Eigenvalue Problems," SIAM Review 15, 318--334. S. Leon (1994). "Maximizing Bilinear Forms Subject to Linear Constraints," Lin. Alg. Applic. 210, 49-58. 8.5 Jacobi Methods Jacobi methods for the symmetric eigenvalue problem attract current attention be­ cause they are inherently parallel. They work by performing a sequence of orthogonal similarity updates A � QTAQ with the property that each new A, although full, is "more diagonal" than its predecessor. Eventually, the off-diagonal entries are small enough to be declared zero. After surveying the basic ideas behind the Jacobi approach we develop a parallel Jacobi procedure.
8.5.1 The Jacobi Idea

The idea behind Jacobi's method is to systematically reduce the quantity

    off(A) = sqrt( Σ_{i=1}^n Σ_{j=1, j≠i}^n a_{ij}^2 ),

i.e., the Frobenius norm of the off-diagonal elements. The tools for doing this are rotations of the form

                   [ 1  ···  0  ···  0  ···  0 ]
                   [ :       :       :       : ]
                   [ 0  ···  c  ···  s  ···  0 ]   p
    J(p, q, θ)  =  [ :       :       :       : ]
                   [ 0  ··· -s  ···  c  ···  0 ]   q
                   [ :       :       :       : ]
                   [ 0  ···  0  ···  0  ···  1 ]
                            p       q

which we call Jacobi rotations. Jacobi rotations are no different from Givens rotations; see §5.1.8. We submit to the name change in this section to honor the inventor.

The basic step in a Jacobi eigenvalue procedure involves (i) choosing an index pair (p, q) that satisfies 1 ≤ p < q ≤ n, (ii) computing a cosine-sine pair (c, s) such that

    [ b_pp  b_pq ]     [  c  s ]^T [ a_pp  a_pq ] [  c  s ]
    [ b_qp  b_qq ]  =  [ -s  c ]   [ a_qp  a_qq ] [ -s  c ]                          (8.5.1)

is diagonal, and (iii) overwriting A with B = J^T A J where J = J(p, q, θ). Observe that the matrix B agrees with A except in rows and columns p and q. Moreover, since the Frobenius norm is preserved by orthogonal transformations, we find that

    a_pp^2 + a_qq^2 + 2 a_pq^2  =  b_pp^2 + b_qq^2 + 2 b_pq^2  =  b_pp^2 + b_qq^2.

It follows that

    off(B)^2 = || B ||_F^2 - Σ_{i=1}^n b_ii^2
             = || A ||_F^2 - Σ_{i=1}^n a_ii^2 + ( a_pp^2 + a_qq^2 - b_pp^2 - b_qq^2 )
             = off(A)^2 - 2 a_pq^2.                                                  (8.5.2)

It is in this sense that A moves closer to diagonal form with each Jacobi step.

Before we discuss how the index pair (p, q) can be chosen, let us look at the actual computations associated with the (p, q) subproblem.
8.5.2 The 2-by-2 Symmetric Schur Decomposition

To say that we diagonalize in (8.5.1) is to say that

    0 = b_pq = a_pq (c^2 - s^2) + (a_pp - a_qq) c s.                                 (8.5.3)

If a_pq = 0, then we just set c = 1 and s = 0. Otherwise, define

    τ = (a_qq - a_pp) / (2 a_pq)    and    t = s/c

and conclude from (8.5.3) that t = tan(θ) solves the quadratic

    t^2 + 2τt - 1 = 0.

It turns out to be important to select the smaller of the two roots:

    t_min = 1/(τ + sqrt(1 + τ^2))   if τ ≥ 0,
    t_min = 1/(τ - sqrt(1 + τ^2))   if τ < 0.

This implies that the rotation angle satisfies |θ| ≤ π/4 and has the effect of maximizing c:

    c = 1/sqrt(1 + t_min^2),    s = t_min · c.

This in turn minimizes the difference between A and the update B:

    || B - A ||_F^2 = 4(1 - c) Σ_{i=1, i≠p,q}^n ( a_ip^2 + a_iq^2 ) + 2 a_pq^2 / c^2.

We summarize the 2-by-2 computations as follows:

Algorithm 8.5.1 Given an n-by-n symmetric A and integers p and q that satisfy 1 ≤ p < q ≤ n, this algorithm computes a cosine-sine pair {c, s} such that if B = J(p, q, θ)^T A J(p, q, θ), then b_pq = b_qp = 0.

    function [c, s] = symSchur2(A, p, q)
        if A(p, q) ≠ 0
            τ = (A(q, q) - A(p, p)) / (2 A(p, q))
            if τ ≥ 0
                t = 1/(τ + sqrt(1 + τ^2))
            else
                t = 1/(τ - sqrt(1 + τ^2))
            end
            c = 1/sqrt(1 + t^2);  s = t·c
        else
            c = 1;  s = 0
        end
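For experimentation, here is a direct NumPy transcription of Algorithm 8.5.1 together with the (p, q) update B = J^T A J; indices are 0-based, and the helper names and the random test are ours:

    import numpy as np

    def sym_schur2(A, p, q):
        # Algorithm 8.5.1 with 0-based indices: (c, s) so that rotating rows
        # and columns p, q of A zeroes the (p, q) entry.
        if A[p, q] != 0.0:
            tau = (A[q, q] - A[p, p]) / (2.0 * A[p, q])
            if tau >= 0:
                t = 1.0 / (tau + np.sqrt(1.0 + tau**2))
            else:
                t = 1.0 / (tau - np.sqrt(1.0 + tau**2))
            c = 1.0 / np.sqrt(1.0 + t**2)
            s = t * c
        else:
            c, s = 1.0, 0.0
        return c, s

    def apply_jacobi(A, p, q, c, s):
        # Overwrite A with J^T A J, J = J(p, q, theta); only rows/columns p, q change.
        J = np.array([[c, s], [-s, c]])
        A[:, [p, q]] = A[:, [p, q]] @ J
        A[[p, q], :] = J.T @ A[[p, q], :]
        return A

    def off2(A):
        # off(A)^2 = squared Frobenius norm of the off-diagonal part.
        return np.sum(A**2) - np.sum(np.diag(A)**2)

    # Hypothetical check of (8.5.2): each update lowers off(A)^2 by exactly 2*a_pq^2.
    rng = np.random.default_rng(4)
    A = rng.standard_normal((5, 5));  A = (A + A.T) / 2
    before, apq = off2(A), A[1, 3]
    c, s = sym_schur2(A, 1, 3)
    apply_jacobi(A, 1, 3, c, s)
    drop = before - off2(A)        # equals 2*apq**2 up to roundoff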
  • 503. 8.5. Jacobi Methods 479 8.5.3 The Classical Jacobi Algorithm As we mentioned above, only rows and columns p and q are altered when the (p, q) subproblem is solved. Once sym5chur2 determines the 2-by-2 rotation, then the update A +- J(p, q, ())TAJ(p, q, (}) can be implemented in 6n flops if symmetry is exploited. How do we choose the indices p and q? From the standpoint of maximizing the reduction of off(A) in (8.5.2), it makes sense to choose (p, q) so that a�q is maximal. This is the basis of the classical Jacobi algorithm. Algorithm 8.5.2 (Classical Jacobi} Given a symmetric A E 1Rnxn and a positive tolerance tol, this algorithm overwrites A with VTAV where V is orthogonal and off(VTAV) :'5 tol· ll A llF· V = In, O = tol · 11 A llF while off(A) > o end Choose (p, q) so lapql = maxi#i laii l [c , s] = sym5chur2(A,p, q) A = J(p, q, ())TA J(p, q, (}) v = vJ(p, q, (}) Since lapql is the largest off-diagonal entry, where From (8.5.2) it follows that off(A)2 :::; N(a;q + a�p) N = n(n - 1) 2 . off(B)2 :::; (1 -�)off(A)2 • By induction, if A(k) denotes the matrix A after k Jacobi updates, then This implies that the classical Jacobi procedure converges at a linear rate. However, the asymptotic convergence rate of the method is considerably better than linear. Schonhage (1964) and van Kempen (1966) show that for k large enough, there is a constant c such that i.e., quadratic convergence. An earlier paper by Henrici (1958) established the same result for the special case when A has distinct eigenvalues. In the convergence theory for the Jacobi iteration, it is critical that IOI :::; 7r/4. Among other things this precludes the possibility of interchanging nearly converged diagonal entries. This follows from
  • 504. 480 Chapter 8. Symmetric Eigenvalue Problems the formulae bpp = app -tapq and bqq = aqq +tapq, which can be derived from Equation (8.5.1) and the definition t = sin{O)/ cos(O). It is customary to refer to N Jacobi updates as a sweep. Thus, after a sufficient number of iterations, quadratic convergence is observed when examining off{A) after every sweep. There is no rigorous theory that enables one to predict the number ofsweeps that are required to achieve a specified reduction in off{A). However, Brent and Luk {1985) have argued heuristically that the number of sweeps is proportional to log(n) and this seems to be the case in practice. 8.5.4 The Cyclic-by-Row Algorithm The trouble with the classical Jacobi method is that the updates involve O(n) flops while the search for the optimal {p, q) is O(n2) . One way to address this imbalance is to fix the sequence of subproblems to be solved in advance. A reasonable possibility is to step through all the subproblems in row-by-row fashion. For example, if n = 4 we cycle as follows: (p, q) = (1, 2), {1, 3), {1, 4), (2, 3), (2, 4), (3, 4), (1, 2), . . . . This ordering scheme is referred to as cyclic by row and it results in the following procedure: Algorithm 8.5.3 (Cyclic Jacobi) Given a symmetric matrix A E JRnxn and a positive tolerance tol, this algorithm overwrites A with vrAV where V is orthogonal and off(VTAV) � tol· ll A llF· V = ln, 8 = tol · ll A llF while off(A) > 8 end for p = l:n - 1 end for q = p + l:n end [c, s] = sym5chur2{A,p, q) A = J(p, q, ())TAJ(p, q, ()) v = vJ(p, q, ()) The cyclic Jacobi algorithm also converges quadratically. (See Wilkinson {1962) and van Kempen {1966).) However, since it does not require off-diagonal search, it is considerably faster than Jacobi's original algorithm. 8.5.5 Error Analysis Using Wilkinson's error analysis it is possible to show that if r sweeps are required by Algorithm 8.5.3 and d1, . . . , dn specify the diagonal entries of the final, computed A
  • 505. 8.5. Jacobi Methods 481 matrix, then n i=l for some ordering of A's eigenvalues Ai· The parameter kr depends mildly on r. Although the cyclic Jacobi method converges quadratically, it is not generally competitive with the symmetric QR algorithm. For example, ifwejust count flops, then two sweeps of Jacobi arc roughly equivalent to a complete QR reduction to diagonal form with accumulation of transformations. However, for small n this liability is not very dramatic. Moreover, if an approximate eigenvector matrix V is known, then yrAV is almost diagonal, a situation that Jacobi can exploit but not QR. Another interesting feature of the Jacobi method is that it can compute the eigenvalues with small relative error if A is positive definite. To appreciate this point, note that the Wilkinson analysis cited above coupled with the §8.1 perturbation theory ensures that the computed eigenvalues -X1 2: · · · 2: -Xn satisfy However, a refined, componentwise error analysis by Demmel and Veselic (1992) shows that in the positive definite case (8.5.4) where D = diag(foil, . . . , �) and this is generally a much smaller approximating bound. The key to establishing this result is some new perturbation theory and a demonstration that if A+ is a computed Jacobi update obtained from the current matrix Ac, then the eigenvalues of A+ are relatively close to the eigenvalues of Ac in the sense of (8.5.4). To make the whole thing work in practice, the termination criterion is not based upon the comparison of off(A) with ull A llF but rather on the size of each laij I compared to uJaiilljj. 8.5.6 Block Jacobi Procedures It is usually the case when solving the symmetric eigenvalue problem on a p-processor machine that n » p. In this case a block version of the Jacobi algorithm may be appropriate. Block versions of the above procedures are straightforward. Suppose that n = rN and that we partition the n-by-n matrix A as follows: Here, each Aij is r-by-r. In a block Jacobi procedure the (p, q) subproblem involves computing the 2r-by-2r Schur decomposition Vpq ]T[ Vqq ] [�: �:] [
and then applying to A the block Jacobi rotation made up of the V_ij. If we call this block rotation V, then it is easy to show that

off(VᵀAV)² = off(A)² − (2‖A_pq‖_F² + off(A_pp)² + off(A_qq)²).

Block Jacobi procedures have many interesting computational aspects. For example, there are several ways to solve the subproblems, and the choice appears to be critical. See Bischof (1987).

8.5.7 A Note on the Parallel Ordering

The block Jacobi approach to the symmetric eigenvalue problem has an inherent parallelism that has attracted significant attention. The key observation is that the (i₁, j₁) subproblem is independent of the (i₂, j₂) subproblem if the four indices i₁, j₁, i₂, and j₂ are distinct. Moreover, if we regard A as a 2m-by-2m block matrix, then it is possible to partition the set of off-diagonal index pairs into a collection of 2m − 1 rotation sets, each of which identifies m nonconflicting subproblems. A good way to visualize this is to imagine a chess tournament with 2m players in which everybody must play everybody else exactly once.

Suppose m = 4. In "round 1" we have Player 1 versus Player 2, Player 3 versus Player 4, Player 5 versus Player 6, and Player 7 versus Player 8. Thus, there are four tables of action, one per pairing. This corresponds to the first rotation set:

rot.set(1) = {(1, 2), (3, 4), (5, 6), (7, 8)}.

To set up rounds 2 through 7, Player 1 stays put and Players 2 through 8 move from table to table in merry-go-round fashion:

rot.set(2) = {(1, 4), (2, 6), (3, 8), (5, 7)},
rot.set(3) = {(1, 6), (4, 8), (2, 7), (3, 5)},
rot.set(4) = {(1, 8), (6, 7), (4, 5), (2, 3)},
rot.set(5) = {(1, 7), (5, 8), (3, 6), (2, 4)},
rot.set(6) = {(1, 5), (3, 7), (2, 8), (4, 6)},
rot.set(7) = {(1, 3), (2, 5), (4, 7), (6, 8)}.
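Rotation sets of this kind can be generated mechanically by the classical round-robin ("circle") scheduling rule just described: fix player 1 and rotate the remaining players one seat per round. The following short Python sketch is our own illustration (the function name is invented, and the particular rounds it produces need not match the listing above, although they have the same property: each round pairs all 2m indices with no index repeated, and every pair occurs exactly once overall).

```python
def rotation_sets(m):
    """Round-robin ('circle method') schedule: 2m-1 rounds, each consisting of
    m disjoint index pairs; over all rounds every pair (i, j) occurs once."""
    players = list(range(1, 2 * m + 1))
    rounds = []
    for _ in range(2 * m - 1):
        pairs = [tuple(sorted((players[i], players[2 * m - 1 - i]))) for i in range(m)]
        rounds.append(pairs)
        # player 1 stays put; everyone else rotates one seat
        players = [players[0]] + [players[-1]] + players[1:-1]
    return rounds

for k, r in enumerate(rotation_sets(4), 1):
    print(f"rot.set({k}) =", r)
```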
Taken in order, the seven rotation sets define the parallel ordering of the 28 possible off-diagonal index pairs.

For general m, a multiprocessor implementation would involve solving the subproblems within each rotation set in parallel. Although the generation of the subproblem rotations is independent, some synchronization is required to carry out the block similarity transform updates.

Problems

P8.5.1 Let the scalar γ be given along with the matrix

$$A = \begin{bmatrix} w & x \\ x & z \end{bmatrix}.$$

It is desired to compute an orthogonal matrix

$$J = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}$$

such that the (1,1) entry of JᵀAJ equals γ. Show that this requirement leads to the equation

(w − γ)τ² − 2xτ + (z − γ) = 0,

where τ = c/s. Verify that this quadratic has real roots if γ satisfies λ₂ ≤ γ ≤ λ₁, where λ₁ and λ₂ are the eigenvalues of A.

P8.5.2 Let A ∈ ℝⁿˣⁿ be symmetric. Give an algorithm that computes the factorization QᵀAQ = γI + F where Q is a product of Jacobi rotations, γ = tr(A)/n, and F has zero diagonal entries. Discuss the uniqueness of Q.

P8.5.3 Formulate Jacobi procedures for (a) skew-symmetric matrices and (b) complex Hermitian matrices.

P8.5.4 Partition the n-by-n real symmetric matrix A as follows:

$$A = \begin{bmatrix} \alpha & v^T \\ v & A_1 \end{bmatrix}
\begin{matrix} 1 \\ n-1 \end{matrix}$$

Let Q be a Householder matrix such that if B = QᵀAQ, then B(3:n, 1) = 0. Let J = J(1, 2, θ) be determined such that if C = JᵀBJ, then c₁₂ = 0 and c₁₁ ≥ c₂₂. Show that c₁₁ ≥ α + ‖v‖₂. La Budde (1964) formulated an algorithm for the symmetric eigenvalue problem based upon repetition of this Householder-Jacobi computation.

P8.5.5 When implementing the cyclic Jacobi algorithm, it is sensible to skip the annihilation of a_pq if its modulus is less than some small, sweep-dependent parameter, because the net reduction in off(A) is not worth the cost. This leads to what is called the threshold Jacobi method. Details concerning this variant of Jacobi's algorithm may be found in Wilkinson (AEP, p. 277). Show that appropriate thresholding can guarantee convergence.

P8.5.6 Given a positive integer m, let M = (2m − 1)m. Develop an algorithm for computing integer vectors i, j ∈ ℝ^M so that (i₁, j₁), . . . , (i_M, j_M) defines the parallel ordering.

Notes and References for §8.5

Jacobi's original paper is one of the earliest references found in the numerical analysis literature:

C.G.J. Jacobi (1846). "Über ein leichtes Verfahren, die in der Theorie der Säcularstörungen vorkommenden Gleichungen numerisch aufzulösen," Crelle's J. 30, 51-94.

Prior to the QR algorithm, the Jacobi technique was the standard method for solving dense symmetric eigenvalue problems. Early references include:
  • 508. 484 Chapter 8. Symmetric Eigenvalue Problems M. Lotkin (1956). "Characteristic Values of Arbitrary Matrices," Quart. Appl. Math. 14, 267-275. D.A. Pope and C. Tompkins (1957). "Maximizing Functions of Rotations: Experiments Concerning Speed of Diagonalization of Symmetric Matrices Using Jacobi's Method,'' J. ACM 4, 459-466. C.D. La Budde (1964). "Two Classes of Algorithms for Finding the Eigenvalues and Eigenvectors of Real Symmetric Matrices," J. A CM 11, 53-58. H. Rutishauser (1966). "The Jacobi Method for Real Symmetric Matrices,'' Numer. Math. 9, 1-10. See also Wilkinson (AEP, p. 265) and: J.H. Wilkinson (1968). "Almost Diagonal Matrices with Multiple or Close Eigenvalues," Lin. Alg. Applic. 1, 1-12. Papers that are concerned with quadratic convergence include: P. Henrici (1958). "On the Speed of Convergence of Cyclic and Quasicyclic .Jacobi Methods for Computing the Eigenvalues of Hermitian Matrices," SIAM J. Appl. Math. 6, 144-162. E.R. Hansen (1962). "On Quasicyclic Jacobi Methods," J. ACM 9, 118 135. J.H. Wilkinson (1962). "Note on the Quadratic Convergence of the Cyclic Jacobi Process," Numer. Math. 6, 296-300. E.R. Hansen (1963). "On Cyclic Jacobi Methods," SIAM J. Appl. Math. 11, 448-459. A. Schonhage (1964). "On the Quadratic Convergence of the Jacobi Process,'' Numer. Math. 6, 410-412. H.P.M. van Kempen (1966). "On Quadratic Convergence of the Special Cyclic .Jacobi Method," Numer. Math. 9, 19-22. P. Henrici and K. Zimmermann (1968). "An Estimate for the Norms of Certain Cyclic Jacobi Opera­ tors," Lin. Alg. Applic. 1, 489- 501. K.W. Brodlie and M.J.D. Powell (1975). "On the Convergence of Cyclic Jacobi Methods,'' J. Inst. Math. Applic. 15, 279-287. The ordering of the subproblems within a sweep is important: W.F. Mascarenhas (1995). "On the Convergence of the .Jacobi Method for Arbitrary Orderings," SIAM J. Matrix Anal. Applic. 16, 1197-1209. Z. Dramac (1996). "On the Condition Behaviour in the Jacobi Method,'' SIAM J. Matrix Anal. Applic. 1 7, 509-514. V. Hari (2007). "Convergence of a Block-Oriented Quasi-Cyclic Jacobi Method,'' SIAM J. Matrix Anal. Applic. 29, 349-369. z. Drmac (2010). "A Global Convergence Proof for Cyclic Jacobi Methods with Block Rotations,'' SIAM J. Matrix Anal. Applic. 31, 1329-1350. Detailed error analyses that establish the high accuracy of Jacobi's method include: J. Barlow and J. Demmel (1990). "Computing Accurate Eigensystems of Scaled Diagonally Dominant Matrices,'' SIAM J. Numer. Anal. 27, 762-791. J.W. Demmel and K. Veselic (1992). "Jacobi's Method is More Accurate than QR,'' SIAM J. Matrix Anal. Applic. 13, 1204-1245. W.F. Mascarenhas (1994). "A Note on Jacobi Being More Accurate than QR,'' SIAM J. Matrix Anal. Applic. 15, 215-218. R. Mathias (1995). "Accurate Eigensystem Computations by Jacobi Methods," SIAM J. Matrix Anal. Applic. 16, 977-1003. K. Veselic (1996). "A Note on the Accuracy of Symmetric Eigenreduction Algorithms,'' ETNA 4, 37-45. F.M. Dopico, J.M. Molera, and J. Moro (2003). "An Orthogonal High Relative Accuracy Algorithm for the Symmetric Eigenproblem,'' SIAM J. Matrix Anal. Applic. 25, 301-351. F.M. Dopico, P. Koev, and J.M. Molera (2008). "Implicit Standard Jacobi Gives High Relative Accuracy,'' Numer. Math. 113, 519-553. Attempts have been made to extend the Jacobi iteration to other classes of matrices and to push through corresponding convergence results. The case of normal matrices is discussed in: H.H. Goldstine and L.P. Horowitz (1959). "A Procedure for the Diagonalization of Normal Matrices," J. 
ACM 6, 176-195. G. Loizou (1972). "On the Quadratic Convergence of the Jacobi Method for Normal Matrices," Comput. J. 15, 274-276.
  • 509. 8.5. Jacobi Methods 485 M.H.C. Paardekooper {1971). "An Eigenvalue Algorithm for Skew Symmetric Matrices," Numer. Math. 17, 189-202. A. Ruhe {1972). "On the Quadratic Convergence of the .Jacobi Method for Normal Matrices," BIT 7, 305-313. o. Hacon {1993). "Jacobi's iviethod for Skew-Symmetric Matrices,'' SIAM J. Matrix Anal. Applic. 14, 619-628. Essentially, the analysis and algorithmic developments presented in the text carry over to the normal case with minor modification. For non-normal matrices, the situation is considerably more difficult: J. Greenstadt {1955). "A Method for Finding Roots of Arbitrary Matrices," Math. Tables and Other Aids to Comp. 9, 47-52. C.E. Froberg {1965). "On Triangularization of Complex Matrices by Two Dimensional Unitary Tran­ formations," BIT 5, 230-234. J. Boothroyd and P.J. Eberlein {1968). "Solution to the Eigenproblem by a Norm-Reducing Jacobi­ Type Method (Handbook)," Numer. Math. 11, 1-12. A. Ruhe {1968). "Onthe Quadratic Convergence of a Generalization ofthe Jacobi Method to Arbitrary Matrices," BIT 8, 210-231. A. Ruhe {1969). "The Norm of a Matrix After a Similarity Transformation," BIT 9, 53-58. P.J. Eberlein (1970). "Solution to the Complex Eigenproblem by a Norm-Reducing Jacobi-type Method," Numer. Math. 14, 232-245. C.P. Huang {1975). "A Jacobi-Type Method for Triangularizing an Arbitrary Matrix," SIAM J. Numer. Anal. 12, 566-570. V. Hari {1982). "On the Global Convergence of the Eberlein Method for Real Matrices,'' Numer. Math. 39, 361-370. G.W. Stewart {1985). "A Jacobi-Like Algorithm for Computing the Schur Decomposition of a Non­ hermitian Matrix," SIAM J. Sci. Stat. Comput. 6, 853-862. C. Mehl (2008). "On Asymptotic Convergence of Nonsymmetric Jacobi Algorithms," SIAM J. Matrix Anal. Applic. 30, 291-311. Jacobi methods for complex symmetric matrices have also been developed, see: J.J. Seaton (1969). "Diagonalization of Complex Symmetric Matrices Using a Modified .Jacobi Method,'' Comput. J. 12, 156-157. P.J. Eberlein (1971). "On the Diagonalization of Complex Symmetric Matrices," J. Inst. Math. Applic. 7, 377-383. P. Anderson and G. Loizou (1973). "On the Quadratic Convergence of an Algorithm Which Diago­ nalizes a Complex Symmetric Matrix,'' J. Inst. Math. Applic. 12, 261-271. P. Anderson and G. Loizou (1976). "A Jacobi-Type Method for Complex Symmetric Matrices {Hand­ book)," Numer. Math. 25, 347-363. Other extensions include: N. Mackey (1995). "Hamilton and Jacobi Meet Again: Quaternions and the Eigenvalue Problem,'' SIAM J. Matrix Anal. Applic. 16, 421-435. A.W. Bojanczyk {2003). "An Implicit Jacobi-like Method for Computing Generalized Hyperbolic SVD," Lin. Alg. Applic. 358, 293-307. For a sampling of papers concerned with various aspects of parallel Jacobi, see: A. Sameh (1971). "On Jacobi and Jacobi-like Algorithms for a Parallel Computer," Math. Comput. 25, 579 590. D.S. Scott, M.T. Heath, and R.C. Ward (1986). "Parallel Block Jacobi Eigenvalue Algorithms Using Systolic Arrays," Lin. Alg. Applic. 77, 345-356. P.J. Eberlein {1987). "On Using the Jacobi Method on a Hypercube,'' in Hypercube Multiprocessors, M.T. Heath (ed.), SIAM Publications, Philadelphia. G. Shroff and R. Schreiber (1989). "On the Convergence of the Cyclic Jacobi Method for Parallel Block Orderings," SIAM J. Matrix Anal. Applic. 10, 326-346. M.H.C. Paardekooper {1991). "A Quadratically Convergent Parallel Jacobi Process for Diagonally Dominant Matrices with Nondistinct Eigenvalues," Lin. Alg. Applic. 145, 71-88. T. Londre and N.H. Rhee {2005). 
"Numerical Stability of the Parallel Jacobi Method," SIAM J. Matrix Anal. Applic. 26, 985 1000.
8.6 Computing the SVD

If UᵀAV = B is the bidiagonal decomposition of A ∈ ℝ^{m×n}, then Vᵀ(AᵀA)V = BᵀB is the tridiagonal decomposition of the symmetric matrix AᵀA ∈ ℝ^{n×n}. Thus, there is an intimate connection between Algorithm 5.4.2 (Householder bidiagonalization) and Algorithm 8.3.1 (Householder tridiagonalization). In this section we carry this a step further and show that there is a bidiagonal SVD procedure that corresponds to the symmetric tridiagonal QR iteration. Before we get into the details, we catalog some important SVD properties that have algorithmic ramifications.

8.6.1 Connections to the Symmetric Eigenvalue Problem

There are important relationships between the singular value decomposition of a matrix A and the Schur decompositions of the symmetric matrices

$$S_1 = A^TA, \qquad S_2 = AA^T, \qquad S_3 = \begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix}.$$

Indeed, if UᵀAV = diag(σ₁, . . . , σₙ) is the SVD of A ∈ ℝ^{m×n} (m ≥ n), then

$$V^T(A^TA)V \;=\; \mathrm{diag}(\sigma_1^2,\ldots,\sigma_n^2) \in \mathbb{R}^{n\times n} \qquad (8.6.1)$$

and

$$U^T(AA^T)U \;=\; \mathrm{diag}(\sigma_1^2,\ldots,\sigma_n^2,\underbrace{0,\ldots,0}_{m-n}) \in \mathbb{R}^{m\times m}. \qquad (8.6.2)$$

Moreover, if U = [U₁ | U₂] with U₁ ∈ ℝ^{m×n} and we define the orthogonal matrix Q ∈ ℝ^{(m+n)×(m+n)} by

$$Q \;=\; \frac{1}{\sqrt{2}}\begin{bmatrix} V & V & 0 \\ U_1 & -U_1 & \sqrt{2}\,U_2 \end{bmatrix},$$

then

$$Q^T\begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix}Q \;=\; \mathrm{diag}(\sigma_1,\ldots,\sigma_n,\,-\sigma_1,\ldots,-\sigma_n,\underbrace{0,\ldots,0}_{m-n}). \qquad (8.6.3)$$

These connections to the symmetric eigenproblem allow us to adapt the mathematical and algorithmic developments of the previous sections to the singular value problem. Good references for this section include Lawson and Hanson (SLS) and Stewart and Sun (MPT).
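The correspondence (8.6.3) is easy to confirm numerically. The following small NumPy sketch is our own illustration (random data, arbitrary sizes): it builds the symmetric matrix [0 Aᵀ; A 0] and checks that its eigenvalues are ±σᵢ together with m − n zeros.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 7, 4
A = rng.standard_normal((m, n))

sigma = np.linalg.svd(A, compute_uv=False)            # singular values of A
S3 = np.block([[np.zeros((n, n)), A.T],
               [A, np.zeros((m, m))]])                # symmetric (m+n)-by-(m+n) matrix
eigs = np.sort(np.linalg.eigvalsh(S3))

expected = np.sort(np.concatenate([sigma, -sigma, np.zeros(m - n)]))
print(np.allclose(eigs, expected))                    # True: eigenvalues are +-sigma_i and m-n zeros
```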
8.6.2 Perturbation Theory and Properties

We first establish perturbation results for the SVD based on the theorems of §8.1. Recall that σᵢ(A) denotes the ith largest singular value of A.

Theorem 8.6.1. If A ∈ ℝ^{m×n}, then for k = 1:min{m, n}

$$\sigma_k(A) \;=\; \min_{\dim(S)=n-k+1}\;\max_{\substack{0\ne x\in S \\ 0\ne y}}\;\frac{y^TAx}{\|x\|_2\,\|y\|_2} \;=\; \max_{\dim(S)=k}\;\min_{0\ne x\in S}\;\max_{0\ne y}\;\frac{y^TAx}{\|x\|_2\,\|y\|_2}.$$

In this expression, S is a subspace of ℝⁿ.

Proof. The rightmost characterization follows by applying Theorem 8.1.2 to AᵀA. For the remainder of the proof see Xiang (2006). □

Corollary 8.6.2. If A and A + E are in ℝ^{m×n} with m ≥ n, then for k = 1:n

$$|\sigma_k(A+E) - \sigma_k(A)| \;\le\; \sigma_1(E) \;=\; \|E\|_2.$$

Proof. Define à and Ẽ by

$$\tilde{A} = \begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix}, \qquad \tilde{A}+\tilde{E} = \begin{bmatrix} 0 & (A+E)^T \\ A+E & 0 \end{bmatrix}. \qquad (8.6.4)$$

The corollary follows by applying Corollary 8.1.6 with A replaced by à and A + E replaced by à + Ẽ. □

Corollary 8.6.3. Let A = [a₁ | · · · | aₙ] ∈ ℝ^{m×n} be a column partitioning with m ≥ n. If A_r = [a₁ | · · · | a_r], then for r = 1:n − 1

$$\sigma_1(A_{r+1}) \ge \sigma_1(A_r) \ge \sigma_2(A_{r+1}) \ge \cdots \ge \sigma_r(A_{r+1}) \ge \sigma_r(A_r) \ge \sigma_{r+1}(A_{r+1}).$$

Proof. Apply Corollary 8.1.7 to AᵀA. □

The next result is a Wielandt-Hoffman theorem for singular values:

Theorem 8.6.4. If A and A + E are in ℝ^{m×n} with m ≥ n, then

$$\sum_{k=1}^{n}\,(\sigma_k(A+E) - \sigma_k(A))^2 \;\le\; \|E\|_F^2.$$

Proof. Apply Theorem 8.1.4 with A and E replaced by the matrices à and Ẽ defined by (8.6.4). □
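Corollary 8.6.2 and Theorem 8.6.4 are easy to check experimentally. The NumPy sketch below is our own illustration (random matrix and perturbation, arbitrary sizes); it verifies both bounds for a single random instance.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 5
A = rng.standard_normal((m, n))
E = 1e-3 * rng.standard_normal((m, n))

sA = np.linalg.svd(A, compute_uv=False)
sAE = np.linalg.svd(A + E, compute_uv=False)

# Corollary 8.6.2: each singular value moves by at most ||E||_2
print(np.max(np.abs(sAE - sA)) <= np.linalg.norm(E, 2))
# Theorem 8.6.4 (Wielandt-Hoffman for singular values)
print(np.sum((sAE - sA) ** 2) <= np.linalg.norm(E, 'fro') ** 2)
```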
For A ∈ ℝ^{m×n} we say that the k-dimensional subspaces S ⊆ ℝⁿ and T ⊆ ℝᵐ form a singular subspace pair if x ∈ S and y ∈ T imply Ax ∈ T and Aᵀy ∈ S. The following result is concerned with the perturbation of singular subspace pairs.

Theorem 8.6.5. Let A, E ∈ ℝ^{m×n} with m ≥ n be given and suppose that V ∈ ℝ^{n×n} and U ∈ ℝ^{m×m} are orthogonal. Assume that

$$V = [\,V_1 \mid V_2\,], \quad V_1\in\mathbb{R}^{n\times r}, \qquad U = [\,U_1 \mid U_2\,], \quad U_1\in\mathbb{R}^{m\times r},$$

and that ran(V₁) and ran(U₁) form a singular subspace pair for A. Let

$$U^TAV = \begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix}\begin{matrix} r \\ m-r \end{matrix}, \qquad
U^TEV = \begin{bmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{bmatrix}\begin{matrix} r \\ m-r \end{matrix},$$

with block column widths r and n − r, and assume that

$$\delta \;=\; \min_{\substack{\sigma\in\sigma(A_{11}) \\ \gamma\in\sigma(A_{22})}} |\sigma - \gamma| \;>\; 0.$$

If ‖E‖_F ≤ δ/5, then there exist matrices P ∈ ℝ^{(m−r)×r} and Q ∈ ℝ^{(n−r)×r} satisfying

$$\left\| \begin{bmatrix} P \\ Q \end{bmatrix} \right\|_F \;\le\; 4\,\frac{\|E\|_F}{\delta}$$

such that ran(V₁ + V₂Q) and ran(U₁ + U₂P) is a singular subspace pair for A + E.

Proof. See Stewart (1973, Theorem 6.4). □

Roughly speaking, the theorem says that O(ε) changes in A can alter a singular subspace by an amount ε/δ, where δ measures the separation of the associated singular values.

8.6.3 The SVD Algorithm

We now show how a variant of the QR algorithm can be used to compute the SVD of an A ∈ ℝ^{m×n} with m ≥ n. At first glance, this appears straightforward. Equation (8.6.1) suggests that we proceed as follows:

Step 1. Form C = AᵀA.
Step 2. Use the symmetric QR algorithm to compute V₁ᵀCV₁ = diag(σᵢ²).
Step 3. Apply QR with column pivoting to AV₁ obtaining Uᵀ(AV₁)Π = R.
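As the next paragraph explains, forming AᵀA explicitly is numerically hazardous for the small singular values. The following NumPy experiment is our own illustration (the matrix, sizes, and singular values are arbitrary); it compares singular values obtained from the eigenvalues of AᵀA with those from a backward-stable SVD.

```python
import numpy as np

# Small singular values computed via eig(A^T A) lose roughly half the digits
# relative to a backward-stable SVD of A itself (illustrative sketch).
rng = np.random.default_rng(0)
m, n = 60, 6
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = np.array([1.0, 1e-2, 1e-4, 1e-6, 1e-8, 1e-10])
A = U @ np.diag(sigma) @ V.T                   # A has exactly these singular values

s_svd = np.linalg.svd(A, compute_uv=False)
w = np.linalg.eigvalsh(A.T @ A)[::-1]          # eigenvalues of A^T A, descending
s_ata = np.sqrt(np.maximum(w, 0.0))            # clip tiny negative roundoff

print(np.abs(s_svd - sigma) / sigma)           # ~ unit roundoff for every sigma_i
print(np.abs(s_ata - sigma) / sigma)           # badly contaminated once sigma_i^2 < u*||A||^2
```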
Since R has orthogonal columns, it follows that UᵀA(V₁Π) is diagonal. However, as we saw in §5.3.2, the formation of AᵀA can lead to a loss of information. The situation is not quite so bad here, since the original A is used to compute U.

A preferable method for computing the SVD is described by Golub and Kahan (1965). Their technique finds U and V simultaneously by implicitly applying the symmetric QR algorithm to AᵀA. The first step is to reduce A to upper bidiagonal form using Algorithm 5.4.2:

$$U_B^TAV_B = \begin{bmatrix} B \\ 0 \end{bmatrix}, \qquad
B = \begin{bmatrix} d_1 & f_1 & & \\ & d_2 & \ddots & \\ & & \ddots & f_{n-1} \\ & & & d_n \end{bmatrix} \in \mathbb{R}^{n\times n}.$$

The remaining problem is thus to compute the SVD of B. To this end, consider applying an implicit-shift QR step (Algorithm 8.3.2) to the tridiagonal matrix T = BᵀB:

Step 1. Compute the eigenvalue λ of

$$T(m{:}n, m{:}n) \;=\; \begin{bmatrix} d_m^2 + f_{m-1}^2 & d_m f_m \\ d_m f_m & d_n^2 + f_{n-1}^2 \end{bmatrix}, \qquad m = n-1,$$

that is closer to d_n² + f_{n−1}².

Step 2. Compute c₁ = cos(θ₁) and s₁ = sin(θ₁) such that

$$\begin{bmatrix} c_1 & s_1 \\ -s_1 & c_1 \end{bmatrix}^{T}\begin{bmatrix} t_{11}-\lambda \\ t_{21} \end{bmatrix} = \begin{bmatrix} * \\ 0 \end{bmatrix}, \qquad t_{11} = d_1^2,\; t_{21} = d_1 f_1,$$

and set G₁ = G(1, 2, θ₁).

Step 3. Compute Givens rotations G₂, . . . , G_{n−1} so that if Q = G₁ · · · G_{n−1} then QᵀTQ is tridiagonal and Qe₁ = G₁e₁.

Note that these calculations require the explicit formation of BᵀB, which, as we have seen, is unwise from the numerical standpoint.

Suppose instead that we apply the Givens rotation G₁ above to B directly. Illustrating with the n = 6 case we have

B ← B·G₁ =
[ x x 0 0 0 0 ]
[ + x x 0 0 0 ]
[ 0 0 x x 0 0 ]
[ 0 0 0 x x 0 ]
[ 0 0 0 0 x x ]
[ 0 0 0 0 0 x ]

We then can determine Givens rotations U₁, V₂, U₂, . . . , V_{n−1}, and U_{n−1} to chase the unwanted nonzero element down the bidiagonal:
Premultiplication by U₁ᵀ annihilates the (2,1) entry and moves the bulge to position (1,3); a postmultiplication by V₂ pushes it to (3,2); premultiplication by U₂ᵀ moves it to (2,4):

B ← U₁ᵀB =              B ← BV₂ =               B ← U₂ᵀB =
[ x x + 0 0 0 ]         [ x x 0 0 0 0 ]         [ x x 0 0 0 0 ]
[ 0 x x 0 0 0 ]         [ 0 x x 0 0 0 ]         [ 0 x x + 0 0 ]
[ 0 0 x x 0 0 ]         [ 0 + x x 0 0 ]         [ 0 0 x x 0 0 ]
[ 0 0 0 x x 0 ]         [ 0 0 0 x x 0 ]         [ 0 0 0 x x 0 ]
[ 0 0 0 0 x x ]         [ 0 0 0 0 x x ]         [ 0 0 0 0 x x ]
[ 0 0 0 0 0 x ]         [ 0 0 0 0 0 x ]         [ 0 0 0 0 0 x ]

and so on. The process terminates with a new bidiagonal B̃ that is related to B as follows:

$$\tilde{B} \;=\; (U_{n-1}^T\cdots U_1^T)\,B\,(G_1 V_2\cdots V_{n-1}) \;=\; \tilde{U}^TB\tilde{V}.$$

Since each Vᵢ has the form Vᵢ = G(i, i + 1, θᵢ) where i = 2:n − 1, it follows that Ṽe₁ = Qe₁. By the Implicit Q theorem we can assert that Ṽ and Q are essentially the same. Thus, we can implicitly effect the transition from T to T̃ = B̃ᵀB̃ by working directly on the bidiagonal matrix B.

Of course, for these claims to hold it is necessary that the underlying tridiagonal matrices be unreduced. Since the subdiagonal entries of BᵀB are of the form dᵢfᵢ, it is clear that we must search the bidiagonal band for zeros. If f_k = 0 for some k, then

$$B = \begin{bmatrix} B_1 & 0 \\ 0 & B_2 \end{bmatrix}\begin{matrix} k \\ n-k \end{matrix}$$

and the original SVD problem decouples into two smaller problems involving the matrices B₁ and B₂. If d_k = 0 for some k < n, then premultiplication by a sequence of Givens transformations can zero f_k. For example, if n = 6 and k = 3, then by rotating in row planes (3,4), (3,5), and (3,6) we can zero the entire third row:

    [ x x 0 0 0 0 ]           [ x x 0 0 0 0 ]           [ x x 0 0 0 0 ]
    [ 0 x x 0 0 0 ]   (3,4)   [ 0 x x 0 0 0 ]   (3,5)   [ 0 x x 0 0 0 ]   (3,6)
B = [ 0 0 0 x 0 0 ]  ----->   [ 0 0 0 0 + 0 ]  ----->   [ 0 0 0 0 0 + ]  -----> row 3 = 0
    [ 0 0 0 x x 0 ]           [ 0 0 0 x x 0 ]           [ 0 0 0 x x 0 ]
    [ 0 0 0 0 x x ]           [ 0 0 0 0 x x ]           [ 0 0 0 0 x x ]
    [ 0 0 0 0 0 x ]           [ 0 0 0 0 0 x ]           [ 0 0 0 0 0 x ]

Each rotation annihilates the current nonzero in row 3 and pushes a new one (+) a column to the right until it disappears off the end of the row.
If dₙ = 0, then the last column can be zeroed with a series of column rotations in planes (n − 1, n), (n − 2, n), . . . , (1, n). Thus, we can decouple if f₁ · · · f_{n−1} = 0 or d₁ · · · dₙ = 0. Putting it all together we obtain the following SVD analogue of Algorithm 8.3.2.

Algorithm 8.6.1 (Golub-Kahan SVD Step) Given a bidiagonal matrix B ∈ ℝ^{m×n} having no zeros on its diagonal or superdiagonal, the following algorithm overwrites B with the bidiagonal matrix B̃ = ŨᵀBṼ where Ũ and Ṽ are orthogonal and Ṽ is essentially the orthogonal matrix that would be obtained by applying Algorithm 8.3.2 to T = BᵀB.

    Let µ be the eigenvalue of the trailing 2-by-2 submatrix of T = BᵀB that is closer to tₙₙ.
    y = t₁₁ − µ
    z = t₁₂
    for k = 1:n − 1
        Determine c = cos(θ) and s = sin(θ) such that

            [ y  z ] [  c  s ] = [ *  0 ]
                     [ −s  c ]

        B = B·G(k, k+1, θ)
        y = b_kk
        z = b_{k+1,k}
        Determine c = cos(θ) and s = sin(θ) such that

            [  c  s ]ᵀ [ y ]   [ * ]
            [ −s  c ]  [ z ] = [ 0 ]

        B = G(k, k+1, θ)ᵀB
        if k < n − 1
            y = b_{k,k+1}
            z = b_{k,k+2}
        end
    end

An efficient implementation of this algorithm would store B's diagonal and superdiagonal in vectors d(1:n) and f(1:n − 1), respectively, and would require 30n flops and 2n square roots. Accumulating U requires 6mn flops. Accumulating V requires 6n² flops.

Typically, after a few of the above SVD iterations, the superdiagonal entry f_{n−1} becomes negligible. Criteria for smallness within B's band are usually of the form

|fᵢ| ≤ tol·(|dᵢ| + |d_{i+1}|),        |dᵢ| ≤ tol·‖B‖,

where tol is a small multiple of the unit roundoff and ‖·‖ is some computationally convenient norm.
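The bulge chase of Algorithm 8.6.1 is easy to prototype on a dense matrix. The following Python/NumPy sketch is our own illustration (invented helper names; it assumes B is square, unreduced, and upper bidiagonal with n ≥ 2, applies the rotations to full rows and columns rather than to the stored d and f vectors, and makes no attempt at the flop counts quoted above).

```python
import numpy as np

def _rot(a, b):
    """Return (c, s) with c = a/r, s = b/r, r = hypot(a, b) (and (1, 0) if r = 0)."""
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0.0 else (a / r, b / r)

def gk_svd_step(B):
    """One Golub-Kahan SVD step (bulge chase) on a square, unreduced upper
    bidiagonal B.  Returns (U, Bnew, V) with Bnew = U.T @ B @ V again upper
    bidiagonal (to roundoff)."""
    B = B.astype(float).copy()
    n = B.shape[0]
    U, V = np.eye(n), np.eye(n)
    d, f = np.diag(B).copy(), np.diag(B, 1).copy()
    # Shift: eigenvalue of the trailing 2x2 block of T = B^T B closest to t_nn.
    t11 = d[n - 2] ** 2 + (f[n - 3] ** 2 if n > 2 else 0.0)
    t12 = d[n - 2] * f[n - 2]
    t22 = d[n - 1] ** 2 + f[n - 2] ** 2
    if t12 == 0.0:
        mu = t22
    else:
        delta = 0.5 * (t11 - t22)
        sgn = 1.0 if delta >= 0.0 else -1.0
        mu = t22 - t12 ** 2 / (delta + sgn * np.hypot(delta, t12))
    y, z = d[0] ** 2 - mu, d[0] * f[0]
    for k in range(n - 1):
        # Right rotation on columns (k, k+1): zeros z against y, creates a bulge at B[k+1, k].
        c, s = _rot(y, z)
        G = np.array([[c, -s], [s, c]])
        B[:, k:k + 2] = B[:, k:k + 2] @ G
        V[:, k:k + 2] = V[:, k:k + 2] @ G
        # Left rotation on rows (k, k+1): zeros the bulge, creates a bulge at B[k, k+2].
        c, s = _rot(B[k, k], B[k + 1, k])
        G = np.array([[c, -s], [s, c]])
        B[k:k + 2, :] = G.T @ B[k:k + 2, :]
        U[:, k:k + 2] = U[:, k:k + 2] @ G
        if k < n - 2:
            y, z = B[k, k + 1], B[k, k + 2]
    return U, B, V

# Example: the trailing superdiagonal entry shrinks rapidly under repetition.
B0 = np.diag([4.0, 3.0, 2.0, 1.0]) + np.diag([1.0, 1.0, 1.0], 1)
U, B1, V = gk_svd_step(B0)
print(np.allclose(U.T @ B0 @ V, B1), abs(B1[2, 3]))
```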
Combining Algorithm 5.4.2 (bidiagonalization), Algorithm 8.6.1, and the decoupling calculations mentioned earlier gives the following procedure.

Algorithm 8.6.2 (The SVD Algorithm) Given A ∈ ℝ^{m×n} (m ≥ n) and ε, a small multiple of the unit roundoff, the following algorithm overwrites A with UᵀAV = D + E, where U ∈ ℝ^{m×m} is orthogonal, V ∈ ℝ^{n×n} is orthogonal, D ∈ ℝ^{m×n} is diagonal, and E satisfies ‖E‖₂ ≈ u‖A‖₂.

    Use Algorithm 5.4.2 to compute the bidiagonalization

        [ B ; 0 ] = (U₁ · · · Uₙ)ᵀ A (V₁ · · · V_{n−2})

    until q = n
        For i = 1:n − 1, set b_{i,i+1} to zero if |b_{i,i+1}| ≤ ε(|b_{ii}| + |b_{i+1,i+1}|).
        Find the largest q and the smallest p such that if

            B = [ B₁₁   0    0  ]  p
                [  0   B₂₂   0  ]  n−p−q
                [  0    0   B₃₃ ]  q
                   p  n−p−q   q

        then B₃₃ is diagonal and B₂₂ has a nonzero superdiagonal.
        if q < n
            if any diagonal entry in B₂₂ is zero, then zero the superdiagonal entry in the same row
            else
                Apply Algorithm 8.6.1 to B₂₂
                B = diag(I_p, U, I_{q+m−n})ᵀ B diag(I_p, V, I_q)
            end
        end
    end

The amount of work required by this algorithm depends on how much of the SVD is required. For example, when solving the LS problem, Uᵀ need never be explicitly formed but merely applied to b as it is developed. In other applications, only the matrix U₁ = U(:, 1:n) is required. Another variable that affects the volume of work in Algorithm 8.6.2 concerns the R-bidiagonalization idea that we discussed in §5.4.9. Recall that unless A is "almost square," it pays to reduce A to triangular form via the QR factorization before bidiagonalizing. If R-bidiagonalization is used in the SVD context, then we refer to the overall process as the R-SVD. Figure 8.6.1 summarizes the work associated with the various possibilities.

    Required     Golub-Reinsch SVD        R-SVD
    Σ            4mn² − 4n³/3             2mn² + 2n³
    Σ, V         4mn² + 8n³               2mn² + 11n³
    Σ, U         4m²n − 8mn²              4m²n + 13n³
    Σ, U₁        14mn² − 2n³              6mn² + 11n³
    Σ, U, V      4m²n + 8mn² + 9n³        4m²n + 22n³
    Σ, U₁, V     14mn² + 8n³              6mn² + 20n³

    Figure 8.6.1. Work associated with various SVD-related calculations

By comparing the entries in this table (which are meant only as approximate estimates of work), we conclude that the R-SVD approach is more efficient unless m ≈ n.

8.6.4 Jacobi SVD Procedures

It is straightforward to adapt the Jacobi procedures of §8.5 to the SVD problem. Instead of solving a sequence of 2-by-2 symmetric eigenproblems, we solve a sequence
of 2-by-2 SVD problems. Thus, for a given index pair (p, q) we compute a pair of rotations such that

$$\begin{bmatrix} c_1 & s_1 \\ -s_1 & c_1 \end{bmatrix}^{T}\begin{bmatrix} a_{pp} & a_{pq} \\ a_{qp} & a_{qq} \end{bmatrix}\begin{bmatrix} c_2 & s_2 \\ -s_2 & c_2 \end{bmatrix} \;=\; \begin{bmatrix} d_{pp} & 0 \\ 0 & d_{qq} \end{bmatrix}.$$

See P8.6.5. The resulting algorithm is referred to as two-sided because each update involves a pre- and a post-multiplication.

A one-sided Jacobi algorithm involves a sequence of pairwise column orthogonalizations. For a given index pair (p, q) a Jacobi rotation J(p, q, θ) is determined so that columns p and q of AJ(p, q, θ) are orthogonal to each other. See P8.6.6. Note that this corresponds to zeroing the (p, q) and (q, p) entries in AᵀA. Once AV has sufficiently orthogonal columns, the rest of the SVD (U and Σ) follows from column scaling: AV = UΣ.

Problems

P8.6.1 Give formulae for the eigenvectors of

$$S = \begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix}$$

in terms of the singular vectors of A ∈ ℝ^{m×n} where m ≥ n.

P8.6.2 Relate the singular values and vectors of A = B + iC (B, C ∈ ℝ^{m×n}) to those of

$$\begin{bmatrix} B & -C \\ C & B \end{bmatrix}.$$

P8.6.3 Suppose B ∈ ℝ^{n×n} is upper bidiagonal with diagonal entries d(1:n) and superdiagonal entries f(1:n − 1). State and prove a singular value version of Theorem 8.3.1.

P8.6.4 Assume that n = 2m and that S ∈ ℝ^{n×n} is skew-symmetric and tridiagonal. Show that there exists a permutation P ∈ ℝ^{n×n} such that

$$P^TSP = \begin{bmatrix} 0 & -B^T \\ B & 0 \end{bmatrix}$$

where B ∈ ℝ^{m×m}. Describe the structure of B and show how to compute the eigenvalues and eigenvectors of S via the SVD of B. Repeat for the case n = 2m + 1.
  • 518. 494 Chapter 8. Symmetric Eigenvalue Problems PB.6.5 (a) Let be real. Give a stable algorithm for computing c and s with c2 + s2 = 1 such that B = [ c s ]c -s c is symmetric. (b) Combine (a) with Algorithm 8.5.1 to obtain a stable algorithm for computing the SYD of C. (c) Part (b) can be used to develop a Jacobi-like algorithm for computing the SYD of A E R"x n. For a given (p, q) with p < q, Jacobi transformations J(p, q, lh ) and J(p, q, 92) are determined such that if B = J(p, q, 91 )TAJ(p, q, 92), then bpq = bqp = 0. Show off(B)2 = off(A) 2 - a�9 - a�p· (d) Consider one sweep of a cyclic-by-row Jacobi SYD procedure applied to A E Rnxn: for p = l:n - 1 for q = p + l:n A = J(p, q, 91 )TAJ(p, q, 62) end end Assume that the Jacobi rotation matrices are chosen so that apq = aqp = 0 after the (p, q) update. Show that if A is upper (lower) triangular at the beginning of the sweep, then it is lower (upper) triangular after the sweep is completed. See Kogbetliantz (1955). (e) How could these Jacobi ideas be used to compute the SYD of a rectangular matrix? PB.6.6 Let :x and y be in R,.,. and define the orthogonal matrix Q by Q = [ c s ]· -s c Give a stable algorithm for computing c and s such that the columns of [:x I y] Q are orthogonal to each other. Notes and References for §8.6 For a general perspective and overview of the SYD we recommend: G.W. Stewart (1993). "On the Early History of the Singular Value Decomposition," SIAM Review 35, 551-566. A.K. Cline and I.S. Dhillon (2006). "Computation of the Singular Value Decomposition," in Handbook of Linear Algebra, L. Hogben (ed.), Chapman and Hall, London, §45-1. A perturbation theory for the SYD is developed in Stewart and Sun (MPT). See also: P.A. Wedin (1972). "Perturbation Bounds in Connection with the Singular Value Decomposition," BIT 12, 99-111. G.W. Stewart (1973). "Error and Perturbation Bounds for Subspaces Associated with Certain Eigen­ value Problems," SIAM Review 15, 727-764. A. Ruhe (1975). "On the Closeness of Eigenvalues and Singular Values for Almost Normal Matrices," Lin. Alg. Applic. 11, 87-94. G.W. Stewart (1979). "A Note on the Perturbation of Singular Values," Lin. Alg. Applic. 28, 213-216. G.W. Stewart (1984). "A Second Order Perturbation Expansion for Small Singular Values," Lin. Alg. Applic. 56, 231-236. S. Chandrasekaren and I.C.F. Ipsen (1994). "Backward Errors for Eigenvalue and Singular Value Decompositions," Numer. Math. 68, 215-223. R.J. Vaccaro (1994). "A Second-Order Perturbation Expansion for the SVD," SIAM J. Matrix Anal. Applic. 15, 661-671.
  • 519. 8.6. Computing the SVD 495 J. Sun (1996). "Perturbation Analysis of Singular Subspaces and Deflating Subspaces," Numer. Math. 79, 235-263. F.M. Dopico (2000). "A Note on Sin T Theorems for Singular Subspace Variations BIT 40, 395-403. R.-C. Li and G. W. Stewart (2000). "A New Relative Perturbation Theorem for Singular Subspaces,'' Lin. Alg. Applic. 919, 41-51. . c.-K. Li and R. Mathias (2002). "Inequalities on Singular Values of Block Triangular Matrices," SIAM J. Matrix Anal. Applic. 24, 126-131. F.M. Dopico and J. Moro (2002). "Perturbation Theory for Simultaneous Bases of Singular Sub­ spaces,'' BIT 42, 84-109. K.A. O'Neil (2005). "Critical Points of the Singular Value Decomposition,'' SIAM J. Matrix Anal. Applic. 27, 459-473. M. Stewart (2006). "Perturbation of the SVD in the Presence of Small Singular Values," Lin. Alg. Applic. 419, 53-77. H. Xiang (2006). "A Note on the Minimax Representation for the Subspace Distance and Singular Values," Lin. Alg. Applic. 414, 470-473. W. Li and W. Sun (2007). "Combined Perturbation Bounds: I. Eigensystems and Singular Value Decompositions,'' SIAM J. Matrix Anal. Applic. 29, 643-655. J. Mateja.S and V. Hari (2008). "Relative Eigenvalues and Singular Value Perturbations of Scaled Diagonally Dominant Matrices,'' BIT 48, 769-781. Classical papers that lay out the ideas behind the SVD algorithm include: G.H. Golub and W. Kahan (1965). "Calculating the Singular Values and Pseudo-Inverse of a Matrix," SIAM J. Numer. Anal. 2, 205-224. P.A. Businger and G.H. Golub (1969). "Algorithm 358: Singular Value Decomposition of the Complex Matrix," Commun. A CM 12, 564-565. G.H. Golub and C. Reinsch (1970). "Singular Value Decomposition and Least Squares Solutions,'' Numer. Math. 14, 403-420. For related algorithmic developments and analysis, see: T.F. Chan (1982). "An Improved Algorithm for Computing the Singular Value Decomposition," ACM '.lhins. Math. Softw. 8, 72-83. J.J.M. Cuppen (1983). "The Singular Value Decomposition in Product Form," SIAM J. Sci. Stat. Comput. 4, 216-222. J.J. Dongarra (1983). "Improving the Accuracy of Computed Singular Values," SIAM J. Sci. Stat. Comput. 4, 712-719. S. Van Ruffel, J. Vandewalle, and A. Haegemans (1987). "An Efficient and Reliable Algorithm for Computing the Singular Subspace of a Matrix Associated with its Smallest Singular Values,'' J. Comp. Appl. Math. 19, 313-330. P. Deift, J. Demmel, L.-C. Li, and C. Tomei (1991). "The Bidiagonal Singular Value Decomposition and Hamiltonian Mechanics," SIAM J. Numer. Anal. 28, 1463-1516. R. Mathias and G.W. Stewart (1993). "A Block QR Algorithm and the Singular Value Decomposi­ tion," Lin. Alg. Applic. 182, 91-100. V. Mehrmann and W. Rath (1993). "Numerical Methods for the Computation of Analytic Singular Value Decompositions,'' ETNA 1, 72-88. A. Bjorck, E. Grimme, and P. Van Dooren (1994). "An Implicit Shift Bidiagonalization Algorithm for Ill-Posed Problems,'' BIT 94, 510-534. K.V. Fernando and B.N. Parlett (1994). "Accurate Singular Values and Differential qd Algorithms," Numer. Math. 67, 191-230. S. Chandrasekaran and l.C.F. Ipsen (1995). "Analysis of a QR Algorithm for Computing Singular Values,'' SIAM J. Matrix Anal. Applic. 16, 520-535. U. von Matt (1997). "The Orthogonal qd-Algorithm,'' SIAM J. Sci. Comput. 18, 1163-1186. K.V. Fernando (1998). "Accurately Counting Singular Values of Bidiagonal Matrices and Eigenvalues of Skew-Symmetric Tridiagonal Matrices," SIAM J. Matrix Anal. Applic. 20, 373-399. N.J. Higham (2000). 
"QR factorization with Complete Pivoting and Accurate Computation of the SVD,'' Lin. Alg. Applic. 909, 153-174. Divide-and-conquer methods for the bidiagonal SVD problem have been developed that are analogous to the tridiagonal eigenvalue strategies outlined in §8.4.4: J.W. Demmel and W. Kahan (1990). "Accurate Singular Values of Bidiagonal Matrices," SIAM J. Sci. Stat. Comput. 11, 873-912.
  • 520. 496 Chapter 8. Symmetric Eigenvalue Problems E.R. Jessup and D.C. Sorensen (1994). "A Parallel Algorithm for Computing the Singular Value Decomposition of a Matrix,'' SIAM J. Matrix Anal. Applic. 15, 530-548. M. Gu and S.C. Eisenstat (1995). "A Divide-and-Conquer Algorithm for the Bidiagonal SVD,'' SIAM J. Matrix Anal. Applic. 16, 79-92. P.R. Willems, B. Lang, and C. Vomel (2006). "Computing the Bidiagonal SVD Using Multiple Relatively Robust Representations,'' SIAM J. Matrix Anal. Applic. 28, 907-926. T. Konda and Y. Nakamura (2009). "A New Algorithm for Singular Value Decomposition and Its Parallelization,'' Parallel Comput. 35, 331-344. For structured SVD problems, there are interesting, specialized results, see: S. Van Ruffel and H. Park (1994). "Parallel Tri- and Bidiagonalization of Bordered Bidiagonal Ma,. trices,'' Parallel Comput. 20, 1107-1128. J. Demmel and P. Koev (2004). "Accurate SVDs of Weakly Diagonally Dominant M-matrices,'' Num. Math. 98, 99-104. N. Mastronardi, M. Van Barel, and R. Vandebril (2008). "A Fast Algorithm for the Recursive Calcu- lation of Dominant Singular Subspaces," J. Comp. Appl. Math. 218, 238-246. Jacobi methods for the SVD fall into two categories. The two-sided Jacobi algorithms repeatedly perform the update A +-- urAV producing a sequence of iterates that are increasingly diagonal. E.G. Kogbetliantz (1955). "Solution of Linear Equations by Diagonalization of Coefficient Matrix,'' Quart. Appl. Math. 13, 123-132. G.E. Forsythe and P. Henrici (1960). "The Cyclic Jacobi Method for Computing the Principal Values of a Complex Matrix,'' '.lhlns. AMS 94, 1-23. C.C. Paige and P. Van Dooren (1986). "On the Quadratic Convergence of Kogbetliantz's Algorithm for Computing the Singular Value Decomposition,'' Lin. Alg. Applic. 77, 301-313. J.P. Charlier and P. Van Dooren (1987). "On Kogbetliantz's SVD Algorithm in the Presence of Clusters," Lin. Alg. Applic. 95, 135-160. Z. Bai (1988). "Note on the Quadratic Convergence of Kogbetliantz's Algorithm for Computing the Singular Value Decomposition," Lin. Alg. Applic. 104, 131-140. J.P. Charlier, M. Vanbegin, and P. Van Dooren (1988). "On Efficient Implementation of Kogbetliantz's Algorithm for Computing the Singular Value Decomposition," Numer. Math. 52, 279-300. K.V. Fernando (1989). "Linear Convergence of the Row-Cyclic .Jacobi and Kogbetliantz Methods," Numer. Math. 56, 73-92. Z. Drmae and K. Veselic (2008). "New Fast and Accurate Jacobi SVD Algorithm I," SIAM J. Matri:i: Anal. Applic. 29, 1322-1342. The one-sided Jacobi SVD procedures repeatedly perform the update A +-- AV producing a sequence of iterates with columns that are increasingly orthogonal, see: J.C. Nash (1975). "A One-Sided Tranformation Method for the Singular Value Decomposition and Algebraic Eigenproblem," Comput. J. 18, 74-76. P.C. Hansen (1988). "Reducing the Number of Sweeps in Hcstenes Method," in Singular Value Decomposition and Signal Processing, E.F. Deprettere (ed.) North Holland, Amsterdam. K. Veselic and V. Hari (1989). "A Note on a One-Sided Jacobi Algorithm," Numer. Math. 56, 627-633. Careful implementation and analysis has shown that Jacobi SVD has remarkably accuracy: J. Demmel, M. Gu, S. Eiscnstat, I. Slapnicar, K. Veselic, and Z. Drmae (1999). "Computing the Singular Value Decomposition with High Relative Accuracy," Lin. Alg. Applic. 299, 21-80. Z Drmae (1999). "A Posteriori Computation of the Singular Vectors in a Preconditioned Jacobi SYD Algorithm," IMA J. Numer. Anal. 19, 191-213. z. Drmac (1997). 
"Implementation of Jacobi Rotations for Accurate Singular Value Computation in Floating Point Arithmetic," SIAM .J. Sci. Comput. 18, 1200-1222. F.M. Dopico and J. Moro (2004). "A Note on Multiplicative Backward Errors of Accurate SYD Algorithms," SIAM .J. Matrix Anal. Applic. 25, 1021-1031. The parallel implementation of the Jacobi SVD has a long and interesting history: F.T. Luk (1980). "Computing the Singular Value Decomposition on the ILLIAC IV,'' ACM '.lhins. Math. Softw. 6, 524-539.
  • 521. 8.7. Generalized Eigenvalue Problems with Symmetry 497 R.P. Brent and F.T. Luk (1985). "The Solution of Singular Value and Symmetric Eigenvalue Problems on Multiprocessor Arrays," SIAM J. Sci. Stat. Comput. 6, 69-84. R.P. Brent, F.T. Luk, and C. Van Loan (1985). "Computation of the Singular Value Decomposition Using Mesh Connected Processors," J. VLSI Computer Systems 1, 242-270. F.T. Luk (1986). "A Triangular Processor Array for Computing Singular Values," Lin. Alg. Applic. 77, 259-274. M. Berry and A. Sameh (1986). "Multiprocessor Jacobi Algorithms for Dense Symmetric Eigen­ value and Singular Value Decompositions," in Proceedings International Conference on Parallel Processing, 433-440. R. Schreiber (1986). "Solving Eigenvalue and Singular Value Problems on an Undersized Systolic Array," SIAM J. Sci. Stat. Comput. 7, 441-451. C.H. Bischof and C. Van Loan (1986). "Computing the SVD on a Ring of Array Processors," in Large Scale Eigenvalue Problems, J. Cullum and R. Willoughby (eds.), North Holland, Amsterdam, 51- 66. C.H. Bischof (1987). "The Two-Sided Block Jacobi Method on Hypercube Architectures,'' in Hyper­ cube Multiproce.,sors, M.T. Heath (ed.), SIAM Publications, Philadelphia, PA. C.H. Bischof (1989). "Computing the Singular Value Decomposition on a Distributed System of Vector Processors," Parallel Comput. 11, 171-186. M. Beca, G. Oksa, M. Vajtersic, and L. Grigori (2010). "On Iterative QR Pre-Processing in the Parallel Block-Jacobi SVD Algorithm," Parallel Comput. 36, 297-307. 8. 7 Generalized Eigenvalue Problems with Symmetry This section is mostly about a pair of symmetrically structured versions of the general­ ized eigenvalue problem that we considered in §7.7. In the symmetric-definite problem we seek nontrivial solutions to the problem Ax = >.Bx (8.7.1) where A E R''xn is symmetric and B E Rnxn is symmetric positive definite. The gen­ eralized singular value problem has the form ATAx = µ2BTBx (8.7.2) where A E Rmixn and B E Rm2xn. By setting B = In we see that these problems are (respectively) generalizations of the symmetric eigenvalue problem and the singular value problem. 8.7.1 The Symmetric-Definite Generalized Eigenproblem The generalized eigenvalues of the symmetric-definite pair {A, B} are denoted by A(A, B) where >.(A, B) = { >. I det(A - >.B) = O }. If A E >.(A, B) and x is a nonzero vector that satisfies Ax = >.Bx, then x is a generalized eigenvector. A symmetric-definite problem can be transformed to an equivalent symmetric­ definite problem with a congruence transformation: A - >.B is singular ¢:? (XTAX) - >.(XTBX) is singular. Thus, if X is nonsingular, then >.(A, B) = >.(XTAX, XTBX).
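This invariance is easy to confirm numerically. The following NumPy sketch is our own illustration (an arbitrary random symmetric-definite pair and a random congruence, which is nonsingular with probability one): it compares the generalized eigenvalues of {A, B} with those of the transformed pair.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)); A = A + A.T                  # symmetric
B = rng.standard_normal((n, n)); B = B @ B.T + n * np.eye(n)  # symmetric positive definite
X = rng.standard_normal((n, n))                               # nonsingular with probability 1

lam  = np.sort(np.linalg.eigvals(np.linalg.solve(B, A)).real)
lamX = np.sort(np.linalg.eigvals(np.linalg.solve(X.T @ B @ X, X.T @ A @ X)).real)
print(np.allclose(lam, lamX))   # generalized eigenvalues are invariant under congruence
```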
  • 522. 498 Chapter 8. Symmetric Eigenvalue Problems For a symmetric-definite pair {A, B}, it is possible to choose a real nonsingular X so that XTAX and XTBX are diagonal. This follows from the next result. Theorem 8.7.1. Suppose A and B are n-by-n symmetric matrices, and define C(µ) by C(µ) = µA + (1 - µ)B µ E JR. (8.7.3) If there exists a µ E [O, 1] such that C(µ) is nonnegative definite and null(C(µ)) = null(A) n null(B) then there exists a nonsingular X such that both XTAX and XTBX are diagonal. Proof. Let µ E [O, 1] be chosen so that C(µ) is nonnegative definite with the property that null(C(µ)) = null(A) n null(B). Let T [D 0 l Qi C(µ)Qi = 0 0 ' be the Schur decomposition of C(µ) and define Xi= Qi ·diag(n-i/2, In-k)· If C1 = X[C(µ)Xi , then Since span{ek+l• . . . , en} = null(C1) = null(A1) n null(Bi) it follows that A1 and B1 have the following block structure: [ Au 0 ] k ' 0 0 n-k [ B 0 u o ] k O n-k k n-k Moreover Ik = µAu + (1 - µ)Bu . Suppose µ =f 0. It then follows that if zTBuZ decomposition of Bii and we set X Xi ·diag(Z, In-k) then and k n-k diag(bi , . . . , bk) is the Schur xTAx = .!xT (c(µ) - (1 - µ)B) x = .! ([h 0 ]- (1 - µ)DB)= DA. µ µ () 0
On the other hand, if µ = 0, then let ZᵀA₁₁Z = diag(a₁, . . . , a_k) be the Schur decomposition of A₁₁ and set X = X₁·diag(Z, I_{n−k}). It is easy to verify that in this case as well, both XᵀAX and XᵀBX are diagonal. □

Frequently, the conditions in Theorem 8.7.1 are satisfied because either A or B is positive definite.

Corollary 8.7.2. If A − λB ∈ ℝ^{n×n} is symmetric-definite, then there exists a nonsingular

X = [ x₁ | · · · | xₙ ]

such that

XᵀAX = diag(a₁, . . . , aₙ)    and    XᵀBX = diag(b₁, . . . , bₙ).

Moreover, Axᵢ = λᵢBxᵢ for i = 1:n where λᵢ = aᵢ/bᵢ.

Proof. By setting µ = 0 in Theorem 8.7.1 we see that symmetric-definite pencils can be simultaneously diagonalized. The rest of the corollary is easily verified. □

Stewart (1979) has worked out a perturbation theory for symmetric pencils A − λB that satisfy

$$c(A, B) \;=\; \min_{\|x\|_2 = 1}\ \sqrt{(x^TAx)^2 + (x^TBx)^2} \;>\; 0. \qquad (8.7.4)$$

The scalar c(A, B) is called the Crawford number of the pencil A − λB.

Theorem 8.7.3. Suppose A − λB is an n-by-n symmetric-definite pencil with eigenvalues λ₁ ≥ λ₂ ≥ · · · ≥ λₙ. Suppose E_A and E_B are symmetric n-by-n matrices that satisfy

ε² = ‖E_A‖₂² + ‖E_B‖₂² < c(A, B)².

Then (A + E_A) − λ(B + E_B) is symmetric-definite with eigenvalues µ₁ ≥ · · · ≥ µₙ that satisfy

|arctan(λᵢ) − arctan(µᵢ)| < arctan(ε/c(A, B))

for i = 1:n.

Proof. See Stewart (1979). □
8.7.2 Simultaneous Reduction of A and B

Turning to algorithmic matters, we first present a method for solving the symmetric-definite problem that utilizes both the Cholesky factorization and the symmetric QR algorithm.

Algorithm 8.7.1 Given A = Aᵀ ∈ ℝ^{n×n} and B = Bᵀ ∈ ℝ^{n×n} with B positive definite, the following algorithm computes a nonsingular X such that XᵀAX = diag(a₁, . . . , aₙ) and XᵀBX = Iₙ.

    Compute the Cholesky factorization B = GGᵀ using Algorithm 4.2.2.
    Compute C = G⁻¹AG⁻ᵀ.
    Use the symmetric QR algorithm to compute the Schur decomposition QᵀCQ = diag(a₁, . . . , aₙ).
    Set X = G⁻ᵀQ.

This algorithm requires about 14n³ flops. In a practical implementation, A can be overwritten by the matrix C. See Martin and Wilkinson (1968) for details. Note that λ(A, B) = λ(C) = {a₁, . . . , aₙ}.

If âᵢ is a computed eigenvalue obtained by Algorithm 8.7.1, then it can be shown that âᵢ ∈ λ(G⁻¹AG⁻ᵀ + Eᵢ), where ‖Eᵢ‖₂ ≈ u‖A‖₂‖B⁻¹‖₂. Thus, if B is ill-conditioned, then âᵢ may be severely contaminated with roundoff error even if aᵢ is a well-conditioned generalized eigenvalue. The problem, of course, is that in this case the matrix C = G⁻¹AG⁻ᵀ can have some very large entries if B, and hence G, is ill-conditioned. This difficulty can sometimes be overcome by replacing the matrix G in Algorithm 8.7.1 with VD^{1/2} where VᵀBV = D is the Schur decomposition of B. If the diagonal entries of D are ordered from smallest to largest, then the large entries in C are concentrated in the upper left-hand corner. The small eigenvalues of C can then be computed without excessive roundoff error contamination (or so the heuristic goes). For further discussion, consult Wilkinson (AEP, pp. 337-38).

The condition of the matrix X in Algorithm 8.7.1 can sometimes be improved by replacing B with a suitable convex combination of A and B. The connection between the eigenvalues of the modified pencil and those of the original are detailed in the proof of Theorem 8.7.1.

Other difficulties concerning Algorithm 8.7.1 relate to the fact that G⁻¹AG⁻ᵀ is generally full even when A and B are sparse. This is a serious problem, since many of the symmetric-definite problems arising in practice are large and sparse. Crawford (1973) has shown how to implement Algorithm 8.7.1 effectively when A and B are banded. Aside from this case, however, the simultaneous diagonalization approach is impractical for the large, sparse symmetric-definite problem. Alternate strategies are discussed in Chapter 10.
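A dense NumPy realization of Algorithm 8.7.1 takes only a few lines. The sketch below is our own illustration (the function name is invented) and ignores the overwriting, conditioning, and sparsity considerations discussed above.

```python
import numpy as np

def symdef_eig(A, B):
    """Dense sketch of Algorithm 8.7.1: B = G G^T (Cholesky), C = G^{-1} A G^{-T},
    symmetric eigendecomposition of C, then X = G^{-T} Q.  Returns (a, X) with
    X^T A X = diag(a) and X^T B X = I."""
    G = np.linalg.cholesky(B)                            # B = G G^T
    C = np.linalg.solve(G, np.linalg.solve(G, A.T).T)    # C = G^{-1} A G^{-T}
    a, Q = np.linalg.eigh(C)                             # Q^T C Q = diag(a)
    X = np.linalg.solve(G.T, Q)                          # X = G^{-T} Q
    return a, X

# Usage: a, X = symdef_eig(A, B); then A @ X[:, i] is (approximately) a[i] * B @ X[:, i],
# i.e. the a[i] are the generalized eigenvalues lambda(A, B) and the columns of X are
# B-orthonormal generalized eigenvectors.
```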
  • 525. 8.7. Generalized Eigenvalue Problems with Symmetry 501 8.7.3 Other Methods Many ofthe symmetric eigenvalue methods presented in earlier sections have symmetric­ definite generalizations. For example, the Rayleigh quotient iteration (8.2.6) can be extended as follows: xo given with II xo 112 = 1 for k = 0, 1, . . . µk = xrAxk/xrBxk Solve (A - µkB)zk+1 = Bxk for Zk+i· Xk+l = Zk+i/11 Zk+l 112 end The main idea behind this iteration is that minimizes A = xTAx xTBx f(.A) = II Ax - .ABx Ila (8.7.5) (8.7.6) (8.7.7) where 11 · lln is defined by llzll� = zTB-1z. The mathematical properties of (8.7.5) are similar to those of (8.2.6). Its applicability depends on whether or not systems of the form (A - µB)z = x can be readily solved. Likewise, the same comment pertains to the generalized orthogonal iteration: Qo E R.nxp given with Q'{;Qo = Ip for k = 1, 2, . . . (8.7.8) Solve BZk = AQk-1 for Zk Zk = QkRk (QR factorization, Qk E R.nxv, Rk E wxv) end This is mathematically equivalent to (7.3.6) with A replaced by B-1A. Its practicality strongly depends on how easy it is to solve linear systems of the form Bz = y. 8.7.4 The Generalized Singular Value Problem We now turn our attention to the generalized singular value decomposition introduced in §6.1.6. This decomposition is concerned with the simultaneous diagonalization oftwo rectangular matrices A and B that are assumed to have the same number of columns. We restate the decomposition here with a simplification that both A and B have at least as many rows as columns. This assumption is not necessary, but it serves to unclutter our presentation of the GSVD algorithm. Theorem 8.7.4 (Tall Rectangular Version). If A E R.mixn and B E R.m2xn have at least as many rows as columns, then there exists an orthogonal matrix U1 E R.m1xmi, an orthogonal matrix U2 E IR"i2xm2, and a nonsingular matrix X E R.nxn such that U'{AX U'{BX diag(,81 , . . . ,.Bn)·
  • 526. 502 Chapter 8. Symmetric Eigenvalue Problems Proof. See Theorem 6.1.1. 0 The generalized singular values of the matrix pair {A, B} are defined by We give names to the columns of X, U1, and U2• The columns of X are the right gen­ eralized singular vectors, the columns of U1 are the left-A generalized singular vectors, and the columns of U2 are the left-B generalized singular vectors. Note that for k = l:n. AX(:, k) = akU1(:, k), BX(:, k) = fAU2(:, k), There is a connection between the GSVD of the matrix pair {A, B} and the "symmetric-definite-definite" pencil ATA - >..BTB. Since it follows that the right generalized singular vectors of {A, B} are the generalized eigenvectors for ATA - >..BTB and the eigenvalues of the pencil ATA - >..BTB are squares of the generalized singular values of {A, B}. All these GSVD facts revert to familiar SVD facts by setting B = In . For example, if B = In , then we can set X = U2 and U'[AX = DA is the SVD. We mention that the generalized singular values of {A, B} are the stationary values of II Ax 112 <PA,B(x) = II Bx 112 and the right generalized singular vectors are the associated stationary vectors. The left-A and left-B generalized singular vectors are stationary vectors associated with the quotient II y 112/ll x 112 subject to the constraints See Chu, Funderlic, and Golub (1997). A GSVD perturbation theory has been developed by Sun (1983, 1998, 2000), Paige (1984), and Li (1990). 8.7.5 Computing the GSVD Using the CS Decomposition Our proof of the GSVD in Theorem 6.1.1 is constructive and makes use of the CS decomposition. In practice, computing the GSVD via the CS decomposition is a viable strategy. Algorithm 8.7.2 (GSVD (Tall, Full-Rank Version)) Assume that A E JR.mi xn and B E 1R.m2 xn, with m1 ;::: n, m2 ;::: n, and null(A) n null(B) = 0. The following algorithm computes an orthogonal matrix U1 E JR.mi xmi , an orthogonal matrix U2 E JR.m2 xm2• a nonsingular matrix x E JR.nxn, and diagonal matrices DA E JR.mi xn and Ds E nm1 x " such that urAX = DA and u[BX = Ds.
  • 527. 8.7. Generalized Eigenvalue Problems with Symmetry 503 Compute the the QR factorization Compute the CS decomposition U[Q1V = DA = diag(ai , . . . , a:n), U[Q2V = DB = diag(,Bi , . . . , .Bn )· Solve RX = V for X. The assumption that null(A) n null(B) = 0 is not essential. See Van Loan (1985). Regardless, the condition of the matrix X is an issue that affects accuracy. However, we point out that it is possible to compute designated right generalized singular vector subspaces without having to compute explicitly selected columns of the matrix X = VR-1. For example, suppose that we wish to compute an orthonormal basis for the subspace S = span{xi , . . . xk } where Xi = X(:, i). If we compute an orthogonal Z and upper triangular T so TzT = yTR, then zT-1 = R-1v = x and S = span {zi , . . . zk } where Zi = Z(:, i). See P5.2.2 concerning the computation of Z and T. 8.7.6 Computing the CS Decomposition At first glance, the computation of the CS decomposition looks easy. After all, it is just a collection of SVDs. However, there are some complicating numerical issues that need to be addressed. To build an appreciation for this, we step through the ''thin" version of the algorithm developed by Van Loan (1985) for the case x x x x x x x x x x x x x x x x x x x x [�:] x x x x x Q x x x x x x x x x x x x x x x x x x x x x x x x x In exact arithmetic, the goal is to compute 5-by-5 orthogonal matrices U1, U2, and V so that U[QiV = C = diag(c1, c2, c3, c4, c5), Ui'Q2V = S = diag(si , s2, s3, s4, s5).
  • 528. 504 Chapter 8. Symmetric Eigenvalue Problems In floating point, we strive to compute matrices fi2, fi2 and V that arc orthogonal to working precision and which transform Qi and Q2 into nearly diagonal form: ll E1 II � u, ll E2 II � u. (8.7.9) (8.7.10) In what follows, it will be obvious that the computed versions of U1, U2 and V are orthogonal to working precision, as they will be "put together" from numerically sound QR factorizations and SVDs. The challenge is to affirm (8.7.9) and (8.7.10). We start by computing the SYD Eii = O(u), 8ii = O(u), Since the columns of this matrix are orthonormal to machine precision, it follows that j = 2:5. Note that if lrul = 0(1), then we may conclude that lr1il � u for j = 2:5. This will be the case if (for example) s1 � 1/./2 for then With this in mind, let us assume that the singular values s1 , . . . , s5 are ordered from little to big and that (8.7.11)
  • 529. 8.7. Generalized Eigenvalue Problems with Symmetry 505 Working with the near-orthonormality of the columns of Q, we conclude that C1 E12 €13 €14 E15 E21 C2 E23 E24 E25 €31 €32 T33 T34 T35 Eij = O(u), f.41 E42 €43 T44 T45 f.51 f.52 €53 E54 T55 Q S1 812 813 814 825 821 s2 823 824 825 831 832 83 834 835 8i1 = O(u). 841 842 843 84 845 851 852 853 854 85 Note that Since s3 can be close to 1, we cannot guarantee that r34 is sufficiently small. Similar comments apply to r35 and r45. To rectify this we compute the SVD of Q(3:5, 3:5), taking care to apply the U­ matrix across rows 3 to 5 and the V matrix across columns 3:5. This gives C1 E12 E13 E14 E15 E21 C2 E23 E24 E25 f.31 f.32 C3 E34 €35 Eij = O(u), E41 E42 E43 C4 €45 E51 E52 E53 E54 C5 Q 81 812 813 814 825 821 82 823 824 825 831 832 t33 t34 t35 8i1 = O(u). 841 842 t43 t44 t45 851 852 t53 t54 t55 Thus, by diagonalizing the (2,2) block of Q1 we fill the (2,2) block of Q2. However, if we compute the QR factorization of Q(8:10, 3:5) and apply the orthogonal factor across
  • 530. 506 Chapter 8. Symmetric Eigenvalue Problems rows 8:10, then we obtain C1 E12 E13 E14 E15 E21 C2 E23 E24 E25 f31 f32 C3 €34 €35 Eij = O(u), f41 f42 €43 C4 €45 f51 f52 €53 €54 C5 Q = 81 012 013 014 025 021 82 d23 024 025 031 032 t33 t34 t35 Oij = O(u). 041 042 043 t44 t45 051 052 053 054 t55 Using the near-orthonormality of the columns of Q and the fact that c3, c4, and c5 are all less than 1/./2, we can conclude (for example) that Using similar arguments we may conclude that both t35 and t45 are O(u). It follows that the updated Q1 and Q2 are diagonal to within the required tolerance and that (8.7.9) and (8.7.10) are achieved as a result. 8.7.7 The Kogbetliantz Approach Paige (1986) developed a method for computing the GSVD based on the Kogbetliantz Jacobi SVD procedure. At each step a 2-by-2 GSVD problem is solved, a calculation that we briefly examine. Suppose F and G are 2-by-2 and that G is nonsingular. If is the SVD of FG-1 , then u(F, G) = {u1 , u2 } and U'[F = (UiG)E. This says that the rows of U'[F are parallel to the corresponding rows ofU!G. Thus, if Z is orthogonal so that U[GZ = G1 is upper triangular, then U[FZ = F1 is also upper triangular. In the Paige algorithm, these 2-by-2 calculations resonate with the preser­ vation of the triangular form that is key to the Kogbetliantz procedure. Moreover, the A and B input matrices are separately updated and the updates only involve orthog­ onal transformations. Although some of the calculations are very delicate, the overall procedure is tantamount to applying Kogbetliantz implicitly to the matrix AB-1•
8.7.8 Other Generalizations of the SVD

What we have been calling the "generalized singular value decomposition" is sometimes referred to as the quotient singular value decomposition or QSVD. A key feature of the decomposition is that it separately transforms the input matrices A and B in such a way that the generalized singular values and vectors are exposed, sometimes implicitly. It turns out that there are other ways to generalize the SVD.

In the product singular value decomposition problem we are given A ∈ ℝ^{m×n₁} and B ∈ ℝ^{m×n₂} and require the SVD of AᵀB. The challenge is to compute Uᵀ(AᵀB)V = Σ without actually forming AᵀB, as that operation can result in a significant loss of information. See Drmac (1998, 2000).

The restricted singular value decomposition involves three matrices and is best motivated from a variational point of view. If A ∈ ℝ^{m×n}, B ∈ ℝ^{m×q}, and C ∈ ℝ^{n×p}, then the restricted singular values of the triplet {A, B, C} are the stationary values of

$$\psi_{A,B,C}(x, y) \;=\; \frac{y^TAx}{\|By\|_2\,\|Cx\|_2}.$$

See Zha (1991), De Moor and Golub (1991), and Chu, De Lathauwer, and De Moor (2000). As with the product SVD, the challenge is to compute the required quantities without forming inverses and products. All these ideas can be extended to chains of matrices, e.g., the computation of the SVD of a matrix product A = A₁A₂ · · · A_k without explicitly forming A. See De Moor and Zha (1991) and De Moor and Van Dooren (1992).

8.7.9 A Note on the Quadratic Eigenvalue Problem

We build on our §7.7.9 discussion of the polynomial eigenvalue problem and briefly consider some structured versions of the quadratic case,

$$Q(\lambda)x \;=\; (\lambda^2 M + \lambda C + K)\,x \;=\; 0, \qquad M, C, K \in \mathbb{R}^{n\times n}. \qquad (8.7.12)$$

We recommend the excellent survey by Tisseur and Meerbergen (2001) for more detail. Note that the eigenvalue in (8.7.12) solves the quadratic equation

$$\lambda^2 (x^H M x) + \lambda (x^H C x) + (x^H K x) \;=\; 0 \qquad (8.7.13)$$

and thus

$$\lambda \;=\; \frac{-(x^H C x) \pm \sqrt{(x^H C x)^2 - 4(x^H M x)(x^H K x)}}{2(x^H M x)}, \qquad (8.7.14)$$

assuming that xᴴMx ≠ 0. Linearized versions of (8.7.12) include

$$\begin{bmatrix} 0 & N \\ -K & -C \end{bmatrix}\begin{bmatrix} x \\ \lambda x \end{bmatrix} \;=\; \lambda \begin{bmatrix} N & 0 \\ 0 & M \end{bmatrix}\begin{bmatrix} x \\ \lambda x \end{bmatrix} \qquad (8.7.15)$$

and

$$\begin{bmatrix} -K & 0 \\ 0 & N \end{bmatrix}\begin{bmatrix} x \\ \lambda x \end{bmatrix} \;=\; \lambda \begin{bmatrix} C & M \\ N & 0 \end{bmatrix}\begin{bmatrix} x \\ \lambda x \end{bmatrix} \qquad (8.7.16)$$
  • 532. 508 Chapter 8. Symmetric Eigenvalue Problems where N E Rnxn is nonsingular. In many applications, the matrices M and C arc symmetric and positive definite and K is symmetric and positive semidefinite. It follows from (8.7.14) that in this case the eigenvalues have nonpositive real part. If we set N = K in (8.7.15), then we obtain the following generalized eigenvalue problem: [; � l [: l A [� -� l [: l· This is not a symmetric-definite problem. However, if the overdamping condition min (xTCx)2 - 4(xTMx)(xTKx) = 12 > 0 xTx=l holds, then it can be shown that there is a scalar µ > 0 so that [µK K l A(µ) = K C - µM is positive definite. It follows from Theorem 8.7.1 that (8.7.16) can be diagonalized by congruence. See Vescelic (1993). A quadratic eigenvalue problem that arises in the analysis of gyroscopic systems has the property that M = MT (positive definite), K = KT, and C = -CT. It is easy to see from (8.7.14) that the eigenvalues are all purely imaginary. For this problem we have the structured linearization [A: - : l [: l = A [� � l [: l· Notice that this is a Hamiltonian/skew-Hamiltonian generalized eigenvalue problem. In the quadratic palindomic problem, K = MT and c = er and the eigenvalues come in reciprocal pairs, i.e., if Q(A) is singular then so is Q(l/A). In addition, we have the linearization [ MT MT l [y l [-M MT - C l [y l c - M MT z A -M -M z . Note that if this equation holds, then (8.7.17) (8.7.18) For a systematic treatment of linearizations for structured polynomial eigenvalue prob­ lems, see Mackey, Mackey, Mehl, and Mehrmann (2006). Problems P8.7.1 Suppose A E Rnxn is symmetric and G E Rnxn is lower triangular and nonsingular. Give an efficient algorithm for computing C = c-1Ac-T . P8.7.2 Suppose A E Rnxn is symmetric and B E Rnxn is symmetric positive definite. Give an algo­ rithm for computing the eigenvalues of AB that uses the Cholesky factorization and the symmetric
  • 533. 8.7. Generalized Eigenvalue Problems with Symmetry 509 QR algorithm. p&.7.3 Relate the principal angles and vectors between ran(A) and ran(B) to the eigenvalues and eigenvectors of the generalized eigenvalue problem ] [ � ]. p&.7.4 Show that if C is real and diagonalizable, then there exist symmetric matrices A and B, B nonsingular, such that C = AB- 1 • This shows that symmetric pencils A - >.B are essentially general. p&.7.5 Show how to convert an Ax = >.Bx problem into a generalized singular value problem if A and B are both symmetric and nonnegative definite. PB.7.6 Given Y E Rnx n show how to compute Householder matrices H2,...,Hn so that YHn · · · H2 == T is upper triangular. Hint: Hk zeros out the kth row. PB.7.7 Suppose where A E R"' xn, B1 E R"' x m, and B2 E Rnx n. Assume that Bt and B2 are positive definite with Cholesky triangles G1 and G2 respectively. Relate the generalized eigenvalues of this problem to the singular values of G} 1 AG2T PB.7.8 Suppose A and B are both symmetric positive definite. Show how to compute >.(A, B) and the corresponding eigenvectors using the Cholesky factorization and CS decomposition. PB.7.9 Consider the problem min :17 Bx=/32 x T Cx=")'2 Assume that B and C are positive definite and that Z E Rnx n is a nonsingular matrix with the property that zT BZ = diag(>.i , . . . , :An) and zTcz = In. Assume that >.1 � · · · � An. (a) Show that the the set of feasible x is empty unless >.n � fJ2/"y2 � >.1 . (b) Using Z, show how the two-constraint problem can be converted to a single-constraint problem of the form where W = diag(>.1, . . . , >.,.) ->.nI. min II Ax-bll2 y T Wy=/32 - A n "Y2 PB.7.10 Show that (8.7.17) implies (8.7.18). Notes and References for §8.7 Just how far one can simplify a symmetric pencil A - >.B via congruence is thoroughly discussed in: P. Lancaster and L. Rodman (2005). "Canonical Forms for Hermitian Matrix Pairs under Strict Equivalence and Congruence," SIAM Review 41, 407-443. The sensitivity of the symmetric-definite eigenvalue problem is covered in Stewart and Sun (MPT, Chap. 6). See also: C.R. Crawford (1976). "A Stable Generalized Eigenvalue Problem," SIAM J. Nu.mer. Anal. 13, 854-860. C.-K. Li and R. Mathias (1998). "Generalized Eigenvalues of a Definite Hermitian Matrix Pair," Lin. Alg. Applic. 211, 309-321. S.H. Cheng and N.J. Higham (1999). "The Nearest Definite Pair for the Hermitian Generalized Eigenvalue Problem," Lin. Alg. Applic. 302 3, 63-76. C.-K. Li and R. Mathias (2006). "Distances from a Hermitian Pair to Diagonalizable and Nondiago­ nalizable Hermitian Pairs," SIAM J. Matrix Anal. Applic. 28, 301-305. Y. Nakatsukasa (2010). "Perturbed Behavior of a Multiple Eigenvalue in Generalized Hermitian Eigenvalue Problems," BIT 50, 109-121.
  • 534. 510 Chapter 8. Symmetric Eigenvalue Problems R.-C. Li, Y. Nakatsukasa, N. Truhar, and S. Xu (2011). "Perturbation of Partitioned Hermitian Definite Generalized Eigenvalue Problems," SIAM J. Matrix Anal. Applic. 32, 642-663. Although it is possible to diagonalize a symmetric-definite pencil, serious numerical issues arise if the congruence transformation is ill-conditioned. Various methods for "controlling the damage" have been proposed including: R.S. Martin and J.H. Wilkinson (1968). "Reduction of a Symmetric Eigenproblem Ax = >.Bx and Related Problems to Standard Form," Numer. Math. 11, 99-110. G. Fix and R. Heiberger (1972). "An Algorithm for the Ill-Conditioned Generalized Eigenvalue Prob­ lem," SIAM J. Numer. Anal. 9, 78-88. A. Bunse-Gerstner (1984). "An Algorithm for the Symmetric Generalized Eigenvalue Problem," Lin. Alg. Applic. 58, 43-68. S. Chandrasekaran (2000). "An Efficient and Stable Algorithm for the Symmetric-Definite Generalized Eigenvalue Problem," SIAM J. Matrix Anal. Applic. 21, 1202-1228. P.I. Davies, N.J. Higham, and F. Tisseur (2001). "Analysis of the Cholesky Method with Iterative Refinement for Solving the Symmetric Definite Generalized Eigenproblem," SIAM J. Matrix Anal. Applic. 23, 472-493. F. Tisseur (2004). "Tridiagonal-Diagonal Reduction of Symmetric Indefinite Pairs," SIAM J. Matrix Anal. Applic. 26, 215-232. Exploiting handedness in A and B can be important, see: G. Peters and J.H. Wilkinson (1969). "Eigenvalues of Ax = >.Bx with Band Symmetric A and B," Comput. J. 12, 398-404. C.R. Crawford (1973). "Reduction of a Band Symmetric Generalized Eigenvalue Problem," Commun. ACM 1 6, 41-44. L. Kaufman (1993). "An Algorithm for the Banded Symmetric Generalized Matrix Eigenvalue Prob­ lem," SIAM J. Matrix Anal. Applic. 14, 372-389. K. Li, T-Y. Li, and Z. Zeng (1994). "An Algorithm for the Generalized Symmetric Tridiagonal Eigenvalue Problem," Numer. Algorithms 8, 269-291. The existence of a positive semidefinite linear combination of A and B was central to Theorem 8.7.1. Interestingly, the practical computation of such a combination has been addressed, see: C.R. Crawford (1986). "Algorithm 646 PDFIND: A Routine to Find a Positive Definite Linear Com­ bination of Two Real Symmetric Matrices," A CM Trans. Math. Softw. 12, 278--282. C.-H. Guo, N.J. Higham, and F. Tisseur (2009). "An Improved Arc Algorithm for Detecting Definite Hermitian Pairs," SIAM J. Matrix Anal. Applic. 31, 1131-1151. As we mentioned, many techniques for the symmetric eigenvalue problem have natural extensions to the symmetric-definite problem. These include methods based on the Rayleigh quotient idea: E. Jiang(1990). "An Algorithm for Finding Generalized Eigenpairs of a Symmetric Definite Matrix Pencil," Lin. Alg. Applic. 132, 65-91. R-C. Li (1994). "On Eigenvalue Variations of Rayleigh Quotient Matrix Pencils of a Definite Pencil," Lin. Alg. Applic. 208/209, 471-483. There are also generalizations of the Jacobi method: K. Veselil: (1993). "A Jacobi Eigenreduction Algorithm for Definite Matrix Pairs," Numer. Math. 64. 241-268. C. Mehl (2004). "Jacobi-like Algorithms for the Indefinite Generalized Hermitian Eigenvalue Prob­ lem," SIAM J. Matrix Anal. Applic. 25, 964-985. Homotopy methods have also found application: K. Li and T-Y. Li (1993). "A Homotopy Algorithm for a Symmetric Generalized Eigenproblem:· Numer. Algorithms 4, 167--195. T. Zhang and K.H. Law, and G.H. Golub (1998). "On the Homotopy Method for Perturbed Symmetric Generalized Eigenvalue Problems," SIAM J. Sci. Comput. 
19, 1625-1645. We shall have more to say about symmetric-definite problems with general sparsity in Chapter 10. If the matrices are banded, then it is possible to implement an effective generalization of simultaneous iteration, see:
  • 535. 8.7. Generalized Eigenvalue Problems with Symmetry 511 H . Zhang and W.F. Moss (1994). "Using Parallel Banded Linear System Solvers in Generalized Eigenvalue Problems," Parallel Comput. 20, 1089-1106. Turning our attention to the GSVD literature, the original references include: c.F. Van Loan (1976). "Generalizing the Singular Value Decomposition," SIAM J. Numer. Anal. 19, 76-83. c.C. Paige and M. Saunders (1981). "Towards A Generalized Singular Value Decomposition," SIAM J. Numer. Anal. 18, 398-405. The sensitivity of the GSVD is detailed in Stewart and Sun (MPT) as as well in the following papers: J.-G. Sun (1983). "Perturbation Analysis for the Generalized Singular Value Problem," SIAM J. Numer. Anal. 20, 611-625. C.C. Paige (1984). "A Note on a Result of Sun J.-Guang: Sensitivity of the CS and GSV Decompo­ sitions," SIAM J. Numer. Anal. 21, 186-191. R-C. Li (1993). "Bounds on Perturbations of Generalized Singular Values and of Associated Sub­ spaces," SIAM J. Matrix Anal. Applic. 14, 195-234. J.-G. Sun (1998). "Perturbation Analysis of Generalized Singular Subspaces," Numer. Math. 79, 615-641. J.-G. Sun (2000). "Condition Number and Backward Error for the Generalized Singular Value De­ composition," SIAM J. Matrix Anal. Applic. 22, 323-341. X.S. Chen and W. Li (2008). "A Note on Backward Error Analysis of the Generalized Singular Value Decomposition," SIAM J. Matrix Anal. Applic. 90, 1358-1370. The variational characterization of the GSVD is analyzed in: M.T. Chu, R.F Funderlic, and G.H. Golub (1997). "On a Variational Formulation of the Generalized Singular Value Decomposition," SIAM J. Matrix Anal. Applic. 18, 1082-1092. Connections between GSVD and the pencil A - >.B arc discussed in: B. KAgstrom (1985). "The Generalized Singular Value Decomposition and the General A - >.B Prob­ lem," BIT 24, 568-583. Stable methods for computing the CS and generalized singular value decompositions are described in: G.W. Stewart (1982). "Computing the C-S Decomposition of a Partitioned Orthonormal Matrix," Numer. Math. 40, 297-306. G.W. Stewart (1983). "A Method for Computing the Generalized Singular Value Decomposition," in Matrix Pencils , B. Kagstrom and A. Ruhe (eds.), Springer-Verlag, New York, 207-220. C.F. Van Loan (1985). "Computing the CS and Generalized Singular Value Decomposition," Numer. Math. 46, 479-492. B.D. Sutton (2012). "Stable Computation ofthe CS Decomposition: Simultaneous Bidiagonalization," SIAM. J. Matrix Anal. Applic. 33, 1-21. The idea of using the Kogbetliantz procedure for the GSVD problem is developed in: C.C. Paige (1986). "Computing the Generalized Singular Value Decomposition," SIAM J. Sci. Stat. Comput. 7, 1126--1146. Z. Bai and H. Zha (1993). "A New Preprocessing Algorithm for the Computation of the Generalized Singular Value Decomposition," SIAM J. Sci. Comp. 14, 1007-1012. Z. Bai and J.W. Demmel (1993). "Computing the Generalized Singular Value Decomposition," SIAM J. Sci. Comput. 14, 1464-1486. Other methods for computing the GSVD include: Z. Drmae (1998). "A Tangent Algorithm for Computing the Generalized Singular Value Decomposi­ tion," SIAM J. Numer. Anal. 35, 1804-1832. Z. Drmae and E.R. Jessup (2001). "On Accurate Quotient Singular Value Computation in Floating­ Point Arithmetic," SIAM J. Matrix Anal. Applic. 22, 853-873. S. Friedland (2005). "A New Approach to Generalized Singular Value Decomposition," SIAM J. Matrix Anal. Applic. 27, 434-444. Stable methods for computing the product and restricted SVDs are discussed in the following papers:
  • 536. 512 Chapter 8. Symmetric Eigenvalue Problems M.T. Heath, A.J. Laub, C.C. Paige, and R.C. Ward (1986). "Computing the Singular Value Decom­ position of a Product of Two Matrices," SIAM J. Sci. Stat. Comput. 7, 1147-1159. K.V. Fernando and S. Hammarling (1988). "A Product-Induced Singular Value Decomposition for Two Matrices and Balanced Realization," in Linear Algebra in Systems and Control, B.N. Datta et al (eds), SIAM Publications, Philadelphia, PA. B. De Moor and H. Zha (1991). "A Tree of Generalizations of the Ordinary Singular Value Decom­ position," Lin. Alg. Applic. 147, 469-500. H. Zha (1991). "The Restricted Singular Value Decomposition of Matrix Triplets," SIAM J. Matri:x Anal. Applic. 12, 172-194. B. De Moor and G.H. Golub (1991). "The Restricted Singular Value Decomposition: Properties and Applications," SIAM J. Matri:x Anal. Applic. 12, 401-425. B. De Moor and P. Van Dooren (1992). "Generalizing the Singular Value and QR Decompositions," SIAM J. Matrix Anal. Applic. 13, 993-1014. H. Zha (1992). "A Numerical Algorithm for Computing the Restricted Singular Value Decomposition of Matrix Triplets," Lin. Alg. Applic. 168, 1-25. G.E. Adams, A.W. Bojanczyk, and F.T. Luk (1994). "Computing the PSVD of Two 2x2 Triangular Matrices," SIAM J. Matrix Anal. Applic. 15, 366-382. Z. Drma.C (1998). "Accurate Computation ofthe Product-Induced Singular Value Decomposition with Applications," SIAM J. Numer. Anal. 35, 1969- 1994. D. Chu, L. De Lathauwer, and 13. De Moor (2000). "On the Computation of the Restricted Singular Value Decomposition via the Cosine-Sine Decomposition," SIAM J. Matrix Anal. Applic. 22, 580-601. D. Chu and B.De Moor (2000). "On a variational formulation of the QSVD and the RSVD," Lin. Alg. Applic. 311, 61-78. For coverage of structured quadratic eigenvalue problems, see: P. Lancaster (1991). "Quadratic Eigenvalue Problems," Lin. Alg. Applic. 150, 499-506. F. Tisseur and N.J. Higham (2001). "Structured Pseudospectra for Polynomial Eigenvalue Problems, with Applications,'' SIAM J. Matrix Anal. Applic. 23, 187-208. F. Tisseur and K. Meerbergen (2001). "The Quadratic Eigenvalue Problem,'' SIAM Review 43, 235- 286. V. Mehrmann and D. Watkins (2002). "Polynomial Eigenvalue Problems with Hamiltonian Structure," Electr. TI-ans. Numer. Anal. 13, 106-118. U.B. Holz, G.H. Golub, and K.H. Law (2004). "A Subspace Approximation Method for the Quadratic Eigenvalue Problem,'' SIAM J. Matrix Anal. Applic. 26, 498-521. D.S. Mackey, N. Mackey, C. Mehl, and V. Mehrmann (2006). "Structured Polynomial Eigenvalue Problems: Good Vibrations from Good Linearizations,'' SIAM. J. Matrix Anal. Applic. 28, 1029- 1051. B. Plestenjak (2006). "Numerical Methods for the Tridiagonal Hyperbolic Quadratic Eigenvalue Prob­ lem,'' SIAM J. Matrix Anal. Applic. 28, 1157-1172. E.K.-W. Chu, T.-M. Hwang, W.-W. Lin, and C.-T. Wu (2008). "Vibration ofFast Trains, Palindromic Eigenvalue Problems, and Structure-Preserving Doubling Algorithms,'' J. Comp. Appl. Math. 219, 237--252.
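To make the quadratic eigenvalue problems covered by the references above concrete, here is a small illustrative sketch in Python/NumPy (not from the text). It reduces Q(λ)x = (λ²M + λC + K)x = 0 to a 2n-by-2n pencil with a generic companion-type linearization and solves it with SciPy's QZ-based solver; the structured pencils discussed in §8.7 (symmetric, Hamiltonian/skew-Hamiltonian, palindromic) refine this idea to preserve eigenvalue symmetries. The function name `quad_eig` and the random M, C, K are stand-ins chosen only for the demonstration.

```python
import numpy as np
from scipy.linalg import eig

def quad_eig(M, C, K):
    """Eigenvalues of Q(lam) = lam^2*M + lam*C + K via a companion-type
    linearization A z = lam B z with z = [x; lam*x].  This is a generic
    linearization for illustration, not the structured pencil (8.7.16)."""
    n = M.shape[0]
    I = np.eye(n)
    A = np.block([[np.zeros((n, n)), I],
                  [-K,              -C]])
    B = np.block([[I,                np.zeros((n, n))],
                  [np.zeros((n, n)), M]])
    lam, Z = eig(A, B)            # QZ algorithm under the hood
    X = Z[:n, :]                  # top block recovers the x-part of each eigenvector
    return lam, X

# A damped mass-spring-like example with M, C spd and K spd.
rng = np.random.default_rng(0)
n = 4
M = np.eye(n)
C = rng.standard_normal((n, n)); C = C @ C.T + n * np.eye(n)
K = rng.standard_normal((n, n)); K = K @ K.T
lam, X = quad_eig(M, C, K)
# Residual check: ||Q(lam_i) x_i|| should be tiny for each computed eigenpair.
res = [np.linalg.norm((l**2 * M + l * C + K) @ X[:, i]) for i, l in enumerate(lam)]
print(max(res))
```

For M and C positive definite and K positive semidefinite the computed eigenvalues should have nonpositive real part, in agreement with the discussion in §8.7.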
• 537. Chapter 9 Functions of Matrices

9.1 Eigenvalue Methods
9.2 Approximation Methods
9.3 The Matrix Exponential
9.4 The Sign, Square Root, and Log of a Matrix

Computing a function f(A) of an n-by-n matrix A is a common problem in many application areas. Roughly speaking, if the scalar function f(z) is defined on λ(A), then f(A) is defined by substituting "A" for "z" in the "formula" for f(z). For example, if f(z) = (1 + z)/(1 − z) and 1 ∉ λ(A), then f(A) = (I + A)(I − A)⁻¹.

The computations get particularly interesting when the function f is transcendental. One approach in this more complicated situation is to compute an eigenvalue decomposition A = YBY⁻¹ and use the formula f(A) = Y f(B) Y⁻¹. If B is sufficiently simple, then it is often possible to calculate f(B) directly. This is illustrated in §9.1 for the Jordan and Schur decompositions. Another class of methods involves the approximation of the desired function f(A) with an easy-to-calculate function g(A). For example, g might be a truncated Taylor series approximation to f. Error bounds associated with the approximation of matrix functions are given in §9.2. In §9.3 we discuss the special and very important problem of computing the matrix exponential e^A. The matrix sign, square root, and logarithm functions and connections to the polar decomposition are treated in §9.4.

Reading Notes

Knowledge of Chapters 3 and 7 is assumed. Within this chapter there are the following dependencies: §9.1 → §9.2 → §9.3 → §9.4.

513
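As a concrete illustration of the two viewpoints described above, the sketch below (illustrative Python/NumPy, not part of the text) evaluates f(z) = (1 + z)/(1 − z) at a matrix argument both by the explicit formula (I + A)(I − A)⁻¹ and through an eigenvalue decomposition A = YBY⁻¹ with f applied to the diagonal of B. The helper names and the random test matrix are made up for the example; for a diagonalizable A with 1 ∉ λ(A) the two results should agree to roughly unit roundoff times κ₂(Y).

```python
import numpy as np

def f_explicit(A):
    """f(A) = (I + A)(I - A)^{-1} for f(z) = (1+z)/(1-z); assumes 1 is not
    an eigenvalue of A."""
    I = np.eye(A.shape[0])
    return (I + A) @ np.linalg.inv(I - A)

def f_by_diagonalization(A):
    """f(A) = Y f(B) Y^{-1} where A = Y B Y^{-1} is an eigendecomposition."""
    w, Y = np.linalg.eig(A)           # A = Y diag(w) Y^{-1}
    fB = np.diag((1 + w) / (1 - w))   # apply the scalar function to the eigenvalues
    return Y @ fB @ np.linalg.inv(Y)

rng = np.random.default_rng(1)
A = 0.4 * rng.standard_normal((5, 5))   # keep the eigenvalues away from 1
F1 = f_explicit(A)
F2 = f_by_diagonalization(A)
print(np.linalg.norm(F1 - F2) / np.linalg.norm(F1))
```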
• 538. 514 Chapter 9. Functions of Matrices

Complementary references include Horn and Johnson (TMA) and the definitive text by Higham (FOM). We mention that aspects of the f(A)-times-a-vector problem are treated in §10.2.

9.1 Eigenvalue Methods

Here are some examples of matrix functions:

$$
p(A) = I + A, \qquad
r(A) = \Big(I - \tfrac{A}{2}\Big)^{-1}\Big(I + \tfrac{A}{2}\Big), \quad 2 \notin \lambda(A), \qquad
e^A = \sum_{k=0}^{\infty} \frac{A^k}{k!}.
$$

Obviously, these are the matrix versions of the scalar-valued functions

$$
p(z) = 1 + z, \qquad
r(z) = \Big(1 - \tfrac{z}{2}\Big)^{-1}\Big(1 + \tfrac{z}{2}\Big), \quad 2 \neq z, \qquad
e^z = \sum_{k=0}^{\infty} \frac{z^k}{k!}.
$$

Given an n-by-n matrix A, it appears that all we have to do to define f(A) is to substitute A into the formula for f. However, to make subsequent algorithmic developments precise, we need to be a little more formal. It turns out that there are several equivalent ways to define a function of a matrix. See Higham (FOM, §1.2). Because of its prominence in the literature and its simplicity, we take as our "base" definition one that involves the Jordan canonical form (JCF).

9.1.1 A Jordan-Based Definition

Suppose A ∈ ℂⁿˣⁿ and let

$$
A = X \cdot \mathrm{diag}(J_1, \ldots, J_q) \cdot X^{-1}
\tag{9.1.1}
$$

be its JCF with

$$
J_i =
\begin{bmatrix}
\lambda_i & 1 & & 0 \\
 & \lambda_i & \ddots & \\
 & & \ddots & 1 \\
0 & & & \lambda_i
\end{bmatrix}
\in \mathbb{C}^{n_i \times n_i}, \qquad i = 1{:}q.
\tag{9.1.2}
$$
• 539. 9.1. Eigenvalue Methods 515

The matrix function f(A) is defined by

$$
f(A) = X \cdot \mathrm{diag}(F_1, \ldots, F_q) \cdot X^{-1}
\tag{9.1.3}
$$

where

$$
F_i =
\begin{bmatrix}
f(\lambda_i) & f^{(1)}(\lambda_i) & \cdots & \dfrac{f^{(n_i-1)}(\lambda_i)}{(n_i - 1)!} \\
0 & f(\lambda_i) & \ddots & \vdots \\
\vdots & & \ddots & f^{(1)}(\lambda_i) \\
0 & \cdots & 0 & f(\lambda_i)
\end{bmatrix},
\qquad i = 1{:}q,
\tag{9.1.4}
$$

assuming that all the required derivative evaluations exist.

9.1.2 The Taylor Series Representation

If f can be represented by a Taylor series on A's spectrum, then f(A) can be represented by the same Taylor series in A. To fix ideas, assume that f is analytic in a neighborhood of z₀ ∈ ℂ and that for some r > 0 we have

$$
f(z) = \sum_{k=0}^{\infty} \frac{f^{(k)}(z_0)}{k!}\,(z - z_0)^k,
\qquad |z - z_0| < r.
\tag{9.1.5}
$$

Our first result applies to a single Jordan block.

Lemma 9.1.1. Suppose B ∈ ℂᵐˣᵐ is a Jordan block and write B = λIₘ + E where E is its strictly upper bidiagonal part. Given (9.1.5), if |λ − z₀| < r, then

$$
f(B) = \sum_{k=0}^{\infty} \frac{f^{(k)}(z_0)}{k!}\,(B - z_0 I_m)^k.
$$

Proof. Note that powers of E are highly structured, e.g., if m = 4 then

$$
E^2 = \begin{bmatrix} 0&0&1&0\\ 0&0&0&1\\ 0&0&0&0\\ 0&0&0&0 \end{bmatrix},
\qquad
E^3 = \begin{bmatrix} 0&0&0&1\\ 0&0&0&0\\ 0&0&0&0\\ 0&0&0&0 \end{bmatrix}.
$$

In terms of the Kronecker delta, if 0 ≤ p ≤ m − 1, then [Eᵖ]ᵢⱼ = δ_{i,j−p}. It follows from (9.1.4) that

$$
f(B) = \sum_{p=0}^{m-1} \frac{f^{(p)}(\lambda)}{p!}\, E^p.
\tag{9.1.6}
$$
• 540. 516 Chapter 9. Functions of Matrices

On the other hand, if p ≥ m, then Eᵖ = 0. Thus, for any k ≥ 0 we have

$$
(B - z_0 I)^k \;=\; \big((\lambda - z_0)I + E\big)^k
\;=\; \sum_{p=0}^{\min\{k,\,m-1\}} \binom{k}{p}\,(\lambda - z_0)^{k-p} E^p.
$$

If N is a nonnegative integer, then

$$
\sum_{k=0}^{N} \frac{f^{(k)}(z_0)}{k!}\,(B - z_0 I)^k
\;=\;
\sum_{p=0}^{m-1} \frac{d^p}{d\lambda^p}\!\left( \sum_{k=0}^{N} \frac{f^{(k)}(z_0)}{k!}\,(\lambda - z_0)^k \right) \frac{E^p}{p!}.
$$

The lemma follows by taking limits with respect to N and using both (9.1.6) and the Taylor series representation of f(z). □

A similar result holds for general matrices.

Theorem 9.1.2. If f has the Taylor series representation (9.1.5) and |λ − z₀| < r for all λ ∈ λ(A) where A ∈ ℂⁿˣⁿ, then

$$
f(A) = \sum_{k=0}^{\infty} \frac{f^{(k)}(z_0)}{k!}\,(A - z_0 I)^k.
$$

Proof. Let the JCF of A be given by (9.1.1) and (9.1.2). From Lemma 9.1.1 we have

$$
f(J_i) = \sum_{k=0}^{\infty} a_k (J_i - z_0 I)^k,
\qquad a_k = \frac{f^{(k)}(z_0)}{k!},
$$

for i = 1:q. Using the definition (9.1.3) and (9.1.4) we see that

$$
f(A)
= X \cdot \mathrm{diag}\!\left( \sum_{k=0}^{\infty} a_k (J_1 - z_0 I_{n_1})^k, \ldots, \sum_{k=0}^{\infty} a_k (J_q - z_0 I_{n_q})^k \right) \cdot X^{-1}
= X \cdot \left( \sum_{k=0}^{\infty} a_k (J - z_0 I_n)^k \right) \cdot X^{-1}
= \sum_{k=0}^{\infty} a_k \big( X(J - z_0 I_n)X^{-1} \big)^k
= \sum_{k=0}^{\infty} a_k (A - z_0 I_n)^k,
$$

completing the proof of the theorem. □
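The Jordan-block formula just established is easy to check numerically. The sketch below (illustrative Python, not from the text) forms f(J) for a single 4-by-4 Jordan block using the superdiagonal-derivative recipe (9.1.4)/(9.1.6) with f = exp and compares the result with scipy.linalg.expm; the helper name `f_jordan_block` and the test values are made up for the example.

```python
import numpy as np
from math import factorial
from scipy.linalg import expm

def f_jordan_block(lam, m, derivs):
    """f(J) for an m-by-m Jordan block J = lam*I + E via (9.1.4)/(9.1.6):
    the p-th superdiagonal of f(J) is f^{(p)}(lam)/p!.
    `derivs[p]` evaluates the p-th derivative of f."""
    F = np.zeros((m, m))
    for p in range(m):
        # E^p has ones on its p-th superdiagonal, so scale np.eye(m, k=p)
        F += (derivs[p](lam) / factorial(p)) * np.eye(m, k=p)
    return F

lam, m = 0.3, 4
J = lam * np.eye(m) + np.eye(m, k=1)     # a single Jordan block
derivs = [np.exp] * m                    # every derivative of exp is exp
F = f_jordan_block(lam, m, derivs)
print(np.linalg.norm(F - expm(J)))       # should be O(machine precision)
```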
• 541. 9.1. Eigenvalue Methods 517

Important matrix functions that have simple Taylor series definitions include

$$
\exp(A) = \sum_{k=0}^{\infty} \frac{A^k}{k!},
\qquad
\log(I - A) = -\sum_{k=1}^{\infty} \frac{A^k}{k}, \quad |\lambda| < 1,\ \lambda \in \lambda(A),
$$

$$
\sin(A) = \sum_{k=0}^{\infty} (-1)^k \frac{A^{2k+1}}{(2k+1)!},
\qquad
\cos(A) = \sum_{k=0}^{\infty} (-1)^k \frac{A^{2k}}{(2k)!}.
$$

For clarity in this section and the next, we consider only matrix functions that have a Taylor series representation. In that case it is easy to verify that

$$
A \cdot f(A) = f(A) \cdot A
\tag{9.1.7}
$$

and

$$
f(X^{-1}AX) = X^{-1} \cdot f(A) \cdot X.
\tag{9.1.8}
$$

9.1.3 An Eigenvector Approach

If A ∈ ℂⁿˣⁿ is diagonalizable, then it is particularly easy to specify f(A) in terms of A's eigenvalues and eigenvectors.

Corollary 9.1.3. If A ∈ ℂⁿˣⁿ, A = X · diag(λ₁, …, λₙ) · X⁻¹, and f(A) is defined, then

$$
f(A) = X \cdot \mathrm{diag}\big(f(\lambda_1), \ldots, f(\lambda_n)\big) \cdot X^{-1}.
\tag{9.1.9}
$$

Proof. This result is an easy consequence of Theorem 9.1.2 since all the Jordan blocks are 1-by-1. □

Unfortunately, if the matrix of eigenvectors is ill-conditioned, then computing f(A) via (9.1.9) is likely to introduce errors of order u·κ₂(X) because of the required solution of a linear system that involves the eigenvector matrix X. For example, if

$$
A = \begin{bmatrix} 1 + 10^{-5} & 1 \\ 0 & 1 - 10^{-5} \end{bmatrix},
$$

then any matrix of eigenvectors is a column-scaled version of

$$
X = \begin{bmatrix} 1 & -1 \\ 0 & 2 \cdot 10^{-5} \end{bmatrix}
$$
• 542. 518 Chapter 9. Functions of Matrices

and has a 2-norm condition number of order 10⁵. Using a computer with machine precision u ≈ 10⁻⁷, we find

$$
fl\Big( X \cdot \mathrm{diag}\big(\exp(1 + 10^{-5}),\ \exp(1 - 10^{-5})\big) \cdot X^{-1} \Big)
= \begin{bmatrix} 2.718307 & 2.750000 \\ 0.000000 & 2.718254 \end{bmatrix}
$$

while

$$
e^A = \begin{bmatrix} 2.718309 & 2.718282 \\ 0.000000 & 2.718255 \end{bmatrix}.
$$

The example suggests that ill-conditioned similarity transformations should be avoided when computing a function of a matrix. On the other hand, if A is a normal matrix, then it has a perfectly conditioned matrix of eigenvectors. In this situation, computation of f(A) via diagonalization is a recommended strategy.
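The following sketch (illustrative Python, not from the text) repeats the spirit of this example in IEEE double precision rather than the u ≈ 10⁻⁷ precision quoted above: for the nearly defective 2-by-2 matrix the diagonalization formula (9.1.9) loses roughly κ₂(X)·u ≈ 10⁻¹¹ of accuracy relative to a scaling-and-squaring evaluation of e^A, which serves only as a convenient reference here.

```python
import numpy as np
from scipy.linalg import expm

def f_via_diagonalization(A, f):
    """Evaluate f(A) = X diag(f(lambda_i)) X^{-1}; accuracy degrades like
    u * kappa_2(X) when the eigenvector matrix X is ill conditioned."""
    w, X = np.linalg.eig(A)
    return X @ np.diag(f(w)) @ np.linalg.inv(X)

eps = 1e-5
A = np.array([[1 + eps, 1.0],
              [0.0,     1 - eps]])        # nearly defective
w, X = np.linalg.eig(A)
print("kappa_2(X) ~", np.linalg.cond(X))  # about 2e5
F_diag = f_via_diagonalization(A, np.exp)
F_ref  = expm(A)                          # reference value
print("relative error:", np.linalg.norm(F_diag - F_ref) / np.linalg.norm(F_ref))
```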
• 543. 9.1. Eigenvalue Methods 519

9.1.4 A Schur Decomposition Approach

Some of the difficulties associated with the Jordan approach to the matrix function problem can be circumvented by relying upon the Schur decomposition. If A = QTQᴴ is the Schur decomposition of A, then by (9.1.8), f(A) = Q f(T) Qᴴ. For this to be effective, we need an algorithm for computing functions of upper triangular matrices. Unfortunately, an explicit expression for f(T) is very complicated.

Theorem 9.1.4. Let T = (t_{ij}) be an n-by-n upper triangular matrix with λᵢ = tᵢᵢ and assume f(T) is defined. If f(T) = (f_{ij}), then f_{ij} = 0 if i > j, f_{ij} = f(λᵢ) for i = j, and for all i < j we have

$$
f_{ij} = \sum_{(s_0,\ldots,s_k) \in S_{ij}} t_{s_0 s_1} t_{s_1 s_2} \cdots t_{s_{k-1} s_k}\, f[\lambda_{s_0}, \ldots, \lambda_{s_k}]
\tag{9.1.10}
$$

where S_{ij} is the set of all strictly increasing sequences of integers that start at i and end at j, and f[λ_{s₀}, …, λ_{s_k}] is the kth order divided difference of f at {λ_{s₀}, …, λ_{s_k}}.

Proof. See Descloux (1963), Davis (1973), or Van Loan (1975). □

To illustrate the theorem, if

$$
T = \begin{bmatrix} \lambda_1 & t_{12} & t_{13} \\ 0 & \lambda_2 & t_{23} \\ 0 & 0 & \lambda_3 \end{bmatrix},
$$

then

$$
f(T) =
\begin{bmatrix}
f(\lambda_1) & t_{12}\cdot\dfrac{f(\lambda_2) - f(\lambda_1)}{\lambda_2 - \lambda_1} & F_{13} \\[6pt]
0 & f(\lambda_2) & t_{23}\cdot\dfrac{f(\lambda_3) - f(\lambda_2)}{\lambda_3 - \lambda_2} \\[6pt]
0 & 0 & f(\lambda_3)
\end{bmatrix}
$$

where

$$
F_{13} = t_{13}\cdot\frac{f(\lambda_3) - f(\lambda_1)}{\lambda_3 - \lambda_1}
+ t_{12}t_{23}\cdot\frac{\dfrac{f(\lambda_3) - f(\lambda_2)}{\lambda_3 - \lambda_2} - \dfrac{f(\lambda_2) - f(\lambda_1)}{\lambda_2 - \lambda_1}}{\lambda_3 - \lambda_1}.
$$

The recipes for the upper triangular entries get increasingly complicated as we move away from the diagonal. Indeed, if we explicitly use (9.1.10) to evaluate f(T), then O(2ⁿ) flops are required. However, Parlett (1974) has derived an elegant recursive method for determining the strictly upper triangular portion of the matrix F = f(T). It requires only 2n³/3 flops and can be derived from the commutativity equation FT = TF. Indeed, by comparing (i, j) entries in this equation, we find

$$
\sum_{k=i}^{j} f_{ik} t_{kj} \;=\; \sum_{k=i}^{j} t_{ik} f_{kj}, \qquad j > i,
$$

and thus, if tᵢᵢ and t_{jj} are distinct,

$$
f_{ij} \;=\; t_{ij}\cdot\frac{f_{jj} - f_{ii}}{t_{jj} - t_{ii}}
\;+\; \sum_{k=i+1}^{j-1} \frac{t_{ik} f_{kj} - f_{ik} t_{kj}}{t_{jj} - t_{ii}}.
\tag{9.1.11}
$$

From this we conclude that f_{ij} is a linear combination of its neighbors in the matrix F that are to its left and below. For example, the entry f₂₅ depends upon f₂₂, f₂₃, f₂₄, f₅₅, f₄₅, and f₃₅. Because of this, the entire upper triangular portion of F can be computed superdiagonal by superdiagonal beginning with diag(f(t₁₁), …, f(tₙₙ)). The complete procedure is as follows:

Algorithm 9.1.1 (Schur-Parlett) This algorithm computes the matrix function F = f(T) where T is upper triangular with distinct eigenvalues and f is defined on λ(T).

    for i = 1:n
        fii = f(tii)
    end
    for p = 1:n-1
        for i = 1:n-p
            j = i + p
            s = tij(fjj - fii)
            for k = i+1:j-1
                s = s + tik fkj - fik tkj
            end
            fij = s/(tjj - tii)
        end
    end

This algorithm requires 2n³/3 flops. Assuming that A = QTQᴴ is the Schur decomposition of A, f(A) = QFQᴴ where F = f(T). Clearly, most of the work in computing f(A) by this approach is in the computation of the Schur decomposition, unless f is extremely expensive to evaluate.
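A direct transcription of Algorithm 9.1.1 in Python/NumPy is sketched below (illustrative code, not from the text). It assumes the eigenvalues of A are distinct, and it uses the complex Schur form so that T is genuinely triangular; the function name and the random test matrix are made up for the example.

```python
import numpy as np
from scipy.linalg import schur, expm

def funm_schur_parlett(A, f):
    """F = f(A) via the Schur-Parlett recurrence (Algorithm 9.1.1).
    Requires distinct eigenvalues; `f` must act elementwise on a vector."""
    T, Q = schur(A, output='complex')          # A = Q T Q^H, T upper triangular
    n = T.shape[0]
    F = np.zeros((n, n), dtype=complex)
    np.fill_diagonal(F, f(np.diag(T)))         # f_ii = f(t_ii)
    for p in range(1, n):                      # sweep the p-th superdiagonal
        for i in range(n - p):
            j = i + p
            s = T[i, j] * (F[j, j] - F[i, i])
            for k in range(i + 1, j):
                s += T[i, k] * F[k, j] - F[i, k] * T[k, j]
            F[i, j] = s / (T[j, j] - T[i, i])  # assumes t_jj != t_ii
    return Q @ F @ Q.conj().T

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))                # distinct eigenvalues almost surely
print(np.linalg.norm(funm_schur_parlett(A, np.exp) - expm(A)))
```

When eigenvalues are close, the divided differences implicit in this recurrence become inaccurate, which is exactly the situation that motivates the block variant described in §9.1.5.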
  • 544. 520 Chapter 9. Functions of Matrices 9.1.5 A Block Schur-Parlett Approach If A has multiple or nearly multiple eigenvalues, then the divided differences associated with Algorithm 9.1.1 become problematic and it is advisable to use a block version of the method. We outline such a procedure due to Parlett (1974). The first step is to choose Q in the Schur decomposition so that we have a partitioning T = [T �, �� ::: �::l 0 0 Tpp where .X(Tii) n .X(Tjj) = 0 and each diagonal block is associated with an eigenvalue cluster. The methods of §7.6 are applicable for this stage of the calculation. Partition F = f(T) conformably and notice that [� 1 :�� : :: :�: l F - . . . . ' . . . . . . . . 0 0 Fpp Fii = f(Tii), i = l:p. Since the eigenvalues of Tii are clustered, these calculations require special methods. Some possibilities are discussed in the next section. Once the diagonal blocks of F are known, the blocks in the strict upper triangle of F can be found recursively, as in the scalar case. To derive the governing equations, we equate (i, j) blocks in FT = TF for i < j and obtain the following generalization of (9.1.11): j-1 FijTjj - TiiFij = TijFjj - FiiTij + L (TikFkj - FikTkj)· k=i+1 (9.1.12) This is a Sylvester system whose unknowns are the elements of the block Fij and whose right-hand side is "known" if we compute the Fij one block superdiagonal at a time. We can solve (9.1.12) using the Bartels-Stewart algorithm (Algorithm 7.6.2). For more details see Higham (FOM, Chap. 9). 9.1.6 Sensitivity of Matrix Functions Does the Schur-Parlett algorithm avoid the pitfalls associated with the diagonalization approach when the matrix of eigenvectors is ill-conditioned? The proper comparison of the two solution frameworks requires an appreciation for the notion of condition as applied to the f(A) problem. Toward that end we define the relative condition of f at matrix A E <Cnxn is given as condrel(!, A) lim sup E--tO llEll :$ E llAll II f(A + E) - f(A) II €11 J(A) II
  • 545. 9.1. Eigenvalue Methods 521 This quantity is essentially a normalized Frechet derivative of the mapping A -t f(A) and various heuristic methods have been developed for estimating its value. It turns out that the careful implementation of the block Schur-Parlett algorithm is usually forward stable in the sense that II P - J(A) II II f(A) II � u·condre1(f, A) where P is the computed version of f(A). The same cannot be said of the diagonal­ ization framework when the matrix of eigenvectors is ill-conditioned. For more details, see Higham (FOM, Chap. 3). Problems P9.1.1 Suppose Use the power series definitions to develop closed form expressions for exp(A), sin(A), and cos(A). P9.1.2 Rewrite Algorithm 9.1.1 so that f(T) is computed column by column. P9.1.3 Suppose A = Xdiag(.>.;)x-1 where x = [ x1 I · . · I Xn J and x-1 = [Yl I · . · I Yn J H. Show that if f(A) is defined, then n f(A) Lf(>-;)x;yf. k=l P9.1.4 Show that T [ T �1 T12 ]: => J(T) [ F�1 T22 p q where Fu = f(Tn) and F22 = f(T22). Assume f(T) is defined. Notes and References for §9.1 p F12 J: F22 q As we discussed, other definitions of f(A) are possible. However, for the matrix functions typically encountered in practice, all these definitions are equivalent, see: R.F. Rinehart {1955). "The Equivalence of Definitions of a Matric Function,'' Amer. Math. Monthly 62, 395-414. The following papers are concerned with the Schur decomposition and its relationship to the J(A) problem: C. Davis {1973). "Explicit Functional Calculus," Lin. Alg. Applic. 6, 193-199. J. Descloux (1963). "Bounds for the Spectral Norm of Functions of Matrices," Numer. Math. 5, 185 -190. C.F. Van Loan (1975). "A Study of the Matrix Exponential,'' Numerical Analysis Report No. 10, Department of Mathematics, University of Manchester, England. Available as Report 2006.397 from http://eprints.ma.man.ac.uk/. Algorithm 9.1.1 and the various computational difficulties that arise when it is applied to a matrix having close or repeated eigenvalues are discuss B.N. Parlett (1976). "A Recurrence among the Elements of Functions of Triangular Matrices," Lin. Alg. Applic. 14, 117-121. P.I. Davies and N.J. Higham (2003). "A Schur-Parlett Algorithm for Computing Matrix Functions,'' SIAM .J. Matrix Anal. Applic. 25, 464-485.
  • 546. 522 Chapter 9. Functions of Matrices A compromise between the Jordan and Schur approaches to the J(A) problem results if A is reduced to block diagonal form as described in §7.6.3, see: B. Kli.gstrom (1977). "Numerical Computation of Matrix Functions," Department of Information Processing Report UMINF-58.77, University of Umeii., Sweden. E.B. Davies (2007). "Approximate Diagonalization," SIAM J. Matrix Anal. Applic. 29, 1051-1064. The sensitivity of matrix functions to perturbation is discussed in: C.S. Kenney and A.J. Laub (1989). "Condition Estimates for Matrix Functions," SIAM J. Matrix Anal. Applic. 10, 191-209. C.S. Kenney and A.J. Laub (1994). "Small-Sample Statistical Condition Estimates for General Matrix Functions," SIAM J. Sci. Comput. 15, 36-61. R. Mathias (1995). "Condition Estimation for Matrix Functions via the Schur Decomposition," SIAM J. Matrix Anal. Applic. 16, 565-578. 9.2 Approximation Methods We now consider a class of methods for computing matrix functions which at first glance do not appear to involve eigenvalues. These techniques are based on the idea that, if g(z) approximates f(z) on A(A), then f(A) approximates g(A), e.g., A2 Aq eA � I + A + - 21 + · · · + - 1 • . q. We begin by bounding II f(A) - g(A) II using the Jordan and Schur matrix function representations. We follow this discussion with some comments on the evaluation of matrix polynomials. 9.2.1 A Jordan Analysis The Jordan representation of matrix functions (Theorem 9.1.2) can be used to bound the error in an approximant g(A) of f(A). Theorem 9.2.1. Assume that A X · diag(Ji . . . . , Jq) . x-1 is the JCF ofA E <Cnxn with 0 0 1 Ai 1 0 ni-by-ni, 1 Ai for i = l:q. If f(z) and g(z) are analytic on an open set containing A(A), then II f(A) - g(A) 112 ::; K2(X)
  • 547. 9.2. Approximation Methods 523 Proof. Defining h(z) = f(z) - g(z) we have II f(A) - g(A) 112 = II Xdiag(h(J1), . . . , h(Jq))X-1 112 :S "'2(X) max II h(Ji) 112· l�i�q Using Theorem 9.1.2 and equation (2.3.8) we conclude that thereby proving the theorem. D 9.2.2 A Schur Analysis If we use the Schur decomposition A = QTQH instead of the Jordan decomposition, then the norm of T's strictly upper triangular portion is involved in the discrepancy between f(A) and g(A). Theorem 9.2.2. Let QHAQ = T = diag(.Xi) + N be the Schur decomposition of A E {!nxn, with N being the strictly upper triangular portion of T. If f(z) and g(z) are analytic on a closed convex set n whose interior contains .X(A), then where n-l II INl r ll F II f(A) - g(A) llF :S L Or I r=O r. sup zEO Proof. Let h(z) = f(z) - g(z) and set H = (hij) = h(A). Let st» denote the set of strictly increasing integer sequences (s0 , • • • , Sr) with the property that s0 = i and Sr = j. Notice that j-i Sij = LJ st·) r=l and so from Theorem 9.1.3, we obtain the following for all i < j: j-1 hij = L L nso,s1 ns1 ,s2 . . . ns,._1 ,srh [.Xso, . . . ' As,.] . r=l sES�;> Now since n is convex and h analytic, we have lh [.Xso ' · · · ' As,.]I :S sup zEO (9.2.1)
  • 548. 524 Chapter 9. Functions of Matrices Furthermore if INlr= (n�;)) for r � 1, then it can be shown that j < i + r, j � i + r. (9.2.2) The theorem now follows by taking absolute values in the expression for hii and then using (9.2.1) and (9.2.2). D There can be a pronounced discrepancy between the Jordan and Schur error bounds. For example, if [-.01 1 1 l A = 0 0 1 . 0 0 .01 If f(z) = ez and g(z) = 1 + z + z2/2, then II f(A) - g(A) II ::::::: 10-5 in either the Frobenius norm or the 2-norm. Since 1t2(X) ::::::: 107, the error predicted by Theorem 9.2.1 is 0(1), rather pessimistic. On the other hand, the error predicted by the Schur decomposition approach is 0(10-2). Theorems 9.2.1 and 9.2.2 remind us that approximating a function ofa nonnormal matrix is more complicated than approximating a function of a scalar. In particular, we see that if the eigensystem of A is ill-conditioned and/or A's departure from normality is large, then the discrepancy between f(A) and g(A) may be considerably larger than the maximum of lf(z) - g(z) I on A(A). Thus, even though approximation methods avoid eigenvalue computations, they evidently appear to be influenced by the structure of A's eigensystem. It is a perfect venue for pseudospectral analysis. 9.2.3 Taylor Approximants A common way to approximate a matrix function such as eA is by truncating its Taylor series. The following theorem bounds the errors that arise when matrix functions such as these are approximated via truncated Taylor series. Theorem 9.2.3. If f(z) has the Taylor series 00 f(z) = :�:::c�kZk k=O on an open disk containing the eigenvalues ofA E <Cn xn, then Proof. Define the matrix E(s) by q max II Aq+lf(q+1l(As) 112 . O�s�l f(As) = L ak(As)k + E(s), O � s � l. k=O (9.2.3)
• 549. 9.2. Approximation Methods 525

If f_{ij}(s) is the (i, j) entry of f(As), then it is necessarily analytic and so

$$
f_{ij}(s) = \sum_{k=0}^{q} \frac{f_{ij}^{(k)}(0)}{k!}\, s^k
+ \frac{f_{ij}^{(q+1)}(\varepsilon_{ij})}{(q+1)!}\, s^{q+1}
\tag{9.2.4}
$$

where ε_{ij} satisfies 0 ≤ ε_{ij} ≤ s ≤ 1. By comparing powers of s in (9.2.3) and (9.2.4) we conclude that e_{ij}(s), the (i, j) entry of E(s), has the form

$$
e_{ij}(s) = \frac{f_{ij}^{(q+1)}(\varepsilon_{ij})}{(q+1)!}\, s^{q+1}.
$$

Now f_{ij}^{(q+1)}(s) is the (i, j) entry of A^{q+1} f^{(q+1)}(As) and therefore

$$
|e_{ij}(s)| \;\le\; \frac{1}{(q+1)!}\, \max_{0 \le s \le 1} \| A^{q+1} f^{(q+1)}(As) \|_2 .
$$

The theorem now follows by applying (2.3.8). □

We mention that the factor of n in the upper bound can be removed with more careful analysis. See Mathias (1993).

In practice, it does not follow that greater accuracy results by taking a longer Taylor approximation. For example, if

$$
A = \begin{bmatrix} -49 & 24 \\ -64 & 31 \end{bmatrix},
$$

then it can be shown that

$$
e^A = \begin{bmatrix} -0.735759 & 0.551819 \\ -1.471518 & 1.103638 \end{bmatrix}.
$$

For q = 59, Theorem 9.2.3 predicts a negligible truncation error. However, if u ≈ 10⁻⁷, then we find

$$
fl\left( \sum_{k=0}^{59} \frac{A^k}{k!} \right)
= \begin{bmatrix} -22.25880 & -1.4322766 \\ -61.49931 & -3.474280 \end{bmatrix}.
$$

The problem is that some of the partial sums have large elements. For example, the matrix I + A + ⋯ + A¹⁷/17! has entries of order 10⁷. Since the machine precision is approximately 10⁻⁷, rounding errors larger than the norm of the solution are sustained.
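The following sketch (illustrative Python, not from the text) reproduces this phenomenon. IEEE single precision is used to mimic the u ≈ 10⁻⁷ machine precision quoted above: the partial sums of the Taylor series for e^A suffer catastrophic cancellation, while evaluating a short series after scaling A by 2⁻ˢ and then squaring s times, the idea behind the methods of §9.3, gives a much better answer. The function name and the choices q = 59, s = 7 are made up for the demonstration.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-49.0, 24.0],
              [-64.0, 31.0]])

def taylor_expm(A, q, dtype=np.float32):
    """Truncated Taylor series I + A + ... + A^q/q! evaluated in the given
    precision (float32 mimics u ~ 1e-7 in the text's example)."""
    A = A.astype(dtype)
    F = np.eye(A.shape[0], dtype=dtype)
    term = np.eye(A.shape[0], dtype=dtype)
    for k in range(1, q + 1):
        term = term @ A / dtype(k)
        F = F + term
    return F

print(expm(A))                       # accurate double precision reference
print(taylor_expm(A, 59))            # ruined by cancellation in float32

# Scaling and squaring: exp(A) = (exp(A/2^s))^(2^s) with ||A/2^s|| modest.
s = 7
F = taylor_expm(A / 2.0**s, 20)
for _ in range(s):
    F = F @ F
print(F)                             # much closer to expm(A)
```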
• 550. 526 Chapter 9. Functions of Matrices

The example highlights a well-known shortcoming of truncated Taylor series approximation: it tends to be effective only near the origin. The problem can sometimes be circumvented through a change of scale. For example, by repeatedly using the double-angle formulae

$$
\cos(2A) = 2\cos(A)^2 - I, \qquad \sin(2A) = 2\sin(A)\cos(A),
$$

the cosine and sine of a matrix can be built up from Taylor approximations to cos(A/2ᵏ) and sin(A/2ᵏ):

    S_0 = Taylor approximate to sin(A/2^k)
    C_0 = Taylor approximate to cos(A/2^k)
    for j = 1:k
        S_j = 2 S_{j-1} C_{j-1}
        C_j = 2 C_{j-1}^2 - I
    end

Here k is a positive integer chosen so that, say, ‖A‖_∞ ≈ 2ᵏ. See Serbin and Blalock (1979), Higham and Smith (2003), and Hargreaves and Higham (2005).

9.2.4 Evaluating Matrix Polynomials

Since the approximation of transcendental matrix functions usually involves the evaluation of polynomials, it is worthwhile to look at the details of computing

$$
p(A) = b_q A^q + \cdots + b_1 A + b_0 I
$$

where the scalars b₀, …, b_q ∈ ℝ are given. The most obvious approach is to invoke Horner's scheme:

Algorithm 9.2.1 Given a matrix A and b(0:q), the following algorithm computes the polynomial F = b_qA^q + ⋯ + b₁A + b₀I.

    F = b_q A + b_{q-1} I
    for k = q-2:-1:0
        F = AF + b_k I
    end

This requires q − 1 matrix multiplications. However, unlike the scalar case, this summation process is not optimal. To see why, suppose q = 9 and observe that

$$
p(A) = A^3\Big(A^3\big(b_9A^3 + (b_8A^2 + b_7A + b_6I)\big) + (b_5A^2 + b_4A + b_3I)\Big) + b_2A^2 + b_1A + b_0I.
$$

Thus, F = p(A) can be evaluated with only four matrix multiplications:

    A_2 = A^2,
    A_3 = A A_2,
    F_1 = b_9 A_3 + b_8 A_2 + b_7 A + b_6 I,
    F_2 = A_3 F_1 + b_5 A_2 + b_4 A + b_3 I,
    F   = A_3 F_2 + b_2 A_2 + b_1 A + b_0 I.
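A direct rendering of Algorithm 9.2.1 in Python/NumPy is sketched below (illustrative code, not from the text), together with a naive check against explicitly formed powers of A; the function name and random coefficients are made up for the example.

```python
import numpy as np

def matrix_polynomial_horner(b, A):
    """Evaluate p(A) = b[q]*A^q + ... + b[1]*A + b[0]*I by Horner's scheme
    (Algorithm 9.2.1); uses q-1 matrix multiplications."""
    q = len(b) - 1
    I = np.eye(A.shape[0])
    F = b[q] * A + b[q - 1] * I
    for k in range(q - 2, -1, -1):
        F = A @ F + b[k] * I
    return F

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4)) / 4
b = rng.standard_normal(10)                       # q = 9
# Reference: sum b_k A^k with explicitly formed powers.
P = sum(b[k] * np.linalg.matrix_power(A, k) for k in range(10))
print(np.linalg.norm(matrix_polynomial_horner(b, A) - P))
```

A blocked evaluation along the lines of (9.2.5), discussed next, reduces the multiplication count to roughly 2√q; the sketch above only illustrates the baseline Horner approach.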
• 551. 9.2. Approximation Methods 527

In general, if s is any integer that satisfies 1 ≤ s ≤ √q, then

$$
p(A) = \sum_{k=0}^{r} B_k \cdot (A^s)^k, \qquad r = \mathrm{floor}(q/s),
\tag{9.2.5}
$$

where

$$
B_k =
\begin{cases}
b_{sk+s-1}A^{s-1} + \cdots + b_{sk+1}A + b_{sk}I, & k = 0{:}r-1, \\[4pt]
b_{q}A^{q-sr} + \cdots + b_{sr+1}A + b_{sr}I, & k = r.
\end{cases}
$$

9.2.5 Computing Powers of a Matrix

The problem of raising a matrix to a given power deserves special mention. Suppose it is required to compute A¹³. Noting that A⁴ = (A²)², A⁸ = (A⁴)², and A¹³ = A⁸A⁴A, we see that this can be accomplished with just five matrix multiplications. In general we have

Algorithm 9.2.2 (Binary Powering) The following algorithm computes F = Aˢ where s is a positive integer and A ∈ ℝⁿˣⁿ.

    Let s = Σ_{k=0}^{t} β_k 2^k be the binary expansion of s with β_t ≠ 0
    Z = A; q = 0
    while β_q = 0
        Z = Z^2; q = q + 1
    end
    F = Z
    for k = q+1:t
        Z = Z^2
        if β_k ≠ 0
            F = FZ
        end
    end

This algorithm requires at most 2·floor(log₂(s)) matrix multiplications. If s is a power of 2, then only log₂(s) matrix multiplications are needed.
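A direct transcription of Algorithm 9.2.2 is sketched below (illustrative Python, not from the text); it processes the binary digits of s from the least significant end, squaring a running power of A and multiplying it into the result whenever the corresponding bit is set. The function name and test matrix are made up for the example.

```python
import numpy as np

def binary_powering(A, s):
    """F = A^s by binary powering (Algorithm 9.2.2); s is a positive integer.
    Uses at most 2*floor(log2(s)) matrix multiplications."""
    assert s >= 1
    bits = [int(b) for b in bin(s)[2:][::-1]]   # beta_0, ..., beta_t (beta_t = 1)
    Z, q = A, 0
    while bits[q] == 0:                         # skip trailing zero bits
        Z = Z @ Z
        q += 1
    F = Z
    for k in range(q + 1, len(bits)):
        Z = Z @ Z
        if bits[k] == 1:
            F = F @ Z
    return F

A = np.array([[1.0, 1.0], [0.0, 2.0]])
print(np.linalg.norm(binary_powering(A, 13) - np.linalg.matrix_power(A, 13)))
```

For s = 13 the routine performs three squarings and two extra multiplications, matching the five-multiplication count quoted in the text.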
• 552. 528 Chapter 9. Functions of Matrices

9.2.6 Integrating Matrix Functions

We conclude this section with some remarks about the integration of a parameterized matrix function. Suppose A ∈ ℝⁿˣⁿ and that f(At) is defined for all t ∈ [a, b]. We can approximate

$$
F = \int_a^b f(At)\,dt, \qquad [F]_{ij} = \int_a^b [\,f(At)\,]_{ij}\,dt,
$$

by applying any suitable quadrature rule. For example, with Simpson's rule, we have

$$
F \;\approx\; \tilde{F} \;=\; \frac{h}{3}\sum_{k=0}^{m} w_k\, f(A(a + kh))
\tag{9.2.6}
$$

where m is even, h = (b − a)/m, and

$$
w_k =
\begin{cases}
1, & k = 0, m, \\
4, & k \text{ odd}, \\
2, & k \text{ even},\ k \neq 0, m.
\end{cases}
$$

If (d⁴/dz⁴)f(zt) = f⁽⁴⁾(zt) is continuous for t ∈ [a, b] and if f⁽⁴⁾(At) is defined on this same interval, then it can be shown that F̃ = F + E where

$$
\| E \|_2 \;\le\; n\,\frac{h^4 (b - a)}{180}\, \max_{a \le t \le b} \| f^{(4)}(At) \|_2 .
\tag{9.2.7}
$$

Let f_{ij} and e_{ij} denote the (i, j) entries of F and E, respectively. Under the above assumptions we can apply the standard error bounds for Simpson's rule and obtain

$$
|e_{ij}| \;\le\; \frac{h^4 (b - a)}{180}\, \max_{a \le t \le b} \big| e_i^T f^{(4)}(At)\, e_j \big| .
$$

The inequality (9.2.7) now follows since ‖E‖₂ ≤ n·max|e_{ij}| and

$$
\max_{a \le t \le b} \big| e_i^T f^{(4)}(At)\, e_j \big| \;\le\; \max_{a \le t \le b} \| f^{(4)}(At) \|_2 .
$$

Of course, in a practical application of (9.2.6), the function evaluations f(A(a + kh)) normally have to be approximated. Thus, the overall error involves the error in approximating f(A(a + kh)) as well as the Simpson rule error.
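The sketch below (illustrative Python, not from the text) applies the composite Simpson rule (9.2.6) to F = ∫₀¹ e^{At} dt, using scipy.linalg.expm for the function evaluations f(A(a + kh)); for this integrand the exact value A⁻¹(e^A − I), valid when A is nonsingular, is available for comparison. The function name and the choice m = 20 are made up for the example.

```python
import numpy as np
from scipy.linalg import expm

def simpson_matrix_integral(f_of_At, A, a, b, m):
    """Composite Simpson approximation (9.2.6) to int_a^b f(A t) dt.
    `f_of_At(M)` evaluates the matrix function at a matrix argument; m is even."""
    assert m % 2 == 0
    h = (b - a) / m
    w = np.ones(m + 1)
    w[1:-1:2] = 4.0     # odd interior nodes
    w[2:-1:2] = 2.0     # even interior nodes
    F = sum(w[k] * f_of_At(A * (a + k * h)) for k in range(m + 1))
    return (h / 3.0) * F

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
F_simpson = simpson_matrix_integral(expm, A, 0.0, 1.0, 20)
F_exact = np.linalg.solve(A, expm(A) - np.eye(4))   # A^{-1}(e^A - I)
print(np.linalg.norm(F_simpson - F_exact))
```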
• 553. 9.2. Approximation Methods 529

9.2.7 A Note on the Cauchy Integral Formulation

Yet another way to define a function of a matrix A ∈ ℂⁿˣⁿ is through the Cauchy integral theorem. Suppose f(z) is analytic inside and on a closed contour Γ which encloses λ(A). We can define f(A) to be the matrix

$$
f(A) = \frac{1}{2\pi i} \int_{\Gamma} f(z)\,(zI - A)^{-1}\,dz.
$$

The integral is defined on an element-by-element basis:

$$
[\,f(A)\,]_{ij} = \frac{1}{2\pi i} \int_{\Gamma} f(z)\, \big[(zI - A)^{-1}\big]_{ij}\,dz.
\tag{9.2.8}
$$

Notice that the entries of (zI − A)⁻¹ are analytic on Γ and that f(A) is defined whenever f(z) is analytic in a neighborhood of λ(A). Using quadrature and other tools, Hale, Higham, and Trefethen (2007) have shown how this characterization can be used in practice to compute certain types of matrix functions.

Problems

P9.2.1 Verify (9.2.2).

P9.2.2 Show that if ‖A‖₂ < 1, then log(I + A) exists and satisfies the bound ‖log(I + A)‖₂ ≤ ‖A‖₂/(1 − ‖A‖₂).

P9.2.3 Using Theorem 9.2.3, bound the error in the following approximations:

$$
\sin(A) \approx \sum_{k=0}^{q} (-1)^k \frac{A^{2k+1}}{(2k+1)!},
\qquad
\cos(A) \approx \sum_{k=0}^{q} (-1)^k \frac{A^{2k}}{(2k)!}.
$$

P9.2.4 Suppose A ∈ ℝⁿˣⁿ is nonsingular and X₀ ∈ ℝⁿˣⁿ is given. The iteration defined by X_{k+1} = X_k(2I − AX_k) is the matrix analogue of Newton's method applied to the function f(x) = a − (1/x). Use the SVD to analyze this iteration. Do the iterates converge to A⁻¹
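As a numerical companion to P9.2.4, the sketch below (illustrative Python, not from the text) runs the iteration X_{k+1} = X_k(2I − AX_k) from the common starting guess X₀ = Aᵀ/(‖A‖₁‖A‖_∞), for which the residual R₀ = I − AX₀ has 2-norm below 1; since R_{k+1} = R_k², the residual then decreases quadratically and X_k converges to A⁻¹. The function name, step count, and test matrix are made up for the example.

```python
import numpy as np

def newton_schulz_inverse(A, steps=30):
    """Iterate X_{k+1} = X_k(2I - A X_k) (P9.2.4).  With X_0 = A^T/(||A||_1 ||A||_inf)
    the residual R_k = I - A X_k satisfies R_{k+1} = R_k^2, so ||R_k|| -> 0
    quadratically and X_k -> A^{-1}."""
    n = A.shape[0]
    I = np.eye(n)
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    for _ in range(steps):
        X = X @ (2 * I - A @ X)
    return X

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 5))
X = newton_schulz_inverse(A)
print(np.linalg.norm(np.eye(5) - A @ X))    # close to machine precision
```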